kernel/rcu/tasks.h:1031:6: warning: no previous prototype for 'show_rcu_tasks_gp_kthreads'
Hi Paul, FYI, the error/warning still remains. tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master head: 7af08140979a6e7e12b78c93b8625c8d25b084e2 commit: e21408ceec2de5be418efa39feb1e2c00f824a72 rcu-tasks: Add RCU tasks to rcutorture writer stall output date: 12 months ago config: h8300-allnoconfig (attached as .config) compiler: h8300-linux-gcc (GCC) 9.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e21408ceec2de5be418efa39feb1e2c00f824a72 git remote add linus https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git git fetch --no-tags linus master git checkout e21408ceec2de5be418efa39feb1e2c00f824a72 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross W=1 ARCH=h8300 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): In file included from kernel/rcu/update.c:587: >> kernel/rcu/tasks.h:1031:6: warning: no previous prototype for >> 'show_rcu_tasks_gp_kthreads' [-Wmissing-prototypes] 1031 | void show_rcu_tasks_gp_kthreads(void) {} | ^~ vim +/show_rcu_tasks_gp_kthreads +1031 kernel/rcu/tasks.h 1028 1029 #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */ 1030 static inline void rcu_tasks_bootup_oddness(void) {} > 1031 void show_rcu_tasks_gp_kthreads(void) {} --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org .config.gz Description: application/gzip
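The warning comes from the !CONFIG_TASKS_RCU_GENERIC stub at line 1031 being defined in the header with external linkage and no prior declaration. A minimal sketch of one way to silence it, mirroring the rcu_tasks_bootup_oddness() stub right above it; whether upstream instead adds a prototype in a header is an open question, so treat this as illustration only:

/* kernel/rcu/tasks.h -- sketch only, not necessarily the upstream fix */
#else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
static inline void rcu_tasks_bootup_oddness(void) {}
static inline void show_rcu_tasks_gp_kthreads(void) {}
#endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */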
Re: Doubt regarding memory allocation in KVM
On 20/04/21 07:45, Shivank Garg wrote:
> Hi, I'm learning about qemu KVM, looking into the code and experimenting
> on it. I have the following doubts regarding it, and I would be grateful
> if you could help me get some idea on them.
>
> 1. I observe that KVM allocates memory to guests when it needs it but
> doesn't take it back (except for the ballooning case). Also, the Qemu/KVM
> process does not free the memory even when the guest is rebooted. In this
> case, does the guest VM get access to memory already pre-filled with some
> garbage from the previous run?

Yes. (The host allocates zeroed pages to the guest the first time it requests
them, but after that it's up to the guest.)

> Can it be a security issue?

No, it's the same as what happens on a non-virtual machine.

> 2. How does KVM know if a GPFN (guest physical frame number) is backed by
> an actual machine frame number in the host? If not mapped, then it faults
> in the host and allocates a physical frame for the guest in the host.
> (kvm_mmu_page_fault)

It's all handled by Linux. KVM only does a call to get_user_pages. See the
functions whose names start with hva_to_pfn in virt/kvm/kvm_main.c.

Given a GPA, the GFN is simply the guest physical address minus bits 0:11, so
shifted right by 12.

Paolo
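For reference, the GPA-to-GFN arithmetic Paolo describes is just a right shift by the page shift (12 for 4 KiB pages). A minimal user-space illustration, not KVM code (KVM carries an equivalent gpa_to_gfn() helper internally):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12	/* 4 KiB pages assumed */

static uint64_t gpa_to_gfn(uint64_t gpa)
{
	return gpa >> PAGE_SHIFT;	/* drop bits 0:11, the page offset */
}

int main(void)
{
	uint64_t gpa = 0x12345678;

	printf("GPA 0x%jx -> GFN 0x%jx, page offset 0x%jx\n",
	       (uintmax_t)gpa, (uintmax_t)gpa_to_gfn(gpa),
	       (uintmax_t)(gpa & ((1ULL << PAGE_SHIFT) - 1)));
	return 0;
}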
Re: [PATCH v1 2/2] powerpc: Enable OPTPROBES on PPC32
Christophe Leroy wrote: For that, create a 32 bits version of patch_imm64_load_insns() and create a patch_imm_load_insns() which calls patch_imm32_load_insns() on PPC32 and patch_imm64_load_insns() on PPC64. Adapt optprobes_head.S for PPC32. Use PPC_LL/PPC_STL macros instead of raw ld/std, opt out things linked to paca and use stmw/lmw to save/restore registers. Signed-off-by: Christophe Leroy --- arch/powerpc/Kconfig | 2 +- arch/powerpc/kernel/optprobes.c | 24 +-- arch/powerpc/kernel/optprobes_head.S | 46 +++- 3 files changed, 53 insertions(+), 19 deletions(-) Thanks for adding support for ppc32. It is good to see that it works without too many changes. diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index c1344c05226c..49b538e54efb 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -227,7 +227,7 @@ config PPC select HAVE_MOD_ARCH_SPECIFIC select HAVE_NMI if PERF_EVENTS || (PPC64 && PPC_BOOK3S) select HAVE_HARDLOCKUP_DETECTOR_ARCHif PPC64 && PPC_BOOK3S && SMP - select HAVE_OPTPROBES if PPC64 + select HAVE_OPTPROBES select HAVE_PERF_EVENTS select HAVE_PERF_EVENTS_NMI if PPC64 select HAVE_HARDLOCKUP_DETECTOR_PERFif PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c index 58fdb9f66e0f..cdf87086fa33 100644 --- a/arch/powerpc/kernel/optprobes.c +++ b/arch/powerpc/kernel/optprobes.c @@ -141,11 +141,21 @@ void arch_remove_optimized_kprobe(struct optimized_kprobe *op) } } +static void patch_imm32_load_insns(unsigned long val, int reg, kprobe_opcode_t *addr) +{ + patch_instruction((struct ppc_inst *)addr, + ppc_inst(PPC_RAW_LIS(reg, IMM_H(val; + addr++; + + patch_instruction((struct ppc_inst *)addr, + ppc_inst(PPC_RAW_ORI(reg, reg, IMM_L(val; +} + /* * Generate instructions to load provided immediate 64-bit value * to register 'reg' and patch these instructions at 'addr'. */ -static void patch_imm64_load_insns(unsigned long val, int reg, kprobe_opcode_t *addr) +static void patch_imm64_load_insns(unsigned long long val, int reg, kprobe_opcode_t *addr) Do you really need this? { /* lis reg,(op)@highest */ patch_instruction((struct ppc_inst *)addr, @@ -177,6 +187,14 @@ static void patch_imm64_load_insns(unsigned long val, int reg, kprobe_opcode_t * ___PPC_RS(reg) | (val & 0x))); } +static void patch_imm_load_insns(unsigned long val, int reg, kprobe_opcode_t *addr) +{ + if (IS_ENABLED(CONFIG_PPC64)) + patch_imm64_load_insns(val, reg, addr); + else + patch_imm32_load_insns(val, reg, addr); +} + int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p) { struct ppc_inst branch_op_callback, branch_emulate_step, temp; @@ -230,7 +248,7 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p) * Fixup the template with instructions to: * 1. load the address of the actual probepoint */ - patch_imm64_load_insns((unsigned long)op, 3, buff + TMPL_OP_IDX); + patch_imm_load_insns((unsigned long)op, 3, buff + TMPL_OP_IDX); /* * 2. branch to optimized_callback() and emulate_step() @@ -264,7 +282,7 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p) * 3. load instruction to be emulated into relevant register, and */ temp = ppc_inst_read((struct ppc_inst *)p->ainsn.insn); - patch_imm64_load_insns(ppc_inst_as_ulong(temp), 4, buff + TMPL_INSN_IDX); + patch_imm_load_insns(ppc_inst_as_ulong(temp), 4, buff + TMPL_INSN_IDX); /* * 4. 
branch back from trampoline diff --git a/arch/powerpc/kernel/optprobes_head.S b/arch/powerpc/kernel/optprobes_head.S index ff8ba4d3824d..49f31e554573 100644 --- a/arch/powerpc/kernel/optprobes_head.S +++ b/arch/powerpc/kernel/optprobes_head.S @@ -30,39 +30,47 @@ optinsn_slot: .global optprobe_template_entry optprobe_template_entry: /* Create an in-memory pt_regs */ - stdur1,-INT_FRAME_SIZE(r1) + PPC_STLUr1,-INT_FRAME_SIZE(r1) SAVE_GPR(0,r1) /* Save the previous SP into stack */ addir0,r1,INT_FRAME_SIZE - std r0,GPR1(r1) + PPC_STL r0,GPR1(r1) +#ifdef CONFIG_PPC64 SAVE_10GPRS(2,r1) SAVE_10GPRS(12,r1) SAVE_10GPRS(22,r1) +#else + stmwr2, GPR2(r1) +#endif /* Save SPRS */ mfmsr r5 - std r5,_MSR(r1) + PPC_STL r5,_MSR(r1) li r5,0x700 - std r5,_TRAP(r1) + PPC_STL r5,_TRAP(r1) li r5,0 - std r5,ORIG_GPR3(r1)
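The patch_imm32_load_insns() hunk above lost some closing parentheses in transit; based on the surviving fragments it presumably reads as below (lis loads the high 16 bits of the immediate, ori fills in the low 16 bits). This is a reconstruction, not a verbatim copy of the posted patch:

static void patch_imm32_load_insns(unsigned long val, int reg, kprobe_opcode_t *addr)
{
	/* lis reg,(val)@h */
	patch_instruction((struct ppc_inst *)addr,
			  ppc_inst(PPC_RAW_LIS(reg, IMM_H(val))));
	addr++;

	/* ori reg,reg,(val)@l */
	patch_instruction((struct ppc_inst *)addr,
			  ppc_inst(PPC_RAW_ORI(reg, reg, IMM_L(val))));
}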
Re: [PATCH v3 4/4] media: i2c: ov2659: Use clk_{prepare_enable,disable_unprepare}() to set xvclk on/off
On Tue, 20 Apr 2021 at 08:46, dillon min wrote:
>
> Hi All,
>
> Just a gentle ping, hope some expert could take a look, thanks.

Don't ping people after 5 days. It's not gentle.

Best regards,
Krzysztof
Re: [PATCH 5.10 000/103] 5.10.32-rc1 review
On Mon, 19 Apr 2021 at 18:49, Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 5.10.32 release. > There are 103 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Wed, 21 Apr 2021 13:05:09 +. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.32-rc1.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-5.10.y > and the diffstat can be found below. > > thanks, > > greg k-h Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386. Tested-by: Linux Kernel Functional Testing ## Build * kernel: 5.10.32-rc1 * git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git * git branch: linux-5.10.y * git commit: bcedd92af6e5899132429d20a9322b12afec2188 * git describe: v5.10.31-104-gbcedd92af6e5 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10.31-104-gbcedd92af6e5 ## No regressions (compared to v5.10.31-56-g86a799ba8d60) ## No fixes (compared to v5.10.31-56-g86a799ba8d60) ## Test result summary total: 69955, pass: 58611, fail: 1787, skip: 9326, xfail: 231, ## Build Summary * arc: 10 total, 10 passed, 0 failed * arm: 192 total, 192 passed, 0 failed * arm64: 26 total, 26 passed, 0 failed * dragonboard-410c: 1 total, 1 passed, 0 failed * hi6220-hikey: 1 total, 1 passed, 0 failed * i386: 25 total, 25 passed, 0 failed * juno-r2: 1 total, 1 passed, 0 failed * mips: 45 total, 45 passed, 0 failed * parisc: 9 total, 9 passed, 0 failed * powerpc: 27 total, 27 passed, 0 failed * riscv: 21 total, 21 passed, 0 failed * s390: 18 total, 18 passed, 0 failed * sh: 18 total, 18 passed, 0 failed * sparc: 9 total, 9 passed, 0 failed * x15: 1 total, 1 passed, 0 failed * x86: 1 total, 1 passed, 0 failed * x86_64: 26 total, 26 passed, 0 failed ## Test suites summary * fwts * igt-gpu-tools * install-android-platform-tools-r2600 * kselftest- * kselftest-android * kselftest-bpf * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-lkdtm * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-vsyscall-mode-native- * kselftest-vsyscall-mode-none- * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libhugetlbfs * linux-log-parser * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * 
ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * network-basic-tests * perf * rcutorture * ssuite * v4l2-compliance -- Linaro LKFT https://lkft.linaro.org
Re: [PATCH v3 4/4] media: i2c: ov2659: Use clk_{prepare_enable,disable_unprepare}() to set xvclk on/off
Hi All, Just a gentle ping, hope some expert could take a look, thanks. Best regards. Dillon On Thu, Apr 15, 2021 at 12:06 PM wrote: > > From: dillon min > > On some platform(imx6q), xvclk might not switch on in advance, > also for power save purpose, xvclk should not be always on. > so, add clk_prepare_enable(), clk_disable_unprepare() in driver > side to set xvclk on/off at proper stage. > > Add following changes: > - add 'struct clk *clk;' in 'struct ov2659 {}' > - enable xvclk in ov2659_power_on() > - disable xvclk in ov2659_power_off() > > Signed-off-by: dillon min > --- > v3: optimize commit message > > drivers/media/i2c/ov2659.c | 24 ++-- > 1 file changed, 18 insertions(+), 6 deletions(-) > > diff --git a/drivers/media/i2c/ov2659.c b/drivers/media/i2c/ov2659.c > index 42f64175a6df..fb78a1cedc03 100644 > --- a/drivers/media/i2c/ov2659.c > +++ b/drivers/media/i2c/ov2659.c > @@ -204,6 +204,7 @@ struct ov2659 { > struct i2c_client *client; > struct v4l2_ctrl_handler ctrls; > struct v4l2_ctrl *link_frequency; > + struct clk *clk; > const struct ov2659_framesize *frame_size; > struct sensor_register *format_ctrl_regs; > struct ov2659_pll_ctrl pll; > @@ -1270,6 +1271,8 @@ static int ov2659_power_off(struct device *dev) > > gpiod_set_value(ov2659->pwdn_gpio, 1); > > + clk_disable_unprepare(ov2659->clk); > + > return 0; > } > > @@ -1278,9 +1281,17 @@ static int ov2659_power_on(struct device *dev) > struct i2c_client *client = to_i2c_client(dev); > struct v4l2_subdev *sd = i2c_get_clientdata(client); > struct ov2659 *ov2659 = to_ov2659(sd); > + int ret; > > dev_dbg(&client->dev, "%s:\n", __func__); > > + ret = clk_prepare_enable(ov2659->clk); > + if (ret) { > + dev_err(&client->dev, "%s: failed to enable clock\n", > + __func__); > + return ret; > + } > + > gpiod_set_value(ov2659->pwdn_gpio, 0); > > if (ov2659->resetb_gpio) { > @@ -1425,7 +1436,6 @@ static int ov2659_probe(struct i2c_client *client) > const struct ov2659_platform_data *pdata = ov2659_get_pdata(client); > struct v4l2_subdev *sd; > struct ov2659 *ov2659; > - struct clk *clk; > int ret; > > if (!pdata) { > @@ -1440,11 +1450,11 @@ static int ov2659_probe(struct i2c_client *client) > ov2659->pdata = pdata; > ov2659->client = client; > > - clk = devm_clk_get(&client->dev, "xvclk"); > - if (IS_ERR(clk)) > - return PTR_ERR(clk); > + ov2659->clk = devm_clk_get(&client->dev, "xvclk"); > + if (IS_ERR(ov2659->clk)) > + return PTR_ERR(ov2659->clk); > > - ov2659->xvclk_frequency = clk_get_rate(clk); > + ov2659->xvclk_frequency = clk_get_rate(ov2659->clk); > if (ov2659->xvclk_frequency < 600 || > ov2659->xvclk_frequency > 2700) > return -EINVAL; > @@ -1506,7 +1516,9 @@ static int ov2659_probe(struct i2c_client *client) > ov2659->frame_size = &ov2659_framesizes[2]; > ov2659->format_ctrl_regs = ov2659_formats[0].format_ctrl_regs; > > - ov2659_power_on(&client->dev); > + ret = ov2659_power_on(&client->dev); > + if (ret < 0) > + goto error; > > ret = ov2659_detect(sd); > if (ret < 0) > -- > 2.7.4 >
Re: [RFC] memory reserve for userspace oom-killer
On Mon 19-04-21 18:44:02, Shakeel Butt wrote: > Proposal: Provide memory guarantees to userspace oom-killer. > > Background: > > Issues with kernel oom-killer: > 1. Very conservative and prefer to reclaim. Applications can suffer > for a long time. > 2. Borrows the context of the allocator which can be resource limited > (low sched priority or limited CPU quota). > 3. Serialized by global lock. > 4. Very simplistic oom victim selection policy. > > These issues are resolved through userspace oom-killer by: > 1. Ability to monitor arbitrary metrics (PSI, vmstat, memcg stats) to > early detect suffering. > 2. Independent process context which can be given dedicated CPU quota > and high scheduling priority. > 3. Can be more aggressive as required. > 4. Can implement sophisticated business logic/policies. > > Android's LMKD and Facebook's oomd are the prime examples of userspace > oom-killers. One of the biggest challenges for userspace oom-killers > is to potentially function under intense memory pressure and are prone > to getting stuck in memory reclaim themselves. Current userspace > oom-killers aim to avoid this situation by preallocating user memory > and protecting themselves from global reclaim by either mlocking or > memory.min. However a new allocation from userspace oom-killer can > still get stuck in the reclaim and policy rich oom-killer do trigger > new allocations through syscalls or even heap. Can you be more specific please? > Our attempt of userspace oom-killer faces similar challenges. > Particularly at the tail on the very highly utilized machines we have > observed userspace oom-killer spectacularly failing in many possible > ways in the direct reclaim. We have seen oom-killer stuck in direct > reclaim throttling, stuck in reclaim and allocations from interrupts > keep stealing reclaimed memory. We have even observed systems where > all the processes were stuck in throttle_direct_reclaim() and only > kswapd was running and the interrupts kept stealing the memory > reclaimed by kswapd. > > To reliably solve this problem, we need to give guaranteed memory to > the userspace oom-killer. There is nothing like that. Even memory reserves are a finite resource which can be consumed as it is sharing those reserves with other users who are not necessarily coordinated. So before we start discussing making this even more muddy by handing over memory reserves to the userspace we should really examine whether pre-allocation is something that will not work. > At the moment we are contemplating between > the following options and I would like to get some feedback. > > 1. prctl(PF_MEMALLOC) > > The idea is to give userspace oom-killer (just one thread which is > finding the appropriate victims and will be sending SIGKILLs) access > to MEMALLOC reserves. Most of the time the preallocation, mlock and > memory.min will be good enough but for rare occasions, when the > userspace oom-killer needs to allocate, the PF_MEMALLOC flag will > protect it from reclaim and let the allocation dip into the memory > reserves. I do not think that handing over an unlimited ticket to the memory reserves to userspace is a good idea. Even the in kernel oom killer is bound to a partial access to reserves. So if we really want this then it should be in sync with and bound by the ALLOC_OOM. > The misuse of this feature would be risky but it can be limited to > privileged applications. Userspace oom-killer is the only appropriate > user of this feature. This option is simple to implement. > > 2. 
Mempool > > The idea is to preallocate mempool with a given amount of memory for > userspace oom-killer. Preferably this will be per-thread and > oom-killer can preallocate mempool for its specific threads. The core > page allocator can check before going to the reclaim path if the task > has private access to the mempool and return page from it if yes. Could you elaborate some more on how this would be controlled from the userspace? A dedicated syscall? A driver? > This option would be more complicated than the previous option as the > lifecycle of the page from the mempool would be more sophisticated. > Additionally the current mempool does not handle higher order pages > and we might need to extend it to allow such allocations. Though this > feature might have more use-cases and it would be less risky than the > previous option. I would tend to agree. > Another idea I had was to use kthread based oom-killer and provide the > policies through eBPF program. Though I am not sure how to make it > monitor arbitrary metrics and if that can be done without any > allocations. A kernel module or eBPF to implement oom decisions has already been discussed few years back. But I am afraid this would be hard to wire in for anything except for the victim selection. I am not sure it is maintainable to also control when the OOM handling should trigger. -- Michal Hocko SUSE Labs
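For context on option 1: inside the kernel, PF_MEMALLOC is already scoped around allocation sites with the memalloc_noreclaim_save()/memalloc_noreclaim_restore() helpers; the proposed prctl would in effect pin the same flag on a privileged userspace task. A minimal in-kernel sketch of the existing pattern (the userspace-facing prctl itself does not exist today):

#include <linux/sched/mm.h>
#include <linux/slab.h>

static void *alloc_without_reclaim(size_t size)
{
	unsigned int noreclaim_flag;
	void *p;

	noreclaim_flag = memalloc_noreclaim_save();	/* sets PF_MEMALLOC */
	p = kmalloc(size, GFP_KERNEL);			/* may dip into reserves, won't reclaim */
	memalloc_noreclaim_restore(noreclaim_flag);

	return p;
}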
Re: [PATCH] hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()
On Mon, Apr 19 2021 at 20:12, Maciej Żenczykowski wrote:
> On Thu, Apr 15, 2021 at 9:47 AM Thomas Gleixner wrote:
>> Run the test on kernels with and without that commit and collect trace
>> data for both.
>>
>> That should give me a pretty clear picture what's going on.
>
> Lorenzo is trying to get the traces you asked for, or rather he’s
> trying to get confirmation he can post them.
>
> Our initial observation of these results seems to suggest that
> updating the timer (hrtimer_start, which seems to also call
> hrtimer_cancel) takes twice as long as it used to.

Which contradicts my measurements. The change in complexity is marginal and
the overhead in cycles/instructions is close to noise. It's measurable in a
microbenchmark, but it's in the < 1% range, which is far away from the 60%
you are seeing.

> My gut feeling is that softirq_activated is usually false, and the old
> code in such a case calls just __hrtimer_get_next_event(, HRTIMER_ACTIVE_ALL).
> While the new code will first call __hrtimer_get_next_event(, HRTIMER_ACTIVE_SOFT)
> and then __hrtimer_get_next_event(, HRTIMER_ACTIVE_HARD)
>
> Perhaps __hrtimer_get_next_event() should return both soft and hard
> event times in one function call?
> Or perhaps hrtimer_start should not call hrtimer_cancel?

Perhaps we do a proper analysis first :)

Thanks,

        tglx
Re: [PATCH v3 3/4] arm: dts: imx: Add i.mx6q DaSheng COM-9XX SBC board support
Hi, Just a gentle ping, hope some expert could take a look, thanks. Best regards. Dillon On Thu, Apr 15, 2021 at 12:05 PM wrote: > > From: dillon min > > The DaSheng Com-9xx is and ARM based signle board computer (SBC) > featuring: > - i.MX6Q > - 2GiB LPDDR3 DRAM > - 8GiB eMMC 5.0 FLASH > - 4MiB SPI Flash > - USB 2.0 Host/Device > - Multiple multi-protocol RS232/RS485 Serial ports > - microSD socket > - 5V DC power input > - HDMI1.4a,1080p@60 > - RGMIIx1 Gigabit Ethernet > - CSI0x1, connect with ov2659 > > Signed-off-by: dillon min > --- > v3: move imx6q-ds.dtb after imx6q-dms-ba16.dtb to follow the alphabetical > order > > arch/arm/boot/dts/Makefile| 1 + > arch/arm/boot/dts/imx6q-ds.dts| 17 ++ > arch/arm/boot/dts/imx6qdl-ds.dtsi | 465 > ++ > 3 files changed, 483 insertions(+) > create mode 100644 arch/arm/boot/dts/imx6q-ds.dts > create mode 100644 arch/arm/boot/dts/imx6qdl-ds.dtsi > > diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile > index a19c5ab9df84..425fe17ef7c1 100644 > --- a/arch/arm/boot/dts/Makefile > +++ b/arch/arm/boot/dts/Makefile > @@ -510,6 +510,7 @@ dtb-$(CONFIG_SOC_IMX6Q) += \ > imx6q-display5-tianma-tm070-1280x768.dtb \ > imx6q-dmo-edmqmx6.dtb \ > imx6q-dms-ba16.dtb \ > + imx6q-ds.dtb \ > imx6q-emcon-avari.dtb \ > imx6q-evi.dtb \ > imx6q-gk802.dtb \ > diff --git a/arch/arm/boot/dts/imx6q-ds.dts b/arch/arm/boot/dts/imx6q-ds.dts > new file mode 100644 > index ..b0a63a133977 > --- /dev/null > +++ b/arch/arm/boot/dts/imx6q-ds.dts > @@ -0,0 +1,17 @@ > +// SPDX-License-Identifier: GPL-2.0+ > +// > +// Copyright 2021 Dillon Min > +// > +// Based on imx6qdl-sabresd.dtsi which is: > +// Copyright 2012 Freescale Semiconductor, Inc. > +// Copyright 2011 Linaro Ltd. > + > +/dts-v1/; > + > +#include "imx6q.dtsi" > +#include "imx6qdl-ds.dtsi" > + > +/ { > + model = "DaSheng i.MX6 Quad Com-9xx Board"; > + compatible = "ds,imx6q-sbc", "fsl,imx6q"; > +}; > diff --git a/arch/arm/boot/dts/imx6qdl-ds.dtsi > b/arch/arm/boot/dts/imx6qdl-ds.dtsi > new file mode 100644 > index ..d28e065349cd > --- /dev/null > +++ b/arch/arm/boot/dts/imx6qdl-ds.dtsi > @@ -0,0 +1,465 @@ > +// SPDX-License-Identifier: GPL-2.0+ > +// > +// Copyright 2021 Dillon Min > +// > +// Based on imx6qdl-sabresd.dtsi which is: > +// Copyright 2012 Freescale Semiconductor, Inc. > +// Copyright 2011 Linaro Ltd. 
> + > +#include > +#include > +#include > + > +/ { > + chosen { > + stdout-path = &uart1; > + }; > + > + memory@1000 { > + device_type = "memory"; > + reg = <0x1000 0x8000>; > + }; > + > + reg_usb_otg_vbus: regulator-usb-otg-vbus { > + compatible = "regulator-fixed"; > + regulator-name = "usb_otg_vbus"; > + regulator-min-microvolt = <500>; > + regulator-max-microvolt = <500>; > + regulator-always-on; > + }; > + > + reg_usb_h1_vbus: regulator-usb-h1-vbus { > + compatible = "regulator-fixed"; > + regulator-name = "usb_h1_vbus"; > + regulator-min-microvolt = <500>; > + regulator-max-microvolt = <500>; > + regulator-always-on; > + }; > + > + leds { > + compatible = "gpio-leds"; > + pinctrl-names = "default"; > + pinctrl-0 = <&pinctrl_gpio_leds>; > + > + green { > + gpios = <&gpio4 8 0>; > + default-state = "on"; > + linux,default-trigger = "heartbeat"; > + }; > + }; > +}; > + > +&ipu1_csi0_from_ipu1_csi0_mux { > + bus-width = <8>; > + data-shift = <12>; /* Lines 19:12 used */ > + hsync-active = <1>; > + vsync-active = <1>; > +}; > + > +&ipu1_csi0_mux_from_parallel_sensor { > + remote-endpoint = <&ov2659_to_ipu1_csi0_mux>; > +}; > + > +&ipu1_csi0 { > + pinctrl-names = "default"; > + pinctrl-0 = <&pinctrl_ipu1_csi0>; > + status = "okay"; > +}; > + > +&clks { > + assigned-clocks = <&clks IMX6QDL_CLK_LDB_DI0_SEL>, > + <&clks IMX6QDL_CLK_LDB_DI1_SEL>; > + assigned-clock-parents = <&clks IMX6QDL_CLK_PLL3_USB_OTG>, > +<&clks IMX6QDL_CLK_PLL3_USB_OTG>; > +}; > + > +&ecspi1 { > + cs-gpios = <&gpio4 9 GPIO_ACTIVE_LOW>; > + pinctrl-names = "default"; > + pinctrl-0 = <&pinctrl_ecspi1>; > + status = "okay"; > + > + flash: m25p80@0 { > + #address-cells = <1>; > + #size-cells = <1>; > + compatible = "st,m25p80", "jedec,spi-nor"; > + spi-max-frequency = <2000>; > + reg = <0>; > + }; > +}; > + > +&fec { > + pinctrl-names = "default"; > + pinctrl-0 =
[PATCH] MIPS: pci-legacy: revert "use generic pci_enable_resources"
This mostly reverts commit 99bca615d895 ("MIPS: pci-legacy: use generic pci_enable_resources"). Fixes regressions such as: ata_piix :00:0a.1: can't enable device: BAR 0 [io 0x01f0-0x01f7] not claimed ata_piix: probe of :00:0a.1 failed with error -22 The only changes from the strict revert are to fix checkpatch errors: ERROR: spaces required around that '=' (ctx:VxV) #33: FILE: arch/mips/pci/pci-legacy.c:252: + for (idx=0; idx < PCI_NUM_RESOURCES; idx++) { ^ ERROR: do not use assignment in if condition #67: FILE: arch/mips/pci/pci-legacy.c:284: + if ((err = pcibios_enable_resources(dev, mask)) < 0) Reported-by: Guenter Roeck Signed-off-by: Ilya Lipnitskiy --- Guenter, sorry about this - let's just revert this for now to minimize turmoil to the legacy PCI code. Obviously, this needs more testing before applying. Thanks for your report! arch/mips/pci/pci-legacy.c | 38 +- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/arch/mips/pci/pci-legacy.c b/arch/mips/pci/pci-legacy.c index c24226ea0a6e..468722c8a5c6 100644 --- a/arch/mips/pci/pci-legacy.c +++ b/arch/mips/pci/pci-legacy.c @@ -241,9 +241,45 @@ static int __init pcibios_init(void) subsys_initcall(pcibios_init); +static int pcibios_enable_resources(struct pci_dev *dev, int mask) +{ + u16 cmd, old_cmd; + int idx; + struct resource *r; + + pci_read_config_word(dev, PCI_COMMAND, &cmd); + old_cmd = cmd; + for (idx = 0; idx < PCI_NUM_RESOURCES; idx++) { + /* Only set up the requested stuff */ + if (!(mask & (1flags & (IORESOURCE_IO | IORESOURCE_MEM))) + continue; + if ((idx == PCI_ROM_RESOURCE) && + (!(r->flags & IORESOURCE_ROM_ENABLE))) + continue; + if (!r->start && r->end) { + pci_err(dev, + "can't enable device: resource collisions\n"); + return -EINVAL; + } + if (r->flags & IORESOURCE_IO) + cmd |= PCI_COMMAND_IO; + if (r->flags & IORESOURCE_MEM) + cmd |= PCI_COMMAND_MEMORY; + } + if (cmd != old_cmd) { + pci_info(dev, "enabling device (%04x -> %04x)\n", old_cmd, cmd); + pci_write_config_word(dev, PCI_COMMAND, cmd); + } + return 0; +} + int pcibios_enable_device(struct pci_dev *dev, int mask) { - int err = pci_enable_resources(dev, mask); + int err = pcibios_enable_resources(dev, mask); if (err < 0) return err; -- 2.31.1
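The resource loop in the hunk above was mangled in transit ("if (!(mask & (1flags & ..."); based on the surviving fragments and the classic pcibios_enable_resources() it presumably reads roughly as follows. This is a reconstruction for readability, not a verbatim copy of the posted patch:

	pci_read_config_word(dev, PCI_COMMAND, &cmd);
	old_cmd = cmd;
	for (idx = 0; idx < PCI_NUM_RESOURCES; idx++) {
		/* Only set up the requested stuff */
		if (!(mask & (1 << idx)))
			continue;

		r = &dev->resource[idx];
		if (!(r->flags & (IORESOURCE_IO | IORESOURCE_MEM)))
			continue;
		if ((idx == PCI_ROM_RESOURCE) &&
		    (!(r->flags & IORESOURCE_ROM_ENABLE)))
			continue;
		if (!r->start && r->end) {
			pci_err(dev,
				"can't enable device: resource collisions\n");
			return -EINVAL;
		}
		if (r->flags & IORESOURCE_IO)
			cmd |= PCI_COMMAND_IO;
		if (r->flags & IORESOURCE_MEM)
			cmd |= PCI_COMMAND_MEMORY;
	}
	if (cmd != old_cmd) {
		pci_info(dev, "enabling device (%04x -> %04x)\n", old_cmd, cmd);
		pci_write_config_word(dev, PCI_COMMAND, cmd);
	}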
Re: [PATCH V3 2/4] soc: qcom: dcc:Add driver support for Data Capture and Compare unit(DCC)
On 2021-04-19 20:02, Bjorn Andersson wrote: On Mon 19 Apr 05:32 CDT 2021, schow...@codeaurora.org wrote: On 2021-04-15 12:01, Felipe Balbi wrote: > Hi, > > Souradeep Chowdhury writes: > > diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile > > index ad675a6..e7f0ccb 100644 > > --- a/drivers/soc/qcom/Makefile > > +++ b/drivers/soc/qcom/Makefile > > @@ -1,19 +1,22 @@ > > # SPDX-License-Identifier: GPL-2.0 > > CFLAGS_rpmh-rsc.o := -I$(src) > > obj-$(CONFIG_QCOM_AOSS_QMP) += qcom_aoss.o > > -obj-$(CONFIG_QCOM_GENI_SE) +=qcom-geni-se.o > > +obj-$(CONFIG_QCOM_APR) += apr.o > > obj-$(CONFIG_QCOM_COMMAND_DB) += cmd-db.o > > obj-$(CONFIG_QCOM_CPR) += cpr.o > > +obj-$(CONFIG_QCOM_DCC) += dcc.o > > +obj-$(CONFIG_QCOM_GENI_SE) += qcom-geni-se.o > > obj-$(CONFIG_QCOM_GSBI) += qcom_gsbi.o > > +obj-$(CONFIG_QCOM_KRYO_L2_ACCESSORS) += kryo-l2-accessors.o > > +obj-$(CONFIG_QCOM_LLCC) += llcc-qcom.o > > obj-$(CONFIG_QCOM_MDT_LOADER)+= mdt_loader.o > > obj-$(CONFIG_QCOM_OCMEM) += ocmem.o > > obj-$(CONFIG_QCOM_PDR_HELPERS) += pdr_interface.o > > obj-$(CONFIG_QCOM_QMI_HELPERS) += qmi_helpers.o > > -qmi_helpers-y+= qmi_encdec.o qmi_interface.o > > obj-$(CONFIG_QCOM_RMTFS_MEM) += rmtfs_mem.o > > obj-$(CONFIG_QCOM_RPMH) += qcom_rpmh.o > > -qcom_rpmh-y += rpmh-rsc.o > > -qcom_rpmh-y += rpmh.o > > +obj-$(CONFIG_QCOM_RPMHPD) += rpmhpd.o > > +obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o > > obj-$(CONFIG_QCOM_SMD_RPM) += smd-rpm.o > > obj-$(CONFIG_QCOM_SMEM) += smem.o > > obj-$(CONFIG_QCOM_SMEM_STATE) += smem_state.o > > @@ -21,8 +24,6 @@ obj-$(CONFIG_QCOM_SMP2P)+= smp2p.o > > obj-$(CONFIG_QCOM_SMSM) += smsm.o > > obj-$(CONFIG_QCOM_SOCINFO) += socinfo.o > > obj-$(CONFIG_QCOM_WCNSS_CTRL) += wcnss_ctrl.o > > -obj-$(CONFIG_QCOM_APR) += apr.o > > -obj-$(CONFIG_QCOM_LLCC) += llcc-qcom.o > > -obj-$(CONFIG_QCOM_RPMHPD) += rpmhpd.o > > -obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o > > -obj-$(CONFIG_QCOM_KRYO_L2_ACCESSORS) += kryo-l2-accessors.o > > +qmi_helpers-y += qmi_encdec.o qmi_interface.o > > +qcom_rpmh-y += rpmh-rsc.o > > +qcom_rpmh-y += rpmh.o > > why so many changes? This has been accidentally sorted based on the config names. Will be fixing this in next version of the patch. > > > diff --git a/drivers/soc/qcom/dcc.c b/drivers/soc/qcom/dcc.c > > new file mode 100644 > > index 000..fcd5580 > > --- /dev/null > > +++ b/drivers/soc/qcom/dcc.c > > @@ -0,0 +1,1539 @@ > > +// SPDX-License-Identifier: GPL-2.0-only > > +/* > > + * Copyright (c) 2015-2021, The Linux Foundation. All rights > > reserved. > > + */ > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > + > > one blank line is enough Ack > > > +#define TIMEOUT_US 100 > > + > > +#define dcc_writel(drvdata, val, off) \ > > + writel((val), drvdata->base + dcc_offset_conv(drvdata, off)) > > +#define dcc_readl(drvdata, off) \ > > + readl(drvdata->base + dcc_offset_conv(drvdata, off)) > > + > > +#define dcc_sram_readl(drvdata, off) \ > > + readl(drvdata->ram_base + off) > > this would be probably be better as static inlines. These are simple read and write operations used in the driver which just calls the generic writel and readl function. That's why macros have been used here to lesson the overhead of an extra function call. The compiler will realize that your static dcc_sram_readl() is cheaper to inline than call and do so for you. 
So you can expect that there's no difference in the output from the compiler, and if there is then the compiler knows something that you're overlooking.

Ack. Will go for static inline here.

Regards,
Bjorn
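For illustration, the static-inline form Felipe and Bjorn are suggesting would look roughly like this; the dcc_drvdata layout and dcc_offset_conv() are assumed from the posted driver, so this is a sketch rather than the final code:

static inline void dcc_writel(struct dcc_drvdata *drvdata, u32 val, u32 off)
{
	writel(val, drvdata->base + dcc_offset_conv(drvdata, off));
}

static inline u32 dcc_readl(struct dcc_drvdata *drvdata, u32 off)
{
	return readl(drvdata->base + dcc_offset_conv(drvdata, off));
}

static inline u32 dcc_sram_readl(struct dcc_drvdata *drvdata, u32 off)
{
	return readl(drvdata->ram_base + off);
}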
Re: [PATCH -next] tools/testing/nvdimm: Make symbol '__nfit_test_ioremap' static
Hi Zou, Zou Wei writes: > The sparse tool complains as follows: > > tools/testing/nvdimm/test/iomap.c:65:14: warning: > symbol '__nfit_test_ioremap' was not declared. Should it be static? > > This symbol is not used outside of security.c, so this s/security.c/iomap.c/ Thanks, Santosh > commit marks it static. > > Reported-by: Hulk Robot > Signed-off-by: Zou Wei > --- > tools/testing/nvdimm/test/iomap.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tools/testing/nvdimm/test/iomap.c > b/tools/testing/nvdimm/test/iomap.c > index c62d372..ed563bd 100644 > --- a/tools/testing/nvdimm/test/iomap.c > +++ b/tools/testing/nvdimm/test/iomap.c > @@ -62,7 +62,7 @@ struct nfit_test_resource *get_nfit_res(resource_size_t > resource) > } > EXPORT_SYMBOL(get_nfit_res); > > -void __iomem *__nfit_test_ioremap(resource_size_t offset, unsigned long size, > +static void __iomem *__nfit_test_ioremap(resource_size_t offset, unsigned > long size, > void __iomem *(*fallback_fn)(resource_size_t, unsigned long)) > { > struct nfit_test_resource *nfit_res = get_nfit_res(offset); > -- > 2.6.2 > ___ > Linux-nvdimm mailing list -- linux-nvd...@lists.01.org > To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH AUTOSEL 5.11 17/23] i2c: mv64xxx: Fix random system lock caused by runtime PM
On Mon, 19 Apr 2021 16:43:36 -0400 Sasha Levin wrote:
> This first appeared with commit e5c02cf54154 ("i2c: mv64xxx: Add runtime
> PM support").

I forgot to add a Fixes: tag to this commit. But the bug first appeared with commit e5c02cf54154 ("i2c: mv64xxx: Add runtime PM support"), which is in 5.12, but not 5.11 or any others. So this fix is not needed for the stable releases (although it does not break anything on those...).

Marek
Re: [PATCH 5.11 000/122] 5.11.16-rc1 review
On Mon, 19 Apr 2021 at 18:39, Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 5.11.16 release. > There are 122 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Wed, 21 Apr 2021 13:05:09 +. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.11.16-rc1.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-5.11.y > and the diffstat can be found below. > > thanks, > > greg k-h Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386. Tested-by: Linux Kernel Functional Testing ## Build * kernel: 5.11.16-rc1 * git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git * git branch: linux-5.11.y * git commit: f34f787f0e47983f4f18c818b2cf23f5e3bf576e * git describe: v5.11.15-123-gf34f787f0e47 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.11.y/build/v5.11.15-123-gf34f787f0e47 ## No regressions (compared to v5.11.14-24-g7825299a896f) ## No fixes (compared to v5.11.14-24-g7825299a896f) ## Test result summary total: 76776, pass: 63385, fail: 2827, skip: 10320, xfail: 244, ## Build Summary * arc: 10 total, 10 passed, 0 failed * arm: 192 total, 192 passed, 0 failed * arm64: 26 total, 26 passed, 0 failed * dragonboard-410c: 2 total, 2 passed, 0 failed * hi6220-hikey: 2 total, 2 passed, 0 failed * i386: 26 total, 26 passed, 0 failed * juno-r2: 2 total, 2 passed, 0 failed * mips: 45 total, 45 passed, 0 failed * parisc: 9 total, 9 passed, 0 failed * powerpc: 27 total, 27 passed, 0 failed * riscv: 21 total, 21 passed, 0 failed * s390: 18 total, 18 passed, 0 failed * sh: 18 total, 18 passed, 0 failed * sparc: 9 total, 9 passed, 0 failed * x15: 1 total, 0 passed, 1 failed * x86: 2 total, 2 passed, 0 failed * x86_64: 26 total, 26 passed, 0 failed ## Test suites summary * fwts * igt-gpu-tools * install-android-platform-tools-r2600 * kselftest- * kselftest-android * kselftest-bpf * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-lkdtm * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-vsyscall-mode-native- * kselftest-vsyscall-mode-none- * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libhugetlbfs * linux-log-parser * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * 
ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * network-basic-tests * perf * rcutorture * ssuite * v4l2-compliance -- Linaro LKFT https://lkft.linaro.org
[tip:auto-latest] BUILD SUCCESS e48f64fea89e61301248c4be02b406d934a4bac3
nfig alpha defconfig alphaallyesconfig h8300allyesconfig arc defconfig sh allmodconfig parisc defconfig s390 allyesconfig s390 allmodconfig parisc allyesconfig s390defconfig sparcallyesconfig sparc defconfig i386defconfig mips allyesconfig mips allmodconfig powerpc allyesconfig powerpc allmodconfig powerpc allnoconfig x86_64 randconfig-a003-20210419 x86_64 randconfig-a001-20210419 x86_64 randconfig-a005-20210419 x86_64 randconfig-a002-20210419 x86_64 randconfig-a006-20210419 x86_64 randconfig-a004-20210419 i386 randconfig-a003-20210419 i386 randconfig-a001-20210419 i386 randconfig-a006-20210419 i386 randconfig-a005-20210419 i386 randconfig-a004-20210419 i386 randconfig-a002-20210419 i386 randconfig-a015-20210419 i386 randconfig-a013-20210419 i386 randconfig-a014-20210419 i386 randconfig-a016-20210419 i386 randconfig-a012-20210419 i386 randconfig-a011-20210419 riscvnommu_k210_defconfig riscvnommu_virt_defconfig riscv allnoconfig riscv defconfig riscv rv32_defconfig um allmodconfig umallnoconfig um allyesconfig um defconfig x86_64rhel-8.3-kselftests x86_64 defconfig x86_64 rhel-8.3 x86_64 rhel-8.3-kbuiltin x86_64 kexec clang tested configs: x86_64 randconfig-a014-20210419 x86_64 randconfig-a015-20210419 x86_64 randconfig-a013-20210419 x86_64 randconfig-a011-20210419 x86_64 randconfig-a012-20210419 x86_64 randconfig-a016-20210419 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
Re: [PATCH 2/4] PKCS#7: Check codeSigning EKU for kernel module and kexec pe verification
Hi Varad, Thanks for your review! On Thu, Apr 15, 2021 at 02:08:32PM +0200, Varad Gautam wrote: > Hi Joey, > > On 4/9/21 4:46 AM, Lee, Chun-Yi wrote: > > This patch adds the logic for checking the CodeSigning extended > > key usage when verifying signature of kernel module or > > kexec PE binary in PKCS#7. > > > > Signed-off-by: "Lee, Chun-Yi" > > --- > > certs/system_keyring.c | 2 +- > > crypto/asymmetric_keys/Kconfig | 9 + > > crypto/asymmetric_keys/pkcs7_trust.c | 37 > > +--- > > include/crypto/pkcs7.h | 3 ++- > > 4 files changed, 46 insertions(+), 5 deletions(-) > > > > diff --git a/certs/system_keyring.c b/certs/system_keyring.c > > index 4b693da488f1..c9f8bca0b0d3 100644 > > --- a/certs/system_keyring.c > > +++ b/certs/system_keyring.c > > @@ -243,7 +243,7 @@ int verify_pkcs7_message_sig(const void *data, size_t > > len, > > goto error; > > } > > } > > - ret = pkcs7_validate_trust(pkcs7, trusted_keys); > > + ret = pkcs7_validate_trust(pkcs7, trusted_keys, usage); > > if (ret < 0) { > > if (ret == -ENOKEY) > > pr_devel("PKCS#7 signature not signed with a trusted > > key\n"); > > diff --git a/crypto/asymmetric_keys/Kconfig b/crypto/asymmetric_keys/Kconfig > > index 1f1f004dc757..1754812df989 100644 > > --- a/crypto/asymmetric_keys/Kconfig > > +++ b/crypto/asymmetric_keys/Kconfig > > @@ -96,4 +96,13 @@ config SIGNED_PE_FILE_VERIFICATION > > This option provides support for verifying the signature(s) on a > > signed PE binary. > > > > +config CHECK_CODESIGN_EKU > > + bool "Check codeSigning extended key usage" > > + depends on PKCS7_MESSAGE_PARSER=y > > + depends on SYSTEM_DATA_VERIFICATION > > + help > > + This option provides support for checking the codeSigning extended > > + key usage when verifying the signature in PKCS#7. It affects kernel > > + module verification and kexec PE binary verification. > > + > > endif # ASYMMETRIC_KEY_TYPE > > diff --git a/crypto/asymmetric_keys/pkcs7_trust.c > > b/crypto/asymmetric_keys/pkcs7_trust.c > > index b531df2013c4..077bfef928b6 100644 > > --- a/crypto/asymmetric_keys/pkcs7_trust.c > > +++ b/crypto/asymmetric_keys/pkcs7_trust.c > > @@ -16,12 +16,36 @@ > > #include > > #include "pkcs7_parser.h" > > > > +#ifdef CONFIG_CHECK_CODESIGN_EKU > > +static bool check_codesign_eku(struct key *key, > > +enum key_being_used_for usage) > > +{ > > + struct public_key *public_key = key->payload.data[asym_crypto]; > > + > > + switch (usage) { > > + case VERIFYING_MODULE_SIGNATURE: > > + case VERIFYING_KEXEC_PE_SIGNATURE: > > + return !!(public_key->eku & EKU_codeSigning); > > + default: > > + break; > > + } > > + return true; > > +} > > +#else > > +static bool check_codesign_eku(struct key *key, > > +enum key_being_used_for usage) > > +{ > > + return true; > > +} > > +#endif > > + > > /* > > * Check the trust on one PKCS#7 SignedInfo block. > > */ > > static int pkcs7_validate_trust_one(struct pkcs7_message *pkcs7, > > struct pkcs7_signed_info *sinfo, > > - struct key *trust_keyring) > > + struct key *trust_keyring, > > + enum key_being_used_for usage) > > { > > struct public_key_signature *sig = sinfo->sig; > > struct x509_certificate *x509, *last = NULL, *p; > > @@ -112,6 +136,12 @@ static int pkcs7_validate_trust_one(struct > > pkcs7_message *pkcs7, > > return -ENOKEY; > > > > matched: > > + if (!check_codesign_eku(key, usage)) { > > Perhaps this can be a generic check_eku_usage() call, with codesigning as one > of the > things it can check for. > Because only codesign EKU be checked now. 
So I prefer to keep it as my current implementation until there have other EKU requirement. > > + pr_warn("sinfo %u: The signer %x key is not CodeSigning\n", > > + sinfo->index, key_serial(key)); > > + key_put(key); > > + return -ENOKEY; > > + } > > ret = verify_signature(key, sig); > > key_put(key); > > if (ret < 0) { > > @@ -156,7 +186,8 @@ static int pkcs7_validate_trust_one(struct > > pkcs7_message *pkcs7, > > * May also return -ENOMEM. > > */ > > int pkcs7_validate_trust(struct pkcs7_message *pkcs7, > > -struct key *trust_keyring) > > +struct key *trust_keyring, > > +enum key_being_used_for usage) > > Please also update the comment above to mention the `usage` parameter. > > Regards, > Varad I will update it to the comment. Thanks! Joey Lee > > > { > > struct pkcs7_signed_info *sinfo; > > struct x509_certificate *p; > > @@ -167,7 +198,7 @@ int pkcs7_validate_trust(struct pkcs7_message *pkcs7, > >
Re: [PATCH v2] irqchip/xilinx: Expose Kconfig option for Zynq/ZynqMP
On 4/19/21 9:42 PM, Robert Hancock wrote: > Previously the XILINX_INTC config option was hidden and only > auto-selected on the MicroBlaze platform. However, this IP can also be > used on the Zynq and ZynqMP platforms as a secondary cascaded > controller. Allow this option to be user-enabled on those platforms. > > Signed-off-by: Robert Hancock > --- > drivers/irqchip/Kconfig | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig > index 15536e321df5..1020cc5a7800 100644 > --- a/drivers/irqchip/Kconfig > +++ b/drivers/irqchip/Kconfig > @@ -279,8 +279,13 @@ config XTENSA_MX > select GENERIC_IRQ_EFFECTIVE_AFF_MASK > > config XILINX_INTC > - bool > + bool "Xilinx Interrupt Controller IP" > + depends on MICROBLAZE || ARCH_ZYNQ || ARCH_ZYNQMP || COMPILE_TEST > select IRQ_DOMAIN > + help > + Support for the Xilinx Interrupt Controller IP core. > + This is used as a primary controller with MicroBlaze and can also > + be used as a secondary chained controller on other platforms. > > config IRQ_CROSSBAR > bool > Acked-by: Michal Simek Thanks, Michal
[PATCH v10 1/7] dt-bindings: PCI: mediatek-gen3: Add YAML schema
Add YAML schemas documentation for Gen3 PCIe controller on MediaTek SoCs. Signed-off-by: Jianjun Wang Acked-by: Ryder Lee Reviewed-by: Rob Herring --- .../bindings/pci/mediatek-pcie-gen3.yaml | 181 ++ 1 file changed, 181 insertions(+) create mode 100644 Documentation/devicetree/bindings/pci/mediatek-pcie-gen3.yaml diff --git a/Documentation/devicetree/bindings/pci/mediatek-pcie-gen3.yaml b/Documentation/devicetree/bindings/pci/mediatek-pcie-gen3.yaml new file mode 100644 index ..e7b1f9892da4 --- /dev/null +++ b/Documentation/devicetree/bindings/pci/mediatek-pcie-gen3.yaml @@ -0,0 +1,181 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/pci/mediatek-pcie-gen3.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Gen3 PCIe controller on MediaTek SoCs + +maintainers: + - Jianjun Wang + +description: |+ + PCIe Gen3 MAC controller for MediaTek SoCs, it supports Gen3 speed + and compatible with Gen2, Gen1 speed. + + This PCIe controller supports up to 256 MSI vectors, the MSI hardware + block diagram is as follows: + ++-+ +| GIC | ++-+ + ^ + | + port->irq + | + +-+-+-+-+-+-+-+-+ + |0|1|2|3|4|5|6|7| (PCIe intc) + +-+-+-+-+-+-+-+-+ +^ ^ ^ +| |...| ++---+ +--++---+ +||| + +-+-+---+--+--+ +-+-+---+--+--+ +-+-+---+--+--+ + |0|1|...|30|31| |0|1|...|30|31| |0|1|...|30|31| (MSI sets) + +-+-+---+--+--+ +-+-+---+--+--+ +-+-+---+--+--+ + ^ ^ ^ ^^ ^ ^ ^^ ^ ^ ^ + | | | || | | || | | | (MSI vectors) + | | | || | | || | | | + +(MSI SET0) (MSI SET1) ... (MSI SET7) + + With 256 MSI vectors supported, the MSI vectors are composed of 8 sets, + each set has its own address for MSI message, and supports 32 MSI vectors + to generate interrupt. + +allOf: + - $ref: /schemas/pci/pci-bus.yaml# + +properties: + compatible: +const: mediatek,mt8192-pcie + + reg: +maxItems: 1 + + reg-names: +items: + - const: pcie-mac + + interrupts: +maxItems: 1 + + ranges: +minItems: 1 +maxItems: 8 + + resets: +minItems: 1 +maxItems: 2 + + reset-names: +minItems: 1 +maxItems: 2 +items: + - const: phy + - const: mac + + clocks: +maxItems: 6 + + clock-names: +items: + - const: pl_250m + - const: tl_26m + - const: tl_96m + - const: tl_32k + - const: peri_26m + - const: top_133m + + assigned-clocks: +maxItems: 1 + + assigned-clock-parents: +maxItems: 1 + + phys: +maxItems: 1 + + '#interrupt-cells': +const: 1 + + interrupt-controller: +description: Interrupt controller node for handling legacy PCI interrupts. 
+type: object +properties: + '#address-cells': +const: 0 + '#interrupt-cells': +const: 1 + interrupt-controller: true + +required: + - '#address-cells' + - '#interrupt-cells' + - interrupt-controller + +additionalProperties: false + +required: + - compatible + - reg + - reg-names + - interrupts + - ranges + - clocks + - '#interrupt-cells' + - interrupt-controller + +unevaluatedProperties: false + +examples: + - | +#include +#include + +bus { +#address-cells = <2>; +#size-cells = <2>; + +pcie: pcie@1123 { +compatible = "mediatek,mt8192-pcie"; +device_type = "pci"; +#address-cells = <3>; +#size-cells = <2>; +reg = <0x00 0x1123 0x00 0x4000>; +reg-names = "pcie-mac"; +interrupts = ; +bus-range = <0x00 0xff>; +ranges = <0x8200 0x00 0x1200 0x00 + 0x1200 0x00 0x100>; +clocks = <&infracfg 44>, + <&infracfg 40>, + <&infracfg 43>, + <&infracfg 97>, + <&infracfg 99>, + <&infracfg 111>; +clock-names = "pl_250m", "tl_26m", "tl_96m", + "tl_32k", "peri_26m", "top_133m"; +assigned-clocks = <&topckgen 50>; +assigned-clock-parents = <&topckgen 91>; + +phys = <&pciephy>; +phy-names = "pcie-phy"; + +resets = <&infracfg_rst 2>, + <&infracfg_rst 3>; +reset-names = "phy", "mac"; + +#interrupt-cells = <1>; +interrupt-map-mask = <0 0 0 0x7>; +interrupt-map = <0 0 0 1 &pcie_intc 0>, +<0 0 0 2 &pcie_intc 1>, +<0 0 0 3 &pcie_intc 2>,
[PATCH v10 7/7] MAINTAINERS: Add Jianjun Wang as MediaTek PCI co-maintainer
Update entry for MediaTek PCIe controller, add Jianjun Wang as MediaTek PCI co-maintainer. Signed-off-by: Jianjun Wang Acked-by: Ryder Lee --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index d92f85ca831d..8050c14e6a7a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -13919,6 +13919,7 @@ F: drivers/pci/controller/dwc/pcie-histb.c PCIE DRIVER FOR MEDIATEK M: Ryder Lee +M: Jianjun Wang L: linux-...@vger.kernel.org L: linux-media...@lists.infradead.org S: Supported -- 2.25.1
[PATCH v10 6/7] PCI: mediatek-gen3: Add system PM support
Add suspend_noirq and resume_noirq callback functions to implement PM system suspend and resume hooks for the MediaTek Gen3 PCIe controller. When the system suspends, trigger the PCIe link to enter the L2 state and pull down the PERST# pin, gating the clocks of the MAC layer, and then power-off the physical layer to provide power-saving. When the system resumes, the PCIe link should be re-established and the related control register values should be restored. Signed-off-by: Jianjun Wang Acked-by: Ryder Lee --- drivers/pci/controller/pcie-mediatek-gen3.c | 113 1 file changed, 113 insertions(+) diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c index ee1b51207d11..20165e4a75b2 100644 --- a/drivers/pci/controller/pcie-mediatek-gen3.c +++ b/drivers/pci/controller/pcie-mediatek-gen3.c @@ -45,6 +45,9 @@ #define PCIE_PE_RSTB BIT(3) #define PCIE_LTSSM_STATUS_REG 0x150 +#define PCIE_LTSSM_STATE_MASK GENMASK(28, 24) +#define PCIE_LTSSM_STATE(val) ((val & PCIE_LTSSM_STATE_MASK) >> 24) +#define PCIE_LTSSM_STATE_L2_IDLE 0x14 #define PCIE_LINK_STATUS_REG 0x154 #define PCIE_PORT_LINKUP BIT(8) @@ -73,6 +76,9 @@ #define PCIE_MSI_SET_ADDR_HI_BASE 0xc80 #define PCIE_MSI_SET_ADDR_HI_OFFSET0x04 +#define PCIE_ICMD_PM_REG 0x198 +#define PCIE_TURN_OFF_LINK BIT(4) + #define PCIE_TRANS_TABLE_BASE_REG 0x800 #define PCIE_ATR_SRC_ADDR_MSB_OFFSET 0x4 #define PCIE_ATR_TRSL_ADDR_LSB_OFFSET 0x8 @@ -95,10 +101,12 @@ * struct mtk_msi_set - MSI information for each set * @base: IO mapped register base * @msg_addr: MSI message address + * @saved_irq_state: IRQ enable state saved at suspend time */ struct mtk_msi_set { void __iomem *base; phys_addr_t msg_addr; + u32 saved_irq_state; }; /** @@ -112,6 +120,7 @@ struct mtk_msi_set { * @clks: PCIe clocks * @num_clks: PCIe clocks count for this port * @irq: PCIe controller interrupt number + * @saved_irq_state: IRQ enable state saved at suspend time * @irq_lock: lock protecting IRQ register access * @intx_domain: legacy INTx IRQ domain * @msi_domain: MSI IRQ domain @@ -131,6 +140,7 @@ struct mtk_pcie_port { int num_clks; int irq; + u32 saved_irq_state; raw_spinlock_t irq_lock; struct irq_domain *intx_domain; struct irq_domain *msi_domain; @@ -894,6 +904,108 @@ static int mtk_pcie_remove(struct platform_device *pdev) return 0; } +static void __maybe_unused mtk_pcie_irq_save(struct mtk_pcie_port *port) +{ + int i; + + raw_spin_lock(&port->irq_lock); + + port->saved_irq_state = readl_relaxed(port->base + PCIE_INT_ENABLE_REG); + + for (i = 0; i < PCIE_MSI_SET_NUM; i++) { + struct mtk_msi_set *msi_set = &port->msi_sets[i]; + + msi_set->saved_irq_state = readl_relaxed(msi_set->base + + PCIE_MSI_SET_ENABLE_OFFSET); + } + + raw_spin_unlock(&port->irq_lock); +} + +static void __maybe_unused mtk_pcie_irq_restore(struct mtk_pcie_port *port) +{ + int i; + + raw_spin_lock(&port->irq_lock); + + writel_relaxed(port->saved_irq_state, port->base + PCIE_INT_ENABLE_REG); + + for (i = 0; i < PCIE_MSI_SET_NUM; i++) { + struct mtk_msi_set *msi_set = &port->msi_sets[i]; + + writel_relaxed(msi_set->saved_irq_state, + msi_set->base + PCIE_MSI_SET_ENABLE_OFFSET); + } + + raw_spin_unlock(&port->irq_lock); +} + +static int __maybe_unused mtk_pcie_turn_off_link(struct mtk_pcie_port *port) +{ + u32 val; + + val = readl_relaxed(port->base + PCIE_ICMD_PM_REG); + val |= PCIE_TURN_OFF_LINK; + writel_relaxed(val, port->base + PCIE_ICMD_PM_REG); + + /* Check the link is L2 */ + return readl_poll_timeout(port->base + PCIE_LTSSM_STATUS_REG, val, + 
(PCIE_LTSSM_STATE(val) == + PCIE_LTSSM_STATE_L2_IDLE), 20, + 50 * USEC_PER_MSEC); +} + +static int __maybe_unused mtk_pcie_suspend_noirq(struct device *dev) +{ + struct mtk_pcie_port *port = dev_get_drvdata(dev); + int err; + u32 val; + + /* Trigger link to L2 state */ + err = mtk_pcie_turn_off_link(port); + if (err) { + dev_err(port->dev, "cannot enter L2 state\n"); + return err; + } + + /* Pull down the PERST# pin */ + val = readl_relaxed(port->base + PCIE_RST_CTRL_REG); + val |= PCIE_PE_RSTB; + writel_relaxed(val, port->base + PCIE_RST_CTRL_REG); + + dev_dbg(port->dev, "entered L2 states successfully"); + + mtk_pcie_irq_save(port); + mtk_pcie_power_down(port); + + return 0; +} + +static int __maybe_unused mtk_pcie_resume_noirq(struct
[PATCH v10 5/7] PCI: mediatek-gen3: Add MSI support
Add MSI support for MediaTek Gen3 PCIe controller. This PCIe controller supports up to 256 MSI vectors, the MSI hardware block diagram is as follows: +-+ | GIC | +-+ ^ | port->irq | +-+-+-+-+-+-+-+-+ |0|1|2|3|4|5|6|7| (PCIe intc) +-+-+-+-+-+-+-+-+ ^ ^ ^ | |...| +---+ +--++---+ ||| +-+-+---+--+--+ +-+-+---+--+--+ +-+-+---+--+--+ |0|1|...|30|31| |0|1|...|30|31| |0|1|...|30|31| (MSI sets) +-+-+---+--+--+ +-+-+---+--+--+ +-+-+---+--+--+ ^ ^ ^ ^^ ^ ^ ^^ ^ ^ ^ | | | || | | || | | | (MSI vectors) | | | || | | || | | | (MSI SET0) (MSI SET1) ... (MSI SET7) With 256 MSI vectors supported, the MSI vectors are composed of 8 sets, each set has its own address for MSI message, and supports 32 MSI vectors to generate interrupt. Signed-off-by: Jianjun Wang Acked-by: Ryder Lee Reviewed-by: Marc Zyngier --- drivers/pci/controller/pcie-mediatek-gen3.c | 276 1 file changed, 276 insertions(+) diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c index ff91ad587461..ee1b51207d11 100644 --- a/drivers/pci/controller/pcie-mediatek-gen3.c +++ b/drivers/pci/controller/pcie-mediatek-gen3.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -48,12 +49,29 @@ #define PCIE_LINK_STATUS_REG 0x154 #define PCIE_PORT_LINKUP BIT(8) +#define PCIE_MSI_SET_NUM 8 +#define PCIE_MSI_IRQS_PER_SET 32 +#define PCIE_MSI_IRQS_NUM \ + (PCIE_MSI_IRQS_PER_SET * PCIE_MSI_SET_NUM) + #define PCIE_INT_ENABLE_REG0x180 +#define PCIE_MSI_ENABLEGENMASK(PCIE_MSI_SET_NUM + 8 - 1, 8) +#define PCIE_MSI_SHIFT 8 #define PCIE_INTX_SHIFT24 #define PCIE_INTX_ENABLE \ GENMASK(PCIE_INTX_SHIFT + PCI_NUM_INTX - 1, PCIE_INTX_SHIFT) #define PCIE_INT_STATUS_REG0x184 +#define PCIE_MSI_SET_ENABLE_REG0x190 +#define PCIE_MSI_SET_ENABLEGENMASK(PCIE_MSI_SET_NUM - 1, 0) + +#define PCIE_MSI_SET_BASE_REG 0xc00 +#define PCIE_MSI_SET_OFFSET0x10 +#define PCIE_MSI_SET_STATUS_OFFSET 0x04 +#define PCIE_MSI_SET_ENABLE_OFFSET 0x08 + +#define PCIE_MSI_SET_ADDR_HI_BASE 0xc80 +#define PCIE_MSI_SET_ADDR_HI_OFFSET0x04 #define PCIE_TRANS_TABLE_BASE_REG 0x800 #define PCIE_ATR_SRC_ADDR_MSB_OFFSET 0x4 @@ -73,6 +91,16 @@ #define PCIE_ATR_TLP_TYPE_MEM PCIE_ATR_TLP_TYPE(0) #define PCIE_ATR_TLP_TYPE_IO PCIE_ATR_TLP_TYPE(2) +/** + * struct mtk_msi_set - MSI information for each set + * @base: IO mapped register base + * @msg_addr: MSI message address + */ +struct mtk_msi_set { + void __iomem *base; + phys_addr_t msg_addr; +}; + /** * struct mtk_pcie_port - PCIe port information * @dev: pointer to PCIe device @@ -86,6 +114,11 @@ * @irq: PCIe controller interrupt number * @irq_lock: lock protecting IRQ register access * @intx_domain: legacy INTx IRQ domain + * @msi_domain: MSI IRQ domain + * @msi_bottom_domain: MSI IRQ bottom domain + * @msi_sets: MSI sets information + * @lock: lock protecting IRQ bit map + * @msi_irq_in_use: bit map for assigned MSI IRQ */ struct mtk_pcie_port { struct device *dev; @@ -100,6 +133,11 @@ struct mtk_pcie_port { int irq; raw_spinlock_t irq_lock; struct irq_domain *intx_domain; + struct irq_domain *msi_domain; + struct irq_domain *msi_bottom_domain; + struct mtk_msi_set msi_sets[PCIE_MSI_SET_NUM]; + struct mutex lock; + DECLARE_BITMAP(msi_irq_in_use, PCIE_MSI_IRQS_NUM); }; /** @@ -196,6 +234,35 @@ static int mtk_pcie_set_trans_table(struct mtk_pcie_port *port, return 0; } +static void mtk_pcie_enable_msi(struct mtk_pcie_port *port) +{ + int i; + u32 val; + + for (i = 0; i < PCIE_MSI_SET_NUM; i++) { + struct mtk_msi_set *msi_set = &port->msi_sets[i]; + + msi_set->base = port->base + 
PCIE_MSI_SET_BASE_REG + + i * PCIE_MSI_SET_OFFSET; + msi_set->msg_addr = port->reg_base + PCIE_MSI_SET_BASE_REG + + i * PCIE_MSI_SET_OFFSET; + + /* Configure the MSI capture address */ + writel_relaxed(lower_32_bits(msi_set->msg_addr), msi_set->base); + writel_relaxed(upper_32_bits(msi_set->msg_addr), + port->base + PCIE_MSI_SET_ADDR_HI_BASE + + i * PCIE_MSI_SET_ADDR_HI_OFFSET); + } + + val = readl_relaxed(p
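For readers following the MSI-set layout described in that patch, here is a minimal sketch of how a composite hwirq could map onto a (set, vector) pair and an MSI message. It is illustrative only and not taken from the patch: the helper name, the set_msg_addr array and the assumption that the vector number doubles as the MSI data payload are mine.

#include <linux/kernel.h>
#include <linux/msi.h>
#include <linux/types.h>

#define EXAMPLE_MSI_IRQS_PER_SET	32

/* Decompose a bottom-domain hwirq into its MSI set and vector, then
 * build the message a device would write to raise that vector. */
static void example_compose_msi_msg(unsigned long hwirq,
				    const phys_addr_t *set_msg_addr,
				    struct msi_msg *msg)
{
	unsigned long set_idx = hwirq / EXAMPLE_MSI_IRQS_PER_SET;
	phys_addr_t addr = set_msg_addr[set_idx];	/* per-set capture address */

	msg->address_hi = upper_32_bits(addr);
	msg->address_lo = lower_32_bits(addr);
	msg->data = hwirq;	/* assumed: the vector number is the payload */
}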
[PATCH v10 3/7] PCI: mediatek-gen3: Add MediaTek Gen3 driver for MT8192
MediaTek's PCIe host controller has three generation HWs, the new generation HW is an individual bridge, it supports Gen3 speed and compatible with Gen2, Gen1 speed. Add support for new Gen3 controller which can be found on MT8192. Signed-off-by: Jianjun Wang Acked-by: Ryder Lee --- drivers/pci/controller/Kconfig | 13 + drivers/pci/controller/Makefile | 1 + drivers/pci/controller/pcie-mediatek-gen3.c | 464 3 files changed, 478 insertions(+) create mode 100644 drivers/pci/controller/pcie-mediatek-gen3.c diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig index 5aa8977d7b0f..1e925ac47279 100644 --- a/drivers/pci/controller/Kconfig +++ b/drivers/pci/controller/Kconfig @@ -233,6 +233,19 @@ config PCIE_MEDIATEK Say Y here if you want to enable PCIe controller support on MediaTek SoCs. +config PCIE_MEDIATEK_GEN3 + tristate "MediaTek Gen3 PCIe controller" + depends on ARCH_MEDIATEK || COMPILE_TEST + depends on PCI_MSI_IRQ_DOMAIN + help + Adds support for PCIe Gen3 MAC controller for MediaTek SoCs. + This PCIe controller is compatible with Gen3, Gen2 and Gen1 speed, + and support up to 256 MSI interrupt numbers for + multi-function devices. + + Say Y here if you want to enable Gen3 PCIe controller support on + MediaTek SoCs. + config VMD depends on PCI_MSI && X86_64 && SRCU tristate "Intel Volume Management Device Driver" diff --git a/drivers/pci/controller/Makefile b/drivers/pci/controller/Makefile index e4559f2182f2..579973327815 100644 --- a/drivers/pci/controller/Makefile +++ b/drivers/pci/controller/Makefile @@ -27,6 +27,7 @@ obj-$(CONFIG_PCIE_ROCKCHIP) += pcie-rockchip.o obj-$(CONFIG_PCIE_ROCKCHIP_EP) += pcie-rockchip-ep.o obj-$(CONFIG_PCIE_ROCKCHIP_HOST) += pcie-rockchip-host.o obj-$(CONFIG_PCIE_MEDIATEK) += pcie-mediatek.o +obj-$(CONFIG_PCIE_MEDIATEK_GEN3) += pcie-mediatek-gen3.o obj-$(CONFIG_PCIE_MICROCHIP_HOST) += pcie-microchip-host.o obj-$(CONFIG_VMD) += vmd.o obj-$(CONFIG_PCIE_BRCMSTB) += pcie-brcmstb.o diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c new file mode 100644 index ..3546e53b3c85 --- /dev/null +++ b/drivers/pci/controller/pcie-mediatek-gen3.c @@ -0,0 +1,464 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * MediaTek PCIe host controller driver. + * + * Copyright (c) 2020 MediaTek Inc. 
+ * Author: Jianjun Wang + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "../pci.h" + +#define PCIE_SETTING_REG 0x80 +#define PCIE_PCI_IDS_1 0x9c +#define PCI_CLASS(class) (class << 8) +#define PCIE_RC_MODE BIT(0) + +#define PCIE_CFGNUM_REG0x140 +#define PCIE_CFG_DEVFN(devfn) ((devfn) & GENMASK(7, 0)) +#define PCIE_CFG_BUS(bus) (((bus) << 8) & GENMASK(15, 8)) +#define PCIE_CFG_BYTE_EN(bytes)(((bytes) << 16) & GENMASK(19, 16)) +#define PCIE_CFG_FORCE_BYTE_EN BIT(20) +#define PCIE_CFG_OFFSET_ADDR 0x1000 +#define PCIE_CFG_HEADER(bus, devfn) \ + (PCIE_CFG_BUS(bus) | PCIE_CFG_DEVFN(devfn)) + +#define PCIE_RST_CTRL_REG 0x148 +#define PCIE_MAC_RSTB BIT(0) +#define PCIE_PHY_RSTB BIT(1) +#define PCIE_BRG_RSTB BIT(2) +#define PCIE_PE_RSTB BIT(3) + +#define PCIE_LTSSM_STATUS_REG 0x150 + +#define PCIE_LINK_STATUS_REG 0x154 +#define PCIE_PORT_LINKUP BIT(8) + +#define PCIE_TRANS_TABLE_BASE_REG 0x800 +#define PCIE_ATR_SRC_ADDR_MSB_OFFSET 0x4 +#define PCIE_ATR_TRSL_ADDR_LSB_OFFSET 0x8 +#define PCIE_ATR_TRSL_ADDR_MSB_OFFSET 0xc +#define PCIE_ATR_TRSL_PARAM_OFFSET 0x10 +#define PCIE_ATR_TLB_SET_OFFSET0x20 + +#define PCIE_MAX_TRANS_TABLES 8 +#define PCIE_ATR_ENBIT(0) +#define PCIE_ATR_SIZE(size) \ + (size) - 1) << 1) & GENMASK(6, 1)) | PCIE_ATR_EN) +#define PCIE_ATR_ID(id)((id) & GENMASK(3, 0)) +#define PCIE_ATR_TYPE_MEM PCIE_ATR_ID(0) +#define PCIE_ATR_TYPE_IO PCIE_ATR_ID(1) +#define PCIE_ATR_TLP_TYPE(type)(((type) << 16) & GENMASK(18, 16)) +#define PCIE_ATR_TLP_TYPE_MEM PCIE_ATR_TLP_TYPE(0) +#define PCIE_ATR_TLP_TYPE_IO PCIE_ATR_TLP_TYPE(2) + +/** + * struct mtk_pcie_port - PCIe port information + * @dev: pointer to PCIe device + * @base: IO mapped register base + * @reg_base: physical register base + * @mac_reset: MAC reset control + * @phy_reset: PHY reset control + * @phy: PHY controller block + * @clks: PCIe clocks + * @num_clks: PCIe clocks count for this port + */ +struct mtk_pcie_port { + struct device *dev; + void __iomem *base; + phys_a
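As a companion to the PCIE_CFGNUM_REG and PCIE_CFG_OFFSET_ADDR definitions in that patch, a hedged sketch of the map_bus idea is shown below. It reuses the patch's macros and struct mtk_pcie_port, but the function name and exact shape are an assumption, not the driver's final code.

#include <linux/io.h>
#include <linux/pci.h>

static void __iomem *example_map_bus(struct mtk_pcie_port *port,
				     unsigned int busnr, unsigned int devfn,
				     int where)
{
	/* Select the target function: bus/devfn packed by PCIE_CFG_HEADER() */
	writel_relaxed(PCIE_CFG_HEADER(busnr, devfn),
		       port->base + PCIE_CFGNUM_REG);

	/* The selected function's config space then appears at a fixed window */
	return port->base + PCIE_CFG_OFFSET_ADDR + where;
}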
[PATCH v10 4/7] PCI: mediatek-gen3: Add INTx support
Add INTx support for MediaTek Gen3 PCIe controller. Signed-off-by: Jianjun Wang Acked-by: Ryder Lee Reviewed-by: Marc Zyngier --- drivers/pci/controller/pcie-mediatek-gen3.c | 172 1 file changed, 172 insertions(+) diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c index 3546e53b3c85..ff91ad587461 100644 --- a/drivers/pci/controller/pcie-mediatek-gen3.c +++ b/drivers/pci/controller/pcie-mediatek-gen3.c @@ -9,6 +9,9 @@ #include #include #include +#include +#include +#include #include #include #include @@ -45,6 +48,13 @@ #define PCIE_LINK_STATUS_REG 0x154 #define PCIE_PORT_LINKUP BIT(8) +#define PCIE_INT_ENABLE_REG0x180 +#define PCIE_INTX_SHIFT24 +#define PCIE_INTX_ENABLE \ + GENMASK(PCIE_INTX_SHIFT + PCI_NUM_INTX - 1, PCIE_INTX_SHIFT) + +#define PCIE_INT_STATUS_REG0x184 + #define PCIE_TRANS_TABLE_BASE_REG 0x800 #define PCIE_ATR_SRC_ADDR_MSB_OFFSET 0x4 #define PCIE_ATR_TRSL_ADDR_LSB_OFFSET 0x8 @@ -73,6 +83,9 @@ * @phy: PHY controller block * @clks: PCIe clocks * @num_clks: PCIe clocks count for this port + * @irq: PCIe controller interrupt number + * @irq_lock: lock protecting IRQ register access + * @intx_domain: legacy INTx IRQ domain */ struct mtk_pcie_port { struct device *dev; @@ -83,6 +96,10 @@ struct mtk_pcie_port { struct phy *phy; struct clk_bulk_data *clks; int num_clks; + + int irq; + raw_spinlock_t irq_lock; + struct irq_domain *intx_domain; }; /** @@ -198,6 +215,11 @@ static int mtk_pcie_startup_port(struct mtk_pcie_port *port) val |= PCI_CLASS(PCI_CLASS_BRIDGE_PCI << 8); writel_relaxed(val, port->base + PCIE_PCI_IDS_1); + /* Mask all INTx interrupts */ + val = readl_relaxed(port->base + PCIE_INT_ENABLE_REG); + val &= ~PCIE_INTX_ENABLE; + writel_relaxed(val, port->base + PCIE_INT_ENABLE_REG); + /* Assert all reset signals */ val = readl_relaxed(port->base + PCIE_RST_CTRL_REG); val |= PCIE_MAC_RSTB | PCIE_PHY_RSTB | PCIE_BRG_RSTB | PCIE_PE_RSTB; @@ -261,6 +283,150 @@ static int mtk_pcie_startup_port(struct mtk_pcie_port *port) return 0; } +static int mtk_pcie_set_affinity(struct irq_data *data, +const struct cpumask *mask, bool force) +{ + return -EINVAL; +} + +static void mtk_intx_mask(struct irq_data *data) +{ + struct mtk_pcie_port *port = irq_data_get_irq_chip_data(data); + unsigned long flags; + u32 val; + + raw_spin_lock_irqsave(&port->irq_lock, flags); + val = readl_relaxed(port->base + PCIE_INT_ENABLE_REG); + val &= ~BIT(data->hwirq + PCIE_INTX_SHIFT); + writel_relaxed(val, port->base + PCIE_INT_ENABLE_REG); + raw_spin_unlock_irqrestore(&port->irq_lock, flags); +} + +static void mtk_intx_unmask(struct irq_data *data) +{ + struct mtk_pcie_port *port = irq_data_get_irq_chip_data(data); + unsigned long flags; + u32 val; + + raw_spin_lock_irqsave(&port->irq_lock, flags); + val = readl_relaxed(port->base + PCIE_INT_ENABLE_REG); + val |= BIT(data->hwirq + PCIE_INTX_SHIFT); + writel_relaxed(val, port->base + PCIE_INT_ENABLE_REG); + raw_spin_unlock_irqrestore(&port->irq_lock, flags); +} + +/** + * mtk_intx_eoi() - Clear INTx IRQ status at the end of interrupt + * @data: pointer to chip specific data + * + * As an emulated level IRQ, its interrupt status will remain + * until the corresponding de-assert message is received; hence that + * the status can only be cleared when the interrupt has been serviced. 
+ */ +static void mtk_intx_eoi(struct irq_data *data) +{ + struct mtk_pcie_port *port = irq_data_get_irq_chip_data(data); + unsigned long hwirq; + + hwirq = data->hwirq + PCIE_INTX_SHIFT; + writel_relaxed(BIT(hwirq), port->base + PCIE_INT_STATUS_REG); +} + +static struct irq_chip mtk_intx_irq_chip = { + .irq_mask = mtk_intx_mask, + .irq_unmask = mtk_intx_unmask, + .irq_eoi= mtk_intx_eoi, + .irq_set_affinity = mtk_pcie_set_affinity, + .name = "INTx", +}; + +static int mtk_pcie_intx_map(struct irq_domain *domain, unsigned int irq, +irq_hw_number_t hwirq) +{ + irq_set_chip_data(irq, domain->host_data); + irq_set_chip_and_handler_name(irq, &mtk_intx_irq_chip, + handle_fasteoi_irq, "INTx"); + return 0; +} + +static const struct irq_domain_ops intx_domain_ops = { + .map = mtk_pcie_intx_map, +}; + +static int mtk_pcie_init_irq_domains(struct mtk_pcie_port *port) +{ + struct device *dev = port->dev; + struct device_node *intc_node, *node = dev->of_node; + + raw_spin_lock_init(&port->irq_lock); + + /* Setup INTx */ +
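To complement the mask/unmask/eoi callbacks in that patch, the sketch below shows one way the shared status register could be demultiplexed into the INTx domain. The helper name is hypothetical and the loop is an assumption built from the PCIE_INT_STATUS_REG and PCIE_INTX_SHIFT definitions, not the patch's own dispatch handler.

#include <linux/bitops.h>
#include <linux/io.h>
#include <linux/irq.h>
#include <linux/irqdomain.h>
#include <linux/pci.h>

static void example_handle_intx(struct mtk_pcie_port *port)
{
	unsigned long status = readl_relaxed(port->base + PCIE_INT_STATUS_REG);
	irq_hw_number_t irq_bit = PCIE_INTX_SHIFT;

	/* Walk only the INTA..INTD bits and hand each one to the INTx domain */
	for_each_set_bit_from(irq_bit, &status, PCI_NUM_INTX + PCIE_INTX_SHIFT) {
		unsigned int virq = irq_find_mapping(port->intx_domain,
						     irq_bit - PCIE_INTX_SHIFT);
		generic_handle_irq(virq);
	}
}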
[PATCH v10 2/7] PCI: Export pci_pio_to_address() for module use
This interface will be used by PCI host drivers for PIO translation, export it to support compiling those drivers as kernel modules. Signed-off-by: Jianjun Wang Acked-by: Bjorn Helgaas --- drivers/pci/pci.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 16a17215f633..09f1714e23be 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4052,6 +4052,7 @@ phys_addr_t pci_pio_to_address(unsigned long pio) return address; } +EXPORT_SYMBOL_GPL(pci_pio_to_address); unsigned long __weak pci_address_to_pio(phys_addr_t address) { -- 2.25.1
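To make the intended use concrete, here is a small sketch of the kind of call site a modular host driver would have. The function name and the idea of feeding the result into an outbound translation window are illustrative assumptions, not code from this series.

#include <linux/ioport.h>
#include <linux/pci.h>

/* Return the CPU physical address backing the bridge's I/O window,
 * suitable for programming into a controller's outbound translation. */
static phys_addr_t example_io_window_base(struct pci_host_bridge *host)
{
	struct resource_entry *entry;

	resource_list_for_each_entry(entry, &host->windows)
		if (resource_type(entry->res) == IORESOURCE_IO)
			return pci_pio_to_address(entry->res->start);

	return 0;
}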
[PATCH v10 0/7] PCI: mediatek: Add new generation controller support
From: mtk15901 These series patches add pcie-mediatek-gen3.c and dt-bindings file to support new generation PCIe controller. Changes in v10: 1. Fix the subject line format in commit message; 2. Use EXPORT_SYMBOL_GPL() to export pci_pio_to_address(). Changes in v9: 1. Use mtk_pcie_parse_port() to get the hw resources; 2. Remove unnecessary logs; 3. Add local IRQ enable status save/restore instead of the enable/disable callbacks; 4. Fix typos. Changes in v8: 1. Add irq_clock to protect IRQ register access; 2. Mask all INTx interrupt when startup port; 3. Remove activate/deactivate callbacks from bottom_domain_ops; 4. Add unmask/mask callbacks in mtk_msi_bottom_irq_chip; 5. Add property information for reg-names. Changes in v7: 1. Split the driver patch to core PCIe, INTx, MSI and PM patches; 2. Reshape MSI init and handle flow, use msi_bottom_domain to cover all sets; 3. Replace readl/writel with their relaxed version; 4. Add MSI description in binding document; 5. Add pl_250m clock in binding document. Changes in v6: 1. Export pci_pio_to_address() to support compiling as kernel module; 2. Replace usleep_range(100 * 1000, 120 * 1000) with msleep(100); 3. Replace dev_notice with dev_err; 4. Fix MSI get hwirq flow; 5. Fix warning for possible recursive locking in mtk_pcie_set_affinity. Changes in v5: 1. Remove unused macros 2. Modify the config read/write callbacks, set the config byte field in TLP header and use pci_generic_config_read32/write32 to access the config space 3. Fix the settings of translation window, both MEM and IO regions works properly 4. Fix typos Changes in v4: 1. Fix PCIe power up/down flow 2. Use "mac" and "phy" for reset names 3. Add clock names 4. Fix the variables type Changes in v3: 1. Remove standard property in binding document 2. Return error number when get_optional* API throws an error 3. Use the bulk clk APIs Changes in v2: 1. Fix the typo of dt-bindings patch 2. Remove the unnecessary properties in binding document 3. dispos the irq mappings of msi top domain when irq teardown Jianjun Wang (7): dt-bindings: PCI: mediatek-gen3: Add YAML schema PCI: Export pci_pio_to_address() for module use PCI: mediatek-gen3: Add MediaTek Gen3 driver for MT8192 PCI: mediatek-gen3: Add INTx support PCI: mediatek-gen3: Add MSI support PCI: mediatek-gen3: Add system PM support MAINTAINERS: Add Jianjun Wang as MediaTek PCI co-maintainer .../bindings/pci/mediatek-pcie-gen3.yaml | 181 +++ MAINTAINERS |1 + drivers/pci/controller/Kconfig| 13 + drivers/pci/controller/Makefile |1 + drivers/pci/controller/pcie-mediatek-gen3.c | 1025 + drivers/pci/pci.c |1 + 6 files changed, 1222 insertions(+) create mode 100644 Documentation/devicetree/bindings/pci/mediatek-pcie-gen3.yaml create mode 100644 drivers/pci/controller/pcie-mediatek-gen3.c -- 2.25.1
Re: [PATCH 00/13] [RFC] Rust support
On Tue, Apr 20, 2021 at 07:56:18AM +0200, Greg Kroah-Hartman wrote: > I would LOVE it if some "executives" would see the above presentations, > because then they would maybe actually fund developers to fix bugs and > maintain the kernel code, instead of only allowing them to add new > features. > > Seriously, that's the real problem, that Dmitry's work has exposed, the > lack of people allowed to do this type of bugfixing and maintenance on > company time, for something that the company relies on, is a huge issue. > "executives" feel that they are willing to fund the initial work and > then "throw it over the wall to the community" once it is merged, and > then they can forget about it as "the community" will maintain it for > them for free. And that's a lie, as Dmitry's work shows. That's sadly the eternal situation, and I'm suspecting that software development and maintenance is not identified as a requirement for a large number of hardware vendors, especially on the consumer side where margins are lower. A contractor is paid to develop a driver, *sometimes* to try to mainline it (and the later they engage with the community, the longer it takes in round trips), and once the code finally gets merged, all the initial budget is depleted and no more software work will be done. Worse, we could imagine kicking unmaintained drivers faster off the tree, but that would actually help these unscrupulous vendors by forcing their customers to switch to the new model :-/ And most of them wouldn't care either if their contributions were refused based on their track record of not maintaining their code, since they often see this as a convenience to please their customers and not something they need (after all, relying on a bogus and vulnerable BSP has never prevented from selling a device, quite the opposite). In short, there is a parallel universe where running highly bogus and vulnerable out-of-tree code seems like the norm and where there is no sort of care for what is mainlined as it's possibly just made to look "cool". We also need to recognize that it's expectable that some vendors are not willing to engage on supporting a driver for a decade if they expect their device to last 5 years only, and maybe we should make some rules clear about mainlining drivers and what to expect for users (in which case the end of support would be clear and nobody would be surprised if the driver is removed at the end of its maintenance, barring a switch to a community maintainer). Just my two cents, Willy
Re: [PATCH] pci: Rename pci_dev->untrusted to pci_dev->external
On Mon, Apr 19, 2021 at 05:30:49PM -0700, Rajat Jain wrote: > The current flag name "untrusted" is not correct as it is populated > using the firmware property "external-facing" for the parent ports. In > other words, the firmware only says which ports are external facing, so > the field really identifies the devices as external (vs internal). > > Only field renaming. No functional change intended. I don't think this is a good idea. First the field should have been added to the generic struct device as requested multiple times before. Right now this requires horrible hacks in the IOMMU code to get at the pci_dev, and also doesn't scale to various other potential users. Second the untrusted is objectively a better name. Because untrusted is how we treat the device, which is what mattes. External is just how we come to that conclusion.
Re: [PATCH] KVM: Boost vCPU candidate in user mode which is delivering interrupt
On Tue, 20 Apr 2021 at 14:02, Wanpeng Li wrote: > > On Tue, 20 Apr 2021 at 00:59, Paolo Bonzini wrote: > > > > On 19/04/21 18:32, Sean Christopherson wrote: > > > If false positives are a big concern, what about adding another pass to > > > the loop > > > and only yielding to usermode vCPUs with interrupts in the second full > > > pass? > > > I.e. give vCPUs that are already in kernel mode priority, and only yield > > > to > > > handle an interrupt if there are no vCPUs in kernel mode. > > > > > > kvm_arch_dy_runnable() pulls in pv_unhalted, which seems like a good > > > thing. > > > > pv_unhalted won't help if you're waiting for a kernel spinlock though, > > would it? Doing two passes (or looking for a "best" candidate that > > prefers kernel mode vCPUs to user mode vCPUs waiting for an interrupt) > > seems like the best choice overall. > > How about something like this: Sorry, should be the codes below: diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 6b4dd95..9bc5f87 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -325,10 +325,12 @@ struct kvm_vcpu { * Cpu relax intercept or pause loop exit optimization * in_spin_loop: set when a vcpu does a pause loop exit * or cpu relax intercepted. + * pending_interrupt: set when a vcpu waiting for an interrupt * dy_eligible: indicates whether vcpu is eligible for directed yield. */ struct { bool in_spin_loop; +bool pending_interrupt; bool dy_eligible; } spin_loop; #endif @@ -1427,6 +1429,12 @@ static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val) { vcpu->spin_loop.in_spin_loop = val; } + +static inline void kvm_vcpu_set_pending_interrupt(struct kvm_vcpu *vcpu, bool val) +{ +vcpu->spin_loop.pending_interrupt = val; +} + static inline void kvm_vcpu_set_dy_eligible(struct kvm_vcpu *vcpu, bool val) { vcpu->spin_loop.dy_eligible = val; @@ -1438,6 +1446,10 @@ static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val) { } +static inline void kvm_vcpu_set_pending_interrupt(struct kvm_vcpu *vcpu, bool val) +{ +} + static inline void kvm_vcpu_set_dy_eligible(struct kvm_vcpu *vcpu, bool val) { } diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 529cff1..bf6f1ec 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -410,6 +410,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id) INIT_LIST_HEAD(&vcpu->blocked_vcpu_list); kvm_vcpu_set_in_spin_loop(vcpu, false); +kvm_vcpu_set_pending_interrupt(vcpu, false); kvm_vcpu_set_dy_eligible(vcpu, false); vcpu->preempted = false; vcpu->ready = false; @@ -3079,14 +3080,17 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to); * Helper that checks whether a VCPU is eligible for directed yield. * Most eligible candidate to yield is decided by following heuristics: * - * (a) VCPU which has not done pl-exit or cpu relax intercepted recently - * (preempted lock holder), indicated by @in_spin_loop. - * Set at the beginning and cleared at the end of interception/PLE handler. + * (a) VCPU which has not done pl-exit or cpu relax intercepted and is not + * waiting for an interrupt recently (preempted lock holder). The former + * one is indicated by @in_spin_loop, set at the beginning and cleared at + * the end of interception/PLE handler. The later one is indicated by + * @pending_interrupt, set when interrupt is delivering and cleared at + * the end of directed yield. 
* - * (b) VCPU which has done pl-exit/ cpu relax intercepted but did not get - * chance last time (mostly it has become eligible now since we have probably - * yielded to lockholder in last iteration. This is done by toggling - * @dy_eligible each time a VCPU checked for eligibility.) + * (b) VCPU which has done pl-exit/ cpu relax intercepted or is waiting for + * interrupt but did not get chance last time (mostly it has become eligible + * now since we have probably yielded to lockholder in last iteration. This + * is done by toggling @dy_eligible each time a VCPU checked for eligibility.) * * Yielding to a recently pl-exited/cpu relax intercepted VCPU before yielding * to preempted lock-holder could result in wrong VCPU selection and CPU @@ -3102,10 +3106,10 @@ static bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu) #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT bool eligible; -eligible = !vcpu->spin_loop.in_spin_loop || +eligible = !(vcpu->spin_loop.in_spin_loop || vcpu->spin_loop.pending_interrupt) || vcpu->spin_loop.dy_eligible; -if (vcpu->spin_loop.in_spin_loop) +if (vcpu->spin_loop.in_spin_loop || vcpu->spin_loop.pending_interrupt) kvm_vcpu_set_dy_eligible(vcpu, !vcpu->spin_loop.dy_eligible); return eligible; @@ -3137,6 +3141,16 @@ static bool vcpu_dy_runnable(struct kvm_vcpu *vcpu) return false; } +static bool kvm_has_int
Re: [PATCHv2 2/2] iommu/arm-smmu-qcom: Move the adreno smmu specific impl earlier
On 2021-04-19 20:08, Bjorn Andersson wrote: On Fri 26 Feb 03:55 CST 2021, Sai Prakash Ranjan wrote: Adreno(GPU) SMMU and APSS(Application Processor SubSystem) SMMU both implement "arm,mmu-500" in some QTI SoCs and to run through adreno smmu specific implementation such as enabling split pagetables support, we need to match the "qcom,adreno-smmu" compatible first before apss smmu or else we will be running apps smmu implementation for adreno smmu and the additional features for adreno smmu is never set. For ex: we have "qcom,sc7280-smmu-500" compatible for both apps and adreno smmu implementing "arm,mmu-500", so the adreno smmu implementation is never reached because the current sequence checks for apps smmu compatible(qcom,sc7280-smmu-500) first and runs that specific impl and we never reach adreno smmu specific implementation. Suggested-by: Akhil P Oommen Signed-off-by: Sai Prakash Ranjan Sorry for taking my time thinking about this. Reviewed-by: Bjorn Andersson No worries, thanks Bjorn. -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCHv3 2/2] iommu/arm-smmu-qcom: Move the adreno smmu specific impl earlier
Adreno(GPU) SMMU and APSS(Application Processor SubSystem) SMMU both implement "arm,mmu-500" in some QTI SoCs and to run through adreno smmu specific implementation such as enabling split pagetables support, we need to match the "qcom,adreno-smmu" compatible first before apss smmu or else we will be running apps smmu implementation for adreno smmu and the additional features for adreno smmu is never set. For ex: we have "qcom,sc7280-smmu-500" compatible for both apps and adreno smmu implementing "arm,mmu-500", so the adreno smmu implementation is never reached because the current sequence checks for apps smmu compatible(qcom,sc7280-smmu-500) first and runs that specific impl and we never reach adreno smmu specific implementation. Suggested-by: Akhil P Oommen Signed-off-by: Sai Prakash Ranjan Reviewed-by: Bjorn Andersson Acked-by: Jordan Crouse --- drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c index bea3ee0dabc2..03f048aebb80 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c @@ -345,11 +345,17 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu) { const struct device_node *np = smmu->dev->of_node; - if (of_match_node(qcom_smmu_impl_of_match, np)) - return qcom_smmu_create(smmu, &qcom_smmu_impl); - + /* +* Do not change this order of implementation, i.e., first adreno +* smmu impl and then apss smmu since we can have both implementing +* arm,mmu-500 in which case we will miss setting adreno smmu specific +* features if the order is changed. +*/ if (of_device_is_compatible(np, "qcom,adreno-smmu")) return qcom_smmu_create(smmu, &qcom_adreno_smmu_impl); + if (of_match_node(qcom_smmu_impl_of_match, np)) + return qcom_smmu_create(smmu, &qcom_smmu_impl); + return smmu; } -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCHv3 0/2] iommu/arm-smmu-qcom: Add SC7280 support
Patch 1 adds the sc7280 smmu compatible. Patch 2 moves the adreno smmu check before apss smmu to enable adreno smmu specific implementation. Note that dt-binding for sc7280 is already merged. Changes in v3: * Collect acks and reviews * Rebase on top of for-joerg/arm-smmu/updates Changes in v2: * Add a comment to make sure this order is not changed in future (Jordan) Sai Prakash Ranjan (2): iommu/arm-smmu-qcom: Add SC7280 SMMU compatible iommu/arm-smmu-qcom: Move the adreno smmu specific impl earlier drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCHv3 1/2] iommu/arm-smmu-qcom: Add SC7280 SMMU compatible
Add compatible for SC7280 SMMU to use the Qualcomm Technologies, Inc. specific implementation. Signed-off-by: Sai Prakash Ranjan --- drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c index 98b3a1c2a181..bea3ee0dabc2 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c @@ -166,6 +166,7 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = { { .compatible = "qcom,mdss" }, { .compatible = "qcom,sc7180-mdss" }, { .compatible = "qcom,sc7180-mss-pil" }, + { .compatible = "qcom,sc7280-mdss" }, { .compatible = "qcom,sc8180x-mdss" }, { .compatible = "qcom,sdm845-mdss" }, { .compatible = "qcom,sdm845-mss-pil" }, @@ -330,6 +331,7 @@ static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu, static const struct of_device_id __maybe_unused qcom_smmu_impl_of_match[] = { { .compatible = "qcom,msm8998-smmu-v2" }, { .compatible = "qcom,sc7180-smmu-500" }, + { .compatible = "qcom,sc7280-smmu-500" }, { .compatible = "qcom,sc8180x-smmu-500" }, { .compatible = "qcom,sdm630-smmu-v2" }, { .compatible = "qcom,sdm845-smmu-500" }, -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH] KVM: Boost vCPU candidate in user mode which is delivering interrupt
On Tue, 20 Apr 2021 at 00:59, Paolo Bonzini wrote: > > On 19/04/21 18:32, Sean Christopherson wrote: > > If false positives are a big concern, what about adding another pass to the > > loop > > and only yielding to usermode vCPUs with interrupts in the second full pass? > > I.e. give vCPUs that are already in kernel mode priority, and only yield to > > handle an interrupt if there are no vCPUs in kernel mode. > > > > kvm_arch_dy_runnable() pulls in pv_unhalted, which seems like a good thing. > > pv_unhalted won't help if you're waiting for a kernel spinlock though, > would it? Doing two passes (or looking for a "best" candidate that > prefers kernel mode vCPUs to user mode vCPUs waiting for an interrupt) > seems like the best choice overall. How about something like this: diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 6b4dd95..8ba50be 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -325,10 +325,12 @@ struct kvm_vcpu { * Cpu relax intercept or pause loop exit optimization * in_spin_loop: set when a vcpu does a pause loop exit * or cpu relax intercepted. + * pending_interrupt: set when a vcpu waiting for an interrupt * dy_eligible: indicates whether vcpu is eligible for directed yield. */ struct { bool in_spin_loop; +bool pending_interrupt; bool dy_eligible; } spin_loop; #endif @@ -1427,6 +1429,12 @@ static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val) { vcpu->spin_loop.in_spin_loop = val; } + +static inline void kvm_vcpu_set_pending_interrupt(struct kvm_vcpu *vcpu, bool val) +{ +vcpu->spin_loop.pending__interrupt = val; +} + static inline void kvm_vcpu_set_dy_eligible(struct kvm_vcpu *vcpu, bool val) { vcpu->spin_loop.dy_eligible = val; @@ -1438,6 +1446,10 @@ static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val) { } +static inline void kvm_vcpu_set_pending_interrupt(struct kvm_vcpu *vcpu, bool val) +{ +} + static inline void kvm_vcpu_set_dy_eligible(struct kvm_vcpu *vcpu, bool val) { } diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 529cff1..42e0255 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -410,6 +410,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id) INIT_LIST_HEAD(&vcpu->blocked_vcpu_list); kvm_vcpu_set_in_spin_loop(vcpu, false); +kvm_vcpu_set_pending_interrupt(vcpu, false); kvm_vcpu_set_dy_eligible(vcpu, false); vcpu->preempted = false; vcpu->ready = false; @@ -3079,14 +3080,17 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to); * Helper that checks whether a VCPU is eligible for directed yield. * Most eligible candidate to yield is decided by following heuristics: * - * (a) VCPU which has not done pl-exit or cpu relax intercepted recently - * (preempted lock holder), indicated by @in_spin_loop. - * Set at the beginning and cleared at the end of interception/PLE handler. + * (a) VCPU which has not done pl-exit or cpu relax intercepted and is not + * waiting for an interrupt recently (preempted lock holder). The former + * one is indicated by @in_spin_loop, set at the beginning and cleared at + * the end of interception/PLE handler. The later one is indicated by + * @pending_interrupt, set when interrupt is delivering and cleared at + * the end of directed yield. * - * (b) VCPU which has done pl-exit/ cpu relax intercepted but did not get - * chance last time (mostly it has become eligible now since we have probably - * yielded to lockholder in last iteration. 
This is done by toggling - * @dy_eligible each time a VCPU checked for eligibility.) + * (b) VCPU which has done pl-exit/ cpu relax intercepted or is waiting for + * interrupt but did not get chance last time (mostly it has become eligible + * now since we have probably yielded to lockholder in last iteration. This + * is done by toggling @dy_eligible each time a VCPU checked for eligibility.) * * Yielding to a recently pl-exited/cpu relax intercepted VCPU before yielding * to preempted lock-holder could result in wrong VCPU selection and CPU @@ -3102,10 +3106,10 @@ static bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu) #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT bool eligible; -eligible = !vcpu->spin_loop.in_spin_loop || +eligible = !(vcpu->spin_loop.in_spin_loop || vcpu->spin_loop.has_interrupt) || vcpu->spin_loop.dy_eligible; -if (vcpu->spin_loop.in_spin_loop) +if (vcpu->spin_loop.in_spin_loop || vcpu->spin_loop.has_interrupt) kvm_vcpu_set_dy_eligible(vcpu, !vcpu->spin_loop.dy_eligible); return eligible; @@ -3137,6 +3141,16 @@ static bool vcpu_dy_runnable(struct kvm_vcpu *vcpu) return false; } +static bool kvm_has_interrupt_delivery(struct kvm_vcpu *vcpu) +{ +if (vcpu_dy_runnable(vcpu)) { +kvm_vcpu_set_pending_interrupt(vcpu, true); +return t
Re: Enabling pmbus power control
On 4/19/21 10:50 PM, Zev Weiss wrote: [ ... ] > I had a glance at the enclosure driver; it looks pretty geared toward > SES-like things (drivers/scsi/ses.c being its only usage I can see in the > kernel at the moment) and while it could perhaps be pressed into working for > this it seems like it would probably drag in a fair amount of boilerplate and > result in a somewhat gratuitously confusing driver arrangement (calling the > things involved in the cases we're looking at "enclosures" seems like a bit > of a stretch). > > As an alternative, would something like the patch below be more along the > lines of what you're suggesting? And if so, would it make sense to > generalize it into something like 'pmbus-switch.c' and add a > PMBUS_HAVE_POWERSWITCH functionality bit or similar in the pmbus code instead > of hardcoding it for only LM25066 support? > > No. Don't access pmbus functions from outside drivers/hwmon/pmbus. I used to be opposed to function export restrictions (aka export namespaces), but you are making a good case that we need to introduce them for pmbus functions. Guenter
Re: [PATCH 00/13] [RFC] Rust support
On Mon, Apr 19, 2021 at 05:24:33PM -0700, Nick Desaulniers wrote: > On Fri, Apr 16, 2021 at 10:39 AM Willy Tarreau wrote: > > > > resources usage, I'm really not convinced at all it's suited for > > low-level development. I understand the interest of the experiment > > to help the language evolve into that direction, but I fear that > > the kernel will soon be as bloated and insecure as a browser, and > > that's really not to please me. > > Dunno, I don't think the introduction of Rust made Firefox _more_ insecure. > https://wiki.mozilla.org/Oxidation#Within_Firefox > > I pray no executives ever see Dmitry Vyukov's 2019 Linux Plumbers Conf > talk "Reflections on kernel quality, development process and testing." > https://www.youtube.com/watch?v=iAfrrNdl2f4 > or his 2018 Linux Security Summit talk "Syzbot and the Tale of > Thousand Kernel Bugs" https://www.youtube.com/watch?v=qrBVXxZDVQY > (and they're just fuzzing the syscall interface and USB devices. > Imagine once folks can more easily craft malformed bluetooth and wifi > packets.) > > I'd imagine the first term that comes to mind for them might be > "liability." They are quite sensitive to these vulnerabilities with > silly names, logos, and websites. There are many of us that believe > an incremental approach of introducing a memory safe language to our > existing infrastructure at the very least to attempt to improve the > quality of drivers for those that choose to use such tools is a better > approach. I would LOVE it if some "executives" would see the above presentations, because then they would maybe actually fund developers to fix bugs and maintain the kernel code, instead of only allowing them to add new features. Seriously, that's the real problem, that Dmitry's work has exposed, the lack of people allowed to do this type of bugfixing and maintenance on company time, for something that the company relies on, is a huge issue. "executives" feel that they are willing to fund the initial work and then "throw it over the wall to the community" once it is merged, and then they can forget about it as "the community" will maintain it for them for free. And that's a lie, as Dmitry's work shows. The world creates new use cases and testing ability all the time, which exposes bugs that have been around in old code. Once the bugs are fixed in that layer of code, the next layer down can finally be tested and then look, more corner cases of issues. Rewriting the kernel in another language is not going to fix the majority of the issues that fuzzing finds here automagically, as that work has exposed us to things like fault-injection and malicious USB packets that no language would have saved us from "automatically". All of those code paths deal with "unsafe" data that doesn't magically become "safe" because we switch languages. And over time, what we have determined is "safe" has changed! People forget that only a few years ago have we decided that the kernel now has to protect userspace programs from malicious hardware. That's a major shift in thinking, now data that we used to blindly trust can not be trusted at all. And "executives" want us to fix all of those issues for free, when really it's a major design shift for loads of kernel subsystems to handle this new threat model. So yes, please spread that talk around. Maybe then will we actually get funding and support to FIX the bugs that those tools test. Right now, the primary fixer of those findings are _INTERNS_ as that's all companies are willing to fund to fix this type of thing. 
And don't get me started on the inability for "executives" to fund other parts of Linux that they rely on, because they want "other companies" to do it instead. The tragedy-of-the-commons is a real threat to Linux, and always has been... thanks, greg k-h
[PATCH v3 2/2] mmc: block: Update ext_csd.cache_ctrl if it was written
The cache function can be turned ON and OFF by writing to the CACHE_CTRL byte (EXT_CSD byte [33]). However, card->ext_csd.cache_ctrl is only set on init if cache size > 0. Fix that by explicitly setting ext_csd.cache_ctrl on ext-csd write. Signed-off-by: Avri Altman --- drivers/mmc/core/block.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index 5b6501fc9fb7..8b07ed5e08de 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -572,6 +572,18 @@ static int __mmc_blk_ioctl_cmd(struct mmc_card *card, struct mmc_blk_data *md, main_md->part_curr = value & EXT_CSD_PART_CONFIG_ACC_MASK; } + /* +* Make sure to update CACHE_CTRL in case it was changed. The cache +* will get turned back on if the card is re-initialized, e.g. +* suspend/resume or hw reset in recovery. +*/ + if ((MMC_EXTRACT_INDEX_FROM_ARG(cmd.arg) == EXT_CSD_CACHE_CTRL) && + (cmd.opcode == MMC_SWITCH)) { + u8 value = MMC_EXTRACT_VALUE_FROM_ARG(cmd.arg) & 1; + + card->ext_csd.cache_ctrl = value; + } + /* * According to the SD specs, some commands require a delay after * issuing the command. -- 2.25.1
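Since the fix above picks the EXT_CSD index and value back out of a raw MMC_SWITCH argument, a short reference sketch of how such an argument is packed may help when reading it. This mirrors the layout the core's MMC_EXTRACT_* helpers assume (access mode in bits 25:24, EXT_CSD index in 23:16, value in 15:8, command set in 2:0); the helper name is illustrative.

#include <linux/mmc/mmc.h>
#include <linux/types.h>

static u32 example_switch_arg(u8 index, u8 value)
{
	return (MMC_SWITCH_MODE_WRITE_BYTE << 24) |
	       (index << 16) |
	       (value << 8) |
	       EXT_CSD_CMD_SET_NORMAL;
}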
[PATCH v3 1/2] mmc: block: Issue flush only if allowed
The cache may be flushed to the nonvolatile storage by writing to FLUSH_CACHE byte (EXT_CSD byte [32]). When in command queueing mode, the cache may be flushed by issuing a CMDQ_TASK_ DEV_MGMT (CMD48) with a FLUSH_CACHE op-code. Either way, verify that The cache function is turned ON before doing so. fixes: 1e8e55b67030 (mmc: block: Add CQE support) Reported-by: Brendan Peter Tested-by: Brendan Peter Signed-off-by: Avri Altman --- drivers/mmc/core/block.c | 7 +++ drivers/mmc/core/mmc.c | 2 +- drivers/mmc/core/mmc_ops.h | 5 + 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index 8bfd4d95b386..5b6501fc9fb7 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -1476,6 +1476,11 @@ static int mmc_blk_cqe_issue_flush(struct mmc_queue *mq, struct request *req) struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req); struct mmc_request *mrq = mmc_blk_cqe_prep_dcmd(mqrq, req); + if (mmc_card_mmc(mq->card) && !mmc_flush_allowed(mq->card)) { + blk_mq_end_request(req, BLK_STS_OK); + return -EPERM; + } + mrq->cmd->opcode = MMC_SWITCH; mrq->cmd->arg = (MMC_SWITCH_MODE_WRITE_BYTE << 24) | (EXT_CSD_FLUSH_CACHE << 16) | @@ -2226,6 +2231,8 @@ enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req) switch (req_op(req)) { case REQ_OP_FLUSH: ret = mmc_blk_cqe_issue_flush(mq, req); + if (ret == -EPERM) + return MMC_REQ_FINISHED; break; case REQ_OP_READ: case REQ_OP_WRITE: diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c index 9ad4aa537867..e3da62ffcb5e 100644 --- a/drivers/mmc/core/mmc.c +++ b/drivers/mmc/core/mmc.c @@ -2037,7 +2037,7 @@ static int _mmc_flush_cache(struct mmc_card *card) { int err = 0; - if (card->ext_csd.cache_size > 0 && card->ext_csd.cache_ctrl & 1) { + if (mmc_flush_allowed(card)) { err = mmc_switch(card, EXT_CSD_CMD_SET_NORMAL, EXT_CSD_FLUSH_CACHE, 1, CACHE_FLUSH_TIMEOUT_MS); diff --git a/drivers/mmc/core/mmc_ops.h b/drivers/mmc/core/mmc_ops.h index 5782fdf4e8e9..2682bf66708a 100644 --- a/drivers/mmc/core/mmc_ops.h +++ b/drivers/mmc/core/mmc_ops.h @@ -19,6 +19,11 @@ enum mmc_busy_cmd { struct mmc_host; struct mmc_card; +static inline bool mmc_flush_allowed(struct mmc_card *card) +{ + return card->ext_csd.cache_size > 0 && card->ext_csd.cache_ctrl & 1; +} + int mmc_select_card(struct mmc_card *card); int mmc_deselect_cards(struct mmc_host *host); int mmc_set_dsr(struct mmc_host *host); -- 2.25.1
[PATCH v3 0/2] Do not flush cache when it is disabled
v2 -> v3: - rebase onto recent cache changes v1 -> v2: - Attend Adrian's comments Cache is a temporary storage space in an eMMC device. Volatile by nature, the cache should in typical case reduce the access time compared to an access to the main nonvolatile storage. The cache function can be turned ON and OFF. Once OFF, the host is not expected to issue a flush-cache command to the device. Avri Altman (2): mmc: block: Issue flush only if allowed mmc: block: Update ext_csd.cache_ctrl if it was written drivers/mmc/core/block.c | 19 +++ drivers/mmc/core/mmc.c | 2 +- drivers/mmc/core/mmc_ops.h | 5 + 3 files changed, 25 insertions(+), 1 deletion(-) -- 2.25.1
Re: Enabling pmbus power control
On Mon, Apr 19, 2021 at 10:36:48PM CDT, Guenter Roeck wrote: On Mon, Apr 19, 2021 at 08:29:53PM -0500, Zev Weiss wrote: On Tue, Mar 30, 2021 at 02:38:10PM CDT, Guenter Roeck wrote: > On Tue, Mar 30, 2021 at 07:02:00PM +0100, Mark Brown wrote: > > On Tue, Mar 30, 2021 at 12:56:56PM -0500, Zev Weiss wrote: > > > > > Okay, to expand a bit on the description in my initial message -- we've > > > got a single chassis with multiple server boards and a single manager board > > > that handles, among other things, power control for the servers. > > > The manager board has one LM25066 for each attached server, which acts as > > > the "power switch" for that server. There thus really isn't any driver to > > > speak of for the downstream device. > > > > This sounds like you need a driver representing those server boards (or > > the slots they plug into perhaps) that represents everything about those > > boards to userspace, including power switching. I don't see why you > > wouldn't have a driver for that - it's a thing that physically exists > > and clearly has some software control, and you'd presumably also expect > > to represent some labelling about the slot as well. > > Absolutely agree. > > Thanks, > Guenter Hi Guenter, Mark, I wanted to return to this to try to explain why structuring the kernel support for this in a way that's specific to the device behind the PMIC seems like an awkward fit to me, and ultimately kind of artificial. In the system I described, the manager board with the LM25066 devices is its own little self-contained BMC system running its own Linux kernel (though "BMC" is perhaps a slightly misleading term because there's no host system that it manages). The PMICs are really the only relevant connection it has to the servers it controls power for -- they have their own dedicated local BMCs on board as well doing all the usual BMC tasks. A hypothetical dedicated driver for this on the manager board wouldn't have any other hardware to touch aside from the pmbus interface of the LM25066 itself, so calling it a server board driver seems pretty arbitrary -- and in fact the same system also has another LM25066 that controls the power supply to the chassis cooling fans (a totally different downstream device, but one that would be equally well-served by the same software). More recently, another system has entered the picture for us that might illustrate it more starkly -- the Delta Open19 power shelf [0] supported by a recent code release from LinkedIn [1]. This is a rackmount power supply All I can see is that this code is a mess. with fifty outputs, each independently switchable via an LM25066 attached to an on-board BMC-style management controller (though again, no host system involved). We (Equinix Metal) are interested in porting a modern OpenBMC to it to replace the dated, crufty, pre-Linux-Foundation version of OpenBMC it currently runs (as found in the linked repo). The exact nature of the devices powered by its outputs is well outside the scope of the firmware running on that controller (it could be any arbitrary thing that runs on 12VDC), but we still want to be able to both (a) retrieve per-output voltage/current/power readings as provided by the existing LM25066 hwmon support, and (b) control the on/off state of those outputs from userspace. 
Given the array of possible use-cases, an approach of adding power-switch functionality to the existing LM25066 support seems like the most obvious way to support this, so I'm hoping to see if I can get some idea of what an acceptable implementation of that might look like. Sorry, that is simply a no-go for the LM25066 driver. I mentioned it before, and it is still true: The hwmon subsystem is not in the business of power control. Have you looked into enclosures (drivers/misc/enclosure.c) ? Maybe that or something similar could be used for your purposes. Thanks, Guenter I had a glance at the enclosure driver; it looks pretty geared toward SES-like things (drivers/scsi/ses.c being its only usage I can see in the kernel at the moment) and while it could perhaps be pressed into working for this it seems like it would probably drag in a fair amount of boilerplate and result in a somewhat gratuitously confusing driver arrangement (calling the things involved in the cases we're looking at "enclosures" seems like a bit of a stretch). As an alternative, would something like the patch below be more along the lines of what you're suggesting? And if so, would it make sense to generalize it into something like 'pmbus-switch.c' and add a PMBUS_HAVE_POWERSWITCH functionality bit or similar in the pmbus code instead of hardcoding it for only LM25066 support? Thanks, Zev From f4c0a3637371d69a414b6fb882497d22e5de3ba0 Mon Sep 17 00:00:00 2001 From: Zev Weiss Date: Wed, 31 Mar 2021 01:58:35 -0500 Subject: [PATCH] misc: add lm25066-switch driver This driver allows an lm25066 to be switched on and off from userspace via sysf
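The patch body above is truncated in this digest, so purely as a rough illustration of the kind of switch attribute being discussed, the sketch below writes the PMBus OPERATION register (0x01) directly over SMBus: 0x80 requests "on", 0x00 an immediate off. Every name here is hypothetical, and it deliberately bypasses the pmbus core helpers, only to show the shape of the idea.

#include <linux/device.h>
#include <linux/i2c.h>
#include <linux/kernel.h>

#define EXAMPLE_PMBUS_OPERATION	0x01	/* PMBus OPERATION register */
#define EXAMPLE_PB_OPERATION_ON	0x80	/* unit on */

static ssize_t example_output_enable_store(struct device *dev,
					   struct device_attribute *attr,
					   const char *buf, size_t count)
{
	/* assumes the attribute hangs directly off the i2c client's device */
	struct i2c_client *client = to_i2c_client(dev);
	bool enable;
	int ret;

	if (kstrtobool(buf, &enable))
		return -EINVAL;

	ret = i2c_smbus_write_byte_data(client, EXAMPLE_PMBUS_OPERATION,
					enable ? EXAMPLE_PB_OPERATION_ON : 0);
	return ret ? ret : count;
}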
Doubt regarding memory allocation in KVM
Hi, I'm learning about qemu KVM, looking into code and experimenting on it. I have the following doubts regarding it, I would be grateful if you help me to get some idea on them. 1. I observe that KVM allocates memory to guests when it needs it but doesn't take it back (except for ballooning case). Also, the Qemu/KVM process does not free the memory even when the guest is rebooted. In this case, Does the Guest VM get access to memory already pre-filled with some garbage from the previous run?? (Since the host would allocate zeroed pages to guests the first time it requests but after that it's up to guests). Can it be a security issue? 2. How does the KVM know if GPFN (guest physical frame number) is backed by an actual machine frame number in host? If not mapped, then it faults in the host and allocates a physical frame for guests in the host. (kvm_mmu_page_fault) 3. How/where can I access the GPFNs in the host? Is "gfn_t gfn = gpa >> PAGE_SHIFT" and "gpa_t cr2_or_gpa" in the KVM page fault handler, x86 is the same as GPFN. (that is can I use pfn_to_page in guest VM to access the struct page in Guest) Thank You. Best Regards, Shivank Garg M.Tech Student, IIT Kanpur
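On question 3: the shift mentioned there is exactly how KVM relates the two quantities; the conversion helpers in include/linux/kvm_host.h are essentially the following (paraphrased with example_ prefixes rather than quoted verbatim).

#include <linux/kvm_host.h>

/* guest physical address <-> guest frame number */
static inline gfn_t example_gpa_to_gfn(gpa_t gpa)
{
	return (gfn_t)(gpa >> PAGE_SHIFT);
}

static inline gpa_t example_gfn_to_gpa(gfn_t gfn)
{
	return (gpa_t)gfn << PAGE_SHIFT;
}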
Re: [PATCH] watchdog: qcom: Move suspend/resume to suspend_late/resume_early
Hi Guenter, On 2021-03-11 01:53, Guenter Roeck wrote: On Thu, Mar 11, 2021 at 01:50:04AM +0530, Sai Prakash Ranjan wrote: During suspend/resume usecases and tests, it is common to see issues such as lockups either in suspend path or resume path because of the bugs in the corresponding device driver pm handling code. In such cases, it is important that watchdog is active to make sure that we either receive a watchdog pretimeout notification or a bite causing reset instead of a hang causing us to hard reset the machine. There are good reasons as to why we need this because: * We can have a watchdog pretimeout governor set to panic in which case we can have a backtrace which would help identify the issue with the particular driver and cause a normal reboot. * Even in case where there is no pretimeout support, a watchdog bite is still useful because some firmware has debug support to dump CPU core context on watchdog bite for post-mortem analysis. * One more usecase which comes to mind is of warm reboot. In case we hard reset the target, a cold reboot could be induced resulting in lose of ddr contents thereby losing all the debug info. Currently, the watchdog pm callback just invokes the usual suspend and resume callback which do not have any special ordering in the sense that a watchdog can be suspended before the buggy device driver suspend callback and watchdog resume can happen after the buggy device driver resume callback. This would mean that the watchdog will not be active when the buggy driver cause the lockups thereby hanging the system. So to make sure this doesn't happen, move the watchdog pm to use late/early system pm callbacks which will ensure that the watchdog is suspended late and resumed early so that it can catch such issues. Signed-off-by: Sai Prakash Ranjan Reviewed-by: Guenter Roeck Gentle Ping. I don't see this in linux-next or linux-watchdog, please let me know if anything is pending from my side. Thanks, Sai --- drivers/watchdog/qcom-wdt.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/watchdog/qcom-wdt.c b/drivers/watchdog/qcom-wdt.c index e38a87ffe5f5..0d2209c5eaca 100644 --- a/drivers/watchdog/qcom-wdt.c +++ b/drivers/watchdog/qcom-wdt.c @@ -329,7 +329,9 @@ static int __maybe_unused qcom_wdt_resume(struct device *dev) return 0; } -static SIMPLE_DEV_PM_OPS(qcom_wdt_pm_ops, qcom_wdt_suspend, qcom_wdt_resume); +static const struct dev_pm_ops qcom_wdt_pm_ops = { + SET_LATE_SYSTEM_SLEEP_PM_OPS(qcom_wdt_suspend, qcom_wdt_resume) +}; static const struct of_device_id qcom_wdt_of_table[] = { { .compatible = "qcom,kpss-timer", .data = &match_data_apcs_tmr }, -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
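Since the commit message leans on the pretimeout-governor-set-to-panic case, it may help to recall what that governor amounts to. The panic governor in drivers/watchdog/pretimeout_panic.c is roughly the following shape (paraphrased from memory, not a verbatim copy).

#include <linux/kernel.h>
#include <linux/watchdog.h>
#include "watchdog_pretimeout.h"	/* local header for struct watchdog_governor */

static void example_pretimeout_panic(struct watchdog_device *wdd)
{
	panic("watchdog pretimeout event\n");
}

static struct watchdog_governor example_gov_panic = {
	.name		= "panic",
	.pretimeout	= example_pretimeout_panic,
};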
[PATCH] x86/mm: Fix copy&paste error in comments
When direct page mapping is done bottom-up, memory for the new page-table pages in a range is allocated from low addresses, i.e. from the *bottom* of RAM, not the *end*. Signed-off-by: Cao jin --- arch/x86/mm/init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index e26f5c5c6565..bc2f871c75f1 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -663,7 +663,7 @@ static void __init memory_map_bottom_up(unsigned long map_start, /* * We start from the bottom (@map_start) and go to the top (@map_end). * The memblock_find_in_range() gets us a block of RAM from the -* end of RAM in [min_pfn_mapped, max_pfn_mapped) used as new pages +* bottom of RAM in [min_pfn_mapped, max_pfn_mapped) used as new pages * for page table. */ while (start < map_end) { -- 2.30.1
Re: [PATCH 1/1] powerpc/pseries/iommu: Fix window size for direct mapping with pmem
On Tue, 2021-04-20 at 15:18 +1000, Alexey Kardashevskiy wrote: > > On 20/04/2021 14:54, Leonardo Bras wrote: > > As of today, if the DDW is big enough to fit (1 << MAX_PHYSMEM_BITS) it's > > possible to use direct DMA mapping even with pmem region. > > > > But, if that happens, the window size (len) is set to > > (MAX_PHYSMEM_BITS - page_shift) instead of MAX_PHYSMEM_BITS, causing a > > pagesize times smaller DDW to be created, being insufficient for correct > > usage. > > > > Fix this so the correct window size is used in this case. > > Good find indeed. > > afaict this does not create a huge problem though as > query.largest_available_block is always smaller than (MAX_PHYSMEM_BITS - > page_shift) where it matters (phyp). > > > Reviewed-by: Alexey Kardashevskiy > Thanks for reviewing! Leonardo Bras
Re: [PATCH 5/7] mfd: sec: Simplify getting of_device_id match data
On 19.04.2021 10:17, Krzysztof Kozlowski wrote: > Use of_device_get_match_data() to make the code slightly smaller. > > Signed-off-by: Krzysztof Kozlowski > --- > drivers/mfd/sec-core.c | 9 +++-- > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/drivers/mfd/sec-core.c b/drivers/mfd/sec-core.c > index 8d55992da19e..3126c39f3203 100644 > --- a/drivers/mfd/sec-core.c > +++ b/drivers/mfd/sec-core.c > @@ -10,6 +10,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -324,12 +325,8 @@ static inline unsigned long > sec_i2c_get_driver_data(struct i2c_client *i2c, > const struct i2c_device_id *id) > { > #ifdef CONFIG_OF > - if (i2c->dev.of_node) { > - const struct of_device_id *match; > - > - match = of_match_node(sec_dt_match, i2c->dev.of_node); > - return (unsigned long)match->data; > - } > + if (i2c->dev.of_node) > + return (unsigned long)of_device_get_match_data(&i2c->dev); > #endif Does it make any sense to keep the #ifdef CONFIG_OF after this change? I would also skip (i2c->dev.of_node) check, because of_device_get_match_data() already does that (although indirectly). > return id->driver_data; > } Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland
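For concreteness, one possible shape of the combined suggestion (no CONFIG_OF guard, no explicit of_node check, while keeping the legacy i2c_device_id fallback) could look like the sketch below; whether the fallback is still wanted is a judgment call for the series author.

#include <linux/i2c.h>
#include <linux/of_device.h>

static unsigned long
sec_i2c_get_driver_data(struct i2c_client *i2c,
			const struct i2c_device_id *id)
{
	unsigned long data =
		(unsigned long)of_device_get_match_data(&i2c->dev);

	/* of_device_get_match_data() returns NULL when there is no DT node
	 * or no match, so fall back to the i2c_device_id table then. */
	return data ? data : id->driver_data;
}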
Re: [PATCH v2 8/8] MIPS: pci-legacy: use generic pci_enable_resources
On Tue, Apr 13, 2021 at 08:12:40PM -0700, Ilya Lipnitskiy wrote: > Follow the reasoning from commit 842de40d93e0 ("PCI: add generic > pci_enable_resources()"): > > The only functional difference from the MIPS version is that the > generic one uses "!r->parent" to check for resource collisions > instead of "!r->start && r->end". > > That should have no effect on any pci-legacy driver. > Unfortunately it does. With this patch in place, all my mips qemu emulations fail to boot from ide/ata drive; the driver is not instantiated. The error message is: ata_piix :00:0a.1: can't enable device: BAR 0 [io 0x01f0-0x01f7] not claimed ata_piix: probe of :00:0a.1 failed with error -22 Reverting this patch fixes the problem, and the message displayed by the driver is: ata_piix :00:0a.1: enabling device ( -> 0001) Bisect log is attached. Guenter --- # bad: [50b8b1d699ac313c0a07a3c185ffb23aecab8abb] Add linux-next specific files for 20210419 # good: [bf05bf16c76bb44ab5156223e1e58e26dfe30a88] Linux 5.12-rc8 git bisect start 'HEAD' 'v5.12-rc8' # bad: [c4bb91fc07e59241cde97f913d7a2fbedc248f0d] Merge remote-tracking branch 'crypto/master' git bisect bad c4bb91fc07e59241cde97f913d7a2fbedc248f0d # bad: [499f739ad70f2a58aac985dceb25ca7666da88be] Merge remote-tracking branch 'jc_docs/docs-next' git bisect bad 499f739ad70f2a58aac985dceb25ca7666da88be # bad: [11b56408a328d1c5c4dfa7667c5dc46956b64aec] Merge remote-tracking branch 'parisc-hd/for-next' git bisect bad 11b56408a328d1c5c4dfa7667c5dc46956b64aec # good: [09ccc0ee1227f2cfe50d8dbbe241d115d9b3885f] Merge branch 'arm/defconfig' into for-next git bisect good 09ccc0ee1227f2cfe50d8dbbe241d115d9b3885f # good: [a5b76c2f17330e266a5c56dde21430e27b0d0dbb] Merge remote-tracking branch 'arm-soc/for-next' git bisect good a5b76c2f17330e266a5c56dde21430e27b0d0dbb # good: [1e4241f6813f1c1a0027d96df075ffd01808b3cf] Merge remote-tracking branch 'ti-k3/ti-k3-next' git bisect good 1e4241f6813f1c1a0027d96df075ffd01808b3cf # good: [7496a43be7a362391607d78e49a3f28de80029ce] Merge remote-tracking branch 'h8300/h8300-next' git bisect good 7496a43be7a362391607d78e49a3f28de80029ce # good: [66633abd0642f1e89d26e15f36fb13d3a1c535ff] MIPS/bpf: Enable bpf_probe_read{, str}() on MIPS again git bisect good 66633abd0642f1e89d26e15f36fb13d3a1c535ff # good: [2c92ef8ff8d327797c1920ae7f938bcc6f3f7421] MIPS: Fix strnlen_user access check git bisect good 2c92ef8ff8d327797c1920ae7f938bcc6f3f7421 # good: [3a070801c61f4e3987d59b1068368ba71d727208] Merge remote-tracking branch 'microblaze/next' git bisect good 3a070801c61f4e3987d59b1068368ba71d727208 # good: [317f553bb677e324c9c865ff7f14597bc5ceeb9c] MIPS: pci-legacy: remove redundant info messages git bisect good 317f553bb677e324c9c865ff7f14597bc5ceeb9c # bad: [6ce48897ce476bed86fde28752c27596e8753277] MIPS: Loongson64: Add kexec/kdump support git bisect bad 6ce48897ce476bed86fde28752c27596e8753277 # bad: [99bca615d89510917864fac6b26fd343eff2aba2] MIPS: pci-legacy: use generic pci_enable_resources git bisect bad 99bca615d89510917864fac6b26fd343eff2aba2 # good: [0af83d2e447af3e5098583cb6320bb1b1fb0976b] MIPS: pci-legacy: remove busn_resource field git bisect good 0af83d2e447af3e5098583cb6320bb1b1fb0976b # first bad commit: [99bca615d89510917864fac6b26fd343eff2aba2] MIPS: pci-legacy: use generic pci_enable_resources
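For context on why the behaviour changed, the two collision checks named in the quoted commit message can be contrasted with the hypothetical helpers below (paraphrased, not an exact quote of either implementation). The generic check is what produces the "not claimed" message in the log above.

#include <linux/ioport.h>
#include <linux/pci.h>

/* Check used by the generic pci_enable_resources(): the BAR must have
 * been claimed, i.e. inserted into the resource tree. */
static bool example_generic_bar_ok(const struct resource *r)
{
	return r->parent != NULL;
}

/* Check used by the old MIPS pcibios_enable_resources(): only a BAR
 * that looked entirely unassigned was treated as a collision. */
static bool example_old_mips_bar_ok(const struct resource *r)
{
	return !(!r->start && r->end);
}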
Re: [PATCH] mips: kdump: Crash kernel should be able to see old memories
Hi, Jiaxun On 04/20/2021 09:11 AM, Jiaxun Yang wrote: On 2021/4/19 18:56, Youling Tang wrote: From: Huacai Chen kexec-tools use mem=X@Y to pass usable memories to crash kernel, but in commit a94e4f24ec836c8984f83959 ("MIPS: init: Drop boot_mem_map") all BIOS passed memories are removed by early_parse_mem(). I think this is reasonable for a normal kernel but not for a crash kernel, because a crash kernel should be able to see all old memories, even though it is not supposed to use them. Fixes: a94e4f24ec836c8984f83959 ("MIPS: init: Drop boot_mem_map") Signed-off-by: Huacai Chen Signed-off-by: Youling Tang --- arch/mips/kernel/setup.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c index b86e241..ac90d3b 100644 --- a/arch/mips/kernel/setup.c +++ b/arch/mips/kernel/setup.c @@ -351,8 +351,10 @@ static int __init early_parse_mem(char *p) */ if (usermem == 0) { usermem = 1; +#ifndef CONFIG_CRASH_DUMP Why depend on a config instead of a runtime variable? If we don't depend on the config, we can determine whether the command line contains the "elfcorehdr=" parameter, because the "mem=" and "elfcorehdr=" parameters are automatically added by kexec-tools. So if there is an "elfcorehdr=" parameter in the command line, it means that the currently running kernel is a capture kernel, and the memblock_remove() operation is not called. The revised patch is as follows: if (usermem == 0) { usermem = 1; - memblock_remove(memblock_start_of_DRAM(), - memblock_end_of_DRAM() - memblock_start_of_DRAM()); + if (!strstr(boot_command_line, "elfcorehdr")) { + memblock_remove(memblock_start_of_DRAM(), + memblock_end_of_DRAM() - memblock_start_of_DRAM()); + } Do you think it is feasible? Btw as you are fixing my commit please keep me CCed. Sorry, I will add you to CC. Thanks, Youling Thanks. - Jiaxun
Re: [PATCH 1/1] powerpc/pseries/iommu: Fix window size for direct mapping with pmem
On 20/04/2021 14:54, Leonardo Bras wrote: As of today, if the DDW is big enough to fit (1 << MAX_PHYSMEM_BITS) it's possible to use direct DMA mapping even with pmem region. But, if that happens, the window size (len) is set to (MAX_PHYSMEM_BITS - page_shift) instead of MAX_PHYSMEM_BITS, causing a pagesize times smaller DDW to be created, being insufficient for correct usage. Fix this so the correct window size is used in this case. Good find indeed. afaict this does not create a huge problem though as query.largest_available_block is always smaller than (MAX_PHYSMEM_BITS - page_shift) where it matters (phyp). Reviewed-by: Alexey Kardashevskiy Fixes: bf6e2d562bbc4("powerpc/dma: Fallback to dma_ops when persistent memory present") Signed-off-by: Leonardo Bras --- arch/powerpc/platforms/pseries/iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index 9fc5217f0c8e..836cbbe0ecc5 100644 --- a/arch/powerpc/platforms/pseries/iommu.c +++ b/arch/powerpc/platforms/pseries/iommu.c @@ -1229,7 +1229,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn) if (pmem_present) { if (query.largest_available_block >= (1ULL << (MAX_PHYSMEM_BITS - page_shift))) - len = MAX_PHYSMEM_BITS - page_shift; + len = MAX_PHYSMEM_BITS; else dev_info(&dev->dev, "Skipping ibm,pmemory"); } -- Alexey
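To put illustrative numbers on the difference: assuming MAX_PHYSMEM_BITS = 51 and a 64 KiB IOMMU page (page_shift = 16), the old code set len = 51 - 16 = 35 and so asked for a 2^35-byte (32 GiB) window, while the pmem case needs the full 2^51-byte (2 PiB) window it has just verified is available; the fix restores len = MAX_PHYSMEM_BITS so the created DDW really covers (1 << MAX_PHYSMEM_BITS).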
[tip:master] BUILD SUCCESS 75ce17f4d207d047b991e28cd5243f2345889c0c
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master branch HEAD: 75ce17f4d207d047b991e28cd5243f2345889c0c Merge branch 'x86/build' elapsed time: 722m configs tested: 117 configs skipped: 3 The following configs have been built successfully. More configs may be tested in the coming days. gcc tested configs: arm defconfig arm64allyesconfig arm64 defconfig arm allyesconfig arm allmodconfig x86_64 allyesconfig riscvallmodconfig i386 allyesconfig riscvallyesconfig powerpc motionpro_defconfig sh ap325rxa_defconfig mips pic32mzda_defconfig powerpc xes_mpc85xx_defconfig m68kmvme16x_defconfig arm versatile_defconfig s390 alldefconfig mips pistachio_defconfig armspear3xx_defconfig sh rsk7203_defconfig powerpc mpc8272_ads_defconfig mips rb532_defconfig shedosk7705_defconfig powerpc asp8347_defconfig armrealview_defconfig sparc sparc32_defconfig um alldefconfig sh rts7751r2dplus_defconfig sh se7705_defconfig powerpc redwood_defconfig powerpc katmai_defconfig sh ecovec24_defconfig sh rsk7201_defconfig ia64 allmodconfig ia64defconfig parisc alldefconfig powerpc holly_defconfig shsh7785lcr_defconfig m68k multi_defconfig arm omap2plus_defconfig sh sh7724_generic_defconfig mips loongson1b_defconfig powerpc g5_defconfig arm footbridge_defconfig powerpc sequoia_defconfig ia64 allyesconfig m68k allmodconfig m68kdefconfig m68k allyesconfig nios2 defconfig arc allyesconfig nds32 allnoconfig nds32 defconfig nios2allyesconfig cskydefconfig alpha defconfig alphaallyesconfig xtensa allyesconfig h8300allyesconfig arc defconfig sh allmodconfig parisc defconfig s390 allyesconfig s390 allmodconfig parisc allyesconfig s390defconfig sparcallyesconfig sparc defconfig i386defconfig mips allyesconfig mips allmodconfig powerpc allyesconfig powerpc allmodconfig powerpc allnoconfig x86_64 randconfig-a003-20210419 x86_64 randconfig-a001-20210419 x86_64 randconfig-a005-20210419 x86_64 randconfig-a002-20210419 x86_64 randconfig-a006-20210419 x86_64 randconfig-a004-20210419 i386 randconfig-a003-20210419 i386 randconfig-a001-20210419 i386 randconfig-a006-20210419 i386 randconfig-a005-20210419 i386 randconfig-a004-20210419 i386 randconfig-a002-20210419 i386 randconfig-a012-20210420 i386 randconfig-a014-20210420 i386 randconfig-a011-20210420 i386 randconfig-a013-20210420 i386 randconfig-a015-20210420 i386 randconfig-a016-20210420 i386 randconfig-a015-20210419 i386 randconfig-a013-20210419 i386 randconfig-a014-20210419 i386 randconfig-a016-20210419 i386 randconfig-a012-20210419 i386 randconfig-a011-20210419 riscvnommu_k210_defconfig riscvnommu_virt_defconfig riscv allnoconfig riscv defco
[syzbot] KASAN: use-after-free Read in sctp_do_8_2_transport_strike
Hello, syzbot found the following issue on: HEAD commit:bf05bf16 Linux 5.12-rc8 git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=16a00dfed0 kernel config: https://syzkaller.appspot.com/x/.config?x=9404cfa686df2c05 dashboard link: https://syzkaller.appspot.com/bug?extid=bbe538efd1046586f587 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+bbe538efd1046586f...@syzkaller.appspotmail.com == BUG: KASAN: use-after-free in sctp_do_8_2_transport_strike.constprop.0+0xa27/0xab0 net/sctp/sm_sideeffect.c:531 Read of size 4 at addr 888024d65154 by task swapper/1/0 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.12.0-rc8-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232 __kasan_report mm/kasan/report.c:399 [inline] kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416 sctp_do_8_2_transport_strike.constprop.0+0xa27/0xab0 net/sctp/sm_sideeffect.c:531 sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1636 [inline] sctp_side_effects net/sctp/sm_sideeffect.c:1185 [inline] sctp_do_sm+0x114f/0x5120 net/sctp/sm_sideeffect.c:1156 sctp_generate_timeout_event+0x1bb/0x3d0 net/sctp/sm_sideeffect.c:295 call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1431 expire_timers kernel/time/timer.c:1476 [inline] __run_timers.part.0+0x67c/0xa50 kernel/time/timer.c:1745 __run_timers kernel/time/timer.c:1726 [inline] run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1758 __do_softirq+0x29b/0x9f6 kernel/softirq.c:345 invoke_softirq kernel/softirq.c:221 [inline] __irq_exit_rcu kernel/softirq.c:422 [inline] irq_exit_rcu+0x134/0x200 kernel/softirq.c:434 sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1100 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632 RIP: 0010:native_save_fl arch/x86/include/asm/irqflags.h:29 [inline] RIP: 0010:arch_local_save_flags arch/x86/include/asm/irqflags.h:70 [inline] RIP: 0010:arch_irqs_disabled arch/x86/include/asm/irqflags.h:137 [inline] RIP: 0010:acpi_safe_halt drivers/acpi/processor_idle.c:112 [inline] RIP: 0010:acpi_idle_do_entry+0x1c9/0x250 drivers/acpi/processor_idle.c:517 Code: ed 56 6e f8 84 db 75 ac e8 34 50 6e f8 e8 3f 3f 74 f8 e9 0c 00 00 00 e8 25 50 6e f8 0f 00 2d be a5 c7 00 e8 19 50 6e f8 fb f4 <9c> 5b 81 e3 00 02 00 00 fa 31 ff 48 89 de e8 24 58 6e f8 48 85 db RSP: 0018:c9d57d18 EFLAGS: 0293 RAX: RBX: RCX: RDX: 8880111c54c0 RSI: 8905a167 RDI: RBP: 888141433864 R08: 0001 R09: 0001 R10: 8179e0c8 R11: R12: 0001 R13: 888141433800 R14: 888141433864 R15: 888017879004 acpi_idle_enter+0x361/0x500 drivers/acpi/processor_idle.c:652 cpuidle_enter_state+0x1b1/0xc80 drivers/cpuidle/cpuidle.c:237 cpuidle_enter+0x4a/0xa0 drivers/cpuidle/cpuidle.c:351 call_cpuidle kernel/sched/idle.c:158 [inline] cpuidle_idle_call kernel/sched/idle.c:239 [inline] do_idle+0x3e1/0x590 kernel/sched/idle.c:300 cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:397 start_secondary+0x274/0x350 arch/x86/kernel/smpboot.c:272 secondary_startup_64_no_verify+0xb0/0xbb Allocated by task 14433: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:427 [inline] kasan_kmalloc mm/kasan/common.c:506 [inline] kasan_kmalloc mm/kasan/common.c:465 [inline] __kasan_kmalloc+0x99/0xc0 
mm/kasan/common.c:515 kmalloc include/linux/slab.h:554 [inline] kzalloc include/linux/slab.h:684 [inline] sctp_transport_new+0x8c/0x690 net/sctp/transport.c:96 sctp_assoc_add_peer+0x28f/0x1160 net/sctp/associola.c:618 sctp_process_init+0x12a/0x2940 net/sctp/sm_make_chunk.c:2345 sctp_sf_do_dupcook_a net/sctp/sm_statefuns.c:1800 [inline] sctp_sf_do_5_2_4_dupcook+0x1401/0x2d50 net/sctp/sm_statefuns.c:2200 sctp_do_sm+0x179/0x5120 net/sctp/sm_sideeffect.c:1153 sctp_assoc_bh_rcv+0x386/0x6c0 net/sctp/associola.c:1048 sctp_inq_push+0x1da/0x270 net/sctp/inqueue.c:80 sctp_rcv+0xf64/0x2f10 net/sctp/input.c:256 sctp6_rcv+0x38/0x60 net/sctp/ipv6.c:1077 ip6_protocol_deliver_rcu+0x2e9/0x17f0 net/ipv6/ip6_input.c:422 ip6_input_finish+0x7f/0x160 net/ipv6/ip6_input.c:463 NF_HOOK include/linux/netfilter.h:301 [inline] NF_HOOK include/linux/netfilter.h:295 [inline] ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:472 dst_input include/net/dst.h:458 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] NF_HOOK include/linux/netfilter.h:295 [inline] ipv6_rcv+0x28e/0x3c0 net/i
Re: [PATCH] bonding: 3ad: update slave arr after initialize
jin yiting wrote: [...] >> The described issue is a race condition (in that >> ad_agg_selection_logic clears agg->is_active under mode_lock, but >> bond_open -> bond_update_slave_arr is inspecting agg->is_active outside >> the lock). I don't see how the above change will reliably manage this; >> the real issue looks to be that bond_update_slave_arr is committing >> changes to the array (via bond_reset_slave_arr) based on a racy >> inspection of the active aggregator state while it is in flux. >> >> Also, the description of the issue says "The best aggregator in >> ad_agg_selection_logic has not changed, no need to update slave arr," >> but the change above does the opposite, and will set update_slave_arr >> when the aggregator has not changed (update_slave_arr remains false at >> return of ad_agg_selection_logic). >> >> I believe I understand the described problem, but I don't see >> how the patch fixes it. I suspect (but haven't tested) that the proper >> fix is to acquire mode_lock in bond_update_slave_arr while calling >> bond_3ad_get_active_agg_info to avoid conflict with the state machine. >> >> -J >> >> --- >> -Jay Vosburgh, jay.vosbu...@canonical.com >> . >> > > Thank you for your reply. The last patch does have redundant >update slave arr.Thank you for your correction. > >As you said, holding mode_lock in bond_update_slave_arr while >calling bond_3ad_get_active_agg_info can avoid conflictwith the state >machine. I have tested this patch, with ifdown/ifup operations for bond or >slaves. > >But bond_update_slave_arr is expected to hold RTNL only and NO >other lock. And it have WARN_ON(lockdep_is_held(&bond->mode_lock)); in >bond_update_slave_arr. I'm not sure that holding mode_lock in >bond_update_slave_arr while calling bond_3ad_get_active_agg_info is a >correct action. That WARN_ON came up in discussion recently, and my opinion is that it's incorrect, and is trying to insure bond_update_slave_arr is safe for a potential sleep when allocating memory. https://lore.kernel.org/netdev/20210322123846.3024549-1-maxi...@nvidia.com/ The original authors haven't replied, so I would suggest you remove the WARN_ON and the surrounding CONFIG_LOCKDEP ifdefs as part of your patch and replace it with a call to might_sleep. The other callers of bond_3ad_get_active_agg_info are generally obtaining the state in order to report it to user space, so I think it's safe to leave those calls not holding the mode_lock. The race is still there, but the data returned to user space is a snapshot and so may reflect an incomplete state during a transition. Further, having the inspection functions acquire the mode_lock permits user space to spam the lock with little effort. -J >diff --git a/drivers/net/bonding/bond_main.c >b/drivers/net/bonding/bond_main.c >index 74cbbb2..db988e5 100644 >--- a/drivers/net/bonding/bond_main.c >+++ b/drivers/net/bonding/bond_main.c >@@ -4406,7 +4406,9 @@ int bond_update_slave_arr(struct bonding *bond, >struct slave *skipslave) >if (BOND_MODE(bond) == BOND_MODE_8023AD) { >struct ad_info ad_info; > >+ spin_lock_bh(&bond->mode_lock); >if (bond_3ad_get_active_agg_info(bond, &ad_info)) { >+ spin_unlock_bh(&bond->mode_lock); >pr_debug("bond_3ad_get_active_agg_info failed\n"); >/* No active aggragator means it's not safe to use > * the previous array. 
>@@ -4414,6 +4416,7 @@ int bond_update_slave_arr(struct bonding *bond, >struct slave *skipslave) >bond_reset_slave_arr(bond); >goto out; >} >+ spin_unlock_bh(&bond->mode_lock); >agg_id = ad_info.aggregator_id; >} >bond_for_each_slave(bond, slave, iter) { --- -Jay Vosburgh, jay.vosbu...@canonical.com
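For completeness, a sketch of the other change suggested above, replacing the lockdep-guarded WARN_ON at the top of bond_update_slave_arr(); this is a paraphrase of the suggestion, not an actual patch:

        /* bond_update_slave_arr() is called under RTNL only and may allocate
         * the new slave array with GFP_KERNEL, so document the sleepable
         * context instead of warning when mode_lock happens to be held: */
        might_sleep();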
Re: PPC_FPU, ALTIVEC: enable_kernel_fp, put_vr, get_vr
Le 19/04/2021 à 23:39, Randy Dunlap a écrit : On 4/19/21 6:16 AM, Michael Ellerman wrote: Randy Dunlap writes: Sure. I'll post them later today. They keep FPU and ALTIVEC as independent (build) features. Those patches look OK. But I don't think it makes sense to support that configuration, FPU=n ALTVEC=y. No one is ever going to make a CPU like that. We have enough testing surface due to configuration options, without adding artificial combinations that no one is ever going to use. IMHO :) So I'd rather we just make ALTIVEC depend on FPU. That's rather simple. See below. I'm doing a bunch of randconfig builds with it now. --- From: Randy Dunlap Subject: [PATCH] powerpc: make ALTIVEC depend PPC_FPU On a kernel config with ALTIVEC=y and PPC_FPU not set/enabled, there are build errors: drivers/cpufreq/pmac32-cpufreq.c:262:2: error: implicit declaration of function 'enable_kernel_fp' [-Werror,-Wimplicit-function-declaration] enable_kernel_fp(); ../arch/powerpc/lib/sstep.c: In function 'do_vec_load': ../arch/powerpc/lib/sstep.c:637:3: error: implicit declaration of function 'put_vr' [-Werror=implicit-function-declaration] 637 | put_vr(rn, &u.v); | ^~ ../arch/powerpc/lib/sstep.c: In function 'do_vec_store': ../arch/powerpc/lib/sstep.c:660:3: error: implicit declaration of function 'get_vr'; did you mean 'get_oc'? [-Werror=implicit-function-declaration] 660 | get_vr(rn, &u.v); | ^~ In theory ALTIVEC is independent of PPC_FPU but in practice nobody is going to build such a machine, so make ALTIVEC require PPC_FPU by depending on PPC_FPU. Signed-off-by: Randy Dunlap Reported-by: kernel test robot Cc: Michael Ellerman Cc: linuxppc-...@lists.ozlabs.org Cc: Christophe Leroy Cc: Segher Boessenkool Cc: l...@intel.com --- arch/powerpc/platforms/86xx/Kconfig|1 + arch/powerpc/platforms/Kconfig.cputype |2 ++ 2 files changed, 3 insertions(+) --- linux-next-20210416.orig/arch/powerpc/platforms/86xx/Kconfig +++ linux-next-20210416/arch/powerpc/platforms/86xx/Kconfig @@ -4,6 +4,7 @@ menuconfig PPC_86xx bool "86xx-based boards" depends on PPC_BOOK3S_32 select FSL_SOC + select PPC_FPU select ALTIVEC help The Freescale E600 SoCs have 74xx cores. --- linux-next-20210416.orig/arch/powerpc/platforms/Kconfig.cputype +++ linux-next-20210416/arch/powerpc/platforms/Kconfig.cputype @@ -186,6 +186,7 @@ config E300C3_CPU config G4_CPU bool "G4 (74xx)" depends on PPC_BOOK3S_32 + select PPC_FPU select ALTIVEC endchoice @@ -309,6 +310,7 @@ config PHYS_64BIT config ALTIVEC bool "AltiVec Support" + depends on PPC_FPU Shouldn't we do it the other way round ? In extenso make ALTIVEC select PPC_FPU and avoid the two selects that are above ? depends on PPC_BOOK3S_32 || PPC_BOOK3S_64 || (PPC_E500MC && PPC64) help This option enables kernel support for the Altivec extensions to the
Re: [PATCH v5 09/12] media: uvcvideo: Implement UVC_QUIRK_PRIVACY_DURING_STREAM
On Wed, Dec 23, 2020 at 9:56 PM Ricardo Ribalda wrote: > > Hi again > > On Wed, Dec 23, 2020 at 9:31 AM Ricardo Ribalda wrote: > > > > Hi Laurent > > > > On Wed, Dec 23, 2020 at 9:05 AM Laurent Pinchart > > wrote: > > > > > > Hi Ricardo, > > > > > > On Tue, Dec 22, 2020 at 09:04:19PM +0100, Ricardo Ribalda wrote: > > > > On Tue, Dec 22, 2020 at 11:30 AM Laurent Pinchart wrote: > > > > > On Mon, Dec 21, 2020 at 05:48:16PM +0100, Ricardo Ribalda wrote: > > > > > > Some devices, can only read the privacy_pin if the device is > > > > > > > > > > s/devices,/devices/ > > > > > > > > > > > streaming. > > > > > > > > > > > > This patch implement a quirk for such devices, in order to avoid > > > > > > invalid > > > > > > reads and/or spurious events. > > > > > > > > > > > > Signed-off-by: Ricardo Ribalda > > > > > > --- > > > > > > drivers/media/usb/uvc/uvc_driver.c | 57 > > > > > > -- > > > > > > drivers/media/usb/uvc/uvc_queue.c | 3 ++ > > > > > > drivers/media/usb/uvc/uvcvideo.h | 4 +++ > > > > > > 3 files changed, 61 insertions(+), 3 deletions(-) > > > > > > > > > > > > diff --git a/drivers/media/usb/uvc/uvc_driver.c > > > > > > b/drivers/media/usb/uvc/uvc_driver.c > > > > > > index 72516101fdd0..7af37d4bd60a 100644 > > > > > > --- a/drivers/media/usb/uvc/uvc_driver.c > > > > > > +++ b/drivers/media/usb/uvc/uvc_driver.c > > > > > > @@ -7,6 +7,7 @@ > > > > > > */ > > > > > > > > > > > > #include > > > > > > +#include > > > > > > #include > > > > > > #include > > > > > > #include > > > > > > @@ -1472,6 +1473,17 @@ static int uvc_parse_control(struct > > > > > > uvc_device *dev) > > > > > > /* > > > > > > - > > > > > > * Privacy GPIO > > > > > > */ > > > > > > > > > > There should be a blank line here. > > > > > > > > > > > +static bool uvc_gpio_is_streaming(struct uvc_device *dev) > > > > > > +{ > > > > > > + struct uvc_streaming *streaming; > > > > > > + > > > > > > + list_for_each_entry(streaming, &dev->streams, list) { > > > > > > + if (uvc_queue_streaming(&streaming->queue)) > > > > > > + return true; > > > > > > + } > > > > > > + > > > > > > + return false; > > > > > > +} > > > > > > > > > > > > > > > > > > > > > > But not too blank lines here. 
> > > > > > > > > > > static u8 uvc_gpio_update_value(struct uvc_device *dev, > > > > > > @@ -1499,7 +1511,12 @@ static int uvc_gpio_get_cur(struct > > > > > > uvc_device *dev, struct uvc_entity *entity, > > > > > > if (cs != UVC_CT_PRIVACY_CONTROL || size < 1) > > > > > > return -EINVAL; > > > > > > > > > > > > + if ((dev->quirks & UVC_QUIRK_PRIVACY_DURING_STREAM) && > > > > > > + !uvc_gpio_is_streaming(dev)) > > > > > > + return -EBUSY; > > > > > > + > > > > > > *(uint8_t *)data = uvc_gpio_update_value(dev, entity); > > > > > > + > > > > > > return 0; > > > > > > } > > > > > > > > > > > > @@ -1528,19 +1545,50 @@ static struct uvc_entity > > > > > > *uvc_gpio_find_entity(struct uvc_device *dev) > > > > > > return NULL; > > > > > > } > > > > > > > > > > > > -static irqreturn_t uvc_gpio_irq(int irq, void *data) > > > > > > +void uvc_privacy_gpio_event(struct uvc_device *dev) > > > > > > { > > > > > > - struct uvc_device *dev = data; > > > > > > struct uvc_entity *unit; > > > > > > > > > > > > + > > > > > > unit = uvc_gpio_find_entity(dev); > > > > > > if (!unit) > > > > > > - return IRQ_HANDLED; > > > > > > + return; > > > > > > > > > > > > uvc_gpio_update_value(dev, unit); > > > > > > +} > > > > > > + > > > > > > +static irqreturn_t uvc_gpio_irq(int irq, void *data) > > > > > > +{ > > > > > > + struct uvc_device *dev = data; > > > > > > + > > > > > > + /* Ignore privacy events during streamoff */ > > > > > > + if (dev->quirks & UVC_QUIRK_PRIVACY_DURING_STREAM) > > > > > > + if (!uvc_gpio_is_streaming(dev)) > > > > > > + return IRQ_HANDLED; > > > > > > > > > > I'm still a bit concerned of race conditions. When stopping the > > > > > stream, > > > > > vb2_queue.streaming is set to 0 after calling the driver's > > > > > .stop_stream() > > > > > handler. This means that the device will cut power before > > > > > uvc_gpio_is_streaming() can detect that streaming has stopped, and the > > > > > GPIO could thus trigger an IRQ. > > > > > > > > On the affected devices I have not seen this. I guess it takes some > > > > time to discharge. Anyway I am implementing a workaround. Tell me if > > > > it is too ugly. > > > > > > > > > You mentioned that devices have a pull-up or pull-down on the GPIO > > > > > line. > > > > > As there are only two devices affected, do you know if it's a pull-up > > > > > or > > > > > pull-down ? Would it be worse to expose that state to userspace than > > > > > to > > > > > return -EBUSY when reading th
[PATCH 1/1] powerpc/pseries/iommu: Fix window size for direct mapping with pmem
As of today, if the DDW is big enough to fit (1 << MAX_PHYSMEM_BITS) it's possible to use direct DMA mapping even with pmem region. But, if that happens, the window size (len) is set to (MAX_PHYSMEM_BITS - page_shift) instead of MAX_PHYSMEM_BITS, causing a pagesize times smaller DDW to be created, being insufficient for correct usage. Fix this so the correct window size is used in this case. Fixes: bf6e2d562bbc4("powerpc/dma: Fallback to dma_ops when persistent memory present") Signed-off-by: Leonardo Bras --- arch/powerpc/platforms/pseries/iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index 9fc5217f0c8e..836cbbe0ecc5 100644 --- a/arch/powerpc/platforms/pseries/iommu.c +++ b/arch/powerpc/platforms/pseries/iommu.c @@ -1229,7 +1229,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn) if (pmem_present) { if (query.largest_available_block >= (1ULL << (MAX_PHYSMEM_BITS - page_shift))) - len = MAX_PHYSMEM_BITS - page_shift; + len = MAX_PHYSMEM_BITS; else dev_info(&dev->dev, "Skipping ibm,pmemory"); } -- 2.30.2
Re: [PATCH] bpf: Fix backport of "bpf: restrict unknown scalars of mixed signed bounds for unprivileged"
On Mon, Apr 19, 2021 at 04:56:41PM -0700, Samuel Mendoza-Jonas wrote: > The 4.14 backport of 9d7eceede ("bpf: restrict unknown scalars of mixed > signed bounds for unprivileged") adds the PTR_TO_MAP_VALUE check to the > wrong location in adjust_ptr_min_max_vals(), most likely because 4.14 > doesn't include the commit that updates the if-statement to a > switch-statement (aad2eeaf4 "bpf: Simplify ptr_min_max_vals adjustment"). > > Move the check to the proper location in adjust_ptr_min_max_vals(). > > Fixes: 17efa65350c5a ("bpf: restrict unknown scalars of mixed signed bounds > for unprivileged") > Signed-off-by: Samuel Mendoza-Jonas > Reviewed-by: Frank van der Linden > Reviewed-by: Ethan Chen > --- Thanks for catching it :) Reviewed-by: Balbir Singh
RE: [PATCH v2] Drivers: hv: vmbus: Initialize unload_event statically
From: Andrea Parri (Microsoft) Sent: Monday, April 19, 2021 6:44 PM > > If a malicious or compromised Hyper-V sends a spurious message of type > CHANNELMSG_UNLOAD_RESPONSE, the function vmbus_unload_response() will > call complete() on an uninitialized event, and cause an oops. > > Reported-by: Michael Kelley > Signed-off-by: Andrea Parri (Microsoft) > --- > Changes since v1[1]: > - add inline comment in vmbus_unload_response() > > [1] > https://lore.kernel.org/linux-hyperv/20210416143932.16512-1-parri.and...@gmail.com/ > > drivers/hv/channel_mgmt.c | 7 ++- > drivers/hv/connection.c | 2 ++ > 2 files changed, 8 insertions(+), 1 deletion(-) > Reviewed-by: Michael Kelley > diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c > index 4c9e45d1f462c..335a10ee03a5e 100644 > --- a/drivers/hv/channel_mgmt.c > +++ b/drivers/hv/channel_mgmt.c > @@ -826,6 +826,11 @@ static void vmbus_unload_response(struct > vmbus_channel_message_header *hdr) > /* >* This is a global event; just wakeup the waiting thread. >* Once we successfully unload, we can cleanup the monitor state. > + * > + * NB. A malicious or compromised Hyper-V could send a spurious > + * message of type CHANNELMSG_UNLOAD_RESPONSE, and trigger a call > + * of the complete() below. Make sure that unload_event has been > + * initialized by the time this complete() is executed. >*/ > complete(&vmbus_connection.unload_event); > } > @@ -841,7 +846,7 @@ void vmbus_initiate_unload(bool crash) > if (vmbus_proto_version < VERSION_WIN8_1) > return; > > - init_completion(&vmbus_connection.unload_event); > + reinit_completion(&vmbus_connection.unload_event); > memset(&hdr, 0, sizeof(struct vmbus_channel_message_header)); > hdr.msgtype = CHANNELMSG_UNLOAD; > vmbus_post_msg(&hdr, sizeof(struct vmbus_channel_message_header), > diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c > index dc19d5ae4373c..311cd005b3be6 100644 > --- a/drivers/hv/connection.c > +++ b/drivers/hv/connection.c > @@ -26,6 +26,8 @@ > > struct vmbus_connection vmbus_connection = { > .conn_state = DISCONNECTED, > + .unload_event = COMPLETION_INITIALIZER( > + vmbus_connection.unload_event), > .next_gpadl_handle = ATOMIC_INIT(0xE1E10), > > .ready_for_suspend_event = COMPLETION_INITIALIZER( > -- > 2.25.1
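The pattern in a nutshell, as a generic sketch rather than the hv code itself (conn_state and start_unload() are made-up names): the completion is valid from the moment the object exists, and later users only re-arm it, so a spurious complete() can no longer hit an uninitialized waitqueue.

        #include <linux/completion.h>

        static struct conn_state {
                struct completion unload_event;
        } conn = {
                .unload_event = COMPLETION_INITIALIZER(conn.unload_event),
        };

        static void start_unload(void)
        {
                /* reset ->done only; the waitqueue stays initialized */
                reinit_completion(&conn.unload_event);
                /* ... post the UNLOAD message and wait for the response ... */
        }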
[PATCH v2 1/1] Drivers: hv: vmbus: Increase wait time for VMbus unload
When running in Azure, disks may be connected to a Linux VM with read/write caching enabled. If a VM panics and issues a VMbus UNLOAD request to Hyper-V, the response is delayed until all dirty data in the disk cache is flushed. In extreme cases, this flushing can take 10's of seconds, depending on the disk speed and the amount of dirty data. If kdump is configured for the VM, the current 10 second timeout in vmbus_wait_for_unload() may be exceeded, and the UNLOAD complete message may arrive well after the kdump kernel is already running, causing problems. Note that no problem occurs if kdump is not enabled because Hyper-V waits for the cache flush before doing a reboot through the BIOS/UEFI code. Fix this problem by increasing the timeout in vmbus_wait_for_unload() to 100 seconds. Also output periodic messages so that if anyone is watching the serial console, they won't think the VM is completely hung. Fixes: 911e1987efc8 ("Drivers: hv: vmbus: Add timeout to vmbus_wait_for_unload") Signed-off-by: Michael Kelley --- Changed in v2: Fixed silly error in the argument to mdelay() --- drivers/hv/channel_mgmt.c | 30 +- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c index f3cf4af..ef4685c 100644 --- a/drivers/hv/channel_mgmt.c +++ b/drivers/hv/channel_mgmt.c @@ -755,6 +755,12 @@ static void init_vp_index(struct vmbus_channel *channel) free_cpumask_var(available_mask); } +#define UNLOAD_DELAY_UNIT_MS 10 /* 10 milliseconds */ +#define UNLOAD_WAIT_MS (100*1000) /* 100 seconds */ +#define UNLOAD_WAIT_LOOPS (UNLOAD_WAIT_MS/UNLOAD_DELAY_UNIT_MS) +#define UNLOAD_MSG_MS (5*1000)/* Every 5 seconds */ +#define UNLOAD_MSG_LOOPS (UNLOAD_MSG_MS/UNLOAD_DELAY_UNIT_MS) + static void vmbus_wait_for_unload(void) { int cpu; @@ -772,12 +778,17 @@ static void vmbus_wait_for_unload(void) * vmbus_connection.unload_event. If not, the last thing we can do is * read message pages for all CPUs directly. * -* Wait no more than 10 seconds so that the panic path can't get -* hung forever in case the response message isn't seen. +* Wait up to 100 seconds since an Azure host must writeback any dirty +* data in its disk cache before the VMbus UNLOAD request will +* complete. This flushing has been empirically observed to take up +* to 50 seconds in cases with a lot of dirty data, so allow additional +* leeway and for inaccuracies in mdelay(). But eventually time out so +* that the panic path can't get hung forever in case the response +* message isn't seen. */ - for (i = 0; i < 1000; i++) { + for (i = 1; i <= UNLOAD_WAIT_LOOPS; i++) { if (completion_done(&vmbus_connection.unload_event)) - break; + goto completed; for_each_online_cpu(cpu) { struct hv_per_cpu_context *hv_cpu @@ -800,9 +811,18 @@ static void vmbus_wait_for_unload(void) vmbus_signal_eom(msg, message_type); } - mdelay(10); + /* +* Give a notice periodically so someone watching the +* serial output won't think it is completely hung. +*/ + if (!(i % UNLOAD_MSG_LOOPS)) + pr_notice("Waiting for VMBus UNLOAD to complete\n"); + + mdelay(UNLOAD_DELAY_UNIT_MS); } + pr_err("Continuing even though VMBus UNLOAD did not complete\n"); +completed: /* * We're crashing and already got the UNLOAD_RESPONSE, cleanup all * maybe-pending messages on all CPUs to be able to receive new -- 1.8.3.1
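For the record, the arithmetic behind the new macros: UNLOAD_WAIT_LOOPS = 100000 / 10 = 10000 iterations of mdelay(10), i.e. up to roughly 100 seconds of polling, and UNLOAD_MSG_LOOPS = 5000 / 10 = 500, so the "Waiting for VMBus UNLOAD to complete" notice is printed about every 5 seconds (every 500th iteration).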
Re: [PATCH bpf-next v2 4/4] libbpf: add selftests for TC-BPF API
On Mon, Apr 19, 2021 at 5:18 AM Kumar Kartikeya Dwivedi wrote: > > This adds some basic tests for the low level bpf_tc_cls_* API. > > Reviewed-by: Toke Høiland-Jørgensen > Signed-off-by: Kumar Kartikeya Dwivedi > --- > .../selftests/bpf/prog_tests/test_tc_bpf.c| 112 ++ > .../selftests/bpf/progs/test_tc_bpf_kern.c| 12 ++ > 2 files changed, 124 insertions(+) > create mode 100644 tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c > create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c > > diff --git a/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c > b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c > new file mode 100644 > index ..945f3a1a72f8 > --- /dev/null > +++ b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c > @@ -0,0 +1,112 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define LO_IFINDEX 1 > + > +static int test_tc_cls_internal(int fd, __u32 parent_id) > +{ > + DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts, .handle = 1, .priority = > 10, > + .class_id = TC_H_MAKE(1UL << 16, 1), > + .chain_index = 5); > + struct bpf_tc_cls_attach_id id = {}; > + struct bpf_tc_cls_info info = {}; > + int ret; > + > + ret = bpf_tc_cls_attach(fd, LO_IFINDEX, parent_id, &opts, &id); > + if (CHECK_FAIL(ret < 0)) > + return ret; > + > + ret = bpf_tc_cls_get_info(fd, LO_IFINDEX, parent_id, NULL, &info); > + if (CHECK_FAIL(ret < 0)) > + goto end; > + > + ret = -1; > + > + if (CHECK_FAIL(info.id.handle != id.handle) || > + CHECK_FAIL(info.id.chain_index != id.chain_index) || > + CHECK_FAIL(info.id.priority != id.priority) || > + CHECK_FAIL(info.id.handle != 1) || > + CHECK_FAIL(info.id.priority != 10) || > + CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 1)) || > + CHECK_FAIL(info.id.chain_index != 5)) > + goto end; > + > + ret = bpf_tc_cls_replace(fd, LO_IFINDEX, parent_id, &opts, &id); > + if (CHECK_FAIL(ret < 0)) > + return ret; > + > + if (CHECK_FAIL(info.id.handle != 1) || > + CHECK_FAIL(info.id.priority != 10) || > + CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 1))) > + goto end; > + > + /* Demonstrate changing attributes */ > + opts.class_id = TC_H_MAKE(1UL << 16, 2); > + > + ret = bpf_tc_cls_change(fd, LO_IFINDEX, parent_id, &opts, &info.id); > + if (CHECK_FAIL(ret < 0)) > + goto end; > + > + ret = bpf_tc_cls_get_info(fd, LO_IFINDEX, parent_id, NULL, &info); > + if (CHECK_FAIL(ret < 0)) > + goto end; > + > + if (CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 2))) > + goto end; > + if (CHECK_FAIL((info.bpf_flags & TCA_BPF_FLAG_ACT_DIRECT) != 1)) > + goto end; > + > +end: > + ret = bpf_tc_cls_detach(LO_IFINDEX, parent_id, &id); > + CHECK_FAIL(ret < 0); > + return ret; > +} > + > +void test_test_tc_bpf(void) > +{ > + const char *file = "./test_tc_bpf_kern.o"; > + struct bpf_program *clsp; > + struct bpf_object *obj; > + int cls_fd, ret; > + > + obj = bpf_object__open(file); > + if (CHECK_FAIL(IS_ERR_OR_NULL(obj))) > + return; > + > + clsp = bpf_object__find_program_by_title(obj, "classifier"); > + if (CHECK_FAIL(IS_ERR_OR_NULL(clsp))) > + goto end; > + > + ret = bpf_object__load(obj); > + if (CHECK_FAIL(ret < 0)) > + goto end; > + > + cls_fd = bpf_program__fd(clsp); > + > + system("tc qdisc del dev lo clsact"); > + > + ret = test_tc_cls_internal(cls_fd, BPF_TC_CLSACT_INGRESS); > + if (CHECK_FAIL(ret < 0)) > + goto end; > + > + if (CHECK_FAIL(system("tc qdisc del dev lo clsact"))) > + goto end; > + > + ret = test_tc_cls_internal(cls_fd, 
BPF_TC_CLSACT_EGRESS); > + if (CHECK_FAIL(ret < 0)) > + goto end; > + > + CHECK_FAIL(system("tc qdisc del dev lo clsact")); please don't use CHECK_FAIL. And prefer ASSERT_xxx over CHECK(). > + > +end: > + bpf_object__close(obj); > +} > diff --git a/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c > b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c > new file mode 100644 > index ..3dd40e21af8e > --- /dev/null > +++ b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c > @@ -0,0 +1,12 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +#include > +#include > + > +// Dummy prog to test TC-BPF API no C++-style comments, please (except for SPDX header, of course) > + > +SEC("classifier") > +int cls(struct __sk_buff *skb) > +{ > + return 0; > +} > -- > 2.30.2 >
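As an illustration of the requested style, here is one of the checks above rewritten with the ASSERT_xxx() helpers from tools/testing/selftests/bpf/test_progs.h; this is a sketch, not a tested diff:

        ret = bpf_tc_cls_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
        if (!ASSERT_OK(ret, "bpf_tc_cls_attach"))
                return ret;

        ret = bpf_tc_cls_get_info(fd, LO_IFINDEX, parent_id, NULL, &info);
        if (!ASSERT_OK(ret, "bpf_tc_cls_get_info"))
                goto end;

        ASSERT_EQ(info.id.handle, 1, "handle");
        ASSERT_EQ(info.id.priority, 10, "priority");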
Re: [PATCH] arm64: dts: ls1028a-rdb: enable optee node
On Tue, Mar 30, 2021 at 9:53 AM Sahil Malhotra wrote: > > From: Sahil Malhotra > > optee node was disabled by default, enabling it for ls1028a-rdb. > > Signed-off-by: Michael Walle > Signed-off-by: Sahil Malhotra Acked-by: Li Yang > --- > arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts | 4 > arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi| 2 +- > 2 files changed, 5 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts > b/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts > index 41ae6e7675ba..1bdf0104d492 100644 > --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts > +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts > @@ -274,6 +274,10 @@ > status = "okay"; > }; > > +&optee { > + status = "okay"; > +}; > + > &sai4 { > status = "okay"; > }; > diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi > b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi > index 262fbad8f0ec..fb155ac192b1 100644 > --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi > +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi > @@ -92,7 +92,7 @@ > }; > > firmware { > - optee { > + optee: optee { > compatible = "linaro,optee-tz"; > method = "smc"; > status = "disabled"; > -- > 2.17.1 >
[PATCH v2 2/2] usb: cdnsp: Fix lack of removing request from pending list.
From: Pawel Laszczak Patch fixes lack of removing request from ep->pending_list on failure of the stop endpoint command. Driver even after failing this command must remove request from ep->pending_list. Without this fix driver can stuck in cdnsp_gadget_ep_disable function in loop: while (!list_empty(&pep->pending_list)) { preq = next_request(&pep->pending_list); cdnsp_ep_dequeue(pep, preq); } Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver") Signed-off-by: Pawel Laszczak --- Changelog: v2: - removed blank space - added "Fixes" tag drivers/usb/cdns3/cdnsp-gadget.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/usb/cdns3/cdnsp-gadget.c b/drivers/usb/cdns3/cdnsp-gadget.c index 56707b6b0f57..c083985e387b 100644 --- a/drivers/usb/cdns3/cdnsp-gadget.c +++ b/drivers/usb/cdns3/cdnsp-gadget.c @@ -422,17 +422,17 @@ int cdnsp_ep_enqueue(struct cdnsp_ep *pep, struct cdnsp_request *preq) int cdnsp_ep_dequeue(struct cdnsp_ep *pep, struct cdnsp_request *preq) { struct cdnsp_device *pdev = pep->pdev; - int ret; + int ret_stop = 0; + int ret_rem; trace_cdnsp_request_dequeue(preq); - if (GET_EP_CTX_STATE(pep->out_ctx) == EP_STATE_RUNNING) { - ret = cdnsp_cmd_stop_ep(pdev, pep); - if (ret) - return ret; - } + if (GET_EP_CTX_STATE(pep->out_ctx) == EP_STATE_RUNNING) + ret_stop = cdnsp_cmd_stop_ep(pdev, pep); + + ret_rem = cdnsp_remove_request(pdev, preq, pep); - return cdnsp_remove_request(pdev, preq, pep); + return ret_rem ? ret_rem : ret_stop; } static void cdnsp_zero_in_ctx(struct cdnsp_device *pdev) -- 2.25.1
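In other words, with this change the request is always taken off pending_list even when cdnsp_cmd_stop_ep() fails: the removal error, if any, takes priority in the return value, and a stop-endpoint failure is still reported to the caller when the removal itself succeeds, instead of causing an early return before the removal is attempted.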
Re: linux-next: Tree for Apr 19 (bcache)
On 4/20/21 1:50 AM, Jens Axboe wrote: > On 4/19/21 10:26 AM, Coly Li wrote: >> On 4/19/21 11:40 PM, Randy Dunlap wrote: >>> On 4/19/21 3:23 AM, Stephen Rothwell wrote: Hi all, Changes since 20210416: >>> >>> on x86_64: >>> >>> when >>> # CONFIG_BLK_DEV is not set >>> >>> >>> WARNING: unmet direct dependencies detected for LIBNVDIMM >>> Depends on [n]: PHYS_ADDR_T_64BIT [=y] && HAS_IOMEM [=y] && BLK_DEV [=n] >>> Selected by [y]: >>> - BCACHE_NVM_PAGES [=y] && MD [=y] && BCACHE [=y] && PHYS_ADDR_T_64BIT >>> [=y] >>> >>> >>> Full randconfig file is attached. >>> >> >> I need hint from kbuild expert. >> >> My original idea to use "select LIBNVDIMM" is to avoid the >> BCACHE_NVM_PAGES option to disappear if LIBNVDIMM is not enabled. >> Otherwise if nvdimm driver is not configure, users won't know there is a >> BCACHE_NVM_PAGES option unless they read bcache Kconfig file. > > But why? That's exactly how it should work. Just use depends to set the > dependency. > >> But I see nvdimm's Kconfig, it uses "depends on BLK_DEV", I understand >> it is acceptable that LIBNVDIMM option to disappear from "make >> menuconfig" if BLK_DEV is not enabled. >> >> For such condition, which one is the proper way to set the dependence ? >> - Change "select LIBNVDIMM" and "select DAX" to "depends on LIBNVDIMM" >> and "depends on DAX" in bcache Kconfig >> - Or change "depends on BLK_DEV" to "select BLK_DEV" in nvdimm Kconfig. > > The former. > Copied. Thanks for the hint. I will post a fix soon. Coly Li
Re: [PATCH] riscv: Remove 32b kernel mapping from page table dump
On Sun, Apr 18, 2021 at 4:59 PM Alexandre Ghiti wrote: > > The 32b kernel mapping lies in the linear mapping, there is no point in > printing its address in page table dump, so remove this leftover that > comes from moving the kernel mapping outside the linear mapping for 64b > kernel. > > Fixes: e9efb21fe352 ("riscv: Prepare ptdump for vm layout dynamic addresses") > Signed-off-by: Alexandre Ghiti Looks good to me. Reviewed-by: Anup Patel Regards, Anup > --- > arch/riscv/mm/ptdump.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c > index 0aba4421115c..a4ed4bdbbfde 100644 > --- a/arch/riscv/mm/ptdump.c > +++ b/arch/riscv/mm/ptdump.c > @@ -76,8 +76,8 @@ enum address_markers_idx { > PAGE_OFFSET_NR, > #ifdef CONFIG_64BIT > MODULES_MAPPING_NR, > -#endif > KERNEL_MAPPING_NR, > +#endif > END_OF_SPACE_NR > }; > > @@ -99,8 +99,8 @@ static struct addr_marker address_markers[] = { > {0, "Linear mapping"}, > #ifdef CONFIG_64BIT > {0, "Modules mapping"}, > -#endif > {0, "Kernel mapping (kernel, BPF)"}, > +#endif > {-1, NULL}, > }; > > @@ -379,8 +379,8 @@ static int ptdump_init(void) > address_markers[PAGE_OFFSET_NR].start_address = PAGE_OFFSET; > #ifdef CONFIG_64BIT > address_markers[MODULES_MAPPING_NR].start_address = MODULES_VADDR; > -#endif > address_markers[KERNEL_MAPPING_NR].start_address = kernel_virt_addr; > +#endif > > kernel_ptd_info.base_addr = KERN_VIRT_START; > > -- > 2.20.1 >
Re: [PATCH] riscv: Fix 32b kernel caused by 64b kernel mapping moving outside linear mapping
On Sat, Apr 17, 2021 at 10:52 PM Alexandre Ghiti wrote: > > Fix multiple leftovers when moving the kernel mapping outside the linear > mapping for 64b kernel that left the 32b kernel unusable. > > Fixes: 4b67f48da707 ("riscv: Move kernel mapping outside of linear mapping") > Signed-off-by: Alexandre Ghiti Quite a few #ifdef but I don't see any better way at the moment. Maybe we can clean this later. Otherwise looks good to me. Reviewed-by: Anup Patel Regards, Anup > --- > arch/riscv/include/asm/page.h| 9 + > arch/riscv/include/asm/pgtable.h | 16 > arch/riscv/mm/init.c | 25 - > 3 files changed, 45 insertions(+), 5 deletions(-) > > diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h > index 22cfb2be60dc..f64b61296c0c 100644 > --- a/arch/riscv/include/asm/page.h > +++ b/arch/riscv/include/asm/page.h > @@ -90,15 +90,20 @@ typedef struct page *pgtable_t; > > #ifdef CONFIG_MMU > extern unsigned long va_pa_offset; > +#ifdef CONFIG_64BIT > extern unsigned long va_kernel_pa_offset; > +#endif > extern unsigned long pfn_base; > #define ARCH_PFN_OFFSET(pfn_base) > #else > #define va_pa_offset 0 > +#ifdef CONFIG_64BIT > #define va_kernel_pa_offset0 > +#endif > #define ARCH_PFN_OFFSET(PAGE_OFFSET >> PAGE_SHIFT) > #endif /* CONFIG_MMU */ > > +#ifdef CONFIG_64BIT > extern unsigned long kernel_virt_addr; > > #define linear_mapping_pa_to_va(x) ((void *)((unsigned long)(x) + > va_pa_offset)) > @@ -112,6 +117,10 @@ extern unsigned long kernel_virt_addr; > (_x < kernel_virt_addr) ? > \ > linear_mapping_va_to_pa(_x) : kernel_mapping_va_to_pa(_x); > \ > }) > +#else > +#define __pa_to_va_nodebug(x) ((void *)((unsigned long) (x) + va_pa_offset)) > +#define __va_to_pa_nodebug(x) ((unsigned long)(x) - va_pa_offset) > +#endif > > #ifdef CONFIG_DEBUG_VIRTUAL > extern phys_addr_t __virt_to_phys(unsigned long x); > diff --git a/arch/riscv/include/asm/pgtable.h > b/arch/riscv/include/asm/pgtable.h > index 80e63a93e903..5afda75cc2c3 100644 > --- a/arch/riscv/include/asm/pgtable.h > +++ b/arch/riscv/include/asm/pgtable.h > @@ -16,19 +16,27 @@ > #else > > #define ADDRESS_SPACE_END (UL(-1)) > -/* > - * Leave 2GB for kernel and BPF at the end of the address space > - */ > + > +#ifdef CONFIG_64BIT > +/* Leave 2GB for kernel and BPF at the end of the address space */ > #define KERNEL_LINK_ADDR (ADDRESS_SPACE_END - SZ_2G + 1) > +#else > +#define KERNEL_LINK_ADDR PAGE_OFFSET > +#endif > > #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1) > #define VMALLOC_END (PAGE_OFFSET - 1) > #define VMALLOC_START(PAGE_OFFSET - VMALLOC_SIZE) > > -/* KASLR should leave at least 128MB for BPF after the kernel */ > #define BPF_JIT_REGION_SIZE(SZ_128M) > +#ifdef CONFIG_64BIT > +/* KASLR should leave at least 128MB for BPF after the kernel */ > #define BPF_JIT_REGION_START PFN_ALIGN((unsigned long)&_end) > #define BPF_JIT_REGION_END (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE) > +#else > +#define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE) > +#define BPF_JIT_REGION_END (VMALLOC_END) > +#endif > > /* Modules always live before the kernel */ > #ifdef CONFIG_64BIT > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c > index 093f3a96ecfc..dc9b988e0778 100644 > --- a/arch/riscv/mm/init.c > +++ b/arch/riscv/mm/init.c > @@ -91,8 +91,10 @@ static void print_vm_layout(void) > (unsigned long)VMALLOC_END); > print_mlm("lowmem", (unsigned long)PAGE_OFFSET, > (unsigned long)high_memory); > +#ifdef CONFIG_64BIT > print_mlm("kernel", (unsigned long)KERNEL_LINK_ADDR, > (unsigned long)ADDRESS_SPACE_END); > +#endif > } > #else > 
static void print_vm_layout(void) { } > @@ -165,9 +167,11 @@ static struct pt_alloc_ops pt_ops; > /* Offset between linear mapping virtual address and kernel load address */ > unsigned long va_pa_offset; > EXPORT_SYMBOL(va_pa_offset); > +#ifdef CONFIG_64BIT > /* Offset between kernel mapping virtual address and kernel load address */ > unsigned long va_kernel_pa_offset; > EXPORT_SYMBOL(va_kernel_pa_offset); > +#endif > unsigned long pfn_base; > EXPORT_SYMBOL(pfn_base); > > @@ -410,7 +414,9 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa) > load_sz = (uintptr_t)(&_end) - load_pa; > > va_pa_offset = PAGE_OFFSET - load_pa; > +#ifdef CONFIG_64BIT > va_kernel_pa_offset = kernel_virt_addr - load_pa; > +#endif > > pfn_base = PFN_DOWN(load_pa); > > @@ -469,12 +475,16 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa) >pa + PMD_SIZE, PMD_SIZE, PAGE_KERNEL); > dtb_early_va = (void *)DTB_EARLY_BASE_VA + (dtb_pa & (PMD_SIZE - 1)); > #else /* CONFIG_BUILTIN_DTB */ > +#ifdef CONFIG_64BIT > /* >
Re: [PATCH v20 1/2] scsi: ufs: Enable power management for wlun
On 20/04/21 12:53 am, Asutosh Das (asd) wrote: > On 4/19/2021 11:37 AM, Adrian Hunter wrote: >> On 16/04/21 10:49 pm, Asutosh Das wrote: >>> >>> Co-developed-by: Can Guo >>> Signed-off-by: Can Guo >>> Signed-off-by: Asutosh Das >>> --- >> >> I came across 3 issues while testing. See comments below. >> > Hi Adrian > Thanks for the comments. >> >> >>> @@ -5794,7 +5839,7 @@ static void ufshcd_err_handling_unprepare(struct >>> ufs_hba *hba) >>> if (ufshcd_is_clkscaling_supported(hba)) >>> ufshcd_clk_scaling_suspend(hba, false); >>> ufshcd_clear_ua_wluns(hba); >> >> ufshcd_clear_ua_wluns() deadlocks trying to clear UFS_UPIU_RPMB_WLUN >> if sdev_rpmb is suspended and sdev_ufs_device is suspending. >> e.g. ufshcd_wl_suspend() is waiting on host_sem while ufshcd_err_handler() >> is running, at which point sdev_rpmb has already suspended. >> > Umm, I didn't understand this deadlock. > When you say, sdev_rpmb is suspended, does it mean runtime_suspended? > sdev_ufs_device is suspending - this can't be runtime_suspending, while > ufshcd_err_handling_unprepare is running. > > If you've a call-stack of this deadlock, please can you share it with me. > I'll also try to reproduce this. Yes it is system suspend. sdev_rpmb has suspended, sdev_ufs_device is waiting on host_sem. ufshcd_err_handler() holds host_sem. ufshcd_clear_ua_wlun(UFS_UPIU_RPMB_WLUN) gets stuck. I will get some call-stacks. > > I'll address the other comments in the next version. > > > Thank you! > >>> - pm_runtime_put(hba->dev); >>> + ufshcd_rpm_put(hba); >>> } >> >> >> >>> +void ufshcd_resume_complete(struct device *dev) >>> +{ >
linux-next: manual merge of the ftrace tree with the bpf-next tree
Hi all, Today's linux-next merge of the ftrace tree got a conflict in: kernel/trace/bpf_trace.c between commit: d9c9e4db186a ("bpf: Factorize bpf_trace_printk and bpf_seq_printf") from the bpf-next tree and commit: f2cc020d7876 ("tracing: Fix various typos in comments") from the ftrace tree. I fixed it up (the former removed the comment updated by the latter) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell pgpZSIISeFaSc.pgp Description: OpenPGP digital signature
Re: [PATCH 0/2] Change struct page layout for page_pool
"Matthew Wilcox (Oracle)" writes: > The first patch here fixes two bugs on ppc32, and mips32. It fixes one > bug on arc and arm32 (in certain configurations). It probably makes > sense to get it in ASAP through the networking tree. I'd like to see > testing on those four architectures if possible? Sorry I don't have easy access to any hardware that fits the bill. At some point I'll be able to go to the office and setup a machine that (I think) can test these paths, but I don't have an ETA on that. You and others seem to have done lots of analysis on this though, so I think you should merge the fixes without waiting on ppc32 testing. cheers > > The second patch enables new functionality. It is much less urgent. > I'd really like to see Mel & Michal's thoughts on it. > > I have only compile-tested these patches. > > Matthew Wilcox (Oracle) (2): > mm: Fix struct page layout on 32-bit systems > mm: Indicate pfmemalloc pages in compound_head > > include/linux/mm.h | 12 +++- > include/linux/mm_types.h | 9 - > include/net/page_pool.h | 12 +++- > net/core/page_pool.c | 12 +++- > 4 files changed, 29 insertions(+), 16 deletions(-) > > -- > 2.30.2
RE: [PATCH 1/2] usb: gadget: f_uac2: Stop endpoint before enabling it.
>On 21-04-19 09:50:53, Pawel Laszczak wrote: >> From: Pawel Laszczak >> >> Patch adds disabling endpoint before enabling it during changing >> alternate setting. Lack of this functionality causes that in some >> cases uac2 queue the same request multiple time. >> Such situation can occur when host send set interface with >> alternate setting 1 twice. >> >> Signed-off-by: Pawel Laszczak >> --- >> drivers/usb/gadget/function/f_uac2.c | 8 >> 1 file changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/usb/gadget/function/f_uac2.c >> b/drivers/usb/gadget/function/f_uac2.c >> index 9cc5c512a5cd..7d20a9d8a1b4 100644 >> --- a/drivers/usb/gadget/function/f_uac2.c >> +++ b/drivers/usb/gadget/function/f_uac2.c >> @@ -890,17 +890,17 @@ afunc_set_alt(struct usb_function *fn, unsigned intf, >> unsigned alt) >> if (intf == uac2->as_out_intf) { >> uac2->as_out_alt = alt; >> >> +u_audio_stop_capture(&uac2->g_audio); >> + >> if (alt) >> ret = u_audio_start_capture(&uac2->g_audio); >> -else >> -u_audio_stop_capture(&uac2->g_audio); >> } else if (intf == uac2->as_in_intf) { >> uac2->as_in_alt = alt; >> >> +u_audio_stop_playback(&uac2->g_audio); >> + >> if (alt) >> ret = u_audio_start_playback(&uac2->g_audio); >> -else >> -u_audio_stop_playback(&uac2->g_audio); >> } else { >> dev_err(dev, "%s:%d Error!\n", __func__, __LINE__); >> return -EINVAL; > >To avoid this, you may use prm->ep_enabled to judge if the endpoint has >already enabled. Such condition is as first instruction inside u_audio_stop_playback->free_ep function, so we don't need duplicate it here. > >-- > >Thanks, >Peter Chen -- Regards, Pawe Laszczak
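For reference, the guard being referred to has roughly this shape (a paraphrase of free_ep() in u_audio.c, not a verbatim copy), which is why calling the stop path on an endpoint that was never started is harmless:

        static void free_ep(struct uac_rtd_params *prm, struct usb_ep *ep)
        {
                if (!prm->ep_enabled)
                        return;         /* nothing to stop: the extra call is a no-op */
                prm->ep_enabled = false;

                /* ... dequeue queued requests and disable the endpoint ... */
        }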
[PATCH] mhi: add MHI_STATE_M2 to resume success criteria
During system resume, mhi driver triggers M3->M0 transition and then waits for target device to enter M0 state. Once done, the device queues a state change event into ctrl event ring and notify mhi dirver by raising an interrupt, where a tasklet is scheduled to process this event. In most cases, the taklet is served timely and wait operation succeeds. However, there are cases where CPU is busy and can not serve this tasklet for some time. Once delay goes long enough, the device moves itself to M1 state and also interrupts mhi driver after inserting a new state change event to ctrl ring. Later CPU finally has time to process the ring, however there are two events in it now: 1. for M3->M0 event, which is processed first as queued first, tasklet handler updates device state to M0 and wakes up the task, i.e., the mhi driver. 2. for M0->M1 event, which is processed later, tasklet handler triggers M1->M2 transition and updates device state to M2 directly, then wakes up the mhi driver(if still sleeping on this wait queue). Note that although mhi driver has been woken up while processing the first event, it may still has no chance to run before the second event is processed. In other words, mhi driver has to keep waiting till timeout cause the M0 state has been missed. kernel log here: ... Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4247.911251] mhi :06:00.0: Entered with PM state: M3, MHI state: M3 Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4247.917762] mhi :06:00.0: State change event to state: M0 Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4247.917767] mhi :06:00.0: State change event to state: M1 Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4338.788231] mhi :06:00.0: Did not enter M0 state, MHI state: M2, PM state: M2 ... Fix this issue by simply adding M2 as a valid state for resume. Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1 Signed-off-by: Baochen Qiang --- drivers/bus/mhi/core/pm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c index ce73cfa63cb3..ca5f2feed9d5 100644 --- a/drivers/bus/mhi/core/pm.c +++ b/drivers/bus/mhi/core/pm.c @@ -900,6 +900,7 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl) ret = wait_event_timeout(mhi_cntrl->state_event, mhi_cntrl->dev_state == MHI_STATE_M0 || +mhi_cntrl->dev_state == MHI_STATE_M2 || MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state), msecs_to_jiffies(mhi_cntrl->timeout_ms)); -- 2.25.1
Re: [PATCH v4 4/4] pinctrl: add rsel setting on MT8195
On Mon, Apr 12, 2021 at 10:57 PM Zhiyong Tao wrote: > @@ -176,6 +180,12 @@ static int mtk_pinconf_get(struct pinctrl_dev *pctldev, > else > err = -ENOTSUPP; > break; > + case MTK_PIN_CONFIG_RSEL: > + if (hw->soc->rsel_get) > + err = hw->soc->rsel_get(hw, desc, &ret); > + else > + err = -EOPNOTSUPP; I think that should want to be -ENOTSUPP to align other occurrences. > + break; > default: > err = -ENOTSUPP; > } > @@ -295,6 +305,12 @@ static int mtk_pinconf_set(struct pinctrl_dev *pctldev, > unsigned int pin, > else > err = -ENOTSUPP; > break; > + case MTK_PIN_CONFIG_RSEL: > + if (hw->soc->rsel_set) > + err = hw->soc->rsel_set(hw, desc, arg); > + else > + err = -EOPNOTSUPP; Ditto > + break; > default: > err = -ENOTSUPP; > } > -- > 2.18.0 >
linux-next: build warning after merge of the spi tree
Hi all, After merging the spi tree, today's linux-next build (x86_64 allmodconfig) produced this warning: In file included from include/linux/printk.h:409, from include/linux/kernel.h:16, from include/linux/clk.h:13, from drivers/spi/spi-stm32-qspi.c:7: drivers/spi/spi-stm32-qspi.c: In function 'stm32_qspi_dirmap_read': drivers/spi/spi-stm32-qspi.c:481:21: warning: format '%x' expects argument of type 'unsigned int', but argument 5 has type 'size_t' {aka 'long unsigned int'} [-Wformat=] 481 | dev_dbg(qspi->dev, "%s len = 0x%x offs = 0x%llx buf = 0x%p\n", __func__, len, offs, buf); | ^~ include/linux/dynamic_debug.h:129:15: note: in definition of macro '__dynamic_func_call' 129 | func(&id, ##__VA_ARGS__); \ | ^~~ include/linux/dynamic_debug.h:161:2: note: in expansion of macro '_dynamic_func_call' 161 | _dynamic_func_call(fmt,__dynamic_dev_dbg, \ | ^~ include/linux/dev_printk.h:123:2: note: in expansion of macro 'dynamic_dev_dbg' 123 | dynamic_dev_dbg(dev, dev_fmt(fmt), ##__VA_ARGS__) | ^~~ include/linux/dev_printk.h:123:23: note: in expansion of macro 'dev_fmt' 123 | dynamic_dev_dbg(dev, dev_fmt(fmt), ##__VA_ARGS__) | ^~~ drivers/spi/spi-stm32-qspi.c:481:2: note: in expansion of macro 'dev_dbg' 481 | dev_dbg(qspi->dev, "%s len = 0x%x offs = 0x%llx buf = 0x%p\n", __func__, len, offs, buf); | ^~~ drivers/spi/spi-stm32-qspi.c:481:34: note: format string is defined here 481 | dev_dbg(qspi->dev, "%s len = 0x%x offs = 0x%llx buf = 0x%p\n", __func__, len, offs, buf); | ~^ | | | unsigned int | %lx Introduced by commit 18674dee3cd6 ("spi: stm32-qspi: Add dirmap support") -- Cheers, Stephen Rothwell pgpnXiPdMIUXz.pgp Description: OpenPGP digital signature
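A likely fix (untested sketch) is to print the size_t argument with its dedicated length modifier rather than plain %x:

        dev_dbg(qspi->dev, "%s len = 0x%zx offs = 0x%llx buf = 0x%p\n",
                __func__, len, offs, buf);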
Re: [PATCH 00/13] [RFC] Rust support
Hi Nick, On Mon, Apr 19, 2021 at 05:24:33PM -0700, Nick Desaulniers wrote: > I don't think the introduction of Rust made Firefox _more_ insecure. > https://wiki.mozilla.org/Oxidation#Within_Firefox Browsers are human interfaces and do not fundamentally require low level access to memory/hardware/whatever. They can be written in about any language, only the resource usage and performance will make a difference. As such, some were even written in Java or JS for example. Operating systems, and particularly drivers *do* require low-level accesses, and stuff that can hardly be abstracted or understood by a compiler. You may have to perform two 16-bit reads/writes on a 32-bit MMIO address to perform an operation and the compiler does not have to know it, just to obey. > Really, a key point is that a lot of common mistakes in C are compile > time errors in Rust. I know no "true" kernel dev would make such > mistakes in C, Everyone makes mistakes, the level of attention varies over time and the focus often changes when dealing with build errors. How many time some of us facing a bug remembered having changed the code very late after a build error, and being less careful from this point when the goal changed from "let's do it right" to "let's get this to build" ? > but is there nothing we can do to help our peers > writing drivers? The point is to transfer cost from runtime to > compile time to avoid costs at runtime; like all of the memory safety > bugs which are costing our industry. And do we have stats on the number of logical bugs, some of which are caused by developers trying to work around compilers' stubbornness ? For me, personally speaking, they have *increased* over time, usually trying to avoid some annoying modern gcc warnings, resulting in integer casts being placed close to string formats, or returns being placed in switch/case to avoid the fall-through warning, etc. Thus I'm worried that a non-negligible part of the 70% of bugs caused by memory safety issues could be replaced with logic bugs to get to the point where the rust compiler finally accepts to compile the code. It makes me think about researchers trying to reduce the causes of certain deaths and claiming to "save lives" while in the end the people they "save" will simply die from something else. And I'm not particularly trying to blindly defend C here. I'm complaining every single day about some of its shortcomings like the vast amount of UB, stupid type promotion, counter-intuitive operators precedence when combining bit-ops with arithmetic, limited size of enums, lack of rotate operator, strict aliasing, or the recourse to asm() statements every 10 lines to do stuff that can hardly be expressed in a way understandable by a compiler. I'm just seeing that a lot of the griefs I'm having against C come from the compiler trying to be too smart or too stubborn, so giving even more of the handle to a compiler doesn't appeal me at all. In addition, we all know how painful it is to work around compiler bugs by writing complex code that carefully avoids certain constructs. I'm wondering if we'll still have that luxury with a stricter compiler, or if the only response will have to be between "let's disable this driver that does not compile" or "please force distros to upgrade their compilers". But we'll see :-/ Regards, Willy
Re: [PATCH v4 3/4] pinctrl: add drive for I2C related pins on MT8195
On Mon, Apr 12, 2021 at 10:57 PM Zhiyong Tao wrote: > > This patch provides the advanced drive raw data setting version > for I2C used pins on MT8195. > > Signed-off-by: Zhiyong Tao Acked-by: Sean Wang > --- > drivers/pinctrl/mediatek/pinctrl-mt8195.c | 22 +++ > .../pinctrl/mediatek/pinctrl-mtk-common-v2.c | 14 > .../pinctrl/mediatek/pinctrl-mtk-common-v2.h | 5 + > 3 files changed, 41 insertions(+) > > diff --git a/drivers/pinctrl/mediatek/pinctrl-mt8195.c > b/drivers/pinctrl/mediatek/pinctrl-mt8195.c > index 063f164d7c9b..a7500e18bb1d 100644 > --- a/drivers/pinctrl/mediatek/pinctrl-mt8195.c > +++ b/drivers/pinctrl/mediatek/pinctrl-mt8195.c > @@ -760,6 +760,25 @@ static const struct mtk_pin_field_calc > mt8195_pin_drv_range[] = { > PIN_FIELD_BASE(143, 143, 1, 0x020, 0x10, 24, 3), > }; > > +static const struct mtk_pin_field_calc mt8195_pin_drv_adv_range[] = { > + PIN_FIELD_BASE(8, 8, 4, 0x020, 0x10, 15, 3), > + PIN_FIELD_BASE(9, 9, 4, 0x020, 0x10, 0, 3), > + PIN_FIELD_BASE(10, 10, 4, 0x020, 0x10, 18, 3), > + PIN_FIELD_BASE(11, 11, 4, 0x020, 0x10, 3, 3), > + PIN_FIELD_BASE(12, 12, 4, 0x020, 0x10, 21, 3), > + PIN_FIELD_BASE(13, 13, 4, 0x020, 0x10, 6, 3), > + PIN_FIELD_BASE(14, 14, 4, 0x020, 0x10, 24, 3), > + PIN_FIELD_BASE(15, 15, 4, 0x020, 0x10, 9, 3), > + PIN_FIELD_BASE(16, 16, 4, 0x020, 0x10, 27, 3), > + PIN_FIELD_BASE(17, 17, 4, 0x020, 0x10, 12, 3), > + PIN_FIELD_BASE(29, 29, 2, 0x020, 0x10, 0, 3), > + PIN_FIELD_BASE(30, 30, 2, 0x020, 0x10, 3, 3), > + PIN_FIELD_BASE(34, 34, 1, 0x040, 0x10, 0, 3), > + PIN_FIELD_BASE(35, 35, 1, 0x040, 0x10, 3, 3), > + PIN_FIELD_BASE(44, 44, 1, 0x040, 0x10, 6, 3), > + PIN_FIELD_BASE(45, 45, 1, 0x040, 0x10, 9, 3), > +}; > + > static const struct mtk_pin_reg_calc mt8195_reg_cals[PINCTRL_PIN_REG_MAX] = { > [PINCTRL_PIN_REG_MODE] = MTK_RANGE(mt8195_pin_mode_range), > [PINCTRL_PIN_REG_DIR] = MTK_RANGE(mt8195_pin_dir_range), > @@ -773,6 +792,7 @@ static const struct mtk_pin_reg_calc > mt8195_reg_cals[PINCTRL_PIN_REG_MAX] = { > [PINCTRL_PIN_REG_PUPD] = MTK_RANGE(mt8195_pin_pupd_range), > [PINCTRL_PIN_REG_R0] = MTK_RANGE(mt8195_pin_r0_range), > [PINCTRL_PIN_REG_R1] = MTK_RANGE(mt8195_pin_r1_range), > + [PINCTRL_PIN_REG_DRV_ADV] = MTK_RANGE(mt8195_pin_drv_adv_range), > }; > > static const char * const mt8195_pinctrl_register_base_names[] = { > @@ -801,6 +821,8 @@ static const struct mtk_pin_soc mt8195_data = { > .bias_get_combo = mtk_pinconf_bias_get_combo, > .drive_set = mtk_pinconf_drive_set_rev1, > .drive_get = mtk_pinconf_drive_get_rev1, > + .adv_drive_get = mtk_pinconf_adv_drive_get_raw, > + .adv_drive_set = mtk_pinconf_adv_drive_set_raw, > }; > > static const struct of_device_id mt8195_pinctrl_of_match[] = { > diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c > b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c > index 72f17f26acd8..2b51f4a9b860 100644 > --- a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c > +++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c > @@ -1027,6 +1027,20 @@ int mtk_pinconf_adv_drive_get(struct mtk_pinctrl *hw, > } > EXPORT_SYMBOL_GPL(mtk_pinconf_adv_drive_get); > > +int mtk_pinconf_adv_drive_set_raw(struct mtk_pinctrl *hw, > + const struct mtk_pin_desc *desc, u32 arg) > +{ > + return mtk_hw_set_value(hw, desc, PINCTRL_PIN_REG_DRV_ADV, arg); > +} > +EXPORT_SYMBOL_GPL(mtk_pinconf_adv_drive_set_raw); > + > +int mtk_pinconf_adv_drive_get_raw(struct mtk_pinctrl *hw, > + const struct mtk_pin_desc *desc, u32 *val) > +{ > + return mtk_hw_get_value(hw, desc, PINCTRL_PIN_REG_DRV_ADV, val); > +} > 
+EXPORT_SYMBOL_GPL(mtk_pinconf_adv_drive_get_raw); > + > MODULE_LICENSE("GPL v2"); > MODULE_AUTHOR("Sean Wang "); > MODULE_DESCRIPTION("Pin configuration library module for mediatek SoCs"); > diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h > b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h > index e2aae285b5fc..fd5ce9c5dcbd 100644 > --- a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h > +++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h > @@ -66,6 +66,7 @@ enum { > PINCTRL_PIN_REG_DRV_EN, > PINCTRL_PIN_REG_DRV_E0, > PINCTRL_PIN_REG_DRV_E1, > + PINCTRL_PIN_REG_DRV_ADV, > PINCTRL_PIN_REG_MAX, > }; > > @@ -314,6 +315,10 @@ int mtk_pinconf_adv_drive_set(struct mtk_pinctrl *hw, > const struct mtk_pin_desc *desc, u32 arg); > int mtk_pinconf_adv_drive_get(struct mtk_pinctrl *hw, > const struct mtk_pin_desc *desc, u32 *val); > +int mtk_pinconf_adv_drive_set_raw(struct mtk_pinctrl *hw, > + const struct mtk_pin_desc *desc, u32 arg); > +int mtk_pinconf_adv_drive_get_raw(struct mtk_pinctrl *h
[PATCH] drivers: regulator: core.c: Fix indentation of comment
Shifted the closing */ of multiline comment to a new line This is done to maintain code uniformity Signed-off-by: Shubhankar Kuranagatti --- drivers/regulator/core.c | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c index 16114aea099a..06fbf18b6524 100644 --- a/drivers/regulator/core.c +++ b/drivers/regulator/core.c @@ -538,7 +538,8 @@ static int regulator_mode_constrain(struct regulator_dev *rdev, /* The modes are bitmasks, the most power hungry modes having * the lowest values. If the requested mode isn't supported -* try higher modes. */ +* try higher modes. +*/ while (*mode) { if (rdev->constraints->valid_modes_mask & *mode) return 0; @@ -931,7 +932,8 @@ static DEVICE_ATTR(bypass, 0444, regulator_bypass_show, NULL); /* Calculate the new optimum regulator operating mode based on the new total - * consumer load. All locks held by caller */ + * consumer load. All locks held by caller + */ static int drms_uA_update(struct regulator_dev *rdev) { struct regulator *sibling; @@ -1219,7 +1221,8 @@ static int machine_constraints_voltage(struct regulator_dev *rdev, int cmax = constraints->max_uV; /* it's safe to autoconfigure fixed-voltage supplies - and the constraints are used by list_voltage. */ +* and the constraints are used by list_voltage. +*/ if (count == 1 && !cmin) { cmin = 1; cmax = INT_MAX; @@ -2525,7 +2528,8 @@ static int _regulator_do_enable(struct regulator_dev *rdev) /* Allow the regulator to ramp; it would be useful to extend * this for bulk operations so that the regulators can ramp -* together. */ +* together. +*/ trace_regulator_enable_delay(rdev_get_name(rdev)); /* If poll_enabled_time is set, poll upto the delay calculated @@ -5337,10 +5341,12 @@ regulator_register(const struct regulator_desc *regulator_desc, ret = set_machine_constraints(rdev); if (ret == -EPROBE_DEFER) { /* Regulator might be in bypass mode and so needs its supply -* to set the constraints */ +* to set the constraints +*/ /* FIXME: this currently triggers a chicken-and-egg problem * when creating -SUPPLY symlink in sysfs to a regulator -* that is just being created */ +* that is just being created +*/ rdev_dbg(rdev, "will resolve supply early: %s\n", rdev->supply_name); ret = regulator_resolve_supply(rdev); @@ -5899,7 +5905,8 @@ static int regulator_late_cleanup(struct device *dev, void *data) if (have_full_constraints()) { /* We log since this may kill the system if it goes -* wrong. */ +* wrong. +*/ rdev_info(rdev, "disabling\n"); ret = _regulator_do_disable(rdev); if (ret != 0) -- 2.17.1
Re: [PATCH v4 2/4] pinctrl: add pinctrl driver on mt8195
On Mon, Apr 12, 2021 at 10:57 PM Zhiyong Tao wrote: > > This commit includes pinctrl driver for mt8195. > > Signed-off-by: Zhiyong Tao Acked-by: Sean Wang > --- > drivers/pinctrl/mediatek/Kconfig |6 + > drivers/pinctrl/mediatek/Makefile |1 + > drivers/pinctrl/mediatek/pinctrl-mt8195.c | 828 > drivers/pinctrl/mediatek/pinctrl-mtk-mt8195.h | 1669 + > 4 files changed, 2504 insertions(+) > create mode 100644 drivers/pinctrl/mediatek/pinctrl-mt8195.c > create mode 100644 drivers/pinctrl/mediatek/pinctrl-mtk-mt8195.h > > diff --git a/drivers/pinctrl/mediatek/Kconfig > b/drivers/pinctrl/mediatek/Kconfig > index eef17f228669..90f0c8255eaf 100644 > --- a/drivers/pinctrl/mediatek/Kconfig > +++ b/drivers/pinctrl/mediatek/Kconfig > @@ -147,6 +147,12 @@ config PINCTRL_MT8192 > default ARM64 && ARCH_MEDIATEK > select PINCTRL_MTK_PARIS > > +config PINCTRL_MT8195 > + bool "Mediatek MT8195 pin control" > + depends on OF > + depends on ARM64 || COMPILE_TEST > + select PINCTRL_MTK_PARIS > + > config PINCTRL_MT8516 > bool "Mediatek MT8516 pin control" > depends on OF > diff --git a/drivers/pinctrl/mediatek/Makefile > b/drivers/pinctrl/mediatek/Makefile > index 01218bf4dc30..06fde993ace2 100644 > --- a/drivers/pinctrl/mediatek/Makefile > +++ b/drivers/pinctrl/mediatek/Makefile > @@ -21,5 +21,6 @@ obj-$(CONFIG_PINCTRL_MT8167) += pinctrl-mt8167.o > obj-$(CONFIG_PINCTRL_MT8173) += pinctrl-mt8173.o > obj-$(CONFIG_PINCTRL_MT8183) += pinctrl-mt8183.o > obj-$(CONFIG_PINCTRL_MT8192) += pinctrl-mt8192.o > +obj-$(CONFIG_PINCTRL_MT8195)+= pinctrl-mt8195.o > obj-$(CONFIG_PINCTRL_MT8516) += pinctrl-mt8516.o > obj-$(CONFIG_PINCTRL_MT6397) += pinctrl-mt6397.o > diff --git a/drivers/pinctrl/mediatek/pinctrl-mt8195.c > b/drivers/pinctrl/mediatek/pinctrl-mt8195.c > new file mode 100644 > index ..063f164d7c9b > --- /dev/null > +++ b/drivers/pinctrl/mediatek/pinctrl-mt8195.c > @@ -0,0 +1,828 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (C) 2020 MediaTek Inc. > + * > + * Author: Zhiyong Tao > + * > + */ > + > +#include "pinctrl-mtk-mt8195.h" > +#include "pinctrl-paris.h" > + > +/* MT8195 have multiple bases to program pin configuration listed as the > below: > + * iocfg[0]:0x10005000, iocfg[1]:0x11d1, iocfg[2]:0x11d3, > + * iocfg[3]:0x11d4, iocfg[4]:0x11e2, iocfg[5]:0x11eb, > + * iocfg[6]:0x11f4. > + * _i_based could be used to indicate what base the pin should be mapped > into. 
> + */ > + > +#define PIN_FIELD_BASE(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, x_bits) > \ > + PIN_FIELD_CALC(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, x_bits, \ > + 32, 0) > + > +#define PINS_FIELD_BASE(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, > x_bits) \ > + PIN_FIELD_CALC(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, x_bits, > \ > + 32, 1) > + > +static const struct mtk_pin_field_calc mt8195_pin_mode_range[] = { > + PIN_FIELD(0, 144, 0x300, 0x10, 0, 4), > +}; > + > +static const struct mtk_pin_field_calc mt8195_pin_dir_range[] = { > + PIN_FIELD(0, 144, 0x0, 0x10, 0, 1), > +}; > + > +static const struct mtk_pin_field_calc mt8195_pin_di_range[] = { > + PIN_FIELD(0, 144, 0x200, 0x10, 0, 1), > +}; > + > +static const struct mtk_pin_field_calc mt8195_pin_do_range[] = { > + PIN_FIELD(0, 144, 0x100, 0x10, 0, 1), > +}; > + > +static const struct mtk_pin_field_calc mt8195_pin_ies_range[] = { > + PIN_FIELD_BASE(0, 0, 4, 0x040, 0x10, 0, 1), > + PIN_FIELD_BASE(1, 1, 4, 0x040, 0x10, 1, 1), > + PIN_FIELD_BASE(2, 2, 4, 0x040, 0x10, 2, 1), > + PIN_FIELD_BASE(3, 3, 4, 0x040, 0x10, 3, 1), > + PIN_FIELD_BASE(4, 4, 4, 0x040, 0x10, 4, 1), > + PIN_FIELD_BASE(5, 5, 4, 0x040, 0x10, 5, 1), > + PIN_FIELD_BASE(6, 6, 4, 0x040, 0x10, 6, 1), > + PIN_FIELD_BASE(7, 7, 4, 0x040, 0x10, 7, 1), > + PIN_FIELD_BASE(8, 8, 4, 0x040, 0x10, 13, 1), > + PIN_FIELD_BASE(9, 9, 4, 0x040, 0x10, 8, 1), > + PIN_FIELD_BASE(10, 10, 4, 0x040, 0x10, 14, 1), > + PIN_FIELD_BASE(11, 11, 4, 0x040, 0x10, 9, 1), > + PIN_FIELD_BASE(12, 12, 4, 0x040, 0x10, 15, 1), > + PIN_FIELD_BASE(13, 13, 4, 0x040, 0x10, 10, 1), > + PIN_FIELD_BASE(14, 14, 4, 0x040, 0x10, 16, 1), > + PIN_FIELD_BASE(15, 15, 4, 0x040, 0x10, 11, 1), > + PIN_FIELD_BASE(16, 16, 4, 0x040, 0x10, 17, 1), > + PIN_FIELD_BASE(17, 17, 4, 0x040, 0x10, 12, 1), > + PIN_FIELD_BASE(18, 18, 2, 0x040, 0x10, 5, 1), > + PIN_FIELD_BASE(19, 19, 2, 0x040, 0x10, 12, 1), > + PIN_FIELD_BASE(20, 20, 2, 0x040, 0x10, 11, 1), > + PIN_FIELD_BASE(21, 21, 2, 0x040, 0x10, 10, 1), > + PIN_FIELD_BASE(22, 22, 2, 0x040, 0x10, 0, 1), > + PIN_FIELD_BASE(23, 23, 2, 0x040, 0x10, 1, 1), > + PIN_FIELD_BASE(24, 24, 2, 0x040, 0x10, 2, 1), > + PIN_FIELD_B
Re: [PATCH] powerpc/pseries: Add shutdown() to vio_driver and vio_bus
Tyrel Datwyler writes: > On 4/17/21 5:30 AM, Michael Ellerman wrote: >> Tyrel Datwyler writes: >>> On 4/1/21 5:13 PM, Tyrel Datwyler wrote: Currently, neither the vio_bus or vio_driver structures provide support for a shutdown() routine. Add support for shutdown() by allowing drivers to provide a implementation via function pointer in their vio_driver struct and provide a proper implementation in the driver template for the vio_bus that calls a vio drivers shutdown() if defined. In the case that no shutdown() is defined by a vio driver and a kexec is in progress we implement a big hammer that calls remove() to ensure no further DMA for the devices is possible. Signed-off-by: Tyrel Datwyler --- >>> >>> Ping... any comments, problems with this approach? >> >> The kexec part seems like a bit of a hack. >> >> It also doesn't help for kdump, when none of the shutdown code is run. > > If I understand correctly for kdump we have a reserved memory space where the > kdump kernel is loaded, but for kexec the memory region isn't reserved ahead > of > time meaning we can try and load the kernel over potential memory used for DMA > by the current kernel. That's correct. >> How many drivers do we have? Can we just implement a proper shutdown for >> them? > > Well that is the end goal. I just don't currently have the bandwidth to do > each > driver myself with a proper shutdown sequence, and thought this was a > launching > off point to at least introduce the shutdown callback to the VIO bus. Fair enough. > Off the top of my head we have 3 storage drivers, 2 network drivers, vtpm, > vmc, > pseries_rng, nx, nx842, hvcs, hvc_vio. > > I can drop the kexec_in_progress hammer and just have each driver call > remove() > themselves in their shutdown function. Leave it to each maintainer to decide > if > remove() is enough or if there is a more lightweight quiesce sequence they > choose to implement. That's OK, you've convinced me. I'll take it as-is. Eventually it would be good for drivers to implement shutdown in the optimal way for their device, but that can be done incrementally. cheers
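For illustration, a minimal sketch of what that incremental per-driver work could look like once the bus-level hook lands. The shutdown prototype below is an assumption (Tyrel's patch itself is not quoted here) and all example_* names are hypothetical; the point is simply the suggestion above that a driver without a lighter quiesce sequence can reuse its remove-time teardown:

	/* Hypothetical vio driver; the shutdown signature is assumed, not taken from the patch. */
	static void example_quiesce(struct vio_dev *vdev)
	{
		/* Tear down DMA mappings, free the CRQ/interrupts, etc. */
	}

	static void example_shutdown(struct vio_dev *vdev)
	{
		/*
		 * No dedicated lightweight quiesce path yet: do the same
		 * teardown as remove so no further DMA can land in memory
		 * the kexec'd kernel is about to own.
		 */
		example_quiesce(vdev);
	}

	static struct vio_driver example_driver = {
		.name     = "example",
		.shutdown = example_shutdown,	/* new callback added by this series */
	};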
Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.3)
On Mon, Apr 19, 2021 at 06:09:11PM +0200, Christian Brauner wrote: > On Mon, Apr 19, 2021 at 07:25:14AM -0500, Serge Hallyn wrote: > > cap_setfcap is required to create file capabilities. > > > > Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a > > process running as uid 0 but without cap_setfcap is able to work around > > this as follows: unshare a new user namespace which maps parent uid 0 > > into the child namespace. While this task will not have new > > capabilities against the parent namespace, there is a loophole due to > > the way namespaced file capabilities are represented as xattrs. File > > capabilities valid in userns 1 are distinguished from file capabilities > > valid in userns 2 by the kuid which underlies uid 0. Therefore the > > restricted root process can unshare a new self-mapping namespace, add a > > namespaced file capability onto a file, then use that file capability in > > the parent namespace. > > > > To prevent that, do not allow mapping parent uid 0 if the process which > > opened the uid_map file does not have CAP_SETFCAP, which is the capability > > for setting file capabilities. > > > > As a further wrinkle: a task can unshare its user namespace, then > > open its uid_map file itself, and map (only) its own uid. In this > > case we do not have the credential from before unshare, which was > > potentially more restricted. So, when creating a user namespace, we > > record whether the creator had CAP_SETFCAP. Then we can use that > > during map_write(). > > > > With this patch: > > > > 1. Unprivileged user can still unshare -Ur > > > > ubuntu@caps:~$ unshare -Ur > > root@caps:~# logout > > > > 2. Root user can still unshare -Ur > > > > ubuntu@caps:~$ sudo bash > > root@caps:/home/ubuntu# unshare -Ur > > root@caps:/home/ubuntu# logout > > > > 3. Root user without CAP_SETFCAP cannot unshare -Ur: > > > > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- > > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap > > unable to set CAP_SETFCAP effective capability: Operation not permitted > > root@caps:/home/ubuntu# unshare -Ur > > unshare: write failed /proc/self/uid_map: Operation not permitted > > > > Note: an alternative solution would be to allow uid 0 mappings by > > processes without CAP_SETFCAP, but to prevent such a namespace from > > writing any file capabilities. This approach can be seen here: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 > > > > Ah, can you link to the previous fix and its revert, please? I think > that was mentioned in the formerly private thread as well but we forgot: > > commit 95ebabde382c371572297915b104e55403674e73 > Author: Eric W. Biederman > Date: Thu Dec 17 09:42:00 2020 -0600 > > capabilities: Don't allow writing ambiguous v3 file capabilities > > commit 3b0c2d3eaa83da259d7726192cf55a137769012f > Author: Eric W. Biederman > Date: Fri Mar 12 15:07:09 2021 -0600 > > Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file > capabilities") Sure. Is there a tag for that kind of thing or do I just mention it at the end of the description?
Re: [PATCH v4 5/7] cpufreq: qcom-hw: Implement CPRh aware OSM programming
On 19-04-21, 13:52, Bjorn Andersson wrote: > On Tue 19 Jan 11:45 CST 2021, AngeloGioacchino Del Regno wrote: > @Viresh, do you have any suggestion regarding my last comment? > > static int qcom_cpufreq_hw_driver_probe(struct platform_device *pdev) > > { > > + const struct qcom_cpufreq_soc_data *soc_data; > > + struct device_node *pd_node; > > + struct platform_device *pd_dev; > > struct device *cpu_dev; > > struct clk *clk; > > - int ret; > > + int clk_div, ret; > > + > > + cpu_dev = get_cpu_device(0); > > + if (!cpu_dev) > > + return -EPROBE_DEFER; > > + > > + soc_data = of_device_get_match_data(&pdev->dev); > > + if (!soc_data) > > + return -EINVAL; > > + > > + if (!soc_data->uses_tz) { > > + /* > > +* When the OSM is not pre-programmed from TZ, we will > > +* need to program the sequencer through SCM calls. > > +*/ > > + if (!qcom_scm_is_available()) > > + return -EPROBE_DEFER; > > + > > + /* > > +* If there are no power-domains, OSM programming cannot be > > +* performed, as in that case, we wouldn't know where to take > > +* the params from... > > +*/ > > + pd_node = of_parse_phandle(cpu_dev->of_node, > > + "power-domains", 0); > > + if (!pd_node) { > > + ret = PTR_ERR(pd_node); > > + dev_err(cpu_dev, "power domain not found: %d\n", ret); > > + return ret; > > + } > > + > > + /* > > +* If the power domain device is not registered yet, then > > +* defer probing this driver until that is available. > > +*/ > > + pd_dev = of_find_device_by_node(pd_node); > > + if (!pd_dev || !pd_dev->dev.driver || > > + !device_is_bound(&pd_dev->dev)) > > + return -EPROBE_DEFER; > > I wonder if there's a more appropriate way to probe defer on resources > described in the CPU nodes... Recently we made some updates to the OPP core to start returning EPROBE_DEFER on failure to acquire resources. I think you can get rid of many checks for resources here by just trying to create the OPP table and check its return value for EPROBE_DEFER. -- viresh
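A rough sketch of the simplification being suggested, as it might look in the probe path; dev_pm_opp_of_add_table() stands in here for whichever OPP helper the driver ends up calling, and the surrounding variable names are assumptions rather than the actual qcom-cpufreq-hw code:

	ret = dev_pm_opp_of_add_table(cpu_dev);
	if (ret) {
		/*
		 * The OPP core now propagates -EPROBE_DEFER itself while
		 * required resources (clocks, regulators, genpds) are still
		 * missing, so the driver no longer needs to open-code the
		 * power-domain lookups above.
		 */
		if (ret == -EPROBE_DEFER)
			dev_dbg(cpu_dev, "OPP table not ready, deferring probe\n");
		return ret;
	}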
Re: Enabling pmbus power control
On Mon, Apr 19, 2021 at 08:29:53PM -0500, Zev Weiss wrote: > On Tue, Mar 30, 2021 at 02:38:10PM CDT, Guenter Roeck wrote: > > On Tue, Mar 30, 2021 at 07:02:00PM +0100, Mark Brown wrote: > > > On Tue, Mar 30, 2021 at 12:56:56PM -0500, Zev Weiss wrote: > > > > > > > Okay, to expand a bit on the description in my initial message -- we've > > > > got a single chassis with multiple server boards and a single manager > > > > board > > > > that handles, among other things, power control for the servers. > > > > The manager board has one LM25066 for each attached server, which acts > > > > as > > > > the "power switch" for that server. There thus really isn't any driver > > > > to > > > > speak of for the downstream device. > > > > > > This sounds like you need a driver representing those server boards (or > > > the slots they plug into perhaps) that represents everything about those > > > boards to userspace, including power switching. I don't see why you > > > wouldn't have a driver for that - it's a thing that physically exists > > > and clearly has some software control, and you'd presumably also expect > > > to represent some labelling about the slot as well. > > > > Absolutely agree. > > > > Thanks, > > Guenter > > Hi Guenter, Mark, > > I wanted to return to this to try to explain why structuring the kernel > support for this in a way that's specific to the device behind the PMIC > seems like an awkward fit to me, and ultimately kind of artificial. > > In the system I described, the manager board with the LM25066 devices is its > own little self-contained BMC system running its own Linux kernel (though > "BMC" is perhaps a slightly misleading term because there's no host system > that it manages). The PMICs are really the only relevant connection it has > to the servers it controls power for -- they have their own dedicated local > BMCs on board as well doing all the usual BMC tasks. A hypothetical > dedicated driver for this on the manager board wouldn't have any other > hardware to touch aside from the pmbus interface of the LM25066 itself, so > calling it a server board driver seems pretty arbitrary -- and in fact the > same system also has another LM25066 that controls the power supply to the > chassis cooling fans (a totally different downstream device, but one that > would be equally well-served by the same software). > > More recently, another system has entered the picture for us that might > illustrate it more starkly -- the Delta Open19 power shelf [0] supported by > a recent code release from LinkedIn [1]. This is a rackmount power supply All I can see is that this code is a mess. > with fifty outputs, each independently switchable via an LM25066 attached to > an on-board BMC-style management controller (though again, no host system > involved). We (Equinix Metal) are interested in porting a modern OpenBMC to > it to replace the dated, crufty, pre-Linux-Foundation version of OpenBMC it > currently runs (as found in the linked repo). The exact nature of the > devices powered by its outputs is well outside the scope of the firmware > running on that controller (it could be any arbitrary thing that runs on > 12VDC), but we still want to be able to both (a) retrieve per-output > voltage/current/power readings as provided by the existing LM25066 hwmon > support, and (b) control the on/off state of those outputs from userspace. 
> > Given the array of possible use-cases, an approach of adding power-switch > functionality to the existing LM25066 support seems like the most obvious > way to support this, so I'm hoping to see if I can get some idea of what an > acceptable implementation of that might look like. > Sorry, that is simply a no-go for the LM25066 driver. I mentioned it before, and it is still true: The hwmon subsystem is not in the business of power control. Have you looked into enclosures (drivers/misc/enclosure.c) ? Maybe that or something similar could be used for your purposes. Thanks, Guenter
[Patch v3 7/7] crypto: qce: aead: Schedule fallback algorithm
Qualcomm crypto engine does not handle the following scenarios and will issue an abort. In such cases, pass on the transformation to a fallback algorithm. - DES3 algorithms with all three keys same. - AES192 algorithms. - 0 length messages. Signed-off-by: Thara Gopinath --- v1->v2: - Updated crypto_aead_set_reqsize to include the size of fallback request as well. drivers/crypto/qce/aead.c | 64 --- drivers/crypto/qce/aead.h | 3 ++ 2 files changed, 56 insertions(+), 11 deletions(-) diff --git a/drivers/crypto/qce/aead.c b/drivers/crypto/qce/aead.c index ef66ae21eae3..6d06a19b48e4 100644 --- a/drivers/crypto/qce/aead.c +++ b/drivers/crypto/qce/aead.c @@ -512,7 +512,23 @@ static int qce_aead_crypt(struct aead_request *req, int encrypt) /* CE does not handle 0 length messages */ if (!rctx->cryptlen) { if (!(IS_CCM(rctx->flags) && IS_DECRYPT(rctx->flags))) - return -EINVAL; + ctx->need_fallback = true; + } + + /* If fallback is needed, schedule and exit */ + if (ctx->need_fallback) { + /* Reset need_fallback in case the same ctx is used for another transaction */ + ctx->need_fallback = false; + + aead_request_set_tfm(&rctx->fallback_req, ctx->fallback); + aead_request_set_callback(&rctx->fallback_req, req->base.flags, + req->base.complete, req->base.data); + aead_request_set_crypt(&rctx->fallback_req, req->src, + req->dst, req->cryptlen, req->iv); + aead_request_set_ad(&rctx->fallback_req, req->assoclen); + + return encrypt ? crypto_aead_encrypt(&rctx->fallback_req) : +crypto_aead_decrypt(&rctx->fallback_req); } /* @@ -553,7 +569,7 @@ static int qce_aead_ccm_setkey(struct crypto_aead *tfm, const u8 *key, memcpy(ctx->ccm4309_salt, key + keylen, QCE_CCM4309_SALT_SIZE); } - if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_256) + if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_256 && keylen != AES_KEYSIZE_192) return -EINVAL; ctx->enc_keylen = keylen; @@ -562,7 +578,12 @@ static int qce_aead_ccm_setkey(struct crypto_aead *tfm, const u8 *key, memcpy(ctx->enc_key, key, keylen); memcpy(ctx->auth_key, key, keylen); - return 0; + if (keylen == AES_KEYSIZE_192) + ctx->need_fallback = true; + + return IS_CCM_RFC4309(flags) ? + crypto_aead_setkey(ctx->fallback, key, keylen + QCE_CCM4309_SALT_SIZE) : + crypto_aead_setkey(ctx->fallback, key, keylen); } static int qce_aead_setkey(struct crypto_aead *tfm, const u8 *key, unsigned int keylen) @@ -593,20 +614,21 @@ static int qce_aead_setkey(struct crypto_aead *tfm, const u8 *key, unsigned int * The crypto engine does not support any two keys * being the same for triple des algorithms. The * verify_skcipher_des3_key does not check for all the -* below conditions. Return -EINVAL in case any two keys -* are the same. Revisit to see if a fallback cipher -* is needed to handle this condition. +* below conditions. Schedule fallback in this case. 
*/ memcpy(_key, authenc_keys.enckey, DES3_EDE_KEY_SIZE); if (!((_key[0] ^ _key[2]) | (_key[1] ^ _key[3])) || !((_key[2] ^ _key[4]) | (_key[3] ^ _key[5])) || !((_key[0] ^ _key[4]) | (_key[1] ^ _key[5]))) - return -EINVAL; + ctx->need_fallback = true; } else if (IS_AES(flags)) { /* No random key sizes */ if (authenc_keys.enckeylen != AES_KEYSIZE_128 && + authenc_keys.enckeylen != AES_KEYSIZE_192 && authenc_keys.enckeylen != AES_KEYSIZE_256) return -EINVAL; + if (authenc_keys.enckeylen == AES_KEYSIZE_192) + ctx->need_fallback = true; } ctx->enc_keylen = authenc_keys.enckeylen; @@ -617,7 +639,7 @@ static int qce_aead_setkey(struct crypto_aead *tfm, const u8 *key, unsigned int memset(ctx->auth_key, 0, sizeof(ctx->auth_key)); memcpy(ctx->auth_key, authenc_keys.authkey, authenc_keys.authkeylen); - return 0; + return crypto_aead_setkey(ctx->fallback, key, keylen); } static int qce_aead_setauthsize(struct crypto_aead *tfm, unsigned int authsize) @@ -632,15 +654,33 @@ static int qce_aead_setauthsize(struct crypto_aead *tfm, unsigned int authsize) return -EINVAL; } ctx->authsize = authsize; - return 0; + + return crypto_aead_setauths
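The hunk that allocates the fallback transform is cut off above. For context, the usual shape of the init path for this pattern is sketched below; it follows the common CRYPTO_ALG_NEED_FALLBACK idiom and matches the v1->v2 note about crypto_aead_set_reqsize(), but it is not the literal code from this patch:

	static int qce_aead_init_fallback_sketch(struct crypto_aead *tfm)
	{
		struct qce_aead_ctx *ctx = crypto_aead_ctx(tfm);

		/* Ask the crypto API for another (possibly software) implementation
		 * of the same algorithm to hand the unsupported corner cases to. */
		ctx->fallback = crypto_alloc_aead(crypto_tfm_alg_name(&tfm->base),
						  0, CRYPTO_ALG_NEED_FALLBACK);
		if (IS_ERR(ctx->fallback))
			return PTR_ERR(ctx->fallback);

		/* Room for our own request context plus the fallback's request. */
		crypto_aead_set_reqsize(tfm, sizeof(struct qce_aead_reqctx) +
					     crypto_aead_reqsize(ctx->fallback));
		return 0;
	}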
[Patch v3 6/7] crypto: qce: common: Add support for AEAD algorithms
Add register programming sequence for enabling AEAD algorithms on the Qualcomm crypto engine. Signed-off-by: Thara Gopinath --- v2->v3: - Made qce_be32_to_cpu_array truly be32 to cpu endian by using be32_to_cpup instead of cpu_to_be32p. Also remove the (u32 *) typcasting of arrays obtained as output from qce_be32_to_cpu_array as per Bjorn's review comments. - Wrapped newly introduced std_iv_sha1, std_iv_sha256 and qce_be32_to_cpu_array in CONFIG_CRYPTO_DEV_QCE_AEAD to prevent W1 warnings as reported by kernel test robot . v1->v2: - Minor fixes like removing not needed initializing of variables and using bool values in lieu of 0 and 1 as pointed out by Bjorn. - Introduced qce_be32_to_cpu_array which converts the u8 string in big endian order to array of u32 and returns back total number of words, as per Bjorn's review comments. Presently this function is used only by qce_setup_regs_aead to format keys, iv and nonce. cipher and hash algorithms can be made to use this function as a separate clean up patch. drivers/crypto/qce/common.c | 162 +++- 1 file changed, 160 insertions(+), 2 deletions(-) diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c index 7b3d6caec1b2..6d6b3792323b 100644 --- a/drivers/crypto/qce/common.c +++ b/drivers/crypto/qce/common.c @@ -15,6 +15,7 @@ #include "core.h" #include "regs-v5.h" #include "sha.h" +#include "aead.h" static inline u32 qce_read(struct qce_device *qce, u32 offset) { @@ -96,7 +97,7 @@ static inline void qce_crypto_go(struct qce_device *qce, bool result_dump) qce_write(qce, REG_GOPROC, BIT(GO_SHIFT)); } -#ifdef CONFIG_CRYPTO_DEV_QCE_SHA +#if defined(CONFIG_CRYPTO_DEV_QCE_SHA) || defined(CONFIG_CRYPTO_DEV_QCE_AEAD) static u32 qce_auth_cfg(unsigned long flags, u32 key_size, u32 auth_size) { u32 cfg = 0; @@ -139,7 +140,9 @@ static u32 qce_auth_cfg(unsigned long flags, u32 key_size, u32 auth_size) return cfg; } +#endif +#ifdef CONFIG_CRYPTO_DEV_QCE_SHA static int qce_setup_regs_ahash(struct crypto_async_request *async_req) { struct ahash_request *req = ahash_request_cast(async_req); @@ -225,7 +228,7 @@ static int qce_setup_regs_ahash(struct crypto_async_request *async_req) } #endif -#ifdef CONFIG_CRYPTO_DEV_QCE_SKCIPHER +#if defined(CONFIG_CRYPTO_DEV_QCE_SKCIPHER) || defined(CONFIG_CRYPTO_DEV_QCE_AEAD) static u32 qce_encr_cfg(unsigned long flags, u32 aes_key_size) { u32 cfg = 0; @@ -271,7 +274,9 @@ static u32 qce_encr_cfg(unsigned long flags, u32 aes_key_size) return cfg; } +#endif +#ifdef CONFIG_CRYPTO_DEV_QCE_SKCIPHER static void qce_xts_swapiv(__be32 *dst, const u8 *src, unsigned int ivsize) { u8 swap[QCE_AES_IV_LENGTH]; @@ -386,6 +391,155 @@ static int qce_setup_regs_skcipher(struct crypto_async_request *async_req) } #endif +#ifdef CONFIG_CRYPTO_DEV_QCE_AEAD +static const u32 std_iv_sha1[SHA256_DIGEST_SIZE / sizeof(u32)] = { + SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4, 0, 0, 0 +}; + +static const u32 std_iv_sha256[SHA256_DIGEST_SIZE / sizeof(u32)] = { + SHA256_H0, SHA256_H1, SHA256_H2, SHA256_H3, + SHA256_H4, SHA256_H5, SHA256_H6, SHA256_H7 +}; + +static unsigned int qce_be32_to_cpu_array(u32 *dst, const u8 *src, unsigned int len) +{ + u32 *d = dst; + const u8 *s = src; + unsigned int n; + + n = len / sizeof(u32); + for (; n > 0; n--) { + *d = be32_to_cpup((const __be32 *)s); + s += sizeof(u32); + d++; + } + return DIV_ROUND_UP(len, sizeof(u32)); +} + +static int qce_setup_regs_aead(struct crypto_async_request *async_req) +{ + struct aead_request *req = aead_request_cast(async_req); + struct qce_aead_reqctx *rctx = 
aead_request_ctx(req); + struct qce_aead_ctx *ctx = crypto_tfm_ctx(async_req->tfm); + struct qce_alg_template *tmpl = to_aead_tmpl(crypto_aead_reqtfm(req)); + struct qce_device *qce = tmpl->qce; + u32 enckey[QCE_MAX_CIPHER_KEY_SIZE / sizeof(u32)] = {0}; + u32 enciv[QCE_MAX_IV_SIZE / sizeof(u32)] = {0}; + u32 authkey[QCE_SHA_HMAC_KEY_SIZE / sizeof(u32)] = {0}; + u32 authiv[SHA256_DIGEST_SIZE / sizeof(u32)] = {0}; + u32 authnonce[QCE_MAX_NONCE / sizeof(u32)] = {0}; + unsigned int enc_keylen = ctx->enc_keylen; + unsigned int auth_keylen = ctx->auth_keylen; + unsigned int enc_ivsize = rctx->ivsize; + unsigned int auth_ivsize; + unsigned int enckey_words, enciv_words; + unsigned int authkey_words, authiv_words, authnonce_words; + unsigned long flags = rctx->flags; + u32 encr_cfg, auth_cfg, config, totallen; + u32 iv_last_word; + + qce_setup_config(qce); + + /* Write encryption key */ + enckey_words = qce_be32_to_cpu_array(enckey, ctx->enc_key, e
[Patch v3 5/7] crypto: qce: common: Clean up qce_auth_cfg
Remove various redundant checks in qce_auth_cfg. Also allow qce_auth_cfg to take auth_size as a parameter which is a required setting for ccm(aes) algorithms Signed-off-by: Thara Gopinath --- drivers/crypto/qce/common.c | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c index 7b5bc5a6ae81..7b3d6caec1b2 100644 --- a/drivers/crypto/qce/common.c +++ b/drivers/crypto/qce/common.c @@ -97,11 +97,11 @@ static inline void qce_crypto_go(struct qce_device *qce, bool result_dump) } #ifdef CONFIG_CRYPTO_DEV_QCE_SHA -static u32 qce_auth_cfg(unsigned long flags, u32 key_size) +static u32 qce_auth_cfg(unsigned long flags, u32 key_size, u32 auth_size) { u32 cfg = 0; - if (IS_AES(flags) && (IS_CCM(flags) || IS_CMAC(flags))) + if (IS_CCM(flags) || IS_CMAC(flags)) cfg |= AUTH_ALG_AES << AUTH_ALG_SHIFT; else cfg |= AUTH_ALG_SHA << AUTH_ALG_SHIFT; @@ -119,15 +119,16 @@ static u32 qce_auth_cfg(unsigned long flags, u32 key_size) cfg |= AUTH_SIZE_SHA256 << AUTH_SIZE_SHIFT; else if (IS_CMAC(flags)) cfg |= AUTH_SIZE_ENUM_16_BYTES << AUTH_SIZE_SHIFT; + else if (IS_CCM(flags)) + cfg |= (auth_size - 1) << AUTH_SIZE_SHIFT; if (IS_SHA1(flags) || IS_SHA256(flags)) cfg |= AUTH_MODE_HASH << AUTH_MODE_SHIFT; - else if (IS_SHA1_HMAC(flags) || IS_SHA256_HMAC(flags) || -IS_CBC(flags) || IS_CTR(flags)) + else if (IS_SHA1_HMAC(flags) || IS_SHA256_HMAC(flags)) cfg |= AUTH_MODE_HMAC << AUTH_MODE_SHIFT; - else if (IS_AES(flags) && IS_CCM(flags)) + else if (IS_CCM(flags)) cfg |= AUTH_MODE_CCM << AUTH_MODE_SHIFT; - else if (IS_AES(flags) && IS_CMAC(flags)) + else if (IS_CMAC(flags)) cfg |= AUTH_MODE_CMAC << AUTH_MODE_SHIFT; if (IS_SHA(flags) || IS_SHA_HMAC(flags)) @@ -136,10 +137,6 @@ static u32 qce_auth_cfg(unsigned long flags, u32 key_size) if (IS_CCM(flags)) cfg |= QCE_MAX_NONCE_WORDS << AUTH_NONCE_NUM_WORDS_SHIFT; - if (IS_CBC(flags) || IS_CTR(flags) || IS_CCM(flags) || - IS_CMAC(flags)) - cfg |= BIT(AUTH_LAST_SHIFT) | BIT(AUTH_FIRST_SHIFT); - return cfg; } @@ -171,7 +168,7 @@ static int qce_setup_regs_ahash(struct crypto_async_request *async_req) qce_clear_array(qce, REG_AUTH_KEY0, 16); qce_clear_array(qce, REG_AUTH_BYTECNT0, 4); - auth_cfg = qce_auth_cfg(rctx->flags, rctx->authklen); + auth_cfg = qce_auth_cfg(rctx->flags, rctx->authklen, digestsize); } if (IS_SHA_HMAC(rctx->flags) || IS_CMAC(rctx->flags)) { @@ -199,7 +196,7 @@ static int qce_setup_regs_ahash(struct crypto_async_request *async_req) qce_write_array(qce, REG_AUTH_BYTECNT0, (u32 *)rctx->byte_count, 2); - auth_cfg = qce_auth_cfg(rctx->flags, 0); + auth_cfg = qce_auth_cfg(rctx->flags, 0, digestsize); if (rctx->last_blk) auth_cfg |= BIT(AUTH_LAST_SHIFT); -- 2.25.1
[Patch v3 2/7] crypto: qce: common: Make result dump optional
Qualcomm crypto engine allows for IV registers and status register to be concatenated to the output. This option is enabled by setting the RESULTS_DUMP field in GOPROC register. This is useful for most of the algorithms to either retrieve status of operation or in case of authentication algorithms to retrieve the mac. But for ccm algorithms, the mac is part of the output stream and not retrieved from the IV registers, thus needing a separate buffer to retrieve it. Make enabling RESULTS_DUMP field optional so that algorithms can choose whether or not to enable the option. Note that in this patch, the enabled algorithms always choose RESULTS_DUMP to be enabled. But later with the introduction of ccm algorithms, this changes. Reviewed-by: Bjorn Andersson Signed-off-by: Thara Gopinath --- drivers/crypto/qce/common.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c index dd76175d5c62..7b5bc5a6ae81 100644 --- a/drivers/crypto/qce/common.c +++ b/drivers/crypto/qce/common.c @@ -88,9 +88,12 @@ static void qce_setup_config(struct qce_device *qce) qce_write(qce, REG_CONFIG, config); } -static inline void qce_crypto_go(struct qce_device *qce) +static inline void qce_crypto_go(struct qce_device *qce, bool result_dump) { - qce_write(qce, REG_GOPROC, BIT(GO_SHIFT) | BIT(RESULTS_DUMP_SHIFT)); + if (result_dump) + qce_write(qce, REG_GOPROC, BIT(GO_SHIFT) | BIT(RESULTS_DUMP_SHIFT)); + else + qce_write(qce, REG_GOPROC, BIT(GO_SHIFT)); } #ifdef CONFIG_CRYPTO_DEV_QCE_SHA @@ -219,7 +222,7 @@ static int qce_setup_regs_ahash(struct crypto_async_request *async_req) config = qce_config_reg(qce, 1); qce_write(qce, REG_CONFIG, config); - qce_crypto_go(qce); + qce_crypto_go(qce, true); return 0; } @@ -380,7 +383,7 @@ static int qce_setup_regs_skcipher(struct crypto_async_request *async_req) config = qce_config_reg(qce, 1); qce_write(qce, REG_CONFIG, config); - qce_crypto_go(qce); + qce_crypto_go(qce, true); return 0; } -- 2.25.1
[Patch v3 4/7] crypto: qce: Add support for AEAD algorithms
Introduce support to enable following algorithms in Qualcomm Crypto Engine. - authenc(hmac(sha1),cbc(des)) - authenc(hmac(sha1),cbc(des3_ede)) - authenc(hmac(sha256),cbc(des)) - authenc(hmac(sha256),cbc(des3_ede)) - authenc(hmac(sha256),cbc(aes)) - ccm(aes) - rfc4309(ccm(aes)) Signed-off-by: Thara Gopinath --- v1->v2: - Removed spurious totallen from qce_aead_async_req_handle since it was unused as pointed out by Hubert. - Updated qce_dma_prep_sgs to use #nents returned by dma_map_sg rather than using precomputed #nents. drivers/crypto/Kconfig | 15 + drivers/crypto/qce/Makefile | 1 + drivers/crypto/qce/aead.c | 799 drivers/crypto/qce/aead.h | 53 +++ drivers/crypto/qce/common.h | 2 + drivers/crypto/qce/core.c | 4 + 6 files changed, 874 insertions(+) create mode 100644 drivers/crypto/qce/aead.c create mode 100644 drivers/crypto/qce/aead.h diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig index 9a4c275a1335..1fe5b7eafc02 100644 --- a/drivers/crypto/Kconfig +++ b/drivers/crypto/Kconfig @@ -627,6 +627,12 @@ config CRYPTO_DEV_QCE_SHA select CRYPTO_SHA1 select CRYPTO_SHA256 +config CRYPTO_DEV_QCE_AEAD + bool + depends on CRYPTO_DEV_QCE + select CRYPTO_AUTHENC + select CRYPTO_LIB_DES + choice prompt "Algorithms enabled for QCE acceleration" default CRYPTO_DEV_QCE_ENABLE_ALL @@ -647,6 +653,7 @@ choice bool "All supported algorithms" select CRYPTO_DEV_QCE_SKCIPHER select CRYPTO_DEV_QCE_SHA + select CRYPTO_DEV_QCE_AEAD help Enable all supported algorithms: - AES (CBC, CTR, ECB, XTS) @@ -672,6 +679,14 @@ choice - SHA1, HMAC-SHA1 - SHA256, HMAC-SHA256 + config CRYPTO_DEV_QCE_ENABLE_AEAD + bool "AEAD algorithms only" + select CRYPTO_DEV_QCE_AEAD + help + Enable AEAD algorithms only: + - authenc() + - ccm(aes) + - rfc4309(ccm(aes)) endchoice config CRYPTO_DEV_QCE_SW_MAX_LEN diff --git a/drivers/crypto/qce/Makefile b/drivers/crypto/qce/Makefile index 14ade8a7d664..2cf8984e1b85 100644 --- a/drivers/crypto/qce/Makefile +++ b/drivers/crypto/qce/Makefile @@ -6,3 +6,4 @@ qcrypto-objs := core.o \ qcrypto-$(CONFIG_CRYPTO_DEV_QCE_SHA) += sha.o qcrypto-$(CONFIG_CRYPTO_DEV_QCE_SKCIPHER) += skcipher.o +qcrypto-$(CONFIG_CRYPTO_DEV_QCE_AEAD) += aead.o diff --git a/drivers/crypto/qce/aead.c b/drivers/crypto/qce/aead.c new file mode 100644 index ..ef66ae21eae3 --- /dev/null +++ b/drivers/crypto/qce/aead.c @@ -0,0 +1,799 @@ +// SPDX-License-Identifier: GPL-2.0-only + +/* + * Copyright (C) 2021, Linaro Limited. All rights reserved. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "aead.h" + +#define CCM_NONCE_ADATA_SHIFT 6 +#define CCM_NONCE_AUTHSIZE_SHIFT 3 +#define MAX_CCM_ADATA_HEADER_LEN6 + +static LIST_HEAD(aead_algs); + +static void qce_aead_done(void *data) +{ + struct crypto_async_request *async_req = data; + struct aead_request *req = aead_request_cast(async_req); + struct qce_aead_reqctx *rctx = aead_request_ctx(req); + struct qce_aead_ctx *ctx = crypto_tfm_ctx(async_req->tfm); + struct qce_alg_template *tmpl = to_aead_tmpl(crypto_aead_reqtfm(req)); + struct qce_device *qce = tmpl->qce; + struct qce_result_dump *result_buf = qce->dma.result_buf; + enum dma_data_direction dir_src, dir_dst; + bool diff_dst; + int error; + u32 status; + unsigned int totallen; + unsigned char tag[SHA256_DIGEST_SIZE] = {0}; + int ret = 0; + + diff_dst = (req->src != req->dst) ? true : false; + dir_src = diff_dst ? DMA_TO_DEVICE : DMA_BIDIRECTIONAL; + dir_dst = diff_dst ? 
DMA_FROM_DEVICE : DMA_BIDIRECTIONAL; + + error = qce_dma_terminate_all(&qce->dma); + if (error) + dev_dbg(qce->dev, "aead dma termination error (%d)\n", + error); + if (diff_dst) + dma_unmap_sg(qce->dev, rctx->src_sg, rctx->src_nents, dir_src); + + dma_unmap_sg(qce->dev, rctx->dst_sg, rctx->dst_nents, dir_dst); + + if (IS_CCM(rctx->flags)) { + if (req->assoclen) { + sg_free_table(&rctx->src_tbl); + if (diff_dst) + sg_free_table(&rctx->dst_tbl); + } else { + if (!(IS_DECRYPT(rctx->flags) && !diff_dst)) + sg_free_table(&rctx->dst_tbl); + } + } else { + sg_free_table(&rctx->dst_tbl); + } + + error = qce_check_status(qce, &status); +
[Patch v3 3/7] crypto: qce: Add mode for rfc4309
rfc4309 is the specification that uses aes ccm algorithms with IPsec security packets. Add a submode to identify rfc4309 ccm(aes) algorithm in the crypto driver. Reviewed-by: Bjorn Andersson Signed-off-by: Thara Gopinath --- v1->v2: - Moved up the QCE_ENCRYPT AND QCE_DECRYPT bit positions so that addition of other algorithms in future will not affect these macros. drivers/crypto/qce/common.h | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/crypto/qce/common.h b/drivers/crypto/qce/common.h index 3bc244bcca2d..b135440bf72b 100644 --- a/drivers/crypto/qce/common.h +++ b/drivers/crypto/qce/common.h @@ -51,9 +51,11 @@ #define QCE_MODE_CCM BIT(12) #define QCE_MODE_MASK GENMASK(12, 8) +#define QCE_MODE_CCM_RFC4309 BIT(13) + /* cipher encryption/decryption operations */ -#define QCE_ENCRYPT BIT(13) -#define QCE_DECRYPT BIT(14) +#define QCE_ENCRYPT BIT(30) +#define QCE_DECRYPT BIT(31) #define IS_DES(flags) (flags & QCE_ALG_DES) #define IS_3DES(flags) (flags & QCE_ALG_3DES) @@ -73,6 +75,7 @@ #define IS_CTR(mode) (mode & QCE_MODE_CTR) #define IS_XTS(mode) (mode & QCE_MODE_XTS) #define IS_CCM(mode) (mode & QCE_MODE_CCM) +#define IS_CCM_RFC4309(mode) ((mode) & QCE_MODE_CCM_RFC4309) #define IS_ENCRYPT(dir) (dir & QCE_ENCRYPT) #define IS_DECRYPT(dir) (dir & QCE_DECRYPT) -- 2.25.1
[Patch v3 0/7] Add support for AEAD algorithms in Qualcomm Crypto Engine driver
Enable support for AEAD algorithms in the Qualcomm CE driver. The first three patches in this series are cleanups and add a few missing pieces required to add support for AEAD algorithms. Patch 4 introduces the supported AEAD transformations on Qualcomm CE. Patches 5 and 6 implement the h/w infrastructure needed to enable and run the AEAD transformations on Qualcomm CE. Patch 7 adds support to queue fallback algorithms in case of unsupported special inputs. This patch series has been tested with the in-kernel crypto testing module tcrypt.ko with fuzz tests enabled as well. Thara Gopinath (7): crypto: qce: common: Add MAC failed error checking crypto: qce: common: Make result dump optional crypto: qce: Add mode for rfc4309 crypto: qce: Add support for AEAD algorithms crypto: qce: common: Clean up qce_auth_cfg crypto: qce: common: Add support for AEAD algorithms crypto: qce: aead: Schedule fallback algorithm drivers/crypto/Kconfig | 15 + drivers/crypto/qce/Makefile | 1 + drivers/crypto/qce/aead.c | 841 drivers/crypto/qce/aead.h | 56 +++ drivers/crypto/qce/common.c | 196 - drivers/crypto/qce/common.h | 9 +- drivers/crypto/qce/core.c | 4 + 7 files changed, 1102 insertions(+), 20 deletions(-) create mode 100644 drivers/crypto/qce/aead.c create mode 100644 drivers/crypto/qce/aead.h -- 2.25.1
[Patch v3 1/7] crypto: qce: common: Add MAC failed error checking
MAC_FAILED gets set in the status register if authentication fails for ccm algorithms (during decryption). Add support to catch and flag this error. Reviewed-by: Bjorn Andersson Signed-off-by: Thara Gopinath --- v1->v2: - Split the error checking for -ENXIO and -EBADMSG into an if-else clause so that the code is more readable, as per Bjorn's review comment. drivers/crypto/qce/common.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c index dceb9579d87a..dd76175d5c62 100644 --- a/drivers/crypto/qce/common.c +++ b/drivers/crypto/qce/common.c @@ -419,6 +419,8 @@ int qce_check_status(struct qce_device *qce, u32 *status) */ if (*status & STATUS_ERRORS || !(*status & BIT(OPERATION_DONE_SHIFT))) ret = -ENXIO; + else if (*status & BIT(MAC_FAILED_SHIFT)) + ret = -EBADMSG; return ret; } -- 2.25.1
Re: [PATCH] net: ethernet: ravb: Fix release of refclk
On Mon, Apr 19, 2021 at 5:45 PM David Miller wrote: > > From: Adam Ford > Date: Sat, 17 Apr 2021 08:23:29 -0500 > > > The call to clk_disable_unprepare() can happen before priv is > > initialized. This means moving clk_disable_unprepare out of > > out_release into a new label. > > > > Fixes: 8ef7adc6beb2("net: ethernet: ravb: Enable optional refclk") > > Signed-off-by: Adam Ford > This does not apply cleanly, please rebase and resubmit. Which branch should I use as the rebase? I used net-next because that's where the bug is, but I know it changes frequently. > > Please fix the formatting of your Fixes tag while you are at it, thank you. No problem. Sorry about that. adam
RE: [RFC PATCH v5 1/4] topology: Represent clusters of CPUs within a die
> -Original Message- > From: Greg KH [mailto:gre...@linuxfoundation.org] > Sent: Friday, March 19, 2021 11:02 PM > To: Jonathan Cameron > Cc: Song Bao Hua (Barry Song) ; > tim.c.c...@linux.intel.com; catalin.mari...@arm.com; w...@kernel.org; > r...@rjwysocki.net; vincent.guit...@linaro.org; b...@alien8.de; > t...@linutronix.de; mi...@redhat.com; l...@kernel.org; pet...@infradead.org; > dietmar.eggem...@arm.com; rost...@goodmis.org; bseg...@google.com; > mgor...@suse.de; msys.miz...@gmail.com; valentin.schnei...@arm.com; > juri.le...@redhat.com; mark.rutl...@arm.com; sudeep.ho...@arm.com; > aubrey...@linux.intel.com; linux-arm-ker...@lists.infradead.org; > linux-kernel@vger.kernel.org; linux-a...@vger.kernel.org; x...@kernel.org; > xuwei (O) ; Zengtao (B) ; > guodong...@linaro.org; yangyicong ; Liguozhu (Kenneth) > ; linux...@openeuler.org; h...@zytor.com > Subject: Re: [RFC PATCH v5 1/4] topology: Represent clusters of CPUs within > a die > > On Fri, Mar 19, 2021 at 09:36:16AM +, Jonathan Cameron wrote: > > On Fri, 19 Mar 2021 06:57:08 + > > "Song Bao Hua (Barry Song)" wrote: > > > > > > -Original Message- > > > > From: Greg KH [mailto:gre...@linuxfoundation.org] > > > > Sent: Friday, March 19, 2021 7:35 PM > > > > To: Song Bao Hua (Barry Song) > > > > Cc: tim.c.c...@linux.intel.com; catalin.mari...@arm.com; > w...@kernel.org; > > > > r...@rjwysocki.net; vincent.guit...@linaro.org; b...@alien8.de; > > > > t...@linutronix.de; mi...@redhat.com; l...@kernel.org; > pet...@infradead.org; > > > > dietmar.eggem...@arm.com; rost...@goodmis.org; bseg...@google.com; > > > > mgor...@suse.de; msys.miz...@gmail.com; valentin.schnei...@arm.com; > Jonathan > > > > Cameron ; juri.le...@redhat.com; > > > > mark.rutl...@arm.com; sudeep.ho...@arm.com; aubrey...@linux.intel.com; > > > > linux-arm-ker...@lists.infradead.org; linux-kernel@vger.kernel.org; > > > > linux-a...@vger.kernel.org; x...@kernel.org; xuwei (O) > ; > > > > Zengtao (B) ; guodong...@linaro.org; > yangyicong > > > > ; Liguozhu (Kenneth) ; > > > > linux...@openeuler.org; h...@zytor.com > > > > Subject: Re: [RFC PATCH v5 1/4] topology: Represent clusters of CPUs > > > > within > > > > a die > > > > > > > > On Fri, Mar 19, 2021 at 05:16:15PM +1300, Barry Song wrote: > > > > > diff --git a/Documentation/admin-guide/cputopology.rst > > > > b/Documentation/admin-guide/cputopology.rst > > > > > index b90dafc..f9d3745 100644 > > > > > --- a/Documentation/admin-guide/cputopology.rst > > > > > +++ b/Documentation/admin-guide/cputopology.rst > > > > > @@ -24,6 +24,12 @@ core_id: > > > > > identifier (rather than the kernel's). The actual value is > > > > > architecture and platform dependent. > > > > > > > > > > +cluster_id: > > > > > + > > > > > + the Cluster ID of cpuX. Typically it is the hardware platform's > > > > > + identifier (rather than the kernel's). The actual value is > > > > > + architecture and platform dependent. > > > > > + > > > > > book_id: > > > > > > > > > > the book ID of cpuX. Typically it is the hardware platform's > > > > > @@ -56,6 +62,14 @@ package_cpus_list: > > > > > human-readable list of CPUs sharing the same > > > > > physical_package_id. > > > > > (deprecated name: "core_siblings_list") > > > > > > > > > > +cluster_cpus: > > > > > + > > > > > + internal kernel map of CPUs within the same cluster. > > > > > + > > > > > +cluster_cpus_list: > > > > > + > > > > > + human-readable list of CPUs within the same cluster. > > > > > + > > > > > die_cpus: > > > > > > > > > > internal kernel map of CPUs within the same die. 
> > > > > > > > Why are these sysfs files in this file, and not in a Documentation/ABI/ > > > > file which can be correctly parsed and shown to userspace? > > > > > > Well. Those ABIs have been there for much a long time. It is like: > > > > > > [root@ceph1 topology]# ls > > > core_id core_siblings core_siblings_list physical_package_id > thread_siblings thread_siblings_list > > > [root@ceph1 topology]# pwd > > > /sys/devices/system/cpu/cpu100/topology > > > [root@ceph1 topology]# cat core_siblings_list > > > 64-127 > > > [root@ceph1 topology]# > > > > > > > > > > > Any chance you can fix that up here as well? > > > > > > Yes. we will send a separate patch to address this, which won't > > > be in this patchset. This patchset will base on that one. > > > > > > > > > > > Also note that "list" is not something that goes in sysfs, sysfs is "one > > > > value per file", and a list is not "one value". How do you prevent > > > > overflowing the buffer of the sysfs file if you have a "list"? > > > > > > > > > > At a glance, the list is using "-" rather than a real list > > > [root@ceph1 topology]# cat core_siblings_list > > > 64-127 > > > > > > Anyway, I will take a look if it has any chance to overflow. > > > > It could in theory be alternate CPUs as comma separated list. > > So it's would get interesting around 500-1000 cpus (guessing).
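For reference, the existing topology mask attributes are already bounded to a single page, which is why even a fully enumerated comma-separated list truncates rather than overflows. A show() routine for cluster_cpus_list would typically look like the sketch below; the helper and member names here are assumptions for illustration, not code taken from the patch:

	/* Hypothetical show() for cluster_cpus_list, illustrating the PAGE_SIZE bound. */
	static ssize_t cluster_cpus_list_show(struct device *dev,
					      struct device_attribute *attr, char *buf)
	{
		const struct cpumask *mask = topology_cluster_cpumask(dev->id);

		/* cpumap_print_to_pagebuf() never writes past one page, so a
		 * large "0,2,4,..." style list is cut off, not overflowed. */
		return cpumap_print_to_pagebuf(true, buf, mask);
	}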
Re: [PATCH bpf-next 1/2] bpf: Remove bpf_jit_enable=2 debugging mode
On Sat, Apr 17, 2021 at 1:16 AM Christophe Leroy wrote: > > > > Le 16/04/2021 à 01:49, Alexei Starovoitov a écrit : > > On Thu, Apr 15, 2021 at 8:41 AM Quentin Monnet > > wrote: > >> > >> 2021-04-15 16:37 UTC+0200 ~ Daniel Borkmann > >>> On 4/15/21 11:32 AM, Jianlin Lv wrote: > For debugging JITs, dumping the JITed image to kernel log is discouraged, > "bpftool prog dump jited" is much better way to examine JITed dumps. > This patch get rid of the code related to bpf_jit_enable=2 mode and > update the proc handler of bpf_jit_enable, also added auxiliary > information to explain how to use bpf_jit_disasm tool after this change. > > Signed-off-by: Jianlin Lv > >> > >> Hello, > >> > >> For what it's worth, I have already seen people dump the JIT image in > >> kernel logs in Qemu VMs running with just a busybox, not for kernel > >> development, but in a context where buiding/using bpftool was not > >> possible. > > > > If building/using bpftool is not possible then majority of selftests won't > > be exercised. I don't think such environment is suitable for any kind > > of bpf development. Much so for JIT debugging. > > While bpf_jit_enable=2 is nothing but the debugging tool for JIT developers. > > I'd rather nuke that code instead of carrying it from kernel to kernel. > > > > When I implemented JIT for PPC32, it was extremely helpfull. > > As far as I understand, for the time being bpftool is not usable in my > environment because it > doesn't support cross compilation when the target's endianess differs from > the building host > endianess, see discussion at > https://lore.kernel.org/bpf/21e66a09-514f-f426-b9e2-13baab0b9...@csgroup.eu/ > > That's right that selftests can't be exercised because they don't build. > > The question might be candid as I didn't investigate much about the > replacement of "bpf_jit_enable=2 > debugging mode" by bpftool, how do we use bpftool exactly for that ? > Especially when using the BPF > test module ? the kernel developers can add any amount of printk and dumps to debug their code, but such debugging aid should not be part of the production kernel. That sysctl was two things at once: debugging tool for kernel devs and introspection for users. bpftool jit dump solves the 2nd part. It provides JIT introspection to users. Debugging of the kernel can be done with any amount of auxiliary code including calling print_hex_dump() during jiting.
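To make the "auxiliary code" point concrete: a JIT author bringing up a new backend can still dump the generated image to the kernel log with a throwaway hunk like the one below, which is roughly what bpf_jit_enable=2 used to do. jit_image and jit_image_len are placeholder names for whatever the arch's JIT calls its buffer and length:

	/* Temporary debugging aid inside the arch JIT, not meant for merging. */
	print_hex_dump(KERN_DEBUG, "BPF JIT: ", DUMP_PREFIX_OFFSET,
		       16, 1, jit_image, jit_image_len, false);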
Re: [PATCH] bonding: 3ad: update slave arr after initialize
在 2021/4/16 12:28, Jay Vosburgh 写道: jinyiting wrote: From: jin yiting The bond works in mode 4, and performs down/up operations on the bond that is normally negotiated. The probability of bond-> slave_arr is NULL Test commands: ifconfig bond1 down ifconfig bond1 up The conflict occurs in the following process: __dev_open (CPU A) --bond_open --queue_delayed_work(bond->wq,&bond->ad_work,0); --bond_update_slave_arr --bond_3ad_get_active_agg_info ad_work(CPU B) --bond_3ad_state_machine_handler --ad_agg_selection_logic ad_work runs on cpu B. In the function ad_agg_selection_logic, all agg->is_active will be cleared. Before the new active aggregator is selected on CPU B, bond_3ad_get_active_agg_info failed on CPU A, bond->slave_arr will be set to NULL. The best aggregator in ad_agg_selection_logic has not changed, no need to update slave arr. Signed-off-by: jin yiting --- drivers/net/bonding/bond_3ad.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c index 6908822..d100079 100644 --- a/drivers/net/bonding/bond_3ad.c +++ b/drivers/net/bonding/bond_3ad.c @@ -2327,6 +2327,12 @@ void bond_3ad_state_machine_handler(struct work_struct *work) aggregator = __get_first_agg(port); ad_agg_selection_logic(aggregator, &update_slave_arr); + if (!update_slave_arr) { + struct aggregator *active = __get_active_agg(aggregator); + + if (active && active->is_active) + update_slave_arr = true; + } } bond_3ad_set_carrier(bond); } The described issue is a race condition (in that ad_agg_selection_logic clears agg->is_active under mode_lock, but bond_open -> bond_update_slave_arr is inspecting agg->is_active outside the lock). I don't see how the above change will reliably manage this; the real issue looks to be that bond_update_slave_arr is committing changes to the array (via bond_reset_slave_arr) based on a racy inspection of the active aggregator state while it is in flux. Also, the description of the issue says "The best aggregator in ad_agg_selection_logic has not changed, no need to update slave arr," but the change above does the opposite, and will set update_slave_arr when the aggregator has not changed (update_slave_arr remains false at return of ad_agg_selection_logic). I believe I understand the described problem, but I don't see how the patch fixes it. I suspect (but haven't tested) that the proper fix is to acquire mode_lock in bond_update_slave_arr while calling bond_3ad_get_active_agg_info to avoid conflict with the state machine. -J --- -Jay Vosburgh, jay.vosbu...@canonical.com . Thank you for your reply. The last patch does have redundant update slave arr.Thank you for your correction. As you said, holding mode_lock in bond_update_slave_arr while calling bond_3ad_get_active_agg_info can avoid conflictwith the state machine. I have tested this patch, with ifdown/ifup operations for bond or slaves. But bond_update_slave_arr is expected to hold RTNL only and NO other lock. And it have WARN_ON(lockdep_is_held(&bond->mode_lock)); in bond_update_slave_arr. I'm not sure that holding mode_lock in bond_update_slave_arr while calling bond_3ad_get_active_agg_info is a correct action. 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 74cbbb2..db988e5 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4406,7 +4406,9 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave) if (BOND_MODE(bond) == BOND_MODE_8023AD) { struct ad_info ad_info; + spin_lock_bh(&bond->mode_lock); if (bond_3ad_get_active_agg_info(bond, &ad_info)) { + spin_unlock_bh(&bond->mode_lock); pr_debug("bond_3ad_get_active_agg_info failed\n"); /* No active aggragator means it's not safe to use * the previous array. @@ -4414,6 +4416,7 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave) bond_reset_slave_arr(bond); goto out; } + spin_unlock_bh(&bond->mode_lock); agg_id = ad_info.aggregator_id; } bond_for_each_slave(bond, slave, iter) {
[PATCH v7 9/9] Documentation: arm64: Document PMU counters access from userspace
From: Raphael Gault Add documentation to describe the access to the pmu hardware counters from userspace. Signed-off-by: Raphael Gault Signed-off-by: Rob Herring --- v7: - Merge into existing arm64 perf.rst v6: - Update the chained event section with attr.config1 details v2: - Update links to test examples Changes from Raphael's v4: - Convert to rSt - Update chained event status - Add section for heterogeneous systems --- Documentation/arm64/perf.rst | 67 +++- 1 file changed, 66 insertions(+), 1 deletion(-) diff --git a/Documentation/arm64/perf.rst b/Documentation/arm64/perf.rst index b567f177d385..c40fc2adc451 100644 --- a/Documentation/arm64/perf.rst +++ b/Documentation/arm64/perf.rst @@ -2,7 +2,10 @@ .. _perf_index: -= + +Perf + + Perf Event Attributes = @@ -88,3 +91,65 @@ exclude_host. However when using !exclude_hv there is a small blackout window at the guest entry/exit where host events are not captured. On VHE systems there are no blackout windows. + +Perf Userspace PMU Hardware Counter Access +== + +Overview + +The perf userspace tool relies on the PMU to monitor events. It offers an +abstraction layer over the hardware counters since the underlying +implementation is cpu-dependent. +Arm64 allows userspace tools to have access to the registers storing the +hardware counters' values directly. + +This targets specifically self-monitoring tasks in order to reduce the overhead +by directly accessing the registers without having to go through the kernel. + +How-to +-- +The focus is set on the armv8 PMUv3 which makes sure that the access to the pmu +registers is enabled and that the userspace has access to the relevant +information in order to use them. + +In order to have access to the hardware counter it is necessary to open the event +using the perf tool interface: the sys_perf_event_open syscall returns a fd which +can subsequently be used with the mmap syscall in order to retrieve a page of +memory containing information about the event. +The PMU driver uses this page to expose to the user the hardware counter's +index and other necessary data. Using this index enables the user to access the +PMU registers using the `mrs` instruction. + +The userspace access is supported in libperf using the perf_evsel__mmap() +and perf_evsel__read() functions. See `tools/lib/perf/tests/test-evsel.c`_ for +an example. + +About heterogeneous systems +--- +On heterogeneous systems such as big.LITTLE, userspace PMU counter access can +only be enabled when the tasks are pinned to a homogeneous subset of cores and +the corresponding PMU instance is opened by specifying the 'type' attribute. +The use of generic event types is not supported in this case. + +Have a look at `tools/perf/arch/arm64/tests/user-events.c`_ for an example. It +can be run using the perf tool to check that the access to the registers works +correctly from userspace: + +.. code-block:: sh + + perf test -v user + +About chained events and 64-bit counters + +Chained events are not supported in conjunction with userspace counter +access. If a 64-bit counter is requested (attr.config1:0), then +userspace access must also be requested with attr.config1:1 set. This +will disable counter chaining. The 'pmc_width' in the user page will +indicate the actual width of the counter which could be only 32-bits +depending on event and PMU features. + +.. Links +.. _tools/perf/arch/arm64/tests/user-events.c: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c +.. 
_tools/lib/perf/tests/test-evsel.c: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c -- 2.27.0
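For readers who want the raw flow behind the how-to above, here is a minimal, hedged sketch of a self-monitoring read loop against the mmap'd perf_event_mmap_page. It assumes the index convention used in this series (hardware counter n is published as index n + 1, the cycle counter as index 32) and hard-codes the two mrs instructions shown, since the register number is encoded in the instruction; libperf generalizes this with one accessor per counter.

#include <stdint.h>
#include <linux/perf_event.h>

/* seqlock-style read of a self-monitored event via the mmap'd user page */
static uint64_t read_self_counter(volatile struct perf_event_mmap_page *pc)
{
	uint32_t seq, idx, width;
	uint64_t count, pmc;

	do {
		seq = pc->lock;
		__asm__ volatile("" ::: "memory");

		idx = pc->index;		/* 0: no direct access granted */
		count = pc->offset;		/* kernel-maintained base value */
		width = pc->pmc_width;
		if (pc->cap_user_rdpmc && idx) {
			if (idx == 32)		/* remapped cycle counter index */
				__asm__ volatile("mrs %0, pmccntr_el0" : "=r" (pmc));
			else			/* index 1 -> PMEVCNTR0_EL0; real code
						 * needs one mrs per possible index */
				__asm__ volatile("mrs %0, pmevcntr0_el0" : "=r" (pmc));

			pmc <<= 64 - width;	/* truncate to pmc_width bits ... */
			pmc >>= 64 - width;	/* ... before adding the offset */
			count += pmc;
		}

		__asm__ volatile("" ::: "memory");
	} while (pc->lock != seq);

	return count;
}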
[PATCH v7 7/9] perf: arm64: Add test for userspace counter access on heterogeneous systems
Userspace counter access only works on heterogeneous systems with some restrictions. The userspace process must be pinned to a homogeneous subset of CPUs and must open the corresponding PMU for those CPUs. This commit adds a test implementing these requirements. Signed-off-by: Rob Herring --- v6: - Add a check on cap_user_rdpmc v5: - Adapt to libperf mmap API changes v4: - Update perf_evsel__mmap params v2: - Drop all but heterogeneous test as others covered by libperf tests - Rework to use libperf --- tools/perf/arch/arm64/include/arch-tests.h | 7 + tools/perf/arch/arm64/tests/Build | 1 + tools/perf/arch/arm64/tests/arch-tests.c | 4 + tools/perf/arch/arm64/tests/user-events.c | 177 + 4 files changed, 189 insertions(+) create mode 100644 tools/perf/arch/arm64/tests/user-events.c diff --git a/tools/perf/arch/arm64/include/arch-tests.h b/tools/perf/arch/arm64/include/arch-tests.h index 90ec4c8cb880..380ad34a3f09 100644 --- a/tools/perf/arch/arm64/include/arch-tests.h +++ b/tools/perf/arch/arm64/include/arch-tests.h @@ -2,11 +2,18 @@ #ifndef ARCH_TESTS_H #define ARCH_TESTS_H +#include + #ifdef HAVE_DWARF_UNWIND_SUPPORT struct thread; struct perf_sample; +int test__arch_unwind_sample(struct perf_sample *sample, +struct thread *thread); #endif extern struct test arch_tests[]; +int test__rd_pinned(struct test __maybe_unused *test, + int __maybe_unused subtest); + #endif diff --git a/tools/perf/arch/arm64/tests/Build b/tools/perf/arch/arm64/tests/Build index a61c06bdb757..3f9a20c17fc6 100644 --- a/tools/perf/arch/arm64/tests/Build +++ b/tools/perf/arch/arm64/tests/Build @@ -1,4 +1,5 @@ perf-y += regs_load.o perf-$(CONFIG_DWARF_UNWIND) += dwarf-unwind.o +perf-y += user-events.o perf-y += arch-tests.o diff --git a/tools/perf/arch/arm64/tests/arch-tests.c b/tools/perf/arch/arm64/tests/arch-tests.c index 5b1543c98022..80ce7bd3c16d 100644 --- a/tools/perf/arch/arm64/tests/arch-tests.c +++ b/tools/perf/arch/arm64/tests/arch-tests.c @@ -10,6 +10,10 @@ struct test arch_tests[] = { .func = test__dwarf_unwind, }, #endif + { + .desc = "Pinned CPU user counter access", + .func = test__rd_pinned, + }, { .func = NULL, }, diff --git a/tools/perf/arch/arm64/tests/user-events.c b/tools/perf/arch/arm64/tests/user-events.c new file mode 100644 index ..c8efc6b369e6 --- /dev/null +++ b/tools/perf/arch/arm64/tests/user-events.c @@ -0,0 +1,177 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include + +#include +#include +#include + +#include "pmu.h" +#include "debug.h" +#include "tests/tests.h" +#include "arch-tests.h" + +static int run_test(struct perf_evsel *evsel) +{ + int n; + volatile int tmp = 0; + u64 delta, i, loops = 1000; + struct perf_counts_values counts = { .val = 0 }; + + for (n = 0; n < 6; n++) { + u64 stamp, now; + + perf_evsel__read(evsel, 0, 0, &counts); + stamp = counts.val; + + for (i = 0; i < loops; i++) + tmp++; + + perf_evsel__read(evsel, 0, 0, &counts); + now = counts.val; + loops *= 10; + + delta = now - stamp; + pr_debug("%14d: %14llu\n", n, (long long)delta); + + if (!delta) + break; + } + return delta ? 
0 : -1; +} + +static struct perf_pmu *pmu_for_cpu(int cpu) +{ + int acpu, idx; + struct perf_pmu *pmu = NULL; + + while ((pmu = perf_pmu__scan(pmu)) != NULL) { + if (pmu->is_uncore) + continue; + perf_cpu_map__for_each_cpu(acpu, idx, pmu->cpus) + if (acpu == cpu) + return pmu; + } + return NULL; +} + +static bool pmu_is_homogeneous(void) +{ + int core_cnt = 0; + struct perf_pmu *pmu = NULL; + + while ((pmu = perf_pmu__scan(pmu)) != NULL) { + if (!pmu->is_uncore && !perf_cpu_map__empty(pmu->cpus)) + core_cnt++; + } + return core_cnt == 1; +} + +static int libperf_print(enum libperf_print_level level, +const char *fmt, va_list ap) +{ + (void)level; + return vfprintf(stderr, fmt, ap); +} + +static struct perf_evsel *perf_init(struct perf_event_attr *attr) +{ + int err; + struct perf_thread_map *threads; + struct perf_evsel *evsel; + struct perf_event_mmap_page *pc; + + libperf_init(libperf_print); + + threads = perf_thread_map__new_dummy(); + if (!threads) { + pr_err("failed to create threads\n"); + return NULL; + } + + perf_thread_map__set_pid(threads, 0, 0); + + evsel = perf_evsel__new(attr); + if (!evsel) { +
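As a companion to the test above, here is a hedged sketch of what the commit message asks of userspace on big.LITTLE parts: pin the task to one CPU, then open the event against that CPU's own PMU type rather than PERF_TYPE_HARDWARE. The sysfs PMU name "armv8_pmuv3_0" and the raw event number 0x11 (CPU_CYCLES) are examples, not fixed interfaces.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Pin to one CPU, then open a raw event on that CPU's own PMU. */
static int open_pinned_event(int cpu)
{
	cpu_set_t mask;
	struct perf_event_attr attr;
	unsigned int type;
	FILE *f;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	if (sched_setaffinity(0, sizeof(mask), &mask))
		return -1;

	/* PMU name is an assumption; read the matching event source's type */
	f = fopen("/sys/bus/event_source/devices/armv8_pmuv3_0/type", "r");
	if (!f)
		return -1;
	if (fscanf(f, "%u", &type) != 1) {
		fclose(f);
		return -1;
	}
	fclose(f);

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = type;		/* this PMU, not PERF_TYPE_HARDWARE */
	attr.config = 0x11;		/* example: ARMv8 CPU_CYCLES common event */
	attr.exclude_kernel = 1;

	return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}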
[PATCH v7 8/9] perf: arm64: Add tests for 32-bit and 64-bit counter size userspace access
Add arm64 specific tests for 32-bit and 64-bit counter user access. On arm64, counters default to 32-bit unless attr.config1:0 is set to 1. In order to enable user access for 64-bit counters, attr.config1 must be set to 3. Signed-off-by: Rob Herring --- v6: - New patch --- tools/perf/arch/arm64/include/arch-tests.h | 5 +++ tools/perf/arch/arm64/tests/arch-tests.c | 8 + tools/perf/arch/arm64/tests/user-events.c | 38 ++ 3 files changed, 51 insertions(+) diff --git a/tools/perf/arch/arm64/include/arch-tests.h b/tools/perf/arch/arm64/include/arch-tests.h index 380ad34a3f09..ddfa7460e1e1 100644 --- a/tools/perf/arch/arm64/include/arch-tests.h +++ b/tools/perf/arch/arm64/include/arch-tests.h @@ -15,5 +15,10 @@ extern struct test arch_tests[]; int test__rd_pinned(struct test __maybe_unused *test, int __maybe_unused subtest); +int test__rd_64bit(struct test __maybe_unused *test, + int __maybe_unused subtest); + +int test__rd_32bit(struct test __maybe_unused *test, + int __maybe_unused subtest); #endif diff --git a/tools/perf/arch/arm64/tests/arch-tests.c b/tools/perf/arch/arm64/tests/arch-tests.c index 80ce7bd3c16d..bbdb81aa3229 100644 --- a/tools/perf/arch/arm64/tests/arch-tests.c +++ b/tools/perf/arch/arm64/tests/arch-tests.c @@ -14,6 +14,14 @@ struct test arch_tests[] = { .desc = "Pinned CPU user counter access", .func = test__rd_pinned, }, + { + .desc = "User 64-bit counter access", + .func = test__rd_64bit, + }, + { + .desc = "User 32-bit counter access", + .func = test__rd_32bit, + }, { .func = NULL, }, diff --git a/tools/perf/arch/arm64/tests/user-events.c b/tools/perf/arch/arm64/tests/user-events.c index c8efc6b369e6..546323b5242c 100644 --- a/tools/perf/arch/arm64/tests/user-events.c +++ b/tools/perf/arch/arm64/tests/user-events.c @@ -175,3 +175,41 @@ int test__rd_pinned(struct test __maybe_unused *test, perf_evsel__delete(evsel); return ret; } + +static int test__rd_counter_size(struct test __maybe_unused *test, +int config1) +{ + int ret; + struct perf_evsel *evsel; + struct perf_event_attr attr = { + .type = PERF_TYPE_HARDWARE, + .config = PERF_COUNT_HW_INSTRUCTIONS, + .config1 = config1, + .exclude_kernel = 1, + }; + + if (!pmu_is_homogeneous()) + return TEST_SKIP; + + evsel = perf_init(&attr); + if (!evsel) + return -1; + + ret = run_test(evsel); + + perf_evsel__close(evsel); + perf_evsel__delete(evsel); + return ret; +} + +int test__rd_64bit(struct test __maybe_unused *test, + int __maybe_unused subtest) +{ + return test__rd_counter_size(test, 0x3); +} + +int test__rd_32bit(struct test __maybe_unused *test, + int __maybe_unused subtest) +{ + return test__rd_counter_size(test, 0x2); +} -- 2.27.0
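For reference, these are the attr.config1 encodings the two tests exercise, per the semantics described in this series (bit 0 requests a 64-bit "long" counter, bit 1 requests userspace access); the helper below is sketched only to spell the combinations out.

#include <string.h>
#include <linux/perf_event.h>

/* config1 = 0x0 or 0x2: 32-bit counter, userspace access enabled
 * config1 = 0x1:        64-bit counter (chained if needed), no userspace access
 * config1 = 0x3:        userspace access with the widest counter the h/w offers
 */
static void fill_attr(struct perf_event_attr *attr, unsigned long config1)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_INSTRUCTIONS;
	attr->config1 = config1;
	attr->exclude_kernel = 1;
}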
[PATCH v7 6/9] libperf: Add arm64 support to perf_mmap__read_self()
Add the arm64 variants for read_perf_counter() and read_timestamp(). Unfortunately the counter number is encoded into the instruction, so the code is a bit verbose to enumerate all possible counters. Signed-off-by: Rob Herring --- v7: - Move enabling of libperf user read test for arm64 to this patch --- tools/lib/perf/mmap.c | 98 +++ tools/lib/perf/tests/test-evsel.c | 2 +- 2 files changed, 99 insertions(+), 1 deletion(-) diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c index 915469f00cf4..216e6c6212a2 100644 --- a/tools/lib/perf/mmap.c +++ b/tools/lib/perf/mmap.c @@ -13,6 +13,7 @@ #include #include #include +#include #include "internal.h" void perf_mmap__init(struct perf_mmap *map, struct perf_mmap *prev, @@ -294,6 +295,103 @@ static u64 read_timestamp(void) return low | ((u64)high) << 32; } +#elif defined(__aarch64__) +#define read_sysreg(r) ({ \ + u64 __val; \ + asm volatile("mrs %0, " __stringify(r) : "=r" (__val)); \ + __val; \ +}) + +static u64 read_pmccntr(void) +{ + return read_sysreg(pmccntr_el0); +} + +#define PMEVCNTR_READ(idx) \ + static u64 read_pmevcntr_##idx(void) { \ + return read_sysreg(pmevcntr##idx##_el0);\ + } + +PMEVCNTR_READ(0); +PMEVCNTR_READ(1); +PMEVCNTR_READ(2); +PMEVCNTR_READ(3); +PMEVCNTR_READ(4); +PMEVCNTR_READ(5); +PMEVCNTR_READ(6); +PMEVCNTR_READ(7); +PMEVCNTR_READ(8); +PMEVCNTR_READ(9); +PMEVCNTR_READ(10); +PMEVCNTR_READ(11); +PMEVCNTR_READ(12); +PMEVCNTR_READ(13); +PMEVCNTR_READ(14); +PMEVCNTR_READ(15); +PMEVCNTR_READ(16); +PMEVCNTR_READ(17); +PMEVCNTR_READ(18); +PMEVCNTR_READ(19); +PMEVCNTR_READ(20); +PMEVCNTR_READ(21); +PMEVCNTR_READ(22); +PMEVCNTR_READ(23); +PMEVCNTR_READ(24); +PMEVCNTR_READ(25); +PMEVCNTR_READ(26); +PMEVCNTR_READ(27); +PMEVCNTR_READ(28); +PMEVCNTR_READ(29); +PMEVCNTR_READ(30); + +/* + * Read a value direct from PMEVCNTR + */ +static u64 read_perf_counter(unsigned int counter) { + static u64 (* const read_f[])(void) = { + read_pmevcntr_0, + read_pmevcntr_1, + read_pmevcntr_2, + read_pmevcntr_3, + read_pmevcntr_4, + read_pmevcntr_5, + read_pmevcntr_6, + read_pmevcntr_7, + read_pmevcntr_8, + read_pmevcntr_9, + read_pmevcntr_10, + read_pmevcntr_11, + read_pmevcntr_12, + read_pmevcntr_13, + read_pmevcntr_14, + read_pmevcntr_15, + read_pmevcntr_16, + read_pmevcntr_17, + read_pmevcntr_18, + read_pmevcntr_19, + read_pmevcntr_20, + read_pmevcntr_21, + read_pmevcntr_22, + read_pmevcntr_23, + read_pmevcntr_24, + read_pmevcntr_25, + read_pmevcntr_26, + read_pmevcntr_27, + read_pmevcntr_28, + read_pmevcntr_29, + read_pmevcntr_30, + read_pmccntr + }; + + if (counter < ARRAY_SIZE(read_f)) + return (read_f[counter])(); + + return 0; +} + +static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); } + +#else static u64 read_perf_counter(unsigned int counter) { return 0; } static u64 read_timestamp(void) { return 0; } diff --git a/tools/lib/perf/tests/test-evsel.c b/tools/lib/perf/tests/test-evsel.c index 288b5feaefe2..670fcdb6d6af 100644 --- a/tools/lib/perf/tests/test-evsel.c +++ b/tools/lib/perf/tests/test-evsel.c @@ -148,7 +148,7 @@ static int test_stat_user_read(int event) pc = perf_evsel__mmap_base(evsel, 0, 0); -#if defined(__i386__) || defined(__x86_64__) +#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__) __T("userspace counter access not supported", pc->cap_user_rdpmc); __T("userspace counter access not enabled", pc->index); __T("userspace counter width not set", pc->pmc_width >= 32); -- 2.27.0
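The counter-read fast path above is what libperf's mmap read ends up using on arm64. A hedged usage sketch follows, mirroring the flow in tools/lib/perf/tests/test-evsel.c; exact error handling is elided for brevity.

#include <stdint.h>
#include <linux/perf_event.h>
#include <perf/evsel.h>
#include <perf/threadmap.h>

/* Count own instructions with the userspace (mrs-based) read path. */
static int read_self_instructions(uint64_t *val)
{
	struct perf_event_attr attr = {
		.type = PERF_TYPE_HARDWARE,
		.config = PERF_COUNT_HW_INSTRUCTIONS,
		.exclude_kernel = 1,
	};
	struct perf_counts_values counts = { .val = 0 };
	struct perf_thread_map *threads;
	struct perf_evsel *evsel;
	int err = -1;

	threads = perf_thread_map__new_dummy();
	if (!threads)
		return -1;
	perf_thread_map__set_pid(threads, 0, 0);	/* monitor this process */

	evsel = perf_evsel__new(&attr);
	if (evsel && !perf_evsel__open(evsel, NULL, threads) &&
	    !perf_evsel__mmap(evsel, 0)) {		/* enables the user page */
		err = perf_evsel__read(evsel, 0, 0, &counts);	/* no syscall */
		*val = counts.val;
		perf_evsel__munmap(evsel);
		perf_evsel__close(evsel);
	}

	if (evsel)
		perf_evsel__delete(evsel);
	perf_thread_map__put(threads);
	return err;
}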
[PATCH v7 5/9] arm64: perf: Add userspace counter access disable switch
Like x86, some users may want to disable userspace PMU counter access altogether. Add a sysfs 'rdpmc' file to control userspace counter access. The default is '1' which is enabled. Writing '0' disables access. In the case of multiple PMUs (i.e. big.LITTLE), the control is per PMU and userspace must disable access on each PMU. Note that x86 also supports writing '2' to globally enable user access. As there's no existing userspace support to worry about, this shouldn't be necessary for Arm. It could be added later if the need arises. Signed-off-by: Rob Herring --- arch/arm64/kernel/perf_event.c | 61 -- include/linux/perf/arm_pmu.h | 4 ++- 2 files changed, 61 insertions(+), 4 deletions(-) diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index bfbb7f449aca..1ab6308ca89c 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -336,6 +336,54 @@ static const struct attribute_group armv8_pmuv3_caps_attr_group = { .attrs = armv8_pmuv3_caps_attrs, }; +static void armv8pmu_disable_user_access(void *mm); + +static ssize_t get_attr_rdpmc(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct pmu *pmu = dev_get_drvdata(dev); + struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu); + + return snprintf(buf, 40, "%d\n", cpu_pmu->attr_rdpmc); +} + +static ssize_t set_attr_rdpmc(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct pmu *pmu = dev_get_drvdata(dev); + struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu); + unsigned long val; + ssize_t ret; + + ret = kstrtoul(buf, 0, &val); + if (ret) + return ret; + + if (val > 1) + return -EINVAL; + + if (val != cpu_pmu->attr_rdpmc) { + cpu_pmu->attr_rdpmc = !!val; + if (!val) + on_each_cpu_mask(&cpu_pmu->supported_cpus, armv8pmu_disable_user_access, NULL, 1); + } + + return count; +} + +static DEVICE_ATTR(rdpmc, S_IRUSR | S_IWUSR, get_attr_rdpmc, set_attr_rdpmc); + +static struct attribute *armv8_pmuv3_rdpmc_attrs[] = { + &dev_attr_rdpmc.attr, + NULL, +}; + +static const struct attribute_group armv8_pmuv3_rdpmc_attr_group = { + .attrs = armv8_pmuv3_rdpmc_attrs, +}; + /* * Perf Events' indices */ @@ -950,7 +998,8 @@ static void armv8pmu_sched_task(struct perf_event_context *ctx, bool sched_in) * If a new task has user access enabled, clear the dirty counters * to prevent the potential leak.
*/ - if (ctx && current->mm && atomic_read(&current->mm->context.pmu_direct_access)) { + if (ctx && to_arm_pmu(ctx->pmu)->attr_rdpmc && + current->mm && atomic_read(&current->mm->context.pmu_direct_access)) { armv8pmu_enable_user_access(); armv8pmu_clear_dirty_counters(to_arm_pmu(ctx->pmu)); } else { @@ -1093,7 +1142,8 @@ static int __armv8_pmuv3_map_event(struct perf_event *event, &armv8_pmuv3_perf_cache_map, ARMV8_PMU_EVTYPE_EVENT); - if (armv8pmu_event_want_user_access(event) || !armv8pmu_event_is_64bit(event)) + if (armpmu->attr_rdpmc && + (armv8pmu_event_want_user_access(event) || !armv8pmu_event_is_64bit(event))) event->hw.flags |= ARMPMU_EL0_RD_CNTR; /* @@ -1218,7 +1268,9 @@ static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu) static int armv8pmu_undef_handler(struct pt_regs *regs, u32 insn) { - if (atomic_read(&current->mm->context.pmu_direct_access)) { + struct arm_pmu *armpmu = *this_cpu_ptr(&cpu_armpmu); + + if (armpmu->attr_rdpmc && atomic_read(&current->mm->context.pmu_direct_access)) { armv8pmu_enable_user_access(); return 0; } @@ -1277,6 +1329,9 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name, cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_CAPS] = caps ? caps : &armv8_pmuv3_caps_attr_group; + cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_RDPMC] = &armv8_pmuv3_rdpmc_attr_group; + cpu_pmu->attr_rdpmc = true; + return 0; } diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 1daad3b2cce5..9303cd07ac57 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -82,6 +82,7 @@ enum armpmu_attr_groups { ARMPMU_ATTR_GROUP_EVENTS, ARMPMU_ATTR_GROUP_FORMATS, ARMPMU_ATTR_GROUP_CAPS, + ARMPMU_ATTR_GROUP_RDPMC, ARMPMU_NR_ATTR_GROUPS }; @@ -107,7 +108,8 @@ struct arm_pmu { int (*map_event)(struct perf_event *event); int (*filter_match)(struct perf_event *event);
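A hedged userspace-side example of using the new knob; the sysfs location follows from the attribute being attached to the PMU's event_source device, but the exact PMU name varies by system and is only an assumption here.

#include <stdio.h>

/* Write 0 to disable, 1 to re-enable (the default) for one PMU instance. */
static int set_rdpmc(const char *pmu_name, int enable)
{
	char path[128];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/bus/event_source/devices/%s/rdpmc", pmu_name);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%d\n", enable);
	fclose(f);
	return 0;
}

For example, set_rdpmc("armv8_pmuv3_0", 0) on a system with that PMU name; on big.LITTLE each PMU instance has to be disabled separately, as the commit message notes.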
[PATCH v7 3/9] arm64: perf: Enable PMU counter direct access for perf event
From: Raphael Gault Keep track of events opened with direct access to the hardware counters and modify permissions while they are open. The strategy used here is the same one x86 uses: every time an event is mapped, the permissions are set if required. The atomic field added in the mm_context helps keep track of the different events opened and de-activate the permissions when all are unmapped. We also need to update the permissions in the context switch code so that tasks keep the right permissions. In order to enable 64-bit counters for userspace when available, a new config1 bit is added for userspace to indicate it wants userspace counter access. This bit allows the kernel to decide if chaining should be disabled, as chaining and userspace access are incompatible. The modes for config1 are as follows: config1 = 0 or 2 : user access enabled and always 32-bit config1 = 1 : user access disabled and always 64-bit (using chaining if needed) config1 = 3 : user access enabled and counter size matches underlying counter. User access is enabled with config1 == 0 so that we match x86 behavior and don't need Arm specific code (outside the counter read). Signed-off-by: Raphael Gault Signed-off-by: Rob Herring --- Peter Z says (event->oncpu == smp_processor_id()) in the user page update is always true, but my testing says otherwise[1]. v7: - Clear disabled counters when user access is enabled for a task to avoid leaking other tasks' counter data. - Rework context switch handling utilizing sched_task callback - Add armv8pmu_event_can_chain() helper - Rework config1 flags handling structure - Use ARMV8_IDX_CYCLE_COUNTER_USER define for remapped user cycle counter index v6: - Add new attr.config1 rdpmc bit for userspace to hint it wants userspace access when also requesting 64-bit counters. v5: - Only set cap_user_rdpmc if event is on current cpu - Limit enabling/disabling access to CPUs associated with the PMU (supported_cpus) and with the mm_struct matching current->active_mm. v2: - Move mapped/unmapped into arm64 code. Fixes arm32. - Rebase on cap_user_time_short changes Changes from Raphael's v4: - Drop homogeneous check - Disable access for chained counters - Set pmc_width in user page [1] https://lore.kernel.org/lkml/cal_jsqk+ekef5navnbfarcjre3myhfbfe54f9yhkbstnwql...@mail.gmail.com/ --- arch/arm64/include/asm/mmu.h | 5 + arch/arm64/kernel/perf_event.c | 179 +++-- include/linux/perf/arm_pmu.h | 6 ++ 3 files changed, 183 insertions(+), 7 deletions(-) diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h index 75beffe2ee8a..ee08447455da 100644 --- a/arch/arm64/include/asm/mmu.h +++ b/arch/arm64/include/asm/mmu.h @@ -18,6 +18,11 @@ typedef struct { atomic64_t id; + /* +* non-zero if userspace has access to hardware +* counters directly. +*/ + atomic_tpmu_direct_access; #ifdef CONFIG_COMPAT void*sigpage; #endif diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index 40cf59455ce8..bfbb7f449aca 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -8,9 +8,11 @@ * This code is based heavily on the ARMv7 perf event code.
*/ +#include #include #include #include +#include #include #include @@ -288,15 +290,22 @@ static const struct attribute_group armv8_pmuv3_events_attr_group = { PMU_FORMAT_ATTR(event, "config:0-15"); PMU_FORMAT_ATTR(long, "config1:0"); +PMU_FORMAT_ATTR(rdpmc, "config1:1"); static inline bool armv8pmu_event_is_64bit(struct perf_event *event) { return event->attr.config1 & 0x1; } +static inline bool armv8pmu_event_want_user_access(struct perf_event *event) +{ + return event->attr.config1 & 0x2; +} + static struct attribute *armv8_pmuv3_format_attrs[] = { &format_attr_event.attr, &format_attr_long.attr, + &format_attr_rdpmc.attr, NULL, }; @@ -344,6 +353,15 @@ static bool armv8pmu_has_long_event(struct arm_pmu *cpu_pmu) return (cpu_pmu->pmuver >= ID_AA64DFR0_PMUVER_8_5); } +static inline bool armv8pmu_event_can_chain(struct perf_event *event) +{ + struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); + + return !armv8pmu_event_want_user_access(event) && + armv8pmu_event_is_64bit(event) && + !armv8pmu_has_long_event(cpu_pmu); +} + /* * We must chain two programmable counters for 64 bit events, * except when we have allocated the 64bit cycle counter (for CPU @@ -353,11 +371,9 @@ static bool armv8pmu_has_long_event(struct arm_pmu *cpu_pmu) static inline bool armv8pmu_event_is_chained(struct perf_event *event) { int idx = event->hw.idx; - struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); return !WARN_ON(idx < 0) && - armv8pmu_event_is_64bit(event) && - !armv8pmu_has_long_event(cpu_pmu) && +
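To make the mmap/munmap bookkeeping described in the commit message concrete, here is a deliberately simplified sketch of the idea, not the v7 patch body (v7 defers the actual enable to the undef handler and the sched_task hook); refresh_pmuserenr() is a hypothetical helper standing in for whatever toggles EL0 access on the PMU's supported_cpus.

/* Simplified sketch: the per-mm atomic counts mapped events that were
 * granted EL0 access, enabling access on the first map and revoking it
 * when the last one goes away. refresh_pmuserenr() is hypothetical.
 */
static void armv8pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
{
	if (!(event->hw.flags & ARMPMU_EL0_RD_CNTR))
		return;

	if (atomic_inc_return(&mm->context.pmu_direct_access) == 1)
		on_each_cpu_mask(&to_arm_pmu(event->pmu)->supported_cpus,
				 refresh_pmuserenr, mm, 1);
}

static void armv8pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
{
	if (!(event->hw.flags & ARMPMU_EL0_RD_CNTR))
		return;

	if (atomic_dec_and_test(&mm->context.pmu_direct_access))
		on_each_cpu_mask(&to_arm_pmu(event->pmu)->supported_cpus,
				 refresh_pmuserenr, mm, 1);
}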
[PATCH v7 4/9] drivers/perf: arm_pmu: Export the per_cpu cpu_armpmu
The arm64 PMU driver needs to retrieve the struct arm_pmu pointer for the current CPU. As the common code already maintains this with the percpu cpu_armpmu, let's make it global. Signed-off-by: Rob Herring --- drivers/perf/arm_pmu.c | 2 +- include/linux/perf/arm_pmu.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c index 2d10d84fb79c..6ac56280b9c7 100644 --- a/drivers/perf/arm_pmu.c +++ b/drivers/perf/arm_pmu.c @@ -99,7 +99,7 @@ static const struct pmu_irq_ops percpu_pmunmi_ops = { .free_pmuirq = armpmu_free_percpu_pmunmi }; -static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu); +DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu); static DEFINE_PER_CPU(int, cpu_irq); static DEFINE_PER_CPU(const struct pmu_irq_ops *, cpu_irq_ops); diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 63af052e3909..1daad3b2cce5 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -127,6 +127,8 @@ struct arm_pmu { #define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu)) +DECLARE_PER_CPU(struct arm_pmu *, cpu_armpmu); + u64 armpmu_event_update(struct perf_event *event); int armpmu_event_set_period(struct perf_event *event); -- 2.27.0
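The consumer this export enables (the undef-instruction handler added in patch 5/9) boils down to a per-CPU lookup; a small sketch of the pattern, with the obvious caveat spelled out:

/* Look up the arm_pmu that owns the current CPU. The caller must not be
 * preemptible, otherwise the CPU (and hence the PMU) can change under it.
 */
static struct arm_pmu *current_cpu_arm_pmu(void)
{
	return *this_cpu_ptr(&cpu_armpmu);
}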
[PATCH v7 2/9] arm64: pmu: Add function implementation to update event index in userpage
From: Raphael Gault In order to be able to access the counter directly for userspace, we need to provide the index of the counter using the userpage. We thus need to override the event_idx function to retrieve and convert the perf_event index to armv8 hardware index. Since the arm_pmu driver can be used by any implementation, even if not armv8, two components play a role into making sure the behaviour is correct and consistent with the PMU capabilities: * the ARMPMU_EL0_RD_CNTR flag which denotes the capability to access counter from userspace. * the event_idx call back, which is implemented and initialized by the PMU implementation: if no callback is provided, the default behaviour applies, returning 0 as index value. Signed-off-by: Raphael Gault Signed-off-by: Rob Herring --- v7: - Add define ARMV8_IDX_CYCLE_COUNTER_USER for userspace index of cycle counter --- arch/arm64/kernel/perf_event.c | 20 +++- include/linux/perf/arm_pmu.h | 2 ++ 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index 4658fcf88c2b..40cf59455ce8 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -332,7 +332,7 @@ static const struct attribute_group armv8_pmuv3_caps_attr_group = { */ #defineARMV8_IDX_CYCLE_COUNTER 0 #defineARMV8_IDX_COUNTER0 1 - +#defineARMV8_IDX_CYCLE_COUNTER_USER32 /* * We unconditionally enable ARMv8.5-PMU long event counter support @@ -871,6 +871,22 @@ static void armv8pmu_clear_event_idx(struct pmu_hw_events *cpuc, clear_bit(idx - 1, cpuc->used_mask); } +static int armv8pmu_access_event_idx(struct perf_event *event) +{ + if (!(event->hw.flags & ARMPMU_EL0_RD_CNTR)) + return 0; + + /* +* We remap the cycle counter index to 32 to +* match the offset applied to the rest of +* the counter indices. +*/ + if (event->hw.idx == ARMV8_IDX_CYCLE_COUNTER) + return ARMV8_IDX_CYCLE_COUNTER_USER; + + return event->hw.idx; +} + /* * Add an event filter to a given event. */ @@ -1098,6 +1114,8 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name, cpu_pmu->set_event_filter = armv8pmu_set_event_filter; cpu_pmu->filter_match = armv8pmu_filter_match; + cpu_pmu->pmu.event_idx = armv8pmu_access_event_idx; + cpu_pmu->name = name; cpu_pmu->map_event = map_event; cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_EVENTS] = events ? diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 505480217cf1..d29aa981d989 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -26,6 +26,8 @@ */ /* Event uses a 64bit counter */ #define ARMPMU_EVT_64BIT 1 +/* Allow access to hardware counter from userspace */ +#define ARMPMU_EL0_RD_CNTR 2 #define HW_OP_UNSUPPORTED 0x #define C(_x) PERF_COUNT_HW_CACHE_##_x -- 2.27.0
[PATCH v7 0/9] arm64 userspace counter access support
Hi all, Another version of arm64 userspace counter access support. I sent out the libperf bits separately and those are now applied (Thanks!), so this is just the arm64 bits. This originally resurrected Raphael's series[1] to enable userspace counter access on arm64. My previous versions are here[2][3][4][5][6][7]. A git branch is here[8]. Changes in v7: - Handling of dirty counter leakage and reworking of context switch and user access enabling. The .sched_task hook and undef instruction handler are now utilized. (Patch 3) - Add a userspace disable switch like x86. (Patch 5) Changes in v6: - Reworking of the handling of 64-bit counters and user access. There's a new config1 flag to request user access. This takes priority over the 64-bit flag and the user will get the maximum size the h/w supports without chaining. - The libperf evsel mmap struct is stored in its own xyarray - New tests for user 64-bit and 32-bit counters - Rebase to v5.12-rc2 Changes in v5: - Limit enabling/disabling access to CPUs associated with the PMU (supported_cpus) and with the mm_struct matching current->active_mm. The x86 method of using mm_cpumask doesn't work for arm64 as it is not updated. - Only set cap_user_rdpmc if event is on current cpu. See patch 2. - Create an mmap for every event in an evsel. This results in some changes to the libperf mmap API from the last version. - Rebase to v5.11-rc2 Changes in v4: - Dropped 'arm64: pmu: Add hook to handle pmu-related undefined instructions'. The onus is on userspace to pin itself to a homogeneous subset of CPUs and avoid any aborts on heterogeneous systems, so the hook is not needed. - Make perf_evsel__mmap() take pages rather than bytes for size - Fix building arm64 heterogeneous test. Changes in v3: - Dropped removing x86 rdpmc test until libperf tests can run via 'perf test' - Added verbose prints for tests - Split adding perf_evsel__mmap() to separate patch The following changes to the arm64 support have been made compared to Raphael's last version: The major change is support for heterogeneous systems with some restrictions. Specifically, userspace must pin itself to like CPUs, open a specific PMU by type, and use h/w specific events. The tests have been reworked to demonstrate this. Chained events are not supported. The problem with supporting chained events was there's no way to distinguish between a chained event and a native 64-bit counter. We could add some flag, but do self monitoring processes really need that? Native 64-bit counters are supported if the PMU h/w has support. As there's already an explicit ABI to request 64-bit counters, userspace can request 64-bit counters and if user access is not enabled, then it must retry with 32-bit counters. Prior versions broke the build on arm32 (surprisingly never caught by 0-day). As a result, event_mapped and event_unmapped implementations have been moved into the arm64 code. There was a bug in that pmc_width was not set in the user page. The tests now check for this. The documentation has been converted to rST. I've added sections on chained events and heterogeneous. The tests have been expanded to test the cycle counter access. 
Rob [1] https://lore.kernel.org/r/20190822144220.27860-1-raphael.ga...@arm.com/ [2] https://lore.kernel.org/r/20200707205333.624938-1-r...@kernel.org/ [3] https://lore.kernel.org/r/20200828205614.3391252-1-r...@kernel.org/ [4] https://lore.kernel.org/r/20200911215118.2887710-1-r...@kernel.org/ [5] https://lore.kernel.org/r/20201001140116.651970-1-r...@kernel.org/ [6] https://lore.kernel.org/r/20210114020605.3943992-1-r...@kernel.org/ [7] https://lore.kernel.org/r/20210311000837.3630499-1-r...@kernel.org/ [8] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git user-perf-event-v7 Raphael Gault (4): arm64: Restrict undef hook for cpufeature registers arm64: pmu: Add function implementation to update event index in userpage arm64: perf: Enable PMU counter direct access for perf event Documentation: arm64: Document PMU counters access from userspace Rob Herring (5): drivers/perf: arm_pmu: Export the per_cpu cpu_armpmu arm64: perf: Add userspace counter access disable switch libperf: Add arm64 support to perf_mmap__read_self() perf: arm64: Add test for userspace counter access on heterogeneous systems perf: arm64: Add tests for 32-bit and 64-bit counter size userspace access Documentation/arm64/perf.rst | 67 +- arch/arm64/include/asm/mmu.h | 5 + arch/arm64/kernel/cpufeature.c | 4 +- arch/arm64/kernel/perf_event.c | 254 - drivers/perf/arm_pmu.c | 2 +- include/linux/perf/arm_pmu.h | 14 +- tools/lib/perf/mmap.c | 98 tools/lib/perf/tests/test-evsel.c | 2 +- tools/perf/arch/arm64/include/arch-tests.h | 12 + tools/perf/arch/arm64/te
[PATCH v7 1/9] arm64: Restrict undef hook for cpufeature registers
From: Raphael Gault This commit modifies the mask of the mrs_hook declared in arch/arm64/kernel/cpufeature.c, which emulates only feature register access. This is necessary because the hook's mask was too large and thus matched any mrs instruction, even those not related to the emulated registers, which made the pmu emulation inefficient. Signed-off-by: Raphael Gault Signed-off-by: Rob Herring --- v7: - Split off from Raphael's original undef hook patch as this change stands on its own. --- arch/arm64/kernel/cpufeature.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 066030717a4c..aa0777690ab1 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -2888,8 +2888,8 @@ static int emulate_mrs(struct pt_regs *regs, u32 insn) } static struct undef_hook mrs_hook = { - .instr_mask = 0xfff00000, - .instr_val = 0xd5300000, + .instr_mask = 0xffff0000, + .instr_val = 0xd5380000, .pstate_mask = PSR_AA32_MODE_MASK, .pstate_val = PSR_MODE_EL0t, .fn = emulate_mrs, -- 2.27.0
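A short worked view of why the narrower mask helps, assuming the standard A64 MRS encoding (bits [31:20] = 0xd53, bit 19 = o0, bits [18:16] = op1); the helper names below are for illustration only.

#include <stdbool.h>
#include <stdint.h>

static bool hook_matches(uint32_t insn, uint32_t mask, uint32_t val)
{
	return (insn & mask) == val;
}

/* Old hook: fixes only bits [31:20], i.e. it trapped into emulate_mrs()
 * for every EL0 mrs, whatever system register it named. */
static bool old_hook_matches(uint32_t insn)
{
	return hook_matches(insn, 0xfff00000, 0xd5300000);
}

/* New hook: additionally pins o0 = 1 and op1 = 0 (op0 == 3, op1 == 0),
 * the encoding space holding the emulated ID registers, so unrelated
 * reads such as PMU counter accesses no longer go through this hook. */
static bool new_hook_matches(uint32_t insn)
{
	return hook_matches(insn, 0xffff0000, 0xd5380000);
}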
Re: [PATCH] hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()
On Thu, Apr 15, 2021 at 9:47 AM Thomas Gleixner wrote: > > On Wed, Apr 14 2021 at 11:49, Lorenzo Colitti wrote: > > On Wed, Apr 14, 2021 at 2:14 AM Greg KH wrote: > >> To give context, the commit is now 46eb1701c046 ("hrtimer: Update > >> softirq_expires_next correctly after __hrtimer_get_next_event()") and is > >> attached below. > >> > >> The f_ncm.c driver is doing a lot of calls to hrtimer_start() with mode > >> HRTIMER_MODE_REL_SOFT for I think every packet it gets. If that should > >> not be happening, we can easily fix it but I guess no one has actually > >> had fast USB devices that noticed this until now :) > > > > AIUI the purpose of this timer is to support packet aggregation. USB > > transfers are relatively expensive/high latency. 6 Gbps is 500k > > 1500-byte packets per second, or one every 2us. So f_ncm buffers as > > many packets as will fit into 16k (usually, 10 1500-byte packets), and > > only initiates a USB transfer when those packets have arrived. That > > ends up doing only one transfer every 20us. It sets a 300us timer to > > ensure that if the 10 packets haven't arrived, it still sends out > > whatever it has when the timer fires. The timer is set/updated on > > every packet buffered by ncm. > > > > Is this somehow much more expensive in 5.10.24 than it was before? > > Even if this driver is somehow "holding it wrong", might there not be > > other workloads that have a similar problem? What about regressions on > > those workloads? > > Let's put the question of whether this hrtimer usage is sensible or not > aside for now. > > I stared at the change for a while and did some experiments to recreate > the problem, but that didn't get me anywhere. > > Could you please do the following? > > Enable tracing and enable the following tracepoints: > > timers/hrtimer_cancel > timers/hrtimer_start > timers/hrtimer_expire_entry > irq/softirq_raise > irq/softirq_enter > irq/softirq_exit > > and function tracing filtered on ncm_wrap_ntb() and > package_for_tx() only (to reduce the noise). > > Run the test on a kernels with and without that commit and collect trace > data for both. > > That should give me a pretty clear picture what's going on. Lorenzo is trying to get the traces you asked for, or rather he’s trying to get confirmation he can post them. Our initial observation of these results seems to suggest that updating the timer (hrtimer_start, which seems to also call hrtimer_cancel) takes twice as long as it used to. My gut feeling is that softirq_activated is usually false, and the old code in such a case calls just __hrtimer_get_next_event(, HRTIMER_ACTIVE_ALL). While the new code will first call __hrtimer_get_next_event(, HRTIMER_ACTIVE_SOFT) and then __hrtimer_get_next_event(, HRTIMER_ACTIVE_HARD) Perhaps __hrtimer_get_next_event() should return both soft and hard event times in one function call? Or perhaps hrtimer_start should not call hrtimer_cancel? We've also been looking at the f_ncm driver itself, and trying to reduce the number of hrtimer_cancel/start calls. It's pretty easy to reduce this by a factor of 10x (we’re not yet 100% certain this is race free), which does drastically improve performance. However, it still only takes the regression from 60% to 20%. - Maciej
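For completeness, here is a minimal sketch of the f_ncm-side mitigation described above (arming the flush timer only for the first datagram of an aggregated NTB instead of on every packet). Field and constant names follow drivers/usb/gadget/function/f_ncm.c, but this is an untested illustration of the idea, not the actual change, and the helper name is hypothetical.

/* Untested illustration: later datagrams either fill the NTB (which kicks
 * off a transfer anyway) or are already covered by the armed timeout, so
 * only the first one needs to start the hrtimer.
 */
static void ncm_queue_dgram(struct f_ncm *ncm, bool first_dgram_in_ntb)
{
	if (first_dgram_in_ntb)
		hrtimer_start(&ncm->task_timer, TX_TIMEOUT_NSECS,
			      HRTIMER_MODE_REL_SOFT);

	/* ...append the datagram to the in-progress NTB as before... */
}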