kernel/rcu/tasks.h:1031:6: warning: no previous prototype for 'show_rcu_tasks_gp_kthreads'
Hi Paul, FYI, the error/warning still remains. tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master head: 7af08140979a6e7e12b78c93b8625c8d25b084e2 commit: e21408ceec2de5be418efa39feb1e2c00f824a72 rcu-tasks: Add RCU tasks to rcutorture writer stall output date: 12 months ago config: h8300-allnoconfig (attached as .config) compiler: h8300-linux-gcc (GCC) 9.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e21408ceec2de5be418efa39feb1e2c00f824a72 git remote add linus https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git git fetch --no-tags linus master git checkout e21408ceec2de5be418efa39feb1e2c00f824a72 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross W=1 ARCH=h8300 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): In file included from kernel/rcu/update.c:587: >> kernel/rcu/tasks.h:1031:6: warning: no previous prototype for >> 'show_rcu_tasks_gp_kthreads' [-Wmissing-prototypes] 1031 | void show_rcu_tasks_gp_kthreads(void) {} | ^~ vim +/show_rcu_tasks_gp_kthreads +1031 kernel/rcu/tasks.h 1028 1029 #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */ 1030 static inline void rcu_tasks_bootup_oddness(void) {} > 1031 void show_rcu_tasks_gp_kthreads(void) {} --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org .config.gz Description: application/gzip
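The warning comes from the !CONFIG_TASKS_RCU_GENERIC stub at line 1031 being defined in the header with external linkage and no prior declaration. A minimal sketch of one way to silence it, mirroring the rcu_tasks_bootup_oddness() stub right above it; whether upstream instead adds a prototype in a header is an open question, so treat this as illustration only:

/* kernel/rcu/tasks.h -- sketch only, not necessarily the upstream fix */
#else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
static inline void rcu_tasks_bootup_oddness(void) {}
static inline void show_rcu_tasks_gp_kthreads(void) {}
#endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */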
Re: Doubt regarding memory allocation in KVM
On 20/04/21 07:45, Shivank Garg wrote:
> Hi, I'm learning about qemu KVM, looking into the code and experimenting
> on it. I have the following doubts regarding it, and I would be grateful
> if you could help me get some idea on them.
>
> 1. I observe that KVM allocates memory to guests when it needs it but
> doesn't take it back (except for the ballooning case). Also, the Qemu/KVM
> process does not free the memory even when the guest is rebooted. In this
> case, does the guest VM get access to memory already pre-filled with some
> garbage from the previous run?

Yes. (The host allocates zeroed pages to the guest the first time it requests
them, but after that it's up to the guest.)

> Can it be a security issue?

No, it's the same as what happens on a non-virtual machine.

> 2. How does KVM know if a GPFN (guest physical frame number) is backed by
> an actual machine frame number in the host? If not mapped, then it faults
> in the host and allocates a physical frame for the guest in the host.
> (kvm_mmu_page_fault)

It's all handled by Linux. KVM only does a call to get_user_pages. See the
functions whose names start with hva_to_pfn in virt/kvm/kvm_main.c.

Given a GPA, the GFN is simply the guest physical address minus bits 0:11, so
shifted right by 12.

Paolo
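For reference, the GPA-to-GFN arithmetic Paolo describes is just a right shift by the page shift (12 for 4 KiB pages). A minimal user-space illustration, not KVM code (KVM carries an equivalent gpa_to_gfn() helper internally):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12	/* 4 KiB pages assumed */

static uint64_t gpa_to_gfn(uint64_t gpa)
{
	return gpa >> PAGE_SHIFT;	/* drop bits 0:11, the page offset */
}

int main(void)
{
	uint64_t gpa = 0x12345678;

	printf("GPA 0x%jx -> GFN 0x%jx, page offset 0x%jx\n",
	       (uintmax_t)gpa, (uintmax_t)gpa_to_gfn(gpa),
	       (uintmax_t)(gpa & ((1ULL << PAGE_SHIFT) - 1)));
	return 0;
}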
Re: [PATCH v1 2/2] powerpc: Enable OPTPROBES on PPC32
Christophe Leroy wrote: For that, create a 32 bits version of patch_imm64_load_insns() and create a patch_imm_load_insns() which calls patch_imm32_load_insns() on PPC32 and patch_imm64_load_insns() on PPC64. Adapt optprobes_head.S for PPC32. Use PPC_LL/PPC_STL macros instead of raw ld/std, opt out things linked to paca and use stmw/lmw to save/restore registers. Signed-off-by: Christophe Leroy --- arch/powerpc/Kconfig | 2 +- arch/powerpc/kernel/optprobes.c | 24 +-- arch/powerpc/kernel/optprobes_head.S | 46 +++- 3 files changed, 53 insertions(+), 19 deletions(-) Thanks for adding support for ppc32. It is good to see that it works without too many changes. diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index c1344c05226c..49b538e54efb 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -227,7 +227,7 @@ config PPC select HAVE_MOD_ARCH_SPECIFIC select HAVE_NMI if PERF_EVENTS || (PPC64 && PPC_BOOK3S) select HAVE_HARDLOCKUP_DETECTOR_ARCHif PPC64 && PPC_BOOK3S && SMP - select HAVE_OPTPROBES if PPC64 + select HAVE_OPTPROBES select HAVE_PERF_EVENTS select HAVE_PERF_EVENTS_NMI if PPC64 select HAVE_HARDLOCKUP_DETECTOR_PERFif PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c index 58fdb9f66e0f..cdf87086fa33 100644 --- a/arch/powerpc/kernel/optprobes.c +++ b/arch/powerpc/kernel/optprobes.c @@ -141,11 +141,21 @@ void arch_remove_optimized_kprobe(struct optimized_kprobe *op) } } +static void patch_imm32_load_insns(unsigned long val, int reg, kprobe_opcode_t *addr) +{ + patch_instruction((struct ppc_inst *)addr, + ppc_inst(PPC_RAW_LIS(reg, IMM_H(val; + addr++; + + patch_instruction((struct ppc_inst *)addr, + ppc_inst(PPC_RAW_ORI(reg, reg, IMM_L(val; +} + /* * Generate instructions to load provided immediate 64-bit value * to register 'reg' and patch these instructions at 'addr'. */ -static void patch_imm64_load_insns(unsigned long val, int reg, kprobe_opcode_t *addr) +static void patch_imm64_load_insns(unsigned long long val, int reg, kprobe_opcode_t *addr) Do you really need this? { /* lis reg,(op)@highest */ patch_instruction((struct ppc_inst *)addr, @@ -177,6 +187,14 @@ static void patch_imm64_load_insns(unsigned long val, int reg, kprobe_opcode_t * ___PPC_RS(reg) | (val & 0x))); } +static void patch_imm_load_insns(unsigned long val, int reg, kprobe_opcode_t *addr) +{ + if (IS_ENABLED(CONFIG_PPC64)) + patch_imm64_load_insns(val, reg, addr); + else + patch_imm32_load_insns(val, reg, addr); +} + int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p) { struct ppc_inst branch_op_callback, branch_emulate_step, temp; @@ -230,7 +248,7 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p) * Fixup the template with instructions to: * 1. load the address of the actual probepoint */ - patch_imm64_load_insns((unsigned long)op, 3, buff + TMPL_OP_IDX); + patch_imm_load_insns((unsigned long)op, 3, buff + TMPL_OP_IDX); /* * 2. branch to optimized_callback() and emulate_step() @@ -264,7 +282,7 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p) * 3. load instruction to be emulated into relevant register, and */ temp = ppc_inst_read((struct ppc_inst *)p->ainsn.insn); - patch_imm64_load_insns(ppc_inst_as_ulong(temp), 4, buff + TMPL_INSN_IDX); + patch_imm_load_insns(ppc_inst_as_ulong(temp), 4, buff + TMPL_INSN_IDX); /* * 4. 
branch back from trampoline diff --git a/arch/powerpc/kernel/optprobes_head.S b/arch/powerpc/kernel/optprobes_head.S index ff8ba4d3824d..49f31e554573 100644 --- a/arch/powerpc/kernel/optprobes_head.S +++ b/arch/powerpc/kernel/optprobes_head.S @@ -30,39 +30,47 @@ optinsn_slot: .global optprobe_template_entry optprobe_template_entry: /* Create an in-memory pt_regs */ - stdur1,-INT_FRAME_SIZE(r1) + PPC_STLUr1,-INT_FRAME_SIZE(r1) SAVE_GPR(0,r1) /* Save the previous SP into stack */ addir0,r1,INT_FRAME_SIZE - std r0,GPR1(r1) + PPC_STL r0,GPR1(r1) +#ifdef CONFIG_PPC64 SAVE_10GPRS(2,r1) SAVE_10GPRS(12,r1) SAVE_10GPRS(22,r1) +#else + stmwr2, GPR2(r1) +#endif /* Save SPRS */ mfmsr r5 - std r5,_MSR(r1) + PPC_STL r5,_MSR(r1) li r5,0x700 - std r5,_TRAP(r1) + PPC_STL r5,_TRAP(r1) li r5,0 - std r5,ORIG_GPR3(r1)
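The patch_imm32_load_insns() hunk above lost some closing parentheses in transit; based on the surviving fragments it presumably reads as below (lis loads the high 16 bits of the immediate, ori fills in the low 16 bits). This is a reconstruction, not a verbatim copy of the posted patch:

static void patch_imm32_load_insns(unsigned long val, int reg, kprobe_opcode_t *addr)
{
	/* lis reg,(val)@h */
	patch_instruction((struct ppc_inst *)addr,
			  ppc_inst(PPC_RAW_LIS(reg, IMM_H(val))));
	addr++;

	/* ori reg,reg,(val)@l */
	patch_instruction((struct ppc_inst *)addr,
			  ppc_inst(PPC_RAW_ORI(reg, reg, IMM_L(val))));
}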
Re: [PATCH v3 4/4] media: i2c: ov2659: Use clk_{prepare_enable,disable_unprepare}() to set xvclk on/off
On Tue, 20 Apr 2021 at 08:46, dillon min wrote:
>
> Hi All,
>
> Just a gentle ping, hope some expert could take a look, thanks.

Don't ping people after 5 days. It's not gentle.

Best regards,
Krzysztof
Re: [PATCH 5.10 000/103] 5.10.32-rc1 review
On Mon, 19 Apr 2021 at 18:49, Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 5.10.32 release. > There are 103 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Wed, 21 Apr 2021 13:05:09 +. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.32-rc1.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-5.10.y > and the diffstat can be found below. > > thanks, > > greg k-h Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386. Tested-by: Linux Kernel Functional Testing ## Build * kernel: 5.10.32-rc1 * git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git * git branch: linux-5.10.y * git commit: bcedd92af6e5899132429d20a9322b12afec2188 * git describe: v5.10.31-104-gbcedd92af6e5 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10.31-104-gbcedd92af6e5 ## No regressions (compared to v5.10.31-56-g86a799ba8d60) ## No fixes (compared to v5.10.31-56-g86a799ba8d60) ## Test result summary total: 69955, pass: 58611, fail: 1787, skip: 9326, xfail: 231, ## Build Summary * arc: 10 total, 10 passed, 0 failed * arm: 192 total, 192 passed, 0 failed * arm64: 26 total, 26 passed, 0 failed * dragonboard-410c: 1 total, 1 passed, 0 failed * hi6220-hikey: 1 total, 1 passed, 0 failed * i386: 25 total, 25 passed, 0 failed * juno-r2: 1 total, 1 passed, 0 failed * mips: 45 total, 45 passed, 0 failed * parisc: 9 total, 9 passed, 0 failed * powerpc: 27 total, 27 passed, 0 failed * riscv: 21 total, 21 passed, 0 failed * s390: 18 total, 18 passed, 0 failed * sh: 18 total, 18 passed, 0 failed * sparc: 9 total, 9 passed, 0 failed * x15: 1 total, 1 passed, 0 failed * x86: 1 total, 1 passed, 0 failed * x86_64: 26 total, 26 passed, 0 failed ## Test suites summary * fwts * igt-gpu-tools * install-android-platform-tools-r2600 * kselftest- * kselftest-android * kselftest-bpf * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-lkdtm * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-vsyscall-mode-native- * kselftest-vsyscall-mode-none- * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libhugetlbfs * linux-log-parser * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * 
ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * network-basic-tests * perf * rcutorture * ssuite * v4l2-compliance -- Linaro LKFT https://lkft.linaro.org
Re: [PATCH v3 4/4] media: i2c: ov2659: Use clk_{prepare_enable,disable_unprepare}() to set xvclk on/off
Hi All, Just a gentle ping, hope some expert could take a look, thanks. Best regards. Dillon On Thu, Apr 15, 2021 at 12:06 PM wrote: > > From: dillon min > > On some platform(imx6q), xvclk might not switch on in advance, > also for power save purpose, xvclk should not be always on. > so, add clk_prepare_enable(), clk_disable_unprepare() in driver > side to set xvclk on/off at proper stage. > > Add following changes: > - add 'struct clk *clk;' in 'struct ov2659 {}' > - enable xvclk in ov2659_power_on() > - disable xvclk in ov2659_power_off() > > Signed-off-by: dillon min > --- > v3: optimize commit message > > drivers/media/i2c/ov2659.c | 24 ++-- > 1 file changed, 18 insertions(+), 6 deletions(-) > > diff --git a/drivers/media/i2c/ov2659.c b/drivers/media/i2c/ov2659.c > index 42f64175a6df..fb78a1cedc03 100644 > --- a/drivers/media/i2c/ov2659.c > +++ b/drivers/media/i2c/ov2659.c > @@ -204,6 +204,7 @@ struct ov2659 { > struct i2c_client *client; > struct v4l2_ctrl_handler ctrls; > struct v4l2_ctrl *link_frequency; > + struct clk *clk; > const struct ov2659_framesize *frame_size; > struct sensor_register *format_ctrl_regs; > struct ov2659_pll_ctrl pll; > @@ -1270,6 +1271,8 @@ static int ov2659_power_off(struct device *dev) > > gpiod_set_value(ov2659->pwdn_gpio, 1); > > + clk_disable_unprepare(ov2659->clk); > + > return 0; > } > > @@ -1278,9 +1281,17 @@ static int ov2659_power_on(struct device *dev) > struct i2c_client *client = to_i2c_client(dev); > struct v4l2_subdev *sd = i2c_get_clientdata(client); > struct ov2659 *ov2659 = to_ov2659(sd); > + int ret; > > dev_dbg(&client->dev, "%s:\n", __func__); > > + ret = clk_prepare_enable(ov2659->clk); > + if (ret) { > + dev_err(&client->dev, "%s: failed to enable clock\n", > + __func__); > + return ret; > + } > + > gpiod_set_value(ov2659->pwdn_gpio, 0); > > if (ov2659->resetb_gpio) { > @@ -1425,7 +1436,6 @@ static int ov2659_probe(struct i2c_client *client) > const struct ov2659_platform_data *pdata = ov2659_get_pdata(client); > struct v4l2_subdev *sd; > struct ov2659 *ov2659; > - struct clk *clk; > int ret; > > if (!pdata) { > @@ -1440,11 +1450,11 @@ static int ov2659_probe(struct i2c_client *client) > ov2659->pdata = pdata; > ov2659->client = client; > > - clk = devm_clk_get(&client->dev, "xvclk"); > - if (IS_ERR(clk)) > - return PTR_ERR(clk); > + ov2659->clk = devm_clk_get(&client->dev, "xvclk"); > + if (IS_ERR(ov2659->clk)) > + return PTR_ERR(ov2659->clk); > > - ov2659->xvclk_frequency = clk_get_rate(clk); > + ov2659->xvclk_frequency = clk_get_rate(ov2659->clk); > if (ov2659->xvclk_frequency < 600 || > ov2659->xvclk_frequency > 2700) > return -EINVAL; > @@ -1506,7 +1516,9 @@ static int ov2659_probe(struct i2c_client *client) > ov2659->frame_size = &ov2659_framesizes[2]; > ov2659->format_ctrl_regs = ov2659_formats[0].format_ctrl_regs; > > - ov2659_power_on(&client->dev); > + ret = ov2659_power_on(&client->dev); > + if (ret < 0) > + goto error; > > ret = ov2659_detect(sd); > if (ret < 0) > -- > 2.7.4 >
Re: [RFC] memory reserve for userspace oom-killer
On Mon 19-04-21 18:44:02, Shakeel Butt wrote: > Proposal: Provide memory guarantees to userspace oom-killer. > > Background: > > Issues with kernel oom-killer: > 1. Very conservative and prefer to reclaim. Applications can suffer > for a long time. > 2. Borrows the context of the allocator which can be resource limited > (low sched priority or limited CPU quota). > 3. Serialized by global lock. > 4. Very simplistic oom victim selection policy. > > These issues are resolved through userspace oom-killer by: > 1. Ability to monitor arbitrary metrics (PSI, vmstat, memcg stats) to > early detect suffering. > 2. Independent process context which can be given dedicated CPU quota > and high scheduling priority. > 3. Can be more aggressive as required. > 4. Can implement sophisticated business logic/policies. > > Android's LMKD and Facebook's oomd are the prime examples of userspace > oom-killers. One of the biggest challenges for userspace oom-killers > is to potentially function under intense memory pressure and are prone > to getting stuck in memory reclaim themselves. Current userspace > oom-killers aim to avoid this situation by preallocating user memory > and protecting themselves from global reclaim by either mlocking or > memory.min. However a new allocation from userspace oom-killer can > still get stuck in the reclaim and policy rich oom-killer do trigger > new allocations through syscalls or even heap. Can you be more specific please? > Our attempt of userspace oom-killer faces similar challenges. > Particularly at the tail on the very highly utilized machines we have > observed userspace oom-killer spectacularly failing in many possible > ways in the direct reclaim. We have seen oom-killer stuck in direct > reclaim throttling, stuck in reclaim and allocations from interrupts > keep stealing reclaimed memory. We have even observed systems where > all the processes were stuck in throttle_direct_reclaim() and only > kswapd was running and the interrupts kept stealing the memory > reclaimed by kswapd. > > To reliably solve this problem, we need to give guaranteed memory to > the userspace oom-killer. There is nothing like that. Even memory reserves are a finite resource which can be consumed as it is sharing those reserves with other users who are not necessarily coordinated. So before we start discussing making this even more muddy by handing over memory reserves to the userspace we should really examine whether pre-allocation is something that will not work. > At the moment we are contemplating between > the following options and I would like to get some feedback. > > 1. prctl(PF_MEMALLOC) > > The idea is to give userspace oom-killer (just one thread which is > finding the appropriate victims and will be sending SIGKILLs) access > to MEMALLOC reserves. Most of the time the preallocation, mlock and > memory.min will be good enough but for rare occasions, when the > userspace oom-killer needs to allocate, the PF_MEMALLOC flag will > protect it from reclaim and let the allocation dip into the memory > reserves. I do not think that handing over an unlimited ticket to the memory reserves to userspace is a good idea. Even the in kernel oom killer is bound to a partial access to reserves. So if we really want this then it should be in sync with and bound by the ALLOC_OOM. > The misuse of this feature would be risky but it can be limited to > privileged applications. Userspace oom-killer is the only appropriate > user of this feature. This option is simple to implement. > > 2. 
Mempool > > The idea is to preallocate mempool with a given amount of memory for > userspace oom-killer. Preferably this will be per-thread and > oom-killer can preallocate mempool for its specific threads. The core > page allocator can check before going to the reclaim path if the task > has private access to the mempool and return page from it if yes. Could you elaborate some more on how this would be controlled from the userspace? A dedicated syscall? A driver? > This option would be more complicated than the previous option as the > lifecycle of the page from the mempool would be more sophisticated. > Additionally the current mempool does not handle higher order pages > and we might need to extend it to allow such allocations. Though this > feature might have more use-cases and it would be less risky than the > previous option. I would tend to agree. > Another idea I had was to use kthread based oom-killer and provide the > policies through eBPF program. Though I am not sure how to make it > monitor arbitrary metrics and if that can be done without any > allocations. A kernel module or eBPF to implement oom decisions has already been discussed few years back. But I am afraid this would be hard to wire in for anything except for the victim selection. I am not sure it is maintainable to also control when the OOM handling should trigger. -- Michal Hocko SUSE Labs
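For context on option 1: inside the kernel, PF_MEMALLOC is already scoped around allocation sites with the memalloc_noreclaim_save()/memalloc_noreclaim_restore() helpers; the proposed prctl would in effect pin the same flag on a privileged userspace task. A minimal in-kernel sketch of the existing pattern (the userspace-facing prctl itself does not exist today):

#include <linux/sched/mm.h>
#include <linux/slab.h>

static void *alloc_without_reclaim(size_t size)
{
	unsigned int noreclaim_flag;
	void *p;

	noreclaim_flag = memalloc_noreclaim_save();	/* sets PF_MEMALLOC */
	p = kmalloc(size, GFP_KERNEL);			/* may dip into reserves, won't reclaim */
	memalloc_noreclaim_restore(noreclaim_flag);

	return p;
}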
Re: [PATCH] hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()
On Mon, Apr 19 2021 at 20:12, Maciej Żenczykowski wrote:
> On Thu, Apr 15, 2021 at 9:47 AM Thomas Gleixner wrote:
>> Run the test on kernels with and without that commit and collect trace
>> data for both.
>>
>> That should give me a pretty clear picture what's going on.
>
> Lorenzo is trying to get the traces you asked for, or rather he’s
> trying to get confirmation he can post them.
>
> Our initial observation of these results seems to suggest that
> updating the timer (hrtimer_start, which seems to also call
> hrtimer_cancel) takes twice as long as it used to.

Which contradicts my measurements. The change in complexity is marginal and
the overhead in cycles/instructions is close to noise. It's measurable in a
microbenchmark, but it's in the < 1% range, which is far away from the 60%
you are seeing.

> My gut feeling is that softirq_activated is usually false, and the old
> code in such a case calls just __hrtimer_get_next_event(, HRTIMER_ACTIVE_ALL).
> While the new code will first call __hrtimer_get_next_event(, HRTIMER_ACTIVE_SOFT)
> and then __hrtimer_get_next_event(, HRTIMER_ACTIVE_HARD)
>
> Perhaps __hrtimer_get_next_event() should return both soft and hard
> event times in one function call?
> Or perhaps hrtimer_start should not call hrtimer_cancel?

Perhaps we do a proper analysis first :)

Thanks,

        tglx
Re: [PATCH v3 3/4] arm: dts: imx: Add i.mx6q DaSheng COM-9XX SBC board support
Hi, Just a gentle ping, hope some expert could take a look, thanks. Best regards. Dillon On Thu, Apr 15, 2021 at 12:05 PM wrote: > > From: dillon min > > The DaSheng Com-9xx is and ARM based signle board computer (SBC) > featuring: > - i.MX6Q > - 2GiB LPDDR3 DRAM > - 8GiB eMMC 5.0 FLASH > - 4MiB SPI Flash > - USB 2.0 Host/Device > - Multiple multi-protocol RS232/RS485 Serial ports > - microSD socket > - 5V DC power input > - HDMI1.4a,1080p@60 > - RGMIIx1 Gigabit Ethernet > - CSI0x1, connect with ov2659 > > Signed-off-by: dillon min > --- > v3: move imx6q-ds.dtb after imx6q-dms-ba16.dtb to follow the alphabetical > order > > arch/arm/boot/dts/Makefile| 1 + > arch/arm/boot/dts/imx6q-ds.dts| 17 ++ > arch/arm/boot/dts/imx6qdl-ds.dtsi | 465 > ++ > 3 files changed, 483 insertions(+) > create mode 100644 arch/arm/boot/dts/imx6q-ds.dts > create mode 100644 arch/arm/boot/dts/imx6qdl-ds.dtsi > > diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile > index a19c5ab9df84..425fe17ef7c1 100644 > --- a/arch/arm/boot/dts/Makefile > +++ b/arch/arm/boot/dts/Makefile > @@ -510,6 +510,7 @@ dtb-$(CONFIG_SOC_IMX6Q) += \ > imx6q-display5-tianma-tm070-1280x768.dtb \ > imx6q-dmo-edmqmx6.dtb \ > imx6q-dms-ba16.dtb \ > + imx6q-ds.dtb \ > imx6q-emcon-avari.dtb \ > imx6q-evi.dtb \ > imx6q-gk802.dtb \ > diff --git a/arch/arm/boot/dts/imx6q-ds.dts b/arch/arm/boot/dts/imx6q-ds.dts > new file mode 100644 > index ..b0a63a133977 > --- /dev/null > +++ b/arch/arm/boot/dts/imx6q-ds.dts > @@ -0,0 +1,17 @@ > +// SPDX-License-Identifier: GPL-2.0+ > +// > +// Copyright 2021 Dillon Min > +// > +// Based on imx6qdl-sabresd.dtsi which is: > +// Copyright 2012 Freescale Semiconductor, Inc. > +// Copyright 2011 Linaro Ltd. > + > +/dts-v1/; > + > +#include "imx6q.dtsi" > +#include "imx6qdl-ds.dtsi" > + > +/ { > + model = "DaSheng i.MX6 Quad Com-9xx Board"; > + compatible = "ds,imx6q-sbc", "fsl,imx6q"; > +}; > diff --git a/arch/arm/boot/dts/imx6qdl-ds.dtsi > b/arch/arm/boot/dts/imx6qdl-ds.dtsi > new file mode 100644 > index ..d28e065349cd > --- /dev/null > +++ b/arch/arm/boot/dts/imx6qdl-ds.dtsi > @@ -0,0 +1,465 @@ > +// SPDX-License-Identifier: GPL-2.0+ > +// > +// Copyright 2021 Dillon Min > +// > +// Based on imx6qdl-sabresd.dtsi which is: > +// Copyright 2012 Freescale Semiconductor, Inc. > +// Copyright 2011 Linaro Ltd. 
> + > +#include > +#include > +#include > + > +/ { > + chosen { > + stdout-path = &uart1; > + }; > + > + memory@1000 { > + device_type = "memory"; > + reg = <0x1000 0x8000>; > + }; > + > + reg_usb_otg_vbus: regulator-usb-otg-vbus { > + compatible = "regulator-fixed"; > + regulator-name = "usb_otg_vbus"; > + regulator-min-microvolt = <500>; > + regulator-max-microvolt = <500>; > + regulator-always-on; > + }; > + > + reg_usb_h1_vbus: regulator-usb-h1-vbus { > + compatible = "regulator-fixed"; > + regulator-name = "usb_h1_vbus"; > + regulator-min-microvolt = <500>; > + regulator-max-microvolt = <500>; > + regulator-always-on; > + }; > + > + leds { > + compatible = "gpio-leds"; > + pinctrl-names = "default"; > + pinctrl-0 = <&pinctrl_gpio_leds>; > + > + green { > + gpios = <&gpio4 8 0>; > + default-state = "on"; > + linux,default-trigger = "heartbeat"; > + }; > + }; > +}; > + > +&ipu1_csi0_from_ipu1_csi0_mux { > + bus-width = <8>; > + data-shift = <12>; /* Lines 19:12 used */ > + hsync-active = <1>; > + vsync-active = <1>; > +}; > + > +&ipu1_csi0_mux_from_parallel_sensor { > + remote-endpoint = <&ov2659_to_ipu1_csi0_mux>; > +}; > + > +&ipu1_csi0 { > + pinctrl-names = "default"; > + pinctrl-0 = <&pinctrl_ipu1_csi0>; > + status = "okay"; > +}; > + > +&clks { > + assigned-clocks = <&clks IMX6QDL_CLK_LDB_DI0_SEL>, > + <&clks IMX6QDL_CLK_LDB_DI1_SEL>; > + assigned-clock-parents = <&clks IMX6QDL_CLK_PLL3_USB_OTG>, > +<&clks IMX6QDL_CLK_PLL3_USB_OTG>; > +}; > + > +&ecspi1 { > + cs-gpios = <&gpio4 9 GPIO_ACTIVE_LOW>; > + pinctrl-names = "default"; > + pinctrl-0 = <&pinctrl_ecspi1>; > + status = "okay"; > + > + flash: m25p80@0 { > + #address-cells = <1>; > + #size-cells = <1>; > + compatible = "st,m25p80", "jedec,spi-nor"; > + spi-max-frequency = <2000>; > + reg = <0>; > + }; > +}; > + > +&fec { > + pinctrl-names = "default"; > + pinctrl-0 =
[PATCH] MIPS: pci-legacy: revert "use generic pci_enable_resources"
This mostly reverts commit 99bca615d895 ("MIPS: pci-legacy: use generic pci_enable_resources"). Fixes regressions such as: ata_piix :00:0a.1: can't enable device: BAR 0 [io 0x01f0-0x01f7] not claimed ata_piix: probe of :00:0a.1 failed with error -22 The only changes from the strict revert are to fix checkpatch errors: ERROR: spaces required around that '=' (ctx:VxV) #33: FILE: arch/mips/pci/pci-legacy.c:252: + for (idx=0; idx < PCI_NUM_RESOURCES; idx++) { ^ ERROR: do not use assignment in if condition #67: FILE: arch/mips/pci/pci-legacy.c:284: + if ((err = pcibios_enable_resources(dev, mask)) < 0) Reported-by: Guenter Roeck Signed-off-by: Ilya Lipnitskiy --- Guenter, sorry about this - let's just revert this for now to minimize turmoil to the legacy PCI code. Obviously, this needs more testing before applying. Thanks for your report! arch/mips/pci/pci-legacy.c | 38 +- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/arch/mips/pci/pci-legacy.c b/arch/mips/pci/pci-legacy.c index c24226ea0a6e..468722c8a5c6 100644 --- a/arch/mips/pci/pci-legacy.c +++ b/arch/mips/pci/pci-legacy.c @@ -241,9 +241,45 @@ static int __init pcibios_init(void) subsys_initcall(pcibios_init); +static int pcibios_enable_resources(struct pci_dev *dev, int mask) +{ + u16 cmd, old_cmd; + int idx; + struct resource *r; + + pci_read_config_word(dev, PCI_COMMAND, &cmd); + old_cmd = cmd; + for (idx = 0; idx < PCI_NUM_RESOURCES; idx++) { + /* Only set up the requested stuff */ + if (!(mask & (1flags & (IORESOURCE_IO | IORESOURCE_MEM))) + continue; + if ((idx == PCI_ROM_RESOURCE) && + (!(r->flags & IORESOURCE_ROM_ENABLE))) + continue; + if (!r->start && r->end) { + pci_err(dev, + "can't enable device: resource collisions\n"); + return -EINVAL; + } + if (r->flags & IORESOURCE_IO) + cmd |= PCI_COMMAND_IO; + if (r->flags & IORESOURCE_MEM) + cmd |= PCI_COMMAND_MEMORY; + } + if (cmd != old_cmd) { + pci_info(dev, "enabling device (%04x -> %04x)\n", old_cmd, cmd); + pci_write_config_word(dev, PCI_COMMAND, cmd); + } + return 0; +} + int pcibios_enable_device(struct pci_dev *dev, int mask) { - int err = pci_enable_resources(dev, mask); + int err = pcibios_enable_resources(dev, mask); if (err < 0) return err; -- 2.31.1
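The resource loop in the hunk above was mangled in transit ("if (!(mask & (1flags & ..."); based on the surviving fragments and the classic pcibios_enable_resources() it presumably reads roughly as follows. This is a reconstruction for readability, not a verbatim copy of the posted patch:

	pci_read_config_word(dev, PCI_COMMAND, &cmd);
	old_cmd = cmd;
	for (idx = 0; idx < PCI_NUM_RESOURCES; idx++) {
		/* Only set up the requested stuff */
		if (!(mask & (1 << idx)))
			continue;

		r = &dev->resource[idx];
		if (!(r->flags & (IORESOURCE_IO | IORESOURCE_MEM)))
			continue;
		if ((idx == PCI_ROM_RESOURCE) &&
		    (!(r->flags & IORESOURCE_ROM_ENABLE)))
			continue;
		if (!r->start && r->end) {
			pci_err(dev,
				"can't enable device: resource collisions\n");
			return -EINVAL;
		}
		if (r->flags & IORESOURCE_IO)
			cmd |= PCI_COMMAND_IO;
		if (r->flags & IORESOURCE_MEM)
			cmd |= PCI_COMMAND_MEMORY;
	}
	if (cmd != old_cmd) {
		pci_info(dev, "enabling device (%04x -> %04x)\n", old_cmd, cmd);
		pci_write_config_word(dev, PCI_COMMAND, cmd);
	}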
Re: [PATCH V3 2/4] soc: qcom: dcc:Add driver support for Data Capture and Compare unit(DCC)
On 2021-04-19 20:02, Bjorn Andersson wrote: On Mon 19 Apr 05:32 CDT 2021, schow...@codeaurora.org wrote: On 2021-04-15 12:01, Felipe Balbi wrote: > Hi, > > Souradeep Chowdhury writes: > > diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile > > index ad675a6..e7f0ccb 100644 > > --- a/drivers/soc/qcom/Makefile > > +++ b/drivers/soc/qcom/Makefile > > @@ -1,19 +1,22 @@ > > # SPDX-License-Identifier: GPL-2.0 > > CFLAGS_rpmh-rsc.o := -I$(src) > > obj-$(CONFIG_QCOM_AOSS_QMP) += qcom_aoss.o > > -obj-$(CONFIG_QCOM_GENI_SE) +=qcom-geni-se.o > > +obj-$(CONFIG_QCOM_APR) += apr.o > > obj-$(CONFIG_QCOM_COMMAND_DB) += cmd-db.o > > obj-$(CONFIG_QCOM_CPR) += cpr.o > > +obj-$(CONFIG_QCOM_DCC) += dcc.o > > +obj-$(CONFIG_QCOM_GENI_SE) += qcom-geni-se.o > > obj-$(CONFIG_QCOM_GSBI) += qcom_gsbi.o > > +obj-$(CONFIG_QCOM_KRYO_L2_ACCESSORS) += kryo-l2-accessors.o > > +obj-$(CONFIG_QCOM_LLCC) += llcc-qcom.o > > obj-$(CONFIG_QCOM_MDT_LOADER)+= mdt_loader.o > > obj-$(CONFIG_QCOM_OCMEM) += ocmem.o > > obj-$(CONFIG_QCOM_PDR_HELPERS) += pdr_interface.o > > obj-$(CONFIG_QCOM_QMI_HELPERS) += qmi_helpers.o > > -qmi_helpers-y+= qmi_encdec.o qmi_interface.o > > obj-$(CONFIG_QCOM_RMTFS_MEM) += rmtfs_mem.o > > obj-$(CONFIG_QCOM_RPMH) += qcom_rpmh.o > > -qcom_rpmh-y += rpmh-rsc.o > > -qcom_rpmh-y += rpmh.o > > +obj-$(CONFIG_QCOM_RPMHPD) += rpmhpd.o > > +obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o > > obj-$(CONFIG_QCOM_SMD_RPM) += smd-rpm.o > > obj-$(CONFIG_QCOM_SMEM) += smem.o > > obj-$(CONFIG_QCOM_SMEM_STATE) += smem_state.o > > @@ -21,8 +24,6 @@ obj-$(CONFIG_QCOM_SMP2P)+= smp2p.o > > obj-$(CONFIG_QCOM_SMSM) += smsm.o > > obj-$(CONFIG_QCOM_SOCINFO) += socinfo.o > > obj-$(CONFIG_QCOM_WCNSS_CTRL) += wcnss_ctrl.o > > -obj-$(CONFIG_QCOM_APR) += apr.o > > -obj-$(CONFIG_QCOM_LLCC) += llcc-qcom.o > > -obj-$(CONFIG_QCOM_RPMHPD) += rpmhpd.o > > -obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o > > -obj-$(CONFIG_QCOM_KRYO_L2_ACCESSORS) += kryo-l2-accessors.o > > +qmi_helpers-y += qmi_encdec.o qmi_interface.o > > +qcom_rpmh-y += rpmh-rsc.o > > +qcom_rpmh-y += rpmh.o > > why so many changes? This has been accidentally sorted based on the config names. Will be fixing this in next version of the patch. > > > diff --git a/drivers/soc/qcom/dcc.c b/drivers/soc/qcom/dcc.c > > new file mode 100644 > > index 000..fcd5580 > > --- /dev/null > > +++ b/drivers/soc/qcom/dcc.c > > @@ -0,0 +1,1539 @@ > > +// SPDX-License-Identifier: GPL-2.0-only > > +/* > > + * Copyright (c) 2015-2021, The Linux Foundation. All rights > > reserved. > > + */ > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > + > > one blank line is enough Ack > > > +#define TIMEOUT_US 100 > > + > > +#define dcc_writel(drvdata, val, off) \ > > + writel((val), drvdata->base + dcc_offset_conv(drvdata, off)) > > +#define dcc_readl(drvdata, off) \ > > + readl(drvdata->base + dcc_offset_conv(drvdata, off)) > > + > > +#define dcc_sram_readl(drvdata, off) \ > > + readl(drvdata->ram_base + off) > > this would be probably be better as static inlines. These are simple read and write operations used in the driver which just calls the generic writel and readl function. That's why macros have been used here to lesson the overhead of an extra function call. The compiler will realize that your static dcc_sram_readl() is cheaper to inline than call and do so for you. 
So you can expect that there's no difference in the output from the compiler, and if there is then the compiler knows something that you're overlooking.

Ack. Will go for static inline here.

Regards,
Bjorn
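For illustration, the static-inline form Felipe and Bjorn are suggesting would look roughly like this; the dcc_drvdata layout and dcc_offset_conv() are assumed from the posted driver, so this is a sketch rather than the final code:

static inline void dcc_writel(struct dcc_drvdata *drvdata, u32 val, u32 off)
{
	writel(val, drvdata->base + dcc_offset_conv(drvdata, off));
}

static inline u32 dcc_readl(struct dcc_drvdata *drvdata, u32 off)
{
	return readl(drvdata->base + dcc_offset_conv(drvdata, off));
}

static inline u32 dcc_sram_readl(struct dcc_drvdata *drvdata, u32 off)
{
	return readl(drvdata->ram_base + off);
}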
Re: [PATCH -next] tools/testing/nvdimm: Make symbol '__nfit_test_ioremap' static
Hi Zou, Zou Wei writes: > The sparse tool complains as follows: > > tools/testing/nvdimm/test/iomap.c:65:14: warning: > symbol '__nfit_test_ioremap' was not declared. Should it be static? > > This symbol is not used outside of security.c, so this s/security.c/iomap.c/ Thanks, Santosh > commit marks it static. > > Reported-by: Hulk Robot > Signed-off-by: Zou Wei > --- > tools/testing/nvdimm/test/iomap.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tools/testing/nvdimm/test/iomap.c > b/tools/testing/nvdimm/test/iomap.c > index c62d372..ed563bd 100644 > --- a/tools/testing/nvdimm/test/iomap.c > +++ b/tools/testing/nvdimm/test/iomap.c > @@ -62,7 +62,7 @@ struct nfit_test_resource *get_nfit_res(resource_size_t > resource) > } > EXPORT_SYMBOL(get_nfit_res); > > -void __iomem *__nfit_test_ioremap(resource_size_t offset, unsigned long size, > +static void __iomem *__nfit_test_ioremap(resource_size_t offset, unsigned > long size, > void __iomem *(*fallback_fn)(resource_size_t, unsigned long)) > { > struct nfit_test_resource *nfit_res = get_nfit_res(offset); > -- > 2.6.2 > ___ > Linux-nvdimm mailing list -- linux-nvd...@lists.01.org > To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH AUTOSEL 5.11 17/23] i2c: mv64xxx: Fix random system lock caused by runtime PM
On Mon, 19 Apr 2021 16:43:36 -0400 Sasha Levin wrote:
> This first appeared with commit e5c02cf54154 ("i2c: mv64xxx: Add runtime
> PM support").

I forgot to add a Fixes: tag to this commit. But the bug first appeared with commit e5c02cf54154 ("i2c: mv64xxx: Add runtime PM support"), which is in 5.12, but not 5.11 or any others. So this fix is not needed for the stable releases (although it does not break anything on those...).

Marek
Re: [PATCH 5.11 000/122] 5.11.16-rc1 review
On Mon, 19 Apr 2021 at 18:39, Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 5.11.16 release. > There are 122 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Wed, 21 Apr 2021 13:05:09 +. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.11.16-rc1.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-5.11.y > and the diffstat can be found below. > > thanks, > > greg k-h Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386. Tested-by: Linux Kernel Functional Testing ## Build * kernel: 5.11.16-rc1 * git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git * git branch: linux-5.11.y * git commit: f34f787f0e47983f4f18c818b2cf23f5e3bf576e * git describe: v5.11.15-123-gf34f787f0e47 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.11.y/build/v5.11.15-123-gf34f787f0e47 ## No regressions (compared to v5.11.14-24-g7825299a896f) ## No fixes (compared to v5.11.14-24-g7825299a896f) ## Test result summary total: 76776, pass: 63385, fail: 2827, skip: 10320, xfail: 244, ## Build Summary * arc: 10 total, 10 passed, 0 failed * arm: 192 total, 192 passed, 0 failed * arm64: 26 total, 26 passed, 0 failed * dragonboard-410c: 2 total, 2 passed, 0 failed * hi6220-hikey: 2 total, 2 passed, 0 failed * i386: 26 total, 26 passed, 0 failed * juno-r2: 2 total, 2 passed, 0 failed * mips: 45 total, 45 passed, 0 failed * parisc: 9 total, 9 passed, 0 failed * powerpc: 27 total, 27 passed, 0 failed * riscv: 21 total, 21 passed, 0 failed * s390: 18 total, 18 passed, 0 failed * sh: 18 total, 18 passed, 0 failed * sparc: 9 total, 9 passed, 0 failed * x15: 1 total, 0 passed, 1 failed * x86: 2 total, 2 passed, 0 failed * x86_64: 26 total, 26 passed, 0 failed ## Test suites summary * fwts * igt-gpu-tools * install-android-platform-tools-r2600 * kselftest- * kselftest-android * kselftest-bpf * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-lkdtm * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-vsyscall-mode-native- * kselftest-vsyscall-mode-none- * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libhugetlbfs * linux-log-parser * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * 
ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * network-basic-tests * perf * rcutorture * ssuite * v4l2-compliance -- Linaro LKFT https://lkft.linaro.org
[tip:auto-latest] BUILD SUCCESS e48f64fea89e61301248c4be02b406d934a4bac3
nfig alpha defconfig alphaallyesconfig h8300allyesconfig arc defconfig sh allmodconfig parisc defconfig s390 allyesconfig s390 allmodconfig parisc allyesconfig s390defconfig sparcallyesconfig sparc defconfig i386defconfig mips allyesconfig mips allmodconfig powerpc allyesconfig powerpc allmodconfig powerpc allnoconfig x86_64 randconfig-a003-20210419 x86_64 randconfig-a001-20210419 x86_64 randconfig-a005-20210419 x86_64 randconfig-a002-20210419 x86_64 randconfig-a006-20210419 x86_64 randconfig-a004-20210419 i386 randconfig-a003-20210419 i386 randconfig-a001-20210419 i386 randconfig-a006-20210419 i386 randconfig-a005-20210419 i386 randconfig-a004-20210419 i386 randconfig-a002-20210419 i386 randconfig-a015-20210419 i386 randconfig-a013-20210419 i386 randconfig-a014-20210419 i386 randconfig-a016-20210419 i386 randconfig-a012-20210419 i386 randconfig-a011-20210419 riscvnommu_k210_defconfig riscvnommu_virt_defconfig riscv allnoconfig riscv defconfig riscv rv32_defconfig um allmodconfig umallnoconfig um allyesconfig um defconfig x86_64rhel-8.3-kselftests x86_64 defconfig x86_64 rhel-8.3 x86_64 rhel-8.3-kbuiltin x86_64 kexec clang tested configs: x86_64 randconfig-a014-20210419 x86_64 randconfig-a015-20210419 x86_64 randconfig-a013-20210419 x86_64 randconfig-a011-20210419 x86_64 randconfig-a012-20210419 x86_64 randconfig-a016-20210419 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
Re: [PATCH 2/4] PKCS#7: Check codeSigning EKU for kernel module and kexec pe verification
Hi Varad, Thanks for your review! On Thu, Apr 15, 2021 at 02:08:32PM +0200, Varad Gautam wrote: > Hi Joey, > > On 4/9/21 4:46 AM, Lee, Chun-Yi wrote: > > This patch adds the logic for checking the CodeSigning extended > > key usage when verifying signature of kernel module or > > kexec PE binary in PKCS#7. > > > > Signed-off-by: "Lee, Chun-Yi" > > --- > > certs/system_keyring.c | 2 +- > > crypto/asymmetric_keys/Kconfig | 9 + > > crypto/asymmetric_keys/pkcs7_trust.c | 37 > > +--- > > include/crypto/pkcs7.h | 3 ++- > > 4 files changed, 46 insertions(+), 5 deletions(-) > > > > diff --git a/certs/system_keyring.c b/certs/system_keyring.c > > index 4b693da488f1..c9f8bca0b0d3 100644 > > --- a/certs/system_keyring.c > > +++ b/certs/system_keyring.c > > @@ -243,7 +243,7 @@ int verify_pkcs7_message_sig(const void *data, size_t > > len, > > goto error; > > } > > } > > - ret = pkcs7_validate_trust(pkcs7, trusted_keys); > > + ret = pkcs7_validate_trust(pkcs7, trusted_keys, usage); > > if (ret < 0) { > > if (ret == -ENOKEY) > > pr_devel("PKCS#7 signature not signed with a trusted > > key\n"); > > diff --git a/crypto/asymmetric_keys/Kconfig b/crypto/asymmetric_keys/Kconfig > > index 1f1f004dc757..1754812df989 100644 > > --- a/crypto/asymmetric_keys/Kconfig > > +++ b/crypto/asymmetric_keys/Kconfig > > @@ -96,4 +96,13 @@ config SIGNED_PE_FILE_VERIFICATION > > This option provides support for verifying the signature(s) on a > > signed PE binary. > > > > +config CHECK_CODESIGN_EKU > > + bool "Check codeSigning extended key usage" > > + depends on PKCS7_MESSAGE_PARSER=y > > + depends on SYSTEM_DATA_VERIFICATION > > + help > > + This option provides support for checking the codeSigning extended > > + key usage when verifying the signature in PKCS#7. It affects kernel > > + module verification and kexec PE binary verification. > > + > > endif # ASYMMETRIC_KEY_TYPE > > diff --git a/crypto/asymmetric_keys/pkcs7_trust.c > > b/crypto/asymmetric_keys/pkcs7_trust.c > > index b531df2013c4..077bfef928b6 100644 > > --- a/crypto/asymmetric_keys/pkcs7_trust.c > > +++ b/crypto/asymmetric_keys/pkcs7_trust.c > > @@ -16,12 +16,36 @@ > > #include > > #include "pkcs7_parser.h" > > > > +#ifdef CONFIG_CHECK_CODESIGN_EKU > > +static bool check_codesign_eku(struct key *key, > > +enum key_being_used_for usage) > > +{ > > + struct public_key *public_key = key->payload.data[asym_crypto]; > > + > > + switch (usage) { > > + case VERIFYING_MODULE_SIGNATURE: > > + case VERIFYING_KEXEC_PE_SIGNATURE: > > + return !!(public_key->eku & EKU_codeSigning); > > + default: > > + break; > > + } > > + return true; > > +} > > +#else > > +static bool check_codesign_eku(struct key *key, > > +enum key_being_used_for usage) > > +{ > > + return true; > > +} > > +#endif > > + > > /* > > * Check the trust on one PKCS#7 SignedInfo block. > > */ > > static int pkcs7_validate_trust_one(struct pkcs7_message *pkcs7, > > struct pkcs7_signed_info *sinfo, > > - struct key *trust_keyring) > > + struct key *trust_keyring, > > + enum key_being_used_for usage) > > { > > struct public_key_signature *sig = sinfo->sig; > > struct x509_certificate *x509, *last = NULL, *p; > > @@ -112,6 +136,12 @@ static int pkcs7_validate_trust_one(struct > > pkcs7_message *pkcs7, > > return -ENOKEY; > > > > matched: > > + if (!check_codesign_eku(key, usage)) { > > Perhaps this can be a generic check_eku_usage() call, with codesigning as one > of the > things it can check for. > Because only codesign EKU be checked now. 
So I prefer to keep it as my current implementation until there have other EKU requirement. > > + pr_warn("sinfo %u: The signer %x key is not CodeSigning\n", > > + sinfo->index, key_serial(key)); > > + key_put(key); > > + return -ENOKEY; > > + } > > ret = verify_signature(key, sig); > > key_put(key); > > if (ret < 0) { > > @@ -156,7 +186,8 @@ static int pkcs7_validate_trust_one(struct > > pkcs7_message *pkcs7, > > * May also return -ENOMEM. > > */ > > int pkcs7_validate_trust(struct pkcs7_message *pkcs7, > > -struct key *trust_keyring) > > +struct key *trust_keyring, > > +enum key_being_used_for usage) > > Please also update the comment above to mention the `usage` parameter. > > Regards, > Varad I will update it to the comment. Thanks! Joey Lee > > > { > > struct pkcs7_signed_info *sinfo; > > struct x509_certificate *p; > > @@ -167,7 +198,7 @@ int pkcs7_validate_trust(struct pkcs7_message *pkcs7, > >
Re: [PATCH v2] irqchip/xilinx: Expose Kconfig option for Zynq/ZynqMP
On 4/19/21 9:42 PM, Robert Hancock wrote: > Previously the XILINX_INTC config option was hidden and only > auto-selected on the MicroBlaze platform. However, this IP can also be > used on the Zynq and ZynqMP platforms as a secondary cascaded > controller. Allow this option to be user-enabled on those platforms. > > Signed-off-by: Robert Hancock > --- > drivers/irqchip/Kconfig | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig > index 15536e321df5..1020cc5a7800 100644 > --- a/drivers/irqchip/Kconfig > +++ b/drivers/irqchip/Kconfig > @@ -279,8 +279,13 @@ config XTENSA_MX > select GENERIC_IRQ_EFFECTIVE_AFF_MASK > > config XILINX_INTC > - bool > + bool "Xilinx Interrupt Controller IP" > + depends on MICROBLAZE || ARCH_ZYNQ || ARCH_ZYNQMP || COMPILE_TEST > select IRQ_DOMAIN > + help > + Support for the Xilinx Interrupt Controller IP core. > + This is used as a primary controller with MicroBlaze and can also > + be used as a secondary chained controller on other platforms. > > config IRQ_CROSSBAR > bool > Acked-by: Michal Simek Thanks, Michal
[PATCH v10 1/7] dt-bindings: PCI: mediatek-gen3: Add YAML schema
Add YAML schemas documentation for Gen3 PCIe controller on MediaTek SoCs. Signed-off-by: Jianjun Wang Acked-by: Ryder Lee Reviewed-by: Rob Herring --- .../bindings/pci/mediatek-pcie-gen3.yaml | 181 ++ 1 file changed, 181 insertions(+) create mode 100644 Documentation/devicetree/bindings/pci/mediatek-pcie-gen3.yaml diff --git a/Documentation/devicetree/bindings/pci/mediatek-pcie-gen3.yaml b/Documentation/devicetree/bindings/pci/mediatek-pcie-gen3.yaml new file mode 100644 index ..e7b1f9892da4 --- /dev/null +++ b/Documentation/devicetree/bindings/pci/mediatek-pcie-gen3.yaml @@ -0,0 +1,181 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/pci/mediatek-pcie-gen3.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Gen3 PCIe controller on MediaTek SoCs + +maintainers: + - Jianjun Wang + +description: |+ + PCIe Gen3 MAC controller for MediaTek SoCs, it supports Gen3 speed + and compatible with Gen2, Gen1 speed. + + This PCIe controller supports up to 256 MSI vectors, the MSI hardware + block diagram is as follows: + ++-+ +| GIC | ++-+ + ^ + | + port->irq + | + +-+-+-+-+-+-+-+-+ + |0|1|2|3|4|5|6|7| (PCIe intc) + +-+-+-+-+-+-+-+-+ +^ ^ ^ +| |...| ++---+ +--++---+ +||| + +-+-+---+--+--+ +-+-+---+--+--+ +-+-+---+--+--+ + |0|1|...|30|31| |0|1|...|30|31| |0|1|...|30|31| (MSI sets) + +-+-+---+--+--+ +-+-+---+--+--+ +-+-+---+--+--+ + ^ ^ ^ ^^ ^ ^ ^^ ^ ^ ^ + | | | || | | || | | | (MSI vectors) + | | | || | | || | | | + +(MSI SET0) (MSI SET1) ... (MSI SET7) + + With 256 MSI vectors supported, the MSI vectors are composed of 8 sets, + each set has its own address for MSI message, and supports 32 MSI vectors + to generate interrupt. + +allOf: + - $ref: /schemas/pci/pci-bus.yaml# + +properties: + compatible: +const: mediatek,mt8192-pcie + + reg: +maxItems: 1 + + reg-names: +items: + - const: pcie-mac + + interrupts: +maxItems: 1 + + ranges: +minItems: 1 +maxItems: 8 + + resets: +minItems: 1 +maxItems: 2 + + reset-names: +minItems: 1 +maxItems: 2 +items: + - const: phy + - const: mac + + clocks: +maxItems: 6 + + clock-names: +items: + - const: pl_250m + - const: tl_26m + - const: tl_96m + - const: tl_32k + - const: peri_26m + - const: top_133m + + assigned-clocks: +maxItems: 1 + + assigned-clock-parents: +maxItems: 1 + + phys: +maxItems: 1 + + '#interrupt-cells': +const: 1 + + interrupt-controller: +description: Interrupt controller node for handling legacy PCI interrupts. 
+type: object +properties: + '#address-cells': +const: 0 + '#interrupt-cells': +const: 1 + interrupt-controller: true + +required: + - '#address-cells' + - '#interrupt-cells' + - interrupt-controller + +additionalProperties: false + +required: + - compatible + - reg + - reg-names + - interrupts + - ranges + - clocks + - '#interrupt-cells' + - interrupt-controller + +unevaluatedProperties: false + +examples: + - | +#include +#include + +bus { +#address-cells = <2>; +#size-cells = <2>; + +pcie: pcie@1123 { +compatible = "mediatek,mt8192-pcie"; +device_type = "pci"; +#address-cells = <3>; +#size-cells = <2>; +reg = <0x00 0x1123 0x00 0x4000>; +reg-names = "pcie-mac"; +interrupts = ; +bus-range = <0x00 0xff>; +ranges = <0x8200 0x00 0x1200 0x00 + 0x1200 0x00 0x100>; +clocks = <&infracfg 44>, + <&infracfg 40>, + <&infracfg 43>, + <&infracfg 97>, + <&infracfg 99>, + <&infracfg 111>; +clock-names = "pl_250m", "tl_26m", "tl_96m", + "tl_32k", "peri_26m", "top_133m"; +assigned-clocks = <&topckgen 50>; +assigned-clock-parents = <&topckgen 91>; + +phys = <&pciephy>; +phy-names = "pcie-phy"; + +resets = <&infracfg_rst 2>, + <&infracfg_rst 3>; +reset-names = "phy", "mac"; + +#interrupt-cells = <1>; +interrupt-map-mask = <0 0 0 0x7>; +interrupt-map = <0 0 0 1 &pcie_intc 0>, +<0 0 0 2 &pcie_intc 1>, +<0 0 0 3 &pcie_intc 2>,
[PATCH v10 7/7] MAINTAINERS: Add Jianjun Wang as MediaTek PCI co-maintainer
Update entry for MediaTek PCIe controller, add Jianjun Wang as MediaTek PCI co-maintainer. Signed-off-by: Jianjun Wang Acked-by: Ryder Lee --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index d92f85ca831d..8050c14e6a7a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -13919,6 +13919,7 @@ F: drivers/pci/controller/dwc/pcie-histb.c PCIE DRIVER FOR MEDIATEK M: Ryder Lee +M: Jianjun Wang L: linux-...@vger.kernel.org L: linux-media...@lists.infradead.org S: Supported -- 2.25.1
[PATCH v10 6/7] PCI: mediatek-gen3: Add system PM support
Add suspend_noirq and resume_noirq callback functions to implement PM system suspend and resume hooks for the MediaTek Gen3 PCIe controller. When the system suspends, trigger the PCIe link to enter the L2 state and pull down the PERST# pin, gating the clocks of the MAC layer, and then power-off the physical layer to provide power-saving. When the system resumes, the PCIe link should be re-established and the related control register values should be restored. Signed-off-by: Jianjun Wang Acked-by: Ryder Lee --- drivers/pci/controller/pcie-mediatek-gen3.c | 113 1 file changed, 113 insertions(+) diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c index ee1b51207d11..20165e4a75b2 100644 --- a/drivers/pci/controller/pcie-mediatek-gen3.c +++ b/drivers/pci/controller/pcie-mediatek-gen3.c @@ -45,6 +45,9 @@ #define PCIE_PE_RSTB BIT(3) #define PCIE_LTSSM_STATUS_REG 0x150 +#define PCIE_LTSSM_STATE_MASK GENMASK(28, 24) +#define PCIE_LTSSM_STATE(val) ((val & PCIE_LTSSM_STATE_MASK) >> 24) +#define PCIE_LTSSM_STATE_L2_IDLE 0x14 #define PCIE_LINK_STATUS_REG 0x154 #define PCIE_PORT_LINKUP BIT(8) @@ -73,6 +76,9 @@ #define PCIE_MSI_SET_ADDR_HI_BASE 0xc80 #define PCIE_MSI_SET_ADDR_HI_OFFSET0x04 +#define PCIE_ICMD_PM_REG 0x198 +#define PCIE_TURN_OFF_LINK BIT(4) + #define PCIE_TRANS_TABLE_BASE_REG 0x800 #define PCIE_ATR_SRC_ADDR_MSB_OFFSET 0x4 #define PCIE_ATR_TRSL_ADDR_LSB_OFFSET 0x8 @@ -95,10 +101,12 @@ * struct mtk_msi_set - MSI information for each set * @base: IO mapped register base * @msg_addr: MSI message address + * @saved_irq_state: IRQ enable state saved at suspend time */ struct mtk_msi_set { void __iomem *base; phys_addr_t msg_addr; + u32 saved_irq_state; }; /** @@ -112,6 +120,7 @@ struct mtk_msi_set { * @clks: PCIe clocks * @num_clks: PCIe clocks count for this port * @irq: PCIe controller interrupt number + * @saved_irq_state: IRQ enable state saved at suspend time * @irq_lock: lock protecting IRQ register access * @intx_domain: legacy INTx IRQ domain * @msi_domain: MSI IRQ domain @@ -131,6 +140,7 @@ struct mtk_pcie_port { int num_clks; int irq; + u32 saved_irq_state; raw_spinlock_t irq_lock; struct irq_domain *intx_domain; struct irq_domain *msi_domain; @@ -894,6 +904,108 @@ static int mtk_pcie_remove(struct platform_device *pdev) return 0; } +static void __maybe_unused mtk_pcie_irq_save(struct mtk_pcie_port *port) +{ + int i; + + raw_spin_lock(&port->irq_lock); + + port->saved_irq_state = readl_relaxed(port->base + PCIE_INT_ENABLE_REG); + + for (i = 0; i < PCIE_MSI_SET_NUM; i++) { + struct mtk_msi_set *msi_set = &port->msi_sets[i]; + + msi_set->saved_irq_state = readl_relaxed(msi_set->base + + PCIE_MSI_SET_ENABLE_OFFSET); + } + + raw_spin_unlock(&port->irq_lock); +} + +static void __maybe_unused mtk_pcie_irq_restore(struct mtk_pcie_port *port) +{ + int i; + + raw_spin_lock(&port->irq_lock); + + writel_relaxed(port->saved_irq_state, port->base + PCIE_INT_ENABLE_REG); + + for (i = 0; i < PCIE_MSI_SET_NUM; i++) { + struct mtk_msi_set *msi_set = &port->msi_sets[i]; + + writel_relaxed(msi_set->saved_irq_state, + msi_set->base + PCIE_MSI_SET_ENABLE_OFFSET); + } + + raw_spin_unlock(&port->irq_lock); +} + +static int __maybe_unused mtk_pcie_turn_off_link(struct mtk_pcie_port *port) +{ + u32 val; + + val = readl_relaxed(port->base + PCIE_ICMD_PM_REG); + val |= PCIE_TURN_OFF_LINK; + writel_relaxed(val, port->base + PCIE_ICMD_PM_REG); + + /* Check the link is L2 */ + return readl_poll_timeout(port->base + PCIE_LTSSM_STATUS_REG, val, + 
(PCIE_LTSSM_STATE(val) == + PCIE_LTSSM_STATE_L2_IDLE), 20, + 50 * USEC_PER_MSEC); +} + +static int __maybe_unused mtk_pcie_suspend_noirq(struct device *dev) +{ + struct mtk_pcie_port *port = dev_get_drvdata(dev); + int err; + u32 val; + + /* Trigger link to L2 state */ + err = mtk_pcie_turn_off_link(port); + if (err) { + dev_err(port->dev, "cannot enter L2 state\n"); + return err; + } + + /* Pull down the PERST# pin */ + val = readl_relaxed(port->base + PCIE_RST_CTRL_REG); + val |= PCIE_PE_RSTB; + writel_relaxed(val, port->base + PCIE_RST_CTRL_REG); + + dev_dbg(port->dev, "entered L2 states successfully"); + + mtk_pcie_irq_save(port); + mtk_pcie_power_down(port); + + return 0; +} + +static int __maybe_unused mtk_pcie_resume_noirq(struct
[PATCH v10 5/7] PCI: mediatek-gen3: Add MSI support
Add MSI support for MediaTek Gen3 PCIe controller. This PCIe controller supports up to 256 MSI vectors, the MSI hardware block diagram is as follows: +-+ | GIC | +-+ ^ | port->irq | +-+-+-+-+-+-+-+-+ |0|1|2|3|4|5|6|7| (PCIe intc) +-+-+-+-+-+-+-+-+ ^ ^ ^ | |...| +---+ +--++---+ ||| +-+-+---+--+--+ +-+-+---+--+--+ +-+-+---+--+--+ |0|1|...|30|31| |0|1|...|30|31| |0|1|...|30|31| (MSI sets) +-+-+---+--+--+ +-+-+---+--+--+ +-+-+---+--+--+ ^ ^ ^ ^^ ^ ^ ^^ ^ ^ ^ | | | || | | || | | | (MSI vectors) | | | || | | || | | | (MSI SET0) (MSI SET1) ... (MSI SET7) With 256 MSI vectors supported, the MSI vectors are composed of 8 sets, each set has its own address for MSI message, and supports 32 MSI vectors to generate interrupt. Signed-off-by: Jianjun Wang Acked-by: Ryder Lee Reviewed-by: Marc Zyngier --- drivers/pci/controller/pcie-mediatek-gen3.c | 276 1 file changed, 276 insertions(+) diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c index ff91ad587461..ee1b51207d11 100644 --- a/drivers/pci/controller/pcie-mediatek-gen3.c +++ b/drivers/pci/controller/pcie-mediatek-gen3.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -48,12 +49,29 @@ #define PCIE_LINK_STATUS_REG 0x154 #define PCIE_PORT_LINKUP BIT(8) +#define PCIE_MSI_SET_NUM 8 +#define PCIE_MSI_IRQS_PER_SET 32 +#define PCIE_MSI_IRQS_NUM \ + (PCIE_MSI_IRQS_PER_SET * PCIE_MSI_SET_NUM) + #define PCIE_INT_ENABLE_REG0x180 +#define PCIE_MSI_ENABLEGENMASK(PCIE_MSI_SET_NUM + 8 - 1, 8) +#define PCIE_MSI_SHIFT 8 #define PCIE_INTX_SHIFT24 #define PCIE_INTX_ENABLE \ GENMASK(PCIE_INTX_SHIFT + PCI_NUM_INTX - 1, PCIE_INTX_SHIFT) #define PCIE_INT_STATUS_REG0x184 +#define PCIE_MSI_SET_ENABLE_REG0x190 +#define PCIE_MSI_SET_ENABLEGENMASK(PCIE_MSI_SET_NUM - 1, 0) + +#define PCIE_MSI_SET_BASE_REG 0xc00 +#define PCIE_MSI_SET_OFFSET0x10 +#define PCIE_MSI_SET_STATUS_OFFSET 0x04 +#define PCIE_MSI_SET_ENABLE_OFFSET 0x08 + +#define PCIE_MSI_SET_ADDR_HI_BASE 0xc80 +#define PCIE_MSI_SET_ADDR_HI_OFFSET0x04 #define PCIE_TRANS_TABLE_BASE_REG 0x800 #define PCIE_ATR_SRC_ADDR_MSB_OFFSET 0x4 @@ -73,6 +91,16 @@ #define PCIE_ATR_TLP_TYPE_MEM PCIE_ATR_TLP_TYPE(0) #define PCIE_ATR_TLP_TYPE_IO PCIE_ATR_TLP_TYPE(2) +/** + * struct mtk_msi_set - MSI information for each set + * @base: IO mapped register base + * @msg_addr: MSI message address + */ +struct mtk_msi_set { + void __iomem *base; + phys_addr_t msg_addr; +}; + /** * struct mtk_pcie_port - PCIe port information * @dev: pointer to PCIe device @@ -86,6 +114,11 @@ * @irq: PCIe controller interrupt number * @irq_lock: lock protecting IRQ register access * @intx_domain: legacy INTx IRQ domain + * @msi_domain: MSI IRQ domain + * @msi_bottom_domain: MSI IRQ bottom domain + * @msi_sets: MSI sets information + * @lock: lock protecting IRQ bit map + * @msi_irq_in_use: bit map for assigned MSI IRQ */ struct mtk_pcie_port { struct device *dev; @@ -100,6 +133,11 @@ struct mtk_pcie_port { int irq; raw_spinlock_t irq_lock; struct irq_domain *intx_domain; + struct irq_domain *msi_domain; + struct irq_domain *msi_bottom_domain; + struct mtk_msi_set msi_sets[PCIE_MSI_SET_NUM]; + struct mutex lock; + DECLARE_BITMAP(msi_irq_in_use, PCIE_MSI_IRQS_NUM); }; /** @@ -196,6 +234,35 @@ static int mtk_pcie_set_trans_table(struct mtk_pcie_port *port, return 0; } +static void mtk_pcie_enable_msi(struct mtk_pcie_port *port) +{ + int i; + u32 val; + + for (i = 0; i < PCIE_MSI_SET_NUM; i++) { + struct mtk_msi_set *msi_set = &port->msi_sets[i]; + + msi_set->base = port->base + 
PCIE_MSI_SET_BASE_REG + + i * PCIE_MSI_SET_OFFSET; + msi_set->msg_addr = port->reg_base + PCIE_MSI_SET_BASE_REG + + i * PCIE_MSI_SET_OFFSET; + + /* Configure the MSI capture address */ + writel_relaxed(lower_32_bits(msi_set->msg_addr), msi_set->base); + writel_relaxed(upper_32_bits(msi_set->msg_addr), + port->base + PCIE_MSI_SET_ADDR_HI_BASE + + i * PCIE_MSI_SET_ADDR_HI_OFFSET); + } + + val = readl_relaxed(p
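For readers following the MSI-set layout described in that patch, here is a minimal sketch of how a composite hwirq could map onto a (set, vector) pair and an MSI message. It is illustrative only and not taken from the patch: the helper name, the set_msg_addr array and the assumption that the vector number doubles as the MSI data payload are mine.

#include <linux/kernel.h>
#include <linux/msi.h>
#include <linux/types.h>

#define EXAMPLE_MSI_IRQS_PER_SET	32

/* Decompose a bottom-domain hwirq into its MSI set and vector, then
 * build the message a device would write to raise that vector. */
static void example_compose_msi_msg(unsigned long hwirq,
				    const phys_addr_t *set_msg_addr,
				    struct msi_msg *msg)
{
	unsigned long set_idx = hwirq / EXAMPLE_MSI_IRQS_PER_SET;
	phys_addr_t addr = set_msg_addr[set_idx];	/* per-set capture address */

	msg->address_hi = upper_32_bits(addr);
	msg->address_lo = lower_32_bits(addr);
	msg->data = hwirq;	/* assumed: the vector number is the payload */
}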
[PATCH v10 3/7] PCI: mediatek-gen3: Add MediaTek Gen3 driver for MT8192
MediaTek's PCIe host controller has three generation HWs, the new generation HW is an individual bridge, it supports Gen3 speed and compatible with Gen2, Gen1 speed. Add support for new Gen3 controller which can be found on MT8192. Signed-off-by: Jianjun Wang Acked-by: Ryder Lee --- drivers/pci/controller/Kconfig | 13 + drivers/pci/controller/Makefile | 1 + drivers/pci/controller/pcie-mediatek-gen3.c | 464 3 files changed, 478 insertions(+) create mode 100644 drivers/pci/controller/pcie-mediatek-gen3.c diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig index 5aa8977d7b0f..1e925ac47279 100644 --- a/drivers/pci/controller/Kconfig +++ b/drivers/pci/controller/Kconfig @@ -233,6 +233,19 @@ config PCIE_MEDIATEK Say Y here if you want to enable PCIe controller support on MediaTek SoCs. +config PCIE_MEDIATEK_GEN3 + tristate "MediaTek Gen3 PCIe controller" + depends on ARCH_MEDIATEK || COMPILE_TEST + depends on PCI_MSI_IRQ_DOMAIN + help + Adds support for PCIe Gen3 MAC controller for MediaTek SoCs. + This PCIe controller is compatible with Gen3, Gen2 and Gen1 speed, + and support up to 256 MSI interrupt numbers for + multi-function devices. + + Say Y here if you want to enable Gen3 PCIe controller support on + MediaTek SoCs. + config VMD depends on PCI_MSI && X86_64 && SRCU tristate "Intel Volume Management Device Driver" diff --git a/drivers/pci/controller/Makefile b/drivers/pci/controller/Makefile index e4559f2182f2..579973327815 100644 --- a/drivers/pci/controller/Makefile +++ b/drivers/pci/controller/Makefile @@ -27,6 +27,7 @@ obj-$(CONFIG_PCIE_ROCKCHIP) += pcie-rockchip.o obj-$(CONFIG_PCIE_ROCKCHIP_EP) += pcie-rockchip-ep.o obj-$(CONFIG_PCIE_ROCKCHIP_HOST) += pcie-rockchip-host.o obj-$(CONFIG_PCIE_MEDIATEK) += pcie-mediatek.o +obj-$(CONFIG_PCIE_MEDIATEK_GEN3) += pcie-mediatek-gen3.o obj-$(CONFIG_PCIE_MICROCHIP_HOST) += pcie-microchip-host.o obj-$(CONFIG_VMD) += vmd.o obj-$(CONFIG_PCIE_BRCMSTB) += pcie-brcmstb.o diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c new file mode 100644 index ..3546e53b3c85 --- /dev/null +++ b/drivers/pci/controller/pcie-mediatek-gen3.c @@ -0,0 +1,464 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * MediaTek PCIe host controller driver. + * + * Copyright (c) 2020 MediaTek Inc. 
+ * Author: Jianjun Wang + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "../pci.h" + +#define PCIE_SETTING_REG 0x80 +#define PCIE_PCI_IDS_1 0x9c +#define PCI_CLASS(class) (class << 8) +#define PCIE_RC_MODE BIT(0) + +#define PCIE_CFGNUM_REG0x140 +#define PCIE_CFG_DEVFN(devfn) ((devfn) & GENMASK(7, 0)) +#define PCIE_CFG_BUS(bus) (((bus) << 8) & GENMASK(15, 8)) +#define PCIE_CFG_BYTE_EN(bytes)(((bytes) << 16) & GENMASK(19, 16)) +#define PCIE_CFG_FORCE_BYTE_EN BIT(20) +#define PCIE_CFG_OFFSET_ADDR 0x1000 +#define PCIE_CFG_HEADER(bus, devfn) \ + (PCIE_CFG_BUS(bus) | PCIE_CFG_DEVFN(devfn)) + +#define PCIE_RST_CTRL_REG 0x148 +#define PCIE_MAC_RSTB BIT(0) +#define PCIE_PHY_RSTB BIT(1) +#define PCIE_BRG_RSTB BIT(2) +#define PCIE_PE_RSTB BIT(3) + +#define PCIE_LTSSM_STATUS_REG 0x150 + +#define PCIE_LINK_STATUS_REG 0x154 +#define PCIE_PORT_LINKUP BIT(8) + +#define PCIE_TRANS_TABLE_BASE_REG 0x800 +#define PCIE_ATR_SRC_ADDR_MSB_OFFSET 0x4 +#define PCIE_ATR_TRSL_ADDR_LSB_OFFSET 0x8 +#define PCIE_ATR_TRSL_ADDR_MSB_OFFSET 0xc +#define PCIE_ATR_TRSL_PARAM_OFFSET 0x10 +#define PCIE_ATR_TLB_SET_OFFSET0x20 + +#define PCIE_MAX_TRANS_TABLES 8 +#define PCIE_ATR_ENBIT(0) +#define PCIE_ATR_SIZE(size) \ + (size) - 1) << 1) & GENMASK(6, 1)) | PCIE_ATR_EN) +#define PCIE_ATR_ID(id)((id) & GENMASK(3, 0)) +#define PCIE_ATR_TYPE_MEM PCIE_ATR_ID(0) +#define PCIE_ATR_TYPE_IO PCIE_ATR_ID(1) +#define PCIE_ATR_TLP_TYPE(type)(((type) << 16) & GENMASK(18, 16)) +#define PCIE_ATR_TLP_TYPE_MEM PCIE_ATR_TLP_TYPE(0) +#define PCIE_ATR_TLP_TYPE_IO PCIE_ATR_TLP_TYPE(2) + +/** + * struct mtk_pcie_port - PCIe port information + * @dev: pointer to PCIe device + * @base: IO mapped register base + * @reg_base: physical register base + * @mac_reset: MAC reset control + * @phy_reset: PHY reset control + * @phy: PHY controller block + * @clks: PCIe clocks + * @num_clks: PCIe clocks count for this port + */ +struct mtk_pcie_port { + struct device *dev; + void __iomem *base; + phys_a
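As a companion to the PCIE_CFGNUM_REG and PCIE_CFG_OFFSET_ADDR definitions in that patch, a hedged sketch of the map_bus idea is shown below. It reuses the patch's macros and struct mtk_pcie_port, but the function name and exact shape are an assumption, not the driver's final code.

#include <linux/io.h>
#include <linux/pci.h>

static void __iomem *example_map_bus(struct mtk_pcie_port *port,
				     unsigned int busnr, unsigned int devfn,
				     int where)
{
	/* Select the target function: bus/devfn packed by PCIE_CFG_HEADER() */
	writel_relaxed(PCIE_CFG_HEADER(busnr, devfn),
		       port->base + PCIE_CFGNUM_REG);

	/* The selected function's config space then appears at a fixed window */
	return port->base + PCIE_CFG_OFFSET_ADDR + where;
}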
[PATCH v10 4/7] PCI: mediatek-gen3: Add INTx support
Add INTx support for MediaTek Gen3 PCIe controller. Signed-off-by: Jianjun Wang Acked-by: Ryder Lee Reviewed-by: Marc Zyngier --- drivers/pci/controller/pcie-mediatek-gen3.c | 172 1 file changed, 172 insertions(+) diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c index 3546e53b3c85..ff91ad587461 100644 --- a/drivers/pci/controller/pcie-mediatek-gen3.c +++ b/drivers/pci/controller/pcie-mediatek-gen3.c @@ -9,6 +9,9 @@ #include #include #include +#include +#include +#include #include #include #include @@ -45,6 +48,13 @@ #define PCIE_LINK_STATUS_REG 0x154 #define PCIE_PORT_LINKUP BIT(8) +#define PCIE_INT_ENABLE_REG0x180 +#define PCIE_INTX_SHIFT24 +#define PCIE_INTX_ENABLE \ + GENMASK(PCIE_INTX_SHIFT + PCI_NUM_INTX - 1, PCIE_INTX_SHIFT) + +#define PCIE_INT_STATUS_REG0x184 + #define PCIE_TRANS_TABLE_BASE_REG 0x800 #define PCIE_ATR_SRC_ADDR_MSB_OFFSET 0x4 #define PCIE_ATR_TRSL_ADDR_LSB_OFFSET 0x8 @@ -73,6 +83,9 @@ * @phy: PHY controller block * @clks: PCIe clocks * @num_clks: PCIe clocks count for this port + * @irq: PCIe controller interrupt number + * @irq_lock: lock protecting IRQ register access + * @intx_domain: legacy INTx IRQ domain */ struct mtk_pcie_port { struct device *dev; @@ -83,6 +96,10 @@ struct mtk_pcie_port { struct phy *phy; struct clk_bulk_data *clks; int num_clks; + + int irq; + raw_spinlock_t irq_lock; + struct irq_domain *intx_domain; }; /** @@ -198,6 +215,11 @@ static int mtk_pcie_startup_port(struct mtk_pcie_port *port) val |= PCI_CLASS(PCI_CLASS_BRIDGE_PCI << 8); writel_relaxed(val, port->base + PCIE_PCI_IDS_1); + /* Mask all INTx interrupts */ + val = readl_relaxed(port->base + PCIE_INT_ENABLE_REG); + val &= ~PCIE_INTX_ENABLE; + writel_relaxed(val, port->base + PCIE_INT_ENABLE_REG); + /* Assert all reset signals */ val = readl_relaxed(port->base + PCIE_RST_CTRL_REG); val |= PCIE_MAC_RSTB | PCIE_PHY_RSTB | PCIE_BRG_RSTB | PCIE_PE_RSTB; @@ -261,6 +283,150 @@ static int mtk_pcie_startup_port(struct mtk_pcie_port *port) return 0; } +static int mtk_pcie_set_affinity(struct irq_data *data, +const struct cpumask *mask, bool force) +{ + return -EINVAL; +} + +static void mtk_intx_mask(struct irq_data *data) +{ + struct mtk_pcie_port *port = irq_data_get_irq_chip_data(data); + unsigned long flags; + u32 val; + + raw_spin_lock_irqsave(&port->irq_lock, flags); + val = readl_relaxed(port->base + PCIE_INT_ENABLE_REG); + val &= ~BIT(data->hwirq + PCIE_INTX_SHIFT); + writel_relaxed(val, port->base + PCIE_INT_ENABLE_REG); + raw_spin_unlock_irqrestore(&port->irq_lock, flags); +} + +static void mtk_intx_unmask(struct irq_data *data) +{ + struct mtk_pcie_port *port = irq_data_get_irq_chip_data(data); + unsigned long flags; + u32 val; + + raw_spin_lock_irqsave(&port->irq_lock, flags); + val = readl_relaxed(port->base + PCIE_INT_ENABLE_REG); + val |= BIT(data->hwirq + PCIE_INTX_SHIFT); + writel_relaxed(val, port->base + PCIE_INT_ENABLE_REG); + raw_spin_unlock_irqrestore(&port->irq_lock, flags); +} + +/** + * mtk_intx_eoi() - Clear INTx IRQ status at the end of interrupt + * @data: pointer to chip specific data + * + * As an emulated level IRQ, its interrupt status will remain + * until the corresponding de-assert message is received; hence that + * the status can only be cleared when the interrupt has been serviced. 
+ */ +static void mtk_intx_eoi(struct irq_data *data) +{ + struct mtk_pcie_port *port = irq_data_get_irq_chip_data(data); + unsigned long hwirq; + + hwirq = data->hwirq + PCIE_INTX_SHIFT; + writel_relaxed(BIT(hwirq), port->base + PCIE_INT_STATUS_REG); +} + +static struct irq_chip mtk_intx_irq_chip = { + .irq_mask = mtk_intx_mask, + .irq_unmask = mtk_intx_unmask, + .irq_eoi= mtk_intx_eoi, + .irq_set_affinity = mtk_pcie_set_affinity, + .name = "INTx", +}; + +static int mtk_pcie_intx_map(struct irq_domain *domain, unsigned int irq, +irq_hw_number_t hwirq) +{ + irq_set_chip_data(irq, domain->host_data); + irq_set_chip_and_handler_name(irq, &mtk_intx_irq_chip, + handle_fasteoi_irq, "INTx"); + return 0; +} + +static const struct irq_domain_ops intx_domain_ops = { + .map = mtk_pcie_intx_map, +}; + +static int mtk_pcie_init_irq_domains(struct mtk_pcie_port *port) +{ + struct device *dev = port->dev; + struct device_node *intc_node, *node = dev->of_node; + + raw_spin_lock_init(&port->irq_lock); + + /* Setup INTx */ +
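To complement the mask/unmask/eoi callbacks in that patch, the sketch below shows one way the shared status register could be demultiplexed into the INTx domain. The helper name is hypothetical and the loop is an assumption built from the PCIE_INT_STATUS_REG and PCIE_INTX_SHIFT definitions, not the patch's own dispatch handler.

#include <linux/bitops.h>
#include <linux/io.h>
#include <linux/irq.h>
#include <linux/irqdomain.h>
#include <linux/pci.h>

static void example_handle_intx(struct mtk_pcie_port *port)
{
	unsigned long status = readl_relaxed(port->base + PCIE_INT_STATUS_REG);
	irq_hw_number_t irq_bit = PCIE_INTX_SHIFT;

	/* Walk only the INTA..INTD bits and hand each one to the INTx domain */
	for_each_set_bit_from(irq_bit, &status, PCI_NUM_INTX + PCIE_INTX_SHIFT) {
		unsigned int virq = irq_find_mapping(port->intx_domain,
						     irq_bit - PCIE_INTX_SHIFT);
		generic_handle_irq(virq);
	}
}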
[PATCH v10 2/7] PCI: Export pci_pio_to_address() for module use
This interface will be used by PCI host drivers for PIO translation, export it to support compiling those drivers as kernel modules. Signed-off-by: Jianjun Wang Acked-by: Bjorn Helgaas --- drivers/pci/pci.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 16a17215f633..09f1714e23be 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4052,6 +4052,7 @@ phys_addr_t pci_pio_to_address(unsigned long pio) return address; } +EXPORT_SYMBOL_GPL(pci_pio_to_address); unsigned long __weak pci_address_to_pio(phys_addr_t address) { -- 2.25.1
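To make the intended use concrete, here is a small sketch of the kind of call site a modular host driver would have. The function name and the idea of feeding the result into an outbound translation window are illustrative assumptions, not code from this series.

#include <linux/ioport.h>
#include <linux/pci.h>

/* Return the CPU physical address backing the bridge's I/O window,
 * suitable for programming into a controller's outbound translation. */
static phys_addr_t example_io_window_base(struct pci_host_bridge *host)
{
	struct resource_entry *entry;

	resource_list_for_each_entry(entry, &host->windows)
		if (resource_type(entry->res) == IORESOURCE_IO)
			return pci_pio_to_address(entry->res->start);

	return 0;
}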
[PATCH v10 0/7] PCI: mediatek: Add new generation controller support
From: mtk15901 These series patches add pcie-mediatek-gen3.c and dt-bindings file to support new generation PCIe controller. Changes in v10: 1. Fix the subject line format in commit message; 2. Use EXPORT_SYMBOL_GPL() to export pci_pio_to_address(). Changes in v9: 1. Use mtk_pcie_parse_port() to get the hw resources; 2. Remove unnecessary logs; 3. Add local IRQ enable status save/restore instead of the enable/disable callbacks; 4. Fix typos. Changes in v8: 1. Add irq_clock to protect IRQ register access; 2. Mask all INTx interrupt when startup port; 3. Remove activate/deactivate callbacks from bottom_domain_ops; 4. Add unmask/mask callbacks in mtk_msi_bottom_irq_chip; 5. Add property information for reg-names. Changes in v7: 1. Split the driver patch to core PCIe, INTx, MSI and PM patches; 2. Reshape MSI init and handle flow, use msi_bottom_domain to cover all sets; 3. Replace readl/writel with their relaxed version; 4. Add MSI description in binding document; 5. Add pl_250m clock in binding document. Changes in v6: 1. Export pci_pio_to_address() to support compiling as kernel module; 2. Replace usleep_range(100 * 1000, 120 * 1000) with msleep(100); 3. Replace dev_notice with dev_err; 4. Fix MSI get hwirq flow; 5. Fix warning for possible recursive locking in mtk_pcie_set_affinity. Changes in v5: 1. Remove unused macros 2. Modify the config read/write callbacks, set the config byte field in TLP header and use pci_generic_config_read32/write32 to access the config space 3. Fix the settings of translation window, both MEM and IO regions works properly 4. Fix typos Changes in v4: 1. Fix PCIe power up/down flow 2. Use "mac" and "phy" for reset names 3. Add clock names 4. Fix the variables type Changes in v3: 1. Remove standard property in binding document 2. Return error number when get_optional* API throws an error 3. Use the bulk clk APIs Changes in v2: 1. Fix the typo of dt-bindings patch 2. Remove the unnecessary properties in binding document 3. dispos the irq mappings of msi top domain when irq teardown Jianjun Wang (7): dt-bindings: PCI: mediatek-gen3: Add YAML schema PCI: Export pci_pio_to_address() for module use PCI: mediatek-gen3: Add MediaTek Gen3 driver for MT8192 PCI: mediatek-gen3: Add INTx support PCI: mediatek-gen3: Add MSI support PCI: mediatek-gen3: Add system PM support MAINTAINERS: Add Jianjun Wang as MediaTek PCI co-maintainer .../bindings/pci/mediatek-pcie-gen3.yaml | 181 +++ MAINTAINERS |1 + drivers/pci/controller/Kconfig| 13 + drivers/pci/controller/Makefile |1 + drivers/pci/controller/pcie-mediatek-gen3.c | 1025 + drivers/pci/pci.c |1 + 6 files changed, 1222 insertions(+) create mode 100644 Documentation/devicetree/bindings/pci/mediatek-pcie-gen3.yaml create mode 100644 drivers/pci/controller/pcie-mediatek-gen3.c -- 2.25.1
Re: [PATCH 00/13] [RFC] Rust support
On Tue, Apr 20, 2021 at 07:56:18AM +0200, Greg Kroah-Hartman wrote: > I would LOVE it if some "executives" would see the above presentations, > because then they would maybe actually fund developers to fix bugs and > maintain the kernel code, instead of only allowing them to add new > features. > > Seriously, that's the real problem, that Dmitry's work has exposed, the > lack of people allowed to do this type of bugfixing and maintenance on > company time, for something that the company relies on, is a huge issue. > "executives" feel that they are willing to fund the initial work and > then "throw it over the wall to the community" once it is merged, and > then they can forget about it as "the community" will maintain it for > them for free. And that's a lie, as Dmitry's work shows. That's sadly the eternal situation, and I'm suspecting that software development and maintenance is not identified as a requirement for a large number of hardware vendors, especially on the consumer side where margins are lower. A contractor is paid to develop a driver, *sometimes* to try to mainline it (and the later they engage with the community, the longer it takes in round trips), and once the code finally gets merged, all the initial budget is depleted and no more software work will be done. Worse, we could imagine kicking unmaintained drivers faster off the tree, but that would actually help these unscrupulous vendors by forcing their customers to switch to the new model :-/ And most of them wouldn't care either if their contributions were refused based on their track record of not maintaining their code, since they often see this as a convenience to please their customers and not something they need (after all, relying on a bogus and vulnerable BSP has never prevented from selling a device, quite the opposite). In short, there is a parallel universe where running highly bogus and vulnerable out-of-tree code seems like the norm and where there is no sort of care for what is mainlined as it's possibly just made to look "cool". We also need to recognize that it's expectable that some vendors are not willing to engage on supporting a driver for a decade if they expect their device to last 5 years only, and maybe we should make some rules clear about mainlining drivers and what to expect for users (in which case the end of support would be clear and nobody would be surprised if the driver is removed at the end of its maintenance, barring a switch to a community maintainer). Just my two cents, Willy
Re: [PATCH] pci: Rename pci_dev->untrusted to pci_dev->external
On Mon, Apr 19, 2021 at 05:30:49PM -0700, Rajat Jain wrote: > The current flag name "untrusted" is not correct as it is populated > using the firmware property "external-facing" for the parent ports. In > other words, the firmware only says which ports are external facing, so > the field really identifies the devices as external (vs internal). > > Only field renaming. No functional change intended. I don't think this is a good idea. First the field should have been added to the generic struct device as requested multiple times before. Right now this requires horrible hacks in the IOMMU code to get at the pci_dev, and also doesn't scale to various other potential users. Second the untrusted is objectively a better name. Because untrusted is how we treat the device, which is what mattes. External is just how we come to that conclusion.
Re: [PATCH] KVM: Boost vCPU candidate in user mode which is delivering interrupt
On Tue, 20 Apr 2021 at 14:02, Wanpeng Li wrote: > > On Tue, 20 Apr 2021 at 00:59, Paolo Bonzini wrote: > > > > On 19/04/21 18:32, Sean Christopherson wrote: > > > If false positives are a big concern, what about adding another pass to > > > the loop > > > and only yielding to usermode vCPUs with interrupts in the second full > > > pass? > > > I.e. give vCPUs that are already in kernel mode priority, and only yield > > > to > > > handle an interrupt if there are no vCPUs in kernel mode. > > > > > > kvm_arch_dy_runnable() pulls in pv_unhalted, which seems like a good > > > thing. > > > > pv_unhalted won't help if you're waiting for a kernel spinlock though, > > would it? Doing two passes (or looking for a "best" candidate that > > prefers kernel mode vCPUs to user mode vCPUs waiting for an interrupt) > > seems like the best choice overall. > > How about something like this: Sorry, should be the codes below: diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 6b4dd95..9bc5f87 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -325,10 +325,12 @@ struct kvm_vcpu { * Cpu relax intercept or pause loop exit optimization * in_spin_loop: set when a vcpu does a pause loop exit * or cpu relax intercepted. + * pending_interrupt: set when a vcpu waiting for an interrupt * dy_eligible: indicates whether vcpu is eligible for directed yield. */ struct { bool in_spin_loop; +bool pending_interrupt; bool dy_eligible; } spin_loop; #endif @@ -1427,6 +1429,12 @@ static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val) { vcpu->spin_loop.in_spin_loop = val; } + +static inline void kvm_vcpu_set_pending_interrupt(struct kvm_vcpu *vcpu, bool val) +{ +vcpu->spin_loop.pending_interrupt = val; +} + static inline void kvm_vcpu_set_dy_eligible(struct kvm_vcpu *vcpu, bool val) { vcpu->spin_loop.dy_eligible = val; @@ -1438,6 +1446,10 @@ static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val) { } +static inline void kvm_vcpu_set_pending_interrupt(struct kvm_vcpu *vcpu, bool val) +{ +} + static inline void kvm_vcpu_set_dy_eligible(struct kvm_vcpu *vcpu, bool val) { } diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 529cff1..bf6f1ec 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -410,6 +410,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id) INIT_LIST_HEAD(&vcpu->blocked_vcpu_list); kvm_vcpu_set_in_spin_loop(vcpu, false); +kvm_vcpu_set_pending_interrupt(vcpu, false); kvm_vcpu_set_dy_eligible(vcpu, false); vcpu->preempted = false; vcpu->ready = false; @@ -3079,14 +3080,17 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to); * Helper that checks whether a VCPU is eligible for directed yield. * Most eligible candidate to yield is decided by following heuristics: * - * (a) VCPU which has not done pl-exit or cpu relax intercepted recently - * (preempted lock holder), indicated by @in_spin_loop. - * Set at the beginning and cleared at the end of interception/PLE handler. + * (a) VCPU which has not done pl-exit or cpu relax intercepted and is not + * waiting for an interrupt recently (preempted lock holder). The former + * one is indicated by @in_spin_loop, set at the beginning and cleared at + * the end of interception/PLE handler. The later one is indicated by + * @pending_interrupt, set when interrupt is delivering and cleared at + * the end of directed yield. 
* - * (b) VCPU which has done pl-exit/ cpu relax intercepted but did not get - * chance last time (mostly it has become eligible now since we have probably - * yielded to lockholder in last iteration. This is done by toggling - * @dy_eligible each time a VCPU checked for eligibility.) + * (b) VCPU which has done pl-exit/ cpu relax intercepted or is waiting for + * interrupt but did not get chance last time (mostly it has become eligible + * now since we have probably yielded to lockholder in last iteration. This + * is done by toggling @dy_eligible each time a VCPU checked for eligibility.) * * Yielding to a recently pl-exited/cpu relax intercepted VCPU before yielding * to preempted lock-holder could result in wrong VCPU selection and CPU @@ -3102,10 +3106,10 @@ static bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu) #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT bool eligible; -eligible = !vcpu->spin_loop.in_spin_loop || +eligible = !(vcpu->spin_loop.in_spin_loop || vcpu->spin_loop.pending_interrupt) || vcpu->spin_loop.dy_eligible; -if (vcpu->spin_loop.in_spin_loop) +if (vcpu->spin_loop.in_spin_loop || vcpu->spin_loop.pending_interrupt) kvm_vcpu_set_dy_eligible(vcpu, !vcpu->spin_loop.dy_eligible); return eligible; @@ -3137,6 +3141,16 @@ static bool vcpu_dy_runnable(struct kvm_vcpu *vcpu) return false; } +static bool kvm_has_int
Re: [PATCHv2 2/2] iommu/arm-smmu-qcom: Move the adreno smmu specific impl earlier
On 2021-04-19 20:08, Bjorn Andersson wrote: On Fri 26 Feb 03:55 CST 2021, Sai Prakash Ranjan wrote: Adreno(GPU) SMMU and APSS(Application Processor SubSystem) SMMU both implement "arm,mmu-500" in some QTI SoCs and to run through adreno smmu specific implementation such as enabling split pagetables support, we need to match the "qcom,adreno-smmu" compatible first before apss smmu or else we will be running apps smmu implementation for adreno smmu and the additional features for adreno smmu is never set. For ex: we have "qcom,sc7280-smmu-500" compatible for both apps and adreno smmu implementing "arm,mmu-500", so the adreno smmu implementation is never reached because the current sequence checks for apps smmu compatible(qcom,sc7280-smmu-500) first and runs that specific impl and we never reach adreno smmu specific implementation. Suggested-by: Akhil P Oommen Signed-off-by: Sai Prakash Ranjan Sorry for taking my time thinking about this. Reviewed-by: Bjorn Andersson No worries, thanks Bjorn. -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCHv3 2/2] iommu/arm-smmu-qcom: Move the adreno smmu specific impl earlier
Adreno(GPU) SMMU and APSS(Application Processor SubSystem) SMMU both implement "arm,mmu-500" in some QTI SoCs and to run through adreno smmu specific implementation such as enabling split pagetables support, we need to match the "qcom,adreno-smmu" compatible first before apss smmu or else we will be running apps smmu implementation for adreno smmu and the additional features for adreno smmu is never set. For ex: we have "qcom,sc7280-smmu-500" compatible for both apps and adreno smmu implementing "arm,mmu-500", so the adreno smmu implementation is never reached because the current sequence checks for apps smmu compatible(qcom,sc7280-smmu-500) first and runs that specific impl and we never reach adreno smmu specific implementation. Suggested-by: Akhil P Oommen Signed-off-by: Sai Prakash Ranjan Reviewed-by: Bjorn Andersson Acked-by: Jordan Crouse --- drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c index bea3ee0dabc2..03f048aebb80 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c @@ -345,11 +345,17 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu) { const struct device_node *np = smmu->dev->of_node; - if (of_match_node(qcom_smmu_impl_of_match, np)) - return qcom_smmu_create(smmu, &qcom_smmu_impl); - + /* +* Do not change this order of implementation, i.e., first adreno +* smmu impl and then apss smmu since we can have both implementing +* arm,mmu-500 in which case we will miss setting adreno smmu specific +* features if the order is changed. +*/ if (of_device_is_compatible(np, "qcom,adreno-smmu")) return qcom_smmu_create(smmu, &qcom_adreno_smmu_impl); + if (of_match_node(qcom_smmu_impl_of_match, np)) + return qcom_smmu_create(smmu, &qcom_smmu_impl); + return smmu; } -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCHv3 0/2] iommu/arm-smmu-qcom: Add SC7280 support
Patch 1 adds the sc7280 smmu compatible. Patch 2 moves the adreno smmu check before apss smmu to enable adreno smmu specific implementation. Note that dt-binding for sc7280 is already merged. Changes in v3: * Collect acks and reviews * Rebase on top of for-joerg/arm-smmu/updates Changes in v2: * Add a comment to make sure this order is not changed in future (Jordan) Sai Prakash Ranjan (2): iommu/arm-smmu-qcom: Add SC7280 SMMU compatible iommu/arm-smmu-qcom: Move the adreno smmu specific impl earlier drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCHv3 1/2] iommu/arm-smmu-qcom: Add SC7280 SMMU compatible
Add compatible for SC7280 SMMU to use the Qualcomm Technologies, Inc. specific implementation. Signed-off-by: Sai Prakash Ranjan --- drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c index 98b3a1c2a181..bea3ee0dabc2 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c @@ -166,6 +166,7 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = { { .compatible = "qcom,mdss" }, { .compatible = "qcom,sc7180-mdss" }, { .compatible = "qcom,sc7180-mss-pil" }, + { .compatible = "qcom,sc7280-mdss" }, { .compatible = "qcom,sc8180x-mdss" }, { .compatible = "qcom,sdm845-mdss" }, { .compatible = "qcom,sdm845-mss-pil" }, @@ -330,6 +331,7 @@ static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu, static const struct of_device_id __maybe_unused qcom_smmu_impl_of_match[] = { { .compatible = "qcom,msm8998-smmu-v2" }, { .compatible = "qcom,sc7180-smmu-500" }, + { .compatible = "qcom,sc7280-smmu-500" }, { .compatible = "qcom,sc8180x-smmu-500" }, { .compatible = "qcom,sdm630-smmu-v2" }, { .compatible = "qcom,sdm845-smmu-500" }, -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH] KVM: Boost vCPU candidate in user mode which is delivering interrupt
On Tue, 20 Apr 2021 at 00:59, Paolo Bonzini wrote: > > On 19/04/21 18:32, Sean Christopherson wrote: > > If false positives are a big concern, what about adding another pass to the > > loop > > and only yielding to usermode vCPUs with interrupts in the second full pass? > > I.e. give vCPUs that are already in kernel mode priority, and only yield to > > handle an interrupt if there are no vCPUs in kernel mode. > > > > kvm_arch_dy_runnable() pulls in pv_unhalted, which seems like a good thing. > > pv_unhalted won't help if you're waiting for a kernel spinlock though, > would it? Doing two passes (or looking for a "best" candidate that > prefers kernel mode vCPUs to user mode vCPUs waiting for an interrupt) > seems like the best choice overall. How about something like this: diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 6b4dd95..8ba50be 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -325,10 +325,12 @@ struct kvm_vcpu { * Cpu relax intercept or pause loop exit optimization * in_spin_loop: set when a vcpu does a pause loop exit * or cpu relax intercepted. + * pending_interrupt: set when a vcpu waiting for an interrupt * dy_eligible: indicates whether vcpu is eligible for directed yield. */ struct { bool in_spin_loop; +bool pending_interrupt; bool dy_eligible; } spin_loop; #endif @@ -1427,6 +1429,12 @@ static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val) { vcpu->spin_loop.in_spin_loop = val; } + +static inline void kvm_vcpu_set_pending_interrupt(struct kvm_vcpu *vcpu, bool val) +{ +vcpu->spin_loop.pending__interrupt = val; +} + static inline void kvm_vcpu_set_dy_eligible(struct kvm_vcpu *vcpu, bool val) { vcpu->spin_loop.dy_eligible = val; @@ -1438,6 +1446,10 @@ static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val) { } +static inline void kvm_vcpu_set_pending_interrupt(struct kvm_vcpu *vcpu, bool val) +{ +} + static inline void kvm_vcpu_set_dy_eligible(struct kvm_vcpu *vcpu, bool val) { } diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 529cff1..42e0255 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -410,6 +410,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id) INIT_LIST_HEAD(&vcpu->blocked_vcpu_list); kvm_vcpu_set_in_spin_loop(vcpu, false); +kvm_vcpu_set_pending_interrupt(vcpu, false); kvm_vcpu_set_dy_eligible(vcpu, false); vcpu->preempted = false; vcpu->ready = false; @@ -3079,14 +3080,17 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to); * Helper that checks whether a VCPU is eligible for directed yield. * Most eligible candidate to yield is decided by following heuristics: * - * (a) VCPU which has not done pl-exit or cpu relax intercepted recently - * (preempted lock holder), indicated by @in_spin_loop. - * Set at the beginning and cleared at the end of interception/PLE handler. + * (a) VCPU which has not done pl-exit or cpu relax intercepted and is not + * waiting for an interrupt recently (preempted lock holder). The former + * one is indicated by @in_spin_loop, set at the beginning and cleared at + * the end of interception/PLE handler. The later one is indicated by + * @pending_interrupt, set when interrupt is delivering and cleared at + * the end of directed yield. * - * (b) VCPU which has done pl-exit/ cpu relax intercepted but did not get - * chance last time (mostly it has become eligible now since we have probably - * yielded to lockholder in last iteration. 
This is done by toggling - * @dy_eligible each time a VCPU checked for eligibility.) + * (b) VCPU which has done pl-exit/ cpu relax intercepted or is waiting for + * interrupt but did not get chance last time (mostly it has become eligible + * now since we have probably yielded to lockholder in last iteration. This + * is done by toggling @dy_eligible each time a VCPU checked for eligibility.) * * Yielding to a recently pl-exited/cpu relax intercepted VCPU before yielding * to preempted lock-holder could result in wrong VCPU selection and CPU @@ -3102,10 +3106,10 @@ static bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu) #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT bool eligible; -eligible = !vcpu->spin_loop.in_spin_loop || +eligible = !(vcpu->spin_loop.in_spin_loop || vcpu->spin_loop.has_interrupt) || vcpu->spin_loop.dy_eligible; -if (vcpu->spin_loop.in_spin_loop) +if (vcpu->spin_loop.in_spin_loop || vcpu->spin_loop.has_interrupt) kvm_vcpu_set_dy_eligible(vcpu, !vcpu->spin_loop.dy_eligible); return eligible; @@ -3137,6 +3141,16 @@ static bool vcpu_dy_runnable(struct kvm_vcpu *vcpu) return false; } +static bool kvm_has_interrupt_delivery(struct kvm_vcpu *vcpu) +{ +if (vcpu_dy_runnable(vcpu)) { +kvm_vcpu_set_pending_interrupt(vcpu, true); +return t
Re: Enabling pmbus power control
On 4/19/21 10:50 PM, Zev Weiss wrote: [ ... ] > I had a glance at the enclosure driver; it looks pretty geared toward > SES-like things (drivers/scsi/ses.c being its only usage I can see in the > kernel at the moment) and while it could perhaps be pressed into working for > this it seems like it would probably drag in a fair amount of boilerplate and > result in a somewhat gratuitously confusing driver arrangement (calling the > things involved in the cases we're looking at "enclosures" seems like a bit > of a stretch). > > As an alternative, would something like the patch below be more along the > lines of what you're suggesting? And if so, would it make sense to > generalize it into something like 'pmbus-switch.c' and add a > PMBUS_HAVE_POWERSWITCH functionality bit or similar in the pmbus code instead > of hardcoding it for only LM25066 support? > > No. Don't access pmbus functions from outside drivers/hwmon/pmbus. I used to be opposed to function export restrictions (aka export namespaces), but you are making a good case that we need to introduce them for pmbus functions. Guenter
Re: [PATCH 00/13] [RFC] Rust support
On Mon, Apr 19, 2021 at 05:24:33PM -0700, Nick Desaulniers wrote: > On Fri, Apr 16, 2021 at 10:39 AM Willy Tarreau wrote: > > > > resources usage, I'm really not convinced at all it's suited for > > low-level development. I understand the interest of the experiment > > to help the language evolve into that direction, but I fear that > > the kernel will soon be as bloated and insecure as a browser, and > > that's really not to please me. > > Dunno, I don't think the introduction of Rust made Firefox _more_ insecure. > https://wiki.mozilla.org/Oxidation#Within_Firefox > > I pray no executives ever see Dmitry Vyukov's 2019 Linux Plumbers Conf > talk "Reflections on kernel quality, development process and testing." > https://www.youtube.com/watch?v=iAfrrNdl2f4 > or his 2018 Linux Security Summit talk "Syzbot and the Tale of > Thousand Kernel Bugs" https://www.youtube.com/watch?v=qrBVXxZDVQY > (and they're just fuzzing the syscall interface and USB devices. > Imagine once folks can more easily craft malformed bluetooth and wifi > packets.) > > I'd imagine the first term that comes to mind for them might be > "liability." They are quite sensitive to these vulnerabilities with > silly names, logos, and websites. There are many of us that believe > an incremental approach of introducing a memory safe language to our > existing infrastructure at the very least to attempt to improve the > quality of drivers for those that choose to use such tools is a better > approach. I would LOVE it if some "executives" would see the above presentations, because then they would maybe actually fund developers to fix bugs and maintain the kernel code, instead of only allowing them to add new features. Seriously, that's the real problem, that Dmitry's work has exposed, the lack of people allowed to do this type of bugfixing and maintenance on company time, for something that the company relies on, is a huge issue. "executives" feel that they are willing to fund the initial work and then "throw it over the wall to the community" once it is merged, and then they can forget about it as "the community" will maintain it for them for free. And that's a lie, as Dmitry's work shows. The world creates new use cases and testing ability all the time, which exposes bugs that have been around in old code. Once the bugs are fixed in that layer of code, the next layer down can finally be tested and then look, more corner cases of issues. Rewriting the kernel in another language is not going to fix the majority of the issues that fuzzing finds here automagically, as that work has exposed us to things like fault-injection and malicious USB packets that no language would have saved us from "automatically". All of those code paths deal with "unsafe" data that doesn't magically become "safe" because we switch languages. And over time, what we have determined is "safe" has changed! People forget that only a few years ago have we decided that the kernel now has to protect userspace programs from malicious hardware. That's a major shift in thinking, now data that we used to blindly trust can not be trusted at all. And "executives" want us to fix all of those issues for free, when really it's a major design shift for loads of kernel subsystems to handle this new threat model. So yes, please spread that talk around. Maybe then will we actually get funding and support to FIX the bugs that those tools test. Right now, the primary fixer of those findings are _INTERNS_ as that's all companies are willing to fund to fix this type of thing. 
And don't get me started on the inability for "executives" to fund other parts of Linux that they rely on, because they want "other companies" to do it instead. The tragedy-of-the-commons is a real threat to Linux, and always has been... thanks, greg k-h
[PATCH v3 2/2] mmc: block: Update ext_csd.cache_ctrl if it was written
The cache function can be turned ON and OFF by writing to the CACHE_CTRL byte (EXT_CSD byte [33]). However, card->ext_csd.cache_ctrl is only set on init if cache size > 0. Fix that by explicitly setting ext_csd.cache_ctrl on ext-csd write. Signed-off-by: Avri Altman --- drivers/mmc/core/block.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index 5b6501fc9fb7..8b07ed5e08de 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -572,6 +572,18 @@ static int __mmc_blk_ioctl_cmd(struct mmc_card *card, struct mmc_blk_data *md, main_md->part_curr = value & EXT_CSD_PART_CONFIG_ACC_MASK; } + /* +* Make sure to update CACHE_CTRL in case it was changed. The cache +* will get turned back on if the card is re-initialized, e.g. +* suspend/resume or hw reset in recovery. +*/ + if ((MMC_EXTRACT_INDEX_FROM_ARG(cmd.arg) == EXT_CSD_CACHE_CTRL) && + (cmd.opcode == MMC_SWITCH)) { + u8 value = MMC_EXTRACT_VALUE_FROM_ARG(cmd.arg) & 1; + + card->ext_csd.cache_ctrl = value; + } + /* * According to the SD specs, some commands require a delay after * issuing the command. -- 2.25.1
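Since the fix above picks the EXT_CSD index and value back out of a raw MMC_SWITCH argument, a short reference sketch of how such an argument is packed may help when reading it. This mirrors the layout the core's MMC_EXTRACT_* helpers assume (access mode in bits 25:24, EXT_CSD index in 23:16, value in 15:8, command set in 2:0); the helper name is illustrative.

#include <linux/mmc/mmc.h>
#include <linux/types.h>

static u32 example_switch_arg(u8 index, u8 value)
{
	return (MMC_SWITCH_MODE_WRITE_BYTE << 24) |
	       (index << 16) |
	       (value << 8) |
	       EXT_CSD_CMD_SET_NORMAL;
}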
[PATCH v3 1/2] mmc: block: Issue flush only if allowed
The cache may be flushed to the nonvolatile storage by writing to FLUSH_CACHE byte (EXT_CSD byte [32]). When in command queueing mode, the cache may be flushed by issuing a CMDQ_TASK_ DEV_MGMT (CMD48) with a FLUSH_CACHE op-code. Either way, verify that The cache function is turned ON before doing so. fixes: 1e8e55b67030 (mmc: block: Add CQE support) Reported-by: Brendan Peter Tested-by: Brendan Peter Signed-off-by: Avri Altman --- drivers/mmc/core/block.c | 7 +++ drivers/mmc/core/mmc.c | 2 +- drivers/mmc/core/mmc_ops.h | 5 + 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index 8bfd4d95b386..5b6501fc9fb7 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -1476,6 +1476,11 @@ static int mmc_blk_cqe_issue_flush(struct mmc_queue *mq, struct request *req) struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req); struct mmc_request *mrq = mmc_blk_cqe_prep_dcmd(mqrq, req); + if (mmc_card_mmc(mq->card) && !mmc_flush_allowed(mq->card)) { + blk_mq_end_request(req, BLK_STS_OK); + return -EPERM; + } + mrq->cmd->opcode = MMC_SWITCH; mrq->cmd->arg = (MMC_SWITCH_MODE_WRITE_BYTE << 24) | (EXT_CSD_FLUSH_CACHE << 16) | @@ -2226,6 +2231,8 @@ enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req) switch (req_op(req)) { case REQ_OP_FLUSH: ret = mmc_blk_cqe_issue_flush(mq, req); + if (ret == -EPERM) + return MMC_REQ_FINISHED; break; case REQ_OP_READ: case REQ_OP_WRITE: diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c index 9ad4aa537867..e3da62ffcb5e 100644 --- a/drivers/mmc/core/mmc.c +++ b/drivers/mmc/core/mmc.c @@ -2037,7 +2037,7 @@ static int _mmc_flush_cache(struct mmc_card *card) { int err = 0; - if (card->ext_csd.cache_size > 0 && card->ext_csd.cache_ctrl & 1) { + if (mmc_flush_allowed(card)) { err = mmc_switch(card, EXT_CSD_CMD_SET_NORMAL, EXT_CSD_FLUSH_CACHE, 1, CACHE_FLUSH_TIMEOUT_MS); diff --git a/drivers/mmc/core/mmc_ops.h b/drivers/mmc/core/mmc_ops.h index 5782fdf4e8e9..2682bf66708a 100644 --- a/drivers/mmc/core/mmc_ops.h +++ b/drivers/mmc/core/mmc_ops.h @@ -19,6 +19,11 @@ enum mmc_busy_cmd { struct mmc_host; struct mmc_card; +static inline bool mmc_flush_allowed(struct mmc_card *card) +{ + return card->ext_csd.cache_size > 0 && card->ext_csd.cache_ctrl & 1; +} + int mmc_select_card(struct mmc_card *card); int mmc_deselect_cards(struct mmc_host *host); int mmc_set_dsr(struct mmc_host *host); -- 2.25.1
[PATCH v3 0/2] Do not flush cache when it is disabled
v2 -> v3: - rebase onto recent cache changes v1 -> v2: - Attend Adrian's comments Cache is a temporary storage space in an eMMC device. Volatile by nature, the cache should in typical case reduce the access time compared to an access to the main nonvolatile storage. The cache function can be turned ON and OFF. Once OFF, the host is not expected to issue a flush-cache command to the device. Avri Altman (2): mmc: block: Issue flush only if allowed mmc: block: Update ext_csd.cache_ctrl if it was written drivers/mmc/core/block.c | 19 +++ drivers/mmc/core/mmc.c | 2 +- drivers/mmc/core/mmc_ops.h | 5 + 3 files changed, 25 insertions(+), 1 deletion(-) -- 2.25.1
Re: Enabling pmbus power control
On Mon, Apr 19, 2021 at 10:36:48PM CDT, Guenter Roeck wrote: On Mon, Apr 19, 2021 at 08:29:53PM -0500, Zev Weiss wrote: On Tue, Mar 30, 2021 at 02:38:10PM CDT, Guenter Roeck wrote: > On Tue, Mar 30, 2021 at 07:02:00PM +0100, Mark Brown wrote: > > On Tue, Mar 30, 2021 at 12:56:56PM -0500, Zev Weiss wrote: > > > > > Okay, to expand a bit on the description in my initial message -- we've > > > got a single chassis with multiple server boards and a single manager board > > > that handles, among other things, power control for the servers. > > > The manager board has one LM25066 for each attached server, which acts as > > > the "power switch" for that server. There thus really isn't any driver to > > > speak of for the downstream device. > > > > This sounds like you need a driver representing those server boards (or > > the slots they plug into perhaps) that represents everything about those > > boards to userspace, including power switching. I don't see why you > > wouldn't have a driver for that - it's a thing that physically exists > > and clearly has some software control, and you'd presumably also expect > > to represent some labelling about the slot as well. > > Absolutely agree. > > Thanks, > Guenter Hi Guenter, Mark, I wanted to return to this to try to explain why structuring the kernel support for this in a way that's specific to the device behind the PMIC seems like an awkward fit to me, and ultimately kind of artificial. In the system I described, the manager board with the LM25066 devices is its own little self-contained BMC system running its own Linux kernel (though "BMC" is perhaps a slightly misleading term because there's no host system that it manages). The PMICs are really the only relevant connection it has to the servers it controls power for -- they have their own dedicated local BMCs on board as well doing all the usual BMC tasks. A hypothetical dedicated driver for this on the manager board wouldn't have any other hardware to touch aside from the pmbus interface of the LM25066 itself, so calling it a server board driver seems pretty arbitrary -- and in fact the same system also has another LM25066 that controls the power supply to the chassis cooling fans (a totally different downstream device, but one that would be equally well-served by the same software). More recently, another system has entered the picture for us that might illustrate it more starkly -- the Delta Open19 power shelf [0] supported by a recent code release from LinkedIn [1]. This is a rackmount power supply All I can see is that this code is a mess. with fifty outputs, each independently switchable via an LM25066 attached to an on-board BMC-style management controller (though again, no host system involved). We (Equinix Metal) are interested in porting a modern OpenBMC to it to replace the dated, crufty, pre-Linux-Foundation version of OpenBMC it currently runs (as found in the linked repo). The exact nature of the devices powered by its outputs is well outside the scope of the firmware running on that controller (it could be any arbitrary thing that runs on 12VDC), but we still want to be able to both (a) retrieve per-output voltage/current/power readings as provided by the existing LM25066 hwmon support, and (b) control the on/off state of those outputs from userspace. 
Given the array of possible use-cases, an approach of adding power-switch functionality to the existing LM25066 support seems like the most obvious way to support this, so I'm hoping to see if I can get some idea of what an acceptable implementation of that might look like. Sorry, that is simply a no-go for the LM25066 driver. I mentioned it before, and it is still true: The hwmon subsystem is not in the business of power control. Have you looked into enclosures (drivers/misc/enclosure.c) ? Maybe that or something similar could be used for your purposes. Thanks, Guenter I had a glance at the enclosure driver; it looks pretty geared toward SES-like things (drivers/scsi/ses.c being its only usage I can see in the kernel at the moment) and while it could perhaps be pressed into working for this it seems like it would probably drag in a fair amount of boilerplate and result in a somewhat gratuitously confusing driver arrangement (calling the things involved in the cases we're looking at "enclosures" seems like a bit of a stretch). As an alternative, would something like the patch below be more along the lines of what you're suggesting? And if so, would it make sense to generalize it into something like 'pmbus-switch.c' and add a PMBUS_HAVE_POWERSWITCH functionality bit or similar in the pmbus code instead of hardcoding it for only LM25066 support? Thanks, Zev From f4c0a3637371d69a414b6fb882497d22e5de3ba0 Mon Sep 17 00:00:00 2001 From: Zev Weiss Date: Wed, 31 Mar 2021 01:58:35 -0500 Subject: [PATCH] misc: add lm25066-switch driver This driver allows an lm25066 to be switched on and off from userspace via sysf
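The patch body above is truncated in this digest, so purely as a rough illustration of the kind of switch attribute being discussed, the sketch below writes the PMBus OPERATION register (0x01) directly over SMBus: 0x80 requests "on", 0x00 an immediate off. Every name here is hypothetical, and it deliberately bypasses the pmbus core helpers, only to show the shape of the idea.

#include <linux/device.h>
#include <linux/i2c.h>
#include <linux/kernel.h>

#define EXAMPLE_PMBUS_OPERATION	0x01	/* PMBus OPERATION register */
#define EXAMPLE_PB_OPERATION_ON	0x80	/* unit on */

static ssize_t example_output_enable_store(struct device *dev,
					   struct device_attribute *attr,
					   const char *buf, size_t count)
{
	/* assumes the attribute hangs directly off the i2c client's device */
	struct i2c_client *client = to_i2c_client(dev);
	bool enable;
	int ret;

	if (kstrtobool(buf, &enable))
		return -EINVAL;

	ret = i2c_smbus_write_byte_data(client, EXAMPLE_PMBUS_OPERATION,
					enable ? EXAMPLE_PB_OPERATION_ON : 0);
	return ret ? ret : count;
}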
Doubt regarding memory allocation in KVM
Hi, I'm learning about qemu KVM, looking into code and experimenting on it. I have the following doubts regarding it, I would be grateful if you help me to get some idea on them. 1. I observe that KVM allocates memory to guests when it needs it but doesn't take it back (except for ballooning case). Also, the Qemu/KVM process does not free the memory even when the guest is rebooted. In this case, Does the Guest VM get access to memory already pre-filled with some garbage from the previous run?? (Since the host would allocate zeroed pages to guests the first time it requests but after that it's up to guests). Can it be a security issue? 2. How does the KVM know if GPFN (guest physical frame number) is backed by an actual machine frame number in host? If not mapped, then it faults in the host and allocates a physical frame for guests in the host. (kvm_mmu_page_fault) 3. How/where can I access the GPFNs in the host? Is "gfn_t gfn = gpa >> PAGE_SHIFT" and "gpa_t cr2_or_gpa" in the KVM page fault handler, x86 is the same as GPFN. (that is can I use pfn_to_page in guest VM to access the struct page in Guest) Thank You. Best Regards, Shivank Garg M.Tech Student, IIT Kanpur
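On question 3: the shift mentioned there is exactly how KVM relates the two quantities; the conversion helpers in include/linux/kvm_host.h are essentially the following (paraphrased with example_ prefixes rather than quoted verbatim).

#include <linux/kvm_host.h>

/* guest physical address <-> guest frame number */
static inline gfn_t example_gpa_to_gfn(gpa_t gpa)
{
	return (gfn_t)(gpa >> PAGE_SHIFT);
}

static inline gpa_t example_gfn_to_gpa(gfn_t gfn)
{
	return (gpa_t)gfn << PAGE_SHIFT;
}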
Re: [PATCH] watchdog: qcom: Move suspend/resume to suspend_late/resume_early
Hi Guenter, On 2021-03-11 01:53, Guenter Roeck wrote: On Thu, Mar 11, 2021 at 01:50:04AM +0530, Sai Prakash Ranjan wrote: During suspend/resume usecases and tests, it is common to see issues such as lockups either in suspend path or resume path because of the bugs in the corresponding device driver pm handling code. In such cases, it is important that watchdog is active to make sure that we either receive a watchdog pretimeout notification or a bite causing reset instead of a hang causing us to hard reset the machine. There are good reasons as to why we need this because: * We can have a watchdog pretimeout governor set to panic in which case we can have a backtrace which would help identify the issue with the particular driver and cause a normal reboot. * Even in case where there is no pretimeout support, a watchdog bite is still useful because some firmware has debug support to dump CPU core context on watchdog bite for post-mortem analysis. * One more usecase which comes to mind is of warm reboot. In case we hard reset the target, a cold reboot could be induced resulting in lose of ddr contents thereby losing all the debug info. Currently, the watchdog pm callback just invokes the usual suspend and resume callback which do not have any special ordering in the sense that a watchdog can be suspended before the buggy device driver suspend callback and watchdog resume can happen after the buggy device driver resume callback. This would mean that the watchdog will not be active when the buggy driver cause the lockups thereby hanging the system. So to make sure this doesn't happen, move the watchdog pm to use late/early system pm callbacks which will ensure that the watchdog is suspended late and resumed early so that it can catch such issues. Signed-off-by: Sai Prakash Ranjan Reviewed-by: Guenter Roeck Gentle Ping. I don't see this in linux-next or linux-watchdog, please let me know if anything is pending from my side. Thanks, Sai --- drivers/watchdog/qcom-wdt.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/watchdog/qcom-wdt.c b/drivers/watchdog/qcom-wdt.c index e38a87ffe5f5..0d2209c5eaca 100644 --- a/drivers/watchdog/qcom-wdt.c +++ b/drivers/watchdog/qcom-wdt.c @@ -329,7 +329,9 @@ static int __maybe_unused qcom_wdt_resume(struct device *dev) return 0; } -static SIMPLE_DEV_PM_OPS(qcom_wdt_pm_ops, qcom_wdt_suspend, qcom_wdt_resume); +static const struct dev_pm_ops qcom_wdt_pm_ops = { + SET_LATE_SYSTEM_SLEEP_PM_OPS(qcom_wdt_suspend, qcom_wdt_resume) +}; static const struct of_device_id qcom_wdt_of_table[] = { { .compatible = "qcom,kpss-timer", .data = &match_data_apcs_tmr }, -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
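Since the commit message leans on the pretimeout-governor-set-to-panic case, it may help to recall what that governor amounts to. The panic governor in drivers/watchdog/pretimeout_panic.c is roughly the following shape (paraphrased from memory, not a verbatim copy).

#include <linux/kernel.h>
#include <linux/watchdog.h>
#include "watchdog_pretimeout.h"	/* local header for struct watchdog_governor */

static void example_pretimeout_panic(struct watchdog_device *wdd)
{
	panic("watchdog pretimeout event\n");
}

static struct watchdog_governor example_gov_panic = {
	.name		= "panic",
	.pretimeout	= example_pretimeout_panic,
};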
[PATCH] x86/mm: Fix copy&paste error in comments
When direct page mapping is done bottom-up, memory for the new page-table pages in a range is allocated from low addresses, i.e. from the *bottom* of RAM, not the *end*. Signed-off-by: Cao jin --- arch/x86/mm/init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index e26f5c5c6565..bc2f871c75f1 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -663,7 +663,7 @@ static void __init memory_map_bottom_up(unsigned long map_start, /* * We start from the bottom (@map_start) and go to the top (@map_end). * The memblock_find_in_range() gets us a block of RAM from the -* end of RAM in [min_pfn_mapped, max_pfn_mapped) used as new pages +* bottom of RAM in [min_pfn_mapped, max_pfn_mapped) used as new pages * for page table. */ while (start < map_end) { -- 2.30.1
Re: [PATCH 1/1] powerpc/pseries/iommu: Fix window size for direct mapping with pmem
On Tue, 2021-04-20 at 15:18 +1000, Alexey Kardashevskiy wrote: > > On 20/04/2021 14:54, Leonardo Bras wrote: > > As of today, if the DDW is big enough to fit (1 << MAX_PHYSMEM_BITS) it's > > possible to use direct DMA mapping even with pmem region. > > > > But, if that happens, the window size (len) is set to > > (MAX_PHYSMEM_BITS - page_shift) instead of MAX_PHYSMEM_BITS, causing a > > pagesize times smaller DDW to be created, being insufficient for correct > > usage. > > > > Fix this so the correct window size is used in this case. > > Good find indeed. > > afaict this does not create a huge problem though as > query.largest_available_block is always smaller than (MAX_PHYSMEM_BITS - > page_shift) where it matters (phyp). > > > Reviewed-by: Alexey Kardashevskiy > Thanks for reviewing! Leonardo Bras
Re: [PATCH 5/7] mfd: sec: Simplify getting of_device_id match data
On 19.04.2021 10:17, Krzysztof Kozlowski wrote: > Use of_device_get_match_data() to make the code slightly smaller. > > Signed-off-by: Krzysztof Kozlowski > --- > drivers/mfd/sec-core.c | 9 +++-- > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/drivers/mfd/sec-core.c b/drivers/mfd/sec-core.c > index 8d55992da19e..3126c39f3203 100644 > --- a/drivers/mfd/sec-core.c > +++ b/drivers/mfd/sec-core.c > @@ -10,6 +10,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -324,12 +325,8 @@ static inline unsigned long > sec_i2c_get_driver_data(struct i2c_client *i2c, > const struct i2c_device_id *id) > { > #ifdef CONFIG_OF > - if (i2c->dev.of_node) { > - const struct of_device_id *match; > - > - match = of_match_node(sec_dt_match, i2c->dev.of_node); > - return (unsigned long)match->data; > - } > + if (i2c->dev.of_node) > + return (unsigned long)of_device_get_match_data(&i2c->dev); > #endif Does it make any sense to keep the #ifdef CONFIG_OF after this change? I would also skip (i2c->dev.of_node) check, because of_device_get_match_data() already does that (although indirectly). > return id->driver_data; > } Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland
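For concreteness, one possible shape of the combined suggestion (no CONFIG_OF guard, no explicit of_node check, while keeping the legacy i2c_device_id fallback) could look like the sketch below; whether the fallback is still wanted is a judgment call for the series author.

#include <linux/i2c.h>
#include <linux/of_device.h>

static unsigned long
sec_i2c_get_driver_data(struct i2c_client *i2c,
			const struct i2c_device_id *id)
{
	unsigned long data =
		(unsigned long)of_device_get_match_data(&i2c->dev);

	/* of_device_get_match_data() returns NULL when there is no DT node
	 * or no match, so fall back to the i2c_device_id table then. */
	return data ? data : id->driver_data;
}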
Re: [PATCH v2 8/8] MIPS: pci-legacy: use generic pci_enable_resources
On Tue, Apr 13, 2021 at 08:12:40PM -0700, Ilya Lipnitskiy wrote: > Follow the reasoning from commit 842de40d93e0 ("PCI: add generic > pci_enable_resources()"): > > The only functional difference from the MIPS version is that the > generic one uses "!r->parent" to check for resource collisions > instead of "!r->start && r->end". > > That should have no effect on any pci-legacy driver. > Unfortunately it does. With this patch in place, all my mips qemu emulations fail to boot from ide/ata drive; the driver is not instantiated. The error message is: ata_piix :00:0a.1: can't enable device: BAR 0 [io 0x01f0-0x01f7] not claimed ata_piix: probe of :00:0a.1 failed with error -22 Reverting this patch fixes the problem, and the message displayed by the driver is: ata_piix :00:0a.1: enabling device ( -> 0001) Bisect log is attached. Guenter --- # bad: [50b8b1d699ac313c0a07a3c185ffb23aecab8abb] Add linux-next specific files for 20210419 # good: [bf05bf16c76bb44ab5156223e1e58e26dfe30a88] Linux 5.12-rc8 git bisect start 'HEAD' 'v5.12-rc8' # bad: [c4bb91fc07e59241cde97f913d7a2fbedc248f0d] Merge remote-tracking branch 'crypto/master' git bisect bad c4bb91fc07e59241cde97f913d7a2fbedc248f0d # bad: [499f739ad70f2a58aac985dceb25ca7666da88be] Merge remote-tracking branch 'jc_docs/docs-next' git bisect bad 499f739ad70f2a58aac985dceb25ca7666da88be # bad: [11b56408a328d1c5c4dfa7667c5dc46956b64aec] Merge remote-tracking branch 'parisc-hd/for-next' git bisect bad 11b56408a328d1c5c4dfa7667c5dc46956b64aec # good: [09ccc0ee1227f2cfe50d8dbbe241d115d9b3885f] Merge branch 'arm/defconfig' into for-next git bisect good 09ccc0ee1227f2cfe50d8dbbe241d115d9b3885f # good: [a5b76c2f17330e266a5c56dde21430e27b0d0dbb] Merge remote-tracking branch 'arm-soc/for-next' git bisect good a5b76c2f17330e266a5c56dde21430e27b0d0dbb # good: [1e4241f6813f1c1a0027d96df075ffd01808b3cf] Merge remote-tracking branch 'ti-k3/ti-k3-next' git bisect good 1e4241f6813f1c1a0027d96df075ffd01808b3cf # good: [7496a43be7a362391607d78e49a3f28de80029ce] Merge remote-tracking branch 'h8300/h8300-next' git bisect good 7496a43be7a362391607d78e49a3f28de80029ce # good: [66633abd0642f1e89d26e15f36fb13d3a1c535ff] MIPS/bpf: Enable bpf_probe_read{, str}() on MIPS again git bisect good 66633abd0642f1e89d26e15f36fb13d3a1c535ff # good: [2c92ef8ff8d327797c1920ae7f938bcc6f3f7421] MIPS: Fix strnlen_user access check git bisect good 2c92ef8ff8d327797c1920ae7f938bcc6f3f7421 # good: [3a070801c61f4e3987d59b1068368ba71d727208] Merge remote-tracking branch 'microblaze/next' git bisect good 3a070801c61f4e3987d59b1068368ba71d727208 # good: [317f553bb677e324c9c865ff7f14597bc5ceeb9c] MIPS: pci-legacy: remove redundant info messages git bisect good 317f553bb677e324c9c865ff7f14597bc5ceeb9c # bad: [6ce48897ce476bed86fde28752c27596e8753277] MIPS: Loongson64: Add kexec/kdump support git bisect bad 6ce48897ce476bed86fde28752c27596e8753277 # bad: [99bca615d89510917864fac6b26fd343eff2aba2] MIPS: pci-legacy: use generic pci_enable_resources git bisect bad 99bca615d89510917864fac6b26fd343eff2aba2 # good: [0af83d2e447af3e5098583cb6320bb1b1fb0976b] MIPS: pci-legacy: remove busn_resource field git bisect good 0af83d2e447af3e5098583cb6320bb1b1fb0976b # first bad commit: [99bca615d89510917864fac6b26fd343eff2aba2] MIPS: pci-legacy: use generic pci_enable_resources
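For context on why the behaviour changed, the two collision checks named in the quoted commit message can be contrasted with the hypothetical helpers below (paraphrased, not an exact quote of either implementation). The generic check is what produces the "not claimed" message in the log above.

#include <linux/ioport.h>
#include <linux/pci.h>

/* Check used by the generic pci_enable_resources(): the BAR must have
 * been claimed, i.e. inserted into the resource tree. */
static bool example_generic_bar_ok(const struct resource *r)
{
	return r->parent != NULL;
}

/* Check used by the old MIPS pcibios_enable_resources(): only a BAR
 * that looked entirely unassigned was treated as a collision. */
static bool example_old_mips_bar_ok(const struct resource *r)
{
	return !(!r->start && r->end);
}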
Re: [PATCH] mips: kdump: Crash kernel should be able to see old memories
Hi, Jiaxun On 04/20/2021 09:11 AM, Jiaxun Yang wrote: On 2021/4/19 18:56, Youling Tang wrote: From: Huacai Chen kexec-tools use mem=X@Y to pass usable memories to crash kernel, but in commit a94e4f24ec836c8984f83959 ("MIPS: init: Drop boot_mem_map") all BIOS passed memories are removed by early_parse_mem(). I think this is reasonable for a normal kernel but not for a crash kernel, because a crash kernel should be able to see all old memories, even though it is not supposed to use them. Fixes: a94e4f24ec836c8984f83959 ("MIPS: init: Drop boot_mem_map") Signed-off-by: Huacai Chen Signed-off-by: Youling Tang --- arch/mips/kernel/setup.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c index b86e241..ac90d3b 100644 --- a/arch/mips/kernel/setup.c +++ b/arch/mips/kernel/setup.c @@ -351,8 +351,10 @@ static int __init early_parse_mem(char *p) */ if (usermem == 0) { usermem = 1; +#ifndef CONFIG_CRASH_DUMP Why depend on a config instead of a runtime variable? If we don't depend on the config, we can determine whether the command line contains the "elfcorehdr=" parameter, because the "mem=" and "elfcorehdr=" parameters are automatically added by kexec-tools. So if there is an "elfcorehdr=" parameter in the command line, it means that the currently running kernel is a capture kernel, and the memblock_remove() operation is not called. The revised patch is as follows: if (usermem == 0) { usermem = 1; - memblock_remove(memblock_start_of_DRAM(), - memblock_end_of_DRAM() - memblock_start_of_DRAM()); + if (!strstr(boot_command_line, "elfcorehdr")) { + memblock_remove(memblock_start_of_DRAM(), + memblock_end_of_DRAM() - memblock_start_of_DRAM()); + } Do you think it is feasible? Btw as you are fixing my commit please keep me CCed. Sorry, I will add you to CC. Thanks, Youling Thanks. - Jiaxun
Re: [PATCH 1/1] powerpc/pseries/iommu: Fix window size for direct mapping with pmem
On 20/04/2021 14:54, Leonardo Bras wrote: As of today, if the DDW is big enough to fit (1 << MAX_PHYSMEM_BITS) it's possible to use direct DMA mapping even with pmem region. But, if that happens, the window size (len) is set to (MAX_PHYSMEM_BITS - page_shift) instead of MAX_PHYSMEM_BITS, causing a pagesize times smaller DDW to be created, being insufficient for correct usage. Fix this so the correct window size is used in this case. Good find indeed. afaict this does not create a huge problem though as query.largest_available_block is always smaller than (MAX_PHYSMEM_BITS - page_shift) where it matters (phyp). Reviewed-by: Alexey Kardashevskiy Fixes: bf6e2d562bbc4("powerpc/dma: Fallback to dma_ops when persistent memory present") Signed-off-by: Leonardo Bras --- arch/powerpc/platforms/pseries/iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index 9fc5217f0c8e..836cbbe0ecc5 100644 --- a/arch/powerpc/platforms/pseries/iommu.c +++ b/arch/powerpc/platforms/pseries/iommu.c @@ -1229,7 +1229,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn) if (pmem_present) { if (query.largest_available_block >= (1ULL << (MAX_PHYSMEM_BITS - page_shift))) - len = MAX_PHYSMEM_BITS - page_shift; + len = MAX_PHYSMEM_BITS; else dev_info(&dev->dev, "Skipping ibm,pmemory"); } -- Alexey
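To put illustrative numbers on the difference: assuming MAX_PHYSMEM_BITS = 51 and a 64 KiB IOMMU page (page_shift = 16), the old code set len = 51 - 16 = 35 and so asked for a 2^35-byte (32 GiB) window, while the pmem case needs the full 2^51-byte (2 PiB) window it has just verified is available; the fix restores len = MAX_PHYSMEM_BITS so the created DDW really covers (1 << MAX_PHYSMEM_BITS).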
[tip:master] BUILD SUCCESS 75ce17f4d207d047b991e28cd5243f2345889c0c
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master branch HEAD: 75ce17f4d207d047b991e28cd5243f2345889c0c Merge branch 'x86/build' elapsed time: 722m configs tested: 117 configs skipped: 3 The following configs have been built successfully. More configs may be tested in the coming days. gcc tested configs: arm defconfig arm64allyesconfig arm64 defconfig arm allyesconfig arm allmodconfig x86_64 allyesconfig riscvallmodconfig i386 allyesconfig riscvallyesconfig powerpc motionpro_defconfig sh ap325rxa_defconfig mips pic32mzda_defconfig powerpc xes_mpc85xx_defconfig m68kmvme16x_defconfig arm versatile_defconfig s390 alldefconfig mips pistachio_defconfig armspear3xx_defconfig sh rsk7203_defconfig powerpc mpc8272_ads_defconfig mips rb532_defconfig shedosk7705_defconfig powerpc asp8347_defconfig armrealview_defconfig sparc sparc32_defconfig um alldefconfig sh rts7751r2dplus_defconfig sh se7705_defconfig powerpc redwood_defconfig powerpc katmai_defconfig sh ecovec24_defconfig sh rsk7201_defconfig ia64 allmodconfig ia64defconfig parisc alldefconfig powerpc holly_defconfig shsh7785lcr_defconfig m68k multi_defconfig arm omap2plus_defconfig sh sh7724_generic_defconfig mips loongson1b_defconfig powerpc g5_defconfig arm footbridge_defconfig powerpc sequoia_defconfig ia64 allyesconfig m68k allmodconfig m68kdefconfig m68k allyesconfig nios2 defconfig arc allyesconfig nds32 allnoconfig nds32 defconfig nios2allyesconfig cskydefconfig alpha defconfig alphaallyesconfig xtensa allyesconfig h8300allyesconfig arc defconfig sh allmodconfig parisc defconfig s390 allyesconfig s390 allmodconfig parisc allyesconfig s390defconfig sparcallyesconfig sparc defconfig i386defconfig mips allyesconfig mips allmodconfig powerpc allyesconfig powerpc allmodconfig powerpc allnoconfig x86_64 randconfig-a003-20210419 x86_64 randconfig-a001-20210419 x86_64 randconfig-a005-20210419 x86_64 randconfig-a002-20210419 x86_64 randconfig-a006-20210419 x86_64 randconfig-a004-20210419 i386 randconfig-a003-20210419 i386 randconfig-a001-20210419 i386 randconfig-a006-20210419 i386 randconfig-a005-20210419 i386 randconfig-a004-20210419 i386 randconfig-a002-20210419 i386 randconfig-a012-20210420 i386 randconfig-a014-20210420 i386 randconfig-a011-20210420 i386 randconfig-a013-20210420 i386 randconfig-a015-20210420 i386 randconfig-a016-20210420 i386 randconfig-a015-20210419 i386 randconfig-a013-20210419 i386 randconfig-a014-20210419 i386 randconfig-a016-20210419 i386 randconfig-a012-20210419 i386 randconfig-a011-20210419 riscvnommu_k210_defconfig riscvnommu_virt_defconfig riscv allnoconfig riscv defco
[syzbot] KASAN: use-after-free Read in sctp_do_8_2_transport_strike
Hello, syzbot found the following issue on: HEAD commit:bf05bf16 Linux 5.12-rc8 git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=16a00dfed0 kernel config: https://syzkaller.appspot.com/x/.config?x=9404cfa686df2c05 dashboard link: https://syzkaller.appspot.com/bug?extid=bbe538efd1046586f587 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+bbe538efd1046586f...@syzkaller.appspotmail.com == BUG: KASAN: use-after-free in sctp_do_8_2_transport_strike.constprop.0+0xa27/0xab0 net/sctp/sm_sideeffect.c:531 Read of size 4 at addr 888024d65154 by task swapper/1/0 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.12.0-rc8-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232 __kasan_report mm/kasan/report.c:399 [inline] kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416 sctp_do_8_2_transport_strike.constprop.0+0xa27/0xab0 net/sctp/sm_sideeffect.c:531 sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1636 [inline] sctp_side_effects net/sctp/sm_sideeffect.c:1185 [inline] sctp_do_sm+0x114f/0x5120 net/sctp/sm_sideeffect.c:1156 sctp_generate_timeout_event+0x1bb/0x3d0 net/sctp/sm_sideeffect.c:295 call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1431 expire_timers kernel/time/timer.c:1476 [inline] __run_timers.part.0+0x67c/0xa50 kernel/time/timer.c:1745 __run_timers kernel/time/timer.c:1726 [inline] run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1758 __do_softirq+0x29b/0x9f6 kernel/softirq.c:345 invoke_softirq kernel/softirq.c:221 [inline] __irq_exit_rcu kernel/softirq.c:422 [inline] irq_exit_rcu+0x134/0x200 kernel/softirq.c:434 sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1100 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632 RIP: 0010:native_save_fl arch/x86/include/asm/irqflags.h:29 [inline] RIP: 0010:arch_local_save_flags arch/x86/include/asm/irqflags.h:70 [inline] RIP: 0010:arch_irqs_disabled arch/x86/include/asm/irqflags.h:137 [inline] RIP: 0010:acpi_safe_halt drivers/acpi/processor_idle.c:112 [inline] RIP: 0010:acpi_idle_do_entry+0x1c9/0x250 drivers/acpi/processor_idle.c:517 Code: ed 56 6e f8 84 db 75 ac e8 34 50 6e f8 e8 3f 3f 74 f8 e9 0c 00 00 00 e8 25 50 6e f8 0f 00 2d be a5 c7 00 e8 19 50 6e f8 fb f4 <9c> 5b 81 e3 00 02 00 00 fa 31 ff 48 89 de e8 24 58 6e f8 48 85 db RSP: 0018:c9d57d18 EFLAGS: 0293 RAX: RBX: RCX: RDX: 8880111c54c0 RSI: 8905a167 RDI: RBP: 888141433864 R08: 0001 R09: 0001 R10: 8179e0c8 R11: R12: 0001 R13: 888141433800 R14: 888141433864 R15: 888017879004 acpi_idle_enter+0x361/0x500 drivers/acpi/processor_idle.c:652 cpuidle_enter_state+0x1b1/0xc80 drivers/cpuidle/cpuidle.c:237 cpuidle_enter+0x4a/0xa0 drivers/cpuidle/cpuidle.c:351 call_cpuidle kernel/sched/idle.c:158 [inline] cpuidle_idle_call kernel/sched/idle.c:239 [inline] do_idle+0x3e1/0x590 kernel/sched/idle.c:300 cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:397 start_secondary+0x274/0x350 arch/x86/kernel/smpboot.c:272 secondary_startup_64_no_verify+0xb0/0xbb Allocated by task 14433: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:427 [inline] kasan_kmalloc mm/kasan/common.c:506 [inline] kasan_kmalloc mm/kasan/common.c:465 [inline] __kasan_kmalloc+0x99/0xc0 
mm/kasan/common.c:515 kmalloc include/linux/slab.h:554 [inline] kzalloc include/linux/slab.h:684 [inline] sctp_transport_new+0x8c/0x690 net/sctp/transport.c:96 sctp_assoc_add_peer+0x28f/0x1160 net/sctp/associola.c:618 sctp_process_init+0x12a/0x2940 net/sctp/sm_make_chunk.c:2345 sctp_sf_do_dupcook_a net/sctp/sm_statefuns.c:1800 [inline] sctp_sf_do_5_2_4_dupcook+0x1401/0x2d50 net/sctp/sm_statefuns.c:2200 sctp_do_sm+0x179/0x5120 net/sctp/sm_sideeffect.c:1153 sctp_assoc_bh_rcv+0x386/0x6c0 net/sctp/associola.c:1048 sctp_inq_push+0x1da/0x270 net/sctp/inqueue.c:80 sctp_rcv+0xf64/0x2f10 net/sctp/input.c:256 sctp6_rcv+0x38/0x60 net/sctp/ipv6.c:1077 ip6_protocol_deliver_rcu+0x2e9/0x17f0 net/ipv6/ip6_input.c:422 ip6_input_finish+0x7f/0x160 net/ipv6/ip6_input.c:463 NF_HOOK include/linux/netfilter.h:301 [inline] NF_HOOK include/linux/netfilter.h:295 [inline] ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:472 dst_input include/net/dst.h:458 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] NF_HOOK include/linux/netfilter.h:295 [inline] ipv6_rcv+0x28e/0x3c0 net/i
Re: [PATCH] bonding: 3ad: update slave arr after initialize
jin yiting wrote: [...] >> The described issue is a race condition (in that >> ad_agg_selection_logic clears agg->is_active under mode_lock, but >> bond_open -> bond_update_slave_arr is inspecting agg->is_active outside >> the lock). I don't see how the above change will reliably manage this; >> the real issue looks to be that bond_update_slave_arr is committing >> changes to the array (via bond_reset_slave_arr) based on a racy >> inspection of the active aggregator state while it is in flux. >> >> Also, the description of the issue says "The best aggregator in >> ad_agg_selection_logic has not changed, no need to update slave arr," >> but the change above does the opposite, and will set update_slave_arr >> when the aggregator has not changed (update_slave_arr remains false at >> return of ad_agg_selection_logic). >> >> I believe I understand the described problem, but I don't see >> how the patch fixes it. I suspect (but haven't tested) that the proper >> fix is to acquire mode_lock in bond_update_slave_arr while calling >> bond_3ad_get_active_agg_info to avoid conflict with the state machine. >> >> -J >> >> --- >> -Jay Vosburgh, jay.vosbu...@canonical.com >> . >> > > Thank you for your reply. The last patch does have redundant >update slave arr.Thank you for your correction. > >As you said, holding mode_lock in bond_update_slave_arr while >calling bond_3ad_get_active_agg_info can avoid conflictwith the state >machine. I have tested this patch, with ifdown/ifup operations for bond or >slaves. > >But bond_update_slave_arr is expected to hold RTNL only and NO >other lock. And it have WARN_ON(lockdep_is_held(&bond->mode_lock)); in >bond_update_slave_arr. I'm not sure that holding mode_lock in >bond_update_slave_arr while calling bond_3ad_get_active_agg_info is a >correct action. That WARN_ON came up in discussion recently, and my opinion is that it's incorrect, and is trying to insure bond_update_slave_arr is safe for a potential sleep when allocating memory. https://lore.kernel.org/netdev/20210322123846.3024549-1-maxi...@nvidia.com/ The original authors haven't replied, so I would suggest you remove the WARN_ON and the surrounding CONFIG_LOCKDEP ifdefs as part of your patch and replace it with a call to might_sleep. The other callers of bond_3ad_get_active_agg_info are generally obtaining the state in order to report it to user space, so I think it's safe to leave those calls not holding the mode_lock. The race is still there, but the data returned to user space is a snapshot and so may reflect an incomplete state during a transition. Further, having the inspection functions acquire the mode_lock permits user space to spam the lock with little effort. -J >diff --git a/drivers/net/bonding/bond_main.c >b/drivers/net/bonding/bond_main.c >index 74cbbb2..db988e5 100644 >--- a/drivers/net/bonding/bond_main.c >+++ b/drivers/net/bonding/bond_main.c >@@ -4406,7 +4406,9 @@ int bond_update_slave_arr(struct bonding *bond, >struct slave *skipslave) >if (BOND_MODE(bond) == BOND_MODE_8023AD) { >struct ad_info ad_info; > >+ spin_lock_bh(&bond->mode_lock); >if (bond_3ad_get_active_agg_info(bond, &ad_info)) { >+ spin_unlock_bh(&bond->mode_lock); >pr_debug("bond_3ad_get_active_agg_info failed\n"); >/* No active aggragator means it's not safe to use > * the previous array. 
>@@ -4414,6 +4416,7 @@ int bond_update_slave_arr(struct bonding *bond, >struct slave *skipslave) >bond_reset_slave_arr(bond); >goto out; >} >+ spin_unlock_bh(&bond->mode_lock); >agg_id = ad_info.aggregator_id; >} >bond_for_each_slave(bond, slave, iter) { --- -Jay Vosburgh, jay.vosbu...@canonical.com
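For completeness, a sketch of the other change suggested above, replacing the lockdep-guarded WARN_ON at the top of bond_update_slave_arr(); this is a paraphrase of the suggestion, not an actual patch:

        /* bond_update_slave_arr() is called under RTNL only and may allocate
         * the new slave array with GFP_KERNEL, so document the sleepable
         * context instead of warning when mode_lock happens to be held: */
        might_sleep();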
Re: PPC_FPU, ALTIVEC: enable_kernel_fp, put_vr, get_vr
Le 19/04/2021 à 23:39, Randy Dunlap a écrit : On 4/19/21 6:16 AM, Michael Ellerman wrote: Randy Dunlap writes: Sure. I'll post them later today. They keep FPU and ALTIVEC as independent (build) features. Those patches look OK. But I don't think it makes sense to support that configuration, FPU=n ALTVEC=y. No one is ever going to make a CPU like that. We have enough testing surface due to configuration options, without adding artificial combinations that no one is ever going to use. IMHO :) So I'd rather we just make ALTIVEC depend on FPU. That's rather simple. See below. I'm doing a bunch of randconfig builds with it now. --- From: Randy Dunlap Subject: [PATCH] powerpc: make ALTIVEC depend PPC_FPU On a kernel config with ALTIVEC=y and PPC_FPU not set/enabled, there are build errors: drivers/cpufreq/pmac32-cpufreq.c:262:2: error: implicit declaration of function 'enable_kernel_fp' [-Werror,-Wimplicit-function-declaration] enable_kernel_fp(); ../arch/powerpc/lib/sstep.c: In function 'do_vec_load': ../arch/powerpc/lib/sstep.c:637:3: error: implicit declaration of function 'put_vr' [-Werror=implicit-function-declaration] 637 | put_vr(rn, &u.v); | ^~ ../arch/powerpc/lib/sstep.c: In function 'do_vec_store': ../arch/powerpc/lib/sstep.c:660:3: error: implicit declaration of function 'get_vr'; did you mean 'get_oc'? [-Werror=implicit-function-declaration] 660 | get_vr(rn, &u.v); | ^~ In theory ALTIVEC is independent of PPC_FPU but in practice nobody is going to build such a machine, so make ALTIVEC require PPC_FPU by depending on PPC_FPU. Signed-off-by: Randy Dunlap Reported-by: kernel test robot Cc: Michael Ellerman Cc: linuxppc-...@lists.ozlabs.org Cc: Christophe Leroy Cc: Segher Boessenkool Cc: l...@intel.com --- arch/powerpc/platforms/86xx/Kconfig|1 + arch/powerpc/platforms/Kconfig.cputype |2 ++ 2 files changed, 3 insertions(+) --- linux-next-20210416.orig/arch/powerpc/platforms/86xx/Kconfig +++ linux-next-20210416/arch/powerpc/platforms/86xx/Kconfig @@ -4,6 +4,7 @@ menuconfig PPC_86xx bool "86xx-based boards" depends on PPC_BOOK3S_32 select FSL_SOC + select PPC_FPU select ALTIVEC help The Freescale E600 SoCs have 74xx cores. --- linux-next-20210416.orig/arch/powerpc/platforms/Kconfig.cputype +++ linux-next-20210416/arch/powerpc/platforms/Kconfig.cputype @@ -186,6 +186,7 @@ config E300C3_CPU config G4_CPU bool "G4 (74xx)" depends on PPC_BOOK3S_32 + select PPC_FPU select ALTIVEC endchoice @@ -309,6 +310,7 @@ config PHYS_64BIT config ALTIVEC bool "AltiVec Support" + depends on PPC_FPU Shouldn't we do it the other way round ? In extenso make ALTIVEC select PPC_FPU and avoid the two selects that are above ? depends on PPC_BOOK3S_32 || PPC_BOOK3S_64 || (PPC_E500MC && PPC64) help This option enables kernel support for the Altivec extensions to the
Re: [PATCH v5 09/12] media: uvcvideo: Implement UVC_QUIRK_PRIVACY_DURING_STREAM
On Wed, Dec 23, 2020 at 9:56 PM Ricardo Ribalda wrote: > > Hi again > > On Wed, Dec 23, 2020 at 9:31 AM Ricardo Ribalda wrote: > > > > Hi Laurent > > > > On Wed, Dec 23, 2020 at 9:05 AM Laurent Pinchart > > wrote: > > > > > > Hi Ricardo, > > > > > > On Tue, Dec 22, 2020 at 09:04:19PM +0100, Ricardo Ribalda wrote: > > > > On Tue, Dec 22, 2020 at 11:30 AM Laurent Pinchart wrote: > > > > > On Mon, Dec 21, 2020 at 05:48:16PM +0100, Ricardo Ribalda wrote: > > > > > > Some devices, can only read the privacy_pin if the device is > > > > > > > > > > s/devices,/devices/ > > > > > > > > > > > streaming. > > > > > > > > > > > > This patch implement a quirk for such devices, in order to avoid > > > > > > invalid > > > > > > reads and/or spurious events. > > > > > > > > > > > > Signed-off-by: Ricardo Ribalda > > > > > > --- > > > > > > drivers/media/usb/uvc/uvc_driver.c | 57 > > > > > > -- > > > > > > drivers/media/usb/uvc/uvc_queue.c | 3 ++ > > > > > > drivers/media/usb/uvc/uvcvideo.h | 4 +++ > > > > > > 3 files changed, 61 insertions(+), 3 deletions(-) > > > > > > > > > > > > diff --git a/drivers/media/usb/uvc/uvc_driver.c > > > > > > b/drivers/media/usb/uvc/uvc_driver.c > > > > > > index 72516101fdd0..7af37d4bd60a 100644 > > > > > > --- a/drivers/media/usb/uvc/uvc_driver.c > > > > > > +++ b/drivers/media/usb/uvc/uvc_driver.c > > > > > > @@ -7,6 +7,7 @@ > > > > > > */ > > > > > > > > > > > > #include > > > > > > +#include > > > > > > #include > > > > > > #include > > > > > > #include > > > > > > @@ -1472,6 +1473,17 @@ static int uvc_parse_control(struct > > > > > > uvc_device *dev) > > > > > > /* > > > > > > - > > > > > > * Privacy GPIO > > > > > > */ > > > > > > > > > > There should be a blank line here. > > > > > > > > > > > +static bool uvc_gpio_is_streaming(struct uvc_device *dev) > > > > > > +{ > > > > > > + struct uvc_streaming *streaming; > > > > > > + > > > > > > + list_for_each_entry(streaming, &dev->streams, list) { > > > > > > + if (uvc_queue_streaming(&streaming->queue)) > > > > > > + return true; > > > > > > + } > > > > > > + > > > > > > + return false; > > > > > > +} > > > > > > > > > > > > > > > > > > > > > > But not too blank lines here. 
> > > > > > > > > > > static u8 uvc_gpio_update_value(struct uvc_device *dev, > > > > > > @@ -1499,7 +1511,12 @@ static int uvc_gpio_get_cur(struct > > > > > > uvc_device *dev, struct uvc_entity *entity, > > > > > > if (cs != UVC_CT_PRIVACY_CONTROL || size < 1) > > > > > > return -EINVAL; > > > > > > > > > > > > + if ((dev->quirks & UVC_QUIRK_PRIVACY_DURING_STREAM) && > > > > > > + !uvc_gpio_is_streaming(dev)) > > > > > > + return -EBUSY; > > > > > > + > > > > > > *(uint8_t *)data = uvc_gpio_update_value(dev, entity); > > > > > > + > > > > > > return 0; > > > > > > } > > > > > > > > > > > > @@ -1528,19 +1545,50 @@ static struct uvc_entity > > > > > > *uvc_gpio_find_entity(struct uvc_device *dev) > > > > > > return NULL; > > > > > > } > > > > > > > > > > > > -static irqreturn_t uvc_gpio_irq(int irq, void *data) > > > > > > +void uvc_privacy_gpio_event(struct uvc_device *dev) > > > > > > { > > > > > > - struct uvc_device *dev = data; > > > > > > struct uvc_entity *unit; > > > > > > > > > > > > + > > > > > > unit = uvc_gpio_find_entity(dev); > > > > > > if (!unit) > > > > > > - return IRQ_HANDLED; > > > > > > + return; > > > > > > > > > > > > uvc_gpio_update_value(dev, unit); > > > > > > +} > > > > > > + > > > > > > +static irqreturn_t uvc_gpio_irq(int irq, void *data) > > > > > > +{ > > > > > > + struct uvc_device *dev = data; > > > > > > + > > > > > > + /* Ignore privacy events during streamoff */ > > > > > > + if (dev->quirks & UVC_QUIRK_PRIVACY_DURING_STREAM) > > > > > > + if (!uvc_gpio_is_streaming(dev)) > > > > > > + return IRQ_HANDLED; > > > > > > > > > > I'm still a bit concerned of race conditions. When stopping the > > > > > stream, > > > > > vb2_queue.streaming is set to 0 after calling the driver's > > > > > .stop_stream() > > > > > handler. This means that the device will cut power before > > > > > uvc_gpio_is_streaming() can detect that streaming has stopped, and the > > > > > GPIO could thus trigger an IRQ. > > > > > > > > On the affected devices I have not seen this. I guess it takes some > > > > time to discharge. Anyway I am implementing a workaround. Tell me if > > > > it is too ugly. > > > > > > > > > You mentioned that devices have a pull-up or pull-down on the GPIO > > > > > line. > > > > > As there are only two devices affected, do you know if it's a pull-up > > > > > or > > > > > pull-down ? Would it be worse to expose that state to userspace than > > > > > to > > > > > return -EBUSY when reading th
[PATCH 1/1] powerpc/pseries/iommu: Fix window size for direct mapping with pmem
As of today, if the DDW is big enough to fit (1 << MAX_PHYSMEM_BITS) it's possible to use direct DMA mapping even with pmem region. But, if that happens, the window size (len) is set to (MAX_PHYSMEM_BITS - page_shift) instead of MAX_PHYSMEM_BITS, causing a pagesize times smaller DDW to be created, being insufficient for correct usage. Fix this so the correct window size is used in this case. Fixes: bf6e2d562bbc4("powerpc/dma: Fallback to dma_ops when persistent memory present") Signed-off-by: Leonardo Bras --- arch/powerpc/platforms/pseries/iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index 9fc5217f0c8e..836cbbe0ecc5 100644 --- a/arch/powerpc/platforms/pseries/iommu.c +++ b/arch/powerpc/platforms/pseries/iommu.c @@ -1229,7 +1229,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn) if (pmem_present) { if (query.largest_available_block >= (1ULL << (MAX_PHYSMEM_BITS - page_shift))) - len = MAX_PHYSMEM_BITS - page_shift; + len = MAX_PHYSMEM_BITS; else dev_info(&dev->dev, "Skipping ibm,pmemory"); } -- 2.30.2
Re: [PATCH] bpf: Fix backport of "bpf: restrict unknown scalars of mixed signed bounds for unprivileged"
On Mon, Apr 19, 2021 at 04:56:41PM -0700, Samuel Mendoza-Jonas wrote: > The 4.14 backport of 9d7eceede ("bpf: restrict unknown scalars of mixed > signed bounds for unprivileged") adds the PTR_TO_MAP_VALUE check to the > wrong location in adjust_ptr_min_max_vals(), most likely because 4.14 > doesn't include the commit that updates the if-statement to a > switch-statement (aad2eeaf4 "bpf: Simplify ptr_min_max_vals adjustment"). > > Move the check to the proper location in adjust_ptr_min_max_vals(). > > Fixes: 17efa65350c5a ("bpf: restrict unknown scalars of mixed signed bounds > for unprivileged") > Signed-off-by: Samuel Mendoza-Jonas > Reviewed-by: Frank van der Linden > Reviewed-by: Ethan Chen > --- Thanks for catching it :) Reviewed-by: Balbir Singh
RE: [PATCH v2] Drivers: hv: vmbus: Initialize unload_event statically
From: Andrea Parri (Microsoft) Sent: Monday, April 19, 2021 6:44 PM > > If a malicious or compromised Hyper-V sends a spurious message of type > CHANNELMSG_UNLOAD_RESPONSE, the function vmbus_unload_response() will > call complete() on an uninitialized event, and cause an oops. > > Reported-by: Michael Kelley > Signed-off-by: Andrea Parri (Microsoft) > --- > Changes since v1[1]: > - add inline comment in vmbus_unload_response() > > [1] > https://lore.kernel.org/linux-hyperv/20210416143932.16512-1-parri.and...@gmail.com/ > > drivers/hv/channel_mgmt.c | 7 ++- > drivers/hv/connection.c | 2 ++ > 2 files changed, 8 insertions(+), 1 deletion(-) > Reviewed-by: Michael Kelley > diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c > index 4c9e45d1f462c..335a10ee03a5e 100644 > --- a/drivers/hv/channel_mgmt.c > +++ b/drivers/hv/channel_mgmt.c > @@ -826,6 +826,11 @@ static void vmbus_unload_response(struct > vmbus_channel_message_header *hdr) > /* >* This is a global event; just wakeup the waiting thread. >* Once we successfully unload, we can cleanup the monitor state. > + * > + * NB. A malicious or compromised Hyper-V could send a spurious > + * message of type CHANNELMSG_UNLOAD_RESPONSE, and trigger a call > + * of the complete() below. Make sure that unload_event has been > + * initialized by the time this complete() is executed. >*/ > complete(&vmbus_connection.unload_event); > } > @@ -841,7 +846,7 @@ void vmbus_initiate_unload(bool crash) > if (vmbus_proto_version < VERSION_WIN8_1) > return; > > - init_completion(&vmbus_connection.unload_event); > + reinit_completion(&vmbus_connection.unload_event); > memset(&hdr, 0, sizeof(struct vmbus_channel_message_header)); > hdr.msgtype = CHANNELMSG_UNLOAD; > vmbus_post_msg(&hdr, sizeof(struct vmbus_channel_message_header), > diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c > index dc19d5ae4373c..311cd005b3be6 100644 > --- a/drivers/hv/connection.c > +++ b/drivers/hv/connection.c > @@ -26,6 +26,8 @@ > > struct vmbus_connection vmbus_connection = { > .conn_state = DISCONNECTED, > + .unload_event = COMPLETION_INITIALIZER( > + vmbus_connection.unload_event), > .next_gpadl_handle = ATOMIC_INIT(0xE1E10), > > .ready_for_suspend_event = COMPLETION_INITIALIZER( > -- > 2.25.1
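The pattern in a nutshell, as a generic sketch rather than the hv code itself (conn_state and start_unload() are made-up names): the completion is valid from the moment the object exists, and later users only re-arm it, so a spurious complete() can no longer hit an uninitialized waitqueue.

        #include <linux/completion.h>

        static struct conn_state {
                struct completion unload_event;
        } conn = {
                .unload_event = COMPLETION_INITIALIZER(conn.unload_event),
        };

        static void start_unload(void)
        {
                /* reset ->done only; the waitqueue stays initialized */
                reinit_completion(&conn.unload_event);
                /* ... post the UNLOAD message and wait for the response ... */
        }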
[PATCH v2 1/1] Drivers: hv: vmbus: Increase wait time for VMbus unload
When running in Azure, disks may be connected to a Linux VM with read/write caching enabled. If a VM panics and issues a VMbus UNLOAD request to Hyper-V, the response is delayed until all dirty data in the disk cache is flushed. In extreme cases, this flushing can take 10's of seconds, depending on the disk speed and the amount of dirty data. If kdump is configured for the VM, the current 10 second timeout in vmbus_wait_for_unload() may be exceeded, and the UNLOAD complete message may arrive well after the kdump kernel is already running, causing problems. Note that no problem occurs if kdump is not enabled because Hyper-V waits for the cache flush before doing a reboot through the BIOS/UEFI code. Fix this problem by increasing the timeout in vmbus_wait_for_unload() to 100 seconds. Also output periodic messages so that if anyone is watching the serial console, they won't think the VM is completely hung. Fixes: 911e1987efc8 ("Drivers: hv: vmbus: Add timeout to vmbus_wait_for_unload") Signed-off-by: Michael Kelley --- Changed in v2: Fixed silly error in the argument to mdelay() --- drivers/hv/channel_mgmt.c | 30 +- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c index f3cf4af..ef4685c 100644 --- a/drivers/hv/channel_mgmt.c +++ b/drivers/hv/channel_mgmt.c @@ -755,6 +755,12 @@ static void init_vp_index(struct vmbus_channel *channel) free_cpumask_var(available_mask); } +#define UNLOAD_DELAY_UNIT_MS 10 /* 10 milliseconds */ +#define UNLOAD_WAIT_MS (100*1000) /* 100 seconds */ +#define UNLOAD_WAIT_LOOPS (UNLOAD_WAIT_MS/UNLOAD_DELAY_UNIT_MS) +#define UNLOAD_MSG_MS (5*1000)/* Every 5 seconds */ +#define UNLOAD_MSG_LOOPS (UNLOAD_MSG_MS/UNLOAD_DELAY_UNIT_MS) + static void vmbus_wait_for_unload(void) { int cpu; @@ -772,12 +778,17 @@ static void vmbus_wait_for_unload(void) * vmbus_connection.unload_event. If not, the last thing we can do is * read message pages for all CPUs directly. * -* Wait no more than 10 seconds so that the panic path can't get -* hung forever in case the response message isn't seen. +* Wait up to 100 seconds since an Azure host must writeback any dirty +* data in its disk cache before the VMbus UNLOAD request will +* complete. This flushing has been empirically observed to take up +* to 50 seconds in cases with a lot of dirty data, so allow additional +* leeway and for inaccuracies in mdelay(). But eventually time out so +* that the panic path can't get hung forever in case the response +* message isn't seen. */ - for (i = 0; i < 1000; i++) { + for (i = 1; i <= UNLOAD_WAIT_LOOPS; i++) { if (completion_done(&vmbus_connection.unload_event)) - break; + goto completed; for_each_online_cpu(cpu) { struct hv_per_cpu_context *hv_cpu @@ -800,9 +811,18 @@ static void vmbus_wait_for_unload(void) vmbus_signal_eom(msg, message_type); } - mdelay(10); + /* +* Give a notice periodically so someone watching the +* serial output won't think it is completely hung. +*/ + if (!(i % UNLOAD_MSG_LOOPS)) + pr_notice("Waiting for VMBus UNLOAD to complete\n"); + + mdelay(UNLOAD_DELAY_UNIT_MS); } + pr_err("Continuing even though VMBus UNLOAD did not complete\n"); +completed: /* * We're crashing and already got the UNLOAD_RESPONSE, cleanup all * maybe-pending messages on all CPUs to be able to receive new -- 1.8.3.1
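For the record, the arithmetic behind the new macros: UNLOAD_WAIT_LOOPS = 100000 / 10 = 10000 iterations of mdelay(10), i.e. up to roughly 100 seconds of polling, and UNLOAD_MSG_LOOPS = 5000 / 10 = 500, so the "Waiting for VMBus UNLOAD to complete" notice is printed about every 5 seconds (every 500th iteration).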
Re: [PATCH bpf-next v2 4/4] libbpf: add selftests for TC-BPF API
On Mon, Apr 19, 2021 at 5:18 AM Kumar Kartikeya Dwivedi wrote: > > This adds some basic tests for the low level bpf_tc_cls_* API. > > Reviewed-by: Toke Høiland-Jørgensen > Signed-off-by: Kumar Kartikeya Dwivedi > --- > .../selftests/bpf/prog_tests/test_tc_bpf.c| 112 ++ > .../selftests/bpf/progs/test_tc_bpf_kern.c| 12 ++ > 2 files changed, 124 insertions(+) > create mode 100644 tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c > create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c > > diff --git a/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c > b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c > new file mode 100644 > index ..945f3a1a72f8 > --- /dev/null > +++ b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c > @@ -0,0 +1,112 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define LO_IFINDEX 1 > + > +static int test_tc_cls_internal(int fd, __u32 parent_id) > +{ > + DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts, .handle = 1, .priority = > 10, > + .class_id = TC_H_MAKE(1UL << 16, 1), > + .chain_index = 5); > + struct bpf_tc_cls_attach_id id = {}; > + struct bpf_tc_cls_info info = {}; > + int ret; > + > + ret = bpf_tc_cls_attach(fd, LO_IFINDEX, parent_id, &opts, &id); > + if (CHECK_FAIL(ret < 0)) > + return ret; > + > + ret = bpf_tc_cls_get_info(fd, LO_IFINDEX, parent_id, NULL, &info); > + if (CHECK_FAIL(ret < 0)) > + goto end; > + > + ret = -1; > + > + if (CHECK_FAIL(info.id.handle != id.handle) || > + CHECK_FAIL(info.id.chain_index != id.chain_index) || > + CHECK_FAIL(info.id.priority != id.priority) || > + CHECK_FAIL(info.id.handle != 1) || > + CHECK_FAIL(info.id.priority != 10) || > + CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 1)) || > + CHECK_FAIL(info.id.chain_index != 5)) > + goto end; > + > + ret = bpf_tc_cls_replace(fd, LO_IFINDEX, parent_id, &opts, &id); > + if (CHECK_FAIL(ret < 0)) > + return ret; > + > + if (CHECK_FAIL(info.id.handle != 1) || > + CHECK_FAIL(info.id.priority != 10) || > + CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 1))) > + goto end; > + > + /* Demonstrate changing attributes */ > + opts.class_id = TC_H_MAKE(1UL << 16, 2); > + > + ret = bpf_tc_cls_change(fd, LO_IFINDEX, parent_id, &opts, &info.id); > + if (CHECK_FAIL(ret < 0)) > + goto end; > + > + ret = bpf_tc_cls_get_info(fd, LO_IFINDEX, parent_id, NULL, &info); > + if (CHECK_FAIL(ret < 0)) > + goto end; > + > + if (CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 2))) > + goto end; > + if (CHECK_FAIL((info.bpf_flags & TCA_BPF_FLAG_ACT_DIRECT) != 1)) > + goto end; > + > +end: > + ret = bpf_tc_cls_detach(LO_IFINDEX, parent_id, &id); > + CHECK_FAIL(ret < 0); > + return ret; > +} > + > +void test_test_tc_bpf(void) > +{ > + const char *file = "./test_tc_bpf_kern.o"; > + struct bpf_program *clsp; > + struct bpf_object *obj; > + int cls_fd, ret; > + > + obj = bpf_object__open(file); > + if (CHECK_FAIL(IS_ERR_OR_NULL(obj))) > + return; > + > + clsp = bpf_object__find_program_by_title(obj, "classifier"); > + if (CHECK_FAIL(IS_ERR_OR_NULL(clsp))) > + goto end; > + > + ret = bpf_object__load(obj); > + if (CHECK_FAIL(ret < 0)) > + goto end; > + > + cls_fd = bpf_program__fd(clsp); > + > + system("tc qdisc del dev lo clsact"); > + > + ret = test_tc_cls_internal(cls_fd, BPF_TC_CLSACT_INGRESS); > + if (CHECK_FAIL(ret < 0)) > + goto end; > + > + if (CHECK_FAIL(system("tc qdisc del dev lo clsact"))) > + goto end; > + > + ret = test_tc_cls_internal(cls_fd, 
BPF_TC_CLSACT_EGRESS); > + if (CHECK_FAIL(ret < 0)) > + goto end; > + > + CHECK_FAIL(system("tc qdisc del dev lo clsact")); please don't use CHECK_FAIL. And prefer ASSERT_xxx over CHECK(). > + > +end: > + bpf_object__close(obj); > +} > diff --git a/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c > b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c > new file mode 100644 > index ..3dd40e21af8e > --- /dev/null > +++ b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c > @@ -0,0 +1,12 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +#include > +#include > + > +// Dummy prog to test TC-BPF API no C++-style comments, please (except for SPDX header, of course) > + > +SEC("classifier") > +int cls(struct __sk_buff *skb) > +{ > + return 0; > +} > -- > 2.30.2 >
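As an illustration of the requested style, here is one of the checks above rewritten with the ASSERT_xxx() helpers from tools/testing/selftests/bpf/test_progs.h; this is a sketch, not a tested diff:

        ret = bpf_tc_cls_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
        if (!ASSERT_OK(ret, "bpf_tc_cls_attach"))
                return ret;

        ret = bpf_tc_cls_get_info(fd, LO_IFINDEX, parent_id, NULL, &info);
        if (!ASSERT_OK(ret, "bpf_tc_cls_get_info"))
                goto end;

        ASSERT_EQ(info.id.handle, 1, "handle");
        ASSERT_EQ(info.id.priority, 10, "priority");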
Re: [PATCH] arm64: dts: ls1028a-rdb: enable optee node
On Tue, Mar 30, 2021 at 9:53 AM Sahil Malhotra wrote: > > From: Sahil Malhotra > > optee node was disabled by default, enabling it for ls1028a-rdb. > > Signed-off-by: Michael Walle > Signed-off-by: Sahil Malhotra Acked-by: Li Yang > --- > arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts | 4 > arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi| 2 +- > 2 files changed, 5 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts > b/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts > index 41ae6e7675ba..1bdf0104d492 100644 > --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts > +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts > @@ -274,6 +274,10 @@ > status = "okay"; > }; > > +&optee { > + status = "okay"; > +}; > + > &sai4 { > status = "okay"; > }; > diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi > b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi > index 262fbad8f0ec..fb155ac192b1 100644 > --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi > +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi > @@ -92,7 +92,7 @@ > }; > > firmware { > - optee { > + optee: optee { > compatible = "linaro,optee-tz"; > method = "smc"; > status = "disabled"; > -- > 2.17.1 >
[PATCH v2 2/2] usb: cdnsp: Fix lack of removing request from pending list.
From: Pawel Laszczak Patch fixes lack of removing request from ep->pending_list on failure of the stop endpoint command. Driver even after failing this command must remove request from ep->pending_list. Without this fix driver can stuck in cdnsp_gadget_ep_disable function in loop: while (!list_empty(&pep->pending_list)) { preq = next_request(&pep->pending_list); cdnsp_ep_dequeue(pep, preq); } Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver") Signed-off-by: Pawel Laszczak --- Changelog: v2: - removed blank space - added "Fixes" tag drivers/usb/cdns3/cdnsp-gadget.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/usb/cdns3/cdnsp-gadget.c b/drivers/usb/cdns3/cdnsp-gadget.c index 56707b6b0f57..c083985e387b 100644 --- a/drivers/usb/cdns3/cdnsp-gadget.c +++ b/drivers/usb/cdns3/cdnsp-gadget.c @@ -422,17 +422,17 @@ int cdnsp_ep_enqueue(struct cdnsp_ep *pep, struct cdnsp_request *preq) int cdnsp_ep_dequeue(struct cdnsp_ep *pep, struct cdnsp_request *preq) { struct cdnsp_device *pdev = pep->pdev; - int ret; + int ret_stop = 0; + int ret_rem; trace_cdnsp_request_dequeue(preq); - if (GET_EP_CTX_STATE(pep->out_ctx) == EP_STATE_RUNNING) { - ret = cdnsp_cmd_stop_ep(pdev, pep); - if (ret) - return ret; - } + if (GET_EP_CTX_STATE(pep->out_ctx) == EP_STATE_RUNNING) + ret_stop = cdnsp_cmd_stop_ep(pdev, pep); + + ret_rem = cdnsp_remove_request(pdev, preq, pep); - return cdnsp_remove_request(pdev, preq, pep); + return ret_rem ? ret_rem : ret_stop; } static void cdnsp_zero_in_ctx(struct cdnsp_device *pdev) -- 2.25.1
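In other words, with this change the request is always taken off pending_list even when cdnsp_cmd_stop_ep() fails: the removal error, if any, takes priority in the return value, and a stop-endpoint failure is still reported to the caller when the removal itself succeeds, instead of causing an early return before the removal is attempted.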
Re: linux-next: Tree for Apr 19 (bcache)
On 4/20/21 1:50 AM, Jens Axboe wrote: > On 4/19/21 10:26 AM, Coly Li wrote: >> On 4/19/21 11:40 PM, Randy Dunlap wrote: >>> On 4/19/21 3:23 AM, Stephen Rothwell wrote: Hi all, Changes since 20210416: >>> >>> on x86_64: >>> >>> when >>> # CONFIG_BLK_DEV is not set >>> >>> >>> WARNING: unmet direct dependencies detected for LIBNVDIMM >>> Depends on [n]: PHYS_ADDR_T_64BIT [=y] && HAS_IOMEM [=y] && BLK_DEV [=n] >>> Selected by [y]: >>> - BCACHE_NVM_PAGES [=y] && MD [=y] && BCACHE [=y] && PHYS_ADDR_T_64BIT >>> [=y] >>> >>> >>> Full randconfig file is attached. >>> >> >> I need hint from kbuild expert. >> >> My original idea to use "select LIBNVDIMM" is to avoid the >> BCACHE_NVM_PAGES option to disappear if LIBNVDIMM is not enabled. >> Otherwise if nvdimm driver is not configure, users won't know there is a >> BCACHE_NVM_PAGES option unless they read bcache Kconfig file. > > But why? That's exactly how it should work. Just use depends to set the > dependency. > >> But I see nvdimm's Kconfig, it uses "depends on BLK_DEV", I understand >> it is acceptable that LIBNVDIMM option to disappear from "make >> menuconfig" if BLK_DEV is not enabled. >> >> For such condition, which one is the proper way to set the dependence ? >> - Change "select LIBNVDIMM" and "select DAX" to "depends on LIBNVDIMM" >> and "depends on DAX" in bcache Kconfig >> - Or change "depends on BLK_DEV" to "select BLK_DEV" in nvdimm Kconfig. > > The former. > Copied. Thanks for the hint. I will post a fix soon. Coly Li
Re: [PATCH] riscv: Remove 32b kernel mapping from page table dump
On Sun, Apr 18, 2021 at 4:59 PM Alexandre Ghiti wrote: > > The 32b kernel mapping lies in the linear mapping, there is no point in > printing its address in page table dump, so remove this leftover that > comes from moving the kernel mapping outside the linear mapping for 64b > kernel. > > Fixes: e9efb21fe352 ("riscv: Prepare ptdump for vm layout dynamic addresses") > Signed-off-by: Alexandre Ghiti Looks good to me. Reviewed-by: Anup Patel Regards, Anup > --- > arch/riscv/mm/ptdump.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c > index 0aba4421115c..a4ed4bdbbfde 100644 > --- a/arch/riscv/mm/ptdump.c > +++ b/arch/riscv/mm/ptdump.c > @@ -76,8 +76,8 @@ enum address_markers_idx { > PAGE_OFFSET_NR, > #ifdef CONFIG_64BIT > MODULES_MAPPING_NR, > -#endif > KERNEL_MAPPING_NR, > +#endif > END_OF_SPACE_NR > }; > > @@ -99,8 +99,8 @@ static struct addr_marker address_markers[] = { > {0, "Linear mapping"}, > #ifdef CONFIG_64BIT > {0, "Modules mapping"}, > -#endif > {0, "Kernel mapping (kernel, BPF)"}, > +#endif > {-1, NULL}, > }; > > @@ -379,8 +379,8 @@ static int ptdump_init(void) > address_markers[PAGE_OFFSET_NR].start_address = PAGE_OFFSET; > #ifdef CONFIG_64BIT > address_markers[MODULES_MAPPING_NR].start_address = MODULES_VADDR; > -#endif > address_markers[KERNEL_MAPPING_NR].start_address = kernel_virt_addr; > +#endif > > kernel_ptd_info.base_addr = KERN_VIRT_START; > > -- > 2.20.1 >
Re: [PATCH] riscv: Fix 32b kernel caused by 64b kernel mapping moving outside linear mapping
On Sat, Apr 17, 2021 at 10:52 PM Alexandre Ghiti wrote: > > Fix multiple leftovers when moving the kernel mapping outside the linear > mapping for 64b kernel that left the 32b kernel unusable. > > Fixes: 4b67f48da707 ("riscv: Move kernel mapping outside of linear mapping") > Signed-off-by: Alexandre Ghiti Quite a few #ifdef but I don't see any better way at the moment. Maybe we can clean this later. Otherwise looks good to me. Reviewed-by: Anup Patel Regards, Anup > --- > arch/riscv/include/asm/page.h| 9 + > arch/riscv/include/asm/pgtable.h | 16 > arch/riscv/mm/init.c | 25 - > 3 files changed, 45 insertions(+), 5 deletions(-) > > diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h > index 22cfb2be60dc..f64b61296c0c 100644 > --- a/arch/riscv/include/asm/page.h > +++ b/arch/riscv/include/asm/page.h > @@ -90,15 +90,20 @@ typedef struct page *pgtable_t; > > #ifdef CONFIG_MMU > extern unsigned long va_pa_offset; > +#ifdef CONFIG_64BIT > extern unsigned long va_kernel_pa_offset; > +#endif > extern unsigned long pfn_base; > #define ARCH_PFN_OFFSET(pfn_base) > #else > #define va_pa_offset 0 > +#ifdef CONFIG_64BIT > #define va_kernel_pa_offset0 > +#endif > #define ARCH_PFN_OFFSET(PAGE_OFFSET >> PAGE_SHIFT) > #endif /* CONFIG_MMU */ > > +#ifdef CONFIG_64BIT > extern unsigned long kernel_virt_addr; > > #define linear_mapping_pa_to_va(x) ((void *)((unsigned long)(x) + > va_pa_offset)) > @@ -112,6 +117,10 @@ extern unsigned long kernel_virt_addr; > (_x < kernel_virt_addr) ? > \ > linear_mapping_va_to_pa(_x) : kernel_mapping_va_to_pa(_x); > \ > }) > +#else > +#define __pa_to_va_nodebug(x) ((void *)((unsigned long) (x) + va_pa_offset)) > +#define __va_to_pa_nodebug(x) ((unsigned long)(x) - va_pa_offset) > +#endif > > #ifdef CONFIG_DEBUG_VIRTUAL > extern phys_addr_t __virt_to_phys(unsigned long x); > diff --git a/arch/riscv/include/asm/pgtable.h > b/arch/riscv/include/asm/pgtable.h > index 80e63a93e903..5afda75cc2c3 100644 > --- a/arch/riscv/include/asm/pgtable.h > +++ b/arch/riscv/include/asm/pgtable.h > @@ -16,19 +16,27 @@ > #else > > #define ADDRESS_SPACE_END (UL(-1)) > -/* > - * Leave 2GB for kernel and BPF at the end of the address space > - */ > + > +#ifdef CONFIG_64BIT > +/* Leave 2GB for kernel and BPF at the end of the address space */ > #define KERNEL_LINK_ADDR (ADDRESS_SPACE_END - SZ_2G + 1) > +#else > +#define KERNEL_LINK_ADDR PAGE_OFFSET > +#endif > > #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1) > #define VMALLOC_END (PAGE_OFFSET - 1) > #define VMALLOC_START(PAGE_OFFSET - VMALLOC_SIZE) > > -/* KASLR should leave at least 128MB for BPF after the kernel */ > #define BPF_JIT_REGION_SIZE(SZ_128M) > +#ifdef CONFIG_64BIT > +/* KASLR should leave at least 128MB for BPF after the kernel */ > #define BPF_JIT_REGION_START PFN_ALIGN((unsigned long)&_end) > #define BPF_JIT_REGION_END (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE) > +#else > +#define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE) > +#define BPF_JIT_REGION_END (VMALLOC_END) > +#endif > > /* Modules always live before the kernel */ > #ifdef CONFIG_64BIT > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c > index 093f3a96ecfc..dc9b988e0778 100644 > --- a/arch/riscv/mm/init.c > +++ b/arch/riscv/mm/init.c > @@ -91,8 +91,10 @@ static void print_vm_layout(void) > (unsigned long)VMALLOC_END); > print_mlm("lowmem", (unsigned long)PAGE_OFFSET, > (unsigned long)high_memory); > +#ifdef CONFIG_64BIT > print_mlm("kernel", (unsigned long)KERNEL_LINK_ADDR, > (unsigned long)ADDRESS_SPACE_END); > +#endif > } > #else > 
static void print_vm_layout(void) { } > @@ -165,9 +167,11 @@ static struct pt_alloc_ops pt_ops; > /* Offset between linear mapping virtual address and kernel load address */ > unsigned long va_pa_offset; > EXPORT_SYMBOL(va_pa_offset); > +#ifdef CONFIG_64BIT > /* Offset between kernel mapping virtual address and kernel load address */ > unsigned long va_kernel_pa_offset; > EXPORT_SYMBOL(va_kernel_pa_offset); > +#endif > unsigned long pfn_base; > EXPORT_SYMBOL(pfn_base); > > @@ -410,7 +414,9 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa) > load_sz = (uintptr_t)(&_end) - load_pa; > > va_pa_offset = PAGE_OFFSET - load_pa; > +#ifdef CONFIG_64BIT > va_kernel_pa_offset = kernel_virt_addr - load_pa; > +#endif > > pfn_base = PFN_DOWN(load_pa); > > @@ -469,12 +475,16 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa) >pa + PMD_SIZE, PMD_SIZE, PAGE_KERNEL); > dtb_early_va = (void *)DTB_EARLY_BASE_VA + (dtb_pa & (PMD_SIZE - 1)); > #else /* CONFIG_BUILTIN_DTB */ > +#ifdef CONFIG_64BIT > /* >
Re: [PATCH v20 1/2] scsi: ufs: Enable power management for wlun
On 20/04/21 12:53 am, Asutosh Das (asd) wrote: > On 4/19/2021 11:37 AM, Adrian Hunter wrote: >> On 16/04/21 10:49 pm, Asutosh Das wrote: >>> >>> Co-developed-by: Can Guo >>> Signed-off-by: Can Guo >>> Signed-off-by: Asutosh Das >>> --- >> >> I came across 3 issues while testing. See comments below. >> > Hi Adrian > Thanks for the comments. >> >> >>> @@ -5794,7 +5839,7 @@ static void ufshcd_err_handling_unprepare(struct >>> ufs_hba *hba) >>> if (ufshcd_is_clkscaling_supported(hba)) >>> ufshcd_clk_scaling_suspend(hba, false); >>> ufshcd_clear_ua_wluns(hba); >> >> ufshcd_clear_ua_wluns() deadlocks trying to clear UFS_UPIU_RPMB_WLUN >> if sdev_rpmb is suspended and sdev_ufs_device is suspending. >> e.g. ufshcd_wl_suspend() is waiting on host_sem while ufshcd_err_handler() >> is running, at which point sdev_rpmb has already suspended. >> > Umm, I didn't understand this deadlock. > When you say, sdev_rpmb is suspended, does it mean runtime_suspended? > sdev_ufs_device is suspending - this can't be runtime_suspending, while > ufshcd_err_handling_unprepare is running. > > If you've a call-stack of this deadlock, please can you share it with me. > I'll also try to reproduce this. Yes it is system suspend. sdev_rpmb has suspended, sdev_ufs_device is waiting on host_sem. ufshcd_err_handler() holds host_sem. ufshcd_clear_ua_wlun(UFS_UPIU_RPMB_WLUN) gets stuck. I will get some call-stacks. > > I'll address the other comments in the next version. > > > Thank you! > >>> - pm_runtime_put(hba->dev); >>> + ufshcd_rpm_put(hba); >>> } >> >> >> >>> +void ufshcd_resume_complete(struct device *dev) >>> +{ >
linux-next: manual merge of the ftrace tree with the bpf-next tree
Hi all, Today's linux-next merge of the ftrace tree got a conflict in: kernel/trace/bpf_trace.c between commit: d9c9e4db186a ("bpf: Factorize bpf_trace_printk and bpf_seq_printf") from the bpf-next tree and commit: f2cc020d7876 ("tracing: Fix various typos in comments") from the ftrace tree. I fixed it up (the former removed the comment updated by the latter) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell pgpZSIISeFaSc.pgp Description: OpenPGP digital signature
Re: [PATCH 0/2] Change struct page layout for page_pool
"Matthew Wilcox (Oracle)" writes: > The first patch here fixes two bugs on ppc32, and mips32. It fixes one > bug on arc and arm32 (in certain configurations). It probably makes > sense to get it in ASAP through the networking tree. I'd like to see > testing on those four architectures if possible? Sorry I don't have easy access to any hardware that fits the bill. At some point I'll be able to go to the office and setup a machine that (I think) can test these paths, but I don't have an ETA on that. You and others seem to have done lots of analysis on this though, so I think you should merge the fixes without waiting on ppc32 testing. cheers > > The second patch enables new functionality. It is much less urgent. > I'd really like to see Mel & Michal's thoughts on it. > > I have only compile-tested these patches. > > Matthew Wilcox (Oracle) (2): > mm: Fix struct page layout on 32-bit systems > mm: Indicate pfmemalloc pages in compound_head > > include/linux/mm.h | 12 +++- > include/linux/mm_types.h | 9 - > include/net/page_pool.h | 12 +++- > net/core/page_pool.c | 12 +++- > 4 files changed, 29 insertions(+), 16 deletions(-) > > -- > 2.30.2
RE: [PATCH 1/2] usb: gadget: f_uac2: Stop endpoint before enabling it.
>On 21-04-19 09:50:53, Pawel Laszczak wrote: >> From: Pawel Laszczak >> >> Patch adds disabling endpoint before enabling it during changing >> alternate setting. Lack of this functionality causes that in some >> cases uac2 queue the same request multiple time. >> Such situation can occur when host send set interface with >> alternate setting 1 twice. >> >> Signed-off-by: Pawel Laszczak >> --- >> drivers/usb/gadget/function/f_uac2.c | 8 >> 1 file changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/usb/gadget/function/f_uac2.c >> b/drivers/usb/gadget/function/f_uac2.c >> index 9cc5c512a5cd..7d20a9d8a1b4 100644 >> --- a/drivers/usb/gadget/function/f_uac2.c >> +++ b/drivers/usb/gadget/function/f_uac2.c >> @@ -890,17 +890,17 @@ afunc_set_alt(struct usb_function *fn, unsigned intf, >> unsigned alt) >> if (intf == uac2->as_out_intf) { >> uac2->as_out_alt = alt; >> >> +u_audio_stop_capture(&uac2->g_audio); >> + >> if (alt) >> ret = u_audio_start_capture(&uac2->g_audio); >> -else >> -u_audio_stop_capture(&uac2->g_audio); >> } else if (intf == uac2->as_in_intf) { >> uac2->as_in_alt = alt; >> >> +u_audio_stop_playback(&uac2->g_audio); >> + >> if (alt) >> ret = u_audio_start_playback(&uac2->g_audio); >> -else >> -u_audio_stop_playback(&uac2->g_audio); >> } else { >> dev_err(dev, "%s:%d Error!\n", __func__, __LINE__); >> return -EINVAL; > >To avoid this, you may use prm->ep_enabled to judge if the endpoint has >already enabled. Such condition is as first instruction inside u_audio_stop_playback->free_ep function, so we don't need duplicate it here. > >-- > >Thanks, >Peter Chen -- Regards, Pawe Laszczak
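For reference, the guard being referred to has roughly this shape (a paraphrase of free_ep() in u_audio.c, not a verbatim copy), which is why calling the stop path on an endpoint that was never started is harmless:

        static void free_ep(struct uac_rtd_params *prm, struct usb_ep *ep)
        {
                if (!prm->ep_enabled)
                        return;         /* nothing to stop: the extra call is a no-op */
                prm->ep_enabled = false;

                /* ... dequeue queued requests and disable the endpoint ... */
        }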
[PATCH] mhi: add MHI_STATE_M2 to resume success criteria
During system resume, mhi driver triggers M3->M0 transition and then waits for target device to enter M0 state. Once done, the device queues a state change event into ctrl event ring and notify mhi dirver by raising an interrupt, where a tasklet is scheduled to process this event. In most cases, the taklet is served timely and wait operation succeeds. However, there are cases where CPU is busy and can not serve this tasklet for some time. Once delay goes long enough, the device moves itself to M1 state and also interrupts mhi driver after inserting a new state change event to ctrl ring. Later CPU finally has time to process the ring, however there are two events in it now: 1. for M3->M0 event, which is processed first as queued first, tasklet handler updates device state to M0 and wakes up the task, i.e., the mhi driver. 2. for M0->M1 event, which is processed later, tasklet handler triggers M1->M2 transition and updates device state to M2 directly, then wakes up the mhi driver(if still sleeping on this wait queue). Note that although mhi driver has been woken up while processing the first event, it may still has no chance to run before the second event is processed. In other words, mhi driver has to keep waiting till timeout cause the M0 state has been missed. kernel log here: ... Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4247.911251] mhi :06:00.0: Entered with PM state: M3, MHI state: M3 Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4247.917762] mhi :06:00.0: State change event to state: M0 Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4247.917767] mhi :06:00.0: State change event to state: M1 Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4338.788231] mhi :06:00.0: Did not enter M0 state, MHI state: M2, PM state: M2 ... Fix this issue by simply adding M2 as a valid state for resume. Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1 Signed-off-by: Baochen Qiang --- drivers/bus/mhi/core/pm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c index ce73cfa63cb3..ca5f2feed9d5 100644 --- a/drivers/bus/mhi/core/pm.c +++ b/drivers/bus/mhi/core/pm.c @@ -900,6 +900,7 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl) ret = wait_event_timeout(mhi_cntrl->state_event, mhi_cntrl->dev_state == MHI_STATE_M0 || +mhi_cntrl->dev_state == MHI_STATE_M2 || MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state), msecs_to_jiffies(mhi_cntrl->timeout_ms)); -- 2.25.1
Re: [PATCH v4 4/4] pinctrl: add rsel setting on MT8195
On Mon, Apr 12, 2021 at 10:57 PM Zhiyong Tao wrote: > @@ -176,6 +180,12 @@ static int mtk_pinconf_get(struct pinctrl_dev *pctldev, > else > err = -ENOTSUPP; > break; > + case MTK_PIN_CONFIG_RSEL: > + if (hw->soc->rsel_get) > + err = hw->soc->rsel_get(hw, desc, &ret); > + else > + err = -EOPNOTSUPP; I think that should want to be -ENOTSUPP to align other occurrences. > + break; > default: > err = -ENOTSUPP; > } > @@ -295,6 +305,12 @@ static int mtk_pinconf_set(struct pinctrl_dev *pctldev, > unsigned int pin, > else > err = -ENOTSUPP; > break; > + case MTK_PIN_CONFIG_RSEL: > + if (hw->soc->rsel_set) > + err = hw->soc->rsel_set(hw, desc, arg); > + else > + err = -EOPNOTSUPP; Ditto > + break; > default: > err = -ENOTSUPP; > } > -- > 2.18.0 >
linux-next: build warning after merge of the spi tree
Hi all, After merging the spi tree, today's linux-next build (x86_64 allmodconfig) produced this warning: In file included from include/linux/printk.h:409, from include/linux/kernel.h:16, from include/linux/clk.h:13, from drivers/spi/spi-stm32-qspi.c:7: drivers/spi/spi-stm32-qspi.c: In function 'stm32_qspi_dirmap_read': drivers/spi/spi-stm32-qspi.c:481:21: warning: format '%x' expects argument of type 'unsigned int', but argument 5 has type 'size_t' {aka 'long unsigned int'} [-Wformat=] 481 | dev_dbg(qspi->dev, "%s len = 0x%x offs = 0x%llx buf = 0x%p\n", __func__, len, offs, buf); | ^~ include/linux/dynamic_debug.h:129:15: note: in definition of macro '__dynamic_func_call' 129 | func(&id, ##__VA_ARGS__); \ | ^~~ include/linux/dynamic_debug.h:161:2: note: in expansion of macro '_dynamic_func_call' 161 | _dynamic_func_call(fmt,__dynamic_dev_dbg, \ | ^~ include/linux/dev_printk.h:123:2: note: in expansion of macro 'dynamic_dev_dbg' 123 | dynamic_dev_dbg(dev, dev_fmt(fmt), ##__VA_ARGS__) | ^~~ include/linux/dev_printk.h:123:23: note: in expansion of macro 'dev_fmt' 123 | dynamic_dev_dbg(dev, dev_fmt(fmt), ##__VA_ARGS__) | ^~~ drivers/spi/spi-stm32-qspi.c:481:2: note: in expansion of macro 'dev_dbg' 481 | dev_dbg(qspi->dev, "%s len = 0x%x offs = 0x%llx buf = 0x%p\n", __func__, len, offs, buf); | ^~~ drivers/spi/spi-stm32-qspi.c:481:34: note: format string is defined here 481 | dev_dbg(qspi->dev, "%s len = 0x%x offs = 0x%llx buf = 0x%p\n", __func__, len, offs, buf); | ~^ | | | unsigned int | %lx Introduced by commit 18674dee3cd6 ("spi: stm32-qspi: Add dirmap support") -- Cheers, Stephen Rothwell pgpnXiPdMIUXz.pgp Description: OpenPGP digital signature
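A likely fix (untested sketch) is to print the size_t argument with its dedicated length modifier rather than plain %x:

        dev_dbg(qspi->dev, "%s len = 0x%zx offs = 0x%llx buf = 0x%p\n",
                __func__, len, offs, buf);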
Re: [PATCH 00/13] [RFC] Rust support
Hi Nick, On Mon, Apr 19, 2021 at 05:24:33PM -0700, Nick Desaulniers wrote: > I don't think the introduction of Rust made Firefox _more_ insecure. > https://wiki.mozilla.org/Oxidation#Within_Firefox Browsers are human interfaces and do not fundamentally require low level access to memory/hardware/whatever. They can be written in about any language, only the resource usage and performance will make a difference. As such, some were even written in Java or JS for example. Operating systems, and particularly drivers *do* require low-level accesses, and stuff that can hardly be abstracted or understood by a compiler. You may have to perform two 16-bit reads/writes on a 32-bit MMIO address to perform an operation and the compiler does not have to know it, just to obey. > Really, a key point is that a lot of common mistakes in C are compile > time errors in Rust. I know no "true" kernel dev would make such > mistakes in C, Everyone makes mistakes, the level of attention varies over time and the focus often changes when dealing with build errors. How many time some of us facing a bug remembered having changed the code very late after a build error, and being less careful from this point when the goal changed from "let's do it right" to "let's get this to build" ? > but is there nothing we can do to help our peers > writing drivers? The point is to transfer cost from runtime to > compile time to avoid costs at runtime; like all of the memory safety > bugs which are costing our industry. And do we have stats on the number of logical bugs, some of which are caused by developers trying to work around compilers' stubbornness ? For me, personally speaking, they have *increased* over time, usually trying to avoid some annoying modern gcc warnings, resulting in integer casts being placed close to string formats, or returns being placed in switch/case to avoid the fall-through warning, etc. Thus I'm worried that a non-negligible part of the 70% of bugs caused by memory safety issues could be replaced with logic bugs to get to the point where the rust compiler finally accepts to compile the code. It makes me think about researchers trying to reduce the causes of certain deaths and claiming to "save lives" while in the end the people they "save" will simply die from something else. And I'm not particularly trying to blindly defend C here. I'm complaining every single day about some of its shortcomings like the vast amount of UB, stupid type promotion, counter-intuitive operators precedence when combining bit-ops with arithmetic, limited size of enums, lack of rotate operator, strict aliasing, or the recourse to asm() statements every 10 lines to do stuff that can hardly be expressed in a way understandable by a compiler. I'm just seeing that a lot of the griefs I'm having against C come from the compiler trying to be too smart or too stubborn, so giving even more of the handle to a compiler doesn't appeal me at all. In addition, we all know how painful it is to work around compiler bugs by writing complex code that carefully avoids certain constructs. I'm wondering if we'll still have that luxury with a stricter compiler, or if the only response will have to be between "let's disable this driver that does not compile" or "please force distros to upgrade their compilers". But we'll see :-/ Regards, Willy
Re: [PATCH v4 3/4] pinctrl: add drive for I2C related pins on MT8195
On Mon, Apr 12, 2021 at 10:57 PM Zhiyong Tao wrote: > > This patch provides the advanced drive raw data setting version > for I2C used pins on MT8195. > > Signed-off-by: Zhiyong Tao Acked-by: Sean Wang > --- > drivers/pinctrl/mediatek/pinctrl-mt8195.c | 22 +++ > .../pinctrl/mediatek/pinctrl-mtk-common-v2.c | 14 > .../pinctrl/mediatek/pinctrl-mtk-common-v2.h | 5 + > 3 files changed, 41 insertions(+) > > diff --git a/drivers/pinctrl/mediatek/pinctrl-mt8195.c > b/drivers/pinctrl/mediatek/pinctrl-mt8195.c > index 063f164d7c9b..a7500e18bb1d 100644 > --- a/drivers/pinctrl/mediatek/pinctrl-mt8195.c > +++ b/drivers/pinctrl/mediatek/pinctrl-mt8195.c > @@ -760,6 +760,25 @@ static const struct mtk_pin_field_calc > mt8195_pin_drv_range[] = { > PIN_FIELD_BASE(143, 143, 1, 0x020, 0x10, 24, 3), > }; > > +static const struct mtk_pin_field_calc mt8195_pin_drv_adv_range[] = { > + PIN_FIELD_BASE(8, 8, 4, 0x020, 0x10, 15, 3), > + PIN_FIELD_BASE(9, 9, 4, 0x020, 0x10, 0, 3), > + PIN_FIELD_BASE(10, 10, 4, 0x020, 0x10, 18, 3), > + PIN_FIELD_BASE(11, 11, 4, 0x020, 0x10, 3, 3), > + PIN_FIELD_BASE(12, 12, 4, 0x020, 0x10, 21, 3), > + PIN_FIELD_BASE(13, 13, 4, 0x020, 0x10, 6, 3), > + PIN_FIELD_BASE(14, 14, 4, 0x020, 0x10, 24, 3), > + PIN_FIELD_BASE(15, 15, 4, 0x020, 0x10, 9, 3), > + PIN_FIELD_BASE(16, 16, 4, 0x020, 0x10, 27, 3), > + PIN_FIELD_BASE(17, 17, 4, 0x020, 0x10, 12, 3), > + PIN_FIELD_BASE(29, 29, 2, 0x020, 0x10, 0, 3), > + PIN_FIELD_BASE(30, 30, 2, 0x020, 0x10, 3, 3), > + PIN_FIELD_BASE(34, 34, 1, 0x040, 0x10, 0, 3), > + PIN_FIELD_BASE(35, 35, 1, 0x040, 0x10, 3, 3), > + PIN_FIELD_BASE(44, 44, 1, 0x040, 0x10, 6, 3), > + PIN_FIELD_BASE(45, 45, 1, 0x040, 0x10, 9, 3), > +}; > + > static const struct mtk_pin_reg_calc mt8195_reg_cals[PINCTRL_PIN_REG_MAX] = { > [PINCTRL_PIN_REG_MODE] = MTK_RANGE(mt8195_pin_mode_range), > [PINCTRL_PIN_REG_DIR] = MTK_RANGE(mt8195_pin_dir_range), > @@ -773,6 +792,7 @@ static const struct mtk_pin_reg_calc > mt8195_reg_cals[PINCTRL_PIN_REG_MAX] = { > [PINCTRL_PIN_REG_PUPD] = MTK_RANGE(mt8195_pin_pupd_range), > [PINCTRL_PIN_REG_R0] = MTK_RANGE(mt8195_pin_r0_range), > [PINCTRL_PIN_REG_R1] = MTK_RANGE(mt8195_pin_r1_range), > + [PINCTRL_PIN_REG_DRV_ADV] = MTK_RANGE(mt8195_pin_drv_adv_range), > }; > > static const char * const mt8195_pinctrl_register_base_names[] = { > @@ -801,6 +821,8 @@ static const struct mtk_pin_soc mt8195_data = { > .bias_get_combo = mtk_pinconf_bias_get_combo, > .drive_set = mtk_pinconf_drive_set_rev1, > .drive_get = mtk_pinconf_drive_get_rev1, > + .adv_drive_get = mtk_pinconf_adv_drive_get_raw, > + .adv_drive_set = mtk_pinconf_adv_drive_set_raw, > }; > > static const struct of_device_id mt8195_pinctrl_of_match[] = { > diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c > b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c > index 72f17f26acd8..2b51f4a9b860 100644 > --- a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c > +++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c > @@ -1027,6 +1027,20 @@ int mtk_pinconf_adv_drive_get(struct mtk_pinctrl *hw, > } > EXPORT_SYMBOL_GPL(mtk_pinconf_adv_drive_get); > > +int mtk_pinconf_adv_drive_set_raw(struct mtk_pinctrl *hw, > + const struct mtk_pin_desc *desc, u32 arg) > +{ > + return mtk_hw_set_value(hw, desc, PINCTRL_PIN_REG_DRV_ADV, arg); > +} > +EXPORT_SYMBOL_GPL(mtk_pinconf_adv_drive_set_raw); > + > +int mtk_pinconf_adv_drive_get_raw(struct mtk_pinctrl *hw, > + const struct mtk_pin_desc *desc, u32 *val) > +{ > + return mtk_hw_get_value(hw, desc, PINCTRL_PIN_REG_DRV_ADV, val); > +} > 
+EXPORT_SYMBOL_GPL(mtk_pinconf_adv_drive_get_raw); > + > MODULE_LICENSE("GPL v2"); > MODULE_AUTHOR("Sean Wang "); > MODULE_DESCRIPTION("Pin configuration library module for mediatek SoCs"); > diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h > b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h > index e2aae285b5fc..fd5ce9c5dcbd 100644 > --- a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h > +++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h > @@ -66,6 +66,7 @@ enum { > PINCTRL_PIN_REG_DRV_EN, > PINCTRL_PIN_REG_DRV_E0, > PINCTRL_PIN_REG_DRV_E1, > + PINCTRL_PIN_REG_DRV_ADV, > PINCTRL_PIN_REG_MAX, > }; > > @@ -314,6 +315,10 @@ int mtk_pinconf_adv_drive_set(struct mtk_pinctrl *hw, > const struct mtk_pin_desc *desc, u32 arg); > int mtk_pinconf_adv_drive_get(struct mtk_pinctrl *hw, > const struct mtk_pin_desc *desc, u32 *val); > +int mtk_pinconf_adv_drive_set_raw(struct mtk_pinctrl *hw, > + const struct mtk_pin_desc *desc, u32 arg); > +int mtk_pinconf_adv_drive_get_raw(struct mtk_pinctrl *h
[PATCH] drivers: regulator: core.c: Fix indentation of comment
Shifted the closing */ of multiline comment to a new line This is done to maintain code uniformity Signed-off-by: Shubhankar Kuranagatti --- drivers/regulator/core.c | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c index 16114aea099a..06fbf18b6524 100644 --- a/drivers/regulator/core.c +++ b/drivers/regulator/core.c @@ -538,7 +538,8 @@ static int regulator_mode_constrain(struct regulator_dev *rdev, /* The modes are bitmasks, the most power hungry modes having * the lowest values. If the requested mode isn't supported -* try higher modes. */ +* try higher modes. +*/ while (*mode) { if (rdev->constraints->valid_modes_mask & *mode) return 0; @@ -931,7 +932,8 @@ static DEVICE_ATTR(bypass, 0444, regulator_bypass_show, NULL); /* Calculate the new optimum regulator operating mode based on the new total - * consumer load. All locks held by caller */ + * consumer load. All locks held by caller + */ static int drms_uA_update(struct regulator_dev *rdev) { struct regulator *sibling; @@ -1219,7 +1221,8 @@ static int machine_constraints_voltage(struct regulator_dev *rdev, int cmax = constraints->max_uV; /* it's safe to autoconfigure fixed-voltage supplies - and the constraints are used by list_voltage. */ +* and the constraints are used by list_voltage. +*/ if (count == 1 && !cmin) { cmin = 1; cmax = INT_MAX; @@ -2525,7 +2528,8 @@ static int _regulator_do_enable(struct regulator_dev *rdev) /* Allow the regulator to ramp; it would be useful to extend * this for bulk operations so that the regulators can ramp -* together. */ +* together. +*/ trace_regulator_enable_delay(rdev_get_name(rdev)); /* If poll_enabled_time is set, poll upto the delay calculated @@ -5337,10 +5341,12 @@ regulator_register(const struct regulator_desc *regulator_desc, ret = set_machine_constraints(rdev); if (ret == -EPROBE_DEFER) { /* Regulator might be in bypass mode and so needs its supply -* to set the constraints */ +* to set the constraints +*/ /* FIXME: this currently triggers a chicken-and-egg problem * when creating -SUPPLY symlink in sysfs to a regulator -* that is just being created */ +* that is just being created +*/ rdev_dbg(rdev, "will resolve supply early: %s\n", rdev->supply_name); ret = regulator_resolve_supply(rdev); @@ -5899,7 +5905,8 @@ static int regulator_late_cleanup(struct device *dev, void *data) if (have_full_constraints()) { /* We log since this may kill the system if it goes -* wrong. */ +* wrong. +*/ rdev_info(rdev, "disabling\n"); ret = _regulator_do_disable(rdev); if (ret != 0) -- 2.17.1
Re: [PATCH v4 2/4] pinctrl: add pinctrl driver on mt8195
On Mon, Apr 12, 2021 at 10:57 PM Zhiyong Tao wrote: > > This commit includes pinctrl driver for mt8195. > > Signed-off-by: Zhiyong Tao Acked-by: Sean Wang > --- > drivers/pinctrl/mediatek/Kconfig |6 + > drivers/pinctrl/mediatek/Makefile |1 + > drivers/pinctrl/mediatek/pinctrl-mt8195.c | 828 > drivers/pinctrl/mediatek/pinctrl-mtk-mt8195.h | 1669 + > 4 files changed, 2504 insertions(+) > create mode 100644 drivers/pinctrl/mediatek/pinctrl-mt8195.c > create mode 100644 drivers/pinctrl/mediatek/pinctrl-mtk-mt8195.h > > diff --git a/drivers/pinctrl/mediatek/Kconfig > b/drivers/pinctrl/mediatek/Kconfig > index eef17f228669..90f0c8255eaf 100644 > --- a/drivers/pinctrl/mediatek/Kconfig > +++ b/drivers/pinctrl/mediatek/Kconfig > @@ -147,6 +147,12 @@ config PINCTRL_MT8192 > default ARM64 && ARCH_MEDIATEK > select PINCTRL_MTK_PARIS > > +config PINCTRL_MT8195 > + bool "Mediatek MT8195 pin control" > + depends on OF > + depends on ARM64 || COMPILE_TEST > + select PINCTRL_MTK_PARIS > + > config PINCTRL_MT8516 > bool "Mediatek MT8516 pin control" > depends on OF > diff --git a/drivers/pinctrl/mediatek/Makefile > b/drivers/pinctrl/mediatek/Makefile > index 01218bf4dc30..06fde993ace2 100644 > --- a/drivers/pinctrl/mediatek/Makefile > +++ b/drivers/pinctrl/mediatek/Makefile > @@ -21,5 +21,6 @@ obj-$(CONFIG_PINCTRL_MT8167) += pinctrl-mt8167.o > obj-$(CONFIG_PINCTRL_MT8173) += pinctrl-mt8173.o > obj-$(CONFIG_PINCTRL_MT8183) += pinctrl-mt8183.o > obj-$(CONFIG_PINCTRL_MT8192) += pinctrl-mt8192.o > +obj-$(CONFIG_PINCTRL_MT8195)+= pinctrl-mt8195.o > obj-$(CONFIG_PINCTRL_MT8516) += pinctrl-mt8516.o > obj-$(CONFIG_PINCTRL_MT6397) += pinctrl-mt6397.o > diff --git a/drivers/pinctrl/mediatek/pinctrl-mt8195.c > b/drivers/pinctrl/mediatek/pinctrl-mt8195.c > new file mode 100644 > index ..063f164d7c9b > --- /dev/null > +++ b/drivers/pinctrl/mediatek/pinctrl-mt8195.c > @@ -0,0 +1,828 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (C) 2020 MediaTek Inc. > + * > + * Author: Zhiyong Tao > + * > + */ > + > +#include "pinctrl-mtk-mt8195.h" > +#include "pinctrl-paris.h" > + > +/* MT8195 have multiple bases to program pin configuration listed as the > below: > + * iocfg[0]:0x10005000, iocfg[1]:0x11d1, iocfg[2]:0x11d3, > + * iocfg[3]:0x11d4, iocfg[4]:0x11e2, iocfg[5]:0x11eb, > + * iocfg[6]:0x11f4. > + * _i_based could be used to indicate what base the pin should be mapped > into. 
> + */ > + > +#define PIN_FIELD_BASE(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, x_bits) > \ > + PIN_FIELD_CALC(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, x_bits, \ > + 32, 0) > + > +#define PINS_FIELD_BASE(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, > x_bits) \ > + PIN_FIELD_CALC(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, x_bits, > \ > + 32, 1) > + > +static const struct mtk_pin_field_calc mt8195_pin_mode_range[] = { > + PIN_FIELD(0, 144, 0x300, 0x10, 0, 4), > +}; > + > +static const struct mtk_pin_field_calc mt8195_pin_dir_range[] = { > + PIN_FIELD(0, 144, 0x0, 0x10, 0, 1), > +}; > + > +static const struct mtk_pin_field_calc mt8195_pin_di_range[] = { > + PIN_FIELD(0, 144, 0x200, 0x10, 0, 1), > +}; > + > +static const struct mtk_pin_field_calc mt8195_pin_do_range[] = { > + PIN_FIELD(0, 144, 0x100, 0x10, 0, 1), > +}; > + > +static const struct mtk_pin_field_calc mt8195_pin_ies_range[] = { > + PIN_FIELD_BASE(0, 0, 4, 0x040, 0x10, 0, 1), > + PIN_FIELD_BASE(1, 1, 4, 0x040, 0x10, 1, 1), > + PIN_FIELD_BASE(2, 2, 4, 0x040, 0x10, 2, 1), > + PIN_FIELD_BASE(3, 3, 4, 0x040, 0x10, 3, 1), > + PIN_FIELD_BASE(4, 4, 4, 0x040, 0x10, 4, 1), > + PIN_FIELD_BASE(5, 5, 4, 0x040, 0x10, 5, 1), > + PIN_FIELD_BASE(6, 6, 4, 0x040, 0x10, 6, 1), > + PIN_FIELD_BASE(7, 7, 4, 0x040, 0x10, 7, 1), > + PIN_FIELD_BASE(8, 8, 4, 0x040, 0x10, 13, 1), > + PIN_FIELD_BASE(9, 9, 4, 0x040, 0x10, 8, 1), > + PIN_FIELD_BASE(10, 10, 4, 0x040, 0x10, 14, 1), > + PIN_FIELD_BASE(11, 11, 4, 0x040, 0x10, 9, 1), > + PIN_FIELD_BASE(12, 12, 4, 0x040, 0x10, 15, 1), > + PIN_FIELD_BASE(13, 13, 4, 0x040, 0x10, 10, 1), > + PIN_FIELD_BASE(14, 14, 4, 0x040, 0x10, 16, 1), > + PIN_FIELD_BASE(15, 15, 4, 0x040, 0x10, 11, 1), > + PIN_FIELD_BASE(16, 16, 4, 0x040, 0x10, 17, 1), > + PIN_FIELD_BASE(17, 17, 4, 0x040, 0x10, 12, 1), > + PIN_FIELD_BASE(18, 18, 2, 0x040, 0x10, 5, 1), > + PIN_FIELD_BASE(19, 19, 2, 0x040, 0x10, 12, 1), > + PIN_FIELD_BASE(20, 20, 2, 0x040, 0x10, 11, 1), > + PIN_FIELD_BASE(21, 21, 2, 0x040, 0x10, 10, 1), > + PIN_FIELD_BASE(22, 22, 2, 0x040, 0x10, 0, 1), > + PIN_FIELD_BASE(23, 23, 2, 0x040, 0x10, 1, 1), > + PIN_FIELD_BASE(24, 24, 2, 0x040, 0x10, 2, 1), > + PIN_FIELD_B
Re: [PATCH] powerpc/pseries: Add shutdown() to vio_driver and vio_bus
Tyrel Datwyler writes: > On 4/17/21 5:30 AM, Michael Ellerman wrote: >> Tyrel Datwyler writes: >>> On 4/1/21 5:13 PM, Tyrel Datwyler wrote: Currently, neither the vio_bus or vio_driver structures provide support for a shutdown() routine. Add support for shutdown() by allowing drivers to provide a implementation via function pointer in their vio_driver struct and provide a proper implementation in the driver template for the vio_bus that calls a vio drivers shutdown() if defined. In the case that no shutdown() is defined by a vio driver and a kexec is in progress we implement a big hammer that calls remove() to ensure no further DMA for the devices is possible. Signed-off-by: Tyrel Datwyler --- >>> >>> Ping... any comments, problems with this approach? >> >> The kexec part seems like a bit of a hack. >> >> It also doesn't help for kdump, when none of the shutdown code is run. > > If I understand correctly for kdump we have a reserved memory space where the > kdump kernel is loaded, but for kexec the memory region isn't reserved ahead > of > time meaning we can try and load the kernel over potential memory used for DMA > by the current kernel. That's correct. >> How many drivers do we have? Can we just implement a proper shutdown for >> them? > > Well that is the end goal. I just don't currently have the bandwidth to do > each > driver myself with a proper shutdown sequence, and thought this was a > launching > off point to at least introduce the shutdown callback to the VIO bus. Fair enough. > Off the top of my head we have 3 storage drivers, 2 network drivers, vtpm, > vmc, > pseries_rng, nx, nx842, hvcs, hvc_vio. > > I can drop the kexec_in_progress hammer and just have each driver call > remove() > themselves in their shutdown function. Leave it to each maintainer to decide > if > remove() is enough or if there is a more lightweight quiesce sequence they > choose to implement. That's OK, you've convinced me. I'll take it as-is. Eventually it would be good for drivers to implement shutdown in the optimal way for their device, but that can be done incrementally. cheers
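For illustration, a minimal sketch of what that incremental per-driver work could look like once the bus-level hook lands. The shutdown prototype below is an assumption (Tyrel's patch itself is not quoted here) and all example_* names are hypothetical; the point is simply the suggestion above that a driver without a lighter quiesce sequence can reuse its remove-time teardown:

	/* Hypothetical vio driver; the shutdown signature is assumed, not taken from the patch. */
	static void example_quiesce(struct vio_dev *vdev)
	{
		/* Tear down DMA mappings, free the CRQ/interrupts, etc. */
	}

	static void example_shutdown(struct vio_dev *vdev)
	{
		/*
		 * No dedicated lightweight quiesce path yet: do the same
		 * teardown as remove so no further DMA can land in memory
		 * the kexec'd kernel is about to own.
		 */
		example_quiesce(vdev);
	}

	static struct vio_driver example_driver = {
		.name     = "example",
		.shutdown = example_shutdown,	/* new callback added by this series */
	};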
Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.3)
On Mon, Apr 19, 2021 at 06:09:11PM +0200, Christian Brauner wrote: > On Mon, Apr 19, 2021 at 07:25:14AM -0500, Serge Hallyn wrote: > > cap_setfcap is required to create file capabilities. > > > > Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a > > process running as uid 0 but without cap_setfcap is able to work around > > this as follows: unshare a new user namespace which maps parent uid 0 > > into the child namespace. While this task will not have new > > capabilities against the parent namespace, there is a loophole due to > > the way namespaced file capabilities are represented as xattrs. File > > capabilities valid in userns 1 are distinguished from file capabilities > > valid in userns 2 by the kuid which underlies uid 0. Therefore the > > restricted root process can unshare a new self-mapping namespace, add a > > namespaced file capability onto a file, then use that file capability in > > the parent namespace. > > > > To prevent that, do not allow mapping parent uid 0 if the process which > > opened the uid_map file does not have CAP_SETFCAP, which is the capability > > for setting file capabilities. > > > > As a further wrinkle: a task can unshare its user namespace, then > > open its uid_map file itself, and map (only) its own uid. In this > > case we do not have the credential from before unshare, which was > > potentially more restricted. So, when creating a user namespace, we > > record whether the creator had CAP_SETFCAP. Then we can use that > > during map_write(). > > > > With this patch: > > > > 1. Unprivileged user can still unshare -Ur > > > > ubuntu@caps:~$ unshare -Ur > > root@caps:~# logout > > > > 2. Root user can still unshare -Ur > > > > ubuntu@caps:~$ sudo bash > > root@caps:/home/ubuntu# unshare -Ur > > root@caps:/home/ubuntu# logout > > > > 3. Root user without CAP_SETFCAP cannot unshare -Ur: > > > > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- > > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap > > unable to set CAP_SETFCAP effective capability: Operation not permitted > > root@caps:/home/ubuntu# unshare -Ur > > unshare: write failed /proc/self/uid_map: Operation not permitted > > > > Note: an alternative solution would be to allow uid 0 mappings by > > processes without CAP_SETFCAP, but to prevent such a namespace from > > writing any file capabilities. This approach can be seen here: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 > > > > Ah, can you link to the previous fix and its revert, please? I think > that was mentioned in the formerly private thread as well but we forgot: > > commit 95ebabde382c371572297915b104e55403674e73 > Author: Eric W. Biederman > Date: Thu Dec 17 09:42:00 2020 -0600 > > capabilities: Don't allow writing ambiguous v3 file capabilities > > commit 3b0c2d3eaa83da259d7726192cf55a137769012f > Author: Eric W. Biederman > Date: Fri Mar 12 15:07:09 2021 -0600 > > Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file > capabilities") Sure. Is there a tag for that kind of thing or do I just mention it at the end of the description?
Re: [PATCH v4 5/7] cpufreq: qcom-hw: Implement CPRh aware OSM programming
On 19-04-21, 13:52, Bjorn Andersson wrote: > On Tue 19 Jan 11:45 CST 2021, AngeloGioacchino Del Regno wrote: > @Viresh, do you have any suggestion regarding my last comment? > > static int qcom_cpufreq_hw_driver_probe(struct platform_device *pdev) > > { > > + const struct qcom_cpufreq_soc_data *soc_data; > > + struct device_node *pd_node; > > + struct platform_device *pd_dev; > > struct device *cpu_dev; > > struct clk *clk; > > - int ret; > > + int clk_div, ret; > > + > > + cpu_dev = get_cpu_device(0); > > + if (!cpu_dev) > > + return -EPROBE_DEFER; > > + > > + soc_data = of_device_get_match_data(&pdev->dev); > > + if (!soc_data) > > + return -EINVAL; > > + > > + if (!soc_data->uses_tz) { > > + /* > > +* When the OSM is not pre-programmed from TZ, we will > > +* need to program the sequencer through SCM calls. > > +*/ > > + if (!qcom_scm_is_available()) > > + return -EPROBE_DEFER; > > + > > + /* > > +* If there are no power-domains, OSM programming cannot be > > +* performed, as in that case, we wouldn't know where to take > > +* the params from... > > +*/ > > + pd_node = of_parse_phandle(cpu_dev->of_node, > > + "power-domains", 0); > > + if (!pd_node) { > > + ret = PTR_ERR(pd_node); > > + dev_err(cpu_dev, "power domain not found: %d\n", ret); > > + return ret; > > + } > > + > > + /* > > +* If the power domain device is not registered yet, then > > +* defer probing this driver until that is available. > > +*/ > > + pd_dev = of_find_device_by_node(pd_node); > > + if (!pd_dev || !pd_dev->dev.driver || > > + !device_is_bound(&pd_dev->dev)) > > + return -EPROBE_DEFER; > > I wonder if there's a more appropriate way to probe defer on resources > described in the CPU nodes... Recently we made some updates to the OPP core to start returning EPROBE_DEFER on failure to acquire resources. I think you can get rid of many checks for resources here by just trying to create the OPP table and check its return value for EPROBE_DEFER. -- viresh
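A rough sketch of the simplification being suggested, as it might look in the probe path; dev_pm_opp_of_add_table() stands in here for whichever OPP helper the driver ends up calling, and the surrounding variable names are assumptions rather than the actual qcom-cpufreq-hw code:

	ret = dev_pm_opp_of_add_table(cpu_dev);
	if (ret) {
		/*
		 * The OPP core now propagates -EPROBE_DEFER itself while
		 * required resources (clocks, regulators, genpds) are still
		 * missing, so the driver no longer needs to open-code the
		 * power-domain lookups above.
		 */
		if (ret == -EPROBE_DEFER)
			dev_dbg(cpu_dev, "OPP table not ready, deferring probe\n");
		return ret;
	}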
Re: Enabling pmbus power control
On Mon, Apr 19, 2021 at 08:29:53PM -0500, Zev Weiss wrote: > On Tue, Mar 30, 2021 at 02:38:10PM CDT, Guenter Roeck wrote: > > On Tue, Mar 30, 2021 at 07:02:00PM +0100, Mark Brown wrote: > > > On Tue, Mar 30, 2021 at 12:56:56PM -0500, Zev Weiss wrote: > > > > > > > Okay, to expand a bit on the description in my initial message -- we've > > > > got a single chassis with multiple server boards and a single manager > > > > board > > > > that handles, among other things, power control for the servers. > > > > The manager board has one LM25066 for each attached server, which acts > > > > as > > > > the "power switch" for that server. There thus really isn't any driver > > > > to > > > > speak of for the downstream device. > > > > > > This sounds like you need a driver representing those server boards (or > > > the slots they plug into perhaps) that represents everything about those > > > boards to userspace, including power switching. I don't see why you > > > wouldn't have a driver for that - it's a thing that physically exists > > > and clearly has some software control, and you'd presumably also expect > > > to represent some labelling about the slot as well. > > > > Absolutely agree. > > > > Thanks, > > Guenter > > Hi Guenter, Mark, > > I wanted to return to this to try to explain why structuring the kernel > support for this in a way that's specific to the device behind the PMIC > seems like an awkward fit to me, and ultimately kind of artificial. > > In the system I described, the manager board with the LM25066 devices is its > own little self-contained BMC system running its own Linux kernel (though > "BMC" is perhaps a slightly misleading term because there's no host system > that it manages). The PMICs are really the only relevant connection it has > to the servers it controls power for -- they have their own dedicated local > BMCs on board as well doing all the usual BMC tasks. A hypothetical > dedicated driver for this on the manager board wouldn't have any other > hardware to touch aside from the pmbus interface of the LM25066 itself, so > calling it a server board driver seems pretty arbitrary -- and in fact the > same system also has another LM25066 that controls the power supply to the > chassis cooling fans (a totally different downstream device, but one that > would be equally well-served by the same software). > > More recently, another system has entered the picture for us that might > illustrate it more starkly -- the Delta Open19 power shelf [0] supported by > a recent code release from LinkedIn [1]. This is a rackmount power supply All I can see is that this code is a mess. > with fifty outputs, each independently switchable via an LM25066 attached to > an on-board BMC-style management controller (though again, no host system > involved). We (Equinix Metal) are interested in porting a modern OpenBMC to > it to replace the dated, crufty, pre-Linux-Foundation version of OpenBMC it > currently runs (as found in the linked repo). The exact nature of the > devices powered by its outputs is well outside the scope of the firmware > running on that controller (it could be any arbitrary thing that runs on > 12VDC), but we still want to be able to both (a) retrieve per-output > voltage/current/power readings as provided by the existing LM25066 hwmon > support, and (b) control the on/off state of those outputs from userspace. 
> > Given the array of possible use-cases, an approach of adding power-switch > functionality to the existing LM25066 support seems like the most obvious > way to support this, so I'm hoping to see if I can get some idea of what an > acceptable implementation of that might look like. > Sorry, that is simply a no-go for the LM25066 driver. I mentioned it before, and it is still true: The hwmon subsystem is not in the business of power control. Have you looked into enclosures (drivers/misc/enclosure.c) ? Maybe that or something similar could be used for your purposes. Thanks, Guenter
[Patch v3 7/7] crypto: qce: aead: Schedule fallback algorithm
Qualcomm crypto engine does not handle the following scenarios and will issue an abort. In such cases, pass on the transformation to a fallback algorithm. - DES3 algorithms with all three keys same. - AES192 algorithms. - 0 length messages. Signed-off-by: Thara Gopinath --- v1->v2: - Updated crypto_aead_set_reqsize to include the size of fallback request as well. drivers/crypto/qce/aead.c | 64 --- drivers/crypto/qce/aead.h | 3 ++ 2 files changed, 56 insertions(+), 11 deletions(-) diff --git a/drivers/crypto/qce/aead.c b/drivers/crypto/qce/aead.c index ef66ae21eae3..6d06a19b48e4 100644 --- a/drivers/crypto/qce/aead.c +++ b/drivers/crypto/qce/aead.c @@ -512,7 +512,23 @@ static int qce_aead_crypt(struct aead_request *req, int encrypt) /* CE does not handle 0 length messages */ if (!rctx->cryptlen) { if (!(IS_CCM(rctx->flags) && IS_DECRYPT(rctx->flags))) - return -EINVAL; + ctx->need_fallback = true; + } + + /* If fallback is needed, schedule and exit */ + if (ctx->need_fallback) { + /* Reset need_fallback in case the same ctx is used for another transaction */ + ctx->need_fallback = false; + + aead_request_set_tfm(&rctx->fallback_req, ctx->fallback); + aead_request_set_callback(&rctx->fallback_req, req->base.flags, + req->base.complete, req->base.data); + aead_request_set_crypt(&rctx->fallback_req, req->src, + req->dst, req->cryptlen, req->iv); + aead_request_set_ad(&rctx->fallback_req, req->assoclen); + + return encrypt ? crypto_aead_encrypt(&rctx->fallback_req) : +crypto_aead_decrypt(&rctx->fallback_req); } /* @@ -553,7 +569,7 @@ static int qce_aead_ccm_setkey(struct crypto_aead *tfm, const u8 *key, memcpy(ctx->ccm4309_salt, key + keylen, QCE_CCM4309_SALT_SIZE); } - if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_256) + if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_256 && keylen != AES_KEYSIZE_192) return -EINVAL; ctx->enc_keylen = keylen; @@ -562,7 +578,12 @@ static int qce_aead_ccm_setkey(struct crypto_aead *tfm, const u8 *key, memcpy(ctx->enc_key, key, keylen); memcpy(ctx->auth_key, key, keylen); - return 0; + if (keylen == AES_KEYSIZE_192) + ctx->need_fallback = true; + + return IS_CCM_RFC4309(flags) ? + crypto_aead_setkey(ctx->fallback, key, keylen + QCE_CCM4309_SALT_SIZE) : + crypto_aead_setkey(ctx->fallback, key, keylen); } static int qce_aead_setkey(struct crypto_aead *tfm, const u8 *key, unsigned int keylen) @@ -593,20 +614,21 @@ static int qce_aead_setkey(struct crypto_aead *tfm, const u8 *key, unsigned int * The crypto engine does not support any two keys * being the same for triple des algorithms. The * verify_skcipher_des3_key does not check for all the -* below conditions. Return -EINVAL in case any two keys -* are the same. Revisit to see if a fallback cipher -* is needed to handle this condition. +* below conditions. Schedule fallback in this case. 
*/ memcpy(_key, authenc_keys.enckey, DES3_EDE_KEY_SIZE); if (!((_key[0] ^ _key[2]) | (_key[1] ^ _key[3])) || !((_key[2] ^ _key[4]) | (_key[3] ^ _key[5])) || !((_key[0] ^ _key[4]) | (_key[1] ^ _key[5]))) - return -EINVAL; + ctx->need_fallback = true; } else if (IS_AES(flags)) { /* No random key sizes */ if (authenc_keys.enckeylen != AES_KEYSIZE_128 && + authenc_keys.enckeylen != AES_KEYSIZE_192 && authenc_keys.enckeylen != AES_KEYSIZE_256) return -EINVAL; + if (authenc_keys.enckeylen == AES_KEYSIZE_192) + ctx->need_fallback = true; } ctx->enc_keylen = authenc_keys.enckeylen; @@ -617,7 +639,7 @@ static int qce_aead_setkey(struct crypto_aead *tfm, const u8 *key, unsigned int memset(ctx->auth_key, 0, sizeof(ctx->auth_key)); memcpy(ctx->auth_key, authenc_keys.authkey, authenc_keys.authkeylen); - return 0; + return crypto_aead_setkey(ctx->fallback, key, keylen); } static int qce_aead_setauthsize(struct crypto_aead *tfm, unsigned int authsize) @@ -632,15 +654,33 @@ static int qce_aead_setauthsize(struct crypto_aead *tfm, unsigned int authsize) return -EINVAL; } ctx->authsize = authsize; - return 0; + + return crypto_aead_setauths
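The hunk that allocates the fallback transform is cut off above. For context, the usual shape of the init path for this pattern is sketched below; it follows the common CRYPTO_ALG_NEED_FALLBACK idiom and matches the v1->v2 note about crypto_aead_set_reqsize(), but it is not the literal code from this patch:

	static int qce_aead_init_fallback_sketch(struct crypto_aead *tfm)
	{
		struct qce_aead_ctx *ctx = crypto_aead_ctx(tfm);

		/* Ask the crypto API for another (possibly software) implementation
		 * of the same algorithm to hand the unsupported corner cases to. */
		ctx->fallback = crypto_alloc_aead(crypto_tfm_alg_name(&tfm->base),
						  0, CRYPTO_ALG_NEED_FALLBACK);
		if (IS_ERR(ctx->fallback))
			return PTR_ERR(ctx->fallback);

		/* Room for our own request context plus the fallback's request. */
		crypto_aead_set_reqsize(tfm, sizeof(struct qce_aead_reqctx) +
					     crypto_aead_reqsize(ctx->fallback));
		return 0;
	}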
[Patch v3 6/7] crypto: qce: common: Add support for AEAD algorithms
Add register programming sequence for enabling AEAD algorithms on the Qualcomm crypto engine. Signed-off-by: Thara Gopinath --- v2->v3: - Made qce_be32_to_cpu_array truly be32 to cpu endian by using be32_to_cpup instead of cpu_to_be32p. Also remove the (u32 *) typcasting of arrays obtained as output from qce_be32_to_cpu_array as per Bjorn's review comments. - Wrapped newly introduced std_iv_sha1, std_iv_sha256 and qce_be32_to_cpu_array in CONFIG_CRYPTO_DEV_QCE_AEAD to prevent W1 warnings as reported by kernel test robot . v1->v2: - Minor fixes like removing not needed initializing of variables and using bool values in lieu of 0 and 1 as pointed out by Bjorn. - Introduced qce_be32_to_cpu_array which converts the u8 string in big endian order to array of u32 and returns back total number of words, as per Bjorn's review comments. Presently this function is used only by qce_setup_regs_aead to format keys, iv and nonce. cipher and hash algorithms can be made to use this function as a separate clean up patch. drivers/crypto/qce/common.c | 162 +++- 1 file changed, 160 insertions(+), 2 deletions(-) diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c index 7b3d6caec1b2..6d6b3792323b 100644 --- a/drivers/crypto/qce/common.c +++ b/drivers/crypto/qce/common.c @@ -15,6 +15,7 @@ #include "core.h" #include "regs-v5.h" #include "sha.h" +#include "aead.h" static inline u32 qce_read(struct qce_device *qce, u32 offset) { @@ -96,7 +97,7 @@ static inline void qce_crypto_go(struct qce_device *qce, bool result_dump) qce_write(qce, REG_GOPROC, BIT(GO_SHIFT)); } -#ifdef CONFIG_CRYPTO_DEV_QCE_SHA +#if defined(CONFIG_CRYPTO_DEV_QCE_SHA) || defined(CONFIG_CRYPTO_DEV_QCE_AEAD) static u32 qce_auth_cfg(unsigned long flags, u32 key_size, u32 auth_size) { u32 cfg = 0; @@ -139,7 +140,9 @@ static u32 qce_auth_cfg(unsigned long flags, u32 key_size, u32 auth_size) return cfg; } +#endif +#ifdef CONFIG_CRYPTO_DEV_QCE_SHA static int qce_setup_regs_ahash(struct crypto_async_request *async_req) { struct ahash_request *req = ahash_request_cast(async_req); @@ -225,7 +228,7 @@ static int qce_setup_regs_ahash(struct crypto_async_request *async_req) } #endif -#ifdef CONFIG_CRYPTO_DEV_QCE_SKCIPHER +#if defined(CONFIG_CRYPTO_DEV_QCE_SKCIPHER) || defined(CONFIG_CRYPTO_DEV_QCE_AEAD) static u32 qce_encr_cfg(unsigned long flags, u32 aes_key_size) { u32 cfg = 0; @@ -271,7 +274,9 @@ static u32 qce_encr_cfg(unsigned long flags, u32 aes_key_size) return cfg; } +#endif +#ifdef CONFIG_CRYPTO_DEV_QCE_SKCIPHER static void qce_xts_swapiv(__be32 *dst, const u8 *src, unsigned int ivsize) { u8 swap[QCE_AES_IV_LENGTH]; @@ -386,6 +391,155 @@ static int qce_setup_regs_skcipher(struct crypto_async_request *async_req) } #endif +#ifdef CONFIG_CRYPTO_DEV_QCE_AEAD +static const u32 std_iv_sha1[SHA256_DIGEST_SIZE / sizeof(u32)] = { + SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4, 0, 0, 0 +}; + +static const u32 std_iv_sha256[SHA256_DIGEST_SIZE / sizeof(u32)] = { + SHA256_H0, SHA256_H1, SHA256_H2, SHA256_H3, + SHA256_H4, SHA256_H5, SHA256_H6, SHA256_H7 +}; + +static unsigned int qce_be32_to_cpu_array(u32 *dst, const u8 *src, unsigned int len) +{ + u32 *d = dst; + const u8 *s = src; + unsigned int n; + + n = len / sizeof(u32); + for (; n > 0; n--) { + *d = be32_to_cpup((const __be32 *)s); + s += sizeof(u32); + d++; + } + return DIV_ROUND_UP(len, sizeof(u32)); +} + +static int qce_setup_regs_aead(struct crypto_async_request *async_req) +{ + struct aead_request *req = aead_request_cast(async_req); + struct qce_aead_reqctx *rctx = 
aead_request_ctx(req); + struct qce_aead_ctx *ctx = crypto_tfm_ctx(async_req->tfm); + struct qce_alg_template *tmpl = to_aead_tmpl(crypto_aead_reqtfm(req)); + struct qce_device *qce = tmpl->qce; + u32 enckey[QCE_MAX_CIPHER_KEY_SIZE / sizeof(u32)] = {0}; + u32 enciv[QCE_MAX_IV_SIZE / sizeof(u32)] = {0}; + u32 authkey[QCE_SHA_HMAC_KEY_SIZE / sizeof(u32)] = {0}; + u32 authiv[SHA256_DIGEST_SIZE / sizeof(u32)] = {0}; + u32 authnonce[QCE_MAX_NONCE / sizeof(u32)] = {0}; + unsigned int enc_keylen = ctx->enc_keylen; + unsigned int auth_keylen = ctx->auth_keylen; + unsigned int enc_ivsize = rctx->ivsize; + unsigned int auth_ivsize; + unsigned int enckey_words, enciv_words; + unsigned int authkey_words, authiv_words, authnonce_words; + unsigned long flags = rctx->flags; + u32 encr_cfg, auth_cfg, config, totallen; + u32 iv_last_word; + + qce_setup_config(qce); + + /* Write encryption key */ + enckey_words = qce_be32_to_cpu_array(enckey, ctx->enc_key, e
[Patch v3 5/7] crypto: qce: common: Clean up qce_auth_cfg
Remove various redundant checks in qce_auth_cfg. Also allow qce_auth_cfg to take auth_size as a parameter which is a required setting for ccm(aes) algorithms Signed-off-by: Thara Gopinath --- drivers/crypto/qce/common.c | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c index 7b5bc5a6ae81..7b3d6caec1b2 100644 --- a/drivers/crypto/qce/common.c +++ b/drivers/crypto/qce/common.c @@ -97,11 +97,11 @@ static inline void qce_crypto_go(struct qce_device *qce, bool result_dump) } #ifdef CONFIG_CRYPTO_DEV_QCE_SHA -static u32 qce_auth_cfg(unsigned long flags, u32 key_size) +static u32 qce_auth_cfg(unsigned long flags, u32 key_size, u32 auth_size) { u32 cfg = 0; - if (IS_AES(flags) && (IS_CCM(flags) || IS_CMAC(flags))) + if (IS_CCM(flags) || IS_CMAC(flags)) cfg |= AUTH_ALG_AES << AUTH_ALG_SHIFT; else cfg |= AUTH_ALG_SHA << AUTH_ALG_SHIFT; @@ -119,15 +119,16 @@ static u32 qce_auth_cfg(unsigned long flags, u32 key_size) cfg |= AUTH_SIZE_SHA256 << AUTH_SIZE_SHIFT; else if (IS_CMAC(flags)) cfg |= AUTH_SIZE_ENUM_16_BYTES << AUTH_SIZE_SHIFT; + else if (IS_CCM(flags)) + cfg |= (auth_size - 1) << AUTH_SIZE_SHIFT; if (IS_SHA1(flags) || IS_SHA256(flags)) cfg |= AUTH_MODE_HASH << AUTH_MODE_SHIFT; - else if (IS_SHA1_HMAC(flags) || IS_SHA256_HMAC(flags) || -IS_CBC(flags) || IS_CTR(flags)) + else if (IS_SHA1_HMAC(flags) || IS_SHA256_HMAC(flags)) cfg |= AUTH_MODE_HMAC << AUTH_MODE_SHIFT; - else if (IS_AES(flags) && IS_CCM(flags)) + else if (IS_CCM(flags)) cfg |= AUTH_MODE_CCM << AUTH_MODE_SHIFT; - else if (IS_AES(flags) && IS_CMAC(flags)) + else if (IS_CMAC(flags)) cfg |= AUTH_MODE_CMAC << AUTH_MODE_SHIFT; if (IS_SHA(flags) || IS_SHA_HMAC(flags)) @@ -136,10 +137,6 @@ static u32 qce_auth_cfg(unsigned long flags, u32 key_size) if (IS_CCM(flags)) cfg |= QCE_MAX_NONCE_WORDS << AUTH_NONCE_NUM_WORDS_SHIFT; - if (IS_CBC(flags) || IS_CTR(flags) || IS_CCM(flags) || - IS_CMAC(flags)) - cfg |= BIT(AUTH_LAST_SHIFT) | BIT(AUTH_FIRST_SHIFT); - return cfg; } @@ -171,7 +168,7 @@ static int qce_setup_regs_ahash(struct crypto_async_request *async_req) qce_clear_array(qce, REG_AUTH_KEY0, 16); qce_clear_array(qce, REG_AUTH_BYTECNT0, 4); - auth_cfg = qce_auth_cfg(rctx->flags, rctx->authklen); + auth_cfg = qce_auth_cfg(rctx->flags, rctx->authklen, digestsize); } if (IS_SHA_HMAC(rctx->flags) || IS_CMAC(rctx->flags)) { @@ -199,7 +196,7 @@ static int qce_setup_regs_ahash(struct crypto_async_request *async_req) qce_write_array(qce, REG_AUTH_BYTECNT0, (u32 *)rctx->byte_count, 2); - auth_cfg = qce_auth_cfg(rctx->flags, 0); + auth_cfg = qce_auth_cfg(rctx->flags, 0, digestsize); if (rctx->last_blk) auth_cfg |= BIT(AUTH_LAST_SHIFT); -- 2.25.1
[Patch v3 2/7] crypto: qce: common: Make result dump optional
Qualcomm crypto engine allows for IV registers and status register to be concatenated to the output. This option is enabled by setting the RESULTS_DUMP field in GOPROC register. This is useful for most of the algorithms to either retrieve status of operation or in case of authentication algorithms to retrieve the mac. But for ccm algorithms, the mac is part of the output stream and not retrieved from the IV registers, thus needing a separate buffer to retrieve it. Make enabling RESULTS_DUMP field optional so that algorithms can choose whether or not to enable the option. Note that in this patch, the enabled algorithms always choose RESULTS_DUMP to be enabled. But later with the introduction of ccm algorithms, this changes. Reviewed-by: Bjorn Andersson Signed-off-by: Thara Gopinath --- drivers/crypto/qce/common.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c index dd76175d5c62..7b5bc5a6ae81 100644 --- a/drivers/crypto/qce/common.c +++ b/drivers/crypto/qce/common.c @@ -88,9 +88,12 @@ static void qce_setup_config(struct qce_device *qce) qce_write(qce, REG_CONFIG, config); } -static inline void qce_crypto_go(struct qce_device *qce) +static inline void qce_crypto_go(struct qce_device *qce, bool result_dump) { - qce_write(qce, REG_GOPROC, BIT(GO_SHIFT) | BIT(RESULTS_DUMP_SHIFT)); + if (result_dump) + qce_write(qce, REG_GOPROC, BIT(GO_SHIFT) | BIT(RESULTS_DUMP_SHIFT)); + else + qce_write(qce, REG_GOPROC, BIT(GO_SHIFT)); } #ifdef CONFIG_CRYPTO_DEV_QCE_SHA @@ -219,7 +222,7 @@ static int qce_setup_regs_ahash(struct crypto_async_request *async_req) config = qce_config_reg(qce, 1); qce_write(qce, REG_CONFIG, config); - qce_crypto_go(qce); + qce_crypto_go(qce, true); return 0; } @@ -380,7 +383,7 @@ static int qce_setup_regs_skcipher(struct crypto_async_request *async_req) config = qce_config_reg(qce, 1); qce_write(qce, REG_CONFIG, config); - qce_crypto_go(qce); + qce_crypto_go(qce, true); return 0; } -- 2.25.1
[Patch v3 4/7] crypto: qce: Add support for AEAD algorithms
Introduce support to enable following algorithms in Qualcomm Crypto Engine. - authenc(hmac(sha1),cbc(des)) - authenc(hmac(sha1),cbc(des3_ede)) - authenc(hmac(sha256),cbc(des)) - authenc(hmac(sha256),cbc(des3_ede)) - authenc(hmac(sha256),cbc(aes)) - ccm(aes) - rfc4309(ccm(aes)) Signed-off-by: Thara Gopinath --- v1->v2: - Removed spurious totallen from qce_aead_async_req_handle since it was unused as pointed out by Hubert. - Updated qce_dma_prep_sgs to use #nents returned by dma_map_sg rather than using precomputed #nents. drivers/crypto/Kconfig | 15 + drivers/crypto/qce/Makefile | 1 + drivers/crypto/qce/aead.c | 799 drivers/crypto/qce/aead.h | 53 +++ drivers/crypto/qce/common.h | 2 + drivers/crypto/qce/core.c | 4 + 6 files changed, 874 insertions(+) create mode 100644 drivers/crypto/qce/aead.c create mode 100644 drivers/crypto/qce/aead.h diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig index 9a4c275a1335..1fe5b7eafc02 100644 --- a/drivers/crypto/Kconfig +++ b/drivers/crypto/Kconfig @@ -627,6 +627,12 @@ config CRYPTO_DEV_QCE_SHA select CRYPTO_SHA1 select CRYPTO_SHA256 +config CRYPTO_DEV_QCE_AEAD + bool + depends on CRYPTO_DEV_QCE + select CRYPTO_AUTHENC + select CRYPTO_LIB_DES + choice prompt "Algorithms enabled for QCE acceleration" default CRYPTO_DEV_QCE_ENABLE_ALL @@ -647,6 +653,7 @@ choice bool "All supported algorithms" select CRYPTO_DEV_QCE_SKCIPHER select CRYPTO_DEV_QCE_SHA + select CRYPTO_DEV_QCE_AEAD help Enable all supported algorithms: - AES (CBC, CTR, ECB, XTS) @@ -672,6 +679,14 @@ choice - SHA1, HMAC-SHA1 - SHA256, HMAC-SHA256 + config CRYPTO_DEV_QCE_ENABLE_AEAD + bool "AEAD algorithms only" + select CRYPTO_DEV_QCE_AEAD + help + Enable AEAD algorithms only: + - authenc() + - ccm(aes) + - rfc4309(ccm(aes)) endchoice config CRYPTO_DEV_QCE_SW_MAX_LEN diff --git a/drivers/crypto/qce/Makefile b/drivers/crypto/qce/Makefile index 14ade8a7d664..2cf8984e1b85 100644 --- a/drivers/crypto/qce/Makefile +++ b/drivers/crypto/qce/Makefile @@ -6,3 +6,4 @@ qcrypto-objs := core.o \ qcrypto-$(CONFIG_CRYPTO_DEV_QCE_SHA) += sha.o qcrypto-$(CONFIG_CRYPTO_DEV_QCE_SKCIPHER) += skcipher.o +qcrypto-$(CONFIG_CRYPTO_DEV_QCE_AEAD) += aead.o diff --git a/drivers/crypto/qce/aead.c b/drivers/crypto/qce/aead.c new file mode 100644 index ..ef66ae21eae3 --- /dev/null +++ b/drivers/crypto/qce/aead.c @@ -0,0 +1,799 @@ +// SPDX-License-Identifier: GPL-2.0-only + +/* + * Copyright (C) 2021, Linaro Limited. All rights reserved. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "aead.h" + +#define CCM_NONCE_ADATA_SHIFT 6 +#define CCM_NONCE_AUTHSIZE_SHIFT 3 +#define MAX_CCM_ADATA_HEADER_LEN6 + +static LIST_HEAD(aead_algs); + +static void qce_aead_done(void *data) +{ + struct crypto_async_request *async_req = data; + struct aead_request *req = aead_request_cast(async_req); + struct qce_aead_reqctx *rctx = aead_request_ctx(req); + struct qce_aead_ctx *ctx = crypto_tfm_ctx(async_req->tfm); + struct qce_alg_template *tmpl = to_aead_tmpl(crypto_aead_reqtfm(req)); + struct qce_device *qce = tmpl->qce; + struct qce_result_dump *result_buf = qce->dma.result_buf; + enum dma_data_direction dir_src, dir_dst; + bool diff_dst; + int error; + u32 status; + unsigned int totallen; + unsigned char tag[SHA256_DIGEST_SIZE] = {0}; + int ret = 0; + + diff_dst = (req->src != req->dst) ? true : false; + dir_src = diff_dst ? DMA_TO_DEVICE : DMA_BIDIRECTIONAL; + dir_dst = diff_dst ? 
DMA_FROM_DEVICE : DMA_BIDIRECTIONAL; + + error = qce_dma_terminate_all(&qce->dma); + if (error) + dev_dbg(qce->dev, "aead dma termination error (%d)\n", + error); + if (diff_dst) + dma_unmap_sg(qce->dev, rctx->src_sg, rctx->src_nents, dir_src); + + dma_unmap_sg(qce->dev, rctx->dst_sg, rctx->dst_nents, dir_dst); + + if (IS_CCM(rctx->flags)) { + if (req->assoclen) { + sg_free_table(&rctx->src_tbl); + if (diff_dst) + sg_free_table(&rctx->dst_tbl); + } else { + if (!(IS_DECRYPT(rctx->flags) && !diff_dst)) + sg_free_table(&rctx->dst_tbl); + } + } else { + sg_free_table(&rctx->dst_tbl); + } + + error = qce_check_status(qce, &status); +
[Patch v3 3/7] crypto: qce: Add mode for rfc4309
rfc4309 is the specification that uses aes ccm algorithms with IPsec security packets. Add a submode to identify rfc4309 ccm(aes) algorithm in the crypto driver. Reviewed-by: Bjorn Andersson Signed-off-by: Thara Gopinath --- v1->v2: - Moved up the QCE_ENCRYPT AND QCE_DECRYPT bit positions so that addition of other algorithms in future will not affect these macros. drivers/crypto/qce/common.h | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/crypto/qce/common.h b/drivers/crypto/qce/common.h index 3bc244bcca2d..b135440bf72b 100644 --- a/drivers/crypto/qce/common.h +++ b/drivers/crypto/qce/common.h @@ -51,9 +51,11 @@ #define QCE_MODE_CCM BIT(12) #define QCE_MODE_MASK GENMASK(12, 8) +#define QCE_MODE_CCM_RFC4309 BIT(13) + /* cipher encryption/decryption operations */ -#define QCE_ENCRYPT BIT(13) -#define QCE_DECRYPT BIT(14) +#define QCE_ENCRYPT BIT(30) +#define QCE_DECRYPT BIT(31) #define IS_DES(flags) (flags & QCE_ALG_DES) #define IS_3DES(flags) (flags & QCE_ALG_3DES) @@ -73,6 +75,7 @@ #define IS_CTR(mode) (mode & QCE_MODE_CTR) #define IS_XTS(mode) (mode & QCE_MODE_XTS) #define IS_CCM(mode) (mode & QCE_MODE_CCM) +#define IS_CCM_RFC4309(mode) ((mode) & QCE_MODE_CCM_RFC4309) #define IS_ENCRYPT(dir) (dir & QCE_ENCRYPT) #define IS_DECRYPT(dir) (dir & QCE_DECRYPT) -- 2.25.1
[Patch v3 0/7] Add support for AEAD algorithms in Qualcomm Crypto Engine driver
Enable support for AEAD algorithms in the Qualcomm CE driver. The first three patches in this series are cleanups and add a few missing pieces required to add support for AEAD algorithms. Patch 4 introduces the supported AEAD transformations on Qualcomm CE. Patches 5 and 6 implement the h/w infrastructure needed to enable and run the AEAD transformations on Qualcomm CE. Patch 7 adds support to queue fallback algorithms in case of unsupported special inputs. This patch series has been tested with the in-kernel crypto testing module tcrypt.ko with fuzz tests enabled as well. Thara Gopinath (7): crypto: qce: common: Add MAC failed error checking crypto: qce: common: Make result dump optional crypto: qce: Add mode for rfc4309 crypto: qce: Add support for AEAD algorithms crypto: qce: common: Clean up qce_auth_cfg crypto: qce: common: Add support for AEAD algorithms crypto: qce: aead: Schedule fallback algorithm drivers/crypto/Kconfig | 15 + drivers/crypto/qce/Makefile | 1 + drivers/crypto/qce/aead.c | 841 drivers/crypto/qce/aead.h | 56 +++ drivers/crypto/qce/common.c | 196 - drivers/crypto/qce/common.h | 9 +- drivers/crypto/qce/core.c | 4 + 7 files changed, 1102 insertions(+), 20 deletions(-) create mode 100644 drivers/crypto/qce/aead.c create mode 100644 drivers/crypto/qce/aead.h -- 2.25.1
[Patch v3 1/7] crypto: qce: common: Add MAC failed error checking
MAC_FAILED gets set in the status register if authentication fails for ccm algorithms (during decryption). Add support to catch and flag this error. Reviewed-by: Bjorn Andersson Signed-off-by: Thara Gopinath --- v1->v2: - Split the error checking for -ENXIO and -EBADMSG into an if-else clause so that the code is more readable, as per Bjorn's review comment. drivers/crypto/qce/common.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c index dceb9579d87a..dd76175d5c62 100644 --- a/drivers/crypto/qce/common.c +++ b/drivers/crypto/qce/common.c @@ -419,6 +419,8 @@ int qce_check_status(struct qce_device *qce, u32 *status) */ if (*status & STATUS_ERRORS || !(*status & BIT(OPERATION_DONE_SHIFT))) ret = -ENXIO; + else if (*status & BIT(MAC_FAILED_SHIFT)) + ret = -EBADMSG; return ret; } -- 2.25.1
Re: [PATCH] net: ethernet: ravb: Fix release of refclk
On Mon, Apr 19, 2021 at 5:45 PM David Miller wrote: > > From: Adam Ford > Date: Sat, 17 Apr 2021 08:23:29 -0500 > > > The call to clk_disable_unprepare() can happen before priv is > > initialized. This means moving clk_disable_unprepare out of > > out_release into a new label. > > > > Fixes: 8ef7adc6beb2("net: ethernet: ravb: Enable optional refclk") > > Signed-off-by: Adam Ford > This does not apply cleanly, please rebase and resubmit. Which branch should I use as the rebase? I used net-next because that's where the bug is, but I know it changes frequently. > > Please fix the formatting of your Fixes tag while you are at it, thank you. No problem. Sorry about that. adam
RE: [RFC PATCH v5 1/4] topology: Represent clusters of CPUs within a die
> -Original Message- > From: Greg KH [mailto:gre...@linuxfoundation.org] > Sent: Friday, March 19, 2021 11:02 PM > To: Jonathan Cameron > Cc: Song Bao Hua (Barry Song) ; > tim.c.c...@linux.intel.com; catalin.mari...@arm.com; w...@kernel.org; > r...@rjwysocki.net; vincent.guit...@linaro.org; b...@alien8.de; > t...@linutronix.de; mi...@redhat.com; l...@kernel.org; pet...@infradead.org; > dietmar.eggem...@arm.com; rost...@goodmis.org; bseg...@google.com; > mgor...@suse.de; msys.miz...@gmail.com; valentin.schnei...@arm.com; > juri.le...@redhat.com; mark.rutl...@arm.com; sudeep.ho...@arm.com; > aubrey...@linux.intel.com; linux-arm-ker...@lists.infradead.org; > linux-kernel@vger.kernel.org; linux-a...@vger.kernel.org; x...@kernel.org; > xuwei (O) ; Zengtao (B) ; > guodong...@linaro.org; yangyicong ; Liguozhu (Kenneth) > ; linux...@openeuler.org; h...@zytor.com > Subject: Re: [RFC PATCH v5 1/4] topology: Represent clusters of CPUs within > a die > > On Fri, Mar 19, 2021 at 09:36:16AM +, Jonathan Cameron wrote: > > On Fri, 19 Mar 2021 06:57:08 + > > "Song Bao Hua (Barry Song)" wrote: > > > > > > -Original Message- > > > > From: Greg KH [mailto:gre...@linuxfoundation.org] > > > > Sent: Friday, March 19, 2021 7:35 PM > > > > To: Song Bao Hua (Barry Song) > > > > Cc: tim.c.c...@linux.intel.com; catalin.mari...@arm.com; > w...@kernel.org; > > > > r...@rjwysocki.net; vincent.guit...@linaro.org; b...@alien8.de; > > > > t...@linutronix.de; mi...@redhat.com; l...@kernel.org; > pet...@infradead.org; > > > > dietmar.eggem...@arm.com; rost...@goodmis.org; bseg...@google.com; > > > > mgor...@suse.de; msys.miz...@gmail.com; valentin.schnei...@arm.com; > Jonathan > > > > Cameron ; juri.le...@redhat.com; > > > > mark.rutl...@arm.com; sudeep.ho...@arm.com; aubrey...@linux.intel.com; > > > > linux-arm-ker...@lists.infradead.org; linux-kernel@vger.kernel.org; > > > > linux-a...@vger.kernel.org; x...@kernel.org; xuwei (O) > ; > > > > Zengtao (B) ; guodong...@linaro.org; > yangyicong > > > > ; Liguozhu (Kenneth) ; > > > > linux...@openeuler.org; h...@zytor.com > > > > Subject: Re: [RFC PATCH v5 1/4] topology: Represent clusters of CPUs > > > > within > > > > a die > > > > > > > > On Fri, Mar 19, 2021 at 05:16:15PM +1300, Barry Song wrote: > > > > > diff --git a/Documentation/admin-guide/cputopology.rst > > > > b/Documentation/admin-guide/cputopology.rst > > > > > index b90dafc..f9d3745 100644 > > > > > --- a/Documentation/admin-guide/cputopology.rst > > > > > +++ b/Documentation/admin-guide/cputopology.rst > > > > > @@ -24,6 +24,12 @@ core_id: > > > > > identifier (rather than the kernel's). The actual value is > > > > > architecture and platform dependent. > > > > > > > > > > +cluster_id: > > > > > + > > > > > + the Cluster ID of cpuX. Typically it is the hardware platform's > > > > > + identifier (rather than the kernel's). The actual value is > > > > > + architecture and platform dependent. > > > > > + > > > > > book_id: > > > > > > > > > > the book ID of cpuX. Typically it is the hardware platform's > > > > > @@ -56,6 +62,14 @@ package_cpus_list: > > > > > human-readable list of CPUs sharing the same > > > > > physical_package_id. > > > > > (deprecated name: "core_siblings_list") > > > > > > > > > > +cluster_cpus: > > > > > + > > > > > + internal kernel map of CPUs within the same cluster. > > > > > + > > > > > +cluster_cpus_list: > > > > > + > > > > > + human-readable list of CPUs within the same cluster. > > > > > + > > > > > die_cpus: > > > > > > > > > > internal kernel map of CPUs within the same die. 
> > > > > > > > Why are these sysfs files in this file, and not in a Documentation/ABI/ > > > > file which can be correctly parsed and shown to userspace? > > > > > > Well. Those ABIs have been there for much a long time. It is like: > > > > > > [root@ceph1 topology]# ls > > > core_id core_siblings core_siblings_list physical_package_id > thread_siblings thread_siblings_list > > > [root@ceph1 topology]# pwd > > > /sys/devices/system/cpu/cpu100/topology > > > [root@ceph1 topology]# cat core_siblings_list > > > 64-127 > > > [root@ceph1 topology]# > > > > > > > > > > > Any chance you can fix that up here as well? > > > > > > Yes. we will send a separate patch to address this, which won't > > > be in this patchset. This patchset will base on that one. > > > > > > > > > > > Also note that "list" is not something that goes in sysfs, sysfs is "one > > > > value per file", and a list is not "one value". How do you prevent > > > > overflowing the buffer of the sysfs file if you have a "list"? > > > > > > > > > > At a glance, the list is using "-" rather than a real list > > > [root@ceph1 topology]# cat core_siblings_list > > > 64-127 > > > > > > Anyway, I will take a look if it has any chance to overflow. > > > > It could in theory be alternate CPUs as comma separated list. > > So it's would get interesting around 500-1000 cpus (guessing).
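For reference, the existing topology mask attributes are already bounded to a single page, which is why even a fully enumerated comma-separated list truncates rather than overflows. A show() routine for cluster_cpus_list would typically look like the sketch below; the helper and member names here are assumptions for illustration, not code taken from the patch:

	/* Hypothetical show() for cluster_cpus_list, illustrating the PAGE_SIZE bound. */
	static ssize_t cluster_cpus_list_show(struct device *dev,
					      struct device_attribute *attr, char *buf)
	{
		const struct cpumask *mask = topology_cluster_cpumask(dev->id);

		/* cpumap_print_to_pagebuf() never writes past one page, so a
		 * large "0,2,4,..." style list is cut off, not overflowed. */
		return cpumap_print_to_pagebuf(true, buf, mask);
	}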
Re: [PATCH bpf-next 1/2] bpf: Remove bpf_jit_enable=2 debugging mode
On Sat, Apr 17, 2021 at 1:16 AM Christophe Leroy wrote: > > > > Le 16/04/2021 à 01:49, Alexei Starovoitov a écrit : > > On Thu, Apr 15, 2021 at 8:41 AM Quentin Monnet > > wrote: > >> > >> 2021-04-15 16:37 UTC+0200 ~ Daniel Borkmann > >>> On 4/15/21 11:32 AM, Jianlin Lv wrote: > For debugging JITs, dumping the JITed image to kernel log is discouraged, > "bpftool prog dump jited" is much better way to examine JITed dumps. > This patch get rid of the code related to bpf_jit_enable=2 mode and > update the proc handler of bpf_jit_enable, also added auxiliary > information to explain how to use bpf_jit_disasm tool after this change. > > Signed-off-by: Jianlin Lv > >> > >> Hello, > >> > >> For what it's worth, I have already seen people dump the JIT image in > >> kernel logs in Qemu VMs running with just a busybox, not for kernel > >> development, but in a context where buiding/using bpftool was not > >> possible. > > > > If building/using bpftool is not possible then majority of selftests won't > > be exercised. I don't think such environment is suitable for any kind > > of bpf development. Much so for JIT debugging. > > While bpf_jit_enable=2 is nothing but the debugging tool for JIT developers. > > I'd rather nuke that code instead of carrying it from kernel to kernel. > > > > When I implemented JIT for PPC32, it was extremely helpfull. > > As far as I understand, for the time being bpftool is not usable in my > environment because it > doesn't support cross compilation when the target's endianess differs from > the building host > endianess, see discussion at > https://lore.kernel.org/bpf/21e66a09-514f-f426-b9e2-13baab0b9...@csgroup.eu/ > > That's right that selftests can't be exercised because they don't build. > > The question might be candid as I didn't investigate much about the > replacement of "bpf_jit_enable=2 > debugging mode" by bpftool, how do we use bpftool exactly for that ? > Especially when using the BPF > test module ? the kernel developers can add any amount of printk and dumps to debug their code, but such debugging aid should not be part of the production kernel. That sysctl was two things at once: debugging tool for kernel devs and introspection for users. bpftool jit dump solves the 2nd part. It provides JIT introspection to users. Debugging of the kernel can be done with any amount of auxiliary code including calling print_hex_dump() during jiting.
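To make the "auxiliary code" point concrete: a JIT author bringing up a new backend can still dump the generated image to the kernel log with a throwaway hunk like the one below, which is roughly what bpf_jit_enable=2 used to do. jit_image and jit_image_len are placeholder names for whatever the arch's JIT calls its buffer and length:

	/* Temporary debugging aid inside the arch JIT, not meant for merging. */
	print_hex_dump(KERN_DEBUG, "BPF JIT: ", DUMP_PREFIX_OFFSET,
		       16, 1, jit_image, jit_image_len, false);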
Re: [PATCH] bonding: 3ad: update slave arr after initialize
在 2021/4/16 12:28, Jay Vosburgh 写道: jinyiting wrote: From: jin yiting The bond works in mode 4, and performs down/up operations on the bond that is normally negotiated. The probability of bond-> slave_arr is NULL Test commands: ifconfig bond1 down ifconfig bond1 up The conflict occurs in the following process: __dev_open (CPU A) --bond_open --queue_delayed_work(bond->wq,&bond->ad_work,0); --bond_update_slave_arr --bond_3ad_get_active_agg_info ad_work(CPU B) --bond_3ad_state_machine_handler --ad_agg_selection_logic ad_work runs on cpu B. In the function ad_agg_selection_logic, all agg->is_active will be cleared. Before the new active aggregator is selected on CPU B, bond_3ad_get_active_agg_info failed on CPU A, bond->slave_arr will be set to NULL. The best aggregator in ad_agg_selection_logic has not changed, no need to update slave arr. Signed-off-by: jin yiting --- drivers/net/bonding/bond_3ad.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c index 6908822..d100079 100644 --- a/drivers/net/bonding/bond_3ad.c +++ b/drivers/net/bonding/bond_3ad.c @@ -2327,6 +2327,12 @@ void bond_3ad_state_machine_handler(struct work_struct *work) aggregator = __get_first_agg(port); ad_agg_selection_logic(aggregator, &update_slave_arr); + if (!update_slave_arr) { + struct aggregator *active = __get_active_agg(aggregator); + + if (active && active->is_active) + update_slave_arr = true; + } } bond_3ad_set_carrier(bond); } The described issue is a race condition (in that ad_agg_selection_logic clears agg->is_active under mode_lock, but bond_open -> bond_update_slave_arr is inspecting agg->is_active outside the lock). I don't see how the above change will reliably manage this; the real issue looks to be that bond_update_slave_arr is committing changes to the array (via bond_reset_slave_arr) based on a racy inspection of the active aggregator state while it is in flux. Also, the description of the issue says "The best aggregator in ad_agg_selection_logic has not changed, no need to update slave arr," but the change above does the opposite, and will set update_slave_arr when the aggregator has not changed (update_slave_arr remains false at return of ad_agg_selection_logic). I believe I understand the described problem, but I don't see how the patch fixes it. I suspect (but haven't tested) that the proper fix is to acquire mode_lock in bond_update_slave_arr while calling bond_3ad_get_active_agg_info to avoid conflict with the state machine. -J --- -Jay Vosburgh, jay.vosbu...@canonical.com . Thank you for your reply. The last patch does have redundant update slave arr.Thank you for your correction. As you said, holding mode_lock in bond_update_slave_arr while calling bond_3ad_get_active_agg_info can avoid conflictwith the state machine. I have tested this patch, with ifdown/ifup operations for bond or slaves. But bond_update_slave_arr is expected to hold RTNL only and NO other lock. And it have WARN_ON(lockdep_is_held(&bond->mode_lock)); in bond_update_slave_arr. I'm not sure that holding mode_lock in bond_update_slave_arr while calling bond_3ad_get_active_agg_info is a correct action. 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 74cbbb2..db988e5 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4406,7 +4406,9 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave) if (BOND_MODE(bond) == BOND_MODE_8023AD) { struct ad_info ad_info; + spin_lock_bh(&bond->mode_lock); if (bond_3ad_get_active_agg_info(bond, &ad_info)) { + spin_unlock_bh(&bond->mode_lock); pr_debug("bond_3ad_get_active_agg_info failed\n"); /* No active aggragator means it's not safe to use * the previous array. @@ -4414,6 +4416,7 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave) bond_reset_slave_arr(bond); goto out; } + spin_unlock_bh(&bond->mode_lock); agg_id = ad_info.aggregator_id; } bond_for_each_slave(bond, slave, iter) {
[PATCH v7 9/9] Documentation: arm64: Document PMU counters access from userspace
From: Raphael Gault Add documentation to describe the access to the pmu hardware counters from userspace. Signed-off-by: Raphael Gault Signed-off-by: Rob Herring --- v7: - Merge into existing arm64 perf.rst v6: - Update the chained event section with attr.config1 details v2: - Update links to test examples Changes from Raphael's v4: - Convert to rSt - Update chained event status - Add section for heterogeneous systems --- Documentation/arm64/perf.rst | 67 +++- 1 file changed, 66 insertions(+), 1 deletion(-) diff --git a/Documentation/arm64/perf.rst b/Documentation/arm64/perf.rst index b567f177d385..c40fc2adc451 100644 --- a/Documentation/arm64/perf.rst +++ b/Documentation/arm64/perf.rst @@ -2,7 +2,10 @@ .. _perf_index: -= + +Perf + + Perf Event Attributes = @@ -88,3 +91,65 @@ exclude_host. However when using !exclude_hv there is a small blackout window at the guest entry/exit where host events are not captured. On VHE systems there are no blackout windows. + +Perf Userspace PMU Hardware Counter Access +== + +Overview + +The perf userspace tool relies on the PMU to monitor events. It offers an +abstraction layer over the hardware counters since the underlying +implementation is cpu-dependent. +Arm64 allows userspace tools to have access to the registers storing the +hardware counters' values directly. + +This targets specifically self-monitoring tasks in order to reduce the overhead +by directly accessing the registers without having to go through the kernel. + +How-to +-- +The focus is set on the armv8 PMUv3 which makes sure that the access to the pmu +registers is enabled and that the userspace has access to the relevant +information in order to use them. + +In order to have access to the hardware counter it is necessary to open the event +using the perf tool interface: the sys_perf_event_open syscall returns a fd which +can subsequently be used with the mmap syscall in order to retrieve a page of +memory containing information about the event. +The PMU driver uses this page to expose to the user the hardware counter's +index and other necessary data. Using this index enables the user to access the +PMU registers using the `mrs` instruction. + +The userspace access is supported in libperf using the perf_evsel__mmap() +and perf_evsel__read() functions. See `tools/lib/perf/tests/test-evsel.c`_ for +an example. + +About heterogeneous systems +--- +On heterogeneous systems such as big.LITTLE, userspace PMU counter access can +only be enabled when the tasks are pinned to a homogeneous subset of cores and +the corresponding PMU instance is opened by specifying the 'type' attribute. +The use of generic event types is not supported in this case. + +Have a look at `tools/perf/arch/arm64/tests/user-events.c`_ for an example. It +can be run using the perf tool to check that the access to the registers works +correctly from userspace: + +.. code-block:: sh + + perf test -v user + +About chained events and 64-bit counters + +Chained events are not supported in conjunction with userspace counter +access. If a 64-bit counter is requested (attr.config1:0), then +userspace access must also be requested with attr.config1:1 set. This +will disable counter chaining. The 'pmc_width' in the user page will +indicate the actual width of the counter which could be only 32-bits +depending on event and PMU features. + +.. Links +.. _tools/perf/arch/arm64/tests/user-events.c: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c +.. 
_tools/lib/perf/tests/test-evsel.c: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c -- 2.27.0
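For readers who want the raw flow behind the how-to above, here is a minimal, hedged sketch of a self-monitoring read loop against the mmap'd perf_event_mmap_page. It assumes the index convention used in this series (hardware counter n is published as index n + 1, the cycle counter as index 32) and hard-codes the two mrs instructions shown, since the register number is encoded in the instruction; libperf generalizes this with one accessor per counter.

#include <stdint.h>
#include <linux/perf_event.h>

/* seqlock-style read of a self-monitored event via the mmap'd user page */
static uint64_t read_self_counter(volatile struct perf_event_mmap_page *pc)
{
	uint32_t seq, idx, width;
	uint64_t count, pmc;

	do {
		seq = pc->lock;
		__asm__ volatile("" ::: "memory");

		idx = pc->index;		/* 0: no direct access granted */
		count = pc->offset;		/* kernel-maintained base value */
		width = pc->pmc_width;
		if (pc->cap_user_rdpmc && idx) {
			if (idx == 32)		/* remapped cycle counter index */
				__asm__ volatile("mrs %0, pmccntr_el0" : "=r" (pmc));
			else			/* index 1 -> PMEVCNTR0_EL0; real code
						 * needs one mrs per possible index */
				__asm__ volatile("mrs %0, pmevcntr0_el0" : "=r" (pmc));

			pmc <<= 64 - width;	/* truncate to pmc_width bits ... */
			pmc >>= 64 - width;	/* ... before adding the offset */
			count += pmc;
		}

		__asm__ volatile("" ::: "memory");
	} while (pc->lock != seq);

	return count;
}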
[PATCH v7 7/9] perf: arm64: Add test for userspace counter access on heterogeneous systems
Userspace counter access only works on heterogeneous systems with some restrictions. The userspace process must be pinned to a homogeneous subset of CPUs and must open the corresponding PMU for those CPUs. This commit adds a test implementing these requirements. Signed-off-by: Rob Herring --- v6: - Add a check on cap_user_rdpmc v5: - Adapt to libperf mmap API changes v4: - Update perf_evsel__mmap params v2: - Drop all but heterogeneous test as others covered by libperf tests - Rework to use libperf --- tools/perf/arch/arm64/include/arch-tests.h | 7 + tools/perf/arch/arm64/tests/Build | 1 + tools/perf/arch/arm64/tests/arch-tests.c | 4 + tools/perf/arch/arm64/tests/user-events.c | 177 + 4 files changed, 189 insertions(+) create mode 100644 tools/perf/arch/arm64/tests/user-events.c diff --git a/tools/perf/arch/arm64/include/arch-tests.h b/tools/perf/arch/arm64/include/arch-tests.h index 90ec4c8cb880..380ad34a3f09 100644 --- a/tools/perf/arch/arm64/include/arch-tests.h +++ b/tools/perf/arch/arm64/include/arch-tests.h @@ -2,11 +2,18 @@ #ifndef ARCH_TESTS_H #define ARCH_TESTS_H +#include + #ifdef HAVE_DWARF_UNWIND_SUPPORT struct thread; struct perf_sample; +int test__arch_unwind_sample(struct perf_sample *sample, +struct thread *thread); #endif extern struct test arch_tests[]; +int test__rd_pinned(struct test __maybe_unused *test, + int __maybe_unused subtest); + #endif diff --git a/tools/perf/arch/arm64/tests/Build b/tools/perf/arch/arm64/tests/Build index a61c06bdb757..3f9a20c17fc6 100644 --- a/tools/perf/arch/arm64/tests/Build +++ b/tools/perf/arch/arm64/tests/Build @@ -1,4 +1,5 @@ perf-y += regs_load.o perf-$(CONFIG_DWARF_UNWIND) += dwarf-unwind.o +perf-y += user-events.o perf-y += arch-tests.o diff --git a/tools/perf/arch/arm64/tests/arch-tests.c b/tools/perf/arch/arm64/tests/arch-tests.c index 5b1543c98022..80ce7bd3c16d 100644 --- a/tools/perf/arch/arm64/tests/arch-tests.c +++ b/tools/perf/arch/arm64/tests/arch-tests.c @@ -10,6 +10,10 @@ struct test arch_tests[] = { .func = test__dwarf_unwind, }, #endif + { + .desc = "Pinned CPU user counter access", + .func = test__rd_pinned, + }, { .func = NULL, }, diff --git a/tools/perf/arch/arm64/tests/user-events.c b/tools/perf/arch/arm64/tests/user-events.c new file mode 100644 index ..c8efc6b369e6 --- /dev/null +++ b/tools/perf/arch/arm64/tests/user-events.c @@ -0,0 +1,177 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include + +#include +#include +#include + +#include "pmu.h" +#include "debug.h" +#include "tests/tests.h" +#include "arch-tests.h" + +static int run_test(struct perf_evsel *evsel) +{ + int n; + volatile int tmp = 0; + u64 delta, i, loops = 1000; + struct perf_counts_values counts = { .val = 0 }; + + for (n = 0; n < 6; n++) { + u64 stamp, now; + + perf_evsel__read(evsel, 0, 0, &counts); + stamp = counts.val; + + for (i = 0; i < loops; i++) + tmp++; + + perf_evsel__read(evsel, 0, 0, &counts); + now = counts.val; + loops *= 10; + + delta = now - stamp; + pr_debug("%14d: %14llu\n", n, (long long)delta); + + if (!delta) + break; + } + return delta ? 
0 : -1; +} + +static struct perf_pmu *pmu_for_cpu(int cpu) +{ + int acpu, idx; + struct perf_pmu *pmu = NULL; + + while ((pmu = perf_pmu__scan(pmu)) != NULL) { + if (pmu->is_uncore) + continue; + perf_cpu_map__for_each_cpu(acpu, idx, pmu->cpus) + if (acpu == cpu) + return pmu; + } + return NULL; +} + +static bool pmu_is_homogeneous(void) +{ + int core_cnt = 0; + struct perf_pmu *pmu = NULL; + + while ((pmu = perf_pmu__scan(pmu)) != NULL) { + if (!pmu->is_uncore && !perf_cpu_map__empty(pmu->cpus)) + core_cnt++; + } + return core_cnt == 1; +} + +static int libperf_print(enum libperf_print_level level, +const char *fmt, va_list ap) +{ + (void)level; + return vfprintf(stderr, fmt, ap); +} + +static struct perf_evsel *perf_init(struct perf_event_attr *attr) +{ + int err; + struct perf_thread_map *threads; + struct perf_evsel *evsel; + struct perf_event_mmap_page *pc; + + libperf_init(libperf_print); + + threads = perf_thread_map__new_dummy(); + if (!threads) { + pr_err("failed to create threads\n"); + return NULL; + } + + perf_thread_map__set_pid(threads, 0, 0); + + evsel = perf_evsel__new(attr); + if (!evsel) { +
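As a companion to the test above, here is a hedged sketch of what the commit message asks of userspace on big.LITTLE parts: pin the task to one CPU, then open the event against that CPU's own PMU type rather than PERF_TYPE_HARDWARE. The sysfs PMU name "armv8_pmuv3_0" and the raw event number 0x11 (CPU_CYCLES) are examples, not fixed interfaces.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Pin to one CPU, then open a raw event on that CPU's own PMU. */
static int open_pinned_event(int cpu)
{
	cpu_set_t mask;
	struct perf_event_attr attr;
	unsigned int type;
	FILE *f;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	if (sched_setaffinity(0, sizeof(mask), &mask))
		return -1;

	/* PMU name is an assumption; read the matching event source's type */
	f = fopen("/sys/bus/event_source/devices/armv8_pmuv3_0/type", "r");
	if (!f)
		return -1;
	if (fscanf(f, "%u", &type) != 1) {
		fclose(f);
		return -1;
	}
	fclose(f);

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = type;		/* this PMU, not PERF_TYPE_HARDWARE */
	attr.config = 0x11;		/* example: ARMv8 CPU_CYCLES common event */
	attr.exclude_kernel = 1;

	return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}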
[PATCH v7 8/9] perf: arm64: Add tests for 32-bit and 64-bit counter size userspace access
Add arm64 specific tests for 32-bit and 64-bit counter user access. On arm64, counters default to 32-bit unless attr.config1:0 is set to 1. In order to enable user access for 64-bit counters, attr.config1 must be set to 3. Signed-off-by: Rob Herring --- v6: - New patch --- tools/perf/arch/arm64/include/arch-tests.h | 5 +++ tools/perf/arch/arm64/tests/arch-tests.c | 8 + tools/perf/arch/arm64/tests/user-events.c | 38 ++ 3 files changed, 51 insertions(+) diff --git a/tools/perf/arch/arm64/include/arch-tests.h b/tools/perf/arch/arm64/include/arch-tests.h index 380ad34a3f09..ddfa7460e1e1 100644 --- a/tools/perf/arch/arm64/include/arch-tests.h +++ b/tools/perf/arch/arm64/include/arch-tests.h @@ -15,5 +15,10 @@ extern struct test arch_tests[]; int test__rd_pinned(struct test __maybe_unused *test, int __maybe_unused subtest); +int test__rd_64bit(struct test __maybe_unused *test, + int __maybe_unused subtest); + +int test__rd_32bit(struct test __maybe_unused *test, + int __maybe_unused subtest); #endif diff --git a/tools/perf/arch/arm64/tests/arch-tests.c b/tools/perf/arch/arm64/tests/arch-tests.c index 80ce7bd3c16d..bbdb81aa3229 100644 --- a/tools/perf/arch/arm64/tests/arch-tests.c +++ b/tools/perf/arch/arm64/tests/arch-tests.c @@ -14,6 +14,14 @@ struct test arch_tests[] = { .desc = "Pinned CPU user counter access", .func = test__rd_pinned, }, + { + .desc = "User 64-bit counter access", + .func = test__rd_64bit, + }, + { + .desc = "User 32-bit counter access", + .func = test__rd_32bit, + }, { .func = NULL, }, diff --git a/tools/perf/arch/arm64/tests/user-events.c b/tools/perf/arch/arm64/tests/user-events.c index c8efc6b369e6..546323b5242c 100644 --- a/tools/perf/arch/arm64/tests/user-events.c +++ b/tools/perf/arch/arm64/tests/user-events.c @@ -175,3 +175,41 @@ int test__rd_pinned(struct test __maybe_unused *test, perf_evsel__delete(evsel); return ret; } + +static int test__rd_counter_size(struct test __maybe_unused *test, +int config1) +{ + int ret; + struct perf_evsel *evsel; + struct perf_event_attr attr = { + .type = PERF_TYPE_HARDWARE, + .config = PERF_COUNT_HW_INSTRUCTIONS, + .config1 = config1, + .exclude_kernel = 1, + }; + + if (!pmu_is_homogeneous()) + return TEST_SKIP; + + evsel = perf_init(&attr); + if (!evsel) + return -1; + + ret = run_test(evsel); + + perf_evsel__close(evsel); + perf_evsel__delete(evsel); + return ret; +} + +int test__rd_64bit(struct test __maybe_unused *test, + int __maybe_unused subtest) +{ + return test__rd_counter_size(test, 0x3); +} + +int test__rd_32bit(struct test __maybe_unused *test, + int __maybe_unused subtest) +{ + return test__rd_counter_size(test, 0x2); +} -- 2.27.0
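For reference, these are the attr.config1 encodings the two tests exercise, per the semantics described in this series (bit 0 requests a 64-bit "long" counter, bit 1 requests userspace access); the helper below is sketched only to spell the combinations out.

#include <string.h>
#include <linux/perf_event.h>

/* config1 = 0x0 or 0x2: 32-bit counter, userspace access enabled
 * config1 = 0x1:        64-bit counter (chained if needed), no userspace access
 * config1 = 0x3:        userspace access with the widest counter the h/w offers
 */
static void fill_attr(struct perf_event_attr *attr, unsigned long config1)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_INSTRUCTIONS;
	attr->config1 = config1;
	attr->exclude_kernel = 1;
}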
[PATCH v7 6/9] libperf: Add arm64 support to perf_mmap__read_self()
Add the arm64 variants for read_perf_counter() and read_timestamp(). Unfortunately the counter number is encoded into the instruction, so the code is a bit verbose to enumerate all possible counters. Signed-off-by: Rob Herring --- v7: - Move enabling of libperf user read test for arm64 to this patch --- tools/lib/perf/mmap.c | 98 +++ tools/lib/perf/tests/test-evsel.c | 2 +- 2 files changed, 99 insertions(+), 1 deletion(-) diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c index 915469f00cf4..216e6c6212a2 100644 --- a/tools/lib/perf/mmap.c +++ b/tools/lib/perf/mmap.c @@ -13,6 +13,7 @@ #include #include #include +#include #include "internal.h" void perf_mmap__init(struct perf_mmap *map, struct perf_mmap *prev, @@ -294,6 +295,103 @@ static u64 read_timestamp(void) return low | ((u64)high) << 32; } +#elif defined(__aarch64__) +#define read_sysreg(r) ({ \ + u64 __val; \ + asm volatile("mrs %0, " __stringify(r) : "=r" (__val)); \ + __val; \ +}) + +static u64 read_pmccntr(void) +{ + return read_sysreg(pmccntr_el0); +} + +#define PMEVCNTR_READ(idx) \ + static u64 read_pmevcntr_##idx(void) { \ + return read_sysreg(pmevcntr##idx##_el0);\ + } + +PMEVCNTR_READ(0); +PMEVCNTR_READ(1); +PMEVCNTR_READ(2); +PMEVCNTR_READ(3); +PMEVCNTR_READ(4); +PMEVCNTR_READ(5); +PMEVCNTR_READ(6); +PMEVCNTR_READ(7); +PMEVCNTR_READ(8); +PMEVCNTR_READ(9); +PMEVCNTR_READ(10); +PMEVCNTR_READ(11); +PMEVCNTR_READ(12); +PMEVCNTR_READ(13); +PMEVCNTR_READ(14); +PMEVCNTR_READ(15); +PMEVCNTR_READ(16); +PMEVCNTR_READ(17); +PMEVCNTR_READ(18); +PMEVCNTR_READ(19); +PMEVCNTR_READ(20); +PMEVCNTR_READ(21); +PMEVCNTR_READ(22); +PMEVCNTR_READ(23); +PMEVCNTR_READ(24); +PMEVCNTR_READ(25); +PMEVCNTR_READ(26); +PMEVCNTR_READ(27); +PMEVCNTR_READ(28); +PMEVCNTR_READ(29); +PMEVCNTR_READ(30); + +/* + * Read a value direct from PMEVCNTR + */ +static u64 read_perf_counter(unsigned int counter) { + static u64 (* const read_f[])(void) = { + read_pmevcntr_0, + read_pmevcntr_1, + read_pmevcntr_2, + read_pmevcntr_3, + read_pmevcntr_4, + read_pmevcntr_5, + read_pmevcntr_6, + read_pmevcntr_7, + read_pmevcntr_8, + read_pmevcntr_9, + read_pmevcntr_10, + read_pmevcntr_11, + read_pmevcntr_12, + read_pmevcntr_13, + read_pmevcntr_14, + read_pmevcntr_15, + read_pmevcntr_16, + read_pmevcntr_17, + read_pmevcntr_18, + read_pmevcntr_19, + read_pmevcntr_20, + read_pmevcntr_21, + read_pmevcntr_22, + read_pmevcntr_23, + read_pmevcntr_24, + read_pmevcntr_25, + read_pmevcntr_26, + read_pmevcntr_27, + read_pmevcntr_28, + read_pmevcntr_29, + read_pmevcntr_30, + read_pmccntr + }; + + if (counter < ARRAY_SIZE(read_f)) + return (read_f[counter])(); + + return 0; +} + +static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); } + +#else static u64 read_perf_counter(unsigned int counter) { return 0; } static u64 read_timestamp(void) { return 0; } diff --git a/tools/lib/perf/tests/test-evsel.c b/tools/lib/perf/tests/test-evsel.c index 288b5feaefe2..670fcdb6d6af 100644 --- a/tools/lib/perf/tests/test-evsel.c +++ b/tools/lib/perf/tests/test-evsel.c @@ -148,7 +148,7 @@ static int test_stat_user_read(int event) pc = perf_evsel__mmap_base(evsel, 0, 0); -#if defined(__i386__) || defined(__x86_64__) +#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__) __T("userspace counter access not supported", pc->cap_user_rdpmc); __T("userspace counter access not enabled", pc->index); __T("userspace counter width not set", pc->pmc_width >= 32); -- 2.27.0
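The counter-read fast path above is what libperf's mmap read ends up using on arm64. A hedged usage sketch follows, mirroring the flow in tools/lib/perf/tests/test-evsel.c; exact error handling is elided for brevity.

#include <stdint.h>
#include <linux/perf_event.h>
#include <perf/evsel.h>
#include <perf/threadmap.h>

/* Count own instructions with the userspace (mrs-based) read path. */
static int read_self_instructions(uint64_t *val)
{
	struct perf_event_attr attr = {
		.type = PERF_TYPE_HARDWARE,
		.config = PERF_COUNT_HW_INSTRUCTIONS,
		.exclude_kernel = 1,
	};
	struct perf_counts_values counts = { .val = 0 };
	struct perf_thread_map *threads;
	struct perf_evsel *evsel;
	int err = -1;

	threads = perf_thread_map__new_dummy();
	if (!threads)
		return -1;
	perf_thread_map__set_pid(threads, 0, 0);	/* monitor this process */

	evsel = perf_evsel__new(&attr);
	if (evsel && !perf_evsel__open(evsel, NULL, threads) &&
	    !perf_evsel__mmap(evsel, 0)) {		/* enables the user page */
		err = perf_evsel__read(evsel, 0, 0, &counts);	/* no syscall */
		*val = counts.val;
		perf_evsel__munmap(evsel);
		perf_evsel__close(evsel);
	}

	if (evsel)
		perf_evsel__delete(evsel);
	perf_thread_map__put(threads);
	return err;
}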
[PATCH v7 5/9] arm64: perf: Add userspace counter access disable switch
Like x86, some users may want to disable userspace PMU counter access altogether. Add a sysfs 'rdpmc' file to control userspace counter access. The default is '1' which is enabled. Writing '0' disables access. In the case of multiple PMUs (i.e. big.LITTLE), the control is per PMU and userspace must disable access on each PMU. Note that x86 also supports writing '2' to globally enable user access. As there's no existing userspace support to worry about, this shouldn't be necessary for Arm. It could be added later if the need arises. Signed-off-by: Rob Herring --- arch/arm64/kernel/perf_event.c | 61 -- include/linux/perf/arm_pmu.h | 4 ++- 2 files changed, 61 insertions(+), 4 deletions(-) diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index bfbb7f449aca..1ab6308ca89c 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -336,6 +336,54 @@ static const struct attribute_group armv8_pmuv3_caps_attr_group = { .attrs = armv8_pmuv3_caps_attrs, }; +static void armv8pmu_disable_user_access(void *mm); + +static ssize_t get_attr_rdpmc(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct pmu *pmu = dev_get_drvdata(dev); + struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu); + + return snprintf(buf, 40, "%d\n", cpu_pmu->attr_rdpmc); +} + +static ssize_t set_attr_rdpmc(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct pmu *pmu = dev_get_drvdata(dev); + struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu); + unsigned long val; + ssize_t ret; + + ret = kstrtoul(buf, 0, &val); + if (ret) + return ret; + + if (val > 1) + return -EINVAL; + + if (val != cpu_pmu->attr_rdpmc) { + cpu_pmu->attr_rdpmc = !!val; + if (!val) + on_each_cpu_mask(&cpu_pmu->supported_cpus, armv8pmu_disable_user_access, NULL, 1); + } + + return count; +} + +static DEVICE_ATTR(rdpmc, S_IRUSR | S_IWUSR, get_attr_rdpmc, set_attr_rdpmc); + +static struct attribute *armv8_pmuv3_rdpmc_attrs[] = { + &dev_attr_rdpmc.attr, + NULL, +}; + +static const struct attribute_group armv8_pmuv3_rdpmc_attr_group = { + .attrs = armv8_pmuv3_rdpmc_attrs, +}; + /* * Perf Events' indices */ @@ -950,7 +998,8 @@ static void armv8pmu_sched_task(struct perf_event_context *ctx, bool sched_in) * If a new task has user access enabled, clear the dirty counters * to prevent the potential leak.
*/ - if (ctx && current->mm && atomic_read(&current->mm->context.pmu_direct_access)) { + if (ctx && to_arm_pmu(ctx->pmu)->attr_rdpmc && + current->mm && atomic_read(&current->mm->context.pmu_direct_access)) { armv8pmu_enable_user_access(); armv8pmu_clear_dirty_counters(to_arm_pmu(ctx->pmu)); } else { @@ -1093,7 +1142,8 @@ static int __armv8_pmuv3_map_event(struct perf_event *event, &armv8_pmuv3_perf_cache_map, ARMV8_PMU_EVTYPE_EVENT); - if (armv8pmu_event_want_user_access(event) || !armv8pmu_event_is_64bit(event)) + if (armpmu->attr_rdpmc && + (armv8pmu_event_want_user_access(event) || !armv8pmu_event_is_64bit(event))) event->hw.flags |= ARMPMU_EL0_RD_CNTR; /* @@ -1218,7 +1268,9 @@ static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu) static int armv8pmu_undef_handler(struct pt_regs *regs, u32 insn) { - if (atomic_read(&current->mm->context.pmu_direct_access)) { + struct arm_pmu *armpmu = *this_cpu_ptr(&cpu_armpmu); + + if (armpmu->attr_rdpmc && atomic_read(&current->mm->context.pmu_direct_access)) { armv8pmu_enable_user_access(); return 0; } @@ -1277,6 +1329,9 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name, cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_CAPS] = caps ? caps : &armv8_pmuv3_caps_attr_group; + cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_RDPMC] = &armv8_pmuv3_rdpmc_attr_group; + cpu_pmu->attr_rdpmc = true; + return 0; } diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 1daad3b2cce5..9303cd07ac57 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -82,6 +82,7 @@ enum armpmu_attr_groups { ARMPMU_ATTR_GROUP_EVENTS, ARMPMU_ATTR_GROUP_FORMATS, ARMPMU_ATTR_GROUP_CAPS, + ARMPMU_ATTR_GROUP_RDPMC, ARMPMU_NR_ATTR_GROUPS }; @@ -107,7 +108,8 @@ struct arm_pmu { int (*map_event)(struct perf_event *event); int (*filter_match)(struct perf_event *event);
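A hedged userspace-side example of using the new knob; the sysfs location follows from the attribute being attached to the PMU's event_source device, but the exact PMU name varies by system and is only an assumption here.

#include <stdio.h>

/* Write 0 to disable, 1 to re-enable (the default) for one PMU instance. */
static int set_rdpmc(const char *pmu_name, int enable)
{
	char path[128];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/bus/event_source/devices/%s/rdpmc", pmu_name);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%d\n", enable);
	fclose(f);
	return 0;
}

For example, set_rdpmc("armv8_pmuv3_0", 0) on a system with that PMU name; on big.LITTLE each PMU instance has to be disabled separately, as the commit message notes.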
[PATCH v7 3/9] arm64: perf: Enable PMU counter direct access for perf event
From: Raphael Gault Keep track of events opened with direct access to the hardware counters and modify permissions while they are open. The strategy used here is the same one x86 uses: every time an event is mapped, the permissions are set if required. The atomic field added in the mm_context helps keep track of the different events opened and de-activate the permissions when all are unmapped. We also need to update the permissions in the context switch code so that tasks keep the right permissions. In order to enable 64-bit counters for userspace when available, a new config1 bit is added for userspace to indicate it wants userspace counter access. This bit allows the kernel to decide if chaining should be disabled, as chaining and userspace access are incompatible. The modes for config1 are as follows: config1 = 0 or 2 : user access enabled and always 32-bit config1 = 1 : user access disabled and always 64-bit (using chaining if needed) config1 = 3 : user access enabled and counter size matches underlying counter. User access is enabled with config1 == 0 so that we match x86 behavior and don't need Arm specific code (outside the counter read). Signed-off-by: Raphael Gault Signed-off-by: Rob Herring --- Peter Z says (event->oncpu == smp_processor_id()) in the user page update is always true, but my testing says otherwise[1]. v7: - Clear disabled counters when user access is enabled for a task to avoid leaking other tasks' counter data. - Rework context switch handling utilizing sched_task callback - Add armv8pmu_event_can_chain() helper - Rework config1 flags handling structure - Use ARMV8_IDX_CYCLE_COUNTER_USER define for remapped user cycle counter index v6: - Add new attr.config1 rdpmc bit for userspace to hint it wants userspace access when also requesting 64-bit counters. v5: - Only set cap_user_rdpmc if event is on current cpu - Limit enabling/disabling access to CPUs associated with the PMU (supported_cpus) and with the mm_struct matching current->active_mm. v2: - Move mapped/unmapped into arm64 code. Fixes arm32. - Rebase on cap_user_time_short changes Changes from Raphael's v4: - Drop homogeneous check - Disable access for chained counters - Set pmc_width in user page [1] https://lore.kernel.org/lkml/cal_jsqk+ekef5navnbfarcjre3myhfbfe54f9yhkbstnwql...@mail.gmail.com/ --- arch/arm64/include/asm/mmu.h | 5 + arch/arm64/kernel/perf_event.c | 179 +++-- include/linux/perf/arm_pmu.h | 6 ++ 3 files changed, 183 insertions(+), 7 deletions(-) diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h index 75beffe2ee8a..ee08447455da 100644 --- a/arch/arm64/include/asm/mmu.h +++ b/arch/arm64/include/asm/mmu.h @@ -18,6 +18,11 @@ typedef struct { atomic64_t id; + /* +* non-zero if userspace has access to hardware +* counters directly. +*/ + atomic_tpmu_direct_access; #ifdef CONFIG_COMPAT void*sigpage; #endif diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index 40cf59455ce8..bfbb7f449aca 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -8,9 +8,11 @@ * This code is based heavily on the ARMv7 perf event code.
*/ +#include #include #include #include +#include #include #include @@ -288,15 +290,22 @@ static const struct attribute_group armv8_pmuv3_events_attr_group = { PMU_FORMAT_ATTR(event, "config:0-15"); PMU_FORMAT_ATTR(long, "config1:0"); +PMU_FORMAT_ATTR(rdpmc, "config1:1"); static inline bool armv8pmu_event_is_64bit(struct perf_event *event) { return event->attr.config1 & 0x1; } +static inline bool armv8pmu_event_want_user_access(struct perf_event *event) +{ + return event->attr.config1 & 0x2; +} + static struct attribute *armv8_pmuv3_format_attrs[] = { &format_attr_event.attr, &format_attr_long.attr, + &format_attr_rdpmc.attr, NULL, }; @@ -344,6 +353,15 @@ static bool armv8pmu_has_long_event(struct arm_pmu *cpu_pmu) return (cpu_pmu->pmuver >= ID_AA64DFR0_PMUVER_8_5); } +static inline bool armv8pmu_event_can_chain(struct perf_event *event) +{ + struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); + + return !armv8pmu_event_want_user_access(event) && + armv8pmu_event_is_64bit(event) && + !armv8pmu_has_long_event(cpu_pmu); +} + /* * We must chain two programmable counters for 64 bit events, * except when we have allocated the 64bit cycle counter (for CPU @@ -353,11 +371,9 @@ static bool armv8pmu_has_long_event(struct arm_pmu *cpu_pmu) static inline bool armv8pmu_event_is_chained(struct perf_event *event) { int idx = event->hw.idx; - struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); return !WARN_ON(idx < 0) && - armv8pmu_event_is_64bit(event) && - !armv8pmu_has_long_event(cpu_pmu) && +
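To make the mmap/munmap bookkeeping described in the commit message concrete, here is a deliberately simplified sketch of the idea, not the v7 patch body (v7 defers the actual enable to the undef handler and the sched_task hook); refresh_pmuserenr() is a hypothetical helper standing in for whatever toggles EL0 access on the PMU's supported_cpus.

/* Simplified sketch: the per-mm atomic counts mapped events that were
 * granted EL0 access, enabling access on the first map and revoking it
 * when the last one goes away. refresh_pmuserenr() is hypothetical.
 */
static void armv8pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
{
	if (!(event->hw.flags & ARMPMU_EL0_RD_CNTR))
		return;

	if (atomic_inc_return(&mm->context.pmu_direct_access) == 1)
		on_each_cpu_mask(&to_arm_pmu(event->pmu)->supported_cpus,
				 refresh_pmuserenr, mm, 1);
}

static void armv8pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
{
	if (!(event->hw.flags & ARMPMU_EL0_RD_CNTR))
		return;

	if (atomic_dec_and_test(&mm->context.pmu_direct_access))
		on_each_cpu_mask(&to_arm_pmu(event->pmu)->supported_cpus,
				 refresh_pmuserenr, mm, 1);
}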
[PATCH v7 4/9] drivers/perf: arm_pmu: Export the per_cpu cpu_armpmu
The arm64 PMU driver needs to retrieve the struct arm_pmu pointer for the current CPU. As the common code already maintains this with the percpu cpu_armpmu, let's make it global. Signed-off-by: Rob Herring --- drivers/perf/arm_pmu.c | 2 +- include/linux/perf/arm_pmu.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c index 2d10d84fb79c..6ac56280b9c7 100644 --- a/drivers/perf/arm_pmu.c +++ b/drivers/perf/arm_pmu.c @@ -99,7 +99,7 @@ static const struct pmu_irq_ops percpu_pmunmi_ops = { .free_pmuirq = armpmu_free_percpu_pmunmi }; -static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu); +DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu); static DEFINE_PER_CPU(int, cpu_irq); static DEFINE_PER_CPU(const struct pmu_irq_ops *, cpu_irq_ops); diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 63af052e3909..1daad3b2cce5 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -127,6 +127,8 @@ struct arm_pmu { #define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu)) +DECLARE_PER_CPU(struct arm_pmu *, cpu_armpmu); + u64 armpmu_event_update(struct perf_event *event); int armpmu_event_set_period(struct perf_event *event); -- 2.27.0
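The consumer this export enables (the undef-instruction handler added in patch 5/9) boils down to a per-CPU lookup; a small sketch of the pattern, with the obvious caveat spelled out:

/* Look up the arm_pmu that owns the current CPU. The caller must not be
 * preemptible, otherwise the CPU (and hence the PMU) can change under it.
 */
static struct arm_pmu *current_cpu_arm_pmu(void)
{
	return *this_cpu_ptr(&cpu_armpmu);
}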
[PATCH v7 2/9] arm64: pmu: Add function implementation to update event index in userpage
From: Raphael Gault In order to be able to access the counter directly for userspace, we need to provide the index of the counter using the userpage. We thus need to override the event_idx function to retrieve and convert the perf_event index to armv8 hardware index. Since the arm_pmu driver can be used by any implementation, even if not armv8, two components play a role into making sure the behaviour is correct and consistent with the PMU capabilities: * the ARMPMU_EL0_RD_CNTR flag which denotes the capability to access counter from userspace. * the event_idx call back, which is implemented and initialized by the PMU implementation: if no callback is provided, the default behaviour applies, returning 0 as index value. Signed-off-by: Raphael Gault Signed-off-by: Rob Herring --- v7: - Add define ARMV8_IDX_CYCLE_COUNTER_USER for userspace index of cycle counter --- arch/arm64/kernel/perf_event.c | 20 +++- include/linux/perf/arm_pmu.h | 2 ++ 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index 4658fcf88c2b..40cf59455ce8 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -332,7 +332,7 @@ static const struct attribute_group armv8_pmuv3_caps_attr_group = { */ #defineARMV8_IDX_CYCLE_COUNTER 0 #defineARMV8_IDX_COUNTER0 1 - +#defineARMV8_IDX_CYCLE_COUNTER_USER32 /* * We unconditionally enable ARMv8.5-PMU long event counter support @@ -871,6 +871,22 @@ static void armv8pmu_clear_event_idx(struct pmu_hw_events *cpuc, clear_bit(idx - 1, cpuc->used_mask); } +static int armv8pmu_access_event_idx(struct perf_event *event) +{ + if (!(event->hw.flags & ARMPMU_EL0_RD_CNTR)) + return 0; + + /* +* We remap the cycle counter index to 32 to +* match the offset applied to the rest of +* the counter indices. +*/ + if (event->hw.idx == ARMV8_IDX_CYCLE_COUNTER) + return ARMV8_IDX_CYCLE_COUNTER_USER; + + return event->hw.idx; +} + /* * Add an event filter to a given event. */ @@ -1098,6 +1114,8 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name, cpu_pmu->set_event_filter = armv8pmu_set_event_filter; cpu_pmu->filter_match = armv8pmu_filter_match; + cpu_pmu->pmu.event_idx = armv8pmu_access_event_idx; + cpu_pmu->name = name; cpu_pmu->map_event = map_event; cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_EVENTS] = events ? diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 505480217cf1..d29aa981d989 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -26,6 +26,8 @@ */ /* Event uses a 64bit counter */ #define ARMPMU_EVT_64BIT 1 +/* Allow access to hardware counter from userspace */ +#define ARMPMU_EL0_RD_CNTR 2 #define HW_OP_UNSUPPORTED 0x #define C(_x) PERF_COUNT_HW_CACHE_##_x -- 2.27.0
[PATCH v7 0/9] arm64 userspace counter access support
Hi all, Another version of arm64 userspace counter access support. I sent out the libperf bits separately and those are now applied (Thanks!), so this is just the arm64 bits. This originally resurrected Raphael's series[1] to enable userspace counter access on arm64. My previous versions are here[2][3][4][5][6][7]. A git branch is here[8]. Changes in v7: - Handling of dirty counter leakage and reworking of context switch and user access enabling. The .sched_task hook and undef instruction handler are now utilized. (Patch 3) - Add a userspace disable switch like x86. (Patch 5) Changes in v6: - Reworking of the handling of 64-bit counters and user access. There's a new config1 flag to request user access. This takes priority over the 64-bit flag and the user will get the maximum size the h/w supports without chaining. - The libperf evsel mmap struct is stored in its own xyarray - New tests for user 64-bit and 32-bit counters - Rebase to v5.12-rc2 Changes in v5: - Limit enabling/disabling access to CPUs associated with the PMU (supported_cpus) and with the mm_struct matching current->active_mm. The x86 method of using mm_cpumask doesn't work for arm64 as it is not updated. - Only set cap_user_rdpmc if event is on current cpu. See patch 2. - Create an mmap for every event in an evsel. This results in some changes to the libperf mmap API from the last version. - Rebase to v5.11-rc2 Changes in v4: - Dropped 'arm64: pmu: Add hook to handle pmu-related undefined instructions'. The onus is on userspace to pin itself to a homogeneous subset of CPUs and avoid any aborts on heterogeneous systems, so the hook is not needed. - Make perf_evsel__mmap() take pages rather than bytes for size - Fix building arm64 heterogeneous test. Changes in v3: - Dropped removing x86 rdpmc test until libperf tests can run via 'perf test' - Added verbose prints for tests - Split adding perf_evsel__mmap() to separate patch The following changes to the arm64 support have been made compared to Raphael's last version: The major change is support for heterogeneous systems with some restrictions. Specifically, userspace must pin itself to like CPUs, open a specific PMU by type, and use h/w specific events. The tests have been reworked to demonstrate this. Chained events are not supported. The problem with supporting chained events was there's no way to distinguish between a chained event and a native 64-bit counter. We could add some flag, but do self monitoring processes really need that? Native 64-bit counters are supported if the PMU h/w has support. As there's already an explicit ABI to request 64-bit counters, userspace can request 64-bit counters and if user access is not enabled, then it must retry with 32-bit counters. Prior versions broke the build on arm32 (surprisingly never caught by 0-day). As a result, event_mapped and event_unmapped implementations have been moved into the arm64 code. There was a bug in that pmc_width was not set in the user page. The tests now check for this. The documentation has been converted to rST. I've added sections on chained events and heterogeneous. The tests have been expanded to test the cycle counter access. 
Rob [1] https://lore.kernel.org/r/20190822144220.27860-1-raphael.ga...@arm.com/ [2] https://lore.kernel.org/r/20200707205333.624938-1-r...@kernel.org/ [3] https://lore.kernel.org/r/20200828205614.3391252-1-r...@kernel.org/ [4] https://lore.kernel.org/r/20200911215118.2887710-1-r...@kernel.org/ [5] https://lore.kernel.org/r/20201001140116.651970-1-r...@kernel.org/ [6] https://lore.kernel.org/r/20210114020605.3943992-1-r...@kernel.org/ [7] https://lore.kernel.org/r/20210311000837.3630499-1-r...@kernel.org/ [8] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git user-perf-event-v7 Raphael Gault (4): arm64: Restrict undef hook for cpufeature registers arm64: pmu: Add function implementation to update event index in userpage arm64: perf: Enable PMU counter direct access for perf event Documentation: arm64: Document PMU counters access from userspace Rob Herring (5): drivers/perf: arm_pmu: Export the per_cpu cpu_armpmu arm64: perf: Add userspace counter access disable switch libperf: Add arm64 support to perf_mmap__read_self() perf: arm64: Add test for userspace counter access on heterogeneous systems perf: arm64: Add tests for 32-bit and 64-bit counter size userspace access Documentation/arm64/perf.rst | 67 +- arch/arm64/include/asm/mmu.h | 5 + arch/arm64/kernel/cpufeature.c | 4 +- arch/arm64/kernel/perf_event.c | 254 - drivers/perf/arm_pmu.c | 2 +- include/linux/perf/arm_pmu.h | 14 +- tools/lib/perf/mmap.c | 98 tools/lib/perf/tests/test-evsel.c | 2 +- tools/perf/arch/arm64/include/arch-tests.h | 12 + tools/perf/arch/arm64/te
[PATCH v7 1/9] arm64: Restrict undef hook for cpufeature registers
From: Raphael Gault This commit modifies the mask of the mrs_hook declared in arch/arm64/kernel/cpufeature.c, which emulates only feature register access. This is necessary because the hook's mask was too large and thus matched any mrs instruction, even those not related to the emulated registers, which made the pmu emulation inefficient. Signed-off-by: Raphael Gault Signed-off-by: Rob Herring --- v7: - Split off from Raphael's original undef hook patch as this change stands on its own. --- arch/arm64/kernel/cpufeature.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 066030717a4c..aa0777690ab1 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -2888,8 +2888,8 @@ static int emulate_mrs(struct pt_regs *regs, u32 insn) } static struct undef_hook mrs_hook = { - .instr_mask = 0xfff00000, - .instr_val = 0xd5300000, + .instr_mask = 0xffff0000, + .instr_val = 0xd5380000, .pstate_mask = PSR_AA32_MODE_MASK, .pstate_val = PSR_MODE_EL0t, .fn = emulate_mrs, -- 2.27.0
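A short worked view of why the narrower mask helps, assuming the standard A64 MRS encoding (bits [31:20] = 0xd53, bit 19 = o0, bits [18:16] = op1); the helper names below are for illustration only.

#include <stdbool.h>
#include <stdint.h>

static bool hook_matches(uint32_t insn, uint32_t mask, uint32_t val)
{
	return (insn & mask) == val;
}

/* Old hook: fixes only bits [31:20], i.e. it trapped into emulate_mrs()
 * for every EL0 mrs, whatever system register it named. */
static bool old_hook_matches(uint32_t insn)
{
	return hook_matches(insn, 0xfff00000, 0xd5300000);
}

/* New hook: additionally pins o0 = 1 and op1 = 0 (op0 == 3, op1 == 0),
 * the encoding space holding the emulated ID registers, so unrelated
 * reads such as PMU counter accesses no longer go through this hook. */
static bool new_hook_matches(uint32_t insn)
{
	return hook_matches(insn, 0xffff0000, 0xd5380000);
}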
Re: [PATCH] hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()
On Thu, Apr 15, 2021 at 9:47 AM Thomas Gleixner wrote: > > On Wed, Apr 14 2021 at 11:49, Lorenzo Colitti wrote: > > On Wed, Apr 14, 2021 at 2:14 AM Greg KH wrote: > >> To give context, the commit is now 46eb1701c046 ("hrtimer: Update > >> softirq_expires_next correctly after __hrtimer_get_next_event()") and is > >> attached below. > >> > >> The f_ncm.c driver is doing a lot of calls to hrtimer_start() with mode > >> HRTIMER_MODE_REL_SOFT for I think every packet it gets. If that should > >> not be happening, we can easily fix it but I guess no one has actually > >> had fast USB devices that noticed this until now :) > > > > AIUI the purpose of this timer is to support packet aggregation. USB > > transfers are relatively expensive/high latency. 6 Gbps is 500k > > 1500-byte packets per second, or one every 2us. So f_ncm buffers as > > many packets as will fit into 16k (usually, 10 1500-byte packets), and > > only initiates a USB transfer when those packets have arrived. That > > ends up doing only one transfer every 20us. It sets a 300us timer to > > ensure that if the 10 packets haven't arrived, it still sends out > > whatever it has when the timer fires. The timer is set/updated on > > every packet buffered by ncm. > > > > Is this somehow much more expensive in 5.10.24 than it was before? > > Even if this driver is somehow "holding it wrong", might there not be > > other workloads that have a similar problem? What about regressions on > > those workloads? > > Let's put the question of whether this hrtimer usage is sensible or not > aside for now. > > I stared at the change for a while and did some experiments to recreate > the problem, but that didn't get me anywhere. > > Could you please do the following? > > Enable tracing and enable the following tracepoints: > > timers/hrtimer_cancel > timers/hrtimer_start > timers/hrtimer_expire_entry > irq/softirq_raise > irq/softirq_enter > irq/softirq_exit > > and function tracing filtered on ncm_wrap_ntb() and > package_for_tx() only (to reduce the noise). > > Run the test on a kernels with and without that commit and collect trace > data for both. > > That should give me a pretty clear picture what's going on. Lorenzo is trying to get the traces you asked for, or rather he’s trying to get confirmation he can post them. Our initial observation of these results seems to suggest that updating the timer (hrtimer_start, which seems to also call hrtimer_cancel) takes twice as long as it used to. My gut feeling is that softirq_activated is usually false, and the old code in such a case calls just __hrtimer_get_next_event(, HRTIMER_ACTIVE_ALL). While the new code will first call __hrtimer_get_next_event(, HRTIMER_ACTIVE_SOFT) and then __hrtimer_get_next_event(, HRTIMER_ACTIVE_HARD) Perhaps __hrtimer_get_next_event() should return both soft and hard event times in one function call? Or perhaps hrtimer_start should not call hrtimer_cancel? We've also been looking at the f_ncm driver itself, and trying to reduce the number of hrtimer_cancel/start calls. It's pretty easy to reduce this by a factor of 10x (we’re not yet 100% certain this is race free), which does drastically improve performance. However, it still only takes the regression from 60% to 20%. - Maciej
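For completeness, here is a minimal sketch of the f_ncm-side mitigation described above (arming the flush timer only for the first datagram of an aggregated NTB instead of on every packet). Field and constant names follow drivers/usb/gadget/function/f_ncm.c, but this is an untested illustration of the idea, not the actual change, and the helper name is hypothetical.

/* Untested illustration: later datagrams either fill the NTB (which kicks
 * off a transfer anyway) or are already covered by the armed timeout, so
 * only the first one needs to start the hrtimer.
 */
static void ncm_queue_dgram(struct f_ncm *ncm, bool first_dgram_in_ntb)
{
	if (first_dgram_in_ntb)
		hrtimer_start(&ncm->task_timer, TX_TIMEOUT_NSECS,
			      HRTIMER_MODE_REL_SOFT);

	/* ...append the datagram to the in-progress NTB as before... */
}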