Re: [PATCH v14 09/13] KVM: x86: Report CET MSRs as to-be-saved if CET is supported

2021-01-29 Thread Yang Weijiang
On Thu, Jan 28, 2021 at 06:46:37PM +0100, Paolo Bonzini wrote:
> On 06/11/20 02:16, Yang Weijiang wrote:
> > Report all CET MSRs, including the synthetic GUEST_SSP MSR, as
> > to-be-saved, e.g. for migration, if CET is supported by KVM.
> > 
> > Co-developed-by: Sean Christopherson 
> > Signed-off-by: Sean Christopherson 
> > Signed-off-by: Yang Weijiang 
> > ---
> >   arch/x86/kvm/x86.c | 9 +
> >   1 file changed, 9 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 751b62e871e5..d573cadf5baf 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1248,6 +1248,8 @@ static const u32 msrs_to_save_all[] = {
> > MSR_ARCH_PERFMON_EVENTSEL0 + 16, MSR_ARCH_PERFMON_EVENTSEL0 + 17,
> > MSR_IA32_XSS,
> > +   MSR_IA32_U_CET, MSR_IA32_S_CET, MSR_IA32_INT_SSP_TAB, MSR_KVM_GUEST_SSP,
> > +   MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP, MSR_IA32_PL3_SSP,
> >   };
> >   static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_all)];
> > @@ -5761,6 +5763,13 @@ static void kvm_init_msr_list(void)
> > if (!supported_xss)
> > continue;
> > break;
> > +   case MSR_IA32_U_CET:
> > +   case MSR_IA32_S_CET:
> > +   case MSR_IA32_INT_SSP_TAB:
> > +   case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> > +   if (!kvm_cet_supported())
> > +   continue;
> > +   break;
> > default:
> > break;
> > }
> > 
> 
> Missing "case MSR_KVM_GUEST_SSP".
>
OK, will fix it in next version.
> Paolo
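
The omission Paolo points out can be shown with a small userspace model of the filtering step. This is only a sketch, not the KVM code: `msr_is_saved()`, `filter_msrs()` and the `cet_supported` parameter (standing in for kvm_cet_supported()) are hypothetical names, the architectural MSR indices are given as believed to match the Intel SDM, and the value used for MSR_KVM_GUEST_SSP is a placeholder. The point is that the synthetic MSR needs its own case label, otherwise it falls through to `default` and is reported even when CET is unsupported.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural indices, as believed to match the Intel SDM. */
#define MSR_IA32_XSS          0xda0u
#define MSR_IA32_U_CET        0x6a0u
#define MSR_IA32_S_CET        0x6a2u
#define MSR_IA32_PL0_SSP      0x6a4u
#define MSR_IA32_PL1_SSP      0x6a5u
#define MSR_IA32_PL2_SSP      0x6a6u
#define MSR_IA32_PL3_SSP      0x6a7u
#define MSR_IA32_INT_SSP_TAB  0x6a8u
/* Placeholder value for the synthetic MSR: an assumption for this sketch. */
#define MSR_KVM_GUEST_SSP     0x4b564dffu

/* Returns true if the MSR should be reported as to-be-saved. */
static bool msr_is_saved(uint32_t msr, bool cet_supported)
{
    switch (msr) {
    case MSR_IA32_U_CET:
    case MSR_IA32_S_CET:
    case MSR_IA32_INT_SSP_TAB:
    case MSR_KVM_GUEST_SSP:   /* the case label missing from v14 */
    case MSR_IA32_PL0_SSP:
    case MSR_IA32_PL1_SSP:
    case MSR_IA32_PL2_SSP:
    case MSR_IA32_PL3_SSP:
        return cet_supported;
    default:
        return true;
    }
}

/* Copies the MSRs that pass the filter into out[] and returns the count. */
size_t filter_msrs(const uint32_t *all, size_t n, bool cet_supported,
                   uint32_t *out)
{
    size_t i, kept = 0;

    for (i = 0; i < n; i++)
        if (msr_is_saved(all[i], cet_supported))
            out[kept++] = all[i];
    return kept;
}
```

With CET unsupported, a list containing MSR_IA32_XSS plus CET MSRs is reduced to MSR_IA32_XSS alone; with CET supported, every entry is kept.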


[PATCH v2] perf/core: Wake up tasks for failing pinned events

2021-01-29 Thread Namhyung Kim
As of now, when a pinned event fails to be scheduled, it is put into an
error state and never scheduled again, but nobody gets any notice of it.
That means we won't get any samples for the event.

It's possible we can detect it by reading the file, but usually we
only monitor it via mmap-ed ring buffers.  Let's poke the tasks
waiting for poll(2) so that they can respond to the event.

Signed-off-by: Namhyung Kim 
---
 include/linux/perf_event.h |  1 +
 kernel/events/core.c   | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 9a38f579bc76..0b3b3e97243b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -733,6 +733,7 @@ struct perf_event {
int pending_wakeup;
int pending_kill;
int pending_disable;
+   int pending_pin_error;
struct irq_work pending;
 
atomic_t event_limit;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 55d18791a72d..f8e9db30a573 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3675,6 +3675,8 @@ static int merge_sched_in(struct perf_event *event, void 
*data)
if (event->attr.pinned) {
perf_cgroup_event_disable(event, ctx);
perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
+   event->pending_pin_error = 1;
+   irq_work_queue(&event->pending);
}
 
*can_add_hw = 0;
@@ -5288,6 +5290,9 @@ static __poll_t perf_poll(struct file *file, poll_table 
*wait)
if (is_event_hup(event))
return events;
 
+   if (event->attr.pinned && event->state == PERF_EVENT_STATE_ERROR)
+   return EPOLLERR;
+
/*
 * Pin the event->rb by taking event->mmap_mutex; otherwise
 * perf_event_set_output() can swizzle our rb and make us miss wakeups.
@@ -6333,6 +6338,11 @@ static void perf_pending_event(struct irq_work *entry)
perf_event_wakeup(event);
}
 
+   if (event->pending_pin_error) {
+   event->pending_pin_error = 0;
+   wake_up_all(&event->waitq);
+   }
+
if (rctx >= 0)
perf_swevent_put_recursion_context(rctx);
 }
-- 
2.30.0.365.g02bc693789-goog
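
From the userspace side, a monitoring loop could react to the new wakeup roughly as follows. This is a sketch under assumptions, not part of the patch: `enum ring_action` and `classify_revents()` are hypothetical names, and it assumes the EPOLLERR returned by perf_poll() is seen as POLLERR by a caller of poll(2).

```c
#include <poll.h>

enum ring_action {
    RING_WAIT,    /* nothing to do yet, keep polling */
    RING_DRAIN,   /* samples are available in the mmap-ed ring buffer */
    RING_CLOSED,  /* the event hung up */
    RING_FAILED   /* the event is in an error state, no samples will come */
};

/*
 * Decide what to do with the revents reported for a perf event fd.
 * The error state is checked first so that a failed pinned event is
 * never mistaken for an idle one.
 */
enum ring_action classify_revents(short revents)
{
    if (revents & POLLERR)
        return RING_FAILED;
    if (revents & POLLHUP)
        return RING_CLOSED;
    if (revents & POLLIN)
        return RING_DRAIN;
    return RING_WAIT;
}
```

A caller would pass `pfd.revents` after poll() returns and, on RING_FAILED, stop waiting on that fd instead of blocking forever.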



Re: [PATCH] drm/tilcdc: send vblank event when disabling crtc

2021-01-29 Thread Tomi Valkeinen
Dropped the @ti.com addresses and added the new ones.

 Tomi

On 29/01/2021 07:58, quanyang.w...@windriver.com wrote:
> From: Quanyang Wang 
> 
> When running xrandr to change the resolution on a Beaglebone Black board,
> it prints the following errors:
> 
> root@beaglebone:~# xrandr -display :0 --output HDMI-1 --mode 720x400
> [drm:drm_crtc_commit_wait] *ERROR* flip_done timed out
> [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:32:tilcdc crtc] 
> commit wait timed out
> [drm:drm_crtc_commit_wait] *ERROR* flip_done timed out
> [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CONNECTOR:34:HDMI-A-1] 
> commit wait timed out
> [drm:drm_crtc_commit_wait] *ERROR* flip_done timed out
> [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:31:plane-0] 
> commit wait timed out
> tilcdc 4830e000.lcdc: already pending page flip!
> 
> This is because of the following sequence of operations:
> 
> drm_atomic_connector_commit_dpms(mode is DRM_MODE_DPMS_OFF):
> ...
> drm_atomic_helper_setup_commit <- init_completion(commit_A->flip_done)
> drm_atomic_helper_commit_tail
> tilcdc_crtc_atomic_disable
> tilcdc_plane_atomic_update <- drm_crtc_send_vblank_event in tilcdc_crtc_irq
>                               is skipped since tilcdc_crtc->enabled is 0
> tilcdc_crtc_atomic_flush   <- drm_crtc_send_vblank_event is skipped since
>                               crtc->state->event is set to be NULL in
>                               tilcdc_plane_atomic_update
> drm_mode_setcrtc:
> ...
> drm_atomic_helper_setup_commit <- init_completion(commit_B->flip_done)
> drm_atomic_helper_wait_for_dependencies
> drm_crtc_commit_wait   <- wait for commit_A->flip_done completing
> 
> As shown above, the steps that could complete commit_A->flip_done
> are all skipped, so commit_A->flip_done is never completed. This
> results in a timeout error when drm_crtc_commit_wait checks
> commit_A->flip_done.
> So add drm_crtc_send_vblank_event in tilcdc_crtc_atomic_disable to
> complete commit_A->flip_done.
> 
> Fixes: cb345decb4d2 ("drm/tilcdc: Use standard drm_atomic_helper_commit")
> Signed-off-by: Quanyang Wang 
> ---
>  drivers/gpu/drm/tilcdc/tilcdc_crtc.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/gpu/drm/tilcdc/tilcdc_crtc.c 
> b/drivers/gpu/drm/tilcdc/tilcdc_crtc.c
> index 30213708fc99..d99afd19ca08 100644
> --- a/drivers/gpu/drm/tilcdc/tilcdc_crtc.c
> +++ b/drivers/gpu/drm/tilcdc/tilcdc_crtc.c
> @@ -515,6 +515,15 @@ static void tilcdc_crtc_off(struct drm_crtc *crtc, bool 
> shutdown)
>  
>   drm_crtc_vblank_off(crtc);
>  
> + spin_lock_irq(&crtc->dev->event_lock);
> +
> + if (crtc->state->event) {
> + drm_crtc_send_vblank_event(crtc, crtc->state->event);
> + crtc->state->event = NULL;
> + }
> +
> + spin_unlock_irq(&crtc->dev->event_lock);
> +
>   tilcdc_crtc_disable_irqs(dev);
>  
>   pm_runtime_put_sync(dev->dev);
> 
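
The effect of the fix can be modelled in a few lines of plain C. This is a simplified sketch, not driver code: `struct crtc_model`, `struct pending_event`, `send_vblank_event()` and `crtc_disable()` are hypothetical stand-ins, locking is omitted, and it assumes that delivering the pending event is what completes flip_done for the commit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending vblank event. */
struct pending_event {
    bool delivered;
};

struct crtc_model {
    struct pending_event *event;  /* models crtc->state->event */
    bool flip_done;               /* models the commit's flip_done */
};

/* Models drm_crtc_send_vblank_event(): deliver and complete flip_done. */
static void send_vblank_event(struct crtc_model *crtc,
                              struct pending_event *ev)
{
    ev->delivered = true;
    crtc->flip_done = true;
}

/*
 * Models the disable path after the patch: if an event is still
 * pending, send it and clear the pointer so it is not sent twice.
 * The real code does this under dev->event_lock.
 */
void crtc_disable(struct crtc_model *crtc)
{
    if (crtc->event) {
        send_vblank_event(crtc, crtc->event);
        crtc->event = NULL;
    }
}
```

Without the `if` block, `flip_done` stays false after a disable, which is the state the next commit then waits on until it times out.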


Re: [PATCH 2/3] arm64/ptrace: introduce NT_ARM_PRSTATUS to get a full set of registers

2021-01-29 Thread Andrei Vagin
On Wed, Jan 27, 2021 at 02:53:07PM +, Dave Martin wrote:
> On Tue, Jan 19, 2021 at 02:06:36PM -0800, Andrei Vagin wrote:
> > This is an alternative to NT_PRSTATUS that clobbers ip/r12 on AArch32,
> > x7 on AArch64 when a tracee is stopped in syscall entry or syscall exit
> > traps.
> > 
> > Signed-off-by: Andrei Vagin 
> 
> This approach looks like it works, though I still think adding an option
> for this under PTRACE_SETOPTIONS would be less intrusive.

Dave, thank you for the feedback. I will prepare a patch with an option
and then we will see what looks better.

> 
> Adding a shadow regset like this also looks like it would cause the gp
> regs to be pointlessly dumped twice in a core dump.  Avoiding that
> might require hacks in the core code...
> 
> 
> > ---
> >  arch/arm64/kernel/ptrace.c | 39 ++
> >  include/uapi/linux/elf.h   |  1 +
> >  2 files changed, 40 insertions(+)
> > 
> > diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> > index 1863f080cb07..b8e4c2ddf636 100644
> > --- a/arch/arm64/kernel/ptrace.c
> > +++ b/arch/arm64/kernel/ptrace.c
> > @@ -591,6 +591,15 @@ static int gpr_get(struct task_struct *target,
> > return ret;
> >  }
> >  
> > +static int gpr_get_full(struct task_struct *target,
> > +  const struct user_regset *regset,
> > +  struct membuf to)
> > +{
> > +   struct user_pt_regs *uregs = &task_pt_regs(target)->user_regs;
> > +
> > +   return membuf_write(&to, uregs, sizeof(*uregs));
> > +}
> > +
> >  static int gpr_set(struct task_struct *target, const struct user_regset 
> > *regset,
> >unsigned int pos, unsigned int count,
> >const void *kbuf, const void __user *ubuf)
> > @@ -1088,6 +1097,7 @@ static int tagged_addr_ctrl_set(struct task_struct 
> > *target, const struct
> >  
> >  enum aarch64_regset {
> > REGSET_GPR,
> > +   REGSET_GPR_FULL,
> 
> If we go with this approach, "REGSET_GPR_RAW" might be a preferable
> name.  Both regs represent all the regs ("full"), but REGSET_GPR is
> mangled by the kernel.

I agree that REGSET_GPR_RAW looks better in this case.

> 
> > REGSET_FPR,
> > REGSET_TLS,
> >  #ifdef CONFIG_HAVE_HW_BREAKPOINT
> > @@ -1119,6 +1129,14 @@ static const struct user_regset aarch64_regsets[] = {
> > .regset_get = gpr_get,
> > .set = gpr_set
> > },
> > +   [REGSET_GPR_FULL] = {
> > +   .core_note_type = NT_ARM_PRSTATUS,

...

> > diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
> > index 30f68b42eeb5..a2086d19263a 100644
> > --- a/include/uapi/linux/elf.h
> > +++ b/include/uapi/linux/elf.h
> > @@ -426,6 +426,7 @@ typedef struct elf64_shdr {
> >  #define NT_ARM_PACA_KEYS   0x407   /* ARM pointer authentication address 
> > keys */
> >  #define NT_ARM_PACG_KEYS   0x408   /* ARM pointer authentication generic 
> > key */
> >  #define NT_ARM_TAGGED_ADDR_CTRL0x409   /* arm64 tagged address control 
> > (prctl()) */
> 
> What happened to 0x40a..0x40f?

shame on me :)

> 
> [...]
> 
> Cheers
> ---Dave


Re: [PATCH v12 6/8] drm/mediatek: enable dither function

2021-01-29 Thread Hsin-Yi Wang
On Fri, Jan 29, 2021 at 3:42 PM Yongqiang Niu
 wrote:
>
> On Fri, 2021-01-29 at 14:46 +0800, Hsin-Yi Wang wrote:
> > On Fri, Jan 29, 2021 at 2:30 PM Yongqiang Niu
> >  wrote:
> > >
> > > On Fri, 2021-01-29 at 14:24 +0800, Hsin-Yi Wang wrote:
> > > > On Fri, Jan 29, 2021 at 9:33 AM CK Hu  wrote:
> > > > >
> > > > > Hi, Hsin-Yi:
> > > > >
> > > > > On Thu, 2021-01-28 at 19:23 +0800, Hsin-Yi Wang wrote:
> > > > > > From: Yongqiang Niu 
> > > > > >
> > > > > > > For 5 or 6 bpc panels, we need to enable the dither function
> > > > > > > to improve the display quality.
> > > > > >
> > > > > > Signed-off-by: Yongqiang Niu 
> > > > > > Signed-off-by: Hsin-Yi Wang 
> > > > > > ---
> > > > > >  drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c | 15 +--
> > > > > >  1 file changed, 13 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c 
> > > > > > b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
> > > > > > index ac2cb25620357..6c8f246380a74 100644
> > > > > > --- a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
> > > > > > +++ b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
> > > > > > @@ -53,6 +53,7 @@
> > > > > >  #define DITHER_ENBIT(0)
> > > > > >  #define DISP_DITHER_CFG  0x0020
> > > > > >  #define DITHER_RELAY_MODEBIT(0)
> > > > > > +#define DITHER_ENGINE_EN BIT(1)
> > > > > >  #define DISP_DITHER_SIZE 0x0030
> > > > > >
> > > > > >  #define LUT_10BIT_MASK   0x03ff
> > > > > > @@ -314,9 +315,19 @@ static void mtk_dither_config(struct device 
> > > > > > *dev, unsigned int w,
> > > > > > unsigned int bpc, struct cmdq_pkt 
> > > > > > *cmdq_pkt)
> > > > > >  {
> > > > > >   struct mtk_ddp_comp_dev *priv = dev_get_drvdata(dev);
> > > > > > + bool enable = (bpc == 5 || bpc == 6);
> > > > >
> > > > > > I strongly believe that the dither function in dither is identical to
> > > > > > the one in gamma and od, and in mtk_dither_set_common(), 'bpc >=
> > > > > > MTK_MIN_BPC' is valid, so I believe we need not limit bpc to 5 or 6.
> > > > > > But we should consider the case where bpc is invalid in
> > > > > > mtk_dither_set_common(). Gamma and od handle the invalid case in
> > > > > > different ways. For gamma, dither defaults to relay mode, so an
> > > > > > invalid bpc does nothing in mtk_dither_set_common() and results in
> > > > > > relay mode. For od, it is set to relay mode first, then an invalid
> > > > > > bpc does nothing in mtk_dither_set_common() and results in relay
> > > > > > mode. I would like dither, gamma and od to process an invalid bpc in
> > > > > > the same way. One solution is to set relay mode in
> > > > > > mtk_dither_set_common() for an invalid bpc.
> > > > >
> > > > > Regards,
> > > > > CK
> > > > >
> > > >
> > > > I modified mtk_dither_config() as follows:
> > > >
> > > >
> > > > diff --git a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
> > > > b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
> > > > index ac2cb25620357..5b7fcedb9f9a8 100644
> > > > --- a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
> > > > +++ b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
> > > > @@ -53,6 +53,7 @@
> > > >  #define DITHER_EN  BIT(0)
> > > >  #define DISP_DITHER_CFG    0x0020
> > > >  #define DITHER_RELAY_MODE  BIT(0)
> > > > +#define DITHER_ENGINE_EN   BIT(1)
> > > >  #define DISP_DITHER_SIZE   0x0030
> > > >
> > > >  #define LUT_10BIT_MASK 0x03ff
> > > > @@ -166,6 +167,8 @@ void mtk_dither_set_common(void __iomem *regs,
> > > > struct cmdq_client_reg *cmdq_reg,
> > > >   DITHER_ADD_LSHIFT_G(MTK_MAX_BPC - bpc),
> > > >   cmdq_reg, regs, DISP_DITHER_16);
> > > > mtk_ddp_write(cmdq_pkt, dither_en, cmdq_reg, regs, cfg);
> > > > +   } else {
> > > > +   mtk_ddp_write(cmdq_pkt, DITHER_RELAY_MODE, cmdq_reg, 
> > > > regs, cfg);
> > > > }
> > > >  }
> > > >
> > > > @@ -315,8 +318,12 @@ static void mtk_dither_config(struct device *dev,
> > > > unsigned int w,
> > > >  {
> > > > struct mtk_ddp_comp_dev *priv = dev_get_drvdata(dev);
> > > >
> > > > -   mtk_ddp_write(cmdq_pkt, h << 16 | w, &priv->cmdq_reg,
> > > > priv->regs, DISP_DITHER_SIZE);
> > > > -   mtk_ddp_write(cmdq_pkt, DITHER_RELAY_MODE, &priv->cmdq_reg,
> > > > priv->regs, DISP_DITHER_CFG);
> > > > +   mtk_ddp_write(cmdq_pkt, h << 16 | w, &priv->cmdq_reg, 
> > > > priv->regs,
> > > > + DISP_DITHER_SIZE);
> > > > +   mtk_ddp_write(cmdq_pkt, DITHER_RELAY_MODE, &priv->cmdq_reg, 
> > > > priv->regs,
> > > > + DISP_DITHER_CFG);
> > > > +   mtk_dither_set_common(priv->regs, &priv->cmdq_reg, bpc, 
> > > > DISP_DITHER_CFG,
> > > > +  
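
CK's suggestion in this thread, to fall back to relay mode when bpc is invalid, can be sketched in plain C. This is an illustration only: `dither_cfg_for_bpc()` is a hypothetical helper that returns the value that would be written to the CFG register instead of calling mtk_ddp_write(), and the value given to MTK_MIN_BPC is an assumption, since the thread only names the constant.

```c
#include <stdint.h>

#define BIT(n)             (1u << (n))
#define DITHER_RELAY_MODE  BIT(0)
#define DITHER_ENGINE_EN   BIT(1)
/* Assumed lower bound; the thread names MTK_MIN_BPC but not its value. */
#define MTK_MIN_BPC        3u

/*
 * Returns the CFG value for a given bits-per-component setting:
 * enable the dither engine for a valid bpc, otherwise pass pixels
 * through unchanged (relay mode).
 */
uint32_t dither_cfg_for_bpc(unsigned int bpc)
{
    if (bpc >= MTK_MIN_BPC)
        return DITHER_ENGINE_EN;
    return DITHER_RELAY_MODE;
}
```

Handling the invalid case inside the common helper means dither, gamma and od all end up in relay mode for an invalid bpc without each caller writing the register first.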

[PATCH] power: supply: Simplify bool conversion

2021-01-29 Thread Yang Li
Fix the following coccicheck warning:
./drivers/power/supply/cpcap-charger.c:416:31-36: WARNING: conversion to
bool not needed here

Reported-by: Abaci Robot 
Signed-off-by: Yang Li 
---
 drivers/power/supply/cpcap-charger.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/power/supply/cpcap-charger.c 
b/drivers/power/supply/cpcap-charger.c
index c0d452e..c70a761 100644
--- a/drivers/power/supply/cpcap-charger.c
+++ b/drivers/power/supply/cpcap-charger.c
@@ -413,7 +413,7 @@ static bool cpcap_charger_vbus_valid(struct 
cpcap_charger_ddata *ddata)
 
error = iio_read_channel_processed(channel, &value);
if (error >= 0)
-   return value > 3900 ? true : false;
+   return value > 3900;
 
dev_err(ddata->dev, "error reading VBUS: %i\n", error);
 
-- 
1.8.3.1
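
Why the conversion is redundant can be checked outside the kernel. In the sketch below, `vbus_valid()` and `vbus_valid_ternary()` are hypothetical userspace copies of the two forms of the return statement; only the threshold 3900 comes from the patch. A relational expression in C already evaluates to 0 or 1, so returning it from a function with a bool return type gives the same result as the explicit ternary.

```c
#include <stdbool.h>

/* Threshold taken from the patch. */
#define VBUS_VALID_THRESHOLD 3900

/* Form before the patch: explicit conversion to bool. */
bool vbus_valid_ternary(int value)
{
    return value > VBUS_VALID_THRESHOLD ? true : false;
}

/* Form after the patch: the comparison result is returned directly. */
bool vbus_valid(int value)
{
    return value > VBUS_VALID_THRESHOLD;
}
```

Both functions agree for every input, including the boundary value 3900, which is not considered valid by either form.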



Re: [PATCH] PCI: endpoint: Select configfs dependency

2021-01-29 Thread Kishon Vijay Abraham I
Hi Arnd, Lorenzo,

On 25/01/21 5:04 pm, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> The newly added pci-epf-ntb driver uses configfs, which
> causes a link failure when that is disabled at compile-time:
> 
> arm-linux-gnueabi-ld: drivers/pci/endpoint/functions/pci-epf-ntb.o: in 
> function `epf_ntb_add_cfs':
> pci-epf-ntb.c:(.text+0x954): undefined reference to 
> `config_group_init_type_name'
> 
> Add a 'select' statement to Kconfig to ensure it's always there,
> which is the common way to enable it for other configfs users.
> 
> Fixes: 7dc64244f9e9 ("PCI: endpoint: Add EP function driver to provide NTB 
> functionality")
> Signed-off-by: Arnd Bergmann 

Since I'm sending a new revision of NTB driver, I'll squash this patch
with the driver patch and add Arnd's sign off.

Thank You,
Kishon

> ---
>  drivers/pci/endpoint/functions/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/pci/endpoint/functions/Kconfig 
> b/drivers/pci/endpoint/functions/Kconfig
> index 24bfb2af65a1..5d35fcd613ef 100644
> --- a/drivers/pci/endpoint/functions/Kconfig
> +++ b/drivers/pci/endpoint/functions/Kconfig
> @@ -16,6 +16,7 @@ config PCI_EPF_TEST
>  config PCI_EPF_NTB
>   tristate "PCI Endpoint NTB driver"
>   depends on PCI_ENDPOINT
> + select CONFIGFS_FS
>   help
> Select this configuration option to enable the NTB driver
> for PCI Endpoint. NTB driver implements NTB controller
> 


Re: [PATCH 04/13] module: use RCU to synchronize find_module

2021-01-29 Thread Petr Mladek
On Thu 2021-01-28 19:14:12, Christoph Hellwig wrote:
> Allow for a RCU-sched critical section around find_module, following
> the lower level find_module_all helper, and switch the two callers
> outside of module.c to use such a RCU-sched critical section instead
> of module_mutex.
> 
> Signed-off-by: Christoph Hellwig 

It looks good and safe.

Reviewed-by: Petr Mladek 

Best Regards,
Petr


Re: [RFC PATCH v3 00/13] virtio/vsock: introduce SOCK_SEQPACKET support

2021-01-29 Thread Stefano Garzarella

On Fri, Jan 29, 2021 at 09:41:50AM +0300, Arseny Krasnov wrote:


On 28.01.2021 20:19, Stefano Garzarella wrote:

Hi Arseny,
I reviewed a part, tomorrow I hope to finish the other patches.

Just a couple of comments in the TODOs below.

On Mon, Jan 25, 2021 at 02:09:00PM +0300, Arseny Krasnov wrote:

This patchset implements support of SOCK_SEQPACKET for virtio
transport.
As SOCK_SEQPACKET guarantees to preserve record boundaries, a new packet
operation was added: it marks the start of a record (with the record
length in the header), and such a packet doesn't carry any data.  To send
a record, a packet with the start marker is sent first, then all data is
sent as usual 'RW' packets. On the receiver's side, the length of the
record is known from the packet with the start record marker. As packets
of one socket are not reordered on either the vsock or the vhost transport
layer, such a marker allows the original record to be restored on the
receiver's side. If the user's buffer is smaller than the record length,
then all out-of-size data is dropped.
As with stream sockets, the maximum length of a datagram is not limited,
because the same credit logic is used. The difference from a stream socket
is that the user is not woken up until the whole record is received or an
error occurs. The implementation also supports the 'MSG_EOR' and
'MSG_TRUNC' flags.
Tests are also implemented.

Arseny Krasnov (13):
 af_vsock: prepare for SOCK_SEQPACKET support
 af_vsock: prepare 'vsock_connectible_recvmsg()'
 af_vsock: implement SEQPACKET rx loop
 af_vsock: implement send logic for SOCK_SEQPACKET
 af_vsock: rest of SEQPACKET support
 af_vsock: update comments for stream sockets
 virtio/vsock: dequeue callback for SOCK_SEQPACKET
 virtio/vsock: fetch length for SEQPACKET record
 virtio/vsock: add SEQPACKET receive logic
 virtio/vsock: rest of SOCK_SEQPACKET support
 virtio/vsock: setup SEQPACKET ops for transport
 vhost/vsock: setup SEQPACKET ops for transport
 vsock_test: add SOCK_SEQPACKET tests

drivers/vhost/vsock.c   |   7 +-
include/linux/virtio_vsock.h|  12 +
include/net/af_vsock.h  |   6 +
include/uapi/linux/virtio_vsock.h   |   9 +
net/vmw_vsock/af_vsock.c| 543 --
net/vmw_vsock/virtio_transport.c|   4 +
net/vmw_vsock/virtio_transport_common.c | 295 ++--
tools/testing/vsock/util.c  |  32 +-
tools/testing/vsock/util.h  |   3 +
tools/testing/vsock/vsock_test.c| 126 +
10 files changed, 862 insertions(+), 175 deletions(-)

TODO:
- Support for record integrity control. As transport could drop some
  packets, something like "record-id" and record end marker need to
  be implemented. Idea is that SEQ_BEGIN packet carries both record
  length and record id, end marker (let it be SEQ_END) carries only
  record id. To be sure that no packet was lost, receiver checks
  length of data between SEQ_BEGIN and SEQ_END (it must be same as
  value in SEQ_BEGIN) and record ids of SEQ_BEGIN and SEQ_END (this
  means that both markers were not dropped). I think that easiest way
  to implement record id for SEQ_BEGIN is to reuse another field of
  packet header (SEQ_BEGIN already uses 'flags' as record length). For
  SEQ_END record id could be stored in 'flags'.

I don't really like the idea of reusing the 'flags' field for this
purpose.


Another way to implement it is to move metadata of both SEQ_END
  and SEQ_BEGIN to payload. But this approach has a problem, because
  if we move something to payload, such payload is accounted by
  credit logic, which fragments payload, while payload with record
  length and id couldn't be fragmented. One way to overcome it is to
  ignore credit update for SEQ_BEGIN/SEQ_END packet. Another solution
  is to update 'stream_has_space()' function: current implementation
  returns non-zero when at least 1 byte is allowed to use, but updated
  version will have extra argument, which is needed length. For 'RW'
  packet this argument is 1, for SEQ_BEGIN it is sizeof(record len +
  record id) and for SEQ_END it is sizeof(record id).

Is the payload accounted by credit logic also if hdr.op is not
VIRTIO_VSOCK_OP_RW?


Yes, on send any packet with payload could be fragmented if
there is not enough space at receiver. On receive 'fwd_cnt' and
'buf_alloc' are updated with header of every packet. Of course,
to every such case I've described I can add check for 'RW'
packet, to exclude payload from credit accounting, but this is
bunch of dumb checks.



I think that we can define a specific header to put after the
virtio_vsock_hdr when hdr.op is SEQ_BEGIN or SEQ_END, and in this header
we can store the id and the length of the message.


I think it is better than use payload and touch credit logic



Cool, so let's try this option, hoping there aren't a lot of issues.
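
The option agreed on above, a dedicated header after virtio_vsock_hdr for SEQ_BEGIN and SEQ_END, could look roughly like the sketch below. Everything here is hypothetical: `struct seq_hdr`, `struct seq_rx_state` and the `seq_*()` helpers are illustration names, host-endian uint32_t fields are used where a real virtio header would use little-endian types, and the receive path is reduced to byte counting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical header placed after virtio_vsock_hdr for SEQ_BEGIN/SEQ_END. */
struct seq_hdr {
    uint32_t msg_id;   /* record id, present in both markers */
    uint32_t msg_len;  /* record length, meaningful in SEQ_BEGIN */
};

/* Receiver-side bookkeeping for the record currently being assembled. */
struct seq_rx_state {
    struct seq_hdr begin;
    uint32_t received;
    bool in_record;
};

/* SEQ_BEGIN: remember id and length, start counting payload bytes. */
void seq_begin(struct seq_rx_state *st, const struct seq_hdr *hdr)
{
    st->begin = *hdr;
    st->received = 0;
    st->in_record = true;
}

/* 'RW' packet: account for len payload bytes of the current record. */
void seq_data(struct seq_rx_state *st, uint32_t len)
{
    if (st->in_record)
        st->received += len;
}

/*
 * SEQ_END: the record is intact only if a SEQ_BEGIN was seen, the ids
 * of both markers match and the byte count equals the announced length.
 */
bool seq_end(struct seq_rx_state *st, const struct seq_hdr *hdr)
{
    bool ok = st->in_record &&
              hdr->msg_id == st->begin.msg_id &&
              st->received == st->begin.msg_len;

    st->in_record = false;
    return ok;
}
```

Because the id and length travel in the header and not in the payload, this check does not interact with the credit accounting discussed earlier in the thread.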

Another item for TODO could be to add the SOCK_SEQPACKET support also 
for vsock_loopback. Should be simple since it also uses 
virtio_transport_common APIs and it can be useful fo

Re: [PATCH v4 2/8] drm/mediatek: add component POSTMASK

2021-01-29 Thread CK Hu
Hi, Hsin-Yi:

On Fri, 2021-01-29 at 15:34 +0800, Hsin-Yi Wang wrote:
> From: Yongqiang Niu 
> 
> This patch adds the POSTMASK component.
> 
> Signed-off-by: Yongqiang Niu 
> Signed-off-by: Hsin-Yi Wang 
> ---
>  drivers/gpu/drm/mediatek/Makefile|   1 +
>  drivers/gpu/drm/mediatek/mtk_disp_drv.h  |   8 +
>  drivers/gpu/drm/mediatek/mtk_disp_postmask.c | 161 +++
>  drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c  |  11 ++
>  drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h  |   1 +
>  drivers/gpu/drm/mediatek/mtk_drm_drv.c   |   4 +-
>  drivers/gpu/drm/mediatek/mtk_drm_drv.h   |   1 +
>  7 files changed, 186 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/mediatek/mtk_disp_postmask.c
> 
> diff --git a/drivers/gpu/drm/mediatek/Makefile 
> b/drivers/gpu/drm/mediatek/Makefile
> index b64674b944860..13a0eafabf9c0 100644
> --- a/drivers/gpu/drm/mediatek/Makefile
> +++ b/drivers/gpu/drm/mediatek/Makefile
> @@ -3,6 +3,7 @@
>  mediatek-drm-y := mtk_disp_color.o \
> mtk_disp_gamma.o \
> mtk_disp_ovl.o \
> +   mtk_disp_postmask.o \
> mtk_disp_rdma.o \
> mtk_drm_crtc.o \
> mtk_drm_ddp_comp.o \
> diff --git a/drivers/gpu/drm/mediatek/mtk_disp_drv.h 
> b/drivers/gpu/drm/mediatek/mtk_disp_drv.h
> index 02191010699f8..d74e85db3fcdf 100644
> --- a/drivers/gpu/drm/mediatek/mtk_disp_drv.h
> +++ b/drivers/gpu/drm/mediatek/mtk_disp_drv.h
> @@ -37,6 +37,14 @@ void mtk_gamma_set_common(void __iomem *regs, struct 
> drm_crtc_state *state);
>  void mtk_gamma_start(struct device *dev);
>  void mtk_gamma_stop(struct device *dev);
>  
> +int mtk_postmask_clk_enable(struct device *dev);
> +void mtk_postmask_clk_disable(struct device *dev);
> +void mtk_postmask_config(struct device *dev, unsigned int w,
> +  unsigned int h, unsigned int vrefresh,
> +  unsigned int bpc, struct cmdq_pkt *cmdq_pkt);
> +void mtk_postmask_start(struct device *dev);
> +void mtk_postmask_stop(struct device *dev);
> +
>  void mtk_ovl_bgclr_in_on(struct device *dev);
>  void mtk_ovl_bgclr_in_off(struct device *dev);
>  void mtk_ovl_bypass_shadow(struct device *dev);
> diff --git a/drivers/gpu/drm/mediatek/mtk_disp_postmask.c 
> b/drivers/gpu/drm/mediatek/mtk_disp_postmask.c
> new file mode 100644
> index 0..d640cef9c15a4
> --- /dev/null
> +++ b/drivers/gpu/drm/mediatek/mtk_disp_postmask.c
> @@ -0,0 +1,161 @@
> +/*
> + * SPDX-License-Identifier:
> + *
> + * Copyright (c) 2020 MediaTek Inc.

2021

> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "mtk_disp_drv.h"
> +#include "mtk_drm_crtc.h"
> +#include "mtk_drm_ddp_comp.h"
> +
> +#define DISP_POSTMASK_EN 0x
> +#define POSTMASK_EN  BIT(0)
> +#define DISP_POSTMASK_CFG    0x0020
> +#define POSTMASK_RELAY_MODE  BIT(0)
> +#define DISP_POSTMASK_SIZE   0x0030
> +
> +struct mtk_disp_postmask_data {
> + u32 reserved;
> +};

Useless, so remove.

> +
> +/**
> + * struct mtk_disp_postmask - DISP_postmask driver structure
> + * @ddp_comp - structure containing type enum and hardware resources
> + * @crtc - associated crtc to report irq events to
> + */
> +struct mtk_disp_postmask {
> + struct clk *clk;
> + void __iomem *regs;
> + struct cmdq_client_reg cmdq_reg;
> + const struct mtk_disp_postmask_data *data;
> +};
> +
> +int mtk_postmask_clk_enable(struct device *dev)
> +{
> + struct mtk_disp_postmask *postmask = dev_get_drvdata(dev);
> +
> + return clk_prepare_enable(postmask->clk);
> +}
> +
> +void mtk_postmask_clk_disable(struct device *dev)
> +{
> + struct mtk_disp_postmask *postmask = dev_get_drvdata(dev);
> +
> + clk_disable_unprepare(postmask->clk);
> +}
> +
> +void mtk_postmask_config(struct device *dev, unsigned int w,
> +  unsigned int h, unsigned int vrefresh,
> +  unsigned int bpc, struct cmdq_pkt *cmdq_pkt)
> +{
> + struct mtk_disp_postmask *postmask = dev_get_drvdata(dev);
> +
> + mtk_ddp_write(cmdq_pkt, w << 16 | h, &postmask->cmdq_reg, 
> postmask->regs,
> +   DISP_POSTMASK_SIZE);
> + mtk_ddp_write(cmdq_pkt, POSTMASK_RELAY_MODE, &postmask->cmdq_reg,
> +   postmask->regs, DISP_POSTMASK_CFG);
> +}
> +
> +void mtk_postmask_start(struct device *dev)
> +{
> + struct mtk_disp_postmask *postmask = dev_get_drvdata(dev);
> +
> + writel(POSTMASK_EN, postmask->regs + DISP_POSTMASK_EN);
> +}
> +
> +void mtk_postmask_stop(struct device *dev)
> +{
> + struct mtk_disp_postmask *postmask = dev_get_drvdata(dev);
> +
> + writel_relaxed(0x0, postmask->regs + DISP_POSTMASK_EN);
> +}
> +
> +static int mtk_disp_postmask_bind(struct device *dev, struct device *master, 
> void *data)
> +{
> + return 0;
> +}
> +
> +static void mtk_disp_po

[PATCH RESEND v5 6/8] regulator: mt6359: Add support for MT6359 regulator

2021-01-29 Thread Hsin-Hsiung Wang
From: Wen Su 

The MT6359 is a regulator found on boards based on MediaTek MT6779 and
probably other SoCs. It is a so-called PMIC and connects as a slave to
the SoC using SPI, wrapped inside the pmic-wrapper.

Signed-off-by: Wen Su 
Signed-off-by: Hsin-Hsiung Wang 
---
changes since v4:
- add enable time of ldo.
- use the device of mfd driver for the regulator_config.
- add the regulators_node support.
---
 drivers/regulator/Kconfig  |   9 +
 drivers/regulator/Makefile |   1 +
 drivers/regulator/mt6359-regulator.c   | 669 +
 include/linux/regulator/mt6359-regulator.h |  58 ++
 4 files changed, 737 insertions(+)
 create mode 100644 drivers/regulator/mt6359-regulator.c
 create mode 100644 include/linux/regulator/mt6359-regulator.h

diff --git a/drivers/regulator/Kconfig b/drivers/regulator/Kconfig
index 53fa84f4d1e1..3de7bb5be8ac 100644
--- a/drivers/regulator/Kconfig
+++ b/drivers/regulator/Kconfig
@@ -750,6 +750,15 @@ config REGULATOR_MT6358
  This driver supports the control of different power rails of device
  through regulator interface.
 
+config REGULATOR_MT6359
+   tristate "MediaTek MT6359 PMIC"
+   depends on MFD_MT6397
+   help
+ Say y here to select this option to enable the power regulator of
+ MediaTek MT6359 PMIC.
+ This driver supports the control of different power rails of device
+ through regulator interface.
+
 config REGULATOR_MT6360
tristate "MT6360 SubPMIC Regulator"
depends on MFD_MT6360
diff --git a/drivers/regulator/Makefile b/drivers/regulator/Makefile
index 680e539f6579..4f65eaead82d 100644
--- a/drivers/regulator/Makefile
+++ b/drivers/regulator/Makefile
@@ -91,6 +91,7 @@ obj-$(CONFIG_REGULATOR_MPQ7920) += mpq7920.o
 obj-$(CONFIG_REGULATOR_MT6311) += mt6311-regulator.o
 obj-$(CONFIG_REGULATOR_MT6323) += mt6323-regulator.o
 obj-$(CONFIG_REGULATOR_MT6358) += mt6358-regulator.o
+obj-$(CONFIG_REGULATOR_MT6359) += mt6359-regulator.o
 obj-$(CONFIG_REGULATOR_MT6360) += mt6360-regulator.o
 obj-$(CONFIG_REGULATOR_MT6380) += mt6380-regulator.o
 obj-$(CONFIG_REGULATOR_MT6397) += mt6397-regulator.o
diff --git a/drivers/regulator/mt6359-regulator.c 
b/drivers/regulator/mt6359-regulator.c
new file mode 100644
index ..fabc3f57f334
--- /dev/null
+++ b/drivers/regulator/mt6359-regulator.c
@@ -0,0 +1,669 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Copyright (c) 2020 MediaTek Inc.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define MT6359_BUCK_MODE_AUTO       0
+#define MT6359_BUCK_MODE_FORCE_PWM  1
+#define MT6359_BUCK_MODE_NORMAL     0
+#define MT6359_BUCK_MODE_LP         2
+
+/*
+ * MT6359 regulators' information
+ *
+ * @desc: standard fields of regulator description.
+ * @status_reg: for query status of regulators.
+ * @qi: Mask for query enable signal status of regulators.
+ * @modeset_reg: for operating AUTO/PWM mode register.
+ * @modeset_mask: MASK for operating modeset register.
+ * @modeset_shift: SHIFT for operating modeset register.
+ */
+struct mt6359_regulator_info {
+   struct regulator_desc desc;
+   u32 status_reg;
+   u32 qi;
+   u32 modeset_reg;
+   u32 modeset_mask;
+   u32 modeset_shift;
+   u32 lp_mode_reg;
+   u32 lp_mode_mask;
+   u32 lp_mode_shift;
+};
+
+#define MT6359_BUCK(match, _name, min, max, step, min_sel, \
+   volt_ranges, _enable_reg, _status_reg,  \
+   _vsel_reg, _vsel_mask,  \
+   _lp_mode_reg, _lp_mode_shift,   \
+   _modeset_reg, _modeset_shift)   \
+[MT6359_ID_##_name] = {\
+   .desc = {   \
+   .name = #_name, \
+   .of_match = of_match_ptr(match),\
+   .regulators_node = of_match_ptr("regulators"),  \
+   .ops = &mt6359_volt_range_ops,  \
+   .type = REGULATOR_VOLTAGE,  \
+   .id = MT6359_ID_##_name,\
+   .owner = THIS_MODULE,   \
+   .uV_step = (step),  \
+   .linear_min_sel = (min_sel),\
+   .n_voltages = ((max) - (min)) / (step) + 1, \
+   .min_uV = (min),\
+   .linear_ranges = volt_ranges,   \
+   .n_linear_ranges = ARRAY_SIZE(volt_ranges), \
+   .vsel_reg = _vsel_reg,  \
+   .vsel_mask = _vsel_mask,\
+   .enable_reg = _enable_reg,  \
+   .enable_mask = BIT(0),   
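
The n_voltages expression in the MT6359_BUCK macro above is plain integer arithmetic and can be checked in isolation. In this sketch, `n_voltages()` and `sel_to_uv()` are hypothetical helpers, and the numbers in the usage note are made-up illustration values, not values taken from this patch.

```c
#include <stdint.h>

/* Number of selectable voltages: ((max) - (min)) / (step) + 1. */
uint32_t n_voltages(uint32_t min_uv, uint32_t max_uv, uint32_t step_uv)
{
    return (max_uv - min_uv) / step_uv + 1;
}

/* Voltage in microvolts for a selector on a linear range from min_uv. */
uint32_t sel_to_uv(uint32_t min_uv, uint32_t step_uv, uint32_t sel)
{
    return min_uv + sel * step_uv;
}
```

For an assumed range of 400000 uV to 1193750 uV in 6250 uV steps, this gives 128 selectors, and the highest selector, 127, maps back to 1193750 uV. The "+ 1" accounts for both endpoints of the range being selectable.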

[PATCH RESEND v5 8/8] arm64: dts: mt6359: add PMIC MT6359 related nodes

2021-01-29 Thread Hsin-Hsiung Wang
From: Wen Su 

Add the PMIC MT6359 related nodes for the MT6779 platform.

Signed-off-by: Wen Su 
Signed-off-by: Hsin-Hsiung Wang 
---
changes since v4:
- add pmic MT6359 support in the MT8192 evb dts.
---
 arch/arm64/boot/dts/mediatek/mt6359.dtsi| 298 
 arch/arm64/boot/dts/mediatek/mt8192-evb.dts |   1 +
 2 files changed, 299 insertions(+)
 create mode 100644 arch/arm64/boot/dts/mediatek/mt6359.dtsi

diff --git a/arch/arm64/boot/dts/mediatek/mt6359.dtsi 
b/arch/arm64/boot/dts/mediatek/mt6359.dtsi
new file mode 100644
index ..4bd85e33a4c9
--- /dev/null
+++ b/arch/arm64/boot/dts/mediatek/mt6359.dtsi
@@ -0,0 +1,298 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2020 MediaTek Inc.
+ */
+
+&pwrap {
+   pmic: pmic {
+   compatible = "mediatek,mt6359";
+   interrupt-controller;
+   #interrupt-cells = <2>;
+
+   mt6359codec: mt6359codec {
+   };
+
+   mt6359regulator: regulators {
+   mt6359_vs1_buck_reg: buck_vs1 {
+   regulator-name = "vs1";
+   regulator-min-microvolt = <80>;
+   regulator-max-microvolt = <220>;
+   regulator-enable-ramp-delay = <0>;
+   regulator-always-on;
+   };
+   mt6359_vgpu11_buck_reg: buck_vgpu11 {
+   regulator-name = "vgpu11";
+   regulator-min-microvolt = <40>;
+   regulator-max-microvolt = <1193750>;
+   regulator-ramp-delay = <5000>;
+   regulator-enable-ramp-delay = <200>;
+   regulator-allowed-modes = <0 1 2>;
+   };
+   mt6359_vmodem_buck_reg: buck_vmodem {
+   regulator-name = "vmodem";
+   regulator-min-microvolt = <40>;
+   regulator-max-microvolt = <110>;
+   regulator-ramp-delay = <10760>;
+   regulator-enable-ramp-delay = <200>;
+   };
+   mt6359_vpu_buck_reg: buck_vpu {
+   regulator-name = "vpu";
+   regulator-min-microvolt = <40>;
+   regulator-max-microvolt = <1193750>;
+   regulator-ramp-delay = <5000>;
+   regulator-enable-ramp-delay = <200>;
+   regulator-allowed-modes = <0 1 2>;
+   };
+   mt6359_vcore_buck_reg: buck_vcore {
+   regulator-name = "vcore";
+   regulator-min-microvolt = <40>;
+   regulator-max-microvolt = <130>;
+   regulator-ramp-delay = <5000>;
+   regulator-enable-ramp-delay = <200>;
+   regulator-allowed-modes = <0 1 2>;
+   };
+   mt6359_vs2_buck_reg: buck_vs2 {
+   regulator-name = "vs2";
+   regulator-min-microvolt = <80>;
+   regulator-max-microvolt = <160>;
+   regulator-enable-ramp-delay = <0>;
+   regulator-always-on;
+   };
+   mt6359_vpa_buck_reg: buck_vpa {
+   regulator-name = "vpa";
+   regulator-min-microvolt = <50>;
+   regulator-max-microvolt = <365>;
+   regulator-enable-ramp-delay = <300>;
+   };
+   mt6359_vproc2_buck_reg: buck_vproc2 {
+   regulator-name = "vproc2";
+   regulator-min-microvolt = <40>;
+   regulator-max-microvolt = <1193750>;
+   regulator-ramp-delay = <7500>;
+   regulator-enable-ramp-delay = <200>;
+   regulator-allowed-modes = <0 1 2>;
+   };
+   mt6359_vproc1_buck_reg: buck_vproc1 {
+   regulator-name = "vproc1";
+   regulator-min-microvolt = <40>;
+   regulator-max-microvolt = <1193750>;
+   regulator-ramp-delay = <7500>;
+   regulator-enable-ramp-delay = <200>;
+   regulator-allowed-modes = <0 1 2>;
+   };
+ 

Re: [PATCH v4 7/8] soc: mediatek: add mtk mutex support for MT8192

2021-01-29 Thread CK Hu
Hi, Hsin-Yi:

On Fri, 2021-01-29 at 15:34 +0800, Hsin-Yi Wang wrote:
> From: Yongqiang Niu 
> 
> Add mtk mutex support for MT8192 SoC.

Reviewed-by: CK Hu 

> 
> Signed-off-by: Yongqiang Niu 
> Signed-off-by: Hsin-Yi Wang 
> ---
>  drivers/soc/mediatek/mtk-mutex.c | 35 
>  1 file changed, 35 insertions(+)
> 
> diff --git a/drivers/soc/mediatek/mtk-mutex.c 
> b/drivers/soc/mediatek/mtk-mutex.c
> index 718a41beb6afb..dfd9806d5a001 100644
> --- a/drivers/soc/mediatek/mtk-mutex.c
> +++ b/drivers/soc/mediatek/mtk-mutex.c
> @@ -39,6 +39,18 @@
>  #define MT8167_MUTEX_MOD_DISP_DITHER 15
>  #define MT8167_MUTEX_MOD_DISP_UFOE   16
>  
> +#define MT8192_MUTEX_MOD_DISP_OVL0   0
> +#define MT8192_MUTEX_MOD_DISP_OVL0_2L1
> +#define MT8192_MUTEX_MOD_DISP_RDMA0  2
> +#define MT8192_MUTEX_MOD_DISP_COLOR0 4
> +#define MT8192_MUTEX_MOD_DISP_CCORR0 5
> +#define MT8192_MUTEX_MOD_DISP_AAL0   6
> +#define MT8192_MUTEX_MOD_DISP_GAMMA0 7
> +#define MT8192_MUTEX_MOD_DISP_POSTMASK0  8
> +#define MT8192_MUTEX_MOD_DISP_DITHER09
> +#define MT8192_MUTEX_MOD_DISP_OVL2_2L16
> +#define MT8192_MUTEX_MOD_DISP_RDMA4  17
> +
>  #define MT8183_MUTEX_MOD_DISP_RDMA0  0
>  #define MT8183_MUTEX_MOD_DISP_RDMA1  1
>  #define MT8183_MUTEX_MOD_DISP_OVL0   9
> @@ -214,6 +226,20 @@ static const unsigned int 
> mt8183_mutex_mod[DDP_COMPONENT_ID_MAX] = {
>   [DDP_COMPONENT_WDMA0] = MT8183_MUTEX_MOD_DISP_WDMA0,
>  };
>  
> +static const unsigned int mt8192_mutex_mod[DDP_COMPONENT_ID_MAX] = {
> + [DDP_COMPONENT_AAL0] = MT8192_MUTEX_MOD_DISP_AAL0,
> + [DDP_COMPONENT_CCORR] = MT8192_MUTEX_MOD_DISP_CCORR0,
> + [DDP_COMPONENT_COLOR0] = MT8192_MUTEX_MOD_DISP_COLOR0,
> + [DDP_COMPONENT_DITHER] = MT8192_MUTEX_MOD_DISP_DITHER0,
> + [DDP_COMPONENT_GAMMA] = MT8192_MUTEX_MOD_DISP_GAMMA0,
> + [DDP_COMPONENT_POSTMASK0] = MT8192_MUTEX_MOD_DISP_POSTMASK0,
> + [DDP_COMPONENT_OVL0] = MT8192_MUTEX_MOD_DISP_OVL0,
> + [DDP_COMPONENT_OVL_2L0] = MT8192_MUTEX_MOD_DISP_OVL0_2L,
> + [DDP_COMPONENT_OVL_2L2] = MT8192_MUTEX_MOD_DISP_OVL2_2L,
> + [DDP_COMPONENT_RDMA0] = MT8192_MUTEX_MOD_DISP_RDMA0,
> + [DDP_COMPONENT_RDMA4] = MT8192_MUTEX_MOD_DISP_RDMA4,
> +};
> +
>  static const unsigned int mt2712_mutex_sof[MUTEX_SOF_DSI3 + 1] = {
>   [MUTEX_SOF_SINGLE_MODE] = MUTEX_SOF_SINGLE_MODE,
>   [MUTEX_SOF_DSI0] = MUTEX_SOF_DSI0,
> @@ -275,6 +301,13 @@ static const struct mtk_mutex_data 
> mt8183_mutex_driver_data = {
>   .no_clk = true,
>  };
>  
> +static const struct mtk_mutex_data mt8192_mutex_driver_data = {
> + .mutex_mod = mt8192_mutex_mod,
> + .mutex_sof = mt8183_mutex_sof,
> + .mutex_mod_reg = MT8183_MUTEX0_MOD0,
> + .mutex_sof_reg = MT8183_MUTEX0_SOF0,
> +};
> +
>  struct mtk_mutex *mtk_mutex_get(struct device *dev)
>  {
>   struct mtk_mutex_ctx *mtx = dev_get_drvdata(dev);
> @@ -507,6 +540,8 @@ static const struct of_device_id mutex_driver_dt_match[] 
> = {
> .data = &mt8173_mutex_driver_data},
>   { .compatible = "mediatek,mt8183-disp-mutex",
> .data = &mt8183_mutex_driver_data},
> + { .compatible = "mediatek,mt8192-disp-mutex",
> +   .data = &mt8192_mutex_driver_data},
>   {},
>  };
>  MODULE_DEVICE_TABLE(of, mutex_driver_dt_match);



Re: [PATCH v16 07/11] secretmem: use PMD-size pages to amortize direct map fragmentation

2021-01-29 Thread Michal Hocko
On Thu 28-01-21 13:05:02, James Bottomley wrote:
> Obviously the API choice could be revisited
> but do you have anything to add over the previous discussion, or is
> this just to get your access control?

Well, access control is certainly one thing which I still believe is
missing. But if there is a general agreement that the direct map
manipulation is not that critical then this will become much less of a
problem of course.

It all boils down to whether secret memory is a scarce resource. With the
existing implementation it really is. It is effectively repeating the
same design errors as hugetlb did. And look now, we have a subtle and
convoluted reservation code to track mmap requests and we have a cgroup
controller to, guess what, have at least some control over distribution
of the preallocated pool. See where I am coming from?

If the secret memory is more in line with mlock without any imposed
limit (other than available memory) in the end then, sure, using the same
access control as mlock sounds reasonable. Btw. if this is really
just a more restrictive mlock then is there any reason to not hook this
into the existing mlock infrastructure (e.g. MCL_EXCLUSIVE)?
Implications would be that direct map would be handled on instantiation/tear
down paths, migration would deal with the same (if possible). Other than
that it would be mlock like.
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] io_uring: add support for IORING_OP_GETDENTS64

2021-01-29 Thread Lennert Buytenhek
On Fri, Jan 29, 2021 at 07:37:03AM +0200, Lennert Buytenhek wrote:

> > > > One open question is whether IORING_OP_GETDENTS64 should be more like
> > > > pread(2) and allow passing in a starting offset to read from the
> > > > directory from.  (This would require some more surgery in fs/readdir.c.)
> > > 
> > > Since directories are seekable this ought to work.
> > > Modulo horrid issues with 32bit file offsets.
> > 
> > The incremental patch below does this.  (It doesn't apply cleanly on
> > top of v1 of the IORING_OP_GETDENTS patch as I have other changes in
> > my tree -- I'm including it just to illustrate the changes that would
> > make this work.)
> > 
> > This change seems to work, and makes IORING_OP_GETDENTS take an
> > explicitly specified directory offset (instead of using the file's
> > ->f_pos), making it more like pread(2) [...]
> 
> ...but the fact that this patch avoids taking file->f_pos_lock (as this
> proposed version of IORING_OP_GETDENTS avoids using file->f_pos) means
> that ->iterate_shared() can then be called concurrently on the same
> struct file, which breaks the mutual exclusion guarantees provided here.
> 
> If possible, I'd like to keep the ability to explicitly pass in a
> directory offset to start reading from into IORING_OP_GETDENTS, so
> perhaps we can simply satisfy the mutual exclusion requirement by
> taking ->f_pos_lock by hand -- but then I do need to check that it's OK
> for ->iterate{,_shared}() to be called successively with discontinuous
> offsets without ->llseek() being called in between.
> 
> (If that's not OK, then we can either have IORING_OP_GETDENTS just
> always start reading at ->f_pos like before (which might then require
> adding a IORING_OP_GETDENTS2 at some point in the future if we'll
> ever want to change that), or we could have IORING_OP_GETDENTS itself
> call ->llseek() for now whenever necessary.)

Having IORING_OP_GETDENTS seek to sqe->off if needed seems easy
enough to implement, and it simplifies the other code as well, so
I'll send out a v2 RFC shortly that does this.
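
For illustration only, the "take the position lock by hand and seek to the
requested offset if needed" idea discussed above can be sketched in
userspace C. This is an analogy, not io_uring or VFS code: struct demo_dir,
demo_getdents() and the seeks counter are hypothetical names, and a pthread
mutex merely plays the role of file->f_pos_lock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for an open directory. */
struct demo_dir {
	pthread_mutex_t pos_lock;  /* plays the role of file->f_pos_lock */
	long pos;                  /* plays the role of file->f_pos */
	long nentries;             /* total number of entries in the directory */
	unsigned int seeks;        /* how often an explicit seek was needed */
};

/*
 * Read up to max entries starting at off. The lock serializes readers,
 * and a seek happens only when off differs from the current position.
 * Returns the number of entries consumed.
 */
static long demo_getdents(struct demo_dir *d, long off, long max)
{
	long n;

	pthread_mutex_lock(&d->pos_lock);
	if (off != d->pos) {
		d->pos = off;
		d->seeks++;
	}
	n = d->nentries - d->pos;
	if (n < 0)
		n = 0;
	if (n > max)
		n = max;
	d->pos += n;
	pthread_mutex_unlock(&d->pos_lock);

	return n;
}

static void demo_getdents_example(void)
{
	struct demo_dir d = { .pos_lock = PTHREAD_MUTEX_INITIALIZER, .nentries = 10 };

	assert(demo_getdents(&d, 0, 4) == 4);   /* offset matches pos: no seek */
	assert(demo_getdents(&d, 2, 100) == 8); /* offset differs: seek, then read */
	assert(d.seeks == 1 && d.pos == 10);
}
```

Sequential readers that pass the offset they last stopped at never trigger
the seek branch, which is the cheap common case this design aims for.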


[PATCH v2] KVM: x86: Allow guests to see MSR_IA32_TSX_CTRL even if tsx=off

2021-01-29 Thread Paolo Bonzini
Userspace that does not know about KVM_GET_MSR_FEATURE_INDEX_LIST
will generally use the default value for MSR_IA32_ARCH_CAPABILITIES.
When this happens and the host has tsx=on, it is possible to end up with
virtual machines that have HLE and RTM disabled, but TSX_CTRL available.

If the fleet is then switched to tsx=off, kvm_get_arch_capabilities()
will clear the ARCH_CAP_TSX_CTRL_MSR bit and it will not be possible to
use the tsx=off hosts as migration destinations, even though the guests
do not have TSX enabled.

To allow this migration, allow guests to write to their TSX_CTRL MSR,
while keeping the host MSR unchanged for the entire life of the guests.
This ensures that TSX remains disabled and also saves MSR reads and
writes, and it's okay to do because with tsx=off we know that guests will
not have the HLE and RTM features in their CPUID.  (If userspace sets
bogus CPUID data, we do not expect HLE and RTM to work in guests anyway).

Cc: sta...@vger.kernel.org
Fixes: cbbaa2727aa3 ("KVM: x86: fix presentation of TSX feature in 
ARCH_CAPABILITIES")
Signed-off-by: Paolo Bonzini 
---
 arch/x86/kvm/vmx/vmx.c | 17 +
 arch/x86/kvm/x86.c |  2 +-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cc60b1fc3ee7..eb69fef57485 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6860,11 +6860,20 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
switch (index) {
case MSR_IA32_TSX_CTRL:
/*
-* No need to pass TSX_CTRL_CPUID_CLEAR through, so
-* let's avoid changing CPUID bits under the host
-* kernel's feet.
+* TSX_CTRL_CPUID_CLEAR is handled in the CPUID
+* interception.  Keep the host value unchanged to avoid
+* changing CPUID bits under the host kernel's feet.
+*
+* hle=0, rtm=0, tsx_ctrl=1 can be found with some
+* combinations of new kernel and old userspace.  If
+* those guests run on a tsx=off host, do allow guests
+* to use TSX_CTRL, but do not change the value on the
+* host so that TSX remains always disabled.
 */
-   vmx->guest_uret_msrs[j].mask = 
~(u64)TSX_CTRL_CPUID_CLEAR;
+   if (boot_cpu_has(X86_FEATURE_RTM))
+   vmx->guest_uret_msrs[j].mask = 
~(u64)TSX_CTRL_CPUID_CLEAR;
+   else
+   vmx->guest_uret_msrs[j].mask = 0;
break;
default:
vmx->guest_uret_msrs[j].mask = -1ull;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 76bce832cade..15733013b266 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1401,7 +1401,7 @@ static u64 kvm_get_arch_capabilities(void)
 *This lets the guest use VERW to clear CPU buffers.
 */
if (!boot_cpu_has(X86_FEATURE_RTM))
-   data &= ~(ARCH_CAP_TAA_NO | ARCH_CAP_TSX_CTRL_MSR);
+   data &= ~ARCH_CAP_TAA_NO;
else if (!boot_cpu_has_bug(X86_BUG_TAA))
data |= ARCH_CAP_TAA_NO;
 
-- 
2.26.2
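
For illustration only, the effect of the two mask values in the patch above
can be sketched in userspace C. The sketch assumes the value that reaches
the hardware MSR is composed as (guest & mask) | (host & ~mask); the helper
names tsx_ctrl_mask() and effective_msr() are hypothetical, and host_has_rtm
stands in for boot_cpu_has(X86_FEATURE_RTM). Only the two TSX_CTRL bit
positions are taken from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of MSR_IA32_TSX_CTRL. */
#define TSX_CTRL_RTM_DISABLE (1ULL << 0)
#define TSX_CTRL_CPUID_CLEAR (1ULL << 1)

/* Hypothetical helper: the mask the patch would assign. */
static uint64_t tsx_ctrl_mask(bool host_has_rtm)
{
	return host_has_rtm ? ~(uint64_t)TSX_CTRL_CPUID_CLEAR : 0;
}

/*
 * Hypothetical helper: bits set in mask follow the guest value,
 * all other bits keep the host value.
 */
static uint64_t effective_msr(uint64_t guest, uint64_t host, uint64_t mask)
{
	return (guest & mask) | (host & ~mask);
}

static void tsx_ctrl_example(void)
{
	const uint64_t both = TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR;

	/* tsx=off host: a guest write of 0 leaves the host value in place. */
	assert(effective_msr(0, both, tsx_ctrl_mask(false)) == both);

	/* tsx=on host: RTM_DISABLE follows the guest, CPUID_CLEAR stays host-owned. */
	assert(effective_msr(both, 0, tsx_ctrl_mask(true)) == TSX_CTRL_RTM_DISABLE);
}
```

With a mask of 0 the guest's writes are accepted but never change the host
MSR, which matches the intent of keeping TSX disabled on tsx=off hosts.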



[PATCH] gpiolib: free device name on error path to fix kmemleak

2021-01-29 Thread quanyang . wang
From: Quanyang Wang 

In gpiochip_add_data_with_key, we should check the return value of
dev_set_name to ensure that device name is allocated successfully
and then add a label on the error path to free device name to fix
kmemleak as below:

unreferenced object 0xc2d6fc40 (size 64):
  comm "kworker/0:1", pid 16, jiffies 4294937425 (age 65.120s)
  hex dump (first 32 bytes):
67 70 69 6f 63 68 69 70 30 00 1a c0 54 63 1a c0  gpiochip0...Tc..
0c ed 84 c0 48 ed 84 c0 3c ee 84 c0 10 00 00 00  H...<...
  backtrace:
[<962810f7>] kobject_set_name_vargs+0x2c/0xa0
[] dev_set_name+0x2c/0x5c
[<94abbca9>] gpiochip_add_data_with_key+0xfc/0xce8
[<5c4193e0>] omap_gpio_probe+0x33c/0x68c
[<3402f137>] platform_probe+0x58/0xb8
[<7421e210>] really_probe+0xec/0x3b4
[<000f8ada>] driver_probe_device+0x58/0xb4
[<67e0f7f7>] bus_for_each_drv+0x80/0xd0
[<4de545dc>] __device_attach+0xe8/0x15c
[<2e4431e7>] bus_probe_device+0x84/0x8c
[] device_add+0x384/0x7c0
[<5aff2995>] of_platform_device_create_pdata+0x8c/0xb8
[<061c3483>] of_platform_bus_create+0x198/0x230
[<5ee6d42a>] of_platform_populate+0x60/0xb8
[<2647300f>] sysc_probe+0xd18/0x135c
[<3402f137>] platform_probe+0x58/0xb8

Signed-off-by: Quanyang Wang 
---
 drivers/gpio/gpiolib.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 7e1ad4d40e0a..091e00f2e0a9 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -603,7 +603,11 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void 
*data,
ret = gdev->id;
goto err_free_gdev;
}
-   dev_set_name(&gdev->dev, GPIOCHIP_NAME "%d", gdev->id);
+
+   ret = dev_set_name(&gdev->dev, GPIOCHIP_NAME "%d", gdev->id);
+   if (ret)
+   goto err_free_ida;
+
device_initialize(&gdev->dev);
dev_set_drvdata(&gdev->dev, gdev);
if (gc->parent && gc->parent->driver)
@@ -617,7 +621,7 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void 
*data,
gdev->descs = kcalloc(gc->ngpio, sizeof(gdev->descs[0]), GFP_KERNEL);
if (!gdev->descs) {
ret = -ENOMEM;
-   goto err_free_ida;
+   goto err_free_dev_name;
}
 
if (gc->ngpio == 0) {
@@ -768,6 +772,8 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void 
*data,
kfree_const(gdev->label);
 err_free_descs:
kfree(gdev->descs);
+err_free_dev_name:
+   kfree(dev_name(&gdev->dev));
 err_free_ida:
ida_free(&gpio_ida, gdev->id);
 err_free_gdev:
-- 
2.25.1
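
For illustration only, the goto-based unwinding pattern that the patch above
extends can be sketched in userspace C. The labels run in reverse order of
acquisition, so a failure at any step frees exactly what was allocated before
it. struct demo_dev, demo_dev_setup() and the fail_at parameter are
hypothetical and exist only to exercise each error path; this is not the
gpiolib code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct demo_dev {
	char *name;  /* stands in for the name allocated by dev_set_name() */
	int *descs;  /* stands in for gdev->descs */
};

/* fail_at selects which step fails (0 = none). */
static int demo_dev_setup(struct demo_dev *d, int fail_at)
{
	int ret = -ENOMEM;

	d->name = NULL;
	d->descs = NULL;

	if (fail_at != 1)
		d->name = malloc(sizeof("gpiochip0"));
	if (!d->name)
		goto err_out;
	strcpy(d->name, "gpiochip0");

	if (fail_at != 2)
		d->descs = calloc(4, sizeof(*d->descs));
	if (!d->descs)
		goto err_free_name;

	if (fail_at == 3) {
		ret = -EINVAL;
		goto err_free_descs;
	}

	return 0;

	/* Unwind in reverse order of acquisition. */
err_free_descs:
	free(d->descs);
	d->descs = NULL;
err_free_name:
	free(d->name);
	d->name = NULL;
err_out:
	return ret;
}

static void demo_dev_example(void)
{
	struct demo_dev d;

	/* A failure after the name was allocated must release the name. */
	assert(demo_dev_setup(&d, 2) == -ENOMEM);
	assert(d.name == NULL && d.descs == NULL);
}
```

Jumping to a label that sits too late in the chain skips the frees above it,
which is the kind of leak the kmemleak report describes.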



[PATCH 4.14 10/50] xen: Fix event channel callback via INTX/GSI

2021-01-29 Thread Greg Kroah-Hartman
From: David Woodhouse 

[ Upstream commit 3499ba8198cad47b731792e5e56b9ec2a78a83a2 ]

For a while, event channel notification via the PCI platform device
has been broken, because we attempt to communicate with xenstore before
we even have notifications working, with the xs_reset_watches() call
in xs_init().

We tend to get away with this on Xen versions below 4.0 because we avoid
calling xs_reset_watches() anyway, because xenstore might not cope with
reading a non-existent key. And newer Xen *does* have the vector
callback support, so we rarely fall back to INTX/GSI delivery.

To fix it, clean up a bit of the mess of xs_init() and xenbus_probe()
startup. Call xs_init() directly from xenbus_init() only in the !XS_HVM
case, deferring it to be called from xenbus_probe() in the XS_HVM case
instead.

Then fix up the invocation of xenbus_probe() to happen either from its
device_initcall if the callback is available early enough, or when the
callback is finally set up. This means that the hack of calling
xenbus_probe() from a workqueue after the first interrupt, or directly
from the PCI platform device setup, is no longer needed.

Signed-off-by: David Woodhouse 
Reviewed-by: Boris Ostrovsky 
Link: https://lore.kernel.org/r/20210113132606.422794-2-dw...@infradead.org
Signed-off-by: Juergen Gross 
Signed-off-by: Sasha Levin 
---
 arch/arm/xen/enlighten.c  |  2 +-
 drivers/xen/events/events_base.c  | 10 
 drivers/xen/platform-pci.c|  1 -
 drivers/xen/xenbus/xenbus.h   |  1 +
 drivers/xen/xenbus/xenbus_comms.c |  8 ---
 drivers/xen/xenbus/xenbus_probe.c | 81 +--
 include/xen/xenbus.h  |  2 +-
 7 files changed, 70 insertions(+), 35 deletions(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index ba7f4c8f5c3e4..e8e637c4f354d 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -393,7 +393,7 @@ static int __init xen_guest_init(void)
}
gnttab_init();
if (!xen_initial_domain())
-   xenbus_probe(NULL);
+   xenbus_probe();
 
/*
 * Making sure board specific code will not set up ops for
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index aca8456752797..8c08c7d46d3d0 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -1987,16 +1987,6 @@ static struct irq_chip xen_percpu_chip __read_mostly = {
.irq_ack= ack_dynirq,
 };
 
-int xen_set_callback_via(uint64_t via)
-{
-   struct xen_hvm_param a;
-   a.domid = DOMID_SELF;
-   a.index = HVM_PARAM_CALLBACK_IRQ;
-   a.value = via;
-   return HYPERVISOR_hvm_op(HVMOP_set_param, &a);
-}
-EXPORT_SYMBOL_GPL(xen_set_callback_via);
-
 #ifdef CONFIG_XEN_PVHVM
 /* Vector callbacks are better than PCI interrupts to receive event
  * channel notifications because we can receive vector callbacks on any
diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index 5d7dcad0b0a0d..4cec8146609ad 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -162,7 +162,6 @@ static int platform_pci_probe(struct pci_dev *pdev,
ret = gnttab_init();
if (ret)
goto grant_out;
-   xenbus_probe(NULL);
return 0;
 grant_out:
gnttab_free_auto_xlat_frames();
diff --git a/drivers/xen/xenbus/xenbus.h b/drivers/xen/xenbus/xenbus.h
index 139539b0ab20d..e6a8d02d35254 100644
--- a/drivers/xen/xenbus/xenbus.h
+++ b/drivers/xen/xenbus/xenbus.h
@@ -114,6 +114,7 @@ int xenbus_probe_node(struct xen_bus_type *bus,
  const char *type,
  const char *nodename);
 int xenbus_probe_devices(struct xen_bus_type *bus);
+void xenbus_probe(void);
 
 void xenbus_dev_changed(const char *node, struct xen_bus_type *bus);
 
diff --git a/drivers/xen/xenbus/xenbus_comms.c 
b/drivers/xen/xenbus/xenbus_comms.c
index eb5151fc8efab..e5fda0256feb3 100644
--- a/drivers/xen/xenbus/xenbus_comms.c
+++ b/drivers/xen/xenbus/xenbus_comms.c
@@ -57,16 +57,8 @@ DEFINE_MUTEX(xs_response_mutex);
 static int xenbus_irq;
 static struct task_struct *xenbus_task;
 
-static DECLARE_WORK(probe_work, xenbus_probe);
-
-
 static irqreturn_t wake_waiting(int irq, void *unused)
 {
-   if (unlikely(xenstored_ready == 0)) {
-   xenstored_ready = 1;
-   schedule_work(&probe_work);
-   }
-
wake_up(&xb_waitq);
return IRQ_HANDLED;
 }
diff --git a/drivers/xen/xenbus/xenbus_probe.c 
b/drivers/xen/xenbus/xenbus_probe.c
index 217bcc092a968..fe24e8dcb2b8e 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -674,29 +674,76 @@ void unregister_xenstore_notifier(struct notifier_block 
*nb)
 }
 EXPORT_SYMBOL_GPL(unregister_xenstore_notifier);
 
-void xenbus_probe(struct work_struct *unused)
+void xenbus_probe(void)
 {
xenstored_ready = 1;
 
+   /*
+* In the HVM case, xenbus_init() deferre

[PATCH v2 1/2] EDAC/ghes: Add EDAC device for reporting the CPU cache errors

2021-01-29 Thread Shiju Jose
CPU L2 cache corrected errors are detected occasionally on
a few of our ARM64 hardware boards. Though it is rare, the
possibility of CPU cache errors occurring frequently
can't be ruled out. Earlier failure detection, by monitoring
the cache corrected errors for frequent occurrences and
taking preventive action, could prevent more serious hardware
faults.

On Intel architectures, cache corrected errors are reported and
the affected cores are offlined in the architecture specific method.
http://www.mcelog.org/cache.html

However, for firmware-first error reporting, specifically on
ARM64 architectures, there is no provision for reporting
the cache corrected error count to user space and taking
preventive action such as offlining the affected cores.

For this purpose, it was suggested to create the CPU EDAC
device for the CPU caches for reporting the cache error count
for the firmware-first error reporting.
The EDAC device blocks for the CPU caches would be created
based on the cache information obtained from the cpu_cacheinfo.

A user-space application could monitor the recorded corrected error
count for earlier hardware failure detection and could take
preventive action, such as offlining the corresponding CPU core(s).

Add an EDAC device and device blocks for the CPU caches
based on the cache information from the cpu_cacheinfo.
The cache's corrected error count would be stored in the
/sys/devices/system/edac/cpu/cpu*/cache*/ce_count.

Issues and possible solutions:
1. Cache info is not available for offline CPUs.
   The EDAC device interface requires creating the EDAC device
   and device blocks together. It requires the number
   of caches per CPU as device blocks for the creation.
   However, this info is not available for the
   offlined CPUs.
   Tested Solution: Find the max number of caches among
   online CPUs, create the EDAC device for CPU caches,
   and get and populate the cache info for an offline
   CPU later, when an error is reported on that CPU for
   the first time.

2. Reporting the error count for shared caches.
   There are a few possible solutions.
   Tested Solution: The kernel would report a new error count for a
   shared cache through the EDAC device block for the CPU on which
   the error is reported. The user-space application would then sum
   the total error count from the EDAC device blocks of all the CPUs
   in the shared CPU list of that shared cache.

For firmware-first error reporting, add an interface in
ghes_edac to allow reporting a CPU corrected error count.
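
For illustration only, the user-space summation described for shared caches
can be sketched in C. The sketch assumes the application has already read
each CPU's ce_count for one cache level into an array and knows which CPUs
share that cache; shared_cache_ce_total() and its parameters are hypothetical
names and not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * ce_count[cpu] holds the corrected error count read from that CPU's EDAC
 * block for one cache level; shared_cpus lists the CPUs sharing the cache.
 * Returns the total corrected error count for the shared cache.
 */
static uint64_t shared_cache_ce_total(const uint32_t *ce_count, size_t ncpus,
				      const unsigned int *shared_cpus,
				      size_t nshared)
{
	uint64_t total = 0;

	for (size_t i = 0; i < nshared; i++) {
		/* Skip CPU numbers for which no reading is available. */
		if (shared_cpus[i] < ncpus)
			total += ce_count[shared_cpus[i]];
	}

	return total;
}

static void shared_cache_example(void)
{
	const uint32_t ce_count[4] = { 1, 0, 2, 5 };
	const unsigned int l3_sharers[2] = { 2, 3 };

	/* Errors reported on CPU 2 and CPU 3 both belong to the same cache. */
	assert(shared_cache_ce_total(ce_count, 4, l3_sharers, 2) == 7);
}
```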

Suggested-by: James Morse 
Signed-off-by: Shiju Jose 
---
 Documentation/ABI/testing/sysfs-devices-edac |  15 ++
 drivers/acpi/apei/ghes.c |   8 +-
 drivers/edac/Kconfig |  12 ++
 drivers/edac/ghes_edac.c | 186 +++
 include/acpi/ghes.h  |  27 +++
 5 files changed, 247 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-edac 
b/Documentation/ABI/testing/sysfs-devices-edac
index 256a9e990c0b..56a18b0af419 100644
--- a/Documentation/ABI/testing/sysfs-devices-edac
+++ b/Documentation/ABI/testing/sysfs-devices-edac
@@ -155,3 +155,18 @@ Description:   This attribute file displays the total 
count of uncorrectable
errors that have occurred on this DIMM. If panic_on_ue is set, 
this
counter will not have a chance to increment, since EDAC will 
panic the
system
+
+What:   /sys/devices/system/edac/cpu/cpu*/cache*/ce_count
+Date:   December 2020
+Contact:linux-e...@vger.kernel.org
+Description:This attribute file displays the total count of correctable
+errors that have occurred on this CPU cache. This count is 
very important
+to examine. CEs provide early indications that a cache is 
beginning
+to fail. This count field should be monitored for non-zero 
values
+and report such information to the system administrator.
+
+What:   /sys/devices/system/edac/cpu/cpu*/cache*/ue_count
+Date:   December 2020
+Contact:linux-e...@vger.kernel.org
+Description:This attribute file displays the total count of uncorrectable
+errors that have occurred on this CPU cache.
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fce7ade2aba9..139540f2c8f4 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1452,4 +1452,10 @@ static int __init ghes_init(void)
 err:
return rc;
 }
-device_initcall(ghes_init);
+
+/*
+ * device_initcall_sync() is added instead of the device_initcall()
+ * because the CPU cacheinfo should be populated and is required for
+ * adding the CPU cache edac device in the ghes_edac_register().
+ */
+device_initcall_sync(ghes_init);
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 81c42664f21b..39fb53aa9cd9 100644
--- a/drivers/edac/Kconfig
+++ b/

Re: [PATCH] x86: Disable CET instrumentation in the kernel

2021-01-29 Thread Borislav Petkov
On Thu, Jan 28, 2021 at 03:52:19PM -0600, Josh Poimboeuf wrote:
> 
> With retpolines disabled, some configurations of GCC will add Intel CET
> instrumentation to the kernel by default.  That breaks certain tracing
> scenarios by adding a superfluous ENDBR64 instruction before the fentry
> call, for functions which can be called indirectly.
> 
> CET instrumentation isn't currently necessary in the kernel, as CET is
> only supported in user space.  Disable it unconditionally.
> 
> Reported-by: Nikolay Borisov 
> Signed-off-by: Josh Poimboeuf 
> ---
>  Makefile  | 6 --
>  arch/x86/Makefile | 3 +++
>  2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index e0af7a4a5598..51c2bf34142d 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -948,12 +948,6 @@ KBUILD_CFLAGS   += $(call 
> cc-option,-Werror=designated-init)
>  # change __FILE__ to the relative path from the srctree
>  KBUILD_CPPFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
>  
> -# ensure -fcf-protection is disabled when using retpoline as it is
> -# incompatible with -mindirect-branch=thunk-extern
> -ifdef CONFIG_RETPOLINE
> -KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none)
> -endif
> -

Why is that even here, in the main Makefile if this cf-protection thing
is x86-specific?

Are we going to move it back there when some other arch gets CET or
CET-like support?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


[PATCH 4.9 00/30] 4.9.254-rc1 review

2021-01-29 Thread Greg Kroah-Hartman
This is the start of the stable review cycle for the 4.9.254 release.
There are 30 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sun, 31 Jan 2021 10:59:01 +.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:

https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.254-rc1.gz
or in the git tree and branch at:

git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
linux-4.9.y
and the diffstat can be found below.

thanks,

greg k-h

-
Pseudo-Shortlog of commits:

Greg Kroah-Hartman 
Linux 4.9.254-rc1

Arvind Sankar 
x86/boot/compressed: Disable relocation relaxation

Gaurav Kohli 
tracing: Fix race in trace_open and buffer resize call

Wang Hai 
Revert "mm/slub: fix a memory leak in sysfs_slab_add()"

Dan Carpenter 
net: dsa: b53: fix an off by one in checking "vlan->vid"

Eric Dumazet 
net_sched: avoid shift-out-of-bounds in tcindex_set_parms()

Matteo Croce 
ipv6: create multicast route with RTPROT_KERNEL

Alexander Lobakin 
skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too

Geert Uytterhoeven 
sh_eth: Fix power down vs. is_opened flag ordering

Necip Fazil Yildiran 
sh: dma: fix kconfig dependency for G2_DMA

Guillaume Nault 
netfilter: rpfilter: mask ecn bits before fib lookup

Will Deacon 
compiler.h: Raise minimum version of GCC to 5.1 for arm64

Daniel Borkmann 
bpf: Fix buggy rsh min/max bounds tracking

JC Kuo 
xhci: tegra: Delay for disabling LFPS detector

Mathias Nyman 
xhci: make sure TRB is fully written before giving it to the controller

Patrik Jakobsson 
usb: bdc: Make bdc pci driver depend on BROKEN

Thinh Nguyen 
usb: udc: core: Use lock when write to soft_connect

Longfang Liu 
USB: ehci: fix an interrupt calltrace error

Eugene Korenevsky 
ehci: fix EHCI host controller initialization sequence

Wang Hui 
stm class: Fix module init return on allocation failure

Lars-Peter Clausen 
iio: ad5504: Fix setting power-down state

Vincent Mailhol 
can: dev: can_restart: fix use after free bug

Wolfram Sang 
i2c: octeon: check correct size of maximum RECV_LEN packet

Ben Skeggs 
drm/nouveau/i2c/gm200: increase width of aux semaphore owner fields

Ben Skeggs 
drm/nouveau/bios: fix issue shadowing expansion ROMs

Can Guo 
scsi: ufs: Correct the LUN used in eh_device_reset_handler() callback

Cezary Rojewski 
ASoC: Intel: haswell: Add missing pm_ops

Hannes Reinecke 
dm: avoid filesystem lookup in dm_get_dev_t()

Hans de Goede 
ACPI: scan: Make acpi_bus_get_device() clear return pointer on error

Takashi Iwai 
ALSA: hda/via: Add minimum mute flag

Takashi Iwai 
ALSA: seq: oss: Fix missing error check in snd_seq_oss_synth_make_info()


-

Diffstat:

 Makefile   |  4 ++--
 arch/sh/drivers/dma/Kconfig|  3 +--
 arch/x86/boot/compressed/Makefile  |  2 ++
 drivers/acpi/scan.c|  2 ++
 drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c  |  2 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/i2c/auxgm200.c |  8 
 drivers/hwtracing/stm/heartbeat.c  |  6 --
 drivers/i2c/busses/i2c-octeon-core.c   |  2 +-
 drivers/iio/dac/ad5504.c   |  4 ++--
 drivers/md/dm-table.c  | 15 ---
 drivers/net/can/dev.c  |  4 ++--
 drivers/net/dsa/b53/b53_common.c   |  2 +-
 drivers/net/ethernet/renesas/sh_eth.c  |  4 ++--
 drivers/scsi/ufs/ufshcd.c  | 11 ---
 drivers/usb/gadget/udc/bdc/Kconfig |  2 +-
 drivers/usb/gadget/udc/core.c  | 13 ++---
 drivers/usb/host/ehci-hcd.c| 12 
 drivers/usb/host/ehci-hub.c|  3 +++
 drivers/usb/host/xhci-ring.c   |  2 ++
 drivers/usb/host/xhci-tegra.c  |  7 +++
 include/linux/compiler-gcc.h   |  6 ++
 kernel/bpf/verifier.c  |  7 +++
 kernel/trace/ring_buffer.c |  4 
 mm/slub.c  |  4 +---
 net/core/skbuff.c  |  6 +-
 net/ipv4/netfilter/ipt_rpfilter.c  |  2 +-
 net/ipv6/addrconf.c|  1 +
 net/sched/cls_tcindex.c|  8 ++--
 sound/core/seq/oss/seq_oss_synth.c |  3 ++-
 sound/pci/hda/patch_via.c  |  1 +
 sound/soc/intel/boards/haswell.c   |  1 +
 31 files changed, 106 insertions(+), 45 deletions(

Re: [PATCH v6] close_range.2: new page documenting close_range(2)

2021-01-29 Thread Christian Brauner
On Thu, Jan 28, 2021 at 09:50:23PM +0100, Michael Kerrisk (man-pages) wrote:
> Hello Stephen, (and Christian, please!)

Ah, I think this was mostly done which is why I kept quiet.

Christian


[PATCH RESEND v5 3/8] dt-bindings: mfd: Add compatible for the MediaTek MT6359 PMIC

2021-01-29 Thread Hsin-Hsiung Wang
This adds compatible for the MediaTek MT6359 PMIC.

Signed-off-by: Hsin-Hsiung Wang 
---
changes since v4:
- remove unused compatible name.
---
 Documentation/devicetree/bindings/mfd/mt6397.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/mfd/mt6397.txt 
b/Documentation/devicetree/bindings/mfd/mt6397.txt
index 2661775a3825..99a84b69a29f 100644
--- a/Documentation/devicetree/bindings/mfd/mt6397.txt
+++ b/Documentation/devicetree/bindings/mfd/mt6397.txt
@@ -21,6 +21,7 @@ Required properties:
 compatible:
"mediatek,mt6323" for PMIC MT6323
"mediatek,mt6358" for PMIC MT6358
+   "mediatek,mt6359" for PMIC MT6359
"mediatek,mt6397" for PMIC MT6397
 
 Optional subnodes:
-- 
2.18.0



Re: [PATCH v2 04/16] rpmsg: ctrl: implement the ioctl function to create device

2021-01-29 Thread Arnaud POULIQUEN



On 1/29/21 1:13 AM, Mathieu Poirier wrote:
> [...]
> 
>>> It seems to me that the main point to step forward is to clarify the global
>>> design and features of the rpmsg-ctrl.
>>> Depending on the decision taken, this series could be trashed and rewritten
>>> from a blank page... To not lose too much time on the series, don't hesitate
>>> to limit the review to the minimum.
>>>
>>
>> I doubt you will ever get clear guidelines on the whole solution.  I will get
>> back to you once I am done with the SMD driver, which should be in the
>> latter part of next week.
>>
> 
> After looking at the rpmsg_chrdev driver, its current customers (i.e the Qcom
> drivers), the rpmsg name service and considering the long term goals of this
> patchset I have the following guidelines: 
> 
> 1) I thought long and hard about how to split the current rpmsg_chrdev driver
> between the control plane and the raw device plane and the end solution looks
> much simpler than I expected.  Exporting function rpmsg_eptdev_create() after
> moving it to another file (along with other dependencies) should be all we
> need.  Calling rpmsg_eptdev_create() from rpmsg_ctrldev_ioctl() will
> automatically load the new driver, the same way calling
> rpmsg_ns_register_device() from rpmsg_probe() took care of loading the
> rpmsg_ns driver.
> 
> 2) While keeping the control plane functionality related to
> RPMSG_CREATE_EPT_IOCTL intact, introduce a new RPMSG_CREATE_DEV_IOCTL that
> will allow for the instantiation of rpmsg_devices, exactly the same way a
> name service announcement from a remote processor does.  I envision that code
> path to eventually call rpmsg_create_channel().
> 
> 3) Leave the rpmsg_channel_info structure intact and use the
> rpmsg_channel_info::name to bind to a rpmsg_driver, exactly how it is
> currently done for name service driver selection.  That will allow us to
> re-use the current rpmsg_bus infrastructure, i.e. rpmsg_bus::match(), without
> having to deal with yet another bus type.  Proceeding this way gives us the
> opportunity to keep the current channel name convention for other
> rpmsg_chrdev users untouched.
> 
> 4) In a prior conversation you indicated the intention of instantiating the
> rpmsg_chrdev from the name service interface.  I agree with doing so but
> combining that with the RPMSG_CHAR kernel define may be tricky.  I will wait
> to see what you come up with.
> 
> I hope this helps.

Thank you for these guidelines! I need a bit of time to look at the details
(especially point 1), but your suggestion seems to me to be a good compromise.
I hope to come back soon with a new revision based on this point.

Regards,
Arnaud

> 
> Thanks,
> Mathieu
> 
> 
>  
>>> Thanks,
>>> Arnaud
>>>

 Thanks,
 Mathieu

> + return NULL;
> +}
> +
>  static long rpmsg_ctrl_dev_ioctl(struct file *fp, unsigned int cmd,
>unsigned long arg)
>  {
>   struct rpmsg_ctrl_dev *ctrldev = fp->private_data;
> -
> - dev_info(&ctrldev->dev, "Control not yet implemented\n");
> + void __user *argp = (void __user *)arg;
> + struct rpmsg_channel_info chinfo;
> + struct rpmsg_endpoint_info eptinfo;
> + struct rpmsg_device *newch;
> +
> + if (cmd != RPMSG_CREATE_EPT_IOCTL)
> + return -EINVAL;
> +
> + if (copy_from_user(&eptinfo, argp, sizeof(eptinfo)))
> + return -EFAULT;
> +
> + /*
> +  * In a first step only the rpmsg_raw service is supported.
> +  * The override is forced to RPMSG_RAW_SERVICE
> +  */
> + chinfo.driver_override = rpmsg_ctrl_get_drv_name(RPMSG_RAW_SERVICE);
> + if (!chinfo.driver_override)
> + return -ENODEV;
> +
> + memcpy(chinfo.name, eptinfo.name, RPMSG_NAME_SIZE);
> + chinfo.name[RPMSG_NAME_SIZE - 1] = '\0';
> + chinfo.src = eptinfo.src;
> + chinfo.dst = eptinfo.dst;
> +
> + newch = rpmsg_create_channel(ctrldev->rpdev, &chinfo);
> + if (!newch) {
> + dev_err(&ctrldev->dev, "rpmsg_create_channel failed\n");
> + return -ENXIO;
> + }
>  
>   return 0;
>  };
> -- 
> 2.17.1
>
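
The name handling in the quoted ioctl (copy a fixed-size name from the
user-supplied endpoint info, then force NUL termination) can be sketched in
userspace roughly as follows. This is only an illustration under stated
assumptions: NAME_SIZE stands in for RPMSG_NAME_SIZE, and the two *_model
structs and fill_chinfo() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <string.h>

#define NAME_SIZE 32	/* assumption: stands in for RPMSG_NAME_SIZE */

/* Hypothetical stand-ins for rpmsg_endpoint_info / rpmsg_channel_info. */
struct ept_info_model { char name[NAME_SIZE]; unsigned int src, dst; };
struct ch_info_model  { char name[NAME_SIZE]; unsigned int src, dst; };

/*
 * Copy the endpoint description into the channel description.
 * The source name is untrusted and may lack a terminator, so the
 * last byte of the destination is always overwritten with NUL.
 */
static void fill_chinfo(struct ch_info_model *ch,
			const struct ept_info_model *ept)
{
	assert(ch && ept);
	memcpy(ch->name, ept->name, NAME_SIZE);
	ch->name[NAME_SIZE - 1] = '\0';
	ch->src = ept->src;
	ch->dst = ept->dst;
}
```

The forced terminator means a name that fills the whole buffer is silently
truncated by one character rather than rejected; whether that is the desired
policy is a design choice for the real driver.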


[v6,0/3] mt8183: Add Mediatek thermal driver and dtsi

2021-01-29 Thread Michael Kao
This patchset adds MT8183 support to mtk_thermal.c.
It adds thermal zones for all the thermal sensors in the SoC, served by
another get_temp op. These zones do not need thermal throttling.
It also binds coolers to the cpu_thermal thermal zone node.

Changes in v6:
- Rebase to kernel-5.11-rc1.
- [1/3]
  - add interrupts property.
- [2/3]
  - add the Tested-by in the commit message.
- [3/3]
  - use the mt->conf->msr[id] instead of conf->msr[id] in the
    _get_sensor_temp and mtk_thermal_bank_temperature.
  - remove the redundant space in _get_sensor_temp and
    mtk_read_sensor_temp.
  - change kmalloc to dev_kmalloc in mtk_thermal_probe.

Changes in v5:
- Rebase to kernel-5.9-rc1.
- Revise the title of cover letter.
- Drop "[v4,7/7] thermal: mediatek: use spinlock to protect PTPCORESEL"
- [2/2]
  - Add the judgement to the version of raw_to_mcelsius.

Changes in v4:
- Rebase to kernel-5.6-rc1.
- [1/7]
  - Squash thermal zone settings in the dtsi from [v3,5/8]
    arm64: dts: mt8183: Increase polling frequency for CPU thermal zone.
  - Remove the property of interrupts and mediatek,hw-reset-temp.
- [2/7]
  - Correct commit message.
- [4/7]
  - Change the target temperature to the 80C and change the commit
    message.
- [6/7]
  - Adjust newline alignment.
  - Fix the judgement on the return value of registering thermal zone.

Changes in v3:
- Rebase to kernel-5.5-rc1.
- [1/8]
  - Update sustainable power of cpu, tzts1~5 and tztsABB.
- [7/8]
  - Do not fail when a non cpu_thermal sensor is not found in
    thermal-zones in the dts, which is normal for mt8173; print a warning
    instead of failing.

Return -EAGAIN instead of -EACCES on the first read of a sensor, which
often yields bogus values. This can avoid the following warning on boot:

  thermal thermal_zone6: failed to read out thermal zone (-13)
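
As a rough illustration of the error-code choice described above, a read
path could report "try again" for an implausible first sample instead of a
hard error. This is a userspace sketch only: report_temp() and the
BOGUS_LIMIT_MCELSIUS threshold are hypothetical, and how the real driver
recognises a bogus sample is not shown in this cover letter.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical plausibility limit, in millidegrees Celsius. */
#define BOGUS_LIMIT_MCELSIUS 200000

/*
 * Hand a sample to the caller, or return -EAGAIN when the sample looks
 * bogus so that the caller can retry later instead of logging an error.
 * *temp_out is left untouched on failure.
 */
static int report_temp(int raw_mcelsius, int *temp_out)
{
	assert(temp_out);
	if (raw_mcelsius > BOGUS_LIMIT_MCELSIUS)
		return -EAGAIN;
	*temp_out = raw_mcelsius;
	return 0;
}
```

The (-13) in the warning above is -EACCES on Linux, which is why switching
the return value changes what gets logged.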

Changes in v2:
- [1/8]
  - Add the sustainable-power,trips,cooling-maps to the tzts1~tztsABB.
- [4/8]
  - Add the min opp of cpu throttle.

Matthias Kaehlcke (1):
  arm64: dts: mt8183: Configure CPU cooling

Michael Kao (2):
  thermal: mediatek: add another get_temp ops for thermal sensors
  arm64: dts: mt8183: add thermal zone node

 arch/arm64/boot/dts/mediatek/mt8183.dtsi | 140 +++
 drivers/thermal/mtk_thermal.c| 100 
 2 files changed, 215 insertions(+), 25 deletions(-)

-- 
2.18.0



Re: [PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller

2021-01-29 Thread Russell King - ARM Linux admin
On Fri, Jan 29, 2021 at 11:26:38AM +0100, Arnd Bergmann wrote:
> Another clarification, as there are actually two independent
> points here:
> 
> * if you can completely remove the readl() above and just write a
>   hardcoded value into the register, or perhaps read the original
>   value once at boot time, that is probably a win because it
>   avoids one of the barriers in the beginning. The datasheet should
>   tell you if there are any bits in the register that have to be
>   preserved
> 
> * Regarding the _relaxed() accessors, it's a lot harder to know
>   whether that is safe, as you first have to show, in particular in case
>   any of the accesses stop being guarded by the spinlock in that
>   case, and whether there may be a case where you have to
>   serialize the memory access against accesses that are still in the
>   store queue or prefetched.
> 
> Whether this matters at all depends mostly on the type of devices
> you are driving on your SoC. If you have any high-speed network
> interfaces that are unable to do cache coherent DMA, any extra
> instruction here may impact the number of packets you can transfer,
> but if all your high-speed devices are connected to a coherent
> interconnect, I would just go with the obvious approach and use
> the safe MMIO accessors everywhere.

For L2 cache code, I would say the opposite, actually, because it is
all too easy to get into a deadlock otherwise.

If you implement the sync callback, that will be called from every
non-relaxed accessor, which means if you need to take some kind of
lock in the sync callback and elsewhere in the L2 cache code, you will
definitely deadlock.

It is safer to put explicit barriers where it is necessary.

Also remember that the barrier in readl() etc is _after_ the read, not
before, and the barrier in writel() is _before_ the write, not after.
The point is to ensure that DMA memory accesses are properly ordered
with the IO-accessing instructions.

So, using readl_relaxed() with a read-modify-write is entirely sensible
provided you do not access DMA memory in between.
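
A userspace sketch of that pattern, a relaxed read-modify-write followed by
one explicitly placed barrier, might look like the following. Everything
here is a stand-in: the register is an ordinary volatile word, the
reg_*_relaxed() helpers only imitate readl_relaxed()/writel_relaxed(), and
the C11 fence is an assumption standing in for whatever barrier the
architecture actually requires.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Stand-ins for the relaxed MMIO accessors: plain volatile accesses. */
static inline uint32_t reg_read_relaxed(const volatile uint32_t *reg)
{
	return *reg;
}

static inline void reg_write_relaxed(volatile uint32_t *reg, uint32_t val)
{
	*reg = val;
}

/* Stand-in for an explicit barrier placed by the driver author. */
static inline void explicit_barrier(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Read-modify-write with relaxed accessors. No DMA memory is touched
 * between the read and the write, so no barrier is needed inside the
 * sequence; a single barrier follows it where ordering matters.
 */
static uint32_t reg_update_bits(volatile uint32_t *reg,
				uint32_t clear, uint32_t set)
{
	uint32_t val;

	assert(reg);
	val = reg_read_relaxed(reg);
	val = (val & ~clear) | set;
	reg_write_relaxed(reg, val);
	explicit_barrier();
	return val;
}
```

Keeping the barrier out of the accessors is what avoids re-entering a sync
callback that might need a lock the caller already holds.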

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: [PATCH v4 4/8] drm/mediatek: enable OVL_LAYER_SMI_ID_EN for multi-layer usecase

2021-01-29 Thread CK Hu
Hi, Hsin-Yi:

On Fri, 2021-01-29 at 15:34 +0800, Hsin-Yi Wang wrote:
> From: Yongqiang Niu 
> 
> enable OVL_LAYER_SMI_ID_EN for multi-layer usecase, without this patch,
> ovl will hang up when more than 1 layer enabled.

Reviewed-by: CK Hu 

> 
> Signed-off-by: Yongqiang Niu 
> Signed-off-by: Hsin-Yi Wang 
> ---
>  drivers/gpu/drm/mediatek/mtk_disp_ovl.c | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c
> index da7e38a28759b..961f87f8d4d15 100644
> --- a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c
> +++ b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c
> @@ -24,6 +24,7 @@
>  #define DISP_REG_OVL_RST              0x0014
>  #define DISP_REG_OVL_ROI_SIZE         0x0020
>  #define DISP_REG_OVL_DATAPATH_CON     0x0024
> +#define OVL_LAYER_SMI_ID_EN           BIT(0)
>  #define OVL_BGCLR_SEL_IN              BIT(2)
>  #define DISP_REG_OVL_ROI_BGCLR        0x0028
>  #define DISP_REG_OVL_SRC_CON          0x002c
> @@ -62,6 +63,7 @@ struct mtk_disp_ovl_data {
>   unsigned int gmc_bits;
>   unsigned int layer_nr;
>   bool fmt_rgb565_is_0;
> + bool smi_id_en;
>  };
>  
>  /**
> @@ -134,6 +136,13 @@ void mtk_ovl_start(struct device *dev)
>  {
>   struct mtk_disp_ovl *ovl = dev_get_drvdata(dev);
>  
> + if (ovl->data->smi_id_en) {
> + unsigned int reg;
> +
> + reg = readl(ovl->regs + DISP_REG_OVL_DATAPATH_CON);
> + reg = reg | OVL_LAYER_SMI_ID_EN;
> + writel_relaxed(reg, ovl->regs + DISP_REG_OVL_DATAPATH_CON);
> + }
>   writel_relaxed(0x1, ovl->regs + DISP_REG_OVL_EN);
>  }
>  
> @@ -142,6 +151,14 @@ void mtk_ovl_stop(struct device *dev)
>   struct mtk_disp_ovl *ovl = dev_get_drvdata(dev);
>  
>   writel_relaxed(0x0, ovl->regs + DISP_REG_OVL_EN);
> + if (ovl->data->smi_id_en) {
> + unsigned int reg;
> +
> + reg = readl(ovl->regs + DISP_REG_OVL_DATAPATH_CON);
> + reg = reg & ~OVL_LAYER_SMI_ID_EN;
> + writel_relaxed(reg, ovl->regs + DISP_REG_OVL_DATAPATH_CON);
> + }
> +
>  }
>  
>  void mtk_ovl_config(struct device *dev, unsigned int w,
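
The ordering in the quoted hunks (set OVL_LAYER_SMI_ID_EN before enabling
the overlay, clear it after disabling) can be modelled in userspace as
below. The struct and function names are hypothetical; the registers are
plain fields rather than MMIO, so this only shows the bit handling and the
per-SoC gating, not real hardware access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OVL_LAYER_SMI_ID_EN (1u << 0)

/* Hypothetical model: two registers plus the per-SoC capability flag. */
struct ovl_model {
	uint32_t datapath_con;	/* models DISP_REG_OVL_DATAPATH_CON */
	uint32_t en;		/* models DISP_REG_OVL_EN */
	bool smi_id_en;		/* models mtk_disp_ovl_data::smi_id_en */
};

/* Start: set the SMI ID enable bit first, then enable the overlay. */
static void ovl_model_start(struct ovl_model *ovl)
{
	assert(ovl);
	if (ovl->smi_id_en)
		ovl->datapath_con |= OVL_LAYER_SMI_ID_EN;
	ovl->en = 0x1;
}

/* Stop: disable the overlay first, then clear the SMI ID enable bit. */
static void ovl_model_stop(struct ovl_model *ovl)
{
	assert(ovl);
	ovl->en = 0x0;
	if (ovl->smi_id_en)
		ovl->datapath_con &= ~OVL_LAYER_SMI_ID_EN;
}
```

Other bits in the datapath register are preserved in both directions, which
is the reason for the read-modify-write in the patch.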



[PATCH 4.9 19/30] bpf: Fix buggy rsh min/max bounds tracking

2021-01-29 Thread Greg Kroah-Hartman
From: Daniel Borkmann 

[ no upstream commit ]

Fix incorrect bounds tracking for RSH opcode. Commit f23cc643f9ba ("bpf: fix
range arithmetic for bpf map access") had a wrong assumption about min/max
bounds. The new dst_reg->min_value needs to be derived by right shifting the
max_val bounds, not min_val, and likewise new dst_reg->max_value needs to be
derived by right shifting the min_val bounds, not max_val. Later stable kernels
than 4.9 are not affected since bounds tracking was overall reworked and they
already track this similarly as in the fix.

Fixes: f23cc643f9ba ("bpf: fix range arithmetic for bpf map access")
Reported-by: Ryota Shiga (Flatt Security)
Signed-off-by: Daniel Borkmann 
Reviewed-by: John Fastabend 
Cc: Josef Bacik 
Signed-off-by: Greg Kroah-Hartman 
---
 kernel/bpf/verifier.c |7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1732,12 +1732,11 @@ static void adjust_reg_min_max_vals(stru
 * unsigned shift, so make the appropriate casts.
 */
if (min_val < 0 || dst_reg->min_value < 0)
-   dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
+   reset_reg_range_values(regs, insn->dst_reg);
else
-   dst_reg->min_value =
-   (u64)(dst_reg->min_value) >> min_val;
+   dst_reg->min_value = (u64)(dst_reg->min_value) >> max_val;
if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
-   dst_reg->max_value >>= max_val;
+   dst_reg->max_value >>= min_val;
break;
default:
reset_reg_range_values(regs, insn->dst_reg);
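
The rule in the commit message can be checked with a small userspace model.
It assumes both the value and the shift amount are known non-negative
ranges; struct range and rsh_bounds() are hypothetical names for this
sketch and not the verifier's own data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical closed interval [min, max] of unsigned values. */
struct range {
	uint64_t min;
	uint64_t max;
};

/*
 * Bounds of (dst >> shift) when dst and shift each lie in a known range.
 * A right shift shrinks the value, so the smallest result pairs the
 * smallest dst with the largest shift, and the largest result pairs the
 * largest dst with the smallest shift.
 */
static struct range rsh_bounds(struct range dst, struct range shift)
{
	struct range out;

	assert(dst.min <= dst.max);
	assert(shift.min <= shift.max && shift.max < 64);
	out.min = dst.min >> shift.max;
	out.max = dst.max >> shift.min;
	return out;
}
```

With dst in [16, 64] and a shift in [1, 3] this gives [2, 32]. The pre-fix
pairing would have produced min = 16 >> 1 = 8 and max = 64 >> 3 = 8, which
excludes reachable results such as 16 >> 3 = 2.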




Re: [PATCH v3] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo

2021-01-29 Thread Oscar Salvador
On Fri, Jan 29, 2021 at 12:34:51PM +0100, David Hildenbrand wrote:
> Let's count the number of CMA pages per zone and print them in
> /proc/zoneinfo.
> 
> Having access to the total number of CMA pages per zone is helpful for
> debugging purposes to know where exactly the CMA pages ended up, and to
> figure out how many pages of a zone might behave differently, even after
> some of these pages might already have been allocated.
> 
> As one example, CMA pages part of a kernel zone cannot be used for
> ordinary kernel allocations but instead behave more like ZONE_MOVABLE.
> 
> For now, we are only able to get the global nr+free cma pages from
> /proc/meminfo and the free cma pages per zone from /proc/zoneinfo.
> 
> Example after this patch when booting a 6 GiB QEMU VM with
> "hugetlb_cma=2G":
>   # cat /proc/zoneinfo | grep cma
>   cma  0
> nr_free_cma  0
>   cma  0
> nr_free_cma  0
>   cma  524288
> nr_free_cma  493016
>   cma  0
>   cma  0
>   # cat /proc/meminfo | grep Cma
>   CmaTotal:2097152 kB
>   CmaFree: 1972064 kB
> 
> Note: We print even without CONFIG_CMA, just like "nr_free_cma"; this way,
>   one can be sure when spotting "cma 0", that there are definitely no
>   CMA pages located in a zone.
> 
> Cc: Andrew Morton 
> Cc: Thomas Gleixner 
> Cc: "Peter Zijlstra (Intel)" 
> Cc: Mike Rapoport 
> Cc: Oscar Salvador 
> Cc: Michal Hocko 
> Cc: Wei Yang 
> Cc: David Rientjes 
> Cc: linux-...@vger.kernel.org
> Signed-off-by: David Hildenbrand 

Looks good to me, I guess it is better to print it unconditionally
so the layout does not change.

Reviewed-by: Oscar Salvador 

thanks
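
The example output in the patch description is self-consistent if one
assumes 4 KiB pages for that QEMU VM (an assumption; the page size is not
stated in the mail). A small helper, hypothetical and userspace only, shows
the conversion between the per-zone page counts in /proc/zoneinfo and the
kB totals in /proc/meminfo:

```c
#include <assert.h>

/*
 * Convert a page count into kB for a given page size in bytes.
 * The page size must be a whole number of kB.
 */
static unsigned long pages_to_kb(unsigned long pages,
				 unsigned long page_size_bytes)
{
	assert(page_size_bytes >= 1024 && page_size_bytes % 1024 == 0);
	return pages * (page_size_bytes / 1024);
}
```

Under that assumption, 524288 "cma" pages correspond to the 2097152 kB of
CmaTotal, and 493016 "nr_free_cma" pages to the 1972064 kB of CmaFree.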

> ---
> 
> The third time is the charm.
> 
> v2 -> v3:
> - Print even without CONFIG_CMA. Use zone_cma_pages().
> - Adjust patch description
> - Dropped Oscar's RB due to the changes
> 
> v1 -> v2:
> - Print/track only with CONFIG_CMA
> - Extend patch description
> 
> ---
>  include/linux/mmzone.h | 15 +++
>  mm/page_alloc.c|  1 +
>  mm/vmstat.c|  6 --
>  3 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index ae588b2f87ef..caafd5e37080 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -503,6 +503,9 @@ struct zone {
>* bootmem allocator):
>*  managed_pages = present_pages - reserved_pages;
>*
> +  * cma pages is present pages that are assigned for CMA use
> +  * (MIGRATE_CMA).
> +  *
>* So present_pages may be used by memory hotplug or memory power
>* management logic to figure out unmanaged pages by checking
>* (present_pages - managed_pages). And managed_pages should be used
> @@ -527,6 +530,9 @@ struct zone {
>   atomic_long_t   managed_pages;
>   unsigned long   spanned_pages;
>   unsigned long   present_pages;
> +#ifdef CONFIG_CMA
> + unsigned long   cma_pages;
> +#endif
>  
>   const char  *name;
>  
> @@ -624,6 +630,15 @@ static inline unsigned long zone_managed_pages(struct zone *zone)
>   return (unsigned long)atomic_long_read(&zone->managed_pages);
>  }
>  
> +static inline unsigned long zone_cma_pages(struct zone *zone)
> +{
> +#ifdef CONFIG_CMA
> + return zone->cma_pages;
> +#else
> + return 0;
> +#endif
> +}
> +
>  static inline unsigned long zone_end_pfn(const struct zone *zone)
>  {
>   return zone->zone_start_pfn + zone->spanned_pages;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b031a5ae0bd5..9a82375bbcb2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2168,6 +2168,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
>   }
>  
>   adjust_managed_page_count(page, pageblock_nr_pages);
> + page_zone(page)->cma_pages += pageblock_nr_pages;
>  }
>  #endif
>  
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 7758486097f9..b2537852d498 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1642,14 +1642,16 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>  "\nhigh %lu"
>  "\nspanned  %lu"
>  "\npresent  %lu"
> -"\nmanaged  %lu",
> +"\nmanaged  %lu"
> +"\ncma  %lu",
>  zone_page_state(zone, NR_FREE_PAGES),
>  min_wmark_pages(zone),
>  low_wmark_pages(zone),
>  high_wmark_pages(zone),
>  zone->spanned_pages,
>  zone->present_pages,
> -zone_managed_pages(zone));
> +zone_managed_pages(zone),
> +zone_cma_pages(zone));
>  
>   seq_printf(m,
>  "\nprotection: (%ld",
> -- 
> 2.29.2
> 
> 

-- 
Oscar Salvador
SUSE L3


Re: [PATCH v2] kretprobe: avoid re-registration of the same kretprobe earlier

2021-01-29 Thread Steven Rostedt
On Fri, 29 Jan 2021 22:29:47 +0900
Masami Hiramatsu  wrote:

> I'll send a patch over this to replace those check with WARN_ON() since
> it's a software bug and should be fixed.

Please use WARN_ON_ONCE()

Thanks!

-- Steve


Re: [PATCH v2] kretprobe: avoid re-registration of the same kretprobe earlier

2021-01-29 Thread Steven Rostedt
On Fri, 29 Jan 2021 15:23:47 +0530
"Naveen N. Rao"  wrote:

> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index f7fb5d135930fa..63a36f33565354 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1530,6 +1530,7 @@ static inline int check_kprobe_rereg(struct kprobe *p)
> ret = -EINVAL;
> mutex_unlock(&kprobe_mutex);
> 
> +   WARN_ON(ret);
> return ret;
>  }

Please use WARN_ON_ONCE(ret);

Thanks,

-- Steve
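
The difference being asked for is that a once-only warning fires on the
first bad call and stays silent afterwards, while the error code is still
returned every time. A userspace sketch of that behaviour follows;
check_rereg_model() and the warnings_emitted counter are hypothetical, and
the static flag only imitates what WARN_ON_ONCE() does in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts warnings printed, so the once-only behaviour can be observed. */
static int warnings_emitted;

/*
 * Model of a re-registration check: always returns -EINVAL for an
 * already registered probe, but warns only the first time.
 */
static int check_rereg_model(bool already_registered)
{
	static bool warned;
	int ret = already_registered ? -EINVAL : 0;

	if (ret && !warned) {
		warned = true;
		warnings_emitted++;
		fprintf(stderr, "WARNING: probe registered twice\n");
	}
	assert(warnings_emitted <= 1);
	return ret;
}
```

A caller that repeatedly hits the bug still gets the error on every call,
but the log is not flooded.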


[GIT PULL] Power management fixes for v5.11-rc6

2021-01-29 Thread Rafael J. Wysocki
Hi Linus,

Please pull from the tag

 git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
 pm-5.11-rc6

with top-most commit fef9c8d28e28a808274a18fbd8cc2685817fd62a

 PM: hibernate: flush swap writer after marking

on top of commit 6ee1d745b7c9fd573fba142a2efdad76a9f1cb04

 Linux 5.11-rc5

to receive power management fixes for 5.11-rc6.

These fix a deadlock in the "kexec jump" code and address a possible
hibernation image creation issue.

Specifics:

 - Fix a deadlock caused by attempting to acquire the same mutex
   twice in a row in the "kexec jump" code (Baoquan He).

 - Modify the hibernation image saving code to flush the unwritten
   data to the swap storage later so as to avoid failing to write the
   image signature which is possible in some cases (Laurent Badel).

Thanks!


---

Baoquan He (1):
  kernel: kexec: remove the lock operation of system_transition_mutex

Laurent Badel (1):
  PM: hibernate: flush swap writer after marking

---

 kernel/kexec_core.c | 2 --
 kernel/power/swap.c | 2 +-
 2 files changed, 1 insertion(+), 3 deletions(-)


Re: [PATCH v2] x86/debug: Fix DR6 handling

2021-01-29 Thread Borislav Petkov
On Fri, Jan 29, 2021 at 04:41:09PM +0100, Oleg Nesterov wrote:
> This seems to fix the problem reported by Jan, see his test-case below.

Should it be part of

tools/testing/selftests/breakpoints/

?

tglx has one destined for there already, wouldn't hurt to have a second
one:

https://lkml.kernel.org/r/87eei4d4k6@nanos.tec.linutronix.de

after applying kernel coding style to that one.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH] bus: mvebu-mbus: make iounmap() symmetric with ioremap()

2021-01-29 Thread Thomas Petazzoni
On Fri, 29 Jan 2021 17:01:35 +0100
Gregory CLEMENT  wrote:

> Could you send me the patch? I don't have it in my mailboxes.

https://lore.kernel.org/lkml/20201112032149.21906-1-chris.pack...@alliedtelesis.co.nz/raw

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: [PATCH] x86: Disable CET instrumentation in the kernel

2021-01-29 Thread Borislav Petkov
On Fri, Jan 29, 2021 at 09:10:34AM -0600, Josh Poimboeuf wrote:
> Maybe eventually.  But the enablement (actually enabling CET/CFI/etc)
> happens in the arch code anyway, right?  So it could be a per-arch
> decision.

Right.

Ok, for this one, what about

Cc: 

?

What are "some configurations of GCC"? If it can be reproduced with
what's released out there, maybe that should go in now, even for 5.11?

Hmm?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH iproute-next v2] devlink: add support for port params get/set

2021-01-29 Thread David Ahern
On 1/25/21 6:48 AM, Oleksandr Mazur wrote:
> Add implementation for the port parameters getting/setting.
> Add bash completion for port param.
> Add man description for port param.
> 

Add example commands here - both set and show. Include a json version of
the show.

> Signed-off-by: Oleksandr Mazur 
> ---
> V2:
> 1) Add bash completion for port param;
> 2) Add man description / examples for port param;
> 
>  bash-completion/devlink |  55 
>  devlink/devlink.c   | 275 +++-
>  man/man8/devlink-port.8 |  65 ++
>  3 files changed, 389 insertions(+), 6 deletions(-)
> 

> diff --git a/devlink/devlink.c b/devlink/devlink.c
> index a2e06644..0fc1d4f0 100644
> --- a/devlink/devlink.c
> +++ b/devlink/devlink.c
> @@ -2706,7 +2706,8 @@ static void pr_out_param_value(struct dl *dl, const char *nla_name,
>   }
>  }
>  
> -static void pr_out_param(struct dl *dl, struct nlattr **tb, bool array)
> +static void pr_out_param(struct dl *dl, struct nlattr **tb, bool array,
> +  bool is_port_param)
>  {
>   struct nlattr *nla_param[DEVLINK_ATTR_MAX + 1] = {};
>   struct nlattr *param_value_attr;
> @@ -2714,6 +2715,7 @@ static void pr_out_param(struct dl *dl, struct nlattr **tb, bool array)
>   int nla_type;
>   int err;
>  
> +

stray newline here




Re: dax alignment problem on arm64 (and other achitectures)

2021-01-29 Thread Pavel Tatashin
On Fri, Jan 29, 2021 at 8:19 AM David Hildenbrand  wrote:
>
> On 29.01.21 03:06, Pavel Tatashin wrote:
> >>> Might be related to the broken custom pfn_valid() implementation for
> >>> ZONE_DEVICE.
> >>>
> >>> https://lkml.kernel.org/r/1608621144-4001-1-git-send-email-anshuman.khand...@arm.com
> >>>
> >>> And essentially ignoring sub-section data in there for now as well (but
> >>> might not be that relevant yet). In addition, this might also be related to
> >>>
> >>> https://lkml.kernel.org/r/161058499000.1840162.702316708443239771.st...@dwillia2-desk3.amr.corp.intel.com
> >>
> >> I will check it, and see what I find. I saw that panic almost a year
> >> ago, things might have changed since then.
> >
> > Hi David,
> >
> > There is no panic anymore, but I also can't offset by 2M anymore, the
> > minimum that works now is 16M, and if alignment is less than 16M
> > creating devdax device fails.
>
> I wonder why we get such different namespace sizes? Where do the
> differences come from? This looks very weird.
>
> >
> > So, I tried the new ARM64 patch that reduces section sizes, and two
> > alignments for pmem: regular 2G alignment, and 2G+16M alignment.
> > (subtracted 16M from the bottom)
> >
> > * 4K page, 6G RAM, 2G PRAM  *
> > BOOT:
> > 4000-1bfff : System RAM
> > 1c000-23fff : namespace0.0
> > DEVDAX:
> > 4000-1bfff : System RAM
> > 1c000-1c21f : namespace0.0
> > 1c220-23fff : dax0.0
> > HOTPLUG:
> > 4000-1bfff : System RAM
> > 1c000-1c21f : namespace0.0
> > 1c800-23fff : dax0.0
> >    1c800-23fff : System RAM (kmem)   128M Wasted (Expected)
>
> The namespace spans 34MB??
>
> >
> > * 4K page, 6G-16M RAM, 2G+16M PRAM  *
> > BOOT:
> > 4000-1beff : System RAM
> > 1bf00-23fff : namespace0.0
> > DEVDAX:
> > 4000-1beff : System RAM
> > 1bf00-1c11f : namespace0.0
> > 1c120-23fff : dax0.0
> > HOTPLUG:
> > 4000-1beff : System RAM
> > 1bf00-1c11f : namespace0.0
> > 1c800-23fff : dax0.0
> >1c800-23fff : System RAM (kmem)   144M Wasted ()
>
> The namespace spans 34MB??

Right, this seems like a bug

>
> >
> > * 64K page, 6G RAM, 2G PRAM  *
> > BOOT:
> > 4000-1bfff : System RAM
> > 1c000-23fff : namespace0.0
> > DEVDAX:
> > 4000-1bfff : System RAM
> > 1c000-1dfff : namespace0.0
> > 1e000-23fff : dax0.0
> > HOTPLUG:
> > 4000-1bfff : System RAM
> > 1c000-1dfff : namespace0.0
>
> The namespace spans 512MB ?!? What?

This is because section size is 512M with 64K pages.

>
> > 1e000-23fff : dax0.0
> >    1e000-23fff : System RAM (kmem)   512M Wasted (Expected)
> >
> > * 64K page, 6G-16M RAM, 2G+16M PRAM  *
> > BOOT:
> > 4000-1beff : System RAM
> > 1bf00-23fff : namespace0.0
> > DEVDAX:
> > 4000-1beff : System RAM
> > 1bf00-1bf3f : namespace0.0
> > 1bf40-23fff : dax0.0
> > HOTPLUG:
> > 4000-1beff : System RAM
> > 1bf00-1bf3f : namespace0.0
>
> The namespace now consumes 4MB ?!?
>
> > 1c000-23fff : dax0.0
> >    1c000-23fff : System RAM (kmem)   16M Wasted (Optimal)
>
> Good :) I guess more optimal would be 2MB/0MB :)

Agree, but for the offset 16M this is optimal, because 16M is smaller
than section size.

>
> >
> > In all three cases only System RAM, namespace0.0, and dax0.0 were
> > printed from /proc/iomem.
> > BOOT    content of iomem right after boot
> > DEVDAX  content of iomem after devdax is created:
> >     ndctl create-namespace --mode devdax -e namespace0.0
> > HOTPLUG content of iomem after dax0.0 is hotplugged:
> >     echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
> >     echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
> >
> >
> > The most surprising part is that with 4K pages and a 16M offset, 144M is
> > wasted. For whatever reason, when devdax is created, 34M is lost to
> > the label. Something is wrong here. However, I am happy with the 64K
> > pages result, where only 16M is wasted. Of course, optimally we
> > should not be wasting any memory here, but it is still much better than
> > what we have now.
>
> Definitely, but we should try figuring out what's going on here. I
> assume on x86-64 it behaves differently?

Yes, we should root cause this. I strongly suspect that alignment
miscalculations somewhere cause this memory waste with the 16M
offset. I am also not sure why the 2M label size was increased,
and why 16M is now an alignment requirement.
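
The wasted amounts quoted above are consistent with a simple model in which
the hotpluggable part starts at the first section-aligned address after the
namespace start plus the label area. The sketch below is a userspace
illustration under several assumptions: the namespace start addresses
(7 GiB, and 7 GiB minus 16 MiB) are reconstructed because the addresses in
the listing are truncated, a 128 MiB section size is assumed for 4K pages
with the section-size patch applied, and align_up() and wasted_bytes() are
hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Bytes between the namespace start and the first section-aligned
 * address after the label area, i.e. memory that cannot be hotplugged.
 */
static uint64_t wasted_bytes(uint64_t ns_start, uint64_t label_size,
			     uint64_t section_size)
{
	return align_up(ns_start + label_size, section_size) - ns_start;
}
```

With these inputs the model reproduces the reported 128M, 144M, 512M and
16M figures, which suggests the 144M case is the 34M label pushing the
start past one section boundary and up to the next.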

I tested on x86, and got pretty much the same results as on ARM64: a 2M
offset is no longer allowed (16M is the minimum), and even with a 16M offset,
144M is wasted. Here is the full QEMU command if anyone wants to reproduce it:


KERNEL_PARAM='console=ttyS0 ip=dhcp'
KERNEL_PARAM+=' memmap=2G!8G'
#KERNEL_PARAM+=' memmap=2064M!8176M'

qemu-system-x86_64


Re: [PATCH v2] btrfs: Avoid calling btrfs_get_chunk_map() twice

2021-01-29 Thread Josef Bacik

On 1/27/21 8:57 AM, Michal Rostecki wrote:

> From: Michal Rostecki
>
> Before this change, the btrfs_get_io_geometry() function was calling
> btrfs_get_chunk_map() to get the extent mapping, necessary for
> calculating the I/O geometry. It was using that extent mapping only
> internally and freeing the pointer after its execution.
>
> That resulted in calling btrfs_get_chunk_map() de facto twice by the
> __btrfs_map_block() function. It was calling btrfs_get_io_geometry()
> first and then calling btrfs_get_chunk_map() directly to get the extent
> mapping, used by the rest of the function.
>
> This change fixes that by passing the extent mapping to the
> btrfs_get_io_geometry() function as an argument.
>
> v2:
> When btrfs_get_chunk_map() returns an error in btrfs_submit_direct():
> - Use errno_to_blk_status(PTR_ERR(em)) as the status
> - Set em to NULL
>
> Signed-off-by: Michal Rostecki


This panicked all of my test VMs in their overnight xfstests runs; the panic
is this:

[ 2449.936502] BTRFS critical (device dm-7): mapping failed logical 1113825280 bio len 40960 len 24576

[ 2449.937073] [ cut here ]
[ 2449.937329] kernel BUG at fs/btrfs/volumes.c:6450!
[ 2449.937604] invalid opcode:  [#1] SMP NOPTI
[ 2449.937855] CPU: 0 PID: 259045 Comm: kworker/u5:0 Not tainted 5.11.0-rc5+ #122
[ 2449.938252] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 2449.938713] Workqueue: btrfs-worker-high btrfs_work_helper
[ 2449.939016] RIP: 0010:btrfs_map_bio.cold+0x5a/0x5c
[ 2449.939392] Code: 37 87 ff ff e8 ed d4 8a ff 48 83 c4 18 e9 b5 52 8b ff 49 89 c8 4c 89 fa 4c 89 f1 48 c7 c6 b0 c0 61 8b 48 89 ef e8 11 87 ff ff <0f> 0b 4c 89 e7 e8 42 09 86 ff e9 fd 59 8b ff 49 8b 7a 50 44 89 f2
[ 2449.940402] RSP: :9f24c1637d90 EFLAGS: 00010282
[ 2449.940689] RAX: 0057 RBX: 90c78ff716b8 RCX: 
[ 2449.941080] RDX: 90c7fbc27ae0 RSI: 90c7fbc19110 RDI: 90c7fbc19110
[ 2449.941467] RBP: 90c7911d4000 R08:  R09: 
[ 2449.941853] R10: 9f24c1637b48 R11: 8b9723e8 R12: 
[ 2449.942243] R13:  R14: a000 R15: 4263a000
[ 2449.942632] FS:  () GS:90c7fbc0() knlGS:
[ 2449.943072] CS:  0010 DS:  ES:  CR0: 80050033
[ 2449.943386] CR2: 5575163c3080 CR3: 00010ad6c004 CR4: 00370ef0
[ 2449.943772] Call Trace:
[ 2449.943915]  ? lock_release+0x1c3/0x290
[ 2449.944135]  run_one_async_done+0x3a/0x60
[ 2449.944360]  btrfs_work_helper+0x136/0x520
[ 2449.944588]  process_one_work+0x26e/0x570
[ 2449.944812]  worker_thread+0x55/0x3c0
[ 2449.945016]  ? process_one_work+0x570/0x570
[ 2449.945250]  kthread+0x137/0x150
[ 2449.945430]  ? __kthread_bind_mask+0x60/0x60
[ 2449.945666]  ret_from_fork+0x1f/0x30

It happens when you run btrfs/060.  Please make sure to run xfstests against
patches before you submit them upstream.  Thanks,


Josef
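
For readers following the thread, the intent of the quoted patch (look the
mapping up once and hand it to the geometry helper) can be sketched in
userspace as below. This only illustrates the call structure; it says
nothing about why the actual patch trips the BUG above. All names are
hypothetical, and the "bytes left in the stripe" computation is a made-up
placeholder for the real geometry calculation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the extent mapping of a chunk. */
struct chunk_map_model {
	unsigned long stripe_len;
};

static int lookups;	/* counts how often the mapping is looked up */
static struct chunk_map_model the_map = { .stripe_len = 65536 };

/* Models the lookup that the patch wants to perform only once. */
static struct chunk_map_model *get_chunk_map_model(void)
{
	lookups++;
	return &the_map;
}

/*
 * Geometry helper after the change: it takes a mapping the caller
 * already holds instead of looking it up (and freeing it) itself.
 */
static unsigned long io_geometry_model(const struct chunk_map_model *em,
				       unsigned long offset)
{
	assert(em && em->stripe_len);
	return em->stripe_len - (offset % em->stripe_len);
}

/* Caller: a single lookup, reused for the geometry calculation. */
static unsigned long map_block_model(unsigned long offset)
{
	struct chunk_map_model *em = get_chunk_map_model();

	return io_geometry_model(em, offset);
}
```

Passing the mapping in also moves ownership to the caller, so every error
path in the caller has to release it, which is the kind of detail the v2
notes about btrfs_submit_direct() are dealing with.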


Re: kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")

2021-01-29 Thread Peter Zijlstra
On Fri, Jan 29, 2021 at 10:59:52AM -0500, Steven Rostedt wrote:
> On Fri, 29 Jan 2021 22:40:11 +0900
> Masami Hiramatsu  wrote:
> 
> > > So what, they can all happen with random locks held. Marking them as NMI
> > > enables a whole bunch of sanity checks that are entirely appropriate.  
> > 
> > How about introducing an idea of Asynchronous NMI (ANMI) and Synchronous
> > NMI (SNMI)? kprobes and ftrace is synchronously called and can be controlled
> > (we can expect the context) but ANMI may be caused by asynchronous 
> > hardware events on any context.
> > 
> > If we can distinguish those 2 NMIs on preempt count, bpf people can easily
> > avoid the inevitable situation.
> 
> I don't like the name NMI IN SNMI, because they are not NMIs. They are
> actually more like kernel exceptions. Even page faults in the kernel is
> similar to a kprobe breakpoint or ftrace. It can happen anywhere, with any
> lock held. Perhaps we need a kernel exception context? Which by definition
> is synchronous.

What problem are you trying to solve? AFAICT all these contexts have the
same restrictions, why try and muck about with different names for the
same thing?


Re: [REGRESSION] "ALSA: HDA: Early Forbid of runtime PM" broke my laptop's internal audio

2021-01-29 Thread Takashi Iwai
On Fri, 29 Jan 2021 17:12:08 +0100,
Michael Catanzaro wrote:
> 
> On Fri, Jan 29, 2021 at 9:30 am, Michael Catanzaro
>  wrote:
> > OK, I found "ALSA: hda/via: Apply the workaround generically for
> > Clevo machines" which was just merged yesterday. So I will test
> > again to find out.
> 
> Hi Takashi, hi Harsha,
> 
> I can confirm that the problem is fixed by this commit:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4961167bf7482944ca09a6f71263b9e47f949851

Thanks, good to hear.

Then I think we can drop the entry from power_save_denylist in
hda_intel.c.  Could you check that it still works with the patch below?


thanks,

Takashi

--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2217,8 +2217,6 @@ static const struct snd_pci_quirk power_save_denylist[] = {
/* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */
SND_PCI_QUIRK(0x1043, 0x8733, "Asus Prime X370-Pro", 0),
/* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */
-   SND_PCI_QUIRK(0x1558, 0x6504, "Clevo W65_67SB", 0),
-   /* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */
SND_PCI_QUIRK(0x1028, 0x0497, "Dell Precision T3600", 0),
/* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */
/* Note the P55A-UD3 and Z87-D3HP share the subsys id for the HDA dev */


Re: [PATCH v5 2/7] pwm: pca9685: Support hardware readout

2021-01-29 Thread Clemens Gruber
Hi Sven,

On Fri, Jan 29, 2021 at 08:42:13AM -0500, Sven Van Asbroeck wrote:
> On Mon, Jan 11, 2021 at 3:35 PM Uwe Kleine-König
>  wrote:
> >
> > My position here is: A consumer should disable a PWM before calling
> > pwm_put. The driver should however not enforce this and so should not
> > modify the hardware state in .free().
> >
> > Also .probe should not change the PWM configuration.
> 
> I agree that this is the most user-friendly behaviour.
> 
> The problem however with the pca9685 is that it has many degrees of
> freedom: there are many possible register values which produce the same
> physical chip outputs.
> 
> This could lead to a situation where, if .probe() does not reset the register
> values, subsequent writes may lead to different outputs than expected.
> 
> One possible solution is to write .get_state() so that it always reads the
> correct state, even if "unconventional" register settings are present, i.e.
> those written by an outside entity, e.g. a bootloader. Then write that
> state back using driver conventions.
> 
> This may be trickier than it sounds - after all we've learnt that the pca9685
> looks simple on the surface, but is actually quite challenging to get right.
> 
> Clemens, Uwe, what do you think?

Ok, so you suggest we extend our get_state logic to deal with cases
like the following:
If neither full OFF nor full ON is set && ON == OFF, we should probably
set the full OFF bit to disable the PWM and log a warning message?
(e.g. "invalid register setting detected: pwm disabled" ?)
If the ON registers are set and the nxp,staggered-outputs property is
not, I'd calculate (off - on) & 4095, set the OFF register to that value
and clear the ON register.

And then call our get_state in .probe, followed by a write of the
resulting / fixed-up state?

This would definitely solve the problem of invalid/unconventional values
set by the bootloader and avoid inconsistencies.
Sounds good to me!
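
The fix-up described above could look roughly like the sketch below. This is only an illustration under assumptions, not the driver's code: the struct, the function name and the mask macro are invented for the example, and the warning message is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCA9685_COUNTER_MASK 4095u	/* 12-bit counter: values 0..4095 */

/* Hypothetical snapshot of one channel's registers (not the driver's struct). */
struct pca9685_chan_regs {
	uint16_t on;		/* 12-bit ON count */
	uint16_t off;		/* 12-bit OFF count */
	bool full_on;		/* "full ON" bit */
	bool full_off;		/* "full OFF" bit */
};

/*
 * Rewrite unconventional register values into the conventions discussed
 * in this thread. Returns true if the values changed and would need to be
 * written back to the chip.
 */
static bool pca9685_fixup_regs(struct pca9685_chan_regs *r, bool staggered)
{
	/* Full ON / full OFF are already unambiguous; leave them alone. */
	if (r->full_on || r->full_off)
		return false;

	/* ON == OFF with neither full bit set: treat as invalid and disable. */
	if (r->on == r->off) {
		r->full_off = true;
		return true;
	}

	/* Without staggered outputs, move the pulse so that it starts at 0. */
	if (!staggered && r->on != 0) {
		r->off = (uint16_t)(((unsigned int)r->off - r->on) &
				    PCA9685_COUNTER_MASK);
		r->on = 0;
		return true;
	}

	return false;
}
```

For example, ON = 1000 and OFF = 500 without staggering would become ON = 0 and OFF = 3596, which keeps the same pulse width modulo the 4096-step period.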

If Thierry and Uwe have no objections, I can send out a new round of
patches in the upcoming weeks.

My current goal is to get the changes into 5.13.

Thanks,
Clemens


Re: dax alignment problem on arm64 (and other architectures)

2021-01-29 Thread Pavel Tatashin
On Fri, Jan 29, 2021 at 9:51 AM Joao Martins  wrote:
>
> Hey Pavel,
>
> On 1/29/21 1:50 PM, Pavel Tatashin wrote:
> >> Since we last talked about this the enabling for EFI "Special Purpose"
> >> / Soft Reserved Memory has gone upstream and instantiates device-dax
> >> instances for address ranges marked with EFI_MEMORY_SP attribute.
> >> Critically this way of declaring device-dax removes the consideration
> >> of it as persistent memory and as such no metadata reservation. So, if
> >> you are willing to maintain the metadata external to the device (which
> >> seems reasonable for your environment) and have your platform firmware
> >> / kernel command line mark it as EFI_CONVENTIONAL_MEMORY +
> >> EFI_MEMORY_SP, then these reserve-free dax-devices will surface.
> >
> > Hi Dan,
> >
> > This is cool. Does it allow conversion between devdax and fsdax so DAX
> > aware filesystem can be installed and data can be put there to be
> > preserved across the reboot?
> >
>
> fwiw wrt to the 'preserved across kexec' part, you are going to need
> something conceptually similar to snippet below the scissors mark.
> Alternatively, we could fix kexec userspace to add conventional memory
> ranges (without the SP attribute part) when it sees a Soft-Reserved region.
> But can't tell which one is the right thing to do.

Hi Joao,

Isn't it just a matter of appending arguments to the kernel parameters
during a kexec reboot with the Soft-Reserved region specified, or am I
missing something? I understand that with the fileload kexec syscall we
might accidentally load segments onto a reserved region, but with the
original kexec syscall, where we can specify a destination for each
segment, that should not be a problem with today's kexec tools.

I agree that preserving it automatically as you are proposing, would
make more sense, instead of fiddling with kernel parameters and
segment destinations.

Thank you,
Pasha

>
> At the moment, HMAT ranges (or those defined with efi_fake_mem=) aren't
> preserved not because of anything special with HMAT, but simply because
> the EFI memmap conventional ram ranges are not preserved (only runtime
> services). And HMAT/efi_fake_mem expects these to be based on EFI memmap.
>
> >8--
>
> From: Joao Martins 
> Subject: x86/efi: add Conventional Memory ranges to runtime-map
>
> Through EFI/HMAT certain ranges are marked with Specific Purpose
> EFI attribute (EFI_MEMORY_SP). These ranges are usually
> specified in a memory descriptor of type Conventional Memory.
>
> We only ever expose regions to the runtime-map that were marked
> with efi_mem_reserve(). Currently these comprise the Runtime
> Data/Code and Boot data. Everything else gets lost, so on a kexec
> boot, if we had an HMAT (or efi_fake_mem= marked regions) the second
> kernel kexec will lose this information, and expose this memory
> as regular RAM.
>
> To address that, let's add the Conventional Memory ranges from the
> firmware EFI memory map to the runtime. kexec then picks these up
> on kexec load. Specifically, we save the fw memmap first, and when
> we enter EFI virtual mode which on x86 is the latest point where
> we filter the EFI memmap to construct one with only runtime services.
>
> Signed-off-by: Joao Martins 
> ---
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 8a26e705cb06..c244da8b185d 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -663,6 +663,53 @@ static bool should_map_region(efi_memory_desc_t *md)
> return false;
>  }
>
> +static void __init efi_fw_memmap_restore(void **map, int left,
> +int *count, int *pg_shift)
> +{
> +   struct efi_memory_map_data *data = &efi_fw_memmap;
> +   void *fw_memmap, *new_memmap = *map;
> +   unsigned long desc_size;
> +   int i, nr_map;
> +
> +   if (!data->phys_map)
> +   return;
> +
> +   /* create new EFI memmap */
> +   fw_memmap = early_memremap(data->phys_map, data->size);
> +   if (!fw_memmap) {
> +   return;
> +   }
> +
> +   desc_size = data->desc_size;
> +   nr_map = data->size / desc_size;
> +
> +   for (i = 0; i < nr_map; i++) {
> +   efi_memory_desc_t *md = efi_early_memdesc_ptr(fw_memmap,
> +   desc_size, i);
> +
> +   if (md->type != EFI_CONVENTIONAL_MEMORY)
> +   continue;
> +
> +   if (left < desc_size) {
> +   new_memmap = realloc_pages(new_memmap, *pg_shift);
> +   if (!new_memmap) {
> +   early_memunmap(fw_memmap, data->size);
> +   return;
> +   }
> +
> +   left += PAGE_SIZE << *pg_shift;
> +   (*pg_shift)++;
> +   }
> +
> +   memcpy(new_memmap + (*count * desc_size), md, desc_size);
> +
> +   

Re: [PATCH 1/3] kvfree_rcu: Allocate a page for a single argument

2021-01-29 Thread Uladzislau Rezki
On Fri, Jan 29, 2021 at 09:56:29AM +0100, Michal Hocko wrote:
> On Thu 28-01-21 19:02:37, Uladzislau Rezki wrote:
> [...]
> > >From 0bdb8ca1ae62088790e0a452c4acec3821e06989 Mon Sep 17 00:00:00 2001
> > From: "Uladzislau Rezki (Sony)" 
> > Date: Wed, 20 Jan 2021 17:21:46 +0100
> > Subject: [PATCH v2 1/1] kvfree_rcu: Directly allocate page for 
> > single-argument
> >  case
> > 
> > Single-argument kvfree_rcu() must be invoked from sleepable contexts,
> > so we can directly allocate pages.  Furthermore, the fallback in case
> > of page-allocation failure is the high-latency synchronize_rcu(), so it
> > makes sense to do these page allocations from the fastpath, and even to
> > permit limited sleeping within the allocator.
> > 
> > This commit therefore allocates if needed on the fastpath using
> > GFP_KERNEL|__GFP_NORETRY.
> 
> Yes, __GFP_NORETRY as a lightweight allocation mode should be fine. It
> is more robust than __GFP_NOWAIT on memory usage spikes.  The caller is
> prepared to handle the failure which is likely much less disruptive than
> OOM or potentially heavy reclaim __GFP_RETRY_MAYFAIL.
> 
> I cannot give you ack as I am not familiar with the code but this makes
> sense to me.
> 
No problem, I can separate it. We can have a patch on top of what we have so
far. The patch only modifies the gfp_mask passed to __get_free_pages():

>From ec2feaa9b7f55f73b3b17e9ac372151c1aab5ae0 Mon Sep 17 00:00:00 2001
From: "Uladzislau Rezki (Sony)" 
Date: Fri, 29 Jan 2021 17:16:03 +0100
Subject: [PATCH 1/1] kvfree_rcu: replace __GFP_RETRY_MAYFAIL by __GFP_NORETRY

__GFP_RETRY_MAYFAIL is a bit heavy from the reclaim point of view,
and therefore time consuming. That is not optimal, and there is
no need to try so hard because we have a fallback path.

__GFP_NORETRY, in turn, can perform some light-weight reclaim
and fails quickly under high memory pressure or in low-memory
conditions.

In general there are four simple criteria we would like to
achieve:
a) minimize hitting the fallback path;
b) avoid invoking the OOM killer;
c) make a light-weight page request;
d) avoid dipping into the emergency reserves.

Signed-off-by: Uladzislau Rezki (Sony) 
---
 kernel/rcu/tree.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 70ddc339e0b7..1e862120db9e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3489,8 +3489,20 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
bnode = get_cached_bnode(*krcp);
if (!bnode && can_alloc) {
krc_this_cpu_unlock(*krcp, *flags);
+
+   // __GFP_NORETRY - allows a light-weight direct reclaim
+   // what is OK from minimizing of fallback hitting point of
+   // view. Apart of that it forbids any OOM invoking what is
+   // also beneficial since we are about to release memory soon.
+   //
+   // __GFP_NOMEMALLOC - prevents from consuming of all the
+   // memory reserves. Please note we have a fallback path.
+   //
+   // __GFP_NOWARN - it is supposed that an allocation can
+   // be failed under low memory or high memory pressure
+   // scenarios.
bnode = (struct kvfree_rcu_bulk_data *)
-   __get_free_page(GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOMEMALLOC | __GFP_NOWARN);
+   __get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
*krcp = krc_this_cpu_lock(flags);
}
 
-- 
2.20.1

--
Vlad Rezki


Re: [net-next PATCH v4 01/15] Documentation: ACPI: DSD: Document MDIO PHY

2021-01-29 Thread Rafael J. Wysocki
On Fri, Jan 29, 2021 at 7:48 AM Calvin Johnson
 wrote:
>
> On Thu, Jan 28, 2021 at 02:27:00PM +0100, Rafael J. Wysocki wrote:
> > On Thu, Jan 28, 2021 at 2:12 PM Calvin Johnson
> >  wrote:
> > >
> > > On Thu, Jan 28, 2021 at 01:00:40PM +0100, Rafael J. Wysocki wrote:
> > > > On Thu, Jan 28, 2021 at 12:27 PM Calvin Johnson
> > > >  wrote:
> > > > >
> > > > > Hi Rafael,
> > > > >
> > > > > Thanks for the review. I'll work on all the comments.
> > > > >
> > > > > On Fri, Jan 22, 2021 at 08:22:21PM +0100, Rafael J. Wysocki wrote:
> > > > > > On Fri, Jan 22, 2021 at 4:43 PM Calvin Johnson
> > > > > >  wrote:
> > > > > > >
> > > > > > > Introduce ACPI mechanism to get PHYs registered on a MDIO bus and
> > > > > > > provide them to be connected to MAC.
> > > > > > >
> > > > > > > Describe properties "phy-handle" and "phy-mode".
> > > > > > >
> > > > > > > Signed-off-by: Calvin Johnson 
> > > > > > > ---
> > > > > > >
> > > > > > > Changes in v4:
> > > > > > > - More cleanup
> > > > > >
> > > > > > > This looks much better than the previous versions IMV, some nits
> > > > > > > below.
> > > > > >
> > > > > > > Changes in v3: None
> > > > > > > Changes in v2:
> > > > > > > - Updated with more description in document
> > > > > > >
> > > > > > >  Documentation/firmware-guide/acpi/dsd/phy.rst | 129 
> > > > > > > ++
> > > > > > >  1 file changed, 129 insertions(+)
> > > > > > >  create mode 100644 Documentation/firmware-guide/acpi/dsd/phy.rst
> > > > > > >
> > > > > > > diff --git a/Documentation/firmware-guide/acpi/dsd/phy.rst 
> > > > > > > b/Documentation/firmware-guide/acpi/dsd/phy.rst
> > > > > > > new file mode 100644
> > > > > > > index ..76fca994bc99
> > > > > > > --- /dev/null
> > > > > > > +++ b/Documentation/firmware-guide/acpi/dsd/phy.rst
> > > > > > > @@ -0,0 +1,129 @@
> > > > > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > > > > +
> > > > > > > +=
> > > > > > > +MDIO bus and PHYs in ACPI
> > > > > > > +=
> > > > > > > +
> > > > > > > +The PHYs on an MDIO bus [1] are probed and registered using
> > > > > > > +fwnode_mdiobus_register_phy().
> > > > > >
> > > > > > Empty line here, please.
> > > > > >
> > > > > > > +Later, for connecting these PHYs to MAC, the PHYs registered on 
> > > > > > > the
> > > > > > > +MDIO bus have to be referenced.
> > > > > > > +
> > > > > > > +The UUID given below should be used as mentioned in the "Device 
> > > > > > > Properties
> > > > > > > +UUID For _DSD" [2] document.
> > > > > > > +   - UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301
> > > > > >
> > > > > > I would drop the above paragraph.
> > > > > >
> > > > > > > +
> > > > > > > +This document introduces two _DSD properties that are to be used
> > > > > > > +for PHYs on the MDIO bus.[3]
> > > > > >
> > > > > > I'd say "for connecting PHYs on the MDIO bus [3] to the MAC layer."
> > > > > > above and add the following here:
> > > > > >
> > > > > > "These properties are defined in accordance with the "Device
> > > > > > Properties UUID For _DSD" [2] document and the
> > > > > > daffd814-6eba-4d8c-8a91-bc9bbf4aa301 UUID must be used in the Device
> > > > > > Data Descriptors containing them."
> > > > > >
> > > > > > > +
> > > > > > > +phy-handle
> > > > > > > +--
> > > > > > > +For each MAC node, a device property "phy-handle" is used to 
> > > > > > > reference
> > > > > > > +the PHY that is registered on an MDIO bus. This is mandatory for
> > > > > > > +network interfaces that have PHYs connected to MAC via MDIO bus.
> > > > > > > +
> > > > > > > +During the MDIO bus driver initialization, PHYs on this bus are 
> > > > > > > probed
> > > > > > > +using the _ADR object as shown below and are registered on the 
> > > > > > > MDIO bus.
> > > > > >
> > > > > > Do you want to mention the "reg" property here?  I think it would be
> > > > > > useful to do that.
> > > > >
> > > > > No. I think we should adhere to _ADR in MDIO case. The "reg" property 
> > > > > for ACPI
> > > > > may be useful for other use cases that Andy is aware of.
> > > >
> > > > The code should reflect this, then.  I mean it sounds like you want to
> > > > check the "reg" property only if this is a non-ACPI node.
> > >
> > > Right. For MDIO case, that is what is required.
> > > "reg" for DT and "_ADR" for ACPI.
> > >
> > > However, Andy pointed out [1] that ACPI nodes can also hold reg property 
> > > and
> > > therefore, fwnode_get_id() need to be capable to handling that situation 
> > > as
> > > well.
> >
> > No, please don't confuse those two things.
> >
> > Yes, ACPI nodes can also hold a "reg" property, but the meaning of it
> > depends on the binding which is exactly my point: _ADR is not a
> > fallback replacement for "reg" in general and it is not so for MDIO
> > too.  The new function as proposed doesn't match the MDIO requirements
> > and so it should not be used for MDIO.
> >
> > For MDIO, the exact flow mentioned above needs to be implemented (and
> > if someone 

Re: Quick review of RCU-related patches in v5.10.8-rt23

2021-01-29 Thread Paul E. McKenney
On Fri, Jan 29, 2021 at 05:11:37PM +0100, Sebastian Andrzej Siewior wrote:
> On 2021-01-28 11:50:37 [-0800], Paul E. McKenney wrote:
> > Hello, Sebastian,
> 
> Hi Paul,
> 
> > Just doing my periodic (but decidedly non-real-time) scan of RCU-related
> > patches in -rt, in this case v5.10.8-rt23:
> > 
> > f3541b467fbb ("sched: Do not account rcu_preempt_depth on RT in might_sleep()")
> > If the scheduler maintainers are OK with their part of this patch,
> > looks good to me, given CONFIG_PREEMPT_RT.  Feel free to add:
> > Acked-by: Paul E. McKenney 
> 
> Thank. I think we should pump it together with the rt-mutex part. But I
> add a note.
> 
> > d8c5a7d75e08 ("rcutorture: Avoid problematic critical section nesting on RT")
> > This one I need to understand better.  I do like the use of local
> > variables to make the "if" conditions less unruly.
> 
> This originated in
>   https://lkml.kernel.org/r/20190911165729.11178-6-sw...@redhat.com
> 
> I planned to post it upstream last cycle but it appears that it broke
> apart and I did not yet look how to fix it.

I do recall the discussion, I just need to get up to speed on the
details.  ;-)

> > The rest are in -rcu already:
> > 
> > a163ef8687a1 ("rcu: make RCU_BOOST default on RT")
> > Commit 2341bc4a0311 in -rcu.  In yesterday's pull request.
> > 5ffd75a96828 ("rcu: Use rcuc threads on PREEMPT_RT as we did")
> > Commit 8b9a0ecc7ef5 in -rcu.  In yesterday's pull request.
> > e0b671bca2e7 ("rcu: enable rcu_normal_after_boot by default for RT")
> > Commit 36221e109eb2 in -rcu.  In yesterday's pull request.
> > e27ef68731a1 ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
> > This one is in v5.10 mainline.
> 
>  \o/
>  
> > Any reason I shouldn't pull in db93e2f1b4b0 ("rcu: Prevent false positive
> > softirq warning on RT") for v5.13?
> 
> tglx has a version of that with your Reviewed-by tag on it in this
> softirq tree waiting. So I guess just sit it out ;)

Works for me!

Thanx, Paul

> Thank you for looking Paul.
> > Thanx, Paul
> 
> Sebastian


[PATCH] lib: crc8: Pointer to data block should be const

2021-01-29 Thread Richard Fitzgerald
crc8() does not change the data passed to it, so the pointer argument
should be declared const. This avoids callers that receive const data
having to cast it to a non-const pointer to call crc8().

Signed-off-by: Richard Fitzgerald 
---
 include/linux/crc8.h | 2 +-
 lib/crc8.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/crc8.h b/include/linux/crc8.h
index 13c8dabb0441..674045c59a04 100644
--- a/include/linux/crc8.h
+++ b/include/linux/crc8.h
@@ -96,6 +96,6 @@ void crc8_populate_msb(u8 table[CRC8_TABLE_SIZE], u8 polynomial);
  * Williams, Ross N., rossross.net
  * (see URL http://www.ross.net/crc/download/crc_v3.txt).
  */
-u8 crc8(const u8 table[CRC8_TABLE_SIZE], u8 *pdata, size_t nbytes, u8 crc);
+u8 crc8(const u8 table[CRC8_TABLE_SIZE], const u8 *pdata, size_t nbytes, u8 crc);
 
 #endif /* __CRC8_H_ */
diff --git a/lib/crc8.c b/lib/crc8.c
index 595a5a75e3cd..1ad8e501d9b6 100644
--- a/lib/crc8.c
+++ b/lib/crc8.c
@@ -71,7 +71,7 @@ EXPORT_SYMBOL(crc8_populate_lsb);
  * @nbytes: number of bytes in data buffer.
  * @crc: previous returned crc8 value.
  */
-u8 crc8(const u8 table[CRC8_TABLE_SIZE], u8 *pdata, size_t nbytes, u8 crc)
+u8 crc8(const u8 table[CRC8_TABLE_SIZE], const u8 *pdata, size_t nbytes, u8 crc)
 {
/* loop over the buffer data */
while (nbytes-- > 0)
-- 
2.20.1
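
To illustrate what the const qualifier buys a caller, here is a small userspace sketch. It is not the kernel implementation: the demo_* names, the bitwise table construction and the 0x07 polynomial are assumptions made for the example. Only the shape of the prototype (const table, const data pointer) follows the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CRC8_TABLE_SIZE 256

/* Build an MSB-first lookup table for the given polynomial. */
static void demo_crc8_populate_msb(uint8_t table[DEMO_CRC8_TABLE_SIZE],
				   uint8_t polynomial)
{
	for (int i = 0; i < DEMO_CRC8_TABLE_SIZE; i++) {
		uint8_t crc = (uint8_t)i;

		/* Shift out eight bits, folding in the polynomial on carry. */
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ polynomial)
					   : (uint8_t)(crc << 1);
		table[i] = crc;
	}
}

/* Same shape as the patched prototype: the data pointer is const. */
static uint8_t demo_crc8(const uint8_t table[DEMO_CRC8_TABLE_SIZE],
			 const uint8_t *pdata, size_t nbytes, uint8_t crc)
{
	/* Loop over the buffer data, one table lookup per byte. */
	while (nbytes-- > 0)
		crc = table[(crc ^ *pdata++) & 0xff];

	return crc;
}

/* A caller that only holds const data can now pass it without a cast. */
static uint8_t demo_checksum_frame(const uint8_t *frame, size_t len)
{
	uint8_t table[DEMO_CRC8_TABLE_SIZE];

	demo_crc8_populate_msb(table, 0x07);	/* polynomial chosen for the demo */
	return demo_crc8(table, frame, len, 0);
}
```

With the old non-const prototype, demo_checksum_frame() would have needed a cast such as (uint8_t *)frame to compile without a warning.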



[git pull] IOMMU Fixes for Linux v5.11-rc5

2021-01-29 Thread Joerg Roedel
Hi Linus,

The following changes since commit 6ee1d745b7c9fd573fba142a2efdad76a9f1cb04:

  Linux 5.11-rc5 (2021-01-24 16:47:14 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-fixes-v5.11-rc5

for you to fetch changes up to 29b32839725f8c89a41cb6ee054c85f3116ea8b5:

  iommu/vt-d: Do not use flush-queue when caching-mode is on (2021-01-28 13:59:02 +0100)


IOMMU Fixes for Linux v5.11-rc5

Including:

- AMD IOMMU Fix to make sure features are detected before they
  are queried.

- Intel IOMMU address alignment check fix for an IOTLB flushing
  command.

- Performance fix for Intel IOMMU to make sure the code does not
  do full IOTLB flushes all the time. Those flushes are very
  expensive on emulated IOMMUs.


Lu Baolu (1):
  iommu/vt-d: Correctly check addr alignment in qi_flush_dev_iotlb_pasid()

Nadav Amit (1):
  iommu/vt-d: Do not use flush-queue when caching-mode is on

Suravee Suthikulpanit (1):
  iommu/amd: Use IVHD EFR for early initialization of IOMMU features

 drivers/iommu/amd/amd_iommu.h   |  7 ++---
 drivers/iommu/amd/amd_iommu_types.h |  4 +++
 drivers/iommu/amd/init.c| 56 +++--
 drivers/iommu/intel/dmar.c  |  2 +-
 drivers/iommu/intel/iommu.c | 32 -
 5 files changed, 92 insertions(+), 9 deletions(-)

Please pull.

Thanks,

Joerg




Re: [PATCH V3 0/6] x86: don't abuse tss.sp1

2021-01-29 Thread Borislav Petkov
On Fri, Jan 29, 2021 at 11:35:46PM +0800, Lai Jiangshan wrote:
> Any feedback?

Yes: be patient please.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [net-next PATCH v4 01/15] Documentation: ACPI: DSD: Document MDIO PHY

2021-01-29 Thread Rafael J. Wysocki
On Fri, Jan 29, 2021 at 5:37 PM Rafael J. Wysocki  wrote:
>
> On Fri, Jan 29, 2021 at 7:48 AM Calvin Johnson
>  wrote:
> >
> > On Thu, Jan 28, 2021 at 02:27:00PM +0100, Rafael J. Wysocki wrote:
> > > On Thu, Jan 28, 2021 at 2:12 PM Calvin Johnson
> > >  wrote:
> > > >
> > > > On Thu, Jan 28, 2021 at 01:00:40PM +0100, Rafael J. Wysocki wrote:
> > > > > On Thu, Jan 28, 2021 at 12:27 PM Calvin Johnson
> > > > >  wrote:
> > > > > >
> > > > > > Hi Rafael,
> > > > > >
> > > > > > Thanks for the review. I'll work on all the comments.
> > > > > >
> > > > > > On Fri, Jan 22, 2021 at 08:22:21PM +0100, Rafael J. Wysocki wrote:
> > > > > > > On Fri, Jan 22, 2021 at 4:43 PM Calvin Johnson
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Introduce ACPI mechanism to get PHYs registered on a MDIO bus 
> > > > > > > > and
> > > > > > > > provide them to be connected to MAC.
> > > > > > > >
> > > > > > > > Describe properties "phy-handle" and "phy-mode".
> > > > > > > >
> > > > > > > > Signed-off-by: Calvin Johnson 
> > > > > > > > ---
> > > > > > > >
> > > > > > > > Changes in v4:
> > > > > > > > - More cleanup
> > > > > > >
> > > > > > > > This looks much better than the previous versions IMV, some nits
> > > > > > > > below.
> > > > > > >
> > > > > > > > Changes in v3: None
> > > > > > > > Changes in v2:
> > > > > > > > - Updated with more description in document
> > > > > > > >
> > > > > > > >  Documentation/firmware-guide/acpi/dsd/phy.rst | 129 
> > > > > > > > ++
> > > > > > > >  1 file changed, 129 insertions(+)
> > > > > > > >  create mode 100644 
> > > > > > > > Documentation/firmware-guide/acpi/dsd/phy.rst
> > > > > > > >
> > > > > > > > diff --git a/Documentation/firmware-guide/acpi/dsd/phy.rst 
> > > > > > > > b/Documentation/firmware-guide/acpi/dsd/phy.rst
> > > > > > > > new file mode 100644
> > > > > > > > index ..76fca994bc99
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/Documentation/firmware-guide/acpi/dsd/phy.rst
> > > > > > > > @@ -0,0 +1,129 @@
> > > > > > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > > > > > +
> > > > > > > > +=
> > > > > > > > +MDIO bus and PHYs in ACPI
> > > > > > > > +=
> > > > > > > > +
> > > > > > > > +The PHYs on an MDIO bus [1] are probed and registered using
> > > > > > > > +fwnode_mdiobus_register_phy().
> > > > > > >
> > > > > > > Empty line here, please.
> > > > > > >
> > > > > > > > +Later, for connecting these PHYs to MAC, the PHYs registered 
> > > > > > > > on the
> > > > > > > > +MDIO bus have to be referenced.
> > > > > > > > +
> > > > > > > > +The UUID given below should be used as mentioned in the 
> > > > > > > > "Device Properties
> > > > > > > > +UUID For _DSD" [2] document.
> > > > > > > > +   - UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301
> > > > > > >
> > > > > > > I would drop the above paragraph.
> > > > > > >
> > > > > > > > +
> > > > > > > > +This document introduces two _DSD properties that are to be 
> > > > > > > > used
> > > > > > > > +for PHYs on the MDIO bus.[3]
> > > > > > >
> > > > > > > I'd say "for connecting PHYs on the MDIO bus [3] to the MAC 
> > > > > > > layer."
> > > > > > > above and add the following here:
> > > > > > >
> > > > > > > "These properties are defined in accordance with the "Device
> > > > > > > Properties UUID For _DSD" [2] document and the
> > > > > > > daffd814-6eba-4d8c-8a91-bc9bbf4aa301 UUID must be used in the 
> > > > > > > Device
> > > > > > > Data Descriptors containing them."
> > > > > > >
> > > > > > > > +
> > > > > > > > +phy-handle
> > > > > > > > +--
> > > > > > > > +For each MAC node, a device property "phy-handle" is used to 
> > > > > > > > reference
> > > > > > > > +the PHY that is registered on an MDIO bus. This is mandatory 
> > > > > > > > for
> > > > > > > > +network interfaces that have PHYs connected to MAC via MDIO 
> > > > > > > > bus.
> > > > > > > > +
> > > > > > > > +During the MDIO bus driver initialization, PHYs on this bus 
> > > > > > > > are probed
> > > > > > > > +using the _ADR object as shown below and are registered on the 
> > > > > > > > MDIO bus.
> > > > > > >
> > > > > > > Do you want to mention the "reg" property here?  I think it would 
> > > > > > > be
> > > > > > > useful to do that.
> > > > > >
> > > > > > No. I think we should adhere to _ADR in MDIO case. The "reg" 
> > > > > > property for ACPI
> > > > > > may be useful for other use cases that Andy is aware of.
> > > > >
> > > > > The code should reflect this, then.  I mean it sounds like you want to
> > > > > check the "reg" property only if this is a non-ACPI node.
> > > >
> > > > Right. For MDIO case, that is what is required.
> > > > "reg" for DT and "_ADR" for ACPI.
> > > >
> > > > However, Andy pointed out [1] that ACPI nodes can also hold reg 
> > > > property and
> > > > therefore, fwnode_get_id() need to be capable to handling that 
> > > > situation as
> > > > well.
> > >
> > > No, please don't confuse those two things.
> > >
>

[PATCH net] rxrpc: Fix deadlock around release of dst cached on udp tunnel

2021-01-29 Thread David Howells
AF_RXRPC sockets use UDP ports in encap mode.  This causes socket and dst
from an incoming packet to get stolen and attached to the UDP socket from
whence it is leaked when that socket is closed.

When a network namespace is removed, the wait for dst records to be cleaned
up happens before the cleanup of the rxrpc and UDP socket, meaning that the
wait never finishes.

Fix this by moving the rxrpc (and, by dependence, the afs) private
per-network namespace registrations to the device group rather than subsys
group.  This allows cached rxrpc local endpoints to be cleared and their
UDP sockets closed before we try waiting for the dst records.

The symptom is that lines looking like the following:

unregister_netdevice: waiting for lo to become free

get emitted at regular intervals after running something like the
referenced syzbot test.

Thanks to Vadim for tracking this down and working out the fix.

Reported-by: syzbot+df400f2f24a1677cd...@syzkaller.appspotmail.com
Reported-by: Vadim Fedorenko 
Fixes: 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook")
Signed-off-by: David Howells 
---

 fs/afs/main.c|6 +++---
 net/rxrpc/af_rxrpc.c |6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/afs/main.c b/fs/afs/main.c
index accdd8970e7c..b2975256dadb 100644
--- a/fs/afs/main.c
+++ b/fs/afs/main.c
@@ -193,7 +193,7 @@ static int __init afs_init(void)
goto error_cache;
 #endif
 
-   ret = register_pernet_subsys(&afs_net_ops);
+   ret = register_pernet_device(&afs_net_ops);
if (ret < 0)
goto error_net;
 
@@ -213,7 +213,7 @@ static int __init afs_init(void)
 error_proc:
afs_fs_exit();
 error_fs:
-   unregister_pernet_subsys(&afs_net_ops);
+   unregister_pernet_device(&afs_net_ops);
 error_net:
 #ifdef CONFIG_AFS_FSCACHE
fscache_unregister_netfs(&afs_cache_netfs);
@@ -244,7 +244,7 @@ static void __exit afs_exit(void)
 
proc_remove(afs_proc_symlink);
afs_fs_exit();
-   unregister_pernet_subsys(&afs_net_ops);
+   unregister_pernet_device(&afs_net_ops);
 #ifdef CONFIG_AFS_FSCACHE
fscache_unregister_netfs(&afs_cache_netfs);
 #endif
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 0a2f4817ec6c..41671af6b33f 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -990,7 +990,7 @@ static int __init af_rxrpc_init(void)
goto error_security;
}
 
-   ret = register_pernet_subsys(&rxrpc_net_ops);
+   ret = register_pernet_device(&rxrpc_net_ops);
if (ret)
goto error_pernet;
 
@@ -1035,7 +1035,7 @@ static int __init af_rxrpc_init(void)
 error_sock:
proto_unregister(&rxrpc_proto);
 error_proto:
-   unregister_pernet_subsys(&rxrpc_net_ops);
+   unregister_pernet_device(&rxrpc_net_ops);
 error_pernet:
rxrpc_exit_security();
 error_security:
@@ -1057,7 +1057,7 @@ static void __exit af_rxrpc_exit(void)
unregister_key_type(&key_type_rxrpc);
sock_unregister(PF_RXRPC);
proto_unregister(&rxrpc_proto);
-   unregister_pernet_subsys(&rxrpc_net_ops);
+   unregister_pernet_device(&rxrpc_net_ops);
ASSERTCMP(atomic_read(&rxrpc_n_tx_skbs), ==, 0);
ASSERTCMP(atomic_read(&rxrpc_n_rx_skbs), ==, 0);
 




Re: [PATCH 1/3] serial: 8250: Handle UART without interrupt on TEMT using em485

2021-01-29 Thread Eric Tremblay
On 2021-01-29 6:22 a.m., Andy Shevchenko wrote:
> On Thu, Jan 28, 2021 at 06:36:27PM -0500, Eric Tremblay wrote:
>> The patch introduce the UART_CAP_TEMT capability which is by default
>> assigned to all 8250 UART since the code assume that device has the
>> interrupt on TEMT
> You have missed periods in the sentences here and there. Please, check the
> grammar and punctuation everywhere.
>
>> In the case where the device does not support it, we calculate the
>> maximum of time it could take for the transmitter to empty the
> maximum time
>
>> shift register. When we get in the situation where we get the
>> THRE interrupt but the TEMT bit is not set we start the timer
>> and recall __stop_tx after the delay
> __stop_tx()

I will review the grammar and spelling, thanks for mentioning it

>
> ...
>
>>  /* initialize data */
>> -up.capabilities = UART_CAP_FIFO | UART_CAP_MINI;
>> +data->uart.capabilities = UART_CAP_FIFO | UART_CAP_MINI | UART_CAP_TEMT;
> I didn't get, if you state that CAP_TEMT is default on all UARTs, why you have
> this?

It's a merge mistake, sorry about that. The next version will use the
reverse capability, as Jiri Slaby suggested, so there will be no need to
modify other drivers.

>
>> -up.capabilities = UART_CAP_FIFO;
>> +up.capabilities = UART_CAP_FIFO | UART_CAP_TEMT;
> And so this?
>
> ...
>
>> +static inline void serial8250_em485_update_temt_delay(struct uart_8250_port *p,
>> +unsigned int cflag, unsigned int baud)
>> +{
>> +unsigned int bits;
>> +
>> +if (!p->em485)
>> +return;
>> +
>> +/* byte size and parity */
>> +switch (cflag & CSIZE) {
>> +case CS5:
>> +bits = 7;
>> +break;
>> +case CS6:
>> +bits = 8;
>> +break;
>> +case CS7:
>> +bits = 9;
>> +break;
>> +default:
>> +bits = 10;
>> +break; /* CS8 */
>> +}
>> +
>> +if (cflag & CSTOPB)
>> +bits++;
>> +if (cflag & PARENB)
>> +bits++;
> This is repetition of uart_update_timeout(). Find a way to deduplicate.
>
>> +p->em485->no_temt_delay = bits*100/baud;
> Use spaces.
> Is this magic should be defined as HZ_PER_MHZ?
>
>> +}
> ...
>
>> +static void start_hrtimer_us(struct hrtimer *hrt, unsigned long usec)
>> +{
>> +long sec = usec / 100;
>> +long nsec = (usec % 100) * 1000;
>
> USEC_PER_SEC
> NSEC_PER_USEC
>
>> +ktime_t t = ktime_set(sec, nsec);
>> +
>> +hrtimer_start(hrt, t, HRTIMER_MODE_REL);
>> +}
> ...
>
>> +if ((lsr & BOTH_EMPTY) != BOTH_EMPTY) {
>> +/*
>> + * On devices with no interrupt on TEMT available
> "with no TEMT interrupt available"
>
>> + * start a timer for a byte time, the timer will recall
>> + * __stop_tx
> __stop_tx().
>
>> + */
>> +if (!(p->capabilities & UART_CAP_TEMT) && (lsr & 
>> UART_LSR_THRE)) {
>> +em485->active_timer = &em485->no_temt_timer;
>> +start_hrtimer_us(&em485->no_temt_timer, 
>> em485->no_temt_delay);
>> +}
> Perhaps
>   if ((p->capabilities & UART_CAP_TEMT) && (lsr & 
> UART_LSR_THRE))
>   return;
>
>   em485->active_timer = &em485->no_temt_timer;
>   start_hrtimer_us(&em485->no_temt_timer, 
> em485->no_temt_delay);
>
> ?

I also prefer that form; I will apply it in the next version.
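
One thing worth double-checking when moving to the early-return form: the
negation of "!(capabilities & TEMT) && (lsr & THRE)" is, by De Morgan,
"(capabilities & TEMT) || !(lsr & THRE)", so the early return needs "||"
rather than "&&" to keep the original behaviour. A minimal standalone sketch
of the two forms follows; CAP_TEMT, LSR_THRE and both function names are
stand-ins invented for illustration, not the driver's definitions:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's flags; values are illustrative. */
#define CAP_TEMT 0x1u  /* device can interrupt on transmitter empty */
#define LSR_THRE 0x20u /* transmit holding register empty */

/* Nested form from the patch: the timer is needed only when the device
 * has no TEMT interrupt and THRE is set. */
static bool timer_needed_nested(unsigned int caps, unsigned int lsr)
{
	if (!(caps & CAP_TEMT) && (lsr & LSR_THRE))
		return true;
	return false;
}

/* Early-return form: bail out on the negated condition (note the ||). */
static bool timer_needed_early_return(unsigned int caps, unsigned int lsr)
{
	if ((caps & CAP_TEMT) || !(lsr & LSR_THRE))
		return false;
	return true;
}
```

Both functions agree on all four combinations of the two flags.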

>
>>  return;
>> +}
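
For reference, the per-character delay discussed in this review can be
sketched in plain C as below. This assumes the delay is meant in
microseconds (as the start_hrtimer_us() name suggests) and rounds up so the
timer never fires early; the rounding and the names frame_bits() and
char_time_us() are assumptions for illustration, not the patch's code:

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/* Bits on the wire for one character: start + data + parity + stop. */
static unsigned int frame_bits(unsigned int data_bits, int parity, int two_stop)
{
	return 1 + data_bits + (parity ? 1 : 0) + (two_stop ? 2 : 1);
}

/* Time to shift out one character, in microseconds, rounded up. */
static uint64_t char_time_us(unsigned int bits, unsigned int baud)
{
	if (baud == 0)
		return 0; /* caller is expected to pass a valid baud rate */
	return (bits * USEC_PER_SEC + baud - 1) / baud;
}
```

With 8N1 framing this gives 10 bits per character, matching the "default"
branch of the switch statement quoted above.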




Re: [PATCH v9 1/7] ACPI: scan: Obtain device's desired enumeration power state

2021-01-29 Thread Sakari Ailus
Hi Rafael,

Thanks for the comments.

On Fri, Jan 29, 2021 at 03:07:57PM +0100, Rafael J. Wysocki wrote:
> On Fri, Jan 29, 2021 at 12:27 AM Sakari Ailus
>  wrote:
> >
> > Store a device's desired enumeration power state in struct
> > acpi_device_power_flags during acpi_device object's initialisation.
> >
> > Signed-off-by: Sakari Ailus 
> > ---
> >  drivers/acpi/scan.c | 6 ++
> >  include/acpi/acpi_bus.h | 3 ++-
> >  2 files changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> > index 1d7a02ee45e05..b077c645c9845 100644
> > --- a/drivers/acpi/scan.c
> > +++ b/drivers/acpi/scan.c
> > @@ -987,6 +987,8 @@ static void acpi_bus_init_power_state(struct 
> > acpi_device *device, int state)
> >
> >  static void acpi_bus_get_power_flags(struct acpi_device *device)
> >  {
> > +   unsigned long long pre;
> > +   acpi_status status;
> > u32 i;
> >
> > /* Presence of _PS0|_PR0 indicates 'power manageable' */
> > @@ -1008,6 +1010,10 @@ static void acpi_bus_get_power_flags(struct 
> > acpi_device *device)
> > if (acpi_has_method(device->handle, "_DSW"))
> > device->power.flags.dsw_present = 1;
> >
> > +   status = acpi_evaluate_integer(device->handle, "_PRE", NULL, &pre);
> > +   if (ACPI_SUCCESS(status) && !pre)
> > +   device->power.flags.allow_low_power_probe = 1;
> 
> While this is what has been discussed and thanks for taking it into
> account, I'm now thinking that it may be cleaner to introduce a new
> object to return the deepest power state of the device in which it can
> be enumerated, say _DSE (Device State for Enumeration) such that 4
> means D3cold, 3 - D3hot and so on, so the above check can be replaced
> with something like
> 
> status = acpi_evaluate_integer(device->handle, "_PRE", NULL, &dse);

s/_PRE/_DSE/

?

> if (ACPI_FAILURE(status))

ACPI_SUCCESS?

> device->power.state_for_enumeratin = dse;
> 
> And then, it is a matter of comparing ->power.state_for_enumeratin
> with ->power.state and putting the device into D0 if the former is
> shallower than the latter.
> 
> What do you think?

Sounds good. How about calling the function e.g.
acpi_device_resume_for_probe(), so runtime PM could be used to resume the
device if the function returns true?

-- 
Kind regards,

Sakari Ailus


linux-next-20210129: drivers/media/platform/mtk-vcodec/

2021-01-29 Thread Randy Dunlap
on i386:

ld: drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.o: in function 
`mtk_vcodec_dec_clock_on':
mtk_vcodec_dec_pm.c:(.text+0xff): undefined reference to `mtk_smi_larb_get'
ld: drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.o: in function 
`mtk_vcodec_dec_clock_off':
mtk_vcodec_dec_pm.c:(.text+0x180): undefined reference to `mtk_smi_larb_put'
ld: drivers/media/platform/mtk-vcodec/mtk_vcodec_enc_pm.o: in function 
`mtk_vcodec_enc_clock_on':
mtk_vcodec_enc_pm.c:(.text+0xd0): undefined reference to `mtk_smi_larb_get'
ld: mtk_vcodec_enc_pm.c:(.text+0xf3): undefined reference to `mtk_smi_larb_get'
ld: mtk_vcodec_enc_pm.c:(.text+0x114): undefined reference to `mtk_smi_larb_put'
ld: drivers/media/platform/mtk-vcodec/mtk_vcodec_enc_pm.o: in function 
`mtk_vcodec_enc_clock_off':
mtk_vcodec_enc_pm.c:(.text+0x181): undefined reference to `mtk_smi_larb_put'
ld: mtk_vcodec_enc_pm.c:(.text+0x189): undefined reference to `mtk_smi_larb_put'

Full randconfig file is attached.

-- 
~Randy
Reported-by: Randy Dunlap 



config-r7503.gz
Description: application/gzip


Re: [PATCH] x86: Disable CET instrumentation in the kernel

2021-01-29 Thread Josh Poimboeuf
On Fri, Jan 29, 2021 at 05:30:48PM +0100, Borislav Petkov wrote:
> On Fri, Jan 29, 2021 at 09:10:34AM -0600, Josh Poimboeuf wrote:
> > Maybe eventually.  But the enablement (actually enabling CET/CFI/etc)
> > happens in the arch code anyway, right?  So it could be a per-arch
> > decision.
> 
> Right.
> 
> Ok, for this one, what about
> 
> Cc: 
> 
> ?
> 
> What are "some configurations of GCC"? If it can be reproduced with
> what's released out there, maybe that should go in now, even for 5.11?
> 
> Hmm?

Agreed, stable is a good idea.   I think Nikolay saw it with GCC 9.

-- 
Josh



[PATCH v2] selinux: measure state and policy capabilities

2021-01-29 Thread Lakshmi Ramasubramanian
SELinux stores the configuration state and the policy capabilities
in kernel memory.  Changes to this data at runtime would have an impact
on the security guarantees provided by SELinux.  Measuring this data
through IMA subsystem provides a tamper-resistant way for
an attestation service to remotely validate it at runtime.

Measure the configuration state and policy capabilities by calling
the IMA hook ima_measure_critical_data().

To enable SELinux data measurement, the following steps are required:

 1, Add "ima_policy=critical_data" to the kernel command line arguments
to enable measuring SELinux data at boot time.
For example,
  BOOT_IMAGE=/boot/vmlinuz-5.11.0-rc3+ root=UUID=fd643309-a5d2-4ed3-b10d-3c579a5fab2f ro nomodeset security=selinux ima_policy=critical_data

 2, Add the following rule to /etc/ima/ima-policy
   measure func=CRITICAL_DATA label=selinux

Sample measurement of SELinux state and policy capabilities:

10 2122...65d8 ima-buf sha256:13c2...1292 selinux-state 696e...303b

Execute the following command to extract the measured data
from the IMA's runtime measurements list:

  grep "selinux-state" /sys/kernel/security/integrity/ima/ascii_runtime_measurements | tail -1 | cut -d' ' -f 6 | xxd -r -p

The output should be a list of key-value pairs. For example,
 
initialized=1;enforcing=0;checkreqprot=1;network_peer_controls=1;open_perms=1;extended_socket_class=1;always_check_network=0;cgroup_seclabel=1;nnp_nosuid_transition=1;genfs_seclabel_symlinks=0;

To verify the measurement is consistent with the current SELinux state
reported on the system, compare the integer values in the following
files with those set in the IMA measurement (using the following commands):

 - cat /sys/fs/selinux/enforce
 - cat /sys/fs/selinux/checkreqprot
 - cat /sys/fs/selinux/policy_capabilities/[capability_file]

Note that the actual verification would be against an expected state
and done on a separate system (likely an attestation server) requiring
"initialized=1;enforcing=1;checkreqprot=0;"
for a secure state and then whatever policy capabilities are actually
set in the expected policy (which can be extracted from the policy
itself via seinfo, for example).
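
The verification step described above could be sketched on the attestation
side roughly as follows. This is a minimal illustration that assumes the
decoded buffer has exactly the "key=0;" / "key=1;" layout shown in the sample
output; state_lookup() and state_is_secure() are hypothetical helpers, not
part of this patch:

```c
#include <string.h>

/*
 * Return 1 or 0 for an entry "key=1;" or "key=0;" in buf, or -1 if the key
 * is absent or the entry is malformed.
 */
static int state_lookup(const char *buf, const char *key)
{
	size_t klen = strlen(key);
	const char *p = buf;

	while (*p) {
		const char *end = strchr(p, ';');

		if (!end)
			return -1; /* unterminated entry */
		/* An entry matches only if it is exactly "<key>=<digit>". */
		if ((size_t)(end - p) == klen + 2 &&
		    strncmp(p, key, klen) == 0 && p[klen] == '=') {
			if (p[klen + 1] == '1')
				return 1;
			if (p[klen + 1] == '0')
				return 0;
			return -1;
		}
		p = end + 1;
	}
	return -1;
}

/* Secure state as described above: initialized, enforcing, no checkreqprot. */
static int state_is_secure(const char *buf)
{
	return state_lookup(buf, "initialized") == 1 &&
	       state_lookup(buf, "enforcing") == 1 &&
	       state_lookup(buf, "checkreqprot") == 0;
}
```

The policy capability entries would be compared the same way against the
values expected from the policy.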

Signed-off-by: Lakshmi Ramasubramanian 
Suggested-by: Stephen Smalley 
Suggested-by: Paul Moore 
---
 security/selinux/ima.c | 77 --
 security/selinux/include/ima.h |  6 +++
 security/selinux/selinuxfs.c   |  6 +++
 security/selinux/ss/services.c |  2 +-
 4 files changed, 86 insertions(+), 5 deletions(-)

diff --git a/security/selinux/ima.c b/security/selinux/ima.c
index 03715893ff97..5c7f73cd1117 100644
--- a/security/selinux/ima.c
+++ b/security/selinux/ima.c
@@ -13,18 +13,73 @@
 #include "ima.h"
 
 /*
- * selinux_ima_measure_state - Measure hash of the SELinux policy
+ * selinux_ima_collect_state - Read selinux configuration settings
  *
- * @state: selinux state struct
+ * @state: selinux_state
  *
- * NOTE: This function must be called with policy_mutex held.
+ * On success returns the configuration settings string.
+ * On error, returns NULL.
  */
-void selinux_ima_measure_state(struct selinux_state *state)
+static char *selinux_ima_collect_state(struct selinux_state *state)
+{
+   const char *on = "=1;", *off = "=0;";
+   char *buf;
+   int buf_len, i;
+
+   /*
+* Size of the following string including the terminating NULL char
+*initialized=0;enforcing=0;checkreqprot=0;
+*/
+   buf_len = 42;
+   for (i = 0; i < __POLICYDB_CAPABILITY_MAX; i++)
+   buf_len += strlen(selinux_policycap_names[i]) + 3;
+
+   buf = kzalloc(buf_len, GFP_KERNEL);
+   if (!buf)
+   return NULL;
+
+   strscpy(buf, "initialized", buf_len);
+   strlcat(buf, selinux_initialized(state) ? on : off, buf_len);
+
+   strlcat(buf, "enforcing", buf_len);
+   strlcat(buf, enforcing_enabled(state) ? on : off, buf_len);
+
+   strlcat(buf, "checkreqprot", buf_len);
+   strlcat(buf, checkreqprot_get(state) ? on : off, buf_len);
+
+   for (i = 0; i < __POLICYDB_CAPABILITY_MAX; i++) {
+   strlcat(buf, selinux_policycap_names[i], buf_len);
+   strlcat(buf, state->policycap[i] ? on : off, buf_len);
+   }
+
+   return buf;
+}
+
+/*
+ * selinux_ima_measure_state_locked - Measure SELinux state and hash of policy
+ *
+ * @state: selinux state struct
+ */
+void selinux_ima_measure_state_locked(struct selinux_state *state)
 {
+   char *state_str = NULL;
void *policy = NULL;
size_t policy_len;
int rc = 0;
 
+   WARN_ON(!mutex_is_locked(&state->policy_mutex));
+
+   state_str = selinux_ima_collect_state(state);
+   if (!state_str) {
+   pr_err("SELinux: %s: failed to read state.\n", __func__);
+   return;
+   }
+
+   ima_measure_critical_data("selinux", "selinux-state",
+ state_str, strlen(state_str),

Re: [PATCH] pwm: fix semicolon.cocci warnings

2021-01-29 Thread Vladimir Zapolskiy

On 1/28/21 10:57 PM, Uwe Kleine-König wrote:

> Hello,
>
> On Thu, Jan 28, 2021 at 09:45:37PM +0800, kernel test robot wrote:
>> From: kernel test robot 
>>
>> drivers/pwm/pwm-lpc18xx-sct.c:292:2-3: Unneeded semicolon
>>
>>   Remove unneeded semicolon.
>>
>> Generated by: scripts/coccinelle/misc/semicolon.cocci
>>
>> Fixes: e96c0ff4b1e0 ("pwm: Enable compile testing for some of drivers")
>
> This looks wrong. e96c0ff4b1e0 only touches drivers/pwm/Kconfig.
>
> The ; was introduced by commit 841e6f90bb78 ("pwm: NXP LPC18xx PWM/SCT
> driver")

Right, thank you for the correction, Uwe.

Since the patch has been composed by the robot, it has to be fixed
in the first place.

And regarding this particular change and in general fixes to this type
of issues detected by the robot, I don't think that it earns a Fixes tag.

>> CC: Krzysztof Kozlowski 
>> Reported-by: kernel test robot 
>> Signed-off-by: kernel test robot 


--
Best wishes,
Vladimir


Re: [PATCH] x86: Disable CET instrumentation in the kernel

2021-01-29 Thread Nikolay Borisov



On 29.01.21 г. 18:49 ч., Josh Poimboeuf wrote:
> Agreed, stable is a good idea.   I think Nikolay saw it with GCC 9.


Yes I did, with the default Ubuntu compiler as well as the default gcc-10 
compiler: 

# gcc -v -Q -O2 --help=target | grep protection

gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04) 
COLLECT_GCC_OPTIONS='-v' '-Q' '-O2' '--help=target' '-mtune=generic' 
'-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/9/cc1 -v -imultiarch x86_64-linux-gnu help-dummy 
-dumpbase help-dummy -mtune=generic -march=x86-64 -auxbase help-dummy -O2 
-version --help=target -fasynchronous-unwind-tables -fstack-protector-strong 
-Wformat -Wformat-security -fstack-clash-protection -fcf-protection -o 
/tmp/ccSecttk.s
GNU C17 (Ubuntu 9.3.0-17ubuntu1~20.04) version 9.3.0 (x86_64-linux-gnu)
compiled by GNU C version 9.3.0, GMP version 6.2.0, MPFR version 4.0.2, 
MPC version 1.1.0, isl version isl-0.22.1-GMP


It has -fcf-protection turned on by default it seems. 


Re: [PATCH v9 1/7] ACPI: scan: Obtain device's desired enumeration power state

2021-01-29 Thread Rafael J. Wysocki
On Fri, Jan 29, 2021 at 5:45 PM Sakari Ailus
 wrote:
>
> Hi Rafael,
>
> Thanks for the comments.
>
> On Fri, Jan 29, 2021 at 03:07:57PM +0100, Rafael J. Wysocki wrote:
> > On Fri, Jan 29, 2021 at 12:27 AM Sakari Ailus
> >  wrote:
> > >
> > > Store a device's desired enumeration power state in struct
> > > acpi_device_power_flags during acpi_device object's initialisation.
> > >
> > > Signed-off-by: Sakari Ailus 
> > > ---
> > >  drivers/acpi/scan.c | 6 ++
> > >  include/acpi/acpi_bus.h | 3 ++-
> > >  2 files changed, 8 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> > > index 1d7a02ee45e05..b077c645c9845 100644
> > > --- a/drivers/acpi/scan.c
> > > +++ b/drivers/acpi/scan.c
> > > @@ -987,6 +987,8 @@ static void acpi_bus_init_power_state(struct 
> > > acpi_device *device, int state)
> > >
> > >  static void acpi_bus_get_power_flags(struct acpi_device *device)
> > >  {
> > > +   unsigned long long pre;
> > > +   acpi_status status;
> > > u32 i;
> > >
> > > /* Presence of _PS0|_PR0 indicates 'power manageable' */
> > > @@ -1008,6 +1010,10 @@ static void acpi_bus_get_power_flags(struct 
> > > acpi_device *device)
> > > if (acpi_has_method(device->handle, "_DSW"))
> > > device->power.flags.dsw_present = 1;
> > >
> > > +   status = acpi_evaluate_integer(device->handle, "_PRE", NULL, 
> > > &pre);
> > > +   if (ACPI_SUCCESS(status) && !pre)
> > > +   device->power.flags.allow_low_power_probe = 1;
> >
> > While this is what has been discussed and thanks for taking it into
> > account, I'm now thinking that it may be cleaner to introduce a new
> > object to return the deepest power state of the device in which it can
> > be enumerated, say _DSE (Device State for Enumeration) such that 4
> > means D3cold, 3 - D3hot and so on, so the above check can be replaced
> > with something like
> >
> > status = acpi_evaluate_integer(device->handle, "_PRE", NULL, &dse);
>
> s/_PRE/_DSE/
>
> ?

Yes, sorry.

>
> > if (ACPI_FAILURE(status))
>
> ACPI_SUCCESS?

Yup.

> > device->power.state_for_enumeratin = dse;
> >
> > And then, it is a matter of comparing ->power.state_for_enumeratin
> > with ->power.state and putting the device into D0 if the former is
> > shallower than the latter.
> >
> > What do you think?
>
> Sounds good. How about calling the function e.g.
> acpi_device_resume_for_probe(), so runtime PM could be used to resume the
> device if the function returns true?

I'd rather try to power it up before enabling runtime PM, because in
order to do the latter properly, you need to know if the device is
active or suspended to start with.

So you need something like (pseudo-code)

if (this_device_needs_to_be_on(ACPI_COMPANION(dev))) {
   acpi_device_set_power(ACPI_COMPANION(dev), ACPI_STATE_D0);
   pm_runtime_set_active(dev);
} else {
   pm_runtime_set_suspended(dev);
}

and then you can enable PM-runtime.
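
To make the comparison concrete, here is a small standalone model of the
check, using the numbering proposed earlier in the thread (0 is D0, 3 is
D3hot, 4 is D3cold, so a larger value means a deeper state). The enum and
device_needs_to_be_on() are illustrative stand-ins, not existing kernel
symbols:

```c
#include <stdbool.h>

/* Device power states, numbered as proposed for _DSE: deeper = larger. */
enum dstate { D0 = 0, D1 = 1, D2 = 2, D3HOT = 3, D3COLD = 4 };

/*
 * cur: the device's current power state.
 * dse: the deepest state in which the device can still be enumerated.
 *
 * The device has to be powered up to D0 when the enumeration state is
 * shallower than the current state.
 */
static bool device_needs_to_be_on(enum dstate cur, enum dstate dse)
{
	return dse < cur;
}
```

A device without a _DSE object would presumably be treated as dse == D0,
which keeps today's behaviour of powering it up before probe.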


Re: [PATCH v2] KVM: x86: Allow guests to see MSR_IA32_TSX_CTRL even if tsx=off

2021-01-29 Thread Sean Christopherson
On Fri, Jan 29, 2021, Paolo Bonzini wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 76bce832cade..15733013b266 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1401,7 +1401,7 @@ static u64 kvm_get_arch_capabilities(void)
>*This lets the guest use VERW to clear CPU buffers.


This comment should be updated to call out the new TSX_CTRL behavior.

/*
 * On TAA affected systems:
 *  - nothing to do if TSX is disabled on the host.
 *  - we emulate TSX_CTRL if present on the host.
 *This lets the guest use VERW to clear CPU buffers.
 */

>*/
>   if (!boot_cpu_has(X86_FEATURE_RTM))
> - data &= ~(ARCH_CAP_TAA_NO | ARCH_CAP_TSX_CTRL_MSR);
> + data &= ~ARCH_CAP_TAA_NO;

Hmm, simply clearing TSX_CTRL will only preserve the host value.  Since
ARCH_CAPABILITIES is unconditionally emulated by KVM, wouldn't it make sense to
unconditionally expose TSX_CTRL as well, as opposed to exposing it only if it's
supported in the host?  I.e. allow migrating a TSX-disabled guest to a host
without TSX.  Or am I misunderstanding how TSX_CTRL is checked/used?

>   else if (!boot_cpu_has_bug(X86_BUG_TAA))
>   data |= ARCH_CAP_TAA_NO;
>  
> -- 
> 2.26.2
> 


Re: [PATCH] bus: mvebu-mbus: make iounmap() symmetric with ioremap()

2021-01-29 Thread Gregory CLEMENT
Hi,

> On Fri, 29 Jan 2021 17:01:35 +0100
> Gregory CLEMENT  wrote:
>
>> Could you sent me the patch I don't have it in my emails boxes.
>
> https://lore.kernel.org/lkml/20201112032149.21906-1-chris.pack...@alliedtelesis.co.nz/raw


Applied on mvebu/arm

Thanks,

Gregory

>
> Thomas
> -- 
> Thomas Petazzoni, CTO, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com

-- 
Gregory Clement, Bootlin
Embedded Linux and Kernel engineering
http://bootlin.com


Re: [PATCH v2] x86/debug: Fix DR6 handling

2021-01-29 Thread Tom de Vries
On 1/29/21 3:48 PM, Borislav Petkov wrote:
> On Thu, Jan 28, 2021 at 10:16:27PM +0100, Peter Zijlstra wrote:
>>
>> Tom reported that one of the GDB test-cases failed, and Boris bisected
>> it to commit:
>>
>>   d53d9bc0cf78 ("x86/debug: Change thread.debugreg6 to thread.virtual_dr6")
>>
>> The debugging session led us to commit:
>>
>>   6c0aca288e72 ("x86: Ignore trap bits on single step exceptions")
>>
>> It turns out that TF and data breakpoints are both traps and will be
>> merged, while instruction breakpoints are faults and will not be
>> merged. This means 6c0aca288e72 is wrong, we only need to exclude TF
>> and instruction breakpoints while we can merge TF and data
>> breakpoints.
>>
>> Fixes: d53d9bc0cf78 ("x86/debug: Change thread.debugreg6 to 
>> thread.virtual_dr6")
>> Fixes: 6c0aca288e72 ("x86: Ignore trap bits on single step exceptions")
>> Reported-by: Tom de Vries 
>> Bisected-by: Borislav Petkov 
>> Signed-off-by: Peter Zijlstra (Intel) 
> 
> I guess
> 
> Cc: 
> 
> Also,
> 
> Reviewed-by: Borislav Petkov 
> 
> And gdb testsuite is a bit happier:
> 
> --- before
> +++ after
>  === gdb Summary ===
>  
> -# of expected passes70822
> -# of unexpected failures899
> +# of expected passes70852
> +# of unexpected failures869
>  # of expected failures  74
>  # of known failures 99
>  # of untested testcases 114
> 
> You just fixed 30(!) testcases.
> 
> :-)
> 

Hi Boris,

thanks for testing this, and just to confirm: the total number of
regressions I see in the gdb testsuite related to watchpoints is indeed 30.

Thanks,
- Tom


linux-next-20210129: drivers/iommu/intel/dmar.c

2021-01-29 Thread Randy Dunlap
on x86_64:

../drivers/iommu/intel/dmar.c: In function 'qi_submit_sync':
../drivers/iommu/intel/dmar.c:1311:3: error: implicit declaration of function 
'trace_qi_submit'; did you mean 'ftrace_nmi_exit'? 
[-Werror=implicit-function-declaration]
   trace_qi_submit(iommu, desc[i].qw0, desc[i].qw1,
   ^~~
   ftrace_nmi_exit

Full randconfig file is attached.

-- 
~Randy
Reported-by: Randy Dunlap 



config-r7511.gz
Description: application/gzip


Re: [PATCH v3 2/3] perf/smmuv3: Add a MODULE_SOFTDEP() to indicate dependency on SMMU

2021-01-29 Thread Robin Murphy

On 2021-01-29 15:34, John Garry wrote:

> On 29/01/2021 15:12, Robin Murphy wrote:
>> On 2021-01-27 11:32, Zhen Lei wrote:
>>> The MODULE_SOFTDEP() gives user space a hint of the loading sequence. And
>>> when command "modprobe arm_smmuv3_pmu" is executed, the arm_smmu_v3.ko is
>>> automatically loaded in advance.
>>
>> Why do we need this? If probe order doesn't matter when both drivers
>> are built-in, why should module load order?
>>
>> TBH I'm not sure why we even have a Kconfig dependency on ARM_SMMU_V3,
>> given that the drivers operate completely independently :/
>
> Can that Kconfig dependency just be removed? I think that it was added
> under the idea that there is no point in having the SMMUv3 PMU driver
> without the SMMUv3 driver.


A PMCG *might* be usable for simply counting transactions to measure 
device activity regardless of its associated SMMU being enabled. Either 
way, it's not really Kconfig's job to decide what makes sense (beyond 
the top-level "can this driver *ever* be used on this platform" 
visibility choices). Imagine if we gave every PCI/USB/etc. device driver 
an explicit dependency on at least one host controller driver being 
enabled...


Robin.


Re: [PATCH] x86: Disable CET instrumentation in the kernel

2021-01-29 Thread Josh Poimboeuf
On Fri, Jan 29, 2021 at 06:54:08PM +0200, Nikolay Borisov wrote:
> 
> 
> On 29.01.21 г. 18:49 ч., Josh Poimboeuf wrote:
> > Agreed, stable is a good idea.   I think Nikolay saw it with GCC 9.
> 
> 
> Yes I did, with the default Ubuntu compiler as well as the default gcc-10 
> compiler: 
> 
> # gcc -v -Q -O2 --help=target | grep protection
> 
> gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04) 
> COLLECT_GCC_OPTIONS='-v' '-Q' '-O2' '--help=target' '-mtune=generic' 
> '-march=x86-64'
>  /usr/lib/gcc/x86_64-linux-gnu/9/cc1 -v -imultiarch x86_64-linux-gnu 
> help-dummy -dumpbase help-dummy -mtune=generic -march=x86-64 -auxbase 
> help-dummy -O2 -version --help=target -fasynchronous-unwind-tables 
> -fstack-protector-strong -Wformat -Wformat-security -fstack-clash-protection 
> -fcf-protection -o /tmp/ccSecttk.s
> GNU C17 (Ubuntu 9.3.0-17ubuntu1~20.04) version 9.3.0 (x86_64-linux-gnu)
>   compiled by GNU C version 9.3.0, GMP version 6.2.0, MPFR version 4.0.2, 
> MPC version 1.1.0, isl version isl-0.22.1-GMP
> 
> 
> It has -fcf-protection turned on by default it seems. 

Yup, explains why I didn't see it:

gcc version 10.2.1 20201125 (Red Hat 10.2.1-9) (GCC)
COLLECT_GCC_OPTIONS='-v' '-Q' '-O2' '--help=target' '-mtune=generic' 
'-march=x86-64'
 /usr/libexec/gcc/x86_64-redhat-linux/10/cc1 -v help-dummy -dumpbase help-dummy 
-mtune=generic -march=x86-64 -auxbase help-dummy -O2 -version --help=target -o 
/tmp/cclBz55H.s


-- 
Josh



Re: [v5 PATCH 04/11] mm: vmscan: remove memcg_shrinker_map_size

2021-01-29 Thread Yang Shi
On Fri, Jan 29, 2021 at 3:22 AM Vlastimil Babka  wrote:
>
> On 1/28/21 10:22 PM, Yang Shi wrote:
> >> > @@ -266,12 +265,13 @@ int alloc_shrinker_maps(struct mem_cgroup *memcg)
> >> >  static int expand_shrinker_maps(int new_id)
> >> >  {
> >> >   int size, old_size, ret = 0;
> >> > + int new_nr_max = new_id + 1;
> >> >   struct mem_cgroup *memcg;
> >> >
> >> > - size = DIV_ROUND_UP(new_id + 1, BITS_PER_LONG) * sizeof(unsigned 
> >> > long);
> >> > - old_size = memcg_shrinker_map_size;
> >> > + size = (new_nr_max / BITS_PER_LONG + 1) * sizeof(unsigned long);
> >> > + old_size = (shrinker_nr_max / BITS_PER_LONG + 1) * sizeof(unsigned 
> >> > long);
> >>
> >> What's wrong with using DIV_ROUND_UP() here?
> >
> > I don't think there is anything wrong with DIV_ROUND_UP. Should be
> > just different taste and result in shorter statement.
>
> IMHO it's not just taste. DIV_ROUND_UP() says what it does and you don't need 
> to
> guess it from the math expression. Also your expression is shorter as it 
> simply
> adds + 1, so if shrinker_nr_max is a multiple of BITS_PER_LONG, there's an 
> extra
> unsigned long that shouldn't be needed. People reading that code will wonder
> whether there was some non-obvious intention behind that, and possibly send
> cleanup patches.

OK, I will switch back to DIV_ROUND_UP(). Also, since a helper macro is
introduced in patch #6, I will move that helper into this patch and use
DIV_ROUND_UP() in the helper.
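
For the record, the difference between the two expressions can be shown with
a small userspace sketch. BITS_PER_LONG and DIV_ROUND_UP() are redefined
locally here so the snippet stands alone, and the two function names are made
up for illustration:

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Bytes needed for a bitmap of nr bits, rounding up to whole longs. */
static size_t map_size_round_up(size_t nr)
{
	return DIV_ROUND_UP(nr, BITS_PER_LONG) * sizeof(unsigned long);
}

/* The "+ 1" variant: one extra long when nr is a multiple of BITS_PER_LONG. */
static size_t map_size_plus_one(size_t nr)
{
	return (nr / BITS_PER_LONG + 1) * sizeof(unsigned long);
}
```

The two agree except when nr is an exact multiple of BITS_PER_LONG, where the
"+ 1" form allocates one unsigned long more than needed. Either way, two
counts that land in the same unsigned long give the same size, which is why
size <= old_size can be seen.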

>
> >>
> >> >   if (size <= old_size)
> >> > - return 0;
> >> > + goto out;
> >>
> >> Can this even happen? Seems to me it can't, so just remove this?
> >
> > Yes, it can. The maps use unsigned long value for bitmap, so any
> > shrinker ID < 31 would fall into the same unsigned long, so we may see
> > size <= old_size, but we need increase shrinker_nr_max since
> > expand_shrinker_maps() is called iff id >= shrinker_nr_max.
>
> Ah, good point.


Re: [PATCH v18 24/25] x86/cet/shstk: Add arch_prctl functions for shadow stack

2021-01-29 Thread Dave Hansen
On 1/27/21 1:25 PM, Yu-cheng Yu wrote:
> arch_prctl(ARCH_X86_CET_STATUS, u64 *args)
> Get CET feature status.
> 
> The parameter 'args' is a pointer to a user buffer.  The kernel returns
> the following information:
> 
> *args = shadow stack/IBT status
> *(args + 1) = shadow stack base address
> *(args + 2) = shadow stack size

What's the deal for 32-bit binaries?  The in-kernel code looks 64-bit
only, but I don't see anything restricting the interface to 64-bit.

> +static int copy_status_to_user(struct cet_status *cet, u64 arg2)

This has static scope, but it's still awfully generically named.  A cet_
prefix would be nice.

> +{
> + u64 buf[3] = {0, 0, 0};
> +
> + if (cet->shstk_size) {
> + buf[0] |= GNU_PROPERTY_X86_FEATURE_1_SHSTK;
> + buf[1] = (u64)cet->shstk_base;
> + buf[2] = (u64)cet->shstk_size;

What's the casting for?

> + }
> +
> + return copy_to_user((u64 __user *)arg2, buf, sizeof(buf));
> +}
> +
> +int prctl_cet(int option, u64 arg2)
> +{
> + struct cet_status *cet;
> + unsigned int features;
> +
> + /*
> +  * GLIBC's ENOTSUPP == EOPNOTSUPP == 95, and it does not recognize
> +  * the kernel's ENOTSUPP (524).  So return EOPNOTSUPP here.
> +  */
> + if (!IS_ENABLED(CONFIG_X86_CET))
> + return -EOPNOTSUPP;

Let's ignore glibc for a moment.  What error code *should* the kernel be
returning here?  errno(3) says:

   EOPNOTSUPP  Operation not supported on socket (POSIX.1)
...
   ENOTSUP Operation not supported (POSIX.1)


> + cet = &current->thread.cet;
> +
> + if (option == ARCH_X86_CET_STATUS)
> + return copy_status_to_user(cet, arg2);

What's the point of doing copy_status_to_user() if the processor doesn't
support CET?  In other words, shouldn't this be below the CPU feature check?

Also, please cast arg2 *here*.  It becomes a user pointer here, not at
the copy_to_user().

> + if (!static_cpu_has(X86_FEATURE_CET))
> + return -EOPNOTSUPP;

So, you went to the trouble of adding a disabled-features.h entry for
this.  Why not just do:

if (cpu_feature_enabled(X86_FEATURE_CET))
...

instead of the IS_ENABLED() check above?  That should get rid of one of
these if's.

> + switch (option) {
> + case ARCH_X86_CET_DISABLE:
> + if (cet->locked)
> + return -EPERM;
> +
> + features = (unsigned int)arg2;

What's the purpose of this cast?

> + if (features & ~GNU_PROPERTY_X86_FEATURE_1_VALID)
> + return -EINVAL;
> + if (features & GNU_PROPERTY_X86_FEATURE_1_SHSTK)
> + cet_disable_shstk();
> + return 0;

This doesn't enforce that the high bits of arg2 be 0.  Shouldn't we call
them reserved and enforce that they be 0?
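
Something along these lines, i.e. validate the full 64-bit value before
narrowing it. This is only a sketch: the FEATURE_* names are local stand-ins
for the GNU_PROPERTY_X86_FEATURE_1_* bits, and check_features() is a
hypothetical helper:

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-ins for the GNU_PROPERTY_X86_FEATURE_1_* bits. */
#define FEATURE_IBT   (1u << 0)
#define FEATURE_SHSTK (1u << 1)
#define FEATURE_VALID (FEATURE_IBT | FEATURE_SHSTK)

/*
 * Reject any bit outside the valid mask, including bits 32-63, before the
 * value is truncated to unsigned int.
 */
static int check_features(uint64_t arg2, unsigned int *features)
{
	if (arg2 & ~(uint64_t)FEATURE_VALID)
		return -EINVAL;

	*features = (unsigned int)arg2;
	return 0;
}
```

With the check done on the u64, the later cast can no longer silently drop
reserved bits.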

> + case ARCH_X86_CET_LOCK:
> + cet->locked = 1;
> + return 0;

This needs to check for and enforce that arg2==0.

> + default:
> + return -ENOSYS;
> + }
> +}


Re: [PATCH] x86: Disable CET instrumentation in the kernel

2021-01-29 Thread Borislav Petkov
On Fri, Jan 29, 2021 at 11:03:31AM -0600, Josh Poimboeuf wrote:
> On Fri, Jan 29, 2021 at 06:54:08PM +0200, Nikolay Borisov wrote:
> > 
> > 
> > On 29.01.21 г. 18:49 ч., Josh Poimboeuf wrote:
> > > Agreed, stable is a good idea.   I think Nikolay saw it with GCC 9.
> > 
> > 
> > Yes I did, with the default Ubuntu compiler as well as the default gcc-10 
> > compiler: 
> > 
> > # gcc -v -Q -O2 --help=target | grep protection
> > 
> > gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04) 
> > COLLECT_GCC_OPTIONS='-v' '-Q' '-O2' '--help=target' '-mtune=generic' 
> > '-march=x86-64'
> >  /usr/lib/gcc/x86_64-linux-gnu/9/cc1 -v -imultiarch x86_64-linux-gnu 
> > help-dummy -dumpbase help-dummy -mtune=generic -march=x86-64 -auxbase 
> > help-dummy -O2 -version --help=target -fasynchronous-unwind-tables 
> > -fstack-protector-strong -Wformat -Wformat-security 
> > -fstack-clash-protection -fcf-protection -o /tmp/ccSecttk.s
> > GNU C17 (Ubuntu 9.3.0-17ubuntu1~20.04) version 9.3.0 (x86_64-linux-gnu)
> > compiled by GNU C version 9.3.0, GMP version 6.2.0, MPFR version 4.0.2, 
> > MPC version 1.1.0, isl version isl-0.22.1-GMP
> > 
> > 
> > It has -fcf-protection turned on by default it seems. 
> 
> Yup, explains why I didn't see it:
> 
> gcc version 10.2.1 20201125 (Red Hat 10.2.1-9) (GCC)
> COLLECT_GCC_OPTIONS='-v' '-Q' '-O2' '--help=target' '-mtune=generic' 
> '-march=x86-64'
>  /usr/libexec/gcc/x86_64-redhat-linux/10/cc1 -v help-dummy -dumpbase 
> help-dummy -mtune=generic -march=x86-64 -auxbase help-dummy -O2 -version 
> --help=target -o /tmp/cclBz55H.s

The fact that you triggered it with an Ubuntu gcc explains why the
original patch adding that switch:

29be86d7f9cb ("kbuild: add -fcf-protection=none when using retpoline flags")

came from Canonical.

Adding the author to Cc for FYI.

Seth, you can find this thread starting here:

https://lkml.kernel.org/r/20210128215219.6kct3h2eiustncws@treble

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


[PATCH] dmaengine: xilinx_dma: Alloc tx descriptors GFP_NOWAIT

2021-01-29 Thread Richard Fitzgerald
Use GFP_NOWAIT allocation in xilinx_dma_alloc_tx_descriptor().

This is necessary for compatibility with ALSA, which calls
dmaengine_prep_dma_cyclic() from an atomic context.

Signed-off-by: Richard Fitzgerald 
---
 drivers/dma/xilinx/xilinx_dma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
index 22faea653ea8..fb046af9ac53 100644
--- a/drivers/dma/xilinx/xilinx_dma.c
+++ b/drivers/dma/xilinx/xilinx_dma.c
@@ -800,7 +800,7 @@ xilinx_dma_alloc_tx_descriptor(struct xilinx_dma_chan *chan)
 {
struct xilinx_dma_tx_descriptor *desc;
 
-   desc = kzalloc(sizeof(*desc), GFP_KERNEL);
+   desc = kzalloc(sizeof(*desc), GFP_NOWAIT);
if (!desc)
return NULL;
 
-- 
2.20.1



Re: general protection fault in tomoyo_socket_sendmsg_permission

2021-01-29 Thread Tetsuo Handa
On 2021/01/30 1:05, Shuah Khan wrote:
>> Since "general protection fault in tomoyo_socket_sendmsg_permission" is 
>> caused by
>> unexpectedly resetting ud->tcp_socket to NULL without waiting for tx thread 
>> to
>> terminate, tracing the ordering of events is worth knowing. Even adding
>> schedule_timeout_uninterruptible() to before kernel_sendmsg() might help.
>>
> 
> What about the duplicate bug information that was in my email?
> Did you take a look at that?

I was not aware of the duplicate bugs. It is interesting that
"KASAN: null-ptr-deref Write in event_handler" says that vdev->ud.tcp_tx
became NULL at

if (vdev->ud.tcp_tx) {

/* this location */

kthread_stop_put(vdev->ud.tcp_tx);
vdev->ud.tcp_tx = NULL;
}

which means that somebody else is unexpectedly resetting vdev->ud.tcp_tx to
NULL.

If memset() from vhci_device_init() from vhci_start() were unexpectedly called,
all of tcp_socket, tcp_rx, tcp_tx etc. would become NULL. Could that explain
these bugs? I'm inclined to report not only tcp_socket but also the other
fields when kernel_sendmsg() detects that tcp_socket is NULL...


[PATCH 1/3] dt-bindings: remoteproc: qcom: Add Q6V5 Modem PIL binding for IPQ6018

2021-01-29 Thread Gokul Sriram Palanisamy
Add a new modem compatible string for IPQ6018 SoCs

Signed-off-by: Gokul Sriram Palanisamy 
---
 Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt 
b/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt
index 69c49c7..7f1d5783 100644
--- a/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt
+++ b/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt
@@ -9,6 +9,7 @@ on the Qualcomm Hexagon core.
Definition: must be one of:
"qcom,q6v5-pil",
"qcom,ipq8074-wcss-pil"
+   "qcom,ipq6018-wcss-pil"
"qcom,qcs404-wcss-pil"
"qcom,msm8916-mss-pil",
"qcom,msm8974-mss-pil"
@@ -40,6 +41,7 @@ on the Qualcomm Hexagon core.
string:
qcom,q6v5-pil:
qcom,ipq8074-wcss-pil:
+   qcom,ipq6018-wcss-pil:
qcom,qcs404-wcss-pil:
qcom,msm8916-mss-pil:
qcom,msm8974-mss-pil:
@@ -68,6 +70,7 @@ on the Qualcomm Hexagon core.
Value type: 
Definition: The clocks needed depend on the compatible string:
qcom,ipq8074-wcss-pil:
+   qcom,ipq6018-wcss-pil:
no clock names required
qcom,qcs404-wcss-pil:
must be "xo", "gcc_abhs_cbcr", "gcc_abhs_cbcr",
@@ -165,6 +168,7 @@ For the compatible string below the following supplies are required:
Value type: 
Definition: The power-domains needed depend on the compatible string:
qcom,ipq8074-wcss-pil:
+   qcom,ipq6018-wcss-pil:
no power-domain names required
qcom,q6v5-pil:
qcom,msm8916-mss-pil:
-- 
2.7.4



[PATCH 3/3] arm64: dts: ipq6018: Update WCSS PIL driver compatible

2021-01-29 Thread Gokul Sriram Palanisamy
Update the WCSS PIL driver node with the IPQ6018-specific
compatible to enable SoC-specific driver data.

Signed-off-by: Gokul Sriram Palanisamy 
---
 arch/arm64/boot/dts/qcom/ipq6018.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/ipq6018.dtsi b/arch/arm64/boot/dts/qcom/ipq6018.dtsi
index 9fa5b02..2e6b23b 100644
--- a/arch/arm64/boot/dts/qcom/ipq6018.dtsi
+++ b/arch/arm64/boot/dts/qcom/ipq6018.dtsi
@@ -477,7 +477,7 @@
};
 
q6v5_wcss: remoteproc@cd0 {
-   compatible = "qcom,ipq8074-wcss-pil";
+   compatible = "qcom,ipq6018-wcss-pil";
reg = <0x0 0x0cd0 0x0 0x4040>,
  <0x0 0x004ab000 0x0 0x20>;
reg-names = "qdsp6",
-- 
2.7.4



[PATCH 2/3] remoteproc: qcom: wcss: populate driver data for IPQ6018

2021-01-29 Thread Gokul Sriram Palanisamy
Populate hardcoded parameters using driver data for IPQ6018 SoCs.

Signed-off-by: Gokul Sriram Palanisamy 
---
 drivers/remoteproc/qcom_q6v5_wcss.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/remoteproc/qcom_q6v5_wcss.c b/drivers/remoteproc/qcom_q6v5_wcss.c
index 7c64bfc..bc9531c 100644
--- a/drivers/remoteproc/qcom_q6v5_wcss.c
+++ b/drivers/remoteproc/qcom_q6v5_wcss.c
@@ -965,7 +965,7 @@ static int q6v5_alloc_memory_region(struct q6v5_wcss *wcss)
return 0;
 }
 
-static int ipq8074_init_clock(struct q6v5_wcss *wcss)
+static int ipq_init_clock(struct q6v5_wcss *wcss)
 {
int ret;
 
@@ -1172,7 +1172,7 @@ static int q6v5_wcss_remove(struct platform_device *pdev)
 }
 
 static const struct wcss_data wcss_ipq8074_res_init = {
-   .init_clock = ipq8074_init_clock,
+   .init_clock = ipq_init_clock,
.q6_firmware_name = "IPQ8074/q6_fw.mdt",
.m3_firmware_name = "IPQ8074/m3_fw.mdt",
.crash_reason_smem = WCSS_CRASH_REASON,
@@ -1185,6 +1185,20 @@ static const struct wcss_data wcss_ipq8074_res_init = {
.need_mem_protection = true,
 };
 
+static const struct wcss_data wcss_ipq6018_res_init = {
+   .init_clock = ipq_init_clock,
+   .q6_firmware_name = "IPQ6018/q6_fw.mdt",
+   .m3_firmware_name = "IPQ6018/m3_fw.mdt",
+   .crash_reason_smem = WCSS_CRASH_REASON,
+   .aon_reset_required = true,
+   .wcss_q6_reset_required = true,
+   .bcr_reset_required = false,
+   .ssr_name = "q6wcss",
+   .ops = &q6v5_wcss_ipq8074_ops,
+   .requires_force_stop = true,
+   .need_mem_protection = true,
+};
+
 static const struct wcss_data wcss_qcs404_res_init = {
.init_clock = qcs404_init_clock,
.init_regulator = qcs404_init_regulator,
@@ -1203,6 +1217,7 @@ static const struct wcss_data wcss_qcs404_res_init = {
 
 static const struct of_device_id q6v5_wcss_of_match[] = {
{ .compatible = "qcom,ipq8074-wcss-pil", .data = &wcss_ipq8074_res_init },
+   { .compatible = "qcom,ipq6018-wcss-pil", .data = &wcss_ipq6018_res_init },
{ .compatible = "qcom,qcs404-wcss-pil", .data = &wcss_qcs404_res_init },
{ },
 };
-- 
2.7.4



[PATCH 0/3] remoteproc: qcom: q6v5-wcss: Add driver data for IPQ6018

2021-01-29 Thread Gokul Sriram Palanisamy
Q6-based WiFi firmware loading is supported across
different targets, e.g. IPQ8074 and QCS404. To
support a different firmware name for IPQ6018, populate
the hardcoded parameters from the compatible string and driver data.

Gokul Sriram Palanisamy (3):
  dt-bindings: remoteproc: qcom: Add Q6V5 Modem PIL binding for IPQ6018
  remoteproc: qcom: wcss: populate driver data for IPQ6018
  arm64: dts: ipq6018: Update WCSS PIL driver compatible

 .../devicetree/bindings/remoteproc/qcom,q6v5.txt  |  4 
 arch/arm64/boot/dts/qcom/ipq6018.dtsi |  2 +-
 drivers/remoteproc/qcom_q6v5_wcss.c   | 19 +--
 3 files changed, 22 insertions(+), 3 deletions(-)

-- 
2.7.4
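
The "compatible and driver data" approach in this cover letter can be
pictured with a small userspace sketch. Everything here is an
illustrative assumption: the struct is trimmed to two fields, and
lookup_data() stands in for the OF match lookup rather than reproducing
it. Only the compatible strings and firmware paths are taken from the
series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down per-SoC data; the driver's struct has more fields. */
struct wcss_data_sketch {
	const char *q6_firmware_name;
	const char *m3_firmware_name;
};

struct match_entry_sketch {
	const char *compatible;
	const struct wcss_data_sketch *data;
};

static const struct wcss_data_sketch ipq8074_data = {
	"IPQ8074/q6_fw.mdt", "IPQ8074/m3_fw.mdt",
};

static const struct wcss_data_sketch ipq6018_data = {
	"IPQ6018/q6_fw.mdt", "IPQ6018/m3_fw.mdt",
};

/* One table row per compatible string, terminated by an empty sentinel. */
static const struct match_entry_sketch match_table[] = {
	{ "qcom,ipq8074-wcss-pil", &ipq8074_data },
	{ "qcom,ipq6018-wcss-pil", &ipq6018_data },
	{ NULL, NULL },
};

/* Return the per-SoC data for a compatible string, or NULL if unknown. */
static const struct wcss_data_sketch *lookup_data(const char *compatible)
{
	const struct match_entry_sketch *e;

	assert(compatible != NULL);
	for (e = match_table; e->compatible; e++)
		if (strcmp(e->compatible, compatible) == 0)
			return e->data;
	return NULL;
}
```

With a table like this, supporting another SoC means adding one data
struct and one table row, while the shared code only reads the fields.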



Re: [v5 PATCH 02/11] mm: vmscan: consolidate shrinker_maps handling code

2021-01-29 Thread Yang Shi
On Fri, Jan 29, 2021 at 6:34 AM Kirill Tkhai  wrote:
>
> On 28.01.2021 02:33, Yang Shi wrote:
> > The shrinker map management is not purely memcg specific, it is at the 
> > intersection
> > between memory cgroup and shrinkers.  It's allocation and assignment of a 
> > structure,
> > and the only memcg bit is the map is being stored in a memcg structure.  So 
> > move the
> > shrinker_maps handling code into vmscan.c for tighter integration with 
> > shrinker code,
> > and remove the "memcg_" prefix.  There is no functional change.
> >
> > Signed-off-by: Yang Shi 
> > ---
> >  include/linux/memcontrol.h |  12 ++--
> >  mm/huge_memory.c   |   4 +-
> >  mm/list_lru.c  |   6 +-
> >  mm/memcontrol.c| 130 +
> >  mm/vmscan.c| 130 -
> >  5 files changed, 142 insertions(+), 140 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index eeb0b52203e9..0ee2924991fb 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -1581,10 +1581,10 @@ static inline bool 
> > mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
> >   return false;
> >  }
> >
> > -extern int memcg_expand_shrinker_maps(int new_id);
> > -
> > -extern void memcg_set_shrinker_bit(struct mem_cgroup *memcg,
> > -int nid, int shrinker_id);
> > +extern int alloc_shrinker_maps(struct mem_cgroup *memcg);
> > +extern void free_shrinker_maps(struct mem_cgroup *memcg);
> > +extern void set_shrinker_bit(struct mem_cgroup *memcg,
> > +  int nid, int shrinker_id);
> >  #else
> >  #define mem_cgroup_sockets_enabled 0
> >  static inline void mem_cgroup_sk_alloc(struct sock *sk) { };
> > @@ -1594,8 +1594,8 @@ static inline bool 
> > mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
> >   return false;
> >  }
> >
> > -static inline void memcg_set_shrinker_bit(struct mem_cgroup *memcg,
> > -   int nid, int shrinker_id)
> > +static inline void set_shrinker_bit(struct mem_cgroup *memcg,
> > + int nid, int shrinker_id)
> >  {
> >  }
> >  #endif
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 9237976abe72..05190d7f32ae 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2823,8 +2823,8 @@ void deferred_split_huge_page(struct page *page)
> >   ds_queue->split_queue_len++;
> >  #ifdef CONFIG_MEMCG
> >   if (memcg)
> > - memcg_set_shrinker_bit(memcg, page_to_nid(page),
> > -deferred_split_shrinker.id);
> > + set_shrinker_bit(memcg, page_to_nid(page),
> > +  deferred_split_shrinker.id);
> >  #endif
> >   }
> >   spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
> > diff --git a/mm/list_lru.c b/mm/list_lru.c
> > index fe230081690b..628030fa5f69 100644
> > --- a/mm/list_lru.c
> > +++ b/mm/list_lru.c
> > @@ -125,8 +125,8 @@ bool list_lru_add(struct list_lru *lru, struct 
> > list_head *item)
> >   list_add_tail(item, &l->list);
> >   /* Set shrinker bit if the first element was added */
> >   if (!l->nr_items++)
> > - memcg_set_shrinker_bit(memcg, nid,
> > -lru_shrinker_id(lru));
> > + set_shrinker_bit(memcg, nid,
> > +  lru_shrinker_id(lru));
> >   nlru->nr_items++;
> >   spin_unlock(&nlru->lock);
> >   return true;
> > @@ -548,7 +548,7 @@ static void memcg_drain_list_lru_node(struct list_lru 
> > *lru, int nid,
> >
> >   if (src->nr_items) {
> >   dst->nr_items += src->nr_items;
> > - memcg_set_shrinker_bit(dst_memcg, nid, lru_shrinker_id(lru));
> > + set_shrinker_bit(dst_memcg, nid, lru_shrinker_id(lru));
> >   src->nr_items = 0;
> >   }
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index e2de77b5bcc2..f5c9a0d2160b 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -397,130 +397,6 @@ DEFINE_STATIC_KEY_FALSE(memcg_kmem_enabled_key);
> >  EXPORT_SYMBOL(memcg_kmem_enabled_key);
> >  #endif
> >
> > -static int memcg_shrinker_map_size;
> > -static DEFINE_MUTEX(memcg_shrinker_map_mutex);
> > -
> > -static void memcg_free_shrinker_map_rcu(struct rcu_head *head)
> > -{
> > - kvfree(container_of(head, struct memcg_shrinker_map, rcu));
> > -}
> > -
> > -static int memcg_expand_one_shrinker_map(struct mem_cgroup *memcg,
> > -  int size, int old_size)
> > -{
> > - struct memcg_shrinker_map *new, *old;
> > - int nid;
> > -
> > - lockdep_assert_held(&memcg_shrinker_map_mutex);
> > -
> > - for_each_node(nid) {
> > - old = rcu_der

[PATCH 0/3] remoteproc: qcom: q6v5-wcss: Add driver data for IPQ6018

2021-01-29 Thread Gokul Sriram Palanisamy
Q6-based WiFi firmware loading is supported across
different targets, e.g. IPQ8074 and QCS404. To
support a different firmware name for IPQ6018, populate
the hardcoded parameters from the compatible string and driver data.

This series depends on
[PATCH v8] remoteproc: qcom: q6v5-wcss: Add support for secure pil

Gokul Sriram Palanisamy (3):
  dt-bindings: remoteproc: qcom: Add Q6V5 Modem PIL binding for IPQ6018
  remoteproc: qcom: wcss: populate driver data for IPQ6018
  arm64: dts: ipq6018: Update WCSS PIL driver compatible

 .../devicetree/bindings/remoteproc/qcom,q6v5.txt  |  4 
 arch/arm64/boot/dts/qcom/ipq6018.dtsi |  2 +-
 drivers/remoteproc/qcom_q6v5_wcss.c   | 19 +--
 3 files changed, 22 insertions(+), 3 deletions(-)

-- 
2.7.4



Re: [PATCH] dt-bindings: Cleanup standard unit properties

2021-01-29 Thread Alexandre Torgue




On 1/28/21 8:45 PM, Rob Herring wrote:

Properties with standard unit suffixes already have a type and don't need
type definitions. They also default to a single entry, so 'maxItems: 1'
can be dropped.

adi,ad5758 is an oddball which defined an enum of arrays. While a valid
schema, it is simpler as a whole to only define scalar constraints.

Cc: Jean Delvare 
Cc: Guenter Roeck 
Cc: Jonathan Cameron 
Cc: Lars-Peter Clausen 
Cc: Alexandre Torgue 
Cc: Dmitry Torokhov 
Cc: Ulf Hansson 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Sebastian Reichel 
Cc: Mark Brown 
Cc: Alexandre Belloni 
Cc: Greg Kroah-Hartman 
Cc: Serge Semin 
Cc: Wolfram Sang 
Cc: linux-hw...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-in...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: net...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-ser...@vger.kernel.org
Cc: alsa-de...@alsa-project.org
Cc: linux-watch...@vger.kernel.org
Signed-off-by: Rob Herring 
---
  .../devicetree/bindings/arm/cpus.yaml |  1 -
  .../bindings/extcon/wlf,arizona.yaml  |  1 -
  .../bindings/hwmon/adi,ltc2947.yaml   |  1 -
  .../bindings/hwmon/baikal,bt1-pvt.yaml|  8 ++--
  .../devicetree/bindings/hwmon/ti,tmp513.yaml  |  1 -
  .../devicetree/bindings/i2c/i2c-gpio.yaml |  2 -
  .../bindings/i2c/snps,designware-i2c.yaml |  3 --
  .../bindings/iio/adc/maxim,max9611.yaml   |  1 -
  .../bindings/iio/adc/st,stm32-adc.yaml|  1 -
  .../bindings/iio/adc/ti,palmas-gpadc.yaml |  2 -
  .../bindings/iio/dac/adi,ad5758.yaml  | 41 ---
  .../bindings/iio/health/maxim,max30100.yaml   |  1 -
  .../input/touchscreen/touchscreen.yaml|  2 -
  .../bindings/mmc/mmc-controller.yaml  |  1 -
  .../bindings/mmc/mmc-pwrseq-simple.yaml   |  2 -
  .../bindings/net/ethernet-controller.yaml |  2 -
  .../devicetree/bindings/net/snps,dwmac.yaml   |  1 -
  .../bindings/power/supply/battery.yaml|  3 --
  .../bindings/power/supply/bq2515x.yaml|  1 -
  .../bindings/regulator/dlg,da9121.yaml|  1 -
  .../bindings/regulator/fixed-regulator.yaml   |  2 -
  .../devicetree/bindings/rtc/rtc.yaml  |  2 -
  .../devicetree/bindings/serial/pl011.yaml |  2 -
  .../devicetree/bindings/sound/sgtl5000.yaml   |  2 -
  .../bindings/watchdog/watchdog.yaml   |  1 -
  25 files changed, 29 insertions(+), 56 deletions(-)


For stm32:
Acked-by: Alexandre TORGUE 



diff --git a/Documentation/devicetree/bindings/arm/cpus.yaml b/Documentation/devicetree/bindings/arm/cpus.yaml
index 14cd727d3c4b..f02fd10de604 100644
--- a/Documentation/devicetree/bindings/arm/cpus.yaml
+++ b/Documentation/devicetree/bindings/arm/cpus.yaml
@@ -232,7 +232,6 @@ properties:
by this cpu (see ./idle-states.yaml).
  
capacity-dmips-mhz:

-$ref: '/schemas/types.yaml#/definitions/uint32'
  description:
u32 value representing CPU capacity (see ./cpu-capacity.txt) in
DMIPS/MHz, relative to highest capacity-dmips-mhz
diff --git a/Documentation/devicetree/bindings/extcon/wlf,arizona.yaml b/Documentation/devicetree/bindings/extcon/wlf,arizona.yaml
index 5fe784f487c5..efdf59abb2e1 100644
--- a/Documentation/devicetree/bindings/extcon/wlf,arizona.yaml
+++ b/Documentation/devicetree/bindings/extcon/wlf,arizona.yaml
@@ -85,7 +85,6 @@ properties:
wlf,micd-timeout-ms:
  description:
Timeout for microphone detection, specified in milliseconds.
-$ref: "/schemas/types.yaml#/definitions/uint32"
  
wlf,micd-force-micbias:

  description:
diff --git a/Documentation/devicetree/bindings/hwmon/adi,ltc2947.yaml b/Documentation/devicetree/bindings/hwmon/adi,ltc2947.yaml
index eef614962b10..bf04151b63d2 100644
--- a/Documentation/devicetree/bindings/hwmon/adi,ltc2947.yaml
+++ b/Documentation/devicetree/bindings/hwmon/adi,ltc2947.yaml
@@ -49,7 +49,6 @@ properties:
  description:
This property controls the Accumulation Dead band which allows to set the
level of current below which no accumulation takes place.
-$ref: /schemas/types.yaml#/definitions/uint32
  maximum: 255
  default: 0
  
diff --git a/Documentation/devicetree/bindings/hwmon/baikal,bt1-pvt.yaml b/Documentation/devicetree/bindings/hwmon/baikal,bt1-pvt.yaml
index 00a6511354e6..5d3ce641fcde 100644
--- a/Documentation/devicetree/bindings/hwmon/baikal,bt1-pvt.yaml
+++ b/Documentation/devicetree/bindings/hwmon/baikal,bt1-pvt.yaml
@@ -73,11 +73,9 @@ properties:
  description: |
Temperature sensor trimming factor. It can be used to manually adjust the
temperature measurements within 7.130 degrees Celsius.
-maxItems: 1
-items:
-  default: 0
-  minimum: 0
-  maximum: 7130
+default: 0
+minimum: 0
+maximum: 7130
  
  additionalProperties: false
  
diff --git a/Documentation/devicetree/bindings/hwmon/ti,t

[PATCH 3/5] dt-bindings: nvmem: Add bindings for rmem driver

2021-01-29 Thread Srinivas Kandagatla
From: Nicolas Saenz Julienne 

Firmware/co-processors might use reserved memory areas in order to pass
data stemming from an nvmem device otherwise inaccessible to Linux.
For example an EEPROM memory only physically accessible to firmware, or
data only accessible early at boot time.

Introduce the dt-bindings to nvmem's rmem.

Signed-off-by: Nicolas Saenz Julienne 
Reviewed-by: Rob Herring 
Signed-off-by: Srinivas Kandagatla 
---
 .../devicetree/bindings/nvmem/rmem.yaml   | 49 +++
 1 file changed, 49 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/nvmem/rmem.yaml

diff --git a/Documentation/devicetree/bindings/nvmem/rmem.yaml b/Documentation/devicetree/bindings/nvmem/rmem.yaml
new file mode 100644
index ..1d85a0a30846
--- /dev/null
+++ b/Documentation/devicetree/bindings/nvmem/rmem.yaml
@@ -0,0 +1,49 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/nvmem/rmem.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Reserved Memory Based nvmem Device
+
+maintainers:
+  - Nicolas Saenz Julienne 
+
+allOf:
+  - $ref: "nvmem.yaml#"
+
+properties:
+  compatible:
+items:
+  - enum:
+  - raspberrypi,bootloader-config
+  - const: nvmem-rmem
+
+  no-map:
+$ref: /schemas/types.yaml#/definitions/flag
+description:
+  Avoid creating a virtual mapping of the region as part of the OS'
+  standard mapping of system memory.
+
+required:
+  - compatible
+  - no-map
+
+unevaluatedProperties: false
+
+examples:
+  - |
+reserved-memory {
+#address-cells = <1>;
+#size-cells = <1>;
+
+blconfig: nvram@1000 {
+compatible = "raspberrypi,bootloader-config", "nvmem-rmem";
+#address-cells = <1>;
+#size-cells = <1>;
+reg = <0x1000 0x1000>;
+no-map;
+};
+};
+
+...
-- 
2.21.0



[PATCH 2/5] nvmem: imx-iim: Use of_device_get_match_data()

2021-01-29 Thread Srinivas Kandagatla
From: Fabio Estevam 

The retrieval of driver data via of_device_get_match_data() can make
the code simpler.

Use of_device_get_match_data() to simplify the code.

Signed-off-by: Fabio Estevam 
Signed-off-by: Srinivas Kandagatla 
---
 drivers/nvmem/imx-iim.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/nvmem/imx-iim.c b/drivers/nvmem/imx-iim.c
index 701704b87dc9..c86339a7f583 100644
--- a/drivers/nvmem/imx-iim.c
+++ b/drivers/nvmem/imx-iim.c
@@ -96,7 +96,6 @@ MODULE_DEVICE_TABLE(of, imx_iim_dt_ids);
 
 static int imx_iim_probe(struct platform_device *pdev)
 {
-   const struct of_device_id *of_id;
struct device *dev = &pdev->dev;
struct iim_priv *iim;
struct nvmem_device *nvmem;
@@ -111,11 +110,7 @@ static int imx_iim_probe(struct platform_device *pdev)
if (IS_ERR(iim->base))
return PTR_ERR(iim->base);
 
-   of_id = of_match_device(imx_iim_dt_ids, dev);
-   if (!of_id)
-   return -ENODEV;
-
-   drvdata = of_id->data;
+   drvdata = of_device_get_match_data(&pdev->dev);
 
iim->clk = devm_clk_get(dev, NULL);
if (IS_ERR(iim->clk))
-- 
2.21.0
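
As a rough userspace illustration of why the helper shortens the probe
path: the open-coded version fetches the table entry and dereferences it
after a NULL check, while a match-data helper folds both steps into one
call that yields NULL when nothing matches. The types, function names and
compatible strings below are hypothetical stand-ins, not the OF API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device-id table entry. */
struct id_sketch {
	const char *compatible;
	const void *data;
};

/* Old shape: the caller receives the entry and must check it before use. */
static const struct id_sketch *match_entry(const struct id_sketch *tbl,
					   const char *compat)
{
	assert(tbl != NULL && compat != NULL);
	for (; tbl->compatible; tbl++)
		if (strcmp(tbl->compatible, compat) == 0)
			return tbl;
	return NULL;
}

/* New shape: one call returns the data pointer, or NULL if nothing matched. */
static const void *get_match_data(const struct id_sketch *tbl,
				  const char *compat)
{
	const struct id_sketch *id = match_entry(tbl, compat);

	return id ? id->data : NULL;
}
```

Note that the single-call form still reports "no match" as NULL, so a
caller that can be probed without a matching entry would want to keep a
check on the returned pointer.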



[PATCH 1/5] nvmem: core: Fix a resource leak on error in nvmem_add_cells_from_of()

2021-01-29 Thread Srinivas Kandagatla
From: Dan Carpenter 

This doesn't call of_node_put() on the error path so it leads to a
memory leak.

Fixes: 0749aa25af82 ("nvmem: core: fix regression in of_nvmem_cell_get()")
Signed-off-by: Dan Carpenter 
Signed-off-by: Srinivas Kandagatla 
---
 drivers/nvmem/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c
index 177f5bf27c6d..68ae6f24b57f 100644
--- a/drivers/nvmem/core.c
+++ b/drivers/nvmem/core.c
@@ -713,6 +713,7 @@ static int nvmem_add_cells_from_of(struct nvmem_device *nvmem)
cell->name, nvmem->stride);
/* Cells already added will be freed later. */
kfree_const(cell->name);
+   of_node_put(cell->np);
kfree(cell);
return -EINVAL;
}
-- 
2.21.0
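
The leak pattern fixed above can be sketched in userspace with a plain
counter in place of the device-node refcount. This is a simplified model
under stated assumptions: node_get()/node_put() stand in for the
reference helpers, and add_cell() is a made-up reduction of the real
function, assuming the cell takes a node reference before its offset is
validated.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted node and the cell that holds a reference to it. */
struct node_sketch {
	int refcount;
};

struct cell_sketch {
	struct node_sketch *np;
	unsigned int offset;
};

static struct node_sketch *node_get(struct node_sketch *np)
{
	np->refcount++;
	return np;
}

static void node_put(struct node_sketch *np)
{
	np->refcount--;
}

/*
 * Returns 0 and hands the cell to *out on success, -1 on failure.
 * On the misaligned-offset path the reference taken above must be
 * dropped again, otherwise the node's count never returns to its
 * previous value.
 */
static int add_cell(struct node_sketch *np, unsigned int offset,
		    unsigned int stride, struct cell_sketch **out)
{
	struct cell_sketch *cell;

	assert(stride != 0);
	cell = calloc(1, sizeof(*cell));
	if (!cell)
		return -1;

	cell->np = node_get(np);
	cell->offset = offset;

	if (offset % stride) {
		node_put(cell->np);	/* the step the patch adds */
		free(cell);
		return -1;
	}

	*out = cell;
	return 0;
}
```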



[PATCH 4/5] nvmem: Add driver to expose reserved memory as nvmem

2021-01-29 Thread Srinivas Kandagatla
From: Nicolas Saenz Julienne 

Firmware/co-processors might use reserved memory areas in order to pass
data stemming from an nvmem device otherwise inaccessible to Linux.
For example an EEPROM memory only physically accessible to firmware, or
data only accessible early at boot time.

In order to expose this data to other drivers and user-space, the driver
models the reserved memory area as an nvmem device.

Signed-off-by: Nicolas Saenz Julienne 
Reviewed-by: Rob Herring 
Tested-by: Tim Gover 
Signed-off-by: Srinivas Kandagatla 
---
 drivers/nvmem/Kconfig  |  8 
 drivers/nvmem/Makefile |  2 +
 drivers/nvmem/rmem.c   | 97 ++
 drivers/of/platform.c  |  1 +
 4 files changed, 108 insertions(+)
 create mode 100644 drivers/nvmem/rmem.c

diff --git a/drivers/nvmem/Kconfig b/drivers/nvmem/Kconfig
index 954d3b4a52ab..fecc19b884bf 100644
--- a/drivers/nvmem/Kconfig
+++ b/drivers/nvmem/Kconfig
@@ -270,4 +270,12 @@ config SPRD_EFUSE
  This driver can also be built as a module. If so, the module
  will be called nvmem-sprd-efuse.
 
+config NVMEM_RMEM
+   tristate "Reserved Memory Based Driver Support"
+   help
+ This driver maps reserved memory into an nvmem device. It might be
+ useful to expose information left by firmware in memory.
+
+ This driver can also be built as a module. If so, the module
+ will be called nvmem-rmem.
 endif
diff --git a/drivers/nvmem/Makefile b/drivers/nvmem/Makefile
index a7c377218341..5376b8e0dae5 100644
--- a/drivers/nvmem/Makefile
+++ b/drivers/nvmem/Makefile
@@ -55,3 +55,5 @@ obj-$(CONFIG_NVMEM_ZYNQMP)+= nvmem_zynqmp_nvmem.o
 nvmem_zynqmp_nvmem-y   := zynqmp_nvmem.o
 obj-$(CONFIG_SPRD_EFUSE)   += nvmem_sprd_efuse.o
 nvmem_sprd_efuse-y := sprd-efuse.o
+obj-$(CONFIG_NVMEM_RMEM)   += nvmem-rmem.o
+nvmem-rmem-y   := rmem.o
diff --git a/drivers/nvmem/rmem.c b/drivers/nvmem/rmem.c
new file mode 100644
index ..b11c3c974b3d
--- /dev/null
+++ b/drivers/nvmem/rmem.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2020 Nicolas Saenz Julienne 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct rmem {
+   struct device *dev;
+   struct nvmem_device *nvmem;
+   struct reserved_mem *mem;
+
+   phys_addr_t size;
+};
+
+static int rmem_read(void *context, unsigned int offset,
+void *val, size_t bytes)
+{
+   struct rmem *priv = context;
+   size_t available = priv->mem->size;
+   loff_t off = offset;
+   void *addr;
+   int count;
+
+   /*
+* Only map the reserved memory at this point to avoid potential rogue
+* kernel threads inadvertently modifying it. Based on the current
+* use-cases for this driver, the performance hit isn't a concern.
+* Nor is it likely to be, given the nature of the subsystem. Most nvmem
+* devices operate over slow buses to begin with.
+*
+* An alternative would be setting the memory as RO, set_memory_ro(),
+* but as of Dec 2020 this isn't possible on arm64.
+*/
+   addr = memremap(priv->mem->base, available, MEMREMAP_WB);
+   if (!addr) {
+   dev_err(priv->dev, "Failed to remap memory region\n");
+   return -ENOMEM;
+   }
+
+   count = memory_read_from_buffer(val, bytes, &off, addr, available);
+
+   memunmap(addr);
+
+   return count;
+}
+
+static int rmem_probe(struct platform_device *pdev)
+{
+   struct nvmem_config config = { };
+   struct device *dev = &pdev->dev;
+   struct reserved_mem *mem;
+   struct rmem *priv;
+
+   priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+   if (!priv)
+   return -ENOMEM;
+   priv->dev = dev;
+
+   mem = of_reserved_mem_lookup(dev->of_node);
+   if (!mem) {
+   dev_err(dev, "Failed to lookup reserved memory\n");
+   return -EINVAL;
+   }
+   priv->mem = mem;
+
+   config.dev = dev;
+   config.priv = priv;
+   config.name = "rmem";
+   config.size = mem->size;
+   config.reg_read = rmem_read;
+
+   return PTR_ERR_OR_ZERO(devm_nvmem_register(dev, &config));
+}
+
+static const struct of_device_id rmem_match[] = {
+   { .compatible = "nvmem-rmem", },
+   { /* sentinel */ },
+};
+MODULE_DEVICE_TABLE(of, rmem_match);
+
+static struct platform_driver rmem_driver = {
+   .probe = rmem_probe,
+   .driver = {
+   .name = "rmem",
+   .of_match_table = rmem_match,
+   },
+};
+module_platform_driver(rmem_driver);
+
+MODULE_AUTHOR("Nicolas Saenz Julienne ");
+MODULE_DESCRIPTION("Reserved Memory Based nvmem Driver");
+MODULE_LICENSE("GPL");
diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 79bd5f5a1bf1..6699cdbe58b6 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -511,6 +511,7 
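
The read path in the driver above maps the region, copies a bounded
slice, and unmaps again. The bounded copy can be sketched in userspace as
below. read_from_buffer() is a hypothetical simplification: it clamps the
request to what is available past the offset, but unlike the kernel
helper used in the patch it does not update a position pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy up to count bytes starting at offset pos out of a region of
 * `available` bytes. Returns the number of bytes copied, which is 0
 * when pos is at or past the end of the region.
 */
static long read_from_buffer(void *to, size_t count, size_t pos,
			     const void *from, size_t available)
{
	assert(to != NULL && from != NULL);

	if (pos >= available)
		return 0;
	if (count > available - pos)
		count = available - pos;

	memcpy(to, (const char *)from + pos, count);
	return (long)count;
}
```

A read that starts near the end of the region is shortened rather than
rejected, which is why the driver can return the copied count directly.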

[PATCH 5/5] nvmem: core: skip child nodes not matching binding

2021-01-29 Thread Srinivas Kandagatla
From: Ahmad Fatoum 

The nvmem cell binding applies to all eeprom child nodes matching
"^.*@[0-9a-f]+$" without taking a compatible into account.

Linux drivers, like at24, are even more extensive and assume
_all_ at24 eeprom child nodes to be nvmem cells since e888d445ac33
("nvmem: resolve cells from DT at registration time").

Since df5f3b6f5357 ("dt-bindings: nvmem: stm32: new property for
data access"), the additionalProperties: True means it's Ok to have
other properties as long as they don't match "^.*@[0-9a-f]+$".

The barebox bootloader extends the MTD partitions binding to
EEPROM and can fix up following device tree node:

  &eeprom {
partitions {
  compatible = "fixed-partitions";
};
  };

This is allowed binding-wise, but drivers using nvmem_register()
like at24 will fail to parse because the function expects all child
nodes to have a reg property present. This results in the whole
EEPROM driver probe failing despite the device tree being correct.

Fix this by skipping nodes lacking a reg property instead of
returning an error. This effectively makes the drivers adhere
to the binding because all nodes with a unit address must have
a reg property and vice versa.

Fixes: e888d445ac33 ("nvmem: resolve cells from DT at registration time")
Signed-off-by: Ahmad Fatoum 
Signed-off-by: Srinivas Kandagatla 
---
 drivers/nvmem/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c
index 68ae6f24b57f..a5ab1e0c74cf 100644
--- a/drivers/nvmem/core.c
+++ b/drivers/nvmem/core.c
@@ -682,7 +682,9 @@ static int nvmem_add_cells_from_of(struct nvmem_device *nvmem)
 
for_each_child_of_node(parent, child) {
addr = of_get_property(child, "reg", &len);
-   if (!addr || (len < 2 * sizeof(u32))) {
+   if (!addr)
+   continue;
+   if (len < 2 * sizeof(u32)) {
dev_err(dev, "nvmem: invalid reg on %pOF\n", child);
return -EINVAL;
}
-- 
2.21.0
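
The behavioural change in this patch, skipping children without a reg
property while still rejecting a reg that is too short, can be modelled
in userspace as follows. The child_sketch type and count_cells() are
illustrative inventions; only the two conditions mirror the hunk above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical child node: reg is NULL when the property is absent. */
struct child_sketch {
	const unsigned int *reg;
	size_t reg_len;		/* length of the property in bytes */
};

/*
 * Walk the children and count those accepted as cells.
 * Returns the count, or -1 if a child has a reg property that is too
 * short to hold an <offset size> pair.
 */
static int count_cells(const struct child_sketch *children, size_t n)
{
	int cells = 0;
	size_t i;

	assert(children != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!children[i].reg)
			continue;	/* e.g. a "partitions" node: not a cell */
		if (children[i].reg_len < 2 * sizeof(unsigned int))
			return -1;	/* present but malformed: still an error */
		cells++;
	}
	return cells;
}
```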



Re: [PATCH v2] btrfs: Avoid calling btrfs_get_chunk_map() twice

2021-01-29 Thread Michal Rostecki
On Fri, Jan 29, 2021 at 11:22:48AM -0500, Josef Bacik wrote:
> On 1/27/21 8:57 AM, Michal Rostecki wrote:
> > From: Michal Rostecki 
> > 
> > Before this change, the btrfs_get_io_geometry() function was calling
> > btrfs_get_chunk_map() to get the extent mapping, necessary for
> > calculating the I/O geometry. It was using that extent mapping only
> > internally and freeing the pointer after its execution.
> > 
> > That resulted in calling btrfs_get_chunk_map() de facto twice by the
> > __btrfs_map_block() function. It was calling btrfs_get_io_geometry()
> > first and then calling btrfs_get_chunk_map() directly to get the extent
> > mapping, used by the rest of the function.
> > 
> > This change fixes that by passing the extent mapping to the
> > btrfs_get_io_geometry() function as an argument.
> > 
> > v2:
> > When btrfs_get_chunk_map() returns an error in btrfs_submit_direct():
> > - Use errno_to_blk_status(PTR_ERR(em)) as the status
> > - Set em to NULL
> > 
> > Signed-off-by: Michal Rostecki 
> 
> This panic'ed all of my test vms in their overnight xfstests runs, the panic 
> is this
> 
> [ 2449.936502] BTRFS critical (device dm-7): mapping failed logical
> 1113825280 bio len 40960 len 24576
> [ 2449.937073] [ cut here ]
> [ 2449.937329] kernel BUG at fs/btrfs/volumes.c:6450!
> [ 2449.937604] invalid opcode:  [#1] SMP NOPTI
> [ 2449.937855] CPU: 0 PID: 259045 Comm: kworker/u5:0 Not tainted 5.11.0-rc5+ 
> #122
> [ 2449.938252] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> 1.13.0-2.fc32 04/01/2014
> [ 2449.938713] Workqueue: btrfs-worker-high btrfs_work_helper
> [ 2449.939016] RIP: 0010:btrfs_map_bio.cold+0x5a/0x5c
> [ 2449.939392] Code: 37 87 ff ff e8 ed d4 8a ff 48 83 c4 18 e9 b5 52 8b ff
> 49 89 c8 4c 89 fa 4c 89 f1 48 c7 c6 b0 c0 61 8b 48 89 ef e8 11 87 ff ff <0f>
> 0b 4c 89 e7 e8 42 09 86 ff e9 fd 59 8b ff 49 8b 7a 50 44 89 f2
> [ 2449.940402] RSP: :9f24c1637d90 EFLAGS: 00010282
> [ 2449.940689] RAX: 0057 RBX: 90c78ff716b8 RCX: 
> 
> [ 2449.941080] RDX: 90c7fbc27ae0 RSI: 90c7fbc19110 RDI: 
> 90c7fbc19110
> [ 2449.941467] RBP: 90c7911d4000 R08:  R09: 
> 
> [ 2449.941853] R10: 9f24c1637b48 R11: 8b9723e8 R12: 
> 
> [ 2449.942243] R13:  R14: a000 R15: 
> 4263a000
> [ 2449.942632] FS:  () GS:90c7fbc0()
> knlGS:
> [ 2449.943072] CS:  0010 DS:  ES:  CR0: 80050033
> [ 2449.943386] CR2: 5575163c3080 CR3: 00010ad6c004 CR4: 
> 00370ef0
> [ 2449.943772] Call Trace:
> [ 2449.943915]  ? lock_release+0x1c3/0x290
> [ 2449.944135]  run_one_async_done+0x3a/0x60
> [ 2449.944360]  btrfs_work_helper+0x136/0x520
> [ 2449.944588]  process_one_work+0x26e/0x570
> [ 2449.944812]  worker_thread+0x55/0x3c0
> [ 2449.945016]  ? process_one_work+0x570/0x570
> [ 2449.945250]  kthread+0x137/0x150
> [ 2449.945430]  ? __kthread_bind_mask+0x60/0x60
> [ 2449.945666]  ret_from_fork+0x1f/0x30
> 
> it happens when you run btrfs/060.  Please make sure to run xfstests against
> patches before you submit them upstream.  Thanks,
> 
> Josef

Umm... I ran the xfstests against the v1 patch and didn't get that panic.
I'll try to reproduce and fix that now. Thanks for the heads up and
sorry!

Thanks,
Michal


[PATCH 0/5] nvmem: patches (set 1) for 5.12

2021-01-29 Thread Srinivas Kandagatla
Hi Greg,

Here are some nvmem patches for 5.12, which include:
- adding support for a new rmem nvmem provider
- an improvement in core to skip invalid nodes, and a fix for a leak
- a patch in the imx driver to use of_device_get_match_data()

Can you please queue them up for 5.12?

Thanks for your help,
srini

Ahmad Fatoum (1):
  nvmem: core: skip child nodes not matching binding

Dan Carpenter (1):
  nvmem: core: Fix a resource leak on error in nvmem_add_cells_from_of()

Fabio Estevam (1):
  nvmem: imx-iim: Use of_device_get_match_data()

Nicolas Saenz Julienne (2):
  dt-bindings: nvmem: Add bindings for rmem driver
  nvmem: Add driver to expose reserved memory as nvmem

 .../devicetree/bindings/nvmem/rmem.yaml   | 49 ++
 drivers/nvmem/Kconfig |  8 ++
 drivers/nvmem/Makefile|  2 +
 drivers/nvmem/core.c  |  5 +-
 drivers/nvmem/imx-iim.c   |  7 +-
 drivers/nvmem/rmem.c  | 97 +++
 drivers/of/platform.c |  1 +
 7 files changed, 162 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/nvmem/rmem.yaml
 create mode 100644 drivers/nvmem/rmem.c

-- 
2.21.0



Re: [v5 PATCH 08/11] mm: vmscan: use per memcg nr_deferred of shrinker

2021-01-29 Thread Yang Shi
On Fri, Jan 29, 2021 at 6:59 AM Kirill Tkhai  wrote:
>
> On 29.01.2021 17:55, Kirill Tkhai wrote:
> > On 28.01.2021 02:33, Yang Shi wrote:
> >> Use per memcg's nr_deferred for memcg aware shrinkers.  The shrinker's 
> >> nr_deferred
> >> will be used in the following cases:
> >> 1. Non memcg aware shrinkers
> >> 2. !CONFIG_MEMCG
> >> 3. memcg is disabled by boot parameter
> >>
> >> Signed-off-by: Yang Shi 
> >> ---
> >>  mm/vmscan.c | 87 -
> >>  1 file changed, 73 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >> index 20be0db291fe..e1f8960f5cf6 100644
> >> --- a/mm/vmscan.c
> >> +++ b/mm/vmscan.c
> >> @@ -205,7 +205,8 @@ static int expand_one_shrinker_info(struct mem_cgroup 
> >> *memcg,
> >>
> >>  for_each_node(nid) {
> >>  old = rcu_dereference_protected(
> >> -mem_cgroup_nodeinfo(memcg, nid)->shrinker_info, true);
> >> +mem_cgroup_nodeinfo(memcg, nid)->shrinker_info,
> >> +lockdep_is_held(&shrinker_rwsem));
> >
> > > Wouldn't it be better to pack this repeating pattern into a helper function, e.g.:
> > >
> > > static struct shrinker_info *memcg_shrinker_info(struct mem_cgroup *memcg,
> > >                                                  int nid)
> > {
> >   return rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_info,
> >  lockdep_is_held(&shrinker_rwsem));
> > }
> >
> > ?
> >
> > Even shrink_slab_memcg() may want to use it.
>
> Hm, I see you already introduced a helper in [10/11], but it is used in only
> one place.
> Then, we should use it in all places (introduce the helper earlier).

Yes, good point. Will fix in v6.

>
> >>  /* Not yet online memcg */
> >>  if (!old)
> >>  return 0;
> >> @@ -239,7 +240,8 @@ void free_shrinker_info(struct mem_cgroup *memcg)
> >>
> >>  for_each_node(nid) {
> >>  pn = mem_cgroup_nodeinfo(memcg, nid);
> >> -info = rcu_dereference_protected(pn->shrinker_info, true);
> >> +info = rcu_dereference_protected(pn->shrinker_info,
> >> + 
> >> lockdep_is_held(&shrinker_rwsem));
> >>  if (info)
> >>  kvfree(info);
> >>  rcu_assign_pointer(pn->shrinker_info, NULL);
> >> @@ -360,6 +362,27 @@ static void unregister_memcg_shrinker(struct shrinker 
> >> *shrinker)
> >>  up_write(&shrinker_rwsem);
> >>  }
> >>
> >> +static long count_nr_deferred_memcg(int nid, struct shrinker *shrinker,
> >> +struct mem_cgroup *memcg)
> >> +{
> >> +struct shrinker_info *info;
> >> +
> >> +info = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_info,
> >> + lockdep_is_held(&shrinker_rwsem));
> >> +return atomic_long_xchg(&info->nr_deferred[shrinker->id], 0);
> >> +}
> >> +
> >> +static long set_nr_deferred_memcg(long nr, int nid, struct shrinker 
> >> *shrinker,
> >> +  struct mem_cgroup *memcg)
> >> +{
> >> +struct shrinker_info *info;
> >> +
> >> +info = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_info,
> >> + lockdep_is_held(&shrinker_rwsem));
> >> +
> >> +return atomic_long_add_return(nr, &info->nr_deferred[shrinker->id]);
> >> +}
> >> +
> >>  static bool cgroup_reclaim(struct scan_control *sc)
> >>  {
> >>  return sc->target_mem_cgroup;
> >> @@ -398,6 +421,18 @@ static void unregister_memcg_shrinker(struct shrinker 
> >> *shrinker)
> >>  {
> >>  }
> >>
> >> +static long count_nr_deferred_memcg(int nid, struct shrinker *shrinker,
> >> +struct mem_cgroup *memcg)
> >> +{
> >> +return 0;
> >> +}
> >> +
> >> +static long set_nr_deferred_memcg(long nr, int nid, struct shrinker 
> >> *shrinker,
> >> +  struct mem_cgroup *memcg)
> >> +{
> >> +return 0;
> >> +}
> >> +
> >>  static bool cgroup_reclaim(struct scan_control *sc)
> >>  {
> >>  return false;
> >> @@ -409,6 +444,39 @@ static bool writeback_throttling_sane(struct 
> >> scan_control *sc)
> >>  }
> >>  #endif
> >>
> >> +static long count_nr_deferred(struct shrinker *shrinker,
> >> +  struct shrink_control *sc)
> >> +{
> >> +int nid = sc->nid;
> >> +
> >> +if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
> >> +nid = 0;
> >> +
> >> +if (sc->memcg &&
> >> +(shrinker->flags & SHRINKER_MEMCG_AWARE))
> >> +return count_nr_deferred_memcg(nid, shrinker,
> >> +   sc->memcg);
> >> +
> >> +return atomic_long_xchg(&shrinker->nr_deferred[nid], 0);
> >> +}
> >> +
> >> +
> >> +static long set_nr_deferred(long nr, struct shrinker *shrinker,
> >> +struct shrink_control *sc)
> >> +{
> >> +int nid = sc->nid;
> >> +
> >> +if (!(shrinker->flags & SHRINK

Re: [v5 PATCH 07/11] mm: vmscan: add per memcg shrinker nr_deferred

2021-01-29 Thread Yang Shi
On Fri, Jan 29, 2021 at 5:00 AM Vlastimil Babka  wrote:
>
> On 1/28/21 12:33 AM, Yang Shi wrote:
> > Currently the number of deferred objects is tracked per shrinker, but some slabs,
> > for example the vfs inode/dentry cache, are per memcg. This results in poor
> > isolation among memcgs.
> >
> > The deferred objects typically are generated by __GFP_NOFS allocations, one 
> > memcg with
> > excessive __GFP_NOFS allocations may blow up deferred objects, then other 
> > innocent memcgs
> > may suffer from over shrink, excessive reclaim latency, etc.
> >
> > For example, two workloads run in memcgA and memcgB respectively, workload 
> > in B is vfs
> > heavy workload.  Workload in A generates excessive deferred objects, then 
> > B's vfs cache
> > might be hit heavily (drop half of caches) by B's limit reclaim or global 
> > reclaim.
> >
> > We observed this hit in our production environment which was running vfs 
> > heavy workload
> > shown as the below tracing log:
> >
> > <...>-409454 [016]  28286961.747146: mm_shrink_slab_start: 
> > super_cache_scan+0x0/0x1a0 9a83046f3458:
> > nid: 1 objects to shrink 3641681686040 gfp_flags 
> > GFP_HIGHUSER_MOVABLE|__GFP_ZERO pgs_scanned 1 lru_pgs 15721
> > cache items 246404277 delta 31345 total_scan 123202138
> > <...>-409454 [022]  28287105.928018: mm_shrink_slab_end: 
> > super_cache_scan+0x0/0x1a0 9a83046f3458:
> > nid: 1 unused scan count 3641681686040 new scan count 3641798379189 
> > total_scan 602
> > last shrinker return val 123186855
> >
> > The vfs cache and page cache ratio was 10:1 on this machine, and half of
> > the caches were dropped.
> > This also resulted in a significant amount of page cache being dropped due to
> > inode eviction.
> >
> > Making nr_deferred per memcg for memcg aware shrinkers would solve the
> > unfairness and bring
> > better isolation.
> >
> > When memcg is not enabled (!CONFIG_MEMCG or memcg disabled), the shrinker's 
> > nr_deferred
> > would be used.  And non memcg aware shrinkers use shrinker's nr_deferred 
> > all the time.
> >
> > Signed-off-by: Yang Shi 
> > ---
> >  include/linux/memcontrol.h |  7 +++---
> >  mm/vmscan.c| 48 +-
> >  2 files changed, 36 insertions(+), 19 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 62b888b88a5f..e0384367e07d 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -93,12 +93,13 @@ struct lruvec_stat {
> >  };
> >
> >  /*
> > - * Bitmap of shrinker::id corresponding to memcg-aware shrinkers,
> > - * which have elements charged to this memcg.
> > + * Bitmap and deferred work of shrinker::id corresponding to memcg-aware
> > + * shrinkers, which have elements charged to this memcg.
> >   */
> >  struct shrinker_info {
> >   struct rcu_head rcu;
> > - unsigned long map[];
> > + unsigned long *map;
> > + atomic_long_t *nr_deferred;
> >  };
> >
> >  /*
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 256896d157d4..20be0db291fe 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -187,16 +187,21 @@ static DECLARE_RWSEM(shrinker_rwsem);
> >  #ifdef CONFIG_MEMCG
> >  static int shrinker_nr_max;
> >
> > +#define NR_MAX_TO_SHR_MAP_SIZE(nr_max)   \
> > + ((nr_max / BITS_PER_LONG + 1) * sizeof(unsigned long))
>
> Could have been part of patch 4 already. And yeah, using DIV_ROUND_UP(), as
> being hidden in a macro makes the "shorter statement" benefit disappear :)
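A rough user-space illustration of the sizing point above, not code from the patch: the two function names and the demo are made up for this sketch, and BITS_PER_LONG and DIV_ROUND_UP are redefined locally to mirror the kernel macros of the same names. Under those assumptions, the "+ 1" form and the DIV_ROUND_UP() form give different sizes only when nr_max is an exact multiple of BITS_PER_LONG (including 0), where the "+ 1" form reserves one unsigned long more than the bitmap needs:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Local stand-ins mirroring the kernel macros of the same names. */
#define BITS_PER_LONG		(sizeof(unsigned long) * CHAR_BIT)
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Bitmap size in bytes as computed by the macro in the quoted hunk. */
static size_t map_size_plus_one(size_t nr_max)
{
	return (nr_max / BITS_PER_LONG + 1) * sizeof(unsigned long);
}

/* Bitmap size in bytes using DIV_ROUND_UP(), as the review suggests. */
static size_t map_size_round_up(size_t nr_max)
{
	return DIV_ROUND_UP(nr_max, BITS_PER_LONG) * sizeof(unsigned long);
}

/* Usage example: compare both forms around a word boundary. */
static void demo_map_size(void)
{
	/* One bit past a boundary: both forms need two longs. */
	assert(map_size_plus_one(BITS_PER_LONG + 1) == 2 * sizeof(unsigned long));
	assert(map_size_round_up(BITS_PER_LONG + 1) == 2 * sizeof(unsigned long));

	/* Exactly on a boundary: the "+ 1" form reserves one long too many. */
	assert(map_size_plus_one(BITS_PER_LONG) == 2 * sizeof(unsigned long));
	assert(map_size_round_up(BITS_PER_LONG) == 1 * sizeof(unsigned long));
}
```

The extra word is harmless for correctness; the difference only matters for how tightly the map is sized.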
>
> > +
> >  static void free_shrinker_info_rcu(struct rcu_head *head)
> >  {
> >   kvfree(container_of(head, struct shrinker_info, rcu));
> >  }
> >
> >  static int expand_one_shrinker_info(struct mem_cgroup *memcg,
> > -int size, int old_size)
> > + int m_size, int d_size,
> > + int old_m_size, int old_d_size)
> >  {
> >   struct shrinker_info *new, *old;
> >   int nid;
> > + int size = m_size + d_size;
> >
> >   for_each_node(nid) {
> >   old = rcu_dereference_protected(
> > @@ -209,9 +214,15 @@ static int expand_one_shrinker_info(struct mem_cgroup 
> > *memcg,
> >   if (!new)
> >   return -ENOMEM;
> >
> > - /* Set all old bits, clear all new bits */
> > - memset(new->map, (int)0xff, old_size);
> > - memset((void *)new->map + old_size, 0, size - old_size);
> > + new->map = (unsigned long *)(new + 1);
> > + new->nr_deferred = (void *)new->map + m_size;
>
> This better be aligned to sizeof(atomic_long_t). Can we be sure about that?

Good point. No, if unsigned long is 32 bit on some 64 bit machines.

> Also it's all quite ugly and complex. Is it worth it? What about just leaving
> map as it is and allocating a nr_deferred array separately, i.e.:
>
>   struct shrinker_info {
> struct rcu_head rcu;
> atomic_long_t *nr_deferred; //

Re: [net-next PATCH v4 01/15] Documentation: ACPI: DSD: Document MDIO PHY

2021-01-29 Thread Andy Shevchenko
On Fri, Jan 29, 2021 at 6:44 PM Rafael J. Wysocki  wrote:
> On Fri, Jan 29, 2021 at 5:37 PM Rafael J. Wysocki  wrote:
> > On Fri, Jan 29, 2021 at 7:48 AM Calvin Johnson
> >  wrote:

...

> > It would work, but I would introduce a wrapper around the _ADR
> > evaluation, something like:
> >
> > int acpi_get_local_address(acpi_handle handle, u32 *addr)
> > {
> >   unsigned long long adr;
> >   acpi_status status;
> >
> >   status = acpi_evaluate_integer(handle, METHOD_NAME__ADR, NULL, &adr);
> >   if (ACPI_FAILURE(status))
> > return -ENODATA;
> >
> >   *addr = (u32)adr;
> >   return 0;
> > }
> >
> > in drivers/acpi/utils.c and add a static inline stub always returning
> > -ENODEV for it for !CONFIG_ACPI.

...

> BTW, you may not need the fwnode_get_local_addr() at all then, just
> evaluate either the "reg" property for OF or acpi_get_local_address()
> for ACPI in the "caller" code directly. A common helper doing this can
> be added later.

Sounds good to me and it will address your concern about different
semantics of reg/_ADR on per driver/subsystem basis.

-- 
With Best Regards,
Andy Shevchenko
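
The wrapper-plus-stub arrangement suggested above can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: acpi_handle is replaced by a hypothetical pointer type, the _ADR evaluation is mocked (the real wrapper would call acpi_evaluate_integer()), and CONFIG_ACPI is treated as an ordinary preprocessor switch. It shows the two things the thread relies on: the evaluated 64-bit integer is narrowed to a u32, and a build without ACPI still compiles and gets -ENODEV.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's acpi_handle type. */
typedef const unsigned long long *acpi_handle;

#ifdef CONFIG_ACPI
/*
 * Mock of the _ADR evaluation: in this sketch the "handle" simply points
 * at the value to return. The real wrapper would call
 * acpi_evaluate_integer() and check the returned status.
 */
static inline int acpi_get_local_address(acpi_handle handle, uint32_t *addr)
{
	if (!handle)
		return -ENODATA;

	*addr = (uint32_t)*handle;	/* keep the low 32 bits */
	return 0;
}
#else
/* Stub for builds without ACPI: always fails with -ENODEV. */
static inline int acpi_get_local_address(acpi_handle handle, uint32_t *addr)
{
	(void)handle;
	(void)addr;
	return -ENODEV;
}
#endif

/* Usage example; with CONFIG_ACPI undefined only the stub path runs. */
static void demo_local_address(void)
{
	unsigned long long adr = 0x100000005ULL;
	uint32_t addr = 0;
	int ret = acpi_get_local_address(&adr, &addr);

#ifdef CONFIG_ACPI
	assert(ret == 0 && addr == 5);
#else
	assert(ret == -ENODEV && addr == 0);
#endif
}
```

A caller can then test the return value and fall back to the "reg" property path without any #ifdef of its own.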


[RFC v4 1/3] vfio/platform: add support for msi

2021-01-29 Thread Vikas Gupta
MSI support for platform devices. MSI is added
as a single 'index' with 'count' as the number of
MSI(s) supported by the devices.

Signed-off-by: Vikas Gupta 
---
 drivers/vfio/platform/Kconfig |   1 +
 drivers/vfio/platform/vfio_platform_common.c  |  95 ++-
 drivers/vfio/platform/vfio_platform_irq.c | 253 --
 drivers/vfio/platform/vfio_platform_private.h |  29 ++
 include/uapi/linux/vfio.h |  24 ++
 5 files changed, 373 insertions(+), 29 deletions(-)

diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
index dc1a3c44f2c6..d4bbc9f27763 100644
--- a/drivers/vfio/platform/Kconfig
+++ b/drivers/vfio/platform/Kconfig
@@ -3,6 +3,7 @@ config VFIO_PLATFORM
tristate "VFIO support for platform devices"
depends on VFIO && EVENTFD && (ARM || ARM64)
select VFIO_VIRQFD
+   select GENERIC_MSI_IRQ_DOMAIN
help
  Support for platform devices with VFIO. This is required to make
  use of platform devices present on the system using the VFIO
diff --git a/drivers/vfio/platform/vfio_platform_common.c 
b/drivers/vfio/platform/vfio_platform_common.c
index fb4b385191f2..f2b1f0c3bfcc 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "vfio_platform_private.h"
 
@@ -28,23 +29,22 @@
 static LIST_HEAD(reset_list);
 static DEFINE_MUTEX(driver_lock);
 
-static vfio_platform_reset_fn_t vfio_platform_lookup_reset(const char *compat,
-   struct module **module)
+static void vfio_platform_lookup_reset(const char *compat,
+  struct module **module,
+  struct vfio_platform_reset_node **node)
 {
struct vfio_platform_reset_node *iter;
-   vfio_platform_reset_fn_t reset_fn = NULL;
 
mutex_lock(&driver_lock);
list_for_each_entry(iter, &reset_list, link) {
if (!strcmp(iter->compat, compat) &&
try_module_get(iter->owner)) {
*module = iter->owner;
-   reset_fn = iter->of_reset;
+   *node = iter;
break;
}
}
mutex_unlock(&driver_lock);
-   return reset_fn;
 }
 
 static int vfio_platform_acpi_probe(struct vfio_platform_device *vdev,
@@ -112,15 +112,23 @@ static bool vfio_platform_has_reset(struct 
vfio_platform_device *vdev)
 
 static int vfio_platform_get_reset(struct vfio_platform_device *vdev)
 {
+   struct vfio_platform_reset_node *node = NULL;
+
if (VFIO_PLATFORM_IS_ACPI(vdev))
return vfio_platform_acpi_has_reset(vdev) ? 0 : -ENOENT;
 
-   vdev->of_reset = vfio_platform_lookup_reset(vdev->compat,
-   &vdev->reset_module);
-   if (!vdev->of_reset) {
+   vfio_platform_lookup_reset(vdev->compat, &vdev->reset_module,
+  &node);
+   if (!node) {
request_module("vfio-reset:%s", vdev->compat);
-   vdev->of_reset = vfio_platform_lookup_reset(vdev->compat,
-   &vdev->reset_module);
+   vfio_platform_lookup_reset(vdev->compat, &vdev->reset_module,
+  &node);
+   }
+
+   if (node) {
+   vdev->of_reset = node->of_reset;
+   vdev->of_get_msi = node->of_get_msi;
+   vdev->of_msi_write = node->of_msi_write;
}
 
return vdev->of_reset ? 0 : -ENOENT;
@@ -343,9 +351,16 @@ static long vfio_platform_ioctl(void *device_data,
 
} else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
struct vfio_irq_info info;
+   struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+   int ext_irq_index = vdev->num_irqs - vdev->num_ext_irqs;
+   unsigned long capsz;
+   u32 index;
 
minsz = offsetofend(struct vfio_irq_info, count);
 
+   /* For backward compatibility, cannot require this */
+   capsz = offsetofend(struct vfio_irq_info, cap_offset);
+
if (copy_from_user(&info, (void __user *)arg, minsz))
return -EFAULT;
 
@@ -355,8 +370,53 @@ static long vfio_platform_ioctl(void *device_data,
if (info.index >= vdev->num_irqs)
return -EINVAL;
 
-   info.flags = vdev->irqs[info.index].flags;
-   info.count = vdev->irqs[info.index].count;
+   if (info.argsz >= capsz)
+   minsz = capsz;
+
+   index = info.index;
+
+   info.flags = vdev->irqs[index].flags;
+   info.count = vdev->irqs[index].count;
+
+   if (ext_irq_index - index == VFIO_EXT_IR

[RFC v4 0/3] msi support for platform devices

2021-01-29 Thread Vikas Gupta
This RFC adds support for MSI for platform devices.
MSI block is added as an ext irq along with the existing
wired interrupt implementation. The patchset exports two
caps for MSI and related data to configure MSI source device.

Changes from:
-
 v3 to v4:
1) Removed the 'cap' for exporting MSI info to userspace and
   restored into vendor specific module.
2) Enable GENERIC_MSI_IRQ_DOMAIN in Kconfig.
3) Removed the vendor specific, Broadcom, 'msi' module and 
   integrated the MSI related ops into the 'reset' module for
   MSI support.  

 v2 to v3:
1) Restored the vendor specific module to get max number
   of MSIs supported and .count value initialized.
2) Comments from Eric addressed.

 v1 to v2:
1) IRQ allocation has been implemented as below:
   
   |IRQ-0|IRQ-1|..|IRQ-n|MSI|
   
MSI block has msi contexts and it is implemented
as ext irq.

2) Removed vendor specific module for msi handling so
   previously patch2 and patch3 are not required.

3) MSI related data is exported to userspace using 'caps'.
 Please note that the VFIO_IRQ_INFO_CAP_TYPE implementation in
include/uapi/linux/vfio.h is taken from Eric's patch:

https://patchwork.kernel.org/project/kvm/patch/20201116110030.32335-8-eric.au...@redhat.com/


 v0 to v1:
   i)  Removed MSI device flag VFIO_DEVICE_FLAGS_MSI.
   ii) Add MSI(s) at the end of the irq list of platform IRQs.
   MSI(s) with first entry of MSI block has count and flag
   information.
   IRQ list: Allocation for IRQs + MSIs are allocated as below
   Example: if there are 'n' IRQs and 'k' MSIs
   ---
   |IRQ-0|IRQ-1|..|IRQ-n|MSI-0|MSI-1|MSI-2|..|MSI-k|
   ---
   MSI-0 will have count=k set and flags set accordingly.


Vikas Gupta (3):
  vfio/platform: add support for msi
  vfio/platform: change cleanup order
  vfio: platform: reset: add msi support

 drivers/vfio/platform/Kconfig |   1 +
 .../platform/reset/vfio_platform_bcmflexrm.c  |  72 -
 drivers/vfio/platform/vfio_platform_common.c  |  97 +--
 drivers/vfio/platform/vfio_platform_irq.c | 253 --
 drivers/vfio/platform/vfio_platform_private.h |  29 ++
 include/uapi/linux/vfio.h |  24 ++
 6 files changed, 444 insertions(+), 32 deletions(-)

-- 
2.17.1





[RFC v4 2/3] vfio/platform: change cleanup order

2021-01-29 Thread Vikas Gupta
In the case of MSI, the vendor specific MSI module may require
region access to handle MSI cleanup, so regions must be cleaned up
only after IRQ cleanup.

Signed-off-by: Vikas Gupta 
---
 drivers/vfio/platform/vfio_platform_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/platform/vfio_platform_common.c 
b/drivers/vfio/platform/vfio_platform_common.c
index f2b1f0c3bfcc..1cc040e3ed1f 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -243,8 +243,8 @@ static void vfio_platform_release(void *device_data)
WARN_ON(1);
}
pm_runtime_put(vdev->device);
-   vfio_platform_regions_cleanup(vdev);
vfio_platform_irq_cleanup(vdev);
+   vfio_platform_regions_cleanup(vdev);
}
 
mutex_unlock(&driver_lock);
-- 
2.17.1





Re: [REGRESSION] "ALSA: HDA: Early Forbid of runtime PM" broke my laptop's internal audio

2021-01-29 Thread Michael Catanzaro

On Fri, Jan 29, 2021 at 5:17 pm, Takashi Iwai  wrote:

--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2217,8 +2217,6 @@ static const struct snd_pci_quirk 
power_save_denylist[] = {

/* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */
SND_PCI_QUIRK(0x1043, 0x8733, "Asus Prime X370-Pro", 0),
/* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */
-   SND_PCI_QUIRK(0x1558, 0x6504, "Clevo W65_67SB", 0),
-   /* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */
SND_PCI_QUIRK(0x1028, 0x0497, "Dell Precision T3600", 0),
/* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */
 	/* Note the P55A-UD3 and Z87-D3HP share the subsys id for the HDA 
dev */


Hi,

This patch works fine on my laptop. I have no clue whether that means 
it's really safe to remove the quirk. I've never noticed any clicking 
noise myself, but I understand it has been a problem for other System76 
laptops.


Michael




Re: dax alignment problem on arm64 (and other achitectures)

2021-01-29 Thread Joao Martins



On 1/29/21 4:32 PM, Pavel Tatashin wrote:
> On Fri, Jan 29, 2021 at 9:51 AM Joao Martins  
> wrote:
>>
>> Hey Pavel,
>>
>> On 1/29/21 1:50 PM, Pavel Tatashin wrote:
>>>> Since we last talked about this the enabling for EFI "Special Purpose"
>>>> / Soft Reserved Memory has gone upstream and instantiates device-dax
>>>> instances for address ranges marked with EFI_MEMORY_SP attribute.
>>>> Critically this way of declaring device-dax removes the consideration
>>>> of it as persistent memory and as such no metadata reservation. So, if
>>>> you are willing to maintain the metadata external to the device (which
>>>> seems reasonable for your environment) and have your platform firmware
>>>> / kernel command line mark it as EFI_CONVENTIONAL_MEMORY +
>>>> EFI_MEMORY_SP, then these reserve-free dax-devices will surface.
>>>
>>> Hi Dan,
>>>
>>> This is cool. Does it allow conversion between devdax and fsdax so DAX
>>> aware filesystem can be installed and data can be put there to be
>>> preserved across the reboot?
>>>
>>
>> fwiw wrt to the 'preserved across kexec' part, you are going to need
>> something conceptually similar to snippet below the scissors mark.
>> Alternatively, we could fix kexec userspace to add conventional memory
>> ranges (without the SP attribute part) when it sees a Soft-Reserved region.
>> But can't tell which one is the right thing to do.
> 
> Hi Joao,
> 
> Is not it just a matter of appending arguments to the kernel parameter
> during kexec reboot with Soft-Reserved region specified, or am I
> missing something? I understand with fileload kexec syscall we might
> accidentally load segments onto a reserved region, but with the original
> kexec syscall, where we can specify destinations for each segment that
> should not be a problem with today's kexec tools.
> 
efi_fake_mem only works with EFI_MEMMAP conventional memory ranges, thus
not having an EFI_MEMMAP with RAM ranges means it's a nop for the soft-reserved
regions. Unless you are trying to suggest something like:

memmap=%+0xefff

... to mark soft reserved on top of existing RAM? Sadly, I don't know if there's
an equivalent for ARM.


> I agree that preserving it automatically as you are proposing, would
> make more sense, instead of fiddling with kernel parameters and
> segment destinations.
> 
> Thank you,
> Pasha
> 
>>
>> At the moment, HMAT ranges (or those defined with efi_fake_mem=) aren't
>> preserved not because of anything special with HMAT, but simply because
>> the EFI memmap conventional ram ranges are not preserved (only runtime
>> services). And HMAT/efi_fake_mem expects these to be based on EFI memmap.
>>

[snip]


[RFC v4 3/3] vfio: platform: reset: add msi support

2021-01-29 Thread Vikas Gupta
Add msi support for Broadcom FlexRm device.

Signed-off-by: Vikas Gupta 
---
 .../platform/reset/vfio_platform_bcmflexrm.c  | 72 ++-
 1 file changed, 70 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/platform/reset/vfio_platform_bcmflexrm.c 
b/drivers/vfio/platform/reset/vfio_platform_bcmflexrm.c
index 96064ef8f629..6ca4ca12575b 100644
--- a/drivers/vfio/platform/reset/vfio_platform_bcmflexrm.c
+++ b/drivers/vfio/platform/reset/vfio_platform_bcmflexrm.c
@@ -21,7 +21,9 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 
 #include "../vfio_platform_private.h"
 
@@ -33,6 +35,9 @@
 #define RING_VER   0x000
 #define RING_CONTROL   0x034
 #define RING_FLUSH_DONE0x038
+#define RING_MSI_ADDR_LS   0x03c
+#define RING_MSI_ADDR_MS   0x040
+#define RING_MSI_DATA_VALUE0x064
 
 /* Register RING_CONTROL fields */
 #define CONTROL_FLUSH_SHIFT5
@@ -105,8 +110,71 @@ static int vfio_platform_bcmflexrm_reset(struct 
vfio_platform_device *vdev)
return ret;
 }
 
-module_vfio_reset_handler("brcm,iproc-flexrm-mbox",
- vfio_platform_bcmflexrm_reset);
+static u32 bcm_num_msi(struct vfio_platform_device *vdev)
+{
+   struct vfio_platform_region *reg = &vdev->regions[0];
+
+   return (reg->size / RING_REGS_SIZE);
+}
+
+static void bcm_write_msi(struct vfio_platform_device *vdev,
+   struct msi_desc *desc,
+   struct msi_msg *msg)
+{
+   int i;
+   int hwirq = -1;
+   int msi_src;
+   void __iomem *ring;
+   struct vfio_platform_region *reg = &vdev->regions[0];
+
+   if (!reg)
+   return;
+
+   for (i = 0; i < vdev->num_irqs; i++)
+   if (vdev->irqs[i].type == VFIO_IRQ_TYPE_MSI)
+   hwirq = vdev->irqs[i].ctx[0].hwirq;
+
+   if (hwirq < 0)
+   return;
+
+   msi_src = desc->irq - hwirq;
+
+   if (!reg->ioaddr) {
+   reg->ioaddr = ioremap(reg->addr, reg->size);
+   if (!reg->ioaddr)
+   return;
+   }
+
+   ring = reg->ioaddr + msi_src * RING_REGS_SIZE;
+
+   writel_relaxed(msg->address_lo, ring + RING_MSI_ADDR_LS);
+   writel_relaxed(msg->address_hi, ring + RING_MSI_ADDR_MS);
+   writel_relaxed(msg->data, ring + RING_MSI_DATA_VALUE);
+}
+
+static struct vfio_platform_reset_node vfio_platform_bcmflexrm_reset_node = {
+   .owner = THIS_MODULE,
+   .compat = "brcm,iproc-flexrm-mbox",
+   .of_reset = vfio_platform_bcmflexrm_reset,
+   .of_get_msi = bcm_num_msi,
+   .of_msi_write = bcm_write_msi
+};
+
+static int __init vfio_platform_bcmflexrm_reset_module_init(void)
+{
+   __vfio_platform_register_reset(&vfio_platform_bcmflexrm_reset_node);
+
+   return 0;
+}
+
+static void __exit vfio_platform_bcmflexrm_reset_module_exit(void)
+{
+   vfio_platform_unregister_reset("brcm,iproc-flexrm-mbox",
+  vfio_platform_bcmflexrm_reset);
+}
+
+module_init(vfio_platform_bcmflexrm_reset_module_init);
+module_exit(vfio_platform_bcmflexrm_reset_module_exit);
 
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Anup Patel ");
-- 
2.17.1





[PATCH] platform/x86: dell-wmi-sysman: fix a NULL pointer dereference

2021-01-29 Thread Mario Limonciello
An upcoming Dell platform is causing a NULL pointer dereference
in dell-wmi-sysman initialization.  Validate that the input from
BIOS matches correct ACPI types and abort module initialization
if it fails.

This leads to a memory leak that needs to be cleaned up properly.

Signed-off-by: Mario Limonciello 
---
 drivers/platform/x86/dell-wmi-sysman/sysman.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/platform/x86/dell-wmi-sysman/sysman.c 
b/drivers/platform/x86/dell-wmi-sysman/sysman.c
index dc6dd531c996..38b497991071 100644
--- a/drivers/platform/x86/dell-wmi-sysman/sysman.c
+++ b/drivers/platform/x86/dell-wmi-sysman/sysman.c
@@ -419,13 +419,19 @@ static int init_bios_attributes(int attr_type, const char 
*guid)
return retval;
/* need to use specific instance_id and guid combination to get right 
data */
obj = get_wmiobj_pointer(instance_id, guid);
-   if (!obj)
+   if (!obj || obj->type != ACPI_TYPE_PACKAGE) {
+   release_attributes_data();
return -ENODEV;
+   }
elements = obj->package.elements;
 
mutex_lock(&wmi_priv.mutex);
while (elements) {
/* sanity checking */
+   if (elements[ATTR_NAME].type != ACPI_TYPE_STRING) {
+   pr_debug("incorrect element type\n");
+   goto nextobj;
+   }
if (strlen(elements[ATTR_NAME].string.pointer) == 0) {
pr_debug("empty attribute found\n");
goto nextobj;
-- 
2.25.1
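
The kind of check the patch adds, reading a union member only after the type tag has been verified, can be sketched in plain C as below. This is a simplified user-space model and not the driver code: struct obj, the OBJ_* constants and package_name() are hypothetical names invented for the sketch, standing in for the ACPI object union and the ACPI_TYPE_* constants the patch tests against.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a typed firmware object. */
enum obj_type { OBJ_INTEGER, OBJ_STRING, OBJ_PACKAGE };

struct obj {
	enum obj_type type;
	union {
		unsigned long long integer;
		const char *string;
		struct {
			size_t count;
			const struct obj *elements;
		} package;
	} u;
};

/*
 * Return the string stored in element 0 of a package, or NULL when the
 * object or the element does not have the expected type. The type tag
 * is checked before the matching union member is read.
 */
static const char *package_name(const struct obj *o)
{
	if (!o || o->type != OBJ_PACKAGE || o->u.package.count == 0)
		return NULL;
	if (o->u.package.elements[0].type != OBJ_STRING)
		return NULL;
	return o->u.package.elements[0].u.string;
}

/* Usage example: one well-formed package and two malformed inputs. */
static void demo_package_name(void)
{
	const struct obj name = { .type = OBJ_STRING, .u.string = "Attr" };
	const struct obj num = { .type = OBJ_INTEGER, .u.integer = 7 };
	const struct obj good = { .type = OBJ_PACKAGE, .u.package = { 1, &name } };
	const struct obj bad = { .type = OBJ_PACKAGE, .u.package = { 1, &num } };

	assert(package_name(&good) == name.u.string);
	assert(package_name(&bad) == NULL);	/* element is not a string */
	assert(package_name(&num) == NULL);	/* object is not a package */
}
```

Without the tag checks, the integer in the malformed cases would be reinterpreted as a pointer, which is the class of fault the patch description refers to.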



Re: [PATCH V3 1/5] perf/core: Add PERF_SAMPLE_WEIGHT_STRUCT

2021-01-29 Thread Liang, Kan




On 1/28/2021 5:40 PM, kan.li...@linux.intel.com wrote:

From: Kan Liang 

The current PERF_SAMPLE_WEIGHT sample type is very useful to express the
cost of an action represented by the sample. This allows the profiler
to scale the samples to be more informative to the programmer. It could
also help to locate a hotspot, e.g., when profiling by memory latencies,
the expensive loads appear higher up in the histograms. But the current
PERF_SAMPLE_WEIGHT sample type is solely determined by one factor. This
could be a problem, if users want two or more factors to contribute to
the weight. For example, Golden Cove core PMU can provide both the
instruction latency and the cache Latency information as factors for the
memory profiling.

For current X86 platforms, although meminfo::latency is defined as a
u64, only the lower 32 bits include the valid data in practice (no
memory access could last longer than 4G cycles). The higher 32 bits can be
used to store new factors.

Add a new sample type, PERF_SAMPLE_WEIGHT_STRUCT, to indicate the new
sample weight structure. It shares the same space as the
PERF_SAMPLE_WEIGHT sample type.

Users can apply either the PERF_SAMPLE_WEIGHT sample type or the
PERF_SAMPLE_WEIGHT_STRUCT sample type to retrieve the sample weight, but
they cannot apply both sample types simultaneously.
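
As a small sketch of the idea of sharing one 64-bit weight between factors (an illustration only, not the union layout the patch defines): the helper names below are hypothetical, and the split into a 32-bit latency plus one 32-bit extra factor is an assumption chosen to keep the example short. Shifts and masks are used so the result does not depend on byte order.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not the layout from the patch: keep the
 * existing latency in the low 32 bits and place one extra factor in
 * the high 32 bits of the same 64-bit weight.
 */
static uint64_t weight_pack(uint32_t latency, uint32_t extra)
{
	return ((uint64_t)extra << 32) | latency;
}

static uint32_t weight_latency(uint64_t weight)
{
	return (uint32_t)weight;
}

static uint32_t weight_extra(uint64_t weight)
{
	return (uint32_t)(weight >> 32);
}

/* Usage example: pack, then read both factors back. */
static void demo_weight(void)
{
	uint64_t w = weight_pack(120, 0);

	/* With no extra factor the full value equals the plain latency. */
	assert(w == 120);

	w = weight_pack(120, 35);
	assert(weight_latency(w) == 120);
	assert(weight_extra(w) == 35);
}
```

This also shows why the two sample types can share the same space: as long as the upper bits are zero, the full 64-bit value reads the same as the single-factor weight.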

Currently, only X86 and PowerPC use the PERF_SAMPLE_WEIGHT sample type.
- For PowerPC, there is nothing changed for the PERF_SAMPLE_WEIGHT
   sample type. There is no effect for the new PERF_SAMPLE_WEIGHT_STRUCT
   sample type. PowerPC can restructure the weight field similarly later.
- For X86, the same value will be dumped for the PERF_SAMPLE_WEIGHT
   sample type or the PERF_SAMPLE_WEIGHT_STRUCT sample type for now.
   The following patches will apply the new factors for the
   PERF_SAMPLE_WEIGHT_STRUCT sample type.

The field in the union perf_sample_weight should be shared among
different architectures. A generic name is required, but it's hard to
abstract a name that applies to all architectures. For example, on X86,
the fields are to store all kinds of latency. While on PowerPC, it
stores MMCRA[TECX/TECM], which should not be latency. So a general name
prefix 'var$NUM' is used here.

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
---
  arch/powerpc/perf/core-book3s.c |  2 +-
  arch/x86/events/intel/ds.c  | 17 +++---
  include/linux/perf_event.h  |  4 ++--
  include/uapi/linux/perf_event.h | 49 +++--
  kernel/events/core.c| 11 +
  5 files changed, 66 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 28206b1..869d999 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2195,7 +2195,7 @@ static void record_and_restart(struct perf_event *event, 
unsigned long val,
  
  		if (event->attr.sample_type & PERF_SAMPLE_WEIGHT &&

ppmu->get_mem_weight)
-   ppmu->get_mem_weight(&data.weight);
+   ppmu->get_mem_weight(&data.weight.full);
  
  		if (perf_event_overflow(event, &data, regs))

power_pmu_stop(event, 0);
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 67dbc91..2f54b1f 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -960,7 +960,8 @@ static void adaptive_pebs_record_size_update(void)
  }
  
  #define PERF_PEBS_MEMINFO_TYPE	(PERF_SAMPLE_ADDR | PERF_SAMPLE_DATA_SRC |   \

-   PERF_SAMPLE_PHYS_ADDR | PERF_SAMPLE_WEIGHT | \
+   PERF_SAMPLE_PHYS_ADDR |  \
+   PERF_SAMPLE_WEIGHT_TYPE |\
PERF_SAMPLE_TRANSACTION |\
PERF_SAMPLE_DATA_PAGE_SIZE)
  
@@ -987,7 +988,7 @@ static u64 pebs_update_adaptive_cfg(struct perf_event *event)

gprs = (sample_type & PERF_SAMPLE_REGS_INTR) &&
   (attr->sample_regs_intr & PEBS_GP_REGS);
  
-	tsx_weight = (sample_type & PERF_SAMPLE_WEIGHT) &&

+   tsx_weight = (sample_type & PERF_SAMPLE_WEIGHT_TYPE) &&
 ((attr->config & INTEL_ARCH_EVENT_MASK) ==
  x86_pmu.rtm_abort_event);
  
@@ -1369,8 +1370,8 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
 	/*
 	 * Use latency for weight (only avail with PEBS-LL)
 	 */
-	if (fll && (sample_type & PERF_SAMPLE_WEIGHT))
-		data->weight = pebs->lat;
+	if (fll && (sample_type & PERF_SAMPLE_WEIGHT_TYPE))
+		data->weight.full = pebs->lat;
 
 	/*
 	 * data.data_src encodes the data source
@@ -1462,8 +1463,8 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
 
 	if (x86_pmu.intel_cap.pebs_format >= 2) {
 		/* Only set the TSX we

Re: [PATCH] sched/fair: Rate limit calls to update_blocked_averages() for NOHZ

2021-01-29 Thread Vincent Guittot
On Friday, 29 Jan 2021 at 11:33:00 (+0100), Vincent Guittot wrote:
> On Thu, 28 Jan 2021 at 16:09, Joel Fernandes  wrote:
> >
> > Hi Vincent,
> >
> > On Thu, Jan 28, 2021 at 8:57 AM Vincent Guittot
> >  wrote:
> > > > On Mon, Jan 25, 2021 at 03:42:41PM +0100, Vincent Guittot wrote:
> > > > > > On Fri, 22 Jan 2021 at 20:10, Joel Fernandes wrote:
> > > > > > On Fri, Jan 22, 2021 at 05:56:22PM +0100, Vincent Guittot wrote:
> > > > > > > On Fri, 22 Jan 2021 at 16:46, Joel Fernandes (Google)
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On an octa-core ARM64 device running ChromeOS Linux kernel v5.4,
> > > > > > > > I found that there are a lot of calls to
> > > > > > > > update_blocked_averages(). This causes the schedule loop to slow
> > > > > > > > down, taking up to 500 microseconds at times (due to newidle
> > > > > > > > load balance). I have also seen this manifest in the periodic
> > > > > > > > balancer.
> > > > > > > >
> > > > > > > > A closer look shows that the problem is caused by the following
> > > > > > > > ingredients:
> > > > > > > > 1. If the system has a lot of inactive CGroups (thanks Dietmar
> > > > > > > > for suggesting to inspect /proc/sched_debug for this), this can
> > > > > > > > make __update_blocked_fair() take a long time.
> > > > > > >
> > > > > > > Inactive cgroups are removed from the list so they should not
> > > > > > > impact the duration
> > > > > >
> > > > > > I meant blocked CGroups. According to this code, a cfs_rq can be
> > > > > > partially decayed and not have any tasks running on it but its load
> > > > > > needs to be decayed, correct? That's what I meant by 'inactive'. I
> > > > > > can reword it to 'blocked'.
> > > > >
> > > > > How many blocked cgroups have you got ?
> > > >
> > > > I put a counter in for_each_leaf_cfs_rq_safe() { } to count how many
> > > > times this loop runs per new idle balance. When the problem happens I
> > > > see this loop run 35-40 times (for one single instance of newidle
> > > > balance). So in total there are at least this many cfs_rq load updates.
> > >
> > > Do you mean that you have 35-40 cgroups ? Or the 35-40 includes all CPUs ?
> >
> > All CPUs.
> >
> > > > I also see that new idle balance can be called 200-500 times per second.
> > >
> > > This is not surprising because newidle_balance() is called every time
> > > the CPU is about to become idle
> >
> > Sure.
> >
> > > > > >
> > > > > >   * There can be a lot of idle CPU cgroups.  Don't let fully
> > > > > >   * decayed cfs_rqs linger on the list.
> > > > > >   */
> > > > > >  if (cfs_rq_is_decayed(cfs_rq))
> > > > > >          list_del_leaf_cfs_rq(cfs_rq);
> > > > > >
> > > > > > > > 2. The device has a lot of CPUs in a cluster which causes
> > > > > > > > schedutil in a shared frequency domain configuration to be
> > > > > > > > slower than usual. (the load
> > > > > > >
> > > > > > > What do you mean exactly by it causes schedutil to be slower than
> > > > > > > usual ?
> > > > > >
> > > > > > sugov_next_freq_shared() is of the order of the number of CPUs in a
> > > > > > cluster. This system is a 6+2 system with 6 CPUs in a cluster. The
> > > > > > schedutil shared policy frequency update needs to go through the
> > > > > > utilization of the other CPUs in the cluster. I believe this could
> > > > > > be adding to the problem, but it does not really need to be
> > > > > > optimized if we can rate limit the calls to
> > > > > > update_blocked_averages() to begin with.
> > > > >
> > > > > Qais mentioned half of the time being used by
> > > > > sugov_next_freq_shared(). Are there any frequency changes resulting in
> > > > > this call ?
> > > >
> > > > I do not see a frequency update happening at the time of the problem.
> > > > However, note that sugov_iowait_boost() does run even if the frequency
> > > > is not being updated. IIRC, this function is also not that lightweight
> > > > and I am not sure if it is a good idea to call it that often.
> > >
> > > The scheduler can't make any assumption about how often
> > > schedutil/cpufreq wants to be called. Some are fast and straightforward
> > > and can be called very often to adjust frequency; others can't handle
> > > many updates. The rate limit mechanism in schedutil and io-boost should
> > > be there for that purpose.
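
As a rough illustration of the kind of time-based rate limiting discussed
above, here is a minimal userspace sketch. It is not the schedutil code
and not the patch under discussion; the struct demo_rate_limit, its
fields, and demo_should_update() are hypothetical names chosen for this
example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rate-limit state; not a kernel data structure. */
struct demo_rate_limit {
	uint64_t last_update_ns;	/* time of the last accepted update */
	uint64_t min_interval_ns;	/* minimum spacing between updates */
};

/*
 * Return true, and record now_ns, only if at least min_interval_ns has
 * elapsed since the last accepted update; otherwise skip the update.
 */
static inline bool demo_should_update(struct demo_rate_limit *rl,
				      uint64_t now_ns)
{
	if (now_ns - rl->last_update_ns < rl->min_interval_ns)
		return false;

	rl->last_update_ns = now_ns;
	return true;
}
```

A caller that is invoked very often (for example on every idle entry)
would do the expensive work only when demo_should_update() returns true,
so the cost is bounded by the interval rather than by the call rate.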
> >
> > Sure, I know that's the intention.
> >
> > > > > > > > average updates also try to update the frequency in schedutil).
> > > > > > > >
> > > > > > > > 3. The CPU is running at a low frequency causing the
> > > > > > > > scheduler/schedutil code paths to take longer than when running
> > > > > > > > at a high CPU frequency.
> > > > > > >
> > > > > > > Low frequency usually