Re: [PATCH v14 09/13] KVM: x86: Report CET MSRs as to-be-saved if CET is supported
On Thu, Jan 28, 2021 at 06:46:37PM +0100, Paolo Bonzini wrote:
> On 06/11/20 02:16, Yang Weijiang wrote:
> > Report all CET MSRs, including the synthetic GUEST_SSP MSR, as
> > to-be-saved, e.g. for migration, if CET is supported by KVM.
> >
> > Co-developed-by: Sean Christopherson
> > Signed-off-by: Sean Christopherson
> > Signed-off-by: Yang Weijiang
> > ---
> >  arch/x86/kvm/x86.c | 9 +
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 751b62e871e5..d573cadf5baf 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1248,6 +1248,8 @@ static const u32 msrs_to_save_all[] = {
> >  	MSR_ARCH_PERFMON_EVENTSEL0 + 16, MSR_ARCH_PERFMON_EVENTSEL0 + 17,
> >  	MSR_IA32_XSS,
> > +	MSR_IA32_U_CET, MSR_IA32_S_CET, MSR_IA32_INT_SSP_TAB, MSR_KVM_GUEST_SSP,
> > +	MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP, MSR_IA32_PL3_SSP,
> >  };
> >  static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_all)];
> > @@ -5761,6 +5763,13 @@ static void kvm_init_msr_list(void)
> >  			if (!supported_xss)
> >  				continue;
> >  			break;
> > +		case MSR_IA32_U_CET:
> > +		case MSR_IA32_S_CET:
> > +		case MSR_IA32_INT_SSP_TAB:
> > +		case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> > +			if (!kvm_cet_supported())
> > +				continue;
> > +			break;
> >  		default:
> >  			break;
> >  		}
> >
>
> Missing "case MSR_KVM_GUEST_SSP".
>
OK, will fix it in next version.
> Paolo
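Why the missing case matters: in the `kvm_init_msr_list()` filter above, an MSR with no case label falls through to `default: break` and is reported unconditionally. The following is a minimal user-space sketch of that filtering pattern. The MSR indices are hypothetical placeholders, not the real values. Only the control flow mirrors the patch: `continue` inside the `switch` skips to the next loop iteration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSR indices for illustration; these are not the real values. */
enum {
	MSR_XSS       = 0x100,
	MSR_U_CET     = 0x101,
	MSR_S_CET     = 0x102,
	MSR_GUEST_SSP = 0x103,	/* synthetic, KVM-defined */
};

static const uint32_t msrs_to_save_all[] = {
	MSR_XSS, MSR_U_CET, MSR_S_CET, MSR_GUEST_SSP,
};

/*
 * Copy each MSR that should be reported as to-be-saved into out[] and return
 * how many were copied. An MSR with no case label hits "default: break" and is
 * reported unconditionally, which is why GUEST_SSP needs its own case.
 */
static size_t init_msr_list(bool cet_supported, uint32_t *out)
{
	size_t n = 0;
	size_t count = sizeof(msrs_to_save_all) / sizeof(msrs_to_save_all[0]);

	for (size_t i = 0; i < count; i++) {
		switch (msrs_to_save_all[i]) {
		case MSR_U_CET:
		case MSR_S_CET:
		case MSR_GUEST_SSP:	/* the case reported missing in review */
			if (!cet_supported)
				continue;	/* skip to the next loop iteration */
			break;
		default:
			break;
		}
		out[n++] = msrs_to_save_all[i];
	}
	return n;
}
```

If the `MSR_GUEST_SSP` label is dropped, `init_msr_list(false, out)` returns 2 instead of 1. The synthetic MSR would then be advertised even on hosts without CET.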
[PATCH v2] perf/core: Wake up tasks for failing pinned events
Currently we don't get any notice when a pinned event fails to be
scheduled: the event is put into the error state and we never try to
schedule it again. That means we won't get any samples for the event. It's
possible to detect this by reading the file, but usually we only monitor the
event via mmap-ed ring buffers.

Let's poke the tasks waiting in poll(2) so that they can respond to the event.

Signed-off-by: Namhyung Kim
---
 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 9a38f579bc76..0b3b3e97243b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -733,6 +733,7 @@ struct perf_event {
 	int				pending_wakeup;
 	int				pending_kill;
 	int				pending_disable;
+	int				pending_pin_error;
 	struct irq_work			pending;
 
 	atomic_t			event_limit;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 55d18791a72d..f8e9db30a573 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3675,6 +3675,8 @@ static int merge_sched_in(struct perf_event *event, void *data)
 		if (event->attr.pinned) {
 			perf_cgroup_event_disable(event, ctx);
 			perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
+			event->pending_pin_error = 1;
+			irq_work_queue(&event->pending);
 		}
 
 		*can_add_hw = 0;
@@ -5288,6 +5290,9 @@ static __poll_t perf_poll(struct file *file, poll_table *wait)
 	if (is_event_hup(event))
 		return events;
 
+	if (event->attr.pinned && event->state == PERF_EVENT_STATE_ERROR)
+		return EPOLLERR;
+
 	/*
 	 * Pin the event->rb by taking event->mmap_mutex; otherwise
 	 * perf_event_set_output() can swizzle our rb and make us miss wakeups.
@@ -6333,6 +6338,11 @@ static void perf_pending_event(struct irq_work *entry)
 		perf_event_wakeup(event);
 	}
 
+	if (event->pending_pin_error) {
+		event->pending_pin_error = 0;
+		wake_up_all(&event->waitq);
+	}
+
 	if (rctx >= 0)
 		perf_swevent_put_recursion_context(rctx);
 }
-- 
2.30.0.365.g02bc693789-goog
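On the user-space side, the patch relies on the tool noticing `EPOLLERR` after it is woken from poll(2). Below is a minimal sketch of how a tool might classify `revents` for a perf event fd. The enum and function names are hypothetical, not part of any perf tool. `POLLERR` is checked first because a pinned event in `PERF_EVENT_STATE_ERROR` is never rescheduled. Note that poll(2) reports `POLLERR` even when it was not requested in `events`.

```c
#include <poll.h>

/* Hypothetical outcomes a perf tool might distinguish after poll(2) returns. */
enum perf_fd_status {
	PERF_FD_IDLE,		/* nothing reported */
	PERF_FD_DATA,		/* ring buffer has data (POLLIN) */
	PERF_FD_HUP,		/* event is gone, e.g. the task exited (POLLHUP) */
	PERF_FD_PIN_ERROR,	/* pinned event failed to schedule (POLLERR) */
};

/*
 * Classify poll revents for a perf event fd. With the patch, perf_poll()
 * returns EPOLLERR for a pinned event in the error state, so no more samples
 * will arrive and the tool should stop waiting on this fd.
 */
static enum perf_fd_status classify_perf_revents(short revents)
{
	if (revents & POLLERR)
		return PERF_FD_PIN_ERROR;
	if (revents & POLLHUP)
		return PERF_FD_HUP;
	if (revents & POLLIN)
		return PERF_FD_DATA;
	return PERF_FD_IDLE;
}
```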
Re: [PATCH] drm/tilcdc: send vblank event when disabling crtc
Dropped the @ti.com addresses and added the new ones.

 Tomi

On 29/01/2021 07:58, quanyang.w...@windriver.com wrote:
> From: Quanyang Wang
>
> When running xrandr to change the resolution on a Beaglebone Black board,
> the following errors are printed:
>
> root@beaglebone:~# xrandr -display :0 --output HDMI-1 --mode 720x400
> [drm:drm_crtc_commit_wait] *ERROR* flip_done timed out
> [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:32:tilcdc crtc] commit wait timed out
> [drm:drm_crtc_commit_wait] *ERROR* flip_done timed out
> [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CONNECTOR:34:HDMI-A-1] commit wait timed out
> [drm:drm_crtc_commit_wait] *ERROR* flip_done timed out
> [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:31:plane-0] commit wait timed out
> tilcdc 4830e000.lcdc: already pending page flip!
>
> This is caused by the following sequence of operations:
>
> drm_atomic_connector_commit_dpms(mode is DRM_MODE_DPMS_OFF):
>     ...
>     drm_atomic_helper_setup_commit <- init_completion(commit_A->flip_done)
>     drm_atomic_helper_commit_tail
>         tilcdc_crtc_atomic_disable
>         tilcdc_plane_atomic_update <- drm_crtc_send_vblank_event in tilcdc_crtc_irq
>                                       is skipped since tilcdc_crtc->enabled is 0
>         tilcdc_crtc_atomic_flush   <- drm_crtc_send_vblank_event is skipped since
>                                       crtc->state->event is set to NULL in
>                                       tilcdc_plane_atomic_update
> drm_mode_setcrtc:
>     ...
>     drm_atomic_helper_setup_commit <- init_completion(commit_B->flip_done)
>     drm_atomic_helper_wait_for_dependencies
>         drm_crtc_commit_wait <- wait for commit_A->flip_done to complete
>
> As shown above, every step that could complete commit_A->flip_done is
> skipped, so commit_A->flip_done is never completed. This results in a
> time-out ERROR when drm_crtc_commit_wait checks commit_A->flip_done.
> So add drm_crtc_send_vblank_event in tilcdc_crtc_atomic_disable to
> complete commit_A->flip_done.
>
> Fixes: cb345decb4d2 ("drm/tilcdc: Use standard drm_atomic_helper_commit")
> Signed-off-by: Quanyang Wang
> ---
>  drivers/gpu/drm/tilcdc/tilcdc_crtc.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/tilcdc/tilcdc_crtc.c b/drivers/gpu/drm/tilcdc/tilcdc_crtc.c
> index 30213708fc99..d99afd19ca08 100644
> --- a/drivers/gpu/drm/tilcdc/tilcdc_crtc.c
> +++ b/drivers/gpu/drm/tilcdc/tilcdc_crtc.c
> @@ -515,6 +515,15 @@ static void tilcdc_crtc_off(struct drm_crtc *crtc, bool shutdown)
>
>  	drm_crtc_vblank_off(crtc);
>
> +	spin_lock_irq(&crtc->dev->event_lock);
> +
> +	if (crtc->state->event) {
> +		drm_crtc_send_vblank_event(crtc, crtc->state->event);
> +		crtc->state->event = NULL;
> +	}
> +
> +	spin_unlock_irq(&crtc->dev->event_lock);
> +
>  	tilcdc_crtc_disable_irqs(dev);
>
>  	pm_runtime_put_sync(dev->dev);
>
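The core of the fix can be seen in isolation. Once the vblank IRQ path is off, nothing else will signal the pending event, so the disable path has to "send" it itself and clear the pointer. Below is a minimal single-threaded model of that logic. The `fake_*` types and helpers are hypothetical stand-ins for `drm_pending_vblank_event`, the CRTC state and `drm_crtc_send_vblank_event()`. The kernel does this under `dev->event_lock`, which this model omits.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending vblank event and its commit completion. */
struct fake_event {
	bool flip_done;			/* models commit->flip_done being completed */
};

/* Hypothetical stand-in for the CRTC state that owns the pending event. */
struct fake_crtc_state {
	struct fake_event *event;	/* pending event, or NULL */
};

/* Models drm_crtc_send_vblank_event(): signals whoever waits on flip_done. */
static void send_vblank_event(struct fake_event *e)
{
	e->flip_done = true;
}

/*
 * Models the added block in tilcdc_crtc_off(): flush any pending event so a
 * later commit waiting on flip_done does not time out. Clearing the pointer
 * keeps the event from being sent twice.
 */
static void crtc_off(struct fake_crtc_state *state)
{
	if (state->event) {
		send_vblank_event(state->event);
		state->event = NULL;
	}
}
```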
Re: [PATCH 2/3] arm64/ptrace: introduce NT_ARM_PRSTATUS to get a full set of registers
On Wed, Jan 27, 2021 at 02:53:07PM +0000, Dave Martin wrote:
> On Tue, Jan 19, 2021 at 02:06:36PM -0800, Andrei Vagin wrote:
> > This is an alternative to NT_PRSTATUS that clobbers ip/r12 on AArch32,
> > x7 on AArch64 when a tracee is stopped in syscall entry or syscall exit
> > traps.
> >
> > Signed-off-by: Andrei Vagin
>
> This approach looks like it works, though I still think adding an option
> for this under PTRACE_SETOPTIONS would be less intrusive.

Dave, thank you for the feedback. I will prepare a patch with an option
and then we will see what looks better.

>
> Adding a shadow regset like this also looks like it would cause the gp
> regs to be pointlessly be dumped twice in a core dump. Avoiding that
> might require hacks in the core code...
>
>
> > ---
> >  arch/arm64/kernel/ptrace.c | 39 ++
> >  include/uapi/linux/elf.h   |  1 +
> >  2 files changed, 40 insertions(+)
> >
> > diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> > index 1863f080cb07..b8e4c2ddf636 100644
> > --- a/arch/arm64/kernel/ptrace.c
> > +++ b/arch/arm64/kernel/ptrace.c
> > @@ -591,6 +591,15 @@ static int gpr_get(struct task_struct *target,
> >  	return ret;
> >  }
> >
> > +static int gpr_get_full(struct task_struct *target,
> > +			const struct user_regset *regset,
> > +			struct membuf to)
> > +{
> > +	struct user_pt_regs *uregs = &task_pt_regs(target)->user_regs;
> > +
> > +	return membuf_write(&to, uregs, sizeof(*uregs));
> > +}
> > +
> >  static int gpr_set(struct task_struct *target, const struct user_regset *regset,
> >  		   unsigned int pos, unsigned int count,
> >  		   const void *kbuf, const void __user *ubuf)
> > @@ -1088,6 +1097,7 @@ static int tagged_addr_ctrl_set(struct task_struct *target, const struct
> >
> >  enum aarch64_regset {
> >  	REGSET_GPR,
> > +	REGSET_GPR_FULL,
>
> If we go with this approach, "REGSET_GPR_RAW" might be a preferable
> name. Both regs represent all the regs ("full"), but REGSET_GPR is
> mangled by the kernel.

I agree that REGSET_GPR_RAW looks better in this case.

>
> >  	REGSET_FPR,
> >  	REGSET_TLS,
> >  #ifdef CONFIG_HAVE_HW_BREAKPOINT
> > @@ -1119,6 +1129,14 @@ static const struct user_regset aarch64_regsets[] = {
> >  		.regset_get = gpr_get,
> >  		.set = gpr_set
> >  	},
> > +	[REGSET_GPR_FULL] = {
> > +		.core_note_type = NT_ARM_PRSTATUS,
> ...
> > diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
> > index 30f68b42eeb5..a2086d19263a 100644
> > --- a/include/uapi/linux/elf.h
> > +++ b/include/uapi/linux/elf.h
> > @@ -426,6 +426,7 @@ typedef struct elf64_shdr {
> >  #define NT_ARM_PACA_KEYS	0x407	/* ARM pointer authentication address keys */
> >  #define NT_ARM_PACG_KEYS	0x408	/* ARM pointer authentication generic key */
> >  #define NT_ARM_TAGGED_ADDR_CTRL	0x409	/* arm64 tagged address control (prctl()) */
>
> What happened to 0x40a..0x40f?

shame on me :)

>
> [...]
>
> Cheers
> ---Dave
Re: [PATCH v12 6/8] drm/mediatek: enable dither function
On Fri, Jan 29, 2021 at 3:42 PM Yongqiang Niu wrote:
>
> On Fri, 2021-01-29 at 14:46 +0800, Hsin-Yi Wang wrote:
> > On Fri, Jan 29, 2021 at 2:30 PM Yongqiang Niu wrote:
> > >
> > > On Fri, 2021-01-29 at 14:24 +0800, Hsin-Yi Wang wrote:
> > > > On Fri, Jan 29, 2021 at 9:33 AM CK Hu wrote:
> > > > >
> > > > > Hi, Hsin-Yi:
> > > > >
> > > > > On Thu, 2021-01-28 at 19:23 +0800, Hsin-Yi Wang wrote:
> > > > > > From: Yongqiang Niu
> > > > > >
> > > > > > for 5 or 6 bpc panel, we need enable dither function
> > > > > > to improve the display quality
> > > > > >
> > > > > > Signed-off-by: Yongqiang Niu
> > > > > > Signed-off-by: Hsin-Yi Wang
> > > > > > ---
> > > > > >  drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c | 15 +--
> > > > > >  1 file changed, 13 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
> > > > > > index ac2cb25620357..6c8f246380a74 100644
> > > > > > --- a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
> > > > > > +++ b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
> > > > > > @@ -53,6 +53,7 @@
> > > > > >  #define DITHER_EN			BIT(0)
> > > > > >  #define DISP_DITHER_CFG			0x0020
> > > > > >  #define DITHER_RELAY_MODE		BIT(0)
> > > > > > +#define DITHER_ENGINE_EN		BIT(1)
> > > > > >  #define DISP_DITHER_SIZE		0x0030
> > > > > >
> > > > > >  #define LUT_10BIT_MASK			0x03ff
> > > > > > @@ -314,9 +315,19 @@ static void mtk_dither_config(struct device *dev, unsigned int w,
> > > > > >  			      unsigned int bpc, struct cmdq_pkt *cmdq_pkt)
> > > > > >  {
> > > > > >  	struct mtk_ddp_comp_dev *priv = dev_get_drvdata(dev);
> > > > > > +	bool enable = (bpc == 5 || bpc == 6);
> > > > >
> > > > > I strongly believe that dither function in dither is identical to the
> > > > > one in gamma and od, and in mtk_dither_set_common(), 'bpc >=
> > > > > MTK_MIN_BPC' is valid, so I believe we need not to limit bpc to 5 or
> > > > > 6.
> > > > > But we should consider the case that bpc is invalid in
> > > > > mtk_dither_set_common(). Invalid case in gamma and od use different
> > > > > way to process. For gamma, dither is default relay mode, so invalid bpc
> > > > > would do nothing in mtk_dither_set_common() and result in relay mode.
> > > > > For od, it set to relay mode first, then invalid bpc would do nothing in
> > > > > mtk_dither_set_common() and result in relay mode. I would like dither,
> > > > > gamma and od to process invalid bpc in the same way. One solution is to
> > > > > set relay mode in mtk_dither_set_common() for invalid bpc.
> > > > >
> > > > > Regards,
> > > > > CK
> > > > >
> > > >
> > > > I modify the mtk_dither_config() to follow:
> > > >
> > > > diff --git a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
> > > > index ac2cb25620357..5b7fcedb9f9a8 100644
> > > > --- a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
> > > > +++ b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
> > > > @@ -53,6 +53,7 @@
> > > >  #define DITHER_EN			BIT(0)
> > > >  #define DISP_DITHER_CFG			0x0020
> > > >  #define DITHER_RELAY_MODE		BIT(0)
> > > > +#define DITHER_ENGINE_EN		BIT(1)
> > > >  #define DISP_DITHER_SIZE		0x0030
> > > >
> > > >  #define LUT_10BIT_MASK			0x03ff
> > > > @@ -166,6 +167,8 @@ void mtk_dither_set_common(void __iomem *regs, struct cmdq_client_reg *cmdq_reg,
> > > >  				      DITHER_ADD_LSHIFT_G(MTK_MAX_BPC - bpc),
> > > >  				      cmdq_reg, regs, DISP_DITHER_16);
> > > >  		mtk_ddp_write(cmdq_pkt, dither_en, cmdq_reg, regs, cfg);
> > > > +	} else {
> > > > +		mtk_ddp_write(cmdq_pkt, DITHER_RELAY_MODE, cmdq_reg, regs, cfg);
> > > >  	}
> > > >  }
> > > >
> > > > @@ -315,8 +318,12 @@ static void mtk_dither_config(struct device *dev, unsigned int w,
> > > >  {
> > > >  	struct mtk_ddp_comp_dev *priv = dev_get_drvdata(dev);
> > > >
> > > > -	mtk_ddp_write(cmdq_pkt, h << 16 | w, &priv->cmdq_reg, priv->regs, DISP_DITHER_SIZE);
> > > > -	mtk_ddp_write(cmdq_pkt, DITHER_RELAY_MODE, &priv->cmdq_reg, priv->regs, DISP_DITHER_CFG);
> > > > +	mtk_ddp_write(cmdq_pkt, h << 16 | w, &priv->cmdq_reg, priv->regs,
> > > > +		      DISP_DITHER_SIZE);
> > > > +	mtk_ddp_write(cmdq_pkt, DITHER_RELAY_MODE, &priv->cmdq_reg, priv->regs,
> > > > +		      DISP_DITHER_CFG);
> > > > +	mtk_dither_set_common(priv->regs, &priv->cmdq_reg, bpc, DISP_DITHER_CFG,
> > > > +
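CK's suggestion reduces to one decision about what gets written to the CFG register for a given bpc. Any valid bpc writes the dither-enable value, and an invalid bpc falls back to relay (bypass) mode, so dither, gamma and od all behave the same. Below is a minimal sketch of that decision as a pure function. The value of `MTK_MIN_BPC` (3 here) is an assumption that should be checked against `mtk_drm_ddp_comp.c`. The function name is hypothetical; the real code writes through `mtk_ddp_write()`.

```c
#include <stdint.h>

/* Assumed to mirror the driver's constants; verify against mtk_drm_ddp_comp.c. */
#define MTK_MIN_BPC		3
#define DITHER_RELAY_MODE	(1u << 0)
#define DITHER_ENGINE_EN	(1u << 1)

/*
 * Hypothetical helper: the value to write to the dither CFG register.
 * A valid bpc enables dithering with the caller's enable bits; an invalid
 * bpc selects relay (bypass) mode instead of leaving the register untouched.
 */
static uint32_t dither_cfg_value(unsigned int bpc, uint32_t dither_en)
{
	if (bpc >= MTK_MIN_BPC)
		return dither_en;
	return DITHER_RELAY_MODE;
}
```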
[PATCH] power: supply: Simplify bool conversion
Fix the following coccicheck warning:

./drivers/power/supply/cpcap-charger.c:416:31-36: WARNING: conversion to bool not needed here

Reported-by: Abaci Robot
Signed-off-by: Yang Li
---
 drivers/power/supply/cpcap-charger.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/power/supply/cpcap-charger.c b/drivers/power/supply/cpcap-charger.c
index c0d452e..c70a761 100644
--- a/drivers/power/supply/cpcap-charger.c
+++ b/drivers/power/supply/cpcap-charger.c
@@ -413,7 +413,7 @@ static bool cpcap_charger_vbus_valid(struct cpcap_charger_ddata *ddata)
 
 	error = iio_read_channel_processed(channel, &value);
 	if (error >= 0)
-		return value > 3900 ? true : false;
+		return value > 3900;
 
 	dev_err(ddata->dev, "error reading VBUS: %i\n", error);
 
-- 
1.8.3.1
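Why the ternary is redundant: in C a relational expression such as `value > 3900` already has type `int` with value 0 or 1, and returning it from a `bool` function converts it implicitly. The two helpers below are standalone illustrations, not driver code. They show that both forms agree at the threshold boundary.

```c
#include <stdbool.h>

/* Before: explicit conversion of the comparison result to bool. */
static bool vbus_valid_ternary(int value)
{
	return value > 3900 ? true : false;
}

/* After: the comparison already yields 0 or 1, which converts to bool directly. */
static bool vbus_valid(int value)
{
	return value > 3900;
}
```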
Re: [PATCH] PCI: endpoint: Select configfs dependency
Hi Arnd, Lorenzo,

On 25/01/21 5:04 pm, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> The newly added pci-epf-ntb driver uses configfs, which
> causes a link failure when that is disabled at compile-time:
>
> arm-linux-gnueabi-ld: drivers/pci/endpoint/functions/pci-epf-ntb.o: in function `epf_ntb_add_cfs':
> pci-epf-ntb.c:(.text+0x954): undefined reference to `config_group_init_type_name'
>
> Add a 'select' statement to Kconfig to ensure it's always there,
> which is the common way to enable it for other configfs users.
>
> Fixes: 7dc64244f9e9 ("PCI: endpoint: Add EP function driver to provide NTB functionality")
> Signed-off-by: Arnd Bergmann

Since I'm sending a new revision of the NTB driver, I'll squash this patch
into the driver patch and add Arnd's sign-off.

Thank You,
Kishon

> ---
>  drivers/pci/endpoint/functions/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/pci/endpoint/functions/Kconfig b/drivers/pci/endpoint/functions/Kconfig
> index 24bfb2af65a1..5d35fcd613ef 100644
> --- a/drivers/pci/endpoint/functions/Kconfig
> +++ b/drivers/pci/endpoint/functions/Kconfig
> @@ -16,6 +16,7 @@ config PCI_EPF_TEST
>  config PCI_EPF_NTB
>  	tristate "PCI Endpoint NTB driver"
>  	depends on PCI_ENDPOINT
> +	select CONFIGFS_FS
>  	help
>  	  Select this configuration option to enable the NTB driver
>  	  for PCI Endpoint. NTB driver implements NTB controller
>
Re: [PATCH 04/13] module: use RCU to synchronize find_module
On Thu 2021-01-28 19:14:12, Christoph Hellwig wrote:
> Allow for a RCU-sched critical section around find_module, following
> the lower level find_module_all helper, and switch the two callers
> outside of module.c to use such a RCU-sched critical section instead
> of module_mutex.
>
> Signed-off-by: Christoph Hellwig

It looks good and safe.

Reviewed-by: Petr Mladek

Best Regards,
Petr
Re: [RFC PATCH v3 00/13] virtio/vsock: introduce SOCK_SEQPACKET support
On Fri, Jan 29, 2021 at 09:41:50AM +0300, Arseny Krasnov wrote:
On 28.01.2021 20:19, Stefano Garzarella wrote:
Hi Arseny,
I reviewed a part, tomorrow I hope to finish the other patches.

Just a couple of comments in the TODOs below.

On Mon, Jan 25, 2021 at 02:09:00PM +0300, Arseny Krasnov wrote:
This patchset implements support of SOCK_SEQPACKET for the virtio
transport. Since SOCK_SEQPACKET guarantees that record boundaries are
preserved, a new packet operation was added: it marks the start of a
record (with the record length in the header), and such a packet doesn't
carry any data. To send a record, the packet with the start marker is sent
first, then all data is sent as usual 'RW' packets. On the receiver's side,
the length of the record is known from the packet with the start-of-record
marker. Since packets of one socket are reordered neither on the vsock nor
on the vhost transport layer, such a marker allows the original record to be
restored on the receiver's side. If the user's buffer is smaller than the
record length, all data that does not fit is dropped.
As with stream sockets, the maximum record length is not limited, because
the same credit logic is used. The difference from stream sockets is that
the user is not woken up until the whole record has been received or an
error has occurred. The implementation also supports the 'MSG_EOR' and
'MSG_TRUNC' flags.
Tests are also implemented.
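The record semantics described above (boundaries preserved, excess bytes dropped, `MSG_TRUNC` reported) are what user space already sees from other SOCK_SEQPACKET families. Below is a minimal Linux sketch that uses an AF_UNIX socketpair as a stand-in for vsock. It is not the vsock code path, only the behaviour the series aims to match. Record one does not fit in a 5-byte buffer. Its tail is discarded, and the next read returns record two.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

struct seqpacket_demo {
	ssize_t first_len;	/* bytes copied from the first (oversized) record */
	int first_trunc;	/* was MSG_TRUNC reported for it? */
	ssize_t second_len;	/* bytes copied from the second record */
	char second[8];		/* contents of the second record */
};

/* Receive one record; *trunc is set if the record did not fit in buf. */
static ssize_t recv_record(int fd, void *buf, size_t len, int *trunc)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg;
	ssize_t n;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	n = recvmsg(fd, &msg, 0);
	if (n >= 0)
		*trunc = (msg.msg_flags & MSG_TRUNC) != 0;
	return n;
}

/* Send two records, then read the first one with a buffer that is too small. */
static int run_seqpacket_demo(struct seqpacket_demo *d)
{
	char small[5];
	int sv[2], trunc2 = 0;

	memset(d, 0, sizeof(*d));
	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
		return -1;

	send(sv[0], "hello, world", 12, 0);	/* record 1: 12 bytes */
	send(sv[0], "bye", 3, 0);		/* record 2: 3 bytes */

	d->first_len = recv_record(sv[1], small, sizeof(small), &d->first_trunc);
	/* Leave room for the terminating NUL that memset() already put there. */
	d->second_len = recv_record(sv[1], d->second, sizeof(d->second) - 1, &trunc2);

	close(sv[0]);
	close(sv[1]);
	return 0;
}
```

The first read returns 5 bytes with `MSG_TRUNC` set. The second read returns "bye", confirming that the 7 excess bytes of record one were dropped rather than left in the queue.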
Arseny Krasnov (13):
  af_vsock: prepare for SOCK_SEQPACKET support
  af_vsock: prepare 'vsock_connectible_recvmsg()'
  af_vsock: implement SEQPACKET rx loop
  af_vsock: implement send logic for SOCK_SEQPACKET
  af_vsock: rest of SEQPACKET support
  af_vsock: update comments for stream sockets
  virtio/vsock: dequeue callback for SOCK_SEQPACKET
  virtio/vsock: fetch length for SEQPACKET record
  virtio/vsock: add SEQPACKET receive logic
  virtio/vsock: rest of SOCK_SEQPACKET support
  virtio/vsock: setup SEQPACKET ops for transport
  vhost/vsock: setup SEQPACKET ops for transport
  vsock_test: add SOCK_SEQPACKET tests

 drivers/vhost/vsock.c                   |   7 +-
 include/linux/virtio_vsock.h            |  12 +
 include/net/af_vsock.h                  |   6 +
 include/uapi/linux/virtio_vsock.h       |   9 +
 net/vmw_vsock/af_vsock.c                | 543 --
 net/vmw_vsock/virtio_transport.c        |   4 +
 net/vmw_vsock/virtio_transport_common.c | 295 ++--
 tools/testing/vsock/util.c              |  32 +-
 tools/testing/vsock/util.h              |   3 +
 tools/testing/vsock/vsock_test.c        | 126 +
 10 files changed, 862 insertions(+), 175 deletions(-)

TODO:
- Support for record integrity control. As the transport could drop some
  packets, something like a "record-id" and a record end marker need to be
  implemented. The idea is that the SEQ_BEGIN packet carries both the record
  length and the record id, while the end marker (let it be SEQ_END) carries
  only the record id. To be sure that no packet was lost, the receiver checks
  the length of the data between SEQ_BEGIN and SEQ_END (it must match the
  value in SEQ_BEGIN) and the record ids of SEQ_BEGIN and SEQ_END (this
  ensures that neither marker was dropped). I think the easiest way to
  implement the record id for SEQ_BEGIN is to reuse another field of the
  packet header (SEQ_BEGIN already uses 'flags' as the record length). For
  SEQ_END, the record id could be stored in 'flags'.

I don't really like the idea of reusing the 'flags' field for this purpose.

  Another way to implement it is to move the metadata of both SEQ_END and
  SEQ_BEGIN to the payload.
  But this approach has a problem: if we move something to the payload,
  that payload is accounted by the credit logic, which fragments the
  payload, while a payload carrying the record length and id must not be
  fragmented. One way to overcome this is to ignore the credit update for
  SEQ_BEGIN/SEQ_END packets. Another solution is to update the
  'stream_has_space()' function: the current implementation returns non-zero
  when at least 1 byte is allowed to be used, but the updated version would
  take an extra argument, the needed length. For an 'RW' packet this argument
  is 1, for SEQ_BEGIN it is sizeof(record len + record id), and for SEQ_END
  it is sizeof(record id).

Is the payload accounted by the credit logic also if hdr.op is not
VIRTIO_VSOCK_OP_RW?

Yes, on send any packet with a payload could be fragmented if there is not
enough space at the receiver. On receive, 'fwd_cnt' and 'buf_alloc' are
updated with the header of every packet. Of course, for every case I've
described I can add a check for the 'RW' packet to exclude the payload from
credit accounting, but that is a bunch of dumb checks.

I think that we can define a specific header to put after the
virtio_vsock_hdr when hdr.op is SEQ_BEGIN or SEQ_END, and in this header we
can store the id and the length of the message. I think it is better than
using the payload and touching the credit logic.

Cool, so let's try this option, hoping there aren't a lot of issues.

Another item for TODO could be to add the SOCK_SEQPACKET support also for
vsock_loopback. Should be simple since it also uses virtio_transport_common
APIs and it can be useful fo
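The integrity check discussed in the TODO can be sketched as a small receiver state machine. The assumptions come from the thread: SEQ_BEGIN carries a record id and length, SEQ_END carries the id, and both travel in a separate per-record header rather than in the payload. The op codes, struct layout and function name below are hypothetical and are not the virtio-vsock wire format. A lost RW packet shows up as a length mismatch, a lost SEQ_END as a SEQ_BEGIN arriving mid-record, and a lost SEQ_BEGIN as data outside a record.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical op codes and per-record header; not the virtio-vsock wire format. */
enum seq_op { SEQ_BEGIN, SEQ_RW, SEQ_END };

struct seq_pkt {
	enum seq_op op;
	uint32_t msg_id;	/* SEQ_BEGIN and SEQ_END: record id */
	uint32_t len;		/* SEQ_BEGIN: record length; SEQ_RW: payload length */
};

struct seq_rx {
	bool in_record;
	uint32_t id;		/* id announced by SEQ_BEGIN */
	uint32_t expected;	/* length announced by SEQ_BEGIN */
	uint32_t received;	/* payload bytes seen so far */
};

/*
 * Feed one packet to the receiver. Returns 1 when a record completes intact,
 * 0 while a record is still in progress, and -1 when loss is detected.
 */
static int seq_rx_feed(struct seq_rx *rx, const struct seq_pkt *p)
{
	switch (p->op) {
	case SEQ_BEGIN: {
		int ret = rx->in_record ? -1 : 0;	/* previous SEQ_END was lost */

		rx->in_record = true;
		rx->id = p->msg_id;
		rx->expected = p->len;
		rx->received = 0;
		return ret;
	}
	case SEQ_RW:
		if (!rx->in_record)
			return -1;			/* SEQ_BEGIN was lost */
		rx->received += p->len;
		return 0;
	case SEQ_END:
		if (!rx->in_record)
			return -1;			/* SEQ_BEGIN was lost */
		rx->in_record = false;
		return (p->msg_id == rx->id && rx->received == rx->expected) ? 1 : -1;
	}
	return -1;
}
```

Keeping the id and length in a dedicated header, as suggested in the thread, means this state never depends on payload bytes. So the credit accounting for RW data stays untouched.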
Re: [PATCH v4 2/8] drm/mediatek: add component POSTMASK
Hi, Hsin-Yi:

On Fri, 2021-01-29 at 15:34 +0800, Hsin-Yi Wang wrote:
> From: Yongqiang Niu
>
> This patch add component POSTMASK,
>
> Signed-off-by: Yongqiang Niu
> Signed-off-by: Hsin-Yi Wang
> ---
>  drivers/gpu/drm/mediatek/Makefile            |   1 +
>  drivers/gpu/drm/mediatek/mtk_disp_drv.h      |   8 +
>  drivers/gpu/drm/mediatek/mtk_disp_postmask.c | 161 +++
>  drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c  |  11 ++
>  drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h  |   1 +
>  drivers/gpu/drm/mediatek/mtk_drm_drv.c       |   4 +-
>  drivers/gpu/drm/mediatek/mtk_drm_drv.h       |   1 +
>  7 files changed, 186 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/mediatek/mtk_disp_postmask.c
>
> diff --git a/drivers/gpu/drm/mediatek/Makefile b/drivers/gpu/drm/mediatek/Makefile
> index b64674b944860..13a0eafabf9c0 100644
> --- a/drivers/gpu/drm/mediatek/Makefile
> +++ b/drivers/gpu/drm/mediatek/Makefile
> @@ -3,6 +3,7 @@
>  mediatek-drm-y := mtk_disp_color.o \
>  		  mtk_disp_gamma.o \
>  		  mtk_disp_ovl.o \
> +		  mtk_disp_postmask.o \
>  		  mtk_disp_rdma.o \
>  		  mtk_drm_crtc.o \
>  		  mtk_drm_ddp_comp.o \
> diff --git a/drivers/gpu/drm/mediatek/mtk_disp_drv.h b/drivers/gpu/drm/mediatek/mtk_disp_drv.h
> index 02191010699f8..d74e85db3fcdf 100644
> --- a/drivers/gpu/drm/mediatek/mtk_disp_drv.h
> +++ b/drivers/gpu/drm/mediatek/mtk_disp_drv.h
> @@ -37,6 +37,14 @@ void mtk_gamma_set_common(void __iomem *regs, struct drm_crtc_state *state);
>  void mtk_gamma_start(struct device *dev);
>  void mtk_gamma_stop(struct device *dev);
>
> +int mtk_postmask_clk_enable(struct device *dev);
> +void mtk_postmask_clk_disable(struct device *dev);
> +void mtk_postmask_config(struct device *dev, unsigned int w,
> +			 unsigned int h, unsigned int vrefresh,
> +			 unsigned int bpc, struct cmdq_pkt *cmdq_pkt);
> +void mtk_postmask_start(struct device *dev);
> +void mtk_postmask_stop(struct device *dev);
> +
>  void mtk_ovl_bgclr_in_on(struct device *dev);
>  void mtk_ovl_bgclr_in_off(struct device *dev);
>  void mtk_ovl_bypass_shadow(struct device *dev);
> diff --git a/drivers/gpu/drm/mediatek/mtk_disp_postmask.c b/drivers/gpu/drm/mediatek/mtk_disp_postmask.c
> new file mode 100644
> index 0000000000000..d640cef9c15a4
> --- /dev/null
> +++ b/drivers/gpu/drm/mediatek/mtk_disp_postmask.c
> @@ -0,0 +1,161 @@
> +/*
> + * SPDX-License-Identifier:
> + *
> + * Copyright (c) 2020 MediaTek Inc.

2021

> + */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include "mtk_disp_drv.h"
> +#include "mtk_drm_crtc.h"
> +#include "mtk_drm_ddp_comp.h"
> +
> +#define DISP_POSTMASK_EN			0x0000
> +#define POSTMASK_EN				BIT(0)
> +#define DISP_POSTMASK_CFG			0x0020
> +#define POSTMASK_RELAY_MODE			BIT(0)
> +#define DISP_POSTMASK_SIZE			0x0030
> +
> +struct mtk_disp_postmask_data {
> +	u32 reserved;
> +};

Useless, so remove.

> +
> +/**
> + * struct mtk_disp_postmask - DISP_postmask driver structure
> + * @ddp_comp - structure containing type enum and hardware resources
> + * @crtc - associated crtc to report irq events to
> + */
> +struct mtk_disp_postmask {
> +	struct clk *clk;
> +	void __iomem *regs;
> +	struct cmdq_client_reg cmdq_reg;
> +	const struct mtk_disp_postmask_data *data;
> +};
> +
> +int mtk_postmask_clk_enable(struct device *dev)
> +{
> +	struct mtk_disp_postmask *postmask = dev_get_drvdata(dev);
> +
> +	return clk_prepare_enable(postmask->clk);
> +}
> +
> +void mtk_postmask_clk_disable(struct device *dev)
> +{
> +	struct mtk_disp_postmask *postmask = dev_get_drvdata(dev);
> +
> +	clk_disable_unprepare(postmask->clk);
> +}
> +
> +void mtk_postmask_config(struct device *dev, unsigned int w,
> +			 unsigned int h, unsigned int vrefresh,
> +			 unsigned int bpc, struct cmdq_pkt *cmdq_pkt)
> +{
> +	struct mtk_disp_postmask *postmask = dev_get_drvdata(dev);
> +
> +	mtk_ddp_write(cmdq_pkt, w << 16 | h, &postmask->cmdq_reg, postmask->regs,
> +		      DISP_POSTMASK_SIZE);
> +	mtk_ddp_write(cmdq_pkt, POSTMASK_RELAY_MODE, &postmask->cmdq_reg,
> +		      postmask->regs, DISP_POSTMASK_CFG);
> +}
> +
> +void mtk_postmask_start(struct device *dev)
> +{
> +	struct mtk_disp_postmask *postmask = dev_get_drvdata(dev);
> +
> +	writel(POSTMASK_EN, postmask->regs + DISP_POSTMASK_EN);
> +}
> +
> +void mtk_postmask_stop(struct device *dev)
> +{
> +	struct mtk_disp_postmask *postmask = dev_get_drvdata(dev);
> +
> +	writel_relaxed(0x0, postmask->regs + DISP_POSTMASK_EN);
> +}
> +
> +static int mtk_disp_postmask_bind(struct device *dev, struct device *master, void *data)
> +{
> +	return 0;
> +}
> +
> +static void mtk_disp_po
[PATCH RESEND v5 6/8] regulator: mt6359: Add support for MT6359 regulator
From: Wen Su

The MT6359 is a regulator found on boards based on MediaTek MT6779 and
probably other SoCs. It is a so-called PMIC and connects as a slave to the
SoC using SPI, wrapped inside the pmic-wrapper.

Signed-off-by: Wen Su
Signed-off-by: Hsin-Hsiung Wang
---
changes since v4:
- add enable time of ldo.
- use the device of mfd driver for the regulator_config.
- add the regulators_node support.
---
 drivers/regulator/Kconfig                  |   9 +
 drivers/regulator/Makefile                 |   1 +
 drivers/regulator/mt6359-regulator.c       | 669 +
 include/linux/regulator/mt6359-regulator.h |  58 ++
 4 files changed, 737 insertions(+)
 create mode 100644 drivers/regulator/mt6359-regulator.c
 create mode 100644 include/linux/regulator/mt6359-regulator.h

diff --git a/drivers/regulator/Kconfig b/drivers/regulator/Kconfig
index 53fa84f4d1e1..3de7bb5be8ac 100644
--- a/drivers/regulator/Kconfig
+++ b/drivers/regulator/Kconfig
@@ -750,6 +750,15 @@ config REGULATOR_MT6358
 	  This driver supports the control of different power rails of device
 	  through regulator interface.
 
+config REGULATOR_MT6359
+	tristate "MediaTek MT6359 PMIC"
+	depends on MFD_MT6397
+	help
+	  Say y here to select this option to enable the power regulator of
+	  MediaTek MT6359 PMIC.
+	  This driver supports the control of different power rails of device
+	  through regulator interface.
+
 config REGULATOR_MT6360
 	tristate "MT6360 SubPMIC Regulator"
 	depends on MFD_MT6360
diff --git a/drivers/regulator/Makefile b/drivers/regulator/Makefile
index 680e539f6579..4f65eaead82d 100644
--- a/drivers/regulator/Makefile
+++ b/drivers/regulator/Makefile
@@ -91,6 +91,7 @@ obj-$(CONFIG_REGULATOR_MPQ7920)	+= mpq7920.o
 obj-$(CONFIG_REGULATOR_MT6311)	+= mt6311-regulator.o
 obj-$(CONFIG_REGULATOR_MT6323)	+= mt6323-regulator.o
 obj-$(CONFIG_REGULATOR_MT6358)	+= mt6358-regulator.o
+obj-$(CONFIG_REGULATOR_MT6359)	+= mt6359-regulator.o
 obj-$(CONFIG_REGULATOR_MT6360)	+= mt6360-regulator.o
 obj-$(CONFIG_REGULATOR_MT6380)	+= mt6380-regulator.o
 obj-$(CONFIG_REGULATOR_MT6397)	+= mt6397-regulator.o
diff --git a/drivers/regulator/mt6359-regulator.c b/drivers/regulator/mt6359-regulator.c
new file mode 100644
index 000000000000..fabc3f57f334
--- /dev/null
+++ b/drivers/regulator/mt6359-regulator.c
@@ -0,0 +1,669 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Copyright (c) 2020 MediaTek Inc.
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define MT6359_BUCK_MODE_AUTO		0
+#define MT6359_BUCK_MODE_FORCE_PWM	1
+#define MT6359_BUCK_MODE_NORMAL		0
+#define MT6359_BUCK_MODE_LP		2
+
+/*
+ * MT6359 regulators' information
+ *
+ * @desc: standard fields of regulator description.
+ * @status_reg: for query status of regulators.
+ * @qi: Mask for query enable signal status of regulators.
+ * @modeset_reg: for operating AUTO/PWM mode register.
+ * @modeset_mask: MASK for operating modeset register.
+ * @modeset_shift: SHIFT for operating modeset register.
+ */
+struct mt6359_regulator_info {
+	struct regulator_desc desc;
+	u32 status_reg;
+	u32 qi;
+	u32 modeset_reg;
+	u32 modeset_mask;
+	u32 modeset_shift;
+	u32 lp_mode_reg;
+	u32 lp_mode_mask;
+	u32 lp_mode_shift;
+};
+
+#define MT6359_BUCK(match, _name, min, max, step, min_sel,	\
+	volt_ranges, _enable_reg, _status_reg,			\
+	_vsel_reg, _vsel_mask,					\
+	_lp_mode_reg, _lp_mode_shift,				\
+	_modeset_reg, _modeset_shift)				\
+[MT6359_ID_##_name] = {						\
+	.desc = {						\
+		.name = #_name,					\
+		.of_match = of_match_ptr(match),		\
+		.regulators_node = of_match_ptr("regulators"),	\
+		.ops = &mt6359_volt_range_ops,			\
+		.type = REGULATOR_VOLTAGE,			\
+		.id = MT6359_ID_##_name,			\
+		.owner = THIS_MODULE,				\
+		.uV_step = (step),				\
+		.linear_min_sel = (min_sel),			\
+		.n_voltages = ((max) - (min)) / (step) + 1,	\
+		.min_uV = (min),				\
+		.linear_ranges = volt_ranges,			\
+		.n_linear_ranges = ARRAY_SIZE(volt_ranges),	\
+		.vsel_reg = _vsel_reg,				\
+		.vsel_mask = _vsel_mask,			\
+		.enable_reg = _enable_reg,			\
+		.enable_mask = BIT(0),
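The `MT6359_BUCK()` macro derives `n_voltages` from the min/max/step arguments. Below is a minimal sketch of that arithmetic together with a plain linear selector-to-voltage mapping. The mapping follows the logic of the kernel's `regulator_list_voltage_linear()`: selectors at or past `n_voltages` are invalid, selectors below `linear_min_sel` map to 0, and the rest are `min_uV + (sel - linear_min_sel) * uV_step`. The struct and function names are hypothetical. The 6.25 mV step in the check below is an assumed example value, not taken from the driver. This driver actually wires `mt6359_volt_range_ops` with `linear_ranges`, so treat this only as an illustration of the single-range case.

```c
#include <stdint.h>

/* Hypothetical subset of the regulator_desc fields that MT6359_BUCK() fills in. */
struct linear_desc {
	uint32_t min_uV;
	uint32_t uV_step;
	unsigned int linear_min_sel;
	unsigned int n_voltages;
};

/* Same arithmetic as the macro: n_voltages = (max - min) / step + 1. */
static struct linear_desc make_linear_desc(uint32_t min, uint32_t max,
					   uint32_t step, unsigned int min_sel)
{
	struct linear_desc d = {
		.min_uV = min,
		.uV_step = step,
		.linear_min_sel = min_sel,
		.n_voltages = (max - min) / step + 1,
	};
	return d;
}

/*
 * Selector to microvolts, modeled on regulator_list_voltage_linear():
 * selectors >= n_voltages are invalid (0 here instead of -EINVAL) and
 * selectors below linear_min_sel map to 0.
 */
static uint32_t linear_sel_to_uV(const struct linear_desc *d, unsigned int sel)
{
	if (sel >= d->n_voltages || sel < d->linear_min_sel)
		return 0;
	return d->min_uV + (sel - d->linear_min_sel) * d->uV_step;
}
```

For example, 400000 to 1193750 uV with an assumed 6250 uV step gives (1193750 - 400000) / 6250 + 1 = 128 selectors, and selector 127 maps back to 1193750 uV.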
[PATCH RESEND v5 8/8] arm64: dts: mt6359: add PMIC MT6359 related nodes
From: Wen Su

Add the PMIC MT6359 related nodes for the MT6779 platform.

Signed-off-by: Wen Su
Signed-off-by: Hsin-Hsiung Wang
---
changes since v4:
- add pmic MT6359 support in the MT8192 evb dts.
---
 arch/arm64/boot/dts/mediatek/mt6359.dtsi    | 298
 arch/arm64/boot/dts/mediatek/mt8192-evb.dts |   1 +
 2 files changed, 299 insertions(+)
 create mode 100644 arch/arm64/boot/dts/mediatek/mt6359.dtsi

diff --git a/arch/arm64/boot/dts/mediatek/mt6359.dtsi b/arch/arm64/boot/dts/mediatek/mt6359.dtsi
new file mode 100644
index 000000000000..4bd85e33a4c9
--- /dev/null
+++ b/arch/arm64/boot/dts/mediatek/mt6359.dtsi
@@ -0,0 +1,298 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2020 MediaTek Inc.
+ */
+
+&pwrap {
+	pmic: pmic {
+		compatible = "mediatek,mt6359";
+		interrupt-controller;
+		#interrupt-cells = <2>;
+
+		mt6359codec: mt6359codec {
+		};
+
+		mt6359regulator: regulators {
+			mt6359_vs1_buck_reg: buck_vs1 {
+				regulator-name = "vs1";
+				regulator-min-microvolt = <800000>;
+				regulator-max-microvolt = <2200000>;
+				regulator-enable-ramp-delay = <0>;
+				regulator-always-on;
+			};
+			mt6359_vgpu11_buck_reg: buck_vgpu11 {
+				regulator-name = "vgpu11";
+				regulator-min-microvolt = <400000>;
+				regulator-max-microvolt = <1193750>;
+				regulator-ramp-delay = <5000>;
+				regulator-enable-ramp-delay = <200>;
+				regulator-allowed-modes = <0 1 2>;
+			};
+			mt6359_vmodem_buck_reg: buck_vmodem {
+				regulator-name = "vmodem";
+				regulator-min-microvolt = <400000>;
+				regulator-max-microvolt = <1100000>;
+				regulator-ramp-delay = <10760>;
+				regulator-enable-ramp-delay = <200>;
+			};
+			mt6359_vpu_buck_reg: buck_vpu {
+				regulator-name = "vpu";
+				regulator-min-microvolt = <400000>;
+				regulator-max-microvolt = <1193750>;
+				regulator-ramp-delay = <5000>;
+				regulator-enable-ramp-delay = <200>;
+				regulator-allowed-modes = <0 1 2>;
+			};
+			mt6359_vcore_buck_reg: buck_vcore {
+				regulator-name = "vcore";
+				regulator-min-microvolt = <400000>;
+				regulator-max-microvolt = <1300000>;
+				regulator-ramp-delay = <5000>;
+				regulator-enable-ramp-delay = <200>;
+				regulator-allowed-modes = <0 1 2>;
+			};
+			mt6359_vs2_buck_reg: buck_vs2 {
+				regulator-name = "vs2";
+				regulator-min-microvolt = <800000>;
+				regulator-max-microvolt = <1600000>;
+				regulator-enable-ramp-delay = <0>;
+				regulator-always-on;
+			};
+			mt6359_vpa_buck_reg: buck_vpa {
+				regulator-name = "vpa";
+				regulator-min-microvolt = <500000>;
+				regulator-max-microvolt = <3650000>;
+				regulator-enable-ramp-delay = <300>;
+			};
+			mt6359_vproc2_buck_reg: buck_vproc2 {
+				regulator-name = "vproc2";
+				regulator-min-microvolt = <400000>;
+				regulator-max-microvolt = <1193750>;
+				regulator-ramp-delay = <7500>;
+				regulator-enable-ramp-delay = <200>;
+				regulator-allowed-modes = <0 1 2>;
+			};
+			mt6359_vproc1_buck_reg: buck_vproc1 {
+				regulator-name = "vproc1";
+				regulator-min-microvolt = <400000>;
+				regulator-max-microvolt = <1193750>;
+				regulator-ramp-delay = <7500>;
+				regulator-enable-ramp-delay = <200>;
+				regulator-allowed-modes = <0 1 2>;
+			};
Re: [PATCH v4 7/8] soc: mediatek: add mtk mutex support for MT8192
Hi, Hsin-Yi:

On Fri, 2021-01-29 at 15:34 +0800, Hsin-Yi Wang wrote:
> From: Yongqiang Niu
>
> Add mtk mutex support for MT8192 SoC.

Reviewed-by: CK Hu

>
> Signed-off-by: Yongqiang Niu
> Signed-off-by: Hsin-Yi Wang
> ---
>  drivers/soc/mediatek/mtk-mutex.c | 35
>  1 file changed, 35 insertions(+)
>
> diff --git a/drivers/soc/mediatek/mtk-mutex.c b/drivers/soc/mediatek/mtk-mutex.c
> index 718a41beb6afb..dfd9806d5a001 100644
> --- a/drivers/soc/mediatek/mtk-mutex.c
> +++ b/drivers/soc/mediatek/mtk-mutex.c
> @@ -39,6 +39,18 @@
>  #define MT8167_MUTEX_MOD_DISP_DITHER		15
>  #define MT8167_MUTEX_MOD_DISP_UFOE		16
>
> +#define MT8192_MUTEX_MOD_DISP_OVL0		0
> +#define MT8192_MUTEX_MOD_DISP_OVL0_2L		1
> +#define MT8192_MUTEX_MOD_DISP_RDMA0		2
> +#define MT8192_MUTEX_MOD_DISP_COLOR0		4
> +#define MT8192_MUTEX_MOD_DISP_CCORR0		5
> +#define MT8192_MUTEX_MOD_DISP_AAL0		6
> +#define MT8192_MUTEX_MOD_DISP_GAMMA0		7
> +#define MT8192_MUTEX_MOD_DISP_POSTMASK0		8
> +#define MT8192_MUTEX_MOD_DISP_DITHER0		9
> +#define MT8192_MUTEX_MOD_DISP_OVL2_2L		16
> +#define MT8192_MUTEX_MOD_DISP_RDMA4		17
> +
>  #define MT8183_MUTEX_MOD_DISP_RDMA0		0
>  #define MT8183_MUTEX_MOD_DISP_RDMA1		1
>  #define MT8183_MUTEX_MOD_DISP_OVL0		9
> @@ -214,6 +226,20 @@ static const unsigned int mt8183_mutex_mod[DDP_COMPONENT_ID_MAX] = {
>  	[DDP_COMPONENT_WDMA0] = MT8183_MUTEX_MOD_DISP_WDMA0,
>  };
>
> +static const unsigned int mt8192_mutex_mod[DDP_COMPONENT_ID_MAX] = {
> +	[DDP_COMPONENT_AAL0] = MT8192_MUTEX_MOD_DISP_AAL0,
> +	[DDP_COMPONENT_CCORR] = MT8192_MUTEX_MOD_DISP_CCORR0,
> +	[DDP_COMPONENT_COLOR0] = MT8192_MUTEX_MOD_DISP_COLOR0,
> +	[DDP_COMPONENT_DITHER] = MT8192_MUTEX_MOD_DISP_DITHER0,
> +	[DDP_COMPONENT_GAMMA] = MT8192_MUTEX_MOD_DISP_GAMMA0,
> +	[DDP_COMPONENT_POSTMASK0] = MT8192_MUTEX_MOD_DISP_POSTMASK0,
> +	[DDP_COMPONENT_OVL0] = MT8192_MUTEX_MOD_DISP_OVL0,
> +	[DDP_COMPONENT_OVL_2L0] = MT8192_MUTEX_MOD_DISP_OVL0_2L,
> +	[DDP_COMPONENT_OVL_2L2] = MT8192_MUTEX_MOD_DISP_OVL2_2L,
> +	[DDP_COMPONENT_RDMA0] = MT8192_MUTEX_MOD_DISP_RDMA0,
> +	[DDP_COMPONENT_RDMA4] = MT8192_MUTEX_MOD_DISP_RDMA4,
> +};
> +
>  static const unsigned int mt2712_mutex_sof[MUTEX_SOF_DSI3 + 1] = {
>  	[MUTEX_SOF_SINGLE_MODE] = MUTEX_SOF_SINGLE_MODE,
>  	[MUTEX_SOF_DSI0] = MUTEX_SOF_DSI0,
> @@ -275,6 +301,13 @@ static const struct mtk_mutex_data mt8183_mutex_driver_data = {
>  	.no_clk = true,
>  };
>
> +static const struct mtk_mutex_data mt8192_mutex_driver_data = {
> +	.mutex_mod = mt8192_mutex_mod,
> +	.mutex_sof = mt8183_mutex_sof,
> +	.mutex_mod_reg = MT8183_MUTEX0_MOD0,
> +	.mutex_sof_reg = MT8183_MUTEX0_SOF0,
> +};
> +
>  struct mtk_mutex *mtk_mutex_get(struct device *dev)
>  {
>  	struct mtk_mutex_ctx *mtx = dev_get_drvdata(dev);
> @@ -507,6 +540,8 @@ static const struct of_device_id mutex_driver_dt_match[] = {
>  	  .data = &mt8173_mutex_driver_data},
>  	{ .compatible = "mediatek,mt8183-disp-mutex",
>  	  .data = &mt8183_mutex_driver_data},
> +	{ .compatible = "mediatek,mt8192-disp-mutex",
> +	  .data = &mt8192_mutex_driver_data},
>  	{},
>  };
>  MODULE_DEVICE_TABLE(of, mutex_driver_dt_match);
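A minimal sketch of how a per-SoC table like `mt8192_mutex_mod[]` is typically consumed: each display component on a path contributes one bit, at the position given by its `MT8192_MUTEX_MOD_*` value, to a mutex MOD register value. The enum, table and function names below are hypothetical, the enum is trimmed to a few entries, and the real driver's register access (including how bit positions of 32 and above are handled) is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down component IDs (the real enum is much larger). */
enum ddp_comp_id {
	COMP_OVL0,
	COMP_OVL_2L0,
	COMP_RDMA0,
	COMP_COLOR0,
	COMP_DITHER,
	COMP_RDMA4,
	COMP_ID_MAX,
};

/* Bit positions copied from the MT8192 defines in the patch above. */
static const unsigned int mt8192_mod_sketch[COMP_ID_MAX] = {
	[COMP_OVL0]    = 0,  /* MT8192_MUTEX_MOD_DISP_OVL0 */
	[COMP_OVL_2L0] = 1,  /* MT8192_MUTEX_MOD_DISP_OVL0_2L */
	[COMP_RDMA0]   = 2,  /* MT8192_MUTEX_MOD_DISP_RDMA0 */
	[COMP_COLOR0]  = 4,  /* MT8192_MUTEX_MOD_DISP_COLOR0 */
	[COMP_DITHER]  = 9,  /* MT8192_MUTEX_MOD_DISP_DITHER0 */
	[COMP_RDMA4]   = 17, /* MT8192_MUTEX_MOD_DISP_RDMA4 */
};

/* OR together one bit per component on the display path. */
static uint32_t mutex_mod_mask(const unsigned int *table,
			       const enum ddp_comp_id *path, int n)
{
	uint32_t mask = 0;

	for (int i = 0; i < n; i++) {
		assert(path[i] < COMP_ID_MAX);
		assert(table[path[i]] < 32); /* sketch only models one 32-bit register */
		mask |= UINT32_C(1) << table[path[i]];
	}
	return mask;
}
```

For a path of OVL0, RDMA0 and DITHER the mask has bits 0, 2 and 9 set (0x205). Note that, because the table uses designated initializers, any component ID left out of it reads as 0, the same value as `MT8192_MUTEX_MOD_DISP_OVL0`, so "unlisted" and "OVL0" cannot be told apart by value alone.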
Re: [PATCH v16 07/11] secretmem: use PMD-size pages to amortize direct map fragmentation
On Thu 28-01-21 13:05:02, James Bottomley wrote:
> Obviously the API choice could be revisited
> but do you have anything to add over the previous discussion, or is
> this just to get your access control?

Well, access control is certainly one thing which I still believe is missing. But if there is general agreement that the direct map manipulation is not that critical, then this will of course become much less of a problem.

It all boils down to whether secret memory is a scarce resource. With the existing implementation it really is. It is effectively repeating the same design errors as hugetlb did. And look now: we have subtle and convoluted reservation code to track mmap requests, and we have a cgroup controller to, guess what, have at least some control over the distribution of the preallocated pool. See where I am coming from?

If secret memory ends up more in line with mlock, without any imposed limit (other than available memory), then, sure, using the same access control as mlock sounds reasonable. Btw., if this is really just a more restrictive mlock, is there any reason not to hook this into the existing mlock infrastructure (e.g. MCL_EXCLUSIVE)? The implications would be that the direct map would be handled on the instantiation/teardown paths, and migration would deal with the same (if possible). Other than that it would be mlock-like.

--
Michal Hocko
SUSE Labs
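For reference, the mlock access control mentioned above is essentially an `RLIMIT_MEMLOCK` check, with `CAP_IPC_LOCK` as the override. A simplified sketch of that check follows; it assumes 4 KiB pages, the function name is hypothetical, and it ignores details of the real `do_mlock()` path such as not double-counting ranges that are already locked. (`MCL_EXCLUSIVE` in the mail is a proposal, not an existing flag, and is not modeled.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/*
 * Simplified model of the mlock(2) limit check: the request is allowed if the
 * pages already locked plus the new request fit under RLIMIT_MEMLOCK, or if
 * the caller holds CAP_IPC_LOCK.
 */
static bool memlock_request_ok(size_t len_bytes, size_t locked_pages,
			       size_t rlimit_memlock_bytes, bool cap_ipc_lock)
{
	assert(len_bytes <= SIZE_MAX - (SKETCH_PAGE_SIZE - 1));

	size_t lock_limit = rlimit_memlock_bytes >> SKETCH_PAGE_SHIFT;
	/* Round the request up to whole pages; the kernel also page-aligns it. */
	size_t want = (len_bytes + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;

	return want + locked_pages <= lock_limit || cap_ipc_lock;
}
```

With a 64 KiB limit, an 8 KiB request passes, while 65537 bytes rounds up to 17 pages and is refused unless the caller has `CAP_IPC_LOCK`.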
Re: [RFC PATCH] io_uring: add support for IORING_OP_GETDENTS64
On Fri, Jan 29, 2021 at 07:37:03AM +0200, Lennert Buytenhek wrote: > > > > One open question is whether IORING_OP_GETDENTS64 should be more like > > > > pread(2) and allow passing in a starting offset to read from the > > > > directory from. (This would require some more surgery in fs/readdir.c.) > > > > > > Since directories are seekable this ought to work. > > > Modulo horrid issues with 32bit file offsets. > > > > The incremental patch below does this. (It doesn't apply cleanly on > > top of v1 of the IORING_OP_GETDENTS patch as I have other changes in > > my tree -- I'm including it just to illustrate the changes that would > > make this work.) > > > > This change seems to work, and makes IORING_OP_GETDENTS take an > > explicitly specified directory offset (instead of using the file's > > ->f_pos), making it more like pread(2) [...] > > ...but the fact that this patch avoids taking file->f_pos_lock (as this > proposed version of IORING_OP_GETDENTS avoids using file->f_pos) means > that ->iterate_shared() can then be called concurrently on the same > struct file, which breaks the mutual exclusion guarantees provided here. > > If possible, I'd like to keep the ability to explicitly pass in a > directory offset to start reading from into IORING_OP_GETDENTS, so > perhaps we can simply satisfy the mutual exclusion requirement by > taking ->f_pos_lock by hand -- but then I do need to check that it's OK > for ->iterate{,_shared}() to be called successively with discontinuous > offsets without ->llseek() being called in between. > > (If that's not OK, then we can either have IORING_OP_GETDENTS just > always start reading at ->f_pos like before (which might then require > adding a IORING_OP_GETDENTS2 at some point in the future if we'll > ever want to change that), or we could have IORING_OP_GETDENTS itself > call ->llseek() for now whenever necessary.) 
Having IORING_OP_GETDENTS seek to sqe->off if needed seems easy enough to implement, and it simplifies the other code as well, so I'll send out a v2 RFC shortly that does this.
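The seek-if-needed behavior can be approximated from user space with `lseek(2)` and the raw `getdents64` system call. This is only a sketch: the helper name is hypothetical, it is not how io_uring implements the opcode, and, unlike the kernel side discussed above, it does not hold `f_pos_lock` across the seek and the read, so concurrent callers sharing one descriptor could race. Directory offsets are opaque cookies (a previous entry's `d_off`, or 0 for the start), not byte positions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * pread(2)-style directory read: reposition the directory stream to `off`
 * only if it is not already there, then read entries into `buf`.
 * Returns the number of bytes of linux_dirent64 records, 0 at end, -1 on error.
 */
static ssize_t getdents64_at(int dirfd, off_t off, void *buf, size_t len)
{
	off_t cur = lseek(dirfd, 0, SEEK_CUR);

	if (cur == (off_t)-1)
		return -1;
	if (cur != off && lseek(dirfd, off, SEEK_SET) == (off_t)-1)
		return -1;
	return (ssize_t)syscall(SYS_getdents64, dirfd, buf, len);
}
```

Passing the `d_off` of the last entry returned by a previous call resumes the listing after that entry; passing 0 restarts it, regardless of where the descriptor's position was left.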
[PATCH v2] KVM: x86: Allow guests to see MSR_IA32_TSX_CTRL even if tsx=off
Userspace that does not know about KVM_GET_MSR_FEATURE_INDEX_LIST will generally use the default value for MSR_IA32_ARCH_CAPABILITIES. When this happens and the host has tsx=on, it is possible to end up with virtual machines that have HLE and RTM disabled, but TSX_CTRL available. If the fleet is then switched to tsx=off, kvm_get_arch_capabilities() will clear the ARCH_CAP_TSX_CTRL_MSR bit and it will not be possible to use the tsx=off hosts as migration destinations, even though the guests do not have TSX enabled. To allow this migration, allow guests to write to their TSX_CTRL MSR, while keeping the host MSR unchanged for the entire life of the guests. This ensures that TSX remains disabled and also saves MSR reads and writes, and it's okay to do because with tsx=off we know that guests will not have the HLE and RTM features in their CPUID. (If userspace sets bogus CPUID data, we do not expect HLE and RTM to work in guests anyway). Cc: sta...@vger.kernel.org Fixes: cbbaa2727aa3 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES") Signed-off-by: Paolo Bonzini --- arch/x86/kvm/vmx/vmx.c | 17 + arch/x86/kvm/x86.c | 2 +- 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index cc60b1fc3ee7..eb69fef57485 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6860,11 +6860,20 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu) switch (index) { case MSR_IA32_TSX_CTRL: /* -* No need to pass TSX_CTRL_CPUID_CLEAR through, so -* let's avoid changing CPUID bits under the host -* kernel's feet. +* TSX_CTRL_CPUID_CLEAR is handled in the CPUID +* interception. Keep the host value unchanged to avoid +* changing CPUID bits under the host kernel's feet. +* +* hle=0, rtm=0, tsx_ctrl=1 can be found with some +* combinations of new kernel and old userspace. 
If +* those guests run on a tsx=off host, do allow guests +* to use TSX_CTRL, but do not change the value on the +* host so that TSX remains always disabled. */ - vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR; + if (boot_cpu_has(X86_FEATURE_RTM)) + vmx->guest_uret_msrs[j].mask = ~(u64)TSX_CTRL_CPUID_CLEAR; + else + vmx->guest_uret_msrs[j].mask = 0; break; default: vmx->guest_uret_msrs[j].mask = -1ull; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 76bce832cade..15733013b266 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1401,7 +1401,7 @@ static u64 kvm_get_arch_capabilities(void) *This lets the guest use VERW to clear CPU buffers. */ if (!boot_cpu_has(X86_FEATURE_RTM)) - data &= ~(ARCH_CAP_TAA_NO | ARCH_CAP_TSX_CTRL_MSR); + data &= ~ARCH_CAP_TAA_NO; else if (!boot_cpu_has_bug(X86_BUG_TAA)) data |= ARCH_CAP_TAA_NO; -- 2.26.2
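A sketch of the effect of the new mask. It assumes the user-return MSR code merges guest and host values as `(value & mask) | (host & ~mask)`, which mirrors `kvm_set_user_return_msr()` but should be read as an assumption of the sketch. With tsx=off the mask is 0, so whatever the guest writes, the hardware keeps the host value and TSX stays disabled. The function names are hypothetical; the two bit definitions match the kernel's `TSX_CTRL_*` values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TSX_CTRL_RTM_DISABLE (UINT64_C(1) << 0)
#define TSX_CTRL_CPUID_CLEAR (UINT64_C(1) << 1)

/* Mask choice from the patch: with RTM on the host, let the guest control
 * everything except CPUID_CLEAR; with tsx=off, let it control nothing. */
static uint64_t tsx_ctrl_guest_mask(bool host_has_rtm)
{
	return host_has_rtm ? ~TSX_CTRL_CPUID_CLEAR : 0;
}

/* Bits set in `mask` come from the guest; all other bits keep the host value. */
static uint64_t uret_msr_hw_value(uint64_t guest, uint64_t host, uint64_t mask)
{
	return (guest & mask) | (host & ~mask);
}
```

For example, on a tsx=off host whose TSX_CTRL has both bits set, a guest writing 0 still leaves both bits set in hardware; on an RTM-capable host with TSX_CTRL at 0, a guest writing both bits only gets `TSX_CTRL_RTM_DISABLE` applied.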
[PATCH] gpiolib: free device name on error path to fix kmemleak
From: Quanyang Wang In gpiochip_add_data_with_key, we should check the return value of dev_set_name to ensure that device name is allocated successfully and then add a label on the error path to free device name to fix kmemleak as below: unreferenced object 0xc2d6fc40 (size 64): comm "kworker/0:1", pid 16, jiffies 4294937425 (age 65.120s) hex dump (first 32 bytes): 67 70 69 6f 63 68 69 70 30 00 1a c0 54 63 1a c0 gpiochip0...Tc.. 0c ed 84 c0 48 ed 84 c0 3c ee 84 c0 10 00 00 00 H...<... backtrace: [<962810f7>] kobject_set_name_vargs+0x2c/0xa0 [] dev_set_name+0x2c/0x5c [<94abbca9>] gpiochip_add_data_with_key+0xfc/0xce8 [<5c4193e0>] omap_gpio_probe+0x33c/0x68c [<3402f137>] platform_probe+0x58/0xb8 [<7421e210>] really_probe+0xec/0x3b4 [<000f8ada>] driver_probe_device+0x58/0xb4 [<67e0f7f7>] bus_for_each_drv+0x80/0xd0 [<4de545dc>] __device_attach+0xe8/0x15c [<2e4431e7>] bus_probe_device+0x84/0x8c [] device_add+0x384/0x7c0 [<5aff2995>] of_platform_device_create_pdata+0x8c/0xb8 [<061c3483>] of_platform_bus_create+0x198/0x230 [<5ee6d42a>] of_platform_populate+0x60/0xb8 [<2647300f>] sysc_probe+0xd18/0x135c [<3402f137>] platform_probe+0x58/0xb8 Signed-off-by: Quanyang Wang --- drivers/gpio/gpiolib.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 7e1ad4d40e0a..091e00f2e0a9 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -603,7 +603,11 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data, ret = gdev->id; goto err_free_gdev; } - dev_set_name(&gdev->dev, GPIOCHIP_NAME "%d", gdev->id); + + ret = dev_set_name(&gdev->dev, GPIOCHIP_NAME "%d", gdev->id); + if (ret) + goto err_free_ida; + device_initialize(&gdev->dev); dev_set_drvdata(&gdev->dev, gdev); if (gc->parent && gc->parent->driver) @@ -617,7 +621,7 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data, gdev->descs = kcalloc(gc->ngpio, sizeof(gdev->descs[0]), GFP_KERNEL); if (!gdev->descs) { ret = -ENOMEM; - 
goto err_free_ida; + goto err_free_dev_name; } if (gc->ngpio == 0) { @@ -768,6 +772,8 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data, kfree_const(gdev->label); err_free_descs: kfree(gdev->descs); +err_free_dev_name: + kfree(dev_name(&gdev->dev)); err_free_ida: ida_free(&gpio_ida, gdev->id); err_free_gdev: -- 2.25.1
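The fix follows the usual kernel pattern of unwinding with goto labels in reverse order of acquisition. A userspace model with hypothetical names, where a counter of live allocations plays the role of kmemleak, shows why the name must be freed on every failure after `dev_set_name()` succeeds, and why its label sits just above `err_free_ida`. In the real code the name is released with `kfree(dev_name(&gdev->dev))`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct gpio_device, reduced to the resources the
 * error path in the patch has to release. */
struct gdev_sketch {
	int id;
	char *name;
	int *descs;
};

static int live_allocs; /* outstanding allocations: plays kmemleak's role */
static int next_id;     /* stands in for the gpio_ida allocator */

/* malloc() wrapper that can be forced to fail and counts live blocks. */
static void *tracked_malloc(size_t n, int fail)
{
	void *p = fail ? NULL : malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* fail_step: 1 = name allocation fails, 2 = descs allocation fails. */
static struct gdev_sketch *gdev_create(int ngpio, int fail_step)
{
	struct gdev_sketch *g = tracked_malloc(sizeof(*g), 0);

	if (!g)
		return NULL;
	g->id = next_id++;                            /* like ida_alloc() */

	g->name = tracked_malloc(16, fail_step == 1); /* like dev_set_name() */
	if (!g->name)
		goto err_free_ida;
	snprintf(g->name, 16, "gpiochip%d", g->id);

	g->descs = tracked_malloc(sizeof(int) * (size_t)ngpio, fail_step == 2);
	if (!g->descs)
		goto err_free_dev_name;
	return g;

err_free_dev_name:
	tracked_free(g->name);  /* the step the patch adds */
err_free_ida:
	tracked_free(g);        /* real code: ida_free(), then kfree(gdev) */
	return NULL;
}

static void gdev_destroy(struct gdev_sketch *g)
{
	tracked_free(g->descs);
	tracked_free(g->name);
	tracked_free(g);
}
```

Without the `err_free_dev_name` step, the `fail_step == 2` case would leave one allocation outstanding, which is the leak kmemleak reported.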
[PATCH 4.14 10/50] xen: Fix event channel callback via INTX/GSI
From: David Woodhouse [ Upstream commit 3499ba8198cad47b731792e5e56b9ec2a78a83a2 ] For a while, event channel notification via the PCI platform device has been broken, because we attempt to communicate with xenstore before we even have notifications working, with the xs_reset_watches() call in xs_init(). We tend to get away with this on Xen versions below 4.0 because we avoid calling xs_reset_watches() anyway, because xenstore might not cope with reading a non-existent key. And newer Xen *does* have the vector callback support, so we rarely fall back to INTX/GSI delivery. To fix it, clean up a bit of the mess of xs_init() and xenbus_probe() startup. Call xs_init() directly from xenbus_init() only in the !XS_HVM case, deferring it to be called from xenbus_probe() in the XS_HVM case instead. Then fix up the invocation of xenbus_probe() to happen either from its device_initcall if the callback is available early enough, or when the callback is finally set up. This means that the hack of calling xenbus_probe() from a workqueue after the first interrupt, or directly from the PCI platform device setup, is no longer needed. 
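The new startup ordering can be pictured as "probe when both the initcall has run and the event channel callback is set up, whichever happens last". A minimal sketch of that idea with hypothetical names; the explicit once-guard is an assumption of the sketch, not necessarily how the patch tracks it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags standing in for the two conditions in the commit text. */
static bool callback_ready; /* event channel upcall (vector or INTX/GSI) is set up */
static bool initcall_done;  /* the device_initcall has run */
static int probe_count;

static void sketch_xenbus_probe(void)
{
	probe_count++; /* real code: xs_init(), then probe frontends/backends */
}

/* Whichever event happens last triggers the probe, exactly once. */
static void maybe_probe(void)
{
	static bool probed;

	if (callback_ready && initcall_done && !probed) {
		probed = true;
		sketch_xenbus_probe();
	}
}

static void sketch_probe_initcall(void)
{
	initcall_done = true;
	maybe_probe();
}

static void sketch_callback_setup_done(void)
{
	callback_ready = true;
	maybe_probe();
}
```

If the callback is available early, the initcall probes; otherwise the probe waits until the callback is set up, so xenstore is never used before notifications work.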
Signed-off-by: David Woodhouse Reviewed-by: Boris Ostrovsky Link: https://lore.kernel.org/r/20210113132606.422794-2-dw...@infradead.org Signed-off-by: Juergen Gross Signed-off-by: Sasha Levin --- arch/arm/xen/enlighten.c | 2 +- drivers/xen/events/events_base.c | 10 drivers/xen/platform-pci.c| 1 - drivers/xen/xenbus/xenbus.h | 1 + drivers/xen/xenbus/xenbus_comms.c | 8 --- drivers/xen/xenbus/xenbus_probe.c | 81 +-- include/xen/xenbus.h | 2 +- 7 files changed, 70 insertions(+), 35 deletions(-) diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c index ba7f4c8f5c3e4..e8e637c4f354d 100644 --- a/arch/arm/xen/enlighten.c +++ b/arch/arm/xen/enlighten.c @@ -393,7 +393,7 @@ static int __init xen_guest_init(void) } gnttab_init(); if (!xen_initial_domain()) - xenbus_probe(NULL); + xenbus_probe(); /* * Making sure board specific code will not set up ops for diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c index aca8456752797..8c08c7d46d3d0 100644 --- a/drivers/xen/events/events_base.c +++ b/drivers/xen/events/events_base.c @@ -1987,16 +1987,6 @@ static struct irq_chip xen_percpu_chip __read_mostly = { .irq_ack= ack_dynirq, }; -int xen_set_callback_via(uint64_t via) -{ - struct xen_hvm_param a; - a.domid = DOMID_SELF; - a.index = HVM_PARAM_CALLBACK_IRQ; - a.value = via; - return HYPERVISOR_hvm_op(HVMOP_set_param, &a); -} -EXPORT_SYMBOL_GPL(xen_set_callback_via); - #ifdef CONFIG_XEN_PVHVM /* Vector callbacks are better than PCI interrupts to receive event * channel notifications because we can receive vector callbacks on any diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c index 5d7dcad0b0a0d..4cec8146609ad 100644 --- a/drivers/xen/platform-pci.c +++ b/drivers/xen/platform-pci.c @@ -162,7 +162,6 @@ static int platform_pci_probe(struct pci_dev *pdev, ret = gnttab_init(); if (ret) goto grant_out; - xenbus_probe(NULL); return 0; grant_out: gnttab_free_auto_xlat_frames(); diff --git a/drivers/xen/xenbus/xenbus.h 
b/drivers/xen/xenbus/xenbus.h index 139539b0ab20d..e6a8d02d35254 100644 --- a/drivers/xen/xenbus/xenbus.h +++ b/drivers/xen/xenbus/xenbus.h @@ -114,6 +114,7 @@ int xenbus_probe_node(struct xen_bus_type *bus, const char *type, const char *nodename); int xenbus_probe_devices(struct xen_bus_type *bus); +void xenbus_probe(void); void xenbus_dev_changed(const char *node, struct xen_bus_type *bus); diff --git a/drivers/xen/xenbus/xenbus_comms.c b/drivers/xen/xenbus/xenbus_comms.c index eb5151fc8efab..e5fda0256feb3 100644 --- a/drivers/xen/xenbus/xenbus_comms.c +++ b/drivers/xen/xenbus/xenbus_comms.c @@ -57,16 +57,8 @@ DEFINE_MUTEX(xs_response_mutex); static int xenbus_irq; static struct task_struct *xenbus_task; -static DECLARE_WORK(probe_work, xenbus_probe); - - static irqreturn_t wake_waiting(int irq, void *unused) { - if (unlikely(xenstored_ready == 0)) { - xenstored_ready = 1; - schedule_work(&probe_work); - } - wake_up(&xb_waitq); return IRQ_HANDLED; } diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c index 217bcc092a968..fe24e8dcb2b8e 100644 --- a/drivers/xen/xenbus/xenbus_probe.c +++ b/drivers/xen/xenbus/xenbus_probe.c @@ -674,29 +674,76 @@ void unregister_xenstore_notifier(struct notifier_block *nb) } EXPORT_SYMBOL_GPL(unregister_xenstore_notifier); -void xenbus_probe(struct work_struct *unused) +void xenbus_probe(void) { xenstored_ready = 1; + /* +* In the HVM case, xenbus_init() deferre
[PATCH v2 1/2] EDAC/ghes: Add EDAC device for reporting the CPU cache errors
CPU L2 cache corrected errors are occasionally detected on a few of our ARM64 hardware boards. Although such errors are rare, the possibility of them occurring frequently cannot be ruled out. Detecting failures early, by monitoring for frequently recurring corrected cache errors and taking preventive action, could prevent more serious hardware faults.

On Intel architectures, cache corrected errors are reported and the affected cores are offlined in an architecture-specific way. http://www.mcelog.org/cache.html

However, for firmware-first error reporting, specifically on ARM64 architectures, there is no provision for reporting the cache corrected error count to user space and taking preventive action such as offlining the affected cores. For this purpose, it was suggested to create a CPU EDAC device for the CPU caches to report the cache error count for firmware-first error reporting.

The EDAC device blocks for the CPU caches would be created based on the cache information obtained from cpu_cacheinfo. A user-space application could monitor the recorded corrected error count for early hardware failure detection and could take preventive action, such as offlining the corresponding CPU core(s).

Add an EDAC device and device blocks for the CPU caches based on the cache information from cpu_cacheinfo. The cache's corrected error count would be stored in /sys/devices/system/edac/cpu/cpu*/cache*/ce_count.

Issues and possible solutions:

1. Cache info is not available for offline CPUs.
   The EDAC device interface requires creating the EDAC device and the device blocks together, and it needs the number of caches per CPU (the number of device blocks) at creation time. However, this info is not available for offline CPUs.
   Tested solution: Find the maximum number of caches among the online CPUs, create the EDAC device for the CPU caches, and get and populate the cache info for an offline CPU later, when an error is reported on that CPU for the first time.

2. Reporting the error count for shared caches.
   There are a few possible solutions.
   Tested solution: The kernel reports a new error count for a shared cache through the EDAC device block of the CPU on which the error is reported. A user-space application then sums the total error count from the EDAC device blocks of all the CPUs in the shared CPU list of that shared cache.

For firmware-first error reporting, add an interface in ghes_edac to allow reporting a CPU corrected error count.

Suggested-by: James Morse
Signed-off-by: Shiju Jose
---
 Documentation/ABI/testing/sysfs-devices-edac |  15 ++
 drivers/acpi/apei/ghes.c                     |   8 +-
 drivers/edac/Kconfig                         |  12 ++
 drivers/edac/ghes_edac.c                     | 186 +++
 include/acpi/ghes.h                          |  27 +++
 5 files changed, 247 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-edac b/Documentation/ABI/testing/sysfs-devices-edac
index 256a9e990c0b..56a18b0af419 100644
--- a/Documentation/ABI/testing/sysfs-devices-edac
+++ b/Documentation/ABI/testing/sysfs-devices-edac
@@ -155,3 +155,18 @@ Description:	This attribute file displays the total count of uncorrectable
 		errors that have occurred on this DIMM. If panic_on_ue is set, this
 		counter will not have a chance to increment, since EDAC will panic the
 		system
+
+What:		/sys/devices/system/edac/cpu/cpu*/cache*/ce_count
+Date:		December 2020
+Contact:	linux-e...@vger.kernel.org
+Description:	This attribute file displays the total count of correctable
+		errors that have occurred on this CPU cache. This count is very important
+		to examine. CEs provide early indications that a cache is beginning
+		to fail. This count field should be monitored for non-zero values
+		and report such information to the system administrator.
+
+What:		/sys/devices/system/edac/cpu/cpu*/cache*/ue_count
+Date:		December 2020
+Contact:	linux-e...@vger.kernel.org
+Description:	This attribute file displays the total count of uncorrectable
+		errors that have occurred on this CPU cache.
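The shared-cache scheme above leaves the summation to user space. A sketch of that step, assuming the per-CPU `ce_count` values for one cache level have already been read into an array indexed by CPU number, and that the sharing CPUs come from the cache's `shared_cpu_list` file in the usual cpulist format (for example "0-3,8"). The function name is hypothetical and file reading is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Sum per-CPU ce_count values over a cpulist such as "0-3,8,10-11".
 * A trailing newline, as read from sysfs, is accepted.
 * Returns -1 on a malformed list or a CPU number >= ncpus.
 */
static long sum_shared_ce(const char *cpulist, const unsigned long *ce,
			  size_t ncpus)
{
	long total = 0;
	const char *p = cpulist;

	while (*p && *p != '\n') {
		char *end;
		unsigned long lo, hi;

		if (!isdigit((unsigned char)*p))
			return -1;
		lo = strtoul(p, &end, 10);
		hi = lo;
		if (*end == '-') {
			if (!isdigit((unsigned char)end[1]))
				return -1;
			hi = strtoul(end + 1, &end, 10);
		}
		if (hi < lo || hi >= ncpus)
			return -1;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			total += (long)ce[cpu];
		p = end;
		if (*p == ',')
			p++;
		else if (*p && *p != '\n')
			return -1;
	}
	return total;
}
```

Because each CPU's block only counts the errors reported on that CPU, summing over the sharing CPUs gives the total for the shared cache without double counting.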
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index fce7ade2aba9..139540f2c8f4 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -1452,4 +1452,10 @@ static int __init ghes_init(void) err: return rc; } -device_initcall(ghes_init); + +/* + * device_initcall_sync() is added instead of the device_initcall() + * because the CPU cacheinfo should be populated and is required for + * adding the CPU cache edac device in the ghes_edac_register(). + */ +device_initcall_sync(ghes_init); diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig index 81c42664f21b..39fb53aa9cd9 100644 --- a/drivers/edac/Kconfig +++ b/
Re: [PATCH] x86: Disable CET instrumentation in the kernel
On Thu, Jan 28, 2021 at 03:52:19PM -0600, Josh Poimboeuf wrote: > > With retpolines disabled, some configurations of GCC will add Intel CET > instrumentation to the kernel by default. That breaks certain tracing > scenarios by adding a superfluous ENDBR64 instruction before the fentry > call, for functions which can be called indirectly. > > CET instrumentation isn't currently necessary in the kernel, as CET is > only supported in user space. Disable it unconditionally. > > Reported-by: Nikolay Borisov > Signed-off-by: Josh Poimboeuf > --- > Makefile | 6 -- > arch/x86/Makefile | 3 +++ > 2 files changed, 3 insertions(+), 6 deletions(-) > > diff --git a/Makefile b/Makefile > index e0af7a4a5598..51c2bf34142d 100644 > --- a/Makefile > +++ b/Makefile > @@ -948,12 +948,6 @@ KBUILD_CFLAGS += $(call > cc-option,-Werror=designated-init) > # change __FILE__ to the relative path from the srctree > KBUILD_CPPFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=) > > -# ensure -fcf-protection is disabled when using retpoline as it is > -# incompatible with -mindirect-branch=thunk-extern > -ifdef CONFIG_RETPOLINE > -KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none) > -endif > - Why is that even here, in the main Makefile if this cf-protection thing is x86-specific? Are we going to move it back there when some other arch gets CET or CET-like support? -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
[PATCH 4.9 00/30] 4.9.254-rc1 review
This is the start of the stable review cycle for the 4.9.254 release. There are 30 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Sun, 31 Jan 2021 10:59:01 +0000. Anything received after that time might be too late. The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.254-rc1.gz or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below. thanks, greg k-h - Pseudo-Shortlog of commits: Greg Kroah-Hartman Linux 4.9.254-rc1 Arvind Sankar x86/boot/compressed: Disable relocation relaxation Gaurav Kohli tracing: Fix race in trace_open and buffer resize call Wang Hai Revert "mm/slub: fix a memory leak in sysfs_slab_add()" Dan Carpenter net: dsa: b53: fix an off by one in checking "vlan->vid" Eric Dumazet net_sched: avoid shift-out-of-bounds in tcindex_set_parms() Matteo Croce ipv6: create multicast route with RTPROT_KERNEL Alexander Lobakin skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too Geert Uytterhoeven sh_eth: Fix power down vs.
is_opened flag ordering Necip Fazil Yildiran sh: dma: fix kconfig dependency for G2_DMA Guillaume Nault netfilter: rpfilter: mask ecn bits before fib lookup Will Deacon compiler.h: Raise minimum version of GCC to 5.1 for arm64 Daniel Borkmann bpf: Fix buggy rsh min/max bounds tracking JC Kuo xhci: tegra: Delay for disabling LFPS detector Mathias Nyman xhci: make sure TRB is fully written before giving it to the controller Patrik Jakobsson usb: bdc: Make bdc pci driver depend on BROKEN Thinh Nguyen usb: udc: core: Use lock when write to soft_connect Longfang Liu USB: ehci: fix an interrupt calltrace error Eugene Korenevsky ehci: fix EHCI host controller initialization sequence Wang Hui stm class: Fix module init return on allocation failure Lars-Peter Clausen iio: ad5504: Fix setting power-down state Vincent Mailhol can: dev: can_restart: fix use after free bug Wolfram Sang i2c: octeon: check correct size of maximum RECV_LEN packet Ben Skeggs drm/nouveau/i2c/gm200: increase width of aux semaphore owner fields Ben Skeggs drm/nouveau/bios: fix issue shadowing expansion ROMs Can Guo scsi: ufs: Correct the LUN used in eh_device_reset_handler() callback Cezary Rojewski ASoC: Intel: haswell: Add missing pm_ops Hannes Reinecke dm: avoid filesystem lookup in dm_get_dev_t() Hans de Goede ACPI: scan: Make acpi_bus_get_device() clear return pointer on error Takashi Iwai ALSA: hda/via: Add minimum mute flag Takashi Iwai ALSA: seq: oss: Fix missing error check in snd_seq_oss_synth_make_info() - Diffstat: Makefile | 4 ++-- arch/sh/drivers/dma/Kconfig| 3 +-- arch/x86/boot/compressed/Makefile | 2 ++ drivers/acpi/scan.c| 2 ++ drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c | 2 +- drivers/gpu/drm/nouveau/nvkm/subdev/i2c/auxgm200.c | 8 drivers/hwtracing/stm/heartbeat.c | 6 -- drivers/i2c/busses/i2c-octeon-core.c | 2 +- drivers/iio/dac/ad5504.c | 4 ++-- drivers/md/dm-table.c | 15 --- drivers/net/can/dev.c | 4 ++-- drivers/net/dsa/b53/b53_common.c | 2 +- 
drivers/net/ethernet/renesas/sh_eth.c | 4 ++-- drivers/scsi/ufs/ufshcd.c | 11 --- drivers/usb/gadget/udc/bdc/Kconfig | 2 +- drivers/usb/gadget/udc/core.c | 13 ++--- drivers/usb/host/ehci-hcd.c| 12 drivers/usb/host/ehci-hub.c| 3 +++ drivers/usb/host/xhci-ring.c | 2 ++ drivers/usb/host/xhci-tegra.c | 7 +++ include/linux/compiler-gcc.h | 6 ++ kernel/bpf/verifier.c | 7 +++ kernel/trace/ring_buffer.c | 4 mm/slub.c | 4 +--- net/core/skbuff.c | 6 +- net/ipv4/netfilter/ipt_rpfilter.c | 2 +- net/ipv6/addrconf.c| 1 + net/sched/cls_tcindex.c| 8 ++-- sound/core/seq/oss/seq_oss_synth.c | 3 ++- sound/pci/hda/patch_via.c | 1 + sound/soc/intel/boards/haswell.c | 1 + 31 files changed, 106 insertions(+), 45 deletions(
Re: [PATCH v6] close_range.2: new page documenting close_range(2)
On Thu, Jan 28, 2021 at 09:50:23PM +0100, Michael Kerrisk (man-pages) wrote:
> Hello Stephen, (and Christian, please!)

Ah, I think this was mostly done, which is why I kept quiet.

Christian
[PATCH RESEND v5 3/8] dt-bindings: mfd: Add compatible for the MediaTek MT6359 PMIC
This adds compatible for the MediaTek MT6359 PMIC. Signed-off-by: Hsin-Hsiung Wang --- changes since v4: - remove unused compatible name. --- Documentation/devicetree/bindings/mfd/mt6397.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/mfd/mt6397.txt b/Documentation/devicetree/bindings/mfd/mt6397.txt index 2661775a3825..99a84b69a29f 100644 --- a/Documentation/devicetree/bindings/mfd/mt6397.txt +++ b/Documentation/devicetree/bindings/mfd/mt6397.txt @@ -21,6 +21,7 @@ Required properties: compatible: "mediatek,mt6323" for PMIC MT6323 "mediatek,mt6358" for PMIC MT6358 + "mediatek,mt6359" for PMIC MT6359 "mediatek,mt6397" for PMIC MT6397 Optional subnodes: -- 2.18.0
Re: [PATCH v2 04/16] rpmsg: ctrl: implement the ioctl function to create device
On 1/29/21 1:13 AM, Mathieu Poirier wrote: > [...] > >>> It seems to me that the main point to step forward is to clarify the global >>> design and features of the rpmsg-ctrl. >>> Depending on the decision taken, this series could be trashed and rewritten >>> from >>> a blank page... To not lose too much time on the series, don't hesitate to >>> limit >>> the review to the minimum. >>> >> >> I doubt you will ever get clear guidelines on the whole solution. I will get >> back to you once I am done with the SMD driver, which should be in the >> latter part of next week. >> > > After looking at the rpmsg_chrdev driver, its current customers (i.e. the Qcom > drivers), the rpmsg name service and considering the long term goals of this > patchset I have the following guidelines: > > 1) I thought long and hard about how to split the current rpmsg_chrdev driver > between the control plane and the raw device plane and the end solution looks > much simpler than I expected. Exporting function rpmsg_eptdev_create() after > moving it to another file (along with other dependencies) should be all we > need. > Calling rpmsg_eptdev_create() from rpmsg_ctrldev_ioctl() will automatically > load > the new driver, the same way calling rpmsg_ns_register_device() from > rpmsg_probe() took care of loading the rpmsg_ns driver. > > 2) While keeping the control plane functionality related to > RPMSG_CREATE_EPT_IOCTL intact, introduce a new RPMSG_CREATE_DEV_IOCTL that > will > allow for the instantiation of rpmsg_devices, exactly the same way a name > service > announcement from a remote processor does. I envision that code path to > eventually call rpmsg_create_channel(). > > 3) Leave the rpmsg_channel_info structure intact and use the > rpmsg_channel_info::name to bind to a rpmsg_driver, exactly how it is > currently > done for name service driver selection.
That will allow us to re-use the > current rpmsg_bus infrastructure, i.e. rpmsg_bus::match(), without having to > deal > with yet another bus type. Proceeding this way gives us the opportunity to > keep > the current channel name convention for other rpmsg_chrdev users untouched. > > 4) In a prior conversation you indicated the intention of instantiating the > rpmsg_chrdev from the name service interface. I agree with doing so but > conjugating that with the RPMSG_CHAR kernel define may be tricky. I will wait > to see what you come up with. > > I hope this helps. Thank you for these guidelines! It needs a bit of time to look at the details (especially point 1), but your suggestion seems to me to be a good compromise. I hope to come back soon with a new revision based on this point. Regards, Arnaud > > Thanks, > Mathieu > > > >>> Thanks, >>> Arnaud >>> Thanks, Mathieu > + return NULL; > +} > + > static long rpmsg_ctrl_dev_ioctl(struct file *fp, unsigned int cmd, >unsigned long arg) > { > struct rpmsg_ctrl_dev *ctrldev = fp->private_data; > - > - dev_info(&ctrldev->dev, "Control not yet implemented\n"); > + void __user *argp = (void __user *)arg; > + struct rpmsg_channel_info chinfo; > + struct rpmsg_endpoint_info eptinfo; > + struct rpmsg_device *newch; > + > + if (cmd != RPMSG_CREATE_EPT_IOCTL) > + return -EINVAL; > + > + if (copy_from_user(&eptinfo, argp, sizeof(eptinfo))) > + return -EFAULT; > + > + /* > + * In a first step only the rpmsg_raw service is supported.
> + * The override is forced to RPMSG_RAW_SERVICE > + */ > + chinfo.driver_override = rpmsg_ctrl_get_drv_name(RPMSG_RAW_SERVICE); > + if (!chinfo.driver_override) > + return -ENODEV; > + > + memcpy(chinfo.name, eptinfo.name, RPMSG_NAME_SIZE); > + chinfo.name[RPMSG_NAME_SIZE - 1] = '\0'; > + chinfo.src = eptinfo.src; > + chinfo.dst = eptinfo.dst; > + > + newch = rpmsg_create_channel(ctrldev->rpdev, &chinfo); > + if (!newch) { > + dev_err(&ctrldev->dev, "rpmsg_create_channel failed\n"); > + return -ENXIO; > + } > > return 0; > }; > -- > 2.17.1 >
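The name handling in the quoted ioctl matters because the endpoint name comes from user space via `copy_from_user()` and may not be NUL-terminated. A userspace sketch of that copy, using simplified mirror structures with hypothetical names; `RPMSG_NAME_SIZE` is 32 in the kernel headers, and the `driver_override` field is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RPMSG_NAME_SIZE 32 /* same value as the kernel's definition */

/* Simplified mirrors of rpmsg_endpoint_info and rpmsg_channel_info. */
struct ept_info_sketch {
	char name[RPMSG_NAME_SIZE];
	uint32_t src;
	uint32_t dst;
};

struct chinfo_sketch {
	char name[RPMSG_NAME_SIZE];
	uint32_t src;
	uint32_t dst;
};

/* Copy the fixed-size name and force a terminator, as the patch does, so the
 * name can safely be used for driver matching and printing. */
static void ept_to_chinfo(struct chinfo_sketch *ch,
			  const struct ept_info_sketch *ept)
{
	memcpy(ch->name, ept->name, RPMSG_NAME_SIZE);
	ch->name[RPMSG_NAME_SIZE - 1] = '\0';
	ch->src = ept->src;
	ch->dst = ept->dst;
}
```

A 32-byte name with no terminator ends up as a 31-character string; a shorter, properly terminated name is copied unchanged.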
[v6,0/3] mt8183: Add Mediatek thermal driver and dtsi
This patchset adds support for the MT8183 chip to mtk_thermal.c. It adds thermal zones for all the thermal sensors in the SoC, read through another get_temp ops. These zones do not need thermal throttling. It also binds coolers to the cpu_thermal zone nodes.

Changes in v6:
- Rebase to kernel-5.11-rc1.
- [1/3]
  - add interrupts property.
- [2/3]
  - add the Tested-by in the commit message.
- [3/3]
  - use mt->conf->msr[id] instead of conf->msr[id] in _get_sensor_temp and mtk_thermal_bank_temperature.
  - remove the redundant space in _get_sensor_temp and mtk_read_sensor_temp.
  - change kmalloc to devm_kmalloc in mtk_thermal_probe.

Changes in v5:
- Rebase to kernel-5.9-rc1.
- Revise the title of the cover letter.
- Drop "[v4,7/7] thermal: mediatek: use spinlock to protect PTPCORESEL"
- [2/2]
  - Add a version check to raw_to_mcelsius.

Changes in v4:
- Rebase to kernel-5.6-rc1.
- [1/7]
  - Squash the thermal zone settings in the dtsi from [v3,5/8] arm64: dts: mt8183: Increase polling frequency for CPU thermal zone.
  - Remove the interrupts and mediatek,hw-reset-temp properties.
- [2/7]
  - Correct the commit message.
- [4/7]
  - Change the target temperature to 80C and change the commit message.
- [6/7]
  - Adjust newline alignment.
  - Fix the check on the return value of registering the thermal zone.

Changes in v3:
- Rebase to kernel-5.5-rc1.
- [1/8]
  - Update the sustainable power of cpu, tzts1~5 and tztsABB.
- [7/8]
  - Do not fail when a non-cpu_thermal sensor is not found in the thermal-zones node of the dts, which is normal for mt8173; print a warning instead.
  - Return -EAGAIN instead of -EACCES on the first sensor read, which often returns bogus values. This avoids the following warning on boot: thermal thermal_zone6: failed to read out thermal zone (-13)

Changes in v2:
- [1/8]
  - Add sustainable-power, trips and cooling-maps to tzts1~tztsABB.
- [4/8]
  - Add the min opp of cpu throttle.
Matthias Kaehlcke (1):
  arm64: dts: mt8183: Configure CPU cooling

Michael Kao (2):
  thermal: mediatek: add another get_temp ops for thermal sensors
  arm64: dts: mt8183: add thermal zone node

 arch/arm64/boot/dts/mediatek/mt8183.dtsi | 140 +++
 drivers/thermal/mtk_thermal.c            | 100
 2 files changed, 215 insertions(+), 25 deletions(-)

--
2.18.0
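The v3 note about returning -EAGAIN on the first read can be sketched as follows. As far as I know, the thermal core only prints "failed to read out thermal zone" for errors other than -EAGAIN, which is why switching the error code silences the boot warning. The struct and function names are hypothetical, and the sketch simply treats the first read as the bogus one; how the real driver detects a bogus value is not described in this cover letter.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-sensor state; not the structures used by mtk_thermal.c. */
struct demo_sensor {
	bool first_read_done;  /* set once the first (possibly bogus) read happened */
	int raw_millicelsius;  /* value the hardware would report */
};

/*
 * get_temp-style callback: report -EAGAIN for the first read, which the
 * cover letter says often returns a bogus value, and the real value after.
 */
static int demo_get_temp(struct demo_sensor *s, int *temp)
{
	if (!s->first_read_done) {
		s->first_read_done = true;
		return -EAGAIN; /* "try again later" rather than a hard error */
	}
	*temp = s->raw_millicelsius;
	return 0;
}
```

With -EACCES (errno 13) the same first read would surface as the "(-13)" warning quoted above.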
Re: [PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller
On Fri, Jan 29, 2021 at 11:26:38AM +0100, Arnd Bergmann wrote: > Another clarification, as there are actually two independent > points here: > > * if you can completely remove the readl() above and just write a > hardcoded value into the register, or perhaps read the original > value once at boot time, that is probably a win because it > avoids one of the barriers in the beginning. The datasheet should > tell you if there are any bits in the register that have to be > preserved > > * Regarding the _relaxed() accessors, it's a lot harder to know > whether that is safe, as you first have to show, in particular in case > any of the accesses stop being guarded by the spinlock in that > case, and whether there may be a case where you have to > serialize the memory access against accesses that are still in the > store queue or prefetched. > > Whether this matters at all depends mostly on the type of devices > you are driving on your SoC. If you have any high-speed network > interfaces that are unable to do cache coherent DMA, any extra > instruction here may impact the number of packets you can transfer, > but if all your high-speed devices are connected to a coherent > interconnect, I would just go with the obvious approach and use > the safe MMIO accessors everywhere. For L2 cache code, I would say the opposite, actually, because it is all too easy to get into a deadlock otherwise. If you implement the sync callback, that will be called from every non-relaxed accessor, which means if you need to take some kind of lock in the sync callback and elsewhere in the L2 cache code, you will definitely deadlock. It is safer to put explicit barriers where it is necessary. Also remember that the barrier in readl() etc is _after_ the read, not before, and the barrier in writel() is _before_ the write, not after. The point is to ensure that DMA memory accesses are properly ordered with the IO-accessing instructions. 
So, using readl_relaxed() with a read-modify-write is entirely sensible
provided you do not access DMA memory in between.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
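To make the barrier placement concrete, here is a userspace model. The `model_*` names are hypothetical and are not the kernel's accessors, and a full `__atomic_thread_fence()` stands in for the architecture's I/O barriers. It only shows where the barrier sits relative to the access, and why a relaxed read-modify-write is fine when nothing touches DMA memory between the read and the write.

```c
#include <stdint.h>

/* Relaxed accessors: just the MMIO access, no ordering against normal memory. */
static inline uint32_t model_readl_relaxed(const volatile uint32_t *reg)
{
	return *reg;
}

static inline void model_writel_relaxed(uint32_t val, volatile uint32_t *reg)
{
	*reg = val;
}

/* readl(): barrier AFTER the read, so later reads of DMA buffers cannot be
 * satisfied before the register read that said the data was ready. */
static inline uint32_t model_readl(const volatile uint32_t *reg)
{
	uint32_t val = model_readl_relaxed(reg);
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	return val;
}

/* writel(): barrier BEFORE the write, so earlier writes to DMA buffers are
 * visible before the register write that tells the device to use them. */
static inline void model_writel(uint32_t val, volatile uint32_t *reg)
{
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	model_writel_relaxed(val, reg);
}

/* Read-modify-write with relaxed accessors: no barriers are needed as long
 * as no DMA memory is accessed between the read and the write. */
static inline void model_rmw_relaxed(volatile uint32_t *reg,
				     uint32_t clear, uint32_t set)
{
	uint32_t val = model_readl_relaxed(reg);

	val = (val & ~clear) | set;
	model_writel_relaxed(val, reg);
}
```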
Re: [PATCH v4 4/8] drm/mediatek: enable OVL_LAYER_SMI_ID_EN for multi-layer usecase
Hi, Hsin-Yi: On Fri, 2021-01-29 at 15:34 +0800, Hsin-Yi Wang wrote: > From: Yongqiang Niu > > enable OVL_LAYER_SMI_ID_EN for multi-layer usecase, without this patch, > ovl will hang up when more than 1 layer enabled. Reviewed-by: CK Hu > > Signed-off-by: Yongqiang Niu > Signed-off-by: Hsin-Yi Wang > --- > drivers/gpu/drm/mediatek/mtk_disp_ovl.c | 17 + > 1 file changed, 17 insertions(+) > > diff --git a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c > b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c > index da7e38a28759b..961f87f8d4d15 100644 > --- a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c > +++ b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c > @@ -24,6 +24,7 @@ > #define DISP_REG_OVL_RST 0x0014 > #define DISP_REG_OVL_ROI_SIZE0x0020 > #define DISP_REG_OVL_DATAPATH_CON0x0024 > +#define OVL_LAYER_SMI_ID_EN BIT(0) > #define OVL_BGCLR_SEL_IN BIT(2) > #define DISP_REG_OVL_ROI_BGCLR 0x0028 > #define DISP_REG_OVL_SRC_CON 0x002c > @@ -62,6 +63,7 @@ struct mtk_disp_ovl_data { > unsigned int gmc_bits; > unsigned int layer_nr; > bool fmt_rgb565_is_0; > + bool smi_id_en; > }; > > /** > @@ -134,6 +136,13 @@ void mtk_ovl_start(struct device *dev) > { > struct mtk_disp_ovl *ovl = dev_get_drvdata(dev); > > + if (ovl->data->smi_id_en) { > + unsigned int reg; > + > + reg = readl(ovl->regs + DISP_REG_OVL_DATAPATH_CON); > + reg = reg | OVL_LAYER_SMI_ID_EN; > + writel_relaxed(reg, ovl->regs + DISP_REG_OVL_DATAPATH_CON); > + } > writel_relaxed(0x1, ovl->regs + DISP_REG_OVL_EN); > } > > @@ -142,6 +151,14 @@ void mtk_ovl_stop(struct device *dev) > struct mtk_disp_ovl *ovl = dev_get_drvdata(dev); > > writel_relaxed(0x0, ovl->regs + DISP_REG_OVL_EN); > + if (ovl->data->smi_id_en) { > + unsigned int reg; > + > + reg = readl(ovl->regs + DISP_REG_OVL_DATAPATH_CON); > + reg = reg & ~OVL_LAYER_SMI_ID_EN; > + writel_relaxed(reg, ovl->regs + DISP_REG_OVL_DATAPATH_CON); > + } > + > } > > void mtk_ovl_config(struct device *dev, unsigned int w,
[PATCH 4.9 19/30] bpf: Fix buggy rsh min/max bounds tracking
From: Daniel Borkmann

[ no upstream commit ]

Fix incorrect bounds tracking for RSH opcode. Commit f23cc643f9ba ("bpf: fix
range arithmetic for bpf map access") had a wrong assumption about min/max
bounds. The new dst_reg->min_value needs to be derived by right shifting by
the max_val bound, not min_val, and likewise the new dst_reg->max_value needs
to be derived by right shifting by the min_val bound, not max_val. Stable
kernels later than 4.9 are not affected, since bounds tracking was reworked
overall there and they already track this the same way as the fix.

Fixes: f23cc643f9ba ("bpf: fix range arithmetic for bpf map access")
Reported-by: Ryota Shiga (Flatt Security)
Signed-off-by: Daniel Borkmann
Reviewed-by: John Fastabend
Cc: Josef Bacik
Signed-off-by: Greg Kroah-Hartman
---
 kernel/bpf/verifier.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1732,12 +1732,11 @@ static void adjust_reg_min_max_vals(stru
 		 * unsigned shift, so make the appropriate casts.
 		 */
 		if (min_val < 0 || dst_reg->min_value < 0)
-			dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
+			reset_reg_range_values(regs, insn->dst_reg);
 		else
-			dst_reg->min_value =
-				(u64)(dst_reg->min_value) >> min_val;
+			dst_reg->min_value = (u64)(dst_reg->min_value) >> max_val;
 		if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
-			dst_reg->max_value >>= max_val;
+			dst_reg->max_value >>= min_val;
 		break;
 	default:
 		reset_reg_range_values(regs, insn->dst_reg);
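The corrected rule can be checked exhaustively on small ranges. Here is a sketch in plain C that assumes unsigned values and shift amounts below 64. It ignores the signed and unknown-range cases, which the verifier handles by resetting the register. `struct urange` and the helpers are hypothetical names, not verifier code. The smallest result comes from the smallest value shifted the most, and the largest from the largest value shifted the least.

```c
#include <stdbool.h>
#include <stdint.h>

struct urange { uint64_t min, max; }; /* inclusive unsigned interval */

/* Fixed rule: new min = dst.min >> shift.max, new max = dst.max >> shift.min. */
static struct urange rsh_bounds(struct urange d, struct urange s)
{
	struct urange r = { d.min >> s.max, d.max >> s.min };
	return r;
}

/* Pre-fix 4.9 rule: min >> min_val and max >> max_val. */
static struct urange rsh_bounds_buggy(struct urange d, struct urange s)
{
	struct urange r = { d.min >> s.min, d.max >> s.max };
	return r;
}

/* Brute force: does every v >> sh (v in d, sh in s) land inside fn's bounds?
 * Only for small ranges; d.max must be below UINT64_MAX, or the loop
 * never terminates. */
static bool bounds_hold(struct urange d, struct urange s,
			struct urange (*fn)(struct urange, struct urange))
{
	struct urange r = fn(d, s);

	for (uint64_t v = d.min; v <= d.max; v++)
		for (uint64_t sh = s.min; sh <= s.max; sh++) {
			uint64_t res = v >> sh;
			if (res < r.min || res > r.max)
				return false;
		}
	return true;
}
```

For dst in [16, 64] and shift in [1, 3], the fixed rule gives [2, 32]. The old rule gives [8, 8], which misses 64 >> 1 = 32.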
Re: [PATCH v3] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
On Fri, Jan 29, 2021 at 12:34:51PM +0100, David Hildenbrand wrote:
> Let's count the number of CMA pages per zone and print them in
> /proc/zoneinfo.
>
> Having access to the total number of CMA pages per zone is helpful for
> debugging purposes to know where exactly the CMA pages ended up, and to
> figure out how many pages of a zone might behave differently, even after
> some of these pages might already have been allocated.
>
> As one example, CMA pages part of a kernel zone cannot be used for
> ordinary kernel allocations but instead behave more like ZONE_MOVABLE.
>
> For now, we are only able to get the global nr+free cma pages from
> /proc/meminfo and the free cma pages per zone from /proc/zoneinfo.
>
> Example after this patch when booting a 6 GiB QEMU VM with
> "hugetlb_cma=2G":
>   # cat /proc/zoneinfo | grep cma
>   cma 0
>   nr_free_cma 0
>   cma 0
>   nr_free_cma 0
>   cma 524288
>   nr_free_cma 493016
>   cma 0
>   cma 0
>   # cat /proc/meminfo | grep Cma
>   CmaTotal: 2097152 kB
>   CmaFree: 1972064 kB
>
> Note: We print even without CONFIG_CMA, just like "nr_free_cma"; this way,
> one can be sure when spotting "cma 0", that there are definitely no
> CMA pages located in a zone.
>
> Cc: Andrew Morton
> Cc: Thomas Gleixner
> Cc: "Peter Zijlstra (Intel)"
> Cc: Mike Rapoport
> Cc: Oscar Salvador
> Cc: Michal Hocko
> Cc: Wei Yang
> Cc: David Rientjes
> Cc: linux-...@vger.kernel.org
> Signed-off-by: David Hildenbrand

Looks good to me. I guess it is better to print it unconditionally so the
layout does not change.

Reviewed-by: Oscar Salvador

thanks

> ---
>
> The third time is the charm.
>
> v2 -> v3:
> - Print even without CONFIG_CMA. Use zone_cma_pages().
> - Adjust patch description > - Dropped Oscar's RB due to the changes > > v1 -> v2: > - Print/track only with CONFIG_CMA > - Extend patch description > > --- > include/linux/mmzone.h | 15 +++ > mm/page_alloc.c| 1 + > mm/vmstat.c| 6 -- > 3 files changed, 20 insertions(+), 2 deletions(-) > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index ae588b2f87ef..caafd5e37080 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -503,6 +503,9 @@ struct zone { >* bootmem allocator): >* managed_pages = present_pages - reserved_pages; >* > + * cma pages is present pages that are assigned for CMA use > + * (MIGRATE_CMA). > + * >* So present_pages may be used by memory hotplug or memory power >* management logic to figure out unmanaged pages by checking >* (present_pages - managed_pages). And managed_pages should be used > @@ -527,6 +530,9 @@ struct zone { > atomic_long_t managed_pages; > unsigned long spanned_pages; > unsigned long present_pages; > +#ifdef CONFIG_CMA > + unsigned long cma_pages; > +#endif > > const char *name; > > @@ -624,6 +630,15 @@ static inline unsigned long zone_managed_pages(struct > zone *zone) > return (unsigned long)atomic_long_read(&zone->managed_pages); > } > > +static inline unsigned long zone_cma_pages(struct zone *zone) > +{ > +#ifdef CONFIG_CMA > + return zone->cma_pages; > +#else > + return 0; > +#endif > +} > + > static inline unsigned long zone_end_pfn(const struct zone *zone) > { > return zone->zone_start_pfn + zone->spanned_pages; > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index b031a5ae0bd5..9a82375bbcb2 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2168,6 +2168,7 @@ void __init init_cma_reserved_pageblock(struct page > *page) > } > > adjust_managed_page_count(page, pageblock_nr_pages); > + page_zone(page)->cma_pages += pageblock_nr_pages; > } > #endif > > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 7758486097f9..b2537852d498 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ 
-1642,14 +1642,16 @@ static void zoneinfo_show_print(struct seq_file *m, > pg_data_t *pgdat, > "\nhigh %lu" > "\nspanned %lu" > "\npresent %lu" > -"\nmanaged %lu", > +"\nmanaged %lu" > +"\ncma %lu", > zone_page_state(zone, NR_FREE_PAGES), > min_wmark_pages(zone), > low_wmark_pages(zone), > high_wmark_pages(zone), > zone->spanned_pages, > zone->present_pages, > -zone_managed_pages(zone)); > +zone_managed_pages(zone), > +zone_cma_pages(zone)); > > seq_printf(m, > "\nprotection: (%ld", > -- > 2.29.2 > > -- Oscar Salvador SUSE L3
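As a worked check of the example output, the only non-zero zone reports `cma 524288`. With 4 KiB pages, that is 524288 * 4 = 2097152 kB, which matches `CmaTotal`. Likewise, `nr_free_cma 493016` * 4 = 1972064 kB matches `CmaFree`. Below is a small userspace sketch that sums the new per-zone counters the same way. The 4 KiB page size is an assumption; real code should ask `sysconf(_SC_PAGESIZE)`. `total_cma_kb()` is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Sum every per-zone "cma <pages>" line in /proc/zoneinfo-style text and
 * convert the total to kB using the given page size in kB. */
static unsigned long long total_cma_kb(const char *zoneinfo, unsigned long page_kb)
{
	unsigned long long pages = 0;
	const char *line = zoneinfo;

	while (line && *line) {
		const char *p = line;
		unsigned long long n;

		while (*p == ' ' || *p == '\t') /* zoneinfo indents per-zone fields */
			p++;
		/* Match "cma <n>" exactly, so "nr_free_cma <n>" is not counted. */
		if (strncmp(p, "cma ", 4) == 0 && sscanf(p + 4, "%llu", &n) == 1)
			pages += n;

		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return pages * page_kb;
}
```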
Re: [PATCH v2] kretprobe: avoid re-registration of the same kretprobe earlier
On Fri, 29 Jan 2021 22:29:47 +0900 Masami Hiramatsu wrote: > I'll send a patch over this to replace those check with WARN_ON() since > it's a software bug and should be fixed. Please use WARN_ON_ONCE() Thanks! -- Steve
Re: [PATCH v2] kretprobe: avoid re-registration of the same kretprobe earlier
On Fri, 29 Jan 2021 15:23:47 +0530 "Naveen N. Rao" wrote: > diff --git a/kernel/kprobes.c b/kernel/kprobes.c > index f7fb5d135930fa..63a36f33565354 100644 > --- a/kernel/kprobes.c > +++ b/kernel/kprobes.c > @@ -1530,6 +1530,7 @@ static inline int check_kprobe_rereg(struct kprobe *p) > ret = -EINVAL; > mutex_unlock(&kprobe_mutex); > > + WARN_ON(ret); > return ret; > } Please use WARN_ON_ONCE(ret); Thanks, -- Steve
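Here is the difference being asked for. `WARN_ON()` reports every time the condition is true. `WARN_ON_ONCE()` reports only the first time a given call site fires, but it still evaluates to the condition, so callers can branch on it. This presumably keeps a buggy caller that re-registers in a loop from flooding the log. Below is a userspace model using a GNU C statement expression. `MODEL_WARN_ON_ONCE` and `model_check_rereg()` are hypothetical and are not the kernel macro or the kprobes function.

```c
#include <stdbool.h>
#include <stdio.h>

static int model_warn_count; /* how many warnings were actually printed */

/* Evaluates to !!(cond); reports only the first hit from each call site,
 * because each expansion gets its own static flag. */
#define MODEL_WARN_ON_ONCE(cond) ({                                      \
	static bool model_warned;                                        \
	bool model_ret = !!(cond);                                       \
	if (model_ret && !model_warned) {                                \
		model_warned = true;                                     \
		model_warn_count++;                                      \
		fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__); \
	}                                                                \
	model_ret;                                                       \
})

/* Shape of check_kprobe_rereg() after the suggested change. */
static int model_check_rereg(int ret)
{
	MODEL_WARN_ON_ONCE(ret);
	return ret;
}
```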
[GIT PULL] Power management fixes for v5.11-rc6
Hi Linus,

Please pull from the tag

 git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
 pm-5.11-rc6

with top-most commit fef9c8d28e28a808274a18fbd8cc2685817fd62a

 PM: hibernate: flush swap writer after marking

on top of commit 6ee1d745b7c9fd573fba142a2efdad76a9f1cb04

 Linux 5.11-rc5

to receive power management fixes for 5.11-rc6.

These fix a deadlock in the "kexec jump" code and address a possible
hibernation image creation issue.

Specifics:

 - Fix a deadlock caused by attempting to acquire the same mutex twice in
   a row in the "kexec jump" code (Baoquan He).

 - Modify the hibernation image saving code to flush the unwritten data to
   the swap storage later so as to avoid failing to write the image
   signature, which is possible in some cases (Laurent Badel).

Thanks!

---

Baoquan He (1):
      kernel: kexec: remove the lock operation of system_transition_mutex

Laurent Badel (1):
      PM: hibernate: flush swap writer after marking

---
 kernel/kexec_core.c | 2 --
 kernel/power/swap.c | 2 +-
 2 files changed, 1 insertion(+), 3 deletions(-)
Re: [PATCH v2] x86/debug: Fix DR6 handling
On Fri, Jan 29, 2021 at 04:41:09PM +0100, Oleg Nesterov wrote: > This seems to fix the problem reported by Jan, see his test-case below. Should it be part of tools/testing/selftests/breakpoints/ ? tglx has one destined for there already, wouldn't hurt to have a second one: https://lkml.kernel.org/r/87eei4d4k6@nanos.tec.linutronix.de after applying kernel coding style to that one. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] bus: mvebu-mbus: make iounmap() symmetric with ioremap()
On Fri, 29 Jan 2021 17:01:35 +0100 Gregory CLEMENT wrote: > Could you sent me the patch I don't have it in my emails boxes. https://lore.kernel.org/lkml/20201112032149.21906-1-chris.pack...@alliedtelesis.co.nz/raw Thomas -- Thomas Petazzoni, CTO, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
Re: [PATCH] x86: Disable CET instrumentation in the kernel
On Fri, Jan 29, 2021 at 09:10:34AM -0600, Josh Poimboeuf wrote: > Maybe eventually. But the enablement (actually enabling CET/CFI/etc) > happens in the arch code anyway, right? So it could be a per-arch > decision. Right. Ok, for this one, what about Cc: ? What are "some configurations of GCC"? If it can be reproduced with what's released out there, maybe that should go in now, even for 5.11? Hmm? -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH iproute-next v2] devlink: add support for port params get/set
On 1/25/21 6:48 AM, Oleksandr Mazur wrote: > Add implementation for the port parameters getting/setting. > Add bash completion for port param. > Add man description for port param. > Add example commands here - both set and show. Include a json version of the show. > Signed-off-by: Oleksandr Mazur > --- > V2: > 1) Add bash completion for port param; > 2) Add man decsription / examples for port param; > > bash-completion/devlink | 55 > devlink/devlink.c | 275 +++- > man/man8/devlink-port.8 | 65 ++ > 3 files changed, 389 insertions(+), 6 deletions(-) > > diff --git a/devlink/devlink.c b/devlink/devlink.c > index a2e06644..0fc1d4f0 100644 > --- a/devlink/devlink.c > +++ b/devlink/devlink.c > @@ -2706,7 +2706,8 @@ static void pr_out_param_value(struct dl *dl, const > char *nla_name, > } > } > > -static void pr_out_param(struct dl *dl, struct nlattr **tb, bool array) > +static void pr_out_param(struct dl *dl, struct nlattr **tb, bool array, > + bool is_port_param) > { > struct nlattr *nla_param[DEVLINK_ATTR_MAX + 1] = {}; > struct nlattr *param_value_attr; > @@ -2714,6 +2715,7 @@ static void pr_out_param(struct dl *dl, struct nlattr > **tb, bool array) > int nla_type; > int err; > > + stray newline here
Re: dax alignment problem on arm64 (and other achitectures)
On Fri, Jan 29, 2021 at 8:19 AM David Hildenbrand wrote: > > On 29.01.21 03:06, Pavel Tatashin wrote: > >>> Might be related to the broken custom pfn_valid() implementation for > >>> ZONE_DEVICE. > >>> > >>> https://lkml.kernel.org/r/1608621144-4001-1-git-send-email-anshuman.khand...@arm.com > >>> > >>> And essentially ignoring sub-section data in there for now as well (but > >>> might not be that relevant yet). In addition, this might also be related > >>> to > >>> > >>> https://lkml.kernel.org/r/161058499000.1840162.702316708443239771.st...@dwillia2-desk3.amr.corp.intel.com > >> > >> I will check it, and see what I find. I saw that panic almost a year > >> ago, things might have changed since then. > > > > Hi David, > > > > There is no panic anymore, but I also can't offset by 2M anymore, the > > minimum that works now is 16M, and if alignment is less than 16M > > creating devdax device fails. > > I wonder why we get such different namespace sizes? Where do the > differences come from? This looks very weird. > > > > > So, I tried the new ARM64 patch that reduces section sizes, and two > > alignments for pmem: regular 2G alignment, and 2G+16M alignment. > > (subtracted 16M from the bottom) > > > > * 4K page, 6G RAM, 2G PRAM * > > BOOT: > > 4000-1bfff : System RAM > > 1c000-23fff : namespace0.0 > > DEVDAX: > > 4000-1bfff : System RAM > > 1c000-1c21f : namespace0.0 > > 1c220-23fff : dax0.0 > > HOTPLUG: > > 4000-1bfff : System RAM > > 1c000-1c21f : namespace0.0 > > 1c800-23fff : dax0.0 > >1c800-23fff : System RAM (kmem) 128M Wasted > > (Expected) > > The namespace spans 34MB?? > > > > > * 4K page, 6G-16M RAM, 2G+16M PRAM * > > BOOT: > > 4000-1beff : System RAM > > 1bf00-23fff : namespace0.0 > > DEVDAX: > > 4000-1beff : System RAM > > 1bf00-1c11f : namespace0.0 > > 1c120-23fff : dax0.0 > > HOTPLUG: > > 4000-1beff : System RAM > > 1bf00-1c11f : namespace0.0 > > 1c800-23fff : dax0.0 > >1c800-23fff : System RAM (kmem) 144M Wasted () > > The namespace spans 34MB?? 
Right, this seems like a bug > > > > > * 64K page, 6G RAM, 2G PRAM * > > BOOT: > > 4000-1bfff : System RAM > > 1c000-23fff : namespace0.0 > > DEVDAX: > > 4000-1bfff : System RAM > > 1c000-1dfff : namespace0.0 > > 1e000-23fff : dax0.0 > > HOTPLUG: > > 4000-1bfff : System RAM > > 1c000-1dfff : namespace0.0 > > The namespace spans 512MB ?!? What? This is because section size is 512M with 64K pages. > > > 1e000-23fff : dax0.0 > >1e000-23fff : System RAM (kmem) 512M Wasted > > (Expected) > > > > * 64K page, 6G-16M RAM, 2G+16M PRAM * > > BOOT: > > 4000-1beff : System RAM > > 1bf00-23fff : namespace0.0 > > DEVDAX: > > 4000-1beff : System RAM > > 1bf00-1bf3f : namespace0.0 > > 1bf40-23fff : dax0.0 > > HOTPLUG: > > 4000-1beff : System RAM > > 1bf00-1bf3f : namespace0.0 > > The namespace now consumes 4MB ?!? > > > 1c000-23fff : dax0.0 > >1c000-23fff : System RAM (kmem) 16M Wasted > > (Optimal) > > Good :) I guess more optimal would be 2MB/0MB :) Agree, but for the offset 16M this is optimal, because 16M is smaller than section size. > > > > > In all three cases only System RAM, namespace0.0, and dax0.0 were > > printed from /proc/iomem. > > BOOTcontent of iomem right after boot > > DEVDAX content of iomem after devdax is created > > ndctl create-namespace --mode devdax -e namespace0.0" > > HOTPLUG content of imem after dax0.0 is hotplugged: > > echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind > > echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id > > > > > > The most surprising part is why with 4K pages and 16M offset 144M is > > wasted? For whatever reason, when devdax is created 34 goes wasted to > > the label? Something is wrong here.. However, I am happy with 64K > > pages result, and that only 16M is wasted, of course optimally, we > > should be using any memory here, but it is still much better than what > > we have now. > > Definitely, but we should try figuring out what's going on here. I > assume on x86-64 it behaves differently? Yes, we should root cause. 
I highly suspect that there is somewhere alignment miscalculations happen that cause this memory waste with the offset 16M. I am also not sure why the 2M label size was increased, and why 16M is now an alignment requirement. I tested on x86, and got pretty much the same results as on ARM64: 2M offset is not allowed anymore 16M minimum, and even with 16M offset, 144M is wasted. Here is full QEMU command if anyone wants to repro it: KERNEL_PARAM='console=ttyS0 ip=dhcp' KERNEL_PARAM+=' memmap=2G!8G' #KERNEL_PARAM+=' memmap=2064M!8176M' qemu-system-x86_64
Re: [PATCH v2] btrfs: Avoid calling btrfs_get_chunk_map() twice
On 1/27/21 8:57 AM, Michal Rostecki wrote:
> From: Michal Rostecki
>
> Before this change, the btrfs_get_io_geometry() function was calling
> btrfs_get_chunk_map() to get the extent mapping, necessary for
> calculating the I/O geometry. It was using that extent mapping only
> internally and freeing the pointer after its execution.
>
> That resulted in calling btrfs_get_chunk_map() de facto twice by the
> __btrfs_map_block() function. It was calling btrfs_get_io_geometry()
> first and then calling btrfs_get_chunk_map() directly to get the extent
> mapping, used by the rest of the function.
>
> This change fixes that by passing the extent mapping to the
> btrfs_get_io_geometry() function as an argument.
>
> v2:
> When btrfs_get_chunk_map() returns an error in btrfs_submit_direct():
> - Use errno_to_blk_status(PTR_ERR(em)) as the status
> - Set em to NULL
>
> Signed-off-by: Michal Rostecki

This panicked all of my test VMs in their overnight xfstests runs. The panic
is this:

[ 2449.936502] BTRFS critical (device dm-7): mapping failed logical 1113825280 bio len 40960 len 24576
[ 2449.937073] [ cut here ]
[ 2449.937329] kernel BUG at fs/btrfs/volumes.c:6450!
[ 2449.937604] invalid opcode: [#1] SMP NOPTI
[ 2449.937855] CPU: 0 PID: 259045 Comm: kworker/u5:0 Not tainted 5.11.0-rc5+ #122
[ 2449.938252] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 2449.938713] Workqueue: btrfs-worker-high btrfs_work_helper
[ 2449.939016] RIP: 0010:btrfs_map_bio.cold+0x5a/0x5c
[ 2449.939392] Code: 37 87 ff ff e8 ed d4 8a ff 48 83 c4 18 e9 b5 52 8b ff 49 89 c8 4c 89 fa 4c 89 f1 48 c7 c6 b0 c0 61 8b 48 89 ef e8 11 87 ff ff <0f> 0b 4c 89 e7 e8 42 09 86 ff e9 fd 59 8b ff 49 8b 7a 50 44 89 f2
[ 2449.940402] RSP: :9f24c1637d90 EFLAGS: 00010282
[ 2449.940689] RAX: 0057 RBX: 90c78ff716b8 RCX:
[ 2449.941080] RDX: 90c7fbc27ae0 RSI: 90c7fbc19110 RDI: 90c7fbc19110
[ 2449.941467] RBP: 90c7911d4000 R08: R09:
[ 2449.941853] R10: 9f24c1637b48 R11: 8b9723e8 R12:
[ 2449.942243] R13: R14: a000 R15: 4263a000
[ 2449.942632] FS: () GS:90c7fbc0() knlGS:
[ 2449.943072] CS: 0010 DS: ES: CR0: 80050033
[ 2449.943386] CR2: 5575163c3080 CR3: 00010ad6c004 CR4: 00370ef0
[ 2449.943772] Call Trace:
[ 2449.943915] ? lock_release+0x1c3/0x290
[ 2449.944135] run_one_async_done+0x3a/0x60
[ 2449.944360] btrfs_work_helper+0x136/0x520
[ 2449.944588] process_one_work+0x26e/0x570
[ 2449.944812] worker_thread+0x55/0x3c0
[ 2449.945016] ? process_one_work+0x570/0x570
[ 2449.945250] kthread+0x137/0x150
[ 2449.945430] ? __kthread_bind_mask+0x60/0x60
[ 2449.945666] ret_from_fork+0x1f/0x30

It happens when you run btrfs/060. Please make sure to run xfstests against
patches before you submit them upstream. Thanks,

Josef
Re: kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")
On Fri, Jan 29, 2021 at 10:59:52AM -0500, Steven Rostedt wrote: > On Fri, 29 Jan 2021 22:40:11 +0900 > Masami Hiramatsu wrote: > > > > So what, they can all happen with random locks held. Marking them as NMI > > > enables a whole bunch of sanity checks that are entirely appropriate. > > > > How about introducing an idea of Asynchronous NMI (ANMI) and Synchronous > > NMI (SNMI)? kprobes and ftrace is synchronously called and can be controlled > > (we can expect the context) but ANMI may be caused by asynchronous > > hardware events on any context. > > > > If we can distinguish those 2 NMIs on preempt count, bpf people can easily > > avoid the inevitable situation. > > I don't like the name NMI IN SNMI, because they are not NMIs. They are > actually more like kernel exceptions. Even page faults in the kernel is > similar to a kprobe breakpoint or ftrace. It can happen anywhere, with any > lock held. Perhaps we need a kernel exception context? Which by definition > is synchronous. What problem are you trying to solve? AFAICT all these contexts have the same restrictions, why try and muck about with different names for the same thing?
Re: [REGRESSION] "ALSA: HDA: Early Forbid of runtime PM" broke my laptop's internal audio
On Fri, 29 Jan 2021 17:12:08 +0100,
Michael Catanzaro wrote:
>
> On Fri, Jan 29, 2021 at 9:30 am, Michael Catanzaro
> wrote:
> > OK, I found "ALSA: hda/via: Apply the workaround generically for
> > Clevo machines" which was just merged yesterday. So I will test
> > again to find out.
>
> Hi Takashi, hi Harsha,
>
> I can confirm that the problem is fixed by this commit:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4961167bf7482944ca09a6f71263b9e47f949851

Thanks, good to hear.

Then I think we can drop the entry from power_save_denylist in
hda_intel.c. Could you check that it still works with the patch below?


thanks,

Takashi

--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2217,8 +2217,6 @@ static const struct snd_pci_quirk power_save_denylist[] = {
 	/* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */
 	SND_PCI_QUIRK(0x1043, 0x8733, "Asus Prime X370-Pro", 0),
 	/* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */
-	SND_PCI_QUIRK(0x1558, 0x6504, "Clevo W65_67SB", 0),
-	/* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */
 	SND_PCI_QUIRK(0x1028, 0x0497, "Dell Precision T3600", 0),
 	/* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */
 	/* Note the P55A-UD3 and Z87-D3HP share the subsys id for the HDA dev */
Re: [PATCH v5 2/7] pwm: pca9685: Support hardware readout
Hi Sven, On Fri, Jan 29, 2021 at 08:42:13AM -0500, Sven Van Asbroeck wrote: > On Mon, Jan 11, 2021 at 3:35 PM Uwe Kleine-König > wrote: > > > > My position here is: A consumer should disable a PWM before calling > > pwm_put. The driver should however not enforce this and so should not > > modify the hardware state in .free(). > > > > Also .probe should not change the PWM configuration. > > I agree that this is the most user-friendly behaviour. > > The problem however with the pca9685 is that it has many degrees of > freedom: there are many possible register values which produce the same > physical chip outputs. > > This could lead to a situation where, if .probe() does not reset the register > values, subsequent writes may lead to different outputs than expected. > > One possible solution is to write .get_state() so that it always reads the > correct state, even if "unconventional" register settings are present, i.e. > those written by an outside entity, e.g. a bootloader. Then write that > state back using driver conventions. > > This may be trickier than it sounds - after all we've learnt that the pca9685 > looks simple on the surface, but is actually quite challenging to get right. > > Clemens, Uwe, what do you think? Ok, so you suggest we extend our get_state logic to deal with cases like the following: If neither full OFF nor full ON is set && ON == OFF, we should probably set the full OFF bit to disable the PWM and log a warning message? (e.g. "invalid register setting detected: pwm disabled" ?) If the ON registers are set and the nxp,staggered-outputs property is not, I'd calculate (off - on) & 4095, set the OFF register to that value and clear the ON register. And then call our get_state in .probe, followed by a write of the resulting / fixed-up state? This would definitely solve the problem of invalid/unconventional values set by the bootloader and avoid inconsistencies. Sounds good to me! 
If Thierry and Uwe have no objections, I can send out a new round of patches in the upcoming weeks. My current goal is to get the changes into 5.13. Thanks, Clemens
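The fix-up described above could look roughly like this. It is a minimal sketch, not driver code. `struct demo_pca_chan` and `demo_normalize()` are hypothetical. The 12-bit counter width comes from the `& 4095` in the proposal. The assumption that the ON/OFF counts are don't-care once a FULL bit is set is mine. Moving the phase into OFF keeps the same high time: ON=4000, OFF=100 stays high for (100 - 4000) & 4095 = 196 counts, which is the same as ON=0, OFF=196.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of one pca9685 channel. */
struct demo_pca_chan {
	uint16_t on;    /* 12-bit ON count, 0..4095 */
	uint16_t off;   /* 12-bit OFF count, 0..4095 */
	bool full_on;
	bool full_off;
};

/* Normalize "unconventional" register contents (e.g. left by a bootloader).
 * Returns true if the state changed, where a driver might log a warning. */
static bool demo_normalize(struct demo_pca_chan *c, bool staggered_outputs)
{
	/* Assumption: counts are ignored once a FULL bit is set. */
	if (c->full_on || c->full_off)
		return false;

	/* Neither FULL bit set and ON == OFF: treat the channel as disabled. */
	if (c->on == c->off) {
		c->full_off = true;
		return true;
	}

	/* Without staggered outputs the convention is ON == 0, so move the
	 * phase into OFF while keeping the same high time. */
	if (!staggered_outputs && c->on != 0) {
		c->off = (uint16_t)((c->off - c->on) & 4095);
		c->on = 0;
		return true;
	}
	return false;
}
```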
Re: dax alignment problem on arm64 (and other achitectures)
On Fri, Jan 29, 2021 at 9:51 AM Joao Martins wrote:
>
> Hey Pavel,
>
> On 1/29/21 1:50 PM, Pavel Tatashin wrote:
> >> Since we last talked about this the enabling for EFI "Special Purpose"
> >> / Soft Reserved Memory has gone upstream and instantiates device-dax
> >> instances for address ranges marked with EFI_MEMORY_SP attribute.
> >> Critically this way of declaring device-dax removes the consideration
> >> of it as persistent memory and as such no metadata reservation. So, if
> >> you are willing to maintain the metadata external to the device (which
> >> seems reasonable for your environment) and have your platform firmware
> >> / kernel command line mark it as EFI_CONVENTIONAL_MEMORY +
> >> EFI_MEMORY_SP, then these reserve-free dax-devices will surface.
> >
> > Hi Dan,
> >
> > This is cool. Does it allow conversion between devdax and fsdax so DAX
> > aware filesystem can be installed and data can be put there to be
> > preserved across the reboot?
> >
>
> fwiw wrt to the 'preserved across kexec' part, you are going to need
> something conceptually similar to snippet below the scissors mark.
> Alternatively, we could fix kexec userspace to add conventional memory
> ranges (without the SP attribute part) when it sees a Soft-Reserved region.
> But can't tell which one is the right thing to do.

Hi Joao,

Isn't it just a matter of appending arguments to the kernel command line
during the kexec reboot with the Soft-Reserved region specified, or am I
missing something? I understand that with the kexec_file_load syscall we
might accidentally load segments onto the reserved region, but with the
original kexec syscall, where we can specify destinations for each segment,
that should not be a problem with today's kexec tools.

I agree that preserving it automatically, as you are proposing, would make
more sense than fiddling with kernel parameters and segment destinations.
Thank you, Pasha > > At the moment, HMAT ranges (or those defined with efi_fake_mem=) aren't > preserved not because of anything special with HMAT, but simply because > the EFI memmap conventional ram ranges are not preserved (only runtime > services). And HMAT/efi_fake_mem expects these to based on EFI memmap. > > >8-- > > From: Joao Martins > Subject: x86/efi: add Conventional Memory ranges to runtime-map > > Through EFI/HMAT certain ranges are marked with Specific Purpose > EFI attribute (EFI_MEMORY_SP). These ranges are usually > specified in a memory descriptor of type Conventional Memory. > > We only ever expose regions to the runtime-map that were marked > with efi_mem_reserve(). Currently these comprise the Runtime > Data/Code and Boot data. Everything else gets lost, so on a kexec > boot, if we had an HMAT (or efi_fake_mem= marked regions) the second > kernel kexec will lose this information, and expose this memory > as regular RAM. > > To address that, let's add the Conventional Memory ranges from the > firmware EFI memory map to the runtime. kexec then picks these up > on kexec load. Specifically, we save the fw memmap first, and when > we enter EFI virtual mode which on x86 is the latest point where > we filter the EFI memmap to construct one with only runtime services. 
> > Signed-off-by: Joao Martins > --- > diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c > index 8a26e705cb06..c244da8b185d 100644 > --- a/arch/x86/platform/efi/efi.c > +++ b/arch/x86/platform/efi/efi.c > @@ -663,6 +663,53 @@ static bool should_map_region(efi_memory_desc_t *md) > return false; > } > > +static void __init efi_fw_memmap_restore(void **map, int left, > +int *count, int *pg_shift) > +{ > + struct efi_memory_map_data *data = &efi_fw_memmap; > + void *fw_memmap, *new_memmap = *map; > + unsigned long desc_size; > + int i, nr_map; > + > + if (!data->phys_map) > + return; > + > + /* create new EFI memmap */ > + fw_memmap = early_memremap(data->phys_map, data->size); > + if (!fw_memmap) { > + return; > + } > + > + desc_size = data->desc_size; > + nr_map = data->size / desc_size; > + > + for (i = 0; i < nr_map; i++) { > + efi_memory_desc_t *md = efi_early_memdesc_ptr(fw_memmap, > + desc_size, i); > + > + if (md->type != EFI_CONVENTIONAL_MEMORY) > + continue; > + > + if (left < desc_size) { > + new_memmap = realloc_pages(new_memmap, *pg_shift); > + if (!new_memmap) { > + early_memunmap(fw_memmap, data->size); > + return; > + } > + > + left += PAGE_SIZE << *pg_shift; > + (*pg_shift)++; > + } > + > + memcpy(new_memmap + (*count * desc_size), md, desc_size); > + > +
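For illustration, a minimal userspace sketch of the filtering step the quoted patch performs: walk a firmware memory map in strides of desc_size and append every EFI_CONVENTIONAL_MEMORY descriptor to a second map. The struct layout mirrors the kernel's efi_memory_desc_t, the values EFI_CONVENTIONAL_MEMORY (7) and EFI_MEMORY_SP (bit 18) follow the UEFI spec, and append_conventional() is a hypothetical helper; unlike the patch, it fails instead of growing the destination with realloc_pages().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as the kernel's efi_memory_desc_t. */
typedef struct {
	uint32_t type;
	uint32_t pad;
	uint64_t phys_addr;
	uint64_t virt_addr;
	uint64_t num_pages;
	uint64_t attribute;
} efi_memory_desc_t;

#define EFI_CONVENTIONAL_MEMORY	7
#define EFI_MEMORY_SP		0x0000000000040000ULL	/* "Specific Purpose" attribute */

/*
 * Hypothetical helper: copy every Conventional Memory descriptor from 'src'
 * (nr_src entries, each desc_size bytes apart) to the end of 'dst', which
 * already holds *count entries and has room for dst_cap entries.
 * Returns the number of descriptors appended, or -1 once 'dst' is full.
 */
static int append_conventional(void *dst, size_t dst_cap, size_t *count,
			       const void *src, size_t nr_src, size_t desc_size)
{
	int added = 0;

	for (size_t i = 0; i < nr_src; i++) {
		/* Firmware may use desc_size > sizeof(*md): always step by desc_size. */
		const efi_memory_desc_t *md =
			(const void *)((const char *)src + i * desc_size);

		if (md->type != EFI_CONVENTIONAL_MEMORY)
			continue;
		if (*count >= dst_cap)
			return -1;
		/* Copy the whole descriptor so attributes such as EFI_MEMORY_SP survive. */
		memcpy((char *)dst + *count * desc_size, md, desc_size);
		(*count)++;
		added++;
	}
	return added;
}
```

Copying the whole descriptor, rather than only the address range, is what would carry the EFI_MEMORY_SP attribute into the second kernel's view of the map.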
Re: [PATCH 1/3] kvfree_rcu: Allocate a page for a single argument
On Fri, Jan 29, 2021 at 09:56:29AM +0100, Michal Hocko wrote: > On Thu 28-01-21 19:02:37, Uladzislau Rezki wrote: > [...] > > >From 0bdb8ca1ae62088790e0a452c4acec3821e06989 Mon Sep 17 00:00:00 2001 > > From: "Uladzislau Rezki (Sony)" > > Date: Wed, 20 Jan 2021 17:21:46 +0100 > > Subject: [PATCH v2 1/1] kvfree_rcu: Directly allocate page for > > single-argument > > case > > > > Single-argument kvfree_rcu() must be invoked from sleepable contexts, > > so we can directly allocate pages. Furthermmore, the fallback in case > > of page-allocation failure is the high-latency synchronize_rcu(), so it > > makes sense to do these page allocations from the fastpath, and even to > > permit limited sleeping within the allocator. > > > > This commit therefore allocates if needed on the fastpath using > > GFP_KERNEL|__GFP_NORETRY. > > Yes, __GFP_NORETRY as a lightweight allocation mode should be fine. It > is more robust than __GFP_NOWAIT on memory usage spikes. The caller is > prepared to handle the failure which is likely much less disruptive than > OOM or potentially heavy reclaim __GFP_RETRY_MAYFAIL. > > I cannot give you ack as I am not familiar with the code but this makes > sense to me. > No problem, I can separate it. We can have a patch on top of what we have so far. The patch only modifies the gfp_mask passed to __get_free_pages(): >From ec2feaa9b7f55f73b3b17e9ac372151c1aab5ae0 Mon Sep 17 00:00:00 2001 From: "Uladzislau Rezki (Sony)" Date: Fri, 29 Jan 2021 17:16:03 +0100 Subject: [PATCH 1/1] kvfree_rcu: replace __GFP_RETRY_MAYFAIL by __GFP_NORETRY __GFP_RETRY_MAYFAIL is rather heavy from the reclaim point of view and can therefore be time consuming. That is not optimal, and there is no need to try so hard, because we have a fallback path. __GFP_NORETRY, in turn, performs only light-weight reclaim and fails early under high memory pressure or low-memory conditions.
In general, there are four simple criteria we would like to meet: a) minimize hitting the fallback; b) avoid invoking the OOM killer; c) make a light-weight page request; d) avoid dipping into the emergency reserves. Signed-off-by: Uladzislau Rezki (Sony) --- kernel/rcu/tree.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 70ddc339e0b7..1e862120db9e 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -3489,8 +3489,20 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, bnode = get_cached_bnode(*krcp); if (!bnode && can_alloc) { krc_this_cpu_unlock(*krcp, *flags); + + // __GFP_NORETRY - allows a light-weight direct reclaim + // what is OK from minimizing of fallback hitting point of + // view. Apart of that it forbids any OOM invoking what is + // also beneficial since we are about to release memory soon. + // + // __GFP_NOMEMALLOC - prevents from consuming of all the + // memory reserves. Please note we have a fallback path. + // + // __GFP_NOWARN - it is supposed that an allocation can + // be failed under low memory or high memory pressure + // scenarios. bnode = (struct kvfree_rcu_bulk_data *) - __get_free_page(GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOMEMALLOC | __GFP_NOWARN); + __get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN); *krcp = krc_this_cpu_lock(flags); } -- 2.20.1 -- Vlad Rezki
Re: [net-next PATCH v4 01/15] Documentation: ACPI: DSD: Document MDIO PHY
On Fri, Jan 29, 2021 at 7:48 AM Calvin Johnson wrote: > > On Thu, Jan 28, 2021 at 02:27:00PM +0100, Rafael J. Wysocki wrote: > > On Thu, Jan 28, 2021 at 2:12 PM Calvin Johnson > > wrote: > > > > > > On Thu, Jan 28, 2021 at 01:00:40PM +0100, Rafael J. Wysocki wrote: > > > > On Thu, Jan 28, 2021 at 12:27 PM Calvin Johnson > > > > wrote: > > > > > > > > > > Hi Rafael, > > > > > > > > > > Thanks for the review. I'll work on all the comments. > > > > > > > > > > On Fri, Jan 22, 2021 at 08:22:21PM +0100, Rafael J. Wysocki wrote: > > > > > > On Fri, Jan 22, 2021 at 4:43 PM Calvin Johnson > > > > > > wrote: > > > > > > > > > > > > > > Introduce ACPI mechanism to get PHYs registered on a MDIO bus and > > > > > > > provide them to be connected to MAC. > > > > > > > > > > > > > > Describe properties "phy-handle" and "phy-mode". > > > > > > > > > > > > > > Signed-off-by: Calvin Johnson > > > > > > > --- > > > > > > > > > > > > > > Changes in v4: > > > > > > > - More cleanup > > > > > > > > > > > > This looks much better that the previous versions IMV, some nits > > > > > > below. > > > > > > > > > > > > > Changes in v3: None > > > > > > > Changes in v2: > > > > > > > - Updated with more description in document > > > > > > > > > > > > > > Documentation/firmware-guide/acpi/dsd/phy.rst | 129 > > > > > > > ++ > > > > > > > 1 file changed, 129 insertions(+) > > > > > > > create mode 100644 Documentation/firmware-guide/acpi/dsd/phy.rst > > > > > > > > > > > > > > diff --git a/Documentation/firmware-guide/acpi/dsd/phy.rst > > > > > > > b/Documentation/firmware-guide/acpi/dsd/phy.rst > > > > > > > new file mode 100644 > > > > > > > index ..76fca994bc99 > > > > > > > --- /dev/null > > > > > > > +++ b/Documentation/firmware-guide/acpi/dsd/phy.rst > > > > > > > @@ -0,0 +1,129 @@ > > > > > > > +.. 
SPDX-License-Identifier: GPL-2.0 > > > > > > > + > > > > > > > += > > > > > > > +MDIO bus and PHYs in ACPI > > > > > > > += > > > > > > > + > > > > > > > +The PHYs on an MDIO bus [1] are probed and registered using > > > > > > > +fwnode_mdiobus_register_phy(). > > > > > > > > > > > > Empty line here, please. > > > > > > > > > > > > > +Later, for connecting these PHYs to MAC, the PHYs registered on > > > > > > > the > > > > > > > +MDIO bus have to be referenced. > > > > > > > + > > > > > > > +The UUID given below should be used as mentioned in the "Device > > > > > > > Properties > > > > > > > +UUID For _DSD" [2] document. > > > > > > > + - UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301 > > > > > > > > > > > > I would drop the above paragraph. > > > > > > > > > > > > > + > > > > > > > +This document introduces two _DSD properties that are to be used > > > > > > > +for PHYs on the MDIO bus.[3] > > > > > > > > > > > > I'd say "for connecting PHYs on the MDIO bus [3] to the MAC layer." > > > > > > above and add the following here: > > > > > > > > > > > > "These properties are defined in accordance with the "Device > > > > > > Properties UUID For _DSD" [2] document and the > > > > > > daffd814-6eba-4d8c-8a91-bc9bbf4aa301 UUID must be used in the Device > > > > > > Data Descriptors containing them." > > > > > > > > > > > > > + > > > > > > > +phy-handle > > > > > > > +-- > > > > > > > +For each MAC node, a device property "phy-handle" is used to > > > > > > > reference > > > > > > > +the PHY that is registered on an MDIO bus. This is mandatory for > > > > > > > +network interfaces that have PHYs connected to MAC via MDIO bus. > > > > > > > + > > > > > > > +During the MDIO bus driver initialization, PHYs on this bus are > > > > > > > probed > > > > > > > +using the _ADR object as shown below and are registered on the > > > > > > > MDIO bus. > > > > > > > > > > > > Do you want to mention the "reg" property here? I think it would be > > > > > > useful to do that. 
> > > > > > > > > > No. I think we should adhere to _ADR in MDIO case. The "reg" property > > > > > for ACPI > > > > > may be useful for other use cases that Andy is aware of. > > > > > > > > The code should reflect this, then. I mean it sounds like you want to > > > > check the "reg" property only if this is a non-ACPI node. > > > > > > Right. For MDIO case, that is what is required. > > > "reg" for DT and "_ADR" for ACPI. > > > > > > However, Andy pointed out [1] that ACPI nodes can also hold reg property > > > and > > > therefore, fwnode_get_id() need to be capable to handling that situation > > > as > > > well. > > > > No, please don't confuse those two things. > > > > Yes, ACPI nodes can also hold a "reg" property, but the meaning of it > > depends on the binding which is exactly my point: _ADR is not a > > fallback replacement for "reg" in general and it is not so for MDIO > > too. The new function as proposed doesn't match the MDIO requirements > > and so it should not be used for MDIO. > > > > For MDIO, the exact flow mentioned above needs to be implemented (and > > if someone
Re: Quick review of RCU-related patches in v5.10.8-rt23
On Fri, Jan 29, 2021 at 05:11:37PM +0100, Sebastian Andrzej Siewior wrote: > On 2021-01-28 11:50:37 [-0800], Paul E. McKenney wrote: > > Hello, Sebastian, > > Hi Paul, > > > Just doing my periodic (but decidedly non-real-time) scan of RCU-related > > patches in -rt, in this case v5.10.8-rt23: > > > > f3541b467fbb ("sched: Do not account rcu_preempt_depth on RT in > > might_sleep()") > > If the scheduler maintainers are OK with their part of this patch, > > looks good to me, given CONFIG_PREEMPT_RT. Feel free to add: > > Acked-by: Paul E. McKenney > > Thank. I think we should pump it together with the rt-mutex part. But I > add a note. > > > d8c5a7d75e08 ("rcutorture: Avoid problematic critical section nesting on > > RT") > > This one I need to understand better. I do like the use of local > > variables to make the "if" conditions less unruly. > > This originated in > https://lkml.kernel.org/r/20190911165729.11178-6-sw...@redhat.com > > I planned to post it upstream last cycle but it appears that it broke > apart and I did not yet look how to fix it. I do recall the discussion, I just need to get up to speed on the details. ;-) > > The rest are in -rcu already: > > > > a163ef8687a1 ("rcu: make RCU_BOOST default on RT") > > Commit 2341bc4a0311 in -rcu. In yesterday's pull request. > > 5ffd75a96828 ("rcu: Use rcuc threads on PREEMPT_RT as we did") > > Commit 8b9a0ecc7ef5 in -rcu. In yesterday's pull request. > > e0b671bca2e7 ("rcu: enable rcu_normal_after_boot by default for RT") > > Commit 36221e109eb2 in -rcu. In yesterday's pull request. > > e27ef68731a1 ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs > > disabled") > > This one is in v5.10 mainline. > > \o/ > > > Any reason I shouldn't pull in db93e2f1b4b0 ("rcu: Prevent false positive > > softirq warning on RT") for v5.13? > > tglx has a version of that with your Reviewed-by tag on it in this > softirq tree waiting. So I guess just sit it out ;) Works for me! 
Thanx, Paul > Thank you for looking Paul. > > Thanx, Paul > > Sebastian
[PATCH] lib: crc8: Pointer to data block should be const
crc8() does not change the data passed to it, so the pointer argument should be declared const. This avoids callers that receive const data having to cast it to a non-const pointer to call crc8(). Signed-off-by: Richard Fitzgerald --- include/linux/crc8.h | 2 +- lib/crc8.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/crc8.h b/include/linux/crc8.h index 13c8dabb0441..674045c59a04 100644 --- a/include/linux/crc8.h +++ b/include/linux/crc8.h @@ -96,6 +96,6 @@ void crc8_populate_msb(u8 table[CRC8_TABLE_SIZE], u8 polynomial); * Williams, Ross N., rossross.net * (see URL http://www.ross.net/crc/download/crc_v3.txt). */ -u8 crc8(const u8 table[CRC8_TABLE_SIZE], u8 *pdata, size_t nbytes, u8 crc); +u8 crc8(const u8 table[CRC8_TABLE_SIZE], const u8 *pdata, size_t nbytes, u8 crc); #endif /* __CRC8_H_ */ diff --git a/lib/crc8.c b/lib/crc8.c index 595a5a75e3cd..1ad8e501d9b6 100644 --- a/lib/crc8.c +++ b/lib/crc8.c @@ -71,7 +71,7 @@ EXPORT_SYMBOL(crc8_populate_lsb); * @nbytes: number of bytes in data buffer. * @crc: previous returned crc8 value. */ -u8 crc8(const u8 table[CRC8_TABLE_SIZE], u8 *pdata, size_t nbytes, u8 crc) +u8 crc8(const u8 table[CRC8_TABLE_SIZE], const u8 *pdata, size_t nbytes, u8 crc) { /* loop over the buffer data */ while (nbytes-- > 0) -- 2.20.1
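A userspace sketch of why the const qualifier is safe: the lookup loop below only reads *pdata. The table builder is modeled on crc8_populate_msb() from lib/crc8.c, and the u8 typedef and a CRC8_TABLE_SIZE of 256 are restated so the example compiles on its own, so treat it as an approximation rather than the kernel source.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;
#define CRC8_TABLE_SIZE 256

/* MSB-first table: table[i] is the CRC-8 remainder of byte value i. */
static void crc8_populate_msb(u8 table[CRC8_TABLE_SIZE], u8 polynomial)
{
	const u8 msbit = 0x80;
	u8 t = msbit;
	int i, j;

	table[0] = 0;
	for (i = 1; i < CRC8_TABLE_SIZE; i *= 2) {
		/* Remainder for the single-bit value i; the cast drops bit 8. */
		t = (u8)((t << 1) ^ ((t & msbit) ? polynomial : 0));
		/* CRC is linear, so every other entry is an XOR of earlier ones. */
		for (j = 0; j < i; j++)
			table[i + j] = table[j] ^ t;
	}
}

/* The buffer is only read, so a const pointer is all the function needs. */
static u8 crc8(const u8 table[CRC8_TABLE_SIZE], const u8 *pdata,
	       size_t nbytes, u8 crc)
{
	while (nbytes-- > 0)
		crc = table[(crc ^ *pdata++) & 0xff];
	return crc;
}
```

With polynomial 0x07 and an initial value of 0, the CRC of the nine bytes "123456789" should be 0xF4, the usual CRC-8 check value. A static const buffer can be passed directly, with no cast.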
[git pull] IOMMU Fixes for Linux v5.11-rc5
Hi Linus, The following changes since commit 6ee1d745b7c9fd573fba142a2efdad76a9f1cb04: Linux 5.11-rc5 (2021-01-24 16:47:14 -0800) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-fixes-v5.11-rc5 for you to fetch changes up to 29b32839725f8c89a41cb6ee054c85f3116ea8b5: iommu/vt-d: Do not use flush-queue when caching-mode is on (2021-01-28 13:59:02 +0100) IOMMU Fixes for Linux v5.11-rc5 Including: - AMD IOMMU fix to make sure features are detected before they are queried. - Intel IOMMU address alignment check fix for an IOTLB flushing command. - Performance fix for Intel IOMMU to make sure the code does not do full IOTLB flushes all the time. Those flushes are very expensive on emulated IOMMUs. Lu Baolu (1): iommu/vt-d: Correctly check addr alignment in qi_flush_dev_iotlb_pasid() Nadav Amit (1): iommu/vt-d: Do not use flush-queue when caching-mode is on Suravee Suthikulpanit (1): iommu/amd: Use IVHD EFR for early initialization of IOMMU features drivers/iommu/amd/amd_iommu.h | 7 ++--- drivers/iommu/amd/amd_iommu_types.h | 4 +++ drivers/iommu/amd/init.c | 56 +++-- drivers/iommu/intel/dmar.c | 2 +- drivers/iommu/intel/iommu.c | 32 - 5 files changed, 92 insertions(+), 9 deletions(-) Please pull. Thanks, Joerg
Re: [PATCH V3 0/6] x86: don't abuse tss.sp1
On Fri, Jan 29, 2021 at 11:35:46PM +0800, Lai Jiangshan wrote: > Any feedback? Yes: be patient please. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
Re: [net-next PATCH v4 01/15] Documentation: ACPI: DSD: Document MDIO PHY
On Fri, Jan 29, 2021 at 5:37 PM Rafael J. Wysocki wrote: > > On Fri, Jan 29, 2021 at 7:48 AM Calvin Johnson > wrote: > > > > On Thu, Jan 28, 2021 at 02:27:00PM +0100, Rafael J. Wysocki wrote: > > > On Thu, Jan 28, 2021 at 2:12 PM Calvin Johnson > > > wrote: > > > > > > > > On Thu, Jan 28, 2021 at 01:00:40PM +0100, Rafael J. Wysocki wrote: > > > > > On Thu, Jan 28, 2021 at 12:27 PM Calvin Johnson > > > > > wrote: > > > > > > > > > > > > Hi Rafael, > > > > > > > > > > > > Thanks for the review. I'll work on all the comments. > > > > > > > > > > > > On Fri, Jan 22, 2021 at 08:22:21PM +0100, Rafael J. Wysocki wrote: > > > > > > > On Fri, Jan 22, 2021 at 4:43 PM Calvin Johnson > > > > > > > wrote: > > > > > > > > > > > > > > > > Introduce ACPI mechanism to get PHYs registered on a MDIO bus > > > > > > > > and > > > > > > > > provide them to be connected to MAC. > > > > > > > > > > > > > > > > Describe properties "phy-handle" and "phy-mode". > > > > > > > > > > > > > > > > Signed-off-by: Calvin Johnson > > > > > > > > --- > > > > > > > > > > > > > > > > Changes in v4: > > > > > > > > - More cleanup > > > > > > > > > > > > > > This looks much better that the previous versions IMV, some nits > > > > > > > below. > > > > > > > > > > > > > > > Changes in v3: None > > > > > > > > Changes in v2: > > > > > > > > - Updated with more description in document > > > > > > > > > > > > > > > > Documentation/firmware-guide/acpi/dsd/phy.rst | 129 > > > > > > > > ++ > > > > > > > > 1 file changed, 129 insertions(+) > > > > > > > > create mode 100644 > > > > > > > > Documentation/firmware-guide/acpi/dsd/phy.rst > > > > > > > > > > > > > > > > diff --git a/Documentation/firmware-guide/acpi/dsd/phy.rst > > > > > > > > b/Documentation/firmware-guide/acpi/dsd/phy.rst > > > > > > > > new file mode 100644 > > > > > > > > index ..76fca994bc99 > > > > > > > > --- /dev/null > > > > > > > > +++ b/Documentation/firmware-guide/acpi/dsd/phy.rst > > > > > > > > @@ -0,0 +1,129 @@ > > > > > > > > +.. 
SPDX-License-Identifier: GPL-2.0 > > > > > > > > + > > > > > > > > += > > > > > > > > +MDIO bus and PHYs in ACPI > > > > > > > > += > > > > > > > > + > > > > > > > > +The PHYs on an MDIO bus [1] are probed and registered using > > > > > > > > +fwnode_mdiobus_register_phy(). > > > > > > > > > > > > > > Empty line here, please. > > > > > > > > > > > > > > > +Later, for connecting these PHYs to MAC, the PHYs registered > > > > > > > > on the > > > > > > > > +MDIO bus have to be referenced. > > > > > > > > + > > > > > > > > +The UUID given below should be used as mentioned in the > > > > > > > > "Device Properties > > > > > > > > +UUID For _DSD" [2] document. > > > > > > > > + - UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301 > > > > > > > > > > > > > > I would drop the above paragraph. > > > > > > > > > > > > > > > + > > > > > > > > +This document introduces two _DSD properties that are to be > > > > > > > > used > > > > > > > > +for PHYs on the MDIO bus.[3] > > > > > > > > > > > > > > I'd say "for connecting PHYs on the MDIO bus [3] to the MAC > > > > > > > layer." > > > > > > > above and add the following here: > > > > > > > > > > > > > > "These properties are defined in accordance with the "Device > > > > > > > Properties UUID For _DSD" [2] document and the > > > > > > > daffd814-6eba-4d8c-8a91-bc9bbf4aa301 UUID must be used in the > > > > > > > Device > > > > > > > Data Descriptors containing them." > > > > > > > > > > > > > > > + > > > > > > > > +phy-handle > > > > > > > > +-- > > > > > > > > +For each MAC node, a device property "phy-handle" is used to > > > > > > > > reference > > > > > > > > +the PHY that is registered on an MDIO bus. This is mandatory > > > > > > > > for > > > > > > > > +network interfaces that have PHYs connected to MAC via MDIO > > > > > > > > bus. 
> > > > > > > > + > > > > > > > > +During the MDIO bus driver initialization, PHYs on this bus > > > > > > > > are probed > > > > > > > > +using the _ADR object as shown below and are registered on the > > > > > > > > MDIO bus. > > > > > > > > > > > > > > Do you want to mention the "reg" property here? I think it would > > > > > > > be > > > > > > > useful to do that. > > > > > > > > > > > > No. I think we should adhere to _ADR in MDIO case. The "reg" > > > > > > property for ACPI > > > > > > may be useful for other use cases that Andy is aware of. > > > > > > > > > > The code should reflect this, then. I mean it sounds like you want to > > > > > check the "reg" property only if this is a non-ACPI node. > > > > > > > > Right. For MDIO case, that is what is required. > > > > "reg" for DT and "_ADR" for ACPI. > > > > > > > > However, Andy pointed out [1] that ACPI nodes can also hold reg > > > > property and > > > > therefore, fwnode_get_id() need to be capable to handling that > > > > situation as > > > > well. > > > > > > No, please don't confuse those two things. > > > >
[PATCH net] rxrpc: Fix deadlock around release of dst cached on udp tunnel
AF_RXRPC sockets use UDP ports in encap mode. This causes the socket and dst from an incoming packet to get stolen and attached to the UDP socket, from whence it is leaked when that socket is closed. When a network namespace is removed, the wait for dst records to be cleaned up happens before the cleanup of the rxrpc and UDP sockets, meaning that the wait never finishes. Fix this by moving the rxrpc (and, by dependence, the afs) private per-network namespace registrations to the device group rather than the subsys group. This allows cached rxrpc local endpoints to be cleared and their UDP sockets closed before we try waiting for the dst records. The symptom is that lines looking like the following: unregister_netdevice: waiting for lo to become free get emitted at regular intervals after running something like the referenced syzbot test. Thanks to Vadim for tracking this down and working out the fix. Reported-by: syzbot+df400f2f24a1677cd...@syzkaller.appspotmail.com Reported-by: Vadim Fedorenko Fixes: 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook") Signed-off-by: David Howells --- fs/afs/main.c|6 +++--- net/rxrpc/af_rxrpc.c |6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/fs/afs/main.c b/fs/afs/main.c index accdd8970e7c..b2975256dadb 100644 --- a/fs/afs/main.c +++ b/fs/afs/main.c @@ -193,7 +193,7 @@ static int __init afs_init(void) goto error_cache; #endif - ret = register_pernet_subsys(&afs_net_ops); + ret = register_pernet_device(&afs_net_ops); if (ret < 0) goto error_net; @@ -213,7 +213,7 @@ static int __init afs_init(void) error_proc: afs_fs_exit(); error_fs: - unregister_pernet_subsys(&afs_net_ops); + unregister_pernet_device(&afs_net_ops); error_net: #ifdef CONFIG_AFS_FSCACHE fscache_unregister_netfs(&afs_cache_netfs); @@ -244,7 +244,7 @@ static void __exit afs_exit(void) proc_remove(afs_proc_symlink); afs_fs_exit(); - unregister_pernet_subsys(&afs_net_ops); + unregister_pernet_device(&afs_net_ops); #ifdef CONFIG_AFS_FSCACHE 
fscache_unregister_netfs(&afs_cache_netfs); #endif diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index 0a2f4817ec6c..41671af6b33f 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -990,7 +990,7 @@ static int __init af_rxrpc_init(void) goto error_security; } - ret = register_pernet_subsys(&rxrpc_net_ops); + ret = register_pernet_device(&rxrpc_net_ops); if (ret) goto error_pernet; @@ -1035,7 +1035,7 @@ static int __init af_rxrpc_init(void) error_sock: proto_unregister(&rxrpc_proto); error_proto: - unregister_pernet_subsys(&rxrpc_net_ops); + unregister_pernet_device(&rxrpc_net_ops); error_pernet: rxrpc_exit_security(); error_security: @@ -1057,7 +1057,7 @@ static void __exit af_rxrpc_exit(void) unregister_key_type(&key_type_rxrpc); sock_unregister(PF_RXRPC); proto_unregister(&rxrpc_proto); - unregister_pernet_subsys(&rxrpc_net_ops); + unregister_pernet_device(&rxrpc_net_ops); ASSERTCMP(atomic_read(&rxrpc_n_tx_skbs), ==, 0); ASSERTCMP(atomic_read(&rxrpc_n_rx_skbs), ==, 0);
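A toy model, under stated assumptions, of why switching from register_pernet_subsys() to register_pernet_device() changes teardown order. It assumes the layout in net/core/net_namespace.c, where subsys ops are inserted just before the first device op in a single list, device ops are appended at the tail, and namespace cleanup runs exit methods in reverse list order. Everything here is hypothetical: the fixed-size array, the helper names, and "netdev_teardown", which stands in for whichever device-level exit ends up printing "waiting for lo to become free".

```c
#include <string.h>

#define MAX_OPS 16

/* Toy stand-in for pernet_list: [subsys ops ...][device ops ...]. */
static const char *pernet_list[MAX_OPS];
static int nr_ops;
static int first_device;	/* index of the first device op */

/* Subsys ops are inserted just before the first device op. */
static int register_subsys(const char *name)
{
	if (nr_ops >= MAX_OPS)
		return -1;
	memmove(&pernet_list[first_device + 1], &pernet_list[first_device],
		(size_t)(nr_ops - first_device) * sizeof(pernet_list[0]));
	pernet_list[first_device++] = name;
	nr_ops++;
	return 0;
}

/* Device ops are appended at the tail. */
static int register_device(const char *name)
{
	if (nr_ops >= MAX_OPS)
		return -1;
	pernet_list[nr_ops++] = name;
	return 0;
}

/* Teardown walks the list backwards; return the 0-based exit slot of name. */
static int exit_position(const char *name)
{
	int pos = 0;

	for (int i = nr_ops - 1; i >= 0; i--, pos++)
		if (strcmp(pernet_list[i], name) == 0)
			return pos;
	return -1;
}
```

In this model, an op registered as a subsys later than the device-level teardown still exits after it, whereas the same op registered as a device exits first. That matches the commit message's point that the rxrpc local endpoints and their UDP sockets must go away before the dst wait starts.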
Re: [PATCH 1/3] serial: 8250: Handle UART without interrupt on TEMT using em485
On 2021-01-29 6:22 a.m., Andy Shevchenko wrote: > On Thu, Jan 28, 2021 at 06:36:27PM -0500, Eric Tremblay wrote: >> The patch introduce the UART_CAP_TEMT capability which is by default >> assigned to all 8250 UART since the code assume that device has the >> interrupt on TEMT > You have missed periods in the sentences here and there. Please, check the > grammar and punctuation everywhere. > >> In the case where the device does not support it, we calculate the >> maximum of time it could take for the transmitter to empty the > maximum time > >> shift register. When we get in the situation where we get the >> THRE interrupt but the TEMT bit is not set we start the timer >> and recall __stop_tx after the delay > __stop_tx() I will review the grammar and spelling; thanks for mentioning it. > > ... > >> /* initialize data */ >> -up.capabilities = UART_CAP_FIFO | UART_CAP_MINI; >> +data->uart.capabilities = UART_CAP_FIFO | UART_CAP_MINI | UART_CAP_TEMT; > I didn't get, if you state that CAP_TEMT is default on all UARTs, why you have > this? It's a merge mistake, sorry about that. The next version will use the reverse capability, as Jiri Slaby suggested, so there will be no need to modify other drivers. > >> -up.capabilities = UART_CAP_FIFO; >> +up.capabilities = UART_CAP_FIFO | UART_CAP_TEMT; > And so this? > > ... > >> +static inline void serial8250_em485_update_temt_delay(struct uart_8250_port >> *p, >> +unsigned int cflag, unsigned int baud) >> +{ >> +unsigned int bits; >> + >> +if (!p->em485) >> +return; >> + >> +/* byte size and parity */ >> +switch (cflag & CSIZE) { >> +case CS5: >> +bits = 7; >> +break; >> +case CS6: >> +bits = 8; >> +break; >> +case CS7: >> +bits = 9; >> +break; >> +default: >> +bits = 10; >> +break; /* CS8 */ >> +} >> + >> +if (cflag & CSTOPB) >> +bits++; >> +if (cflag & PARENB) >> +bits++; > This is repetition of uart_update_timeout(). Find a way to deduplicate. > >> +p->em485->no_temt_delay = bits*1000000/baud; > Use spaces.
> Is this magic should be defined as HZ_PER_MHZ? > >> +} > ... > >> +static void start_hrtimer_us(struct hrtimer *hrt, unsigned long usec) >> +{ >> +long sec = usec / 1000000; >> +long nsec = (usec % 1000000) * 1000; > > USEC_PER_SEC > NSEC_PER_USEC > >> +ktime_t t = ktime_set(sec, nsec); >> + >> +hrtimer_start(hrt, t, HRTIMER_MODE_REL); >> +} > ... > >> +if ((lsr & BOTH_EMPTY) != BOTH_EMPTY) { >> +/* >> + * On devices with no interrupt on TEMT available > "with no TEMT interrupt available" > >> + * start a timer for a byte time, the timer will recall >> + * __stop_tx > __stop_tx(). > >> + */ >> +if (!(p->capabilities & UART_CAP_TEMT) && (lsr & >> UART_LSR_THRE)) { >> +em485->active_timer = &em485->no_temt_timer; >> +start_hrtimer_us(&em485->no_temt_timer, >> em485->no_temt_delay); >> +} > Perhaps > if ((p->capabilities & UART_CAP_TEMT) && (lsr & > UART_LSR_THRE)) > return; > > em485->active_timer = &em485->no_temt_timer; > start_hrtimer_us(&em485->no_temt_timer, > em485->no_temt_delay); > > ? I also prefer that form; I will apply it in the next version. > >> return; >> +}
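A userspace sketch of the arithmetic under review: count the bits in one character frame from the termios cflag (start bit, data bits, stop bit(s), optional parity), convert that to a worst-case drain time in microseconds, then split the delay into the (sec, nsec) pair that ktime_set() takes. It assumes the intended scale is one million microseconds per second (the USEC_PER_SEC / HZ_PER_MHZ value the review points to). The helper names are hypothetical, and the CSIZE/CSTOPB/PARENB flags come from <termios.h> rather than the kernel's headers.

```c
#include <termios.h>

/*
 * Worst-case time, in microseconds, for one character to leave the
 * transmit shift register. baud must be non-zero.
 */
static unsigned int char_time_us(tcflag_t cflag, unsigned int baud)
{
	unsigned int bits;

	/* start bit + data bits + one stop bit */
	switch (cflag & CSIZE) {
	case CS5:
		bits = 7;
		break;
	case CS6:
		bits = 8;
		break;
	case CS7:
		bits = 9;
		break;
	default: /* CS8 */
		bits = 10;
		break;
	}
	if (cflag & CSTOPB)	/* second stop bit */
		bits++;
	if (cflag & PARENB)	/* parity bit */
		bits++;

	return bits * 1000000U / baud;	/* truncates toward zero */
}

/* Split a microsecond delay into whole seconds and remaining nanoseconds. */
static void us_to_sec_nsec(unsigned long usec, long *sec, long *nsec)
{
	*sec = (long)(usec / 1000000UL);
	*nsec = (long)((usec % 1000000UL) * 1000UL);
}
```

For example, 8N1 at 9600 baud gives 10 bits and 1041 us after integer truncation. As the review notes, the frame-size switch duplicates uart_update_timeout(), so a real patch would likely share that logic instead of repeating it.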
Re: [PATCH v9 1/7] ACPI: scan: Obtain device's desired enumeration power state
Hi Rafael, Thanks for the comments. On Fri, Jan 29, 2021 at 03:07:57PM +0100, Rafael J. Wysocki wrote: > On Fri, Jan 29, 2021 at 12:27 AM Sakari Ailus > wrote: > > > > Store a device's desired enumeration power state in struct > > acpi_device_power_flags during acpi_device object's initialisation. > > > > Signed-off-by: Sakari Ailus > > --- > > drivers/acpi/scan.c | 6 ++ > > include/acpi/acpi_bus.h | 3 ++- > > 2 files changed, 8 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c > > index 1d7a02ee45e05..b077c645c9845 100644 > > --- a/drivers/acpi/scan.c > > +++ b/drivers/acpi/scan.c > > @@ -987,6 +987,8 @@ static void acpi_bus_init_power_state(struct > > acpi_device *device, int state) > > > > static void acpi_bus_get_power_flags(struct acpi_device *device) > > { > > + unsigned long long pre; > > + acpi_status status; > > u32 i; > > > > /* Presence of _PS0|_PR0 indicates 'power manageable' */ > > @@ -1008,6 +1010,10 @@ static void acpi_bus_get_power_flags(struct > > acpi_device *device) > > if (acpi_has_method(device->handle, "_DSW")) > > device->power.flags.dsw_present = 1; > > > > + status = acpi_evaluate_integer(device->handle, "_PRE", NULL, &pre); > > + if (ACPI_SUCCESS(status) && !pre) > > + device->power.flags.allow_low_power_probe = 1; > > While this is what has been discussed and thanks for taking it into > account, I'm now thinking that it may be cleaner to introduce a new > object to return the deepest power state of the device in which it can > be enumerated, say _DSE (Device State for Enumeration) such that 4 > means D3cold, 3 - D3hot and so on, so the above check can be replaced > with something like > > status = acpi_evaluate_integer(device->handle, "_PRE", NULL, &dse); s/_PRE/_DSE/ ? > if (ACPI_FAILURE(status)) ACPI_SUCCESS? 
> device->power.state_for_enumeratin = dse; > > And then, it is a matter of comparing ->power.state_for_enumeratin > with ->power.state and putting the device into D0 if the former is > shallower than the latter. > > What do you think? Sounds good. How about calling the function e.g. acpi_device_resume_for_probe(), so runtime PM could be used to resume the device if the function returns true? -- Kind regards, Sakari Ailus
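A sketch of the comparison Rafael describes, under stated assumptions: power states are numbered as in his proposal (0 = D0 through 4 = D3cold), and the device is put into D0 before probe only when the deepest state allowed for enumeration is shallower, that is numerically smaller, than its current state. The helper below is hypothetical and only loosely corresponds to the acpi_device_resume_for_probe() name Sakari floats. The fallback to D0 when no _DSE object exists is an assumption chosen to keep today's behaviour.

```c
#include <stdbool.h>

/* ACPI device power states, numbered as in the proposal (4 = D3cold). */
enum dstate {
	DSTATE_D0 = 0,
	DSTATE_D1 = 1,
	DSTATE_D2 = 2,
	DSTATE_D3HOT = 3,
	DSTATE_D3COLD = 4,
};

/*
 * Hypothetical helper: must the device be put into D0 before probe?
 * has_dse/dse describe the result of evaluating the proposed _DSE object;
 * cur_state is the power state the device is currently in.
 */
static bool needs_d0_for_probe(bool has_dse, enum dstate dse,
			       enum dstate cur_state)
{
	/* Assumption: without _DSE, keep the existing behaviour (probe in D0). */
	enum dstate deepest_ok = has_dse ? dse : DSTATE_D0;

	/* Resume only if the allowed state is shallower than the current one. */
	return deepest_ok < cur_state;
}
```

Returning a bool in this way would let the caller decide whether to resume the device through runtime PM, as suggested in the reply.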
linux-next-20210129: drivers/media/platform/mtk-vcodec/
on i386: ld: drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.o: in function `mtk_vcodec_dec_clock_on': mtk_vcodec_dec_pm.c:(.text+0xff): undefined reference to `mtk_smi_larb_get' ld: drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.o: in function `mtk_vcodec_dec_clock_off': mtk_vcodec_dec_pm.c:(.text+0x180): undefined reference to `mtk_smi_larb_put' ld: drivers/media/platform/mtk-vcodec/mtk_vcodec_enc_pm.o: in function `mtk_vcodec_enc_clock_on': mtk_vcodec_enc_pm.c:(.text+0xd0): undefined reference to `mtk_smi_larb_get' ld: mtk_vcodec_enc_pm.c:(.text+0xf3): undefined reference to `mtk_smi_larb_get' ld: mtk_vcodec_enc_pm.c:(.text+0x114): undefined reference to `mtk_smi_larb_put' ld: drivers/media/platform/mtk-vcodec/mtk_vcodec_enc_pm.o: in function `mtk_vcodec_enc_clock_off': mtk_vcodec_enc_pm.c:(.text+0x181): undefined reference to `mtk_smi_larb_put' ld: mtk_vcodec_enc_pm.c:(.text+0x189): undefined reference to `mtk_smi_larb_put' Full randconfig file is attached. -- ~Randy Reported-by: Randy Dunlap config-r7503.gz Description: application/gzip
Re: [PATCH] x86: Disable CET instrumentation in the kernel
On Fri, Jan 29, 2021 at 05:30:48PM +0100, Borislav Petkov wrote: > On Fri, Jan 29, 2021 at 09:10:34AM -0600, Josh Poimboeuf wrote: > > Maybe eventually. But the enablement (actually enabling CET/CFI/etc) > > happens in the arch code anyway, right? So it could be a per-arch > > decision. > > Right. > > Ok, for this one, what about > > Cc: > > ? > > What are "some configurations of GCC"? If it can be reproduced with > what's released out there, maybe that should go in now, even for 5.11? > > Hmm? Agreed, stable is a good idea. I think Nikolay saw it with GCC 9. -- Josh
[PATCH v2] selinux: measure state and policy capabilities
SELinux stores the configuration state and the policy capabilities in kernel memory. Changes to this data at runtime would have an impact on the security guarantees provided by SELinux. Measuring this data through the IMA subsystem provides a tamper-resistant way for an attestation service to remotely validate it at runtime. Measure the configuration state and policy capabilities by calling the IMA hook ima_measure_critical_data(). To enable SELinux data measurement, the following steps are required: 1. Add "ima_policy=critical_data" to the kernel command line arguments to enable measuring SELinux data at boot time. For example, BOOT_IMAGE=/boot/vmlinuz-5.11.0-rc3+ root=UUID=fd643309-a5d2-4ed3-b10d-3c579a5fab2f ro nomodeset security=selinux ima_policy=critical_data 2. Add the following rule to /etc/ima/ima-policy: measure func=CRITICAL_DATA label=selinux Sample measurement of SELinux state and policy capabilities: 10 2122...65d8 ima-buf sha256:13c2...1292 selinux-state 696e...303b Execute the following command to extract the measured data from IMA's runtime measurements list: grep "selinux-state" /sys/kernel/security/integrity/ima/ascii_runtime_measurements | tail -1 | cut -d' ' -f 6 | xxd -r -p The output should be a list of key-value pairs. 
For example, initialized=1;enforcing=0;checkreqprot=1;network_peer_controls=1;open_perms=1;extended_socket_class=1;always_check_network=0;cgroup_seclabel=1;nnp_nosuid_transition=1;genfs_seclabel_symlinks=0; To verify the measurement is consistent with the current SELinux state reported on the system, compare the integer values in the following files with those set in the IMA measurement (using the following commands): - cat /sys/fs/selinux/enforce - cat /sys/fs/selinux/checkreqprot - cat /sys/fs/selinux/policy_capabilities/[capability_file] Note that the actual verification would be against an expected state and done on a separate system (likely an attestation server) requiring "initialized=1;enforcing=1;checkreqprot=0;" for a secure state and then whatever policy capabilities are actually set in the expected policy (which can be extracted from the policy itself via seinfo, for example). Signed-off-by: Lakshmi Ramasubramanian Suggested-by: Stephen Smalley Suggested-by: Paul Moore --- security/selinux/ima.c | 77 -- security/selinux/include/ima.h | 6 +++ security/selinux/selinuxfs.c | 6 +++ security/selinux/ss/services.c | 2 +- 4 files changed, 86 insertions(+), 5 deletions(-) diff --git a/security/selinux/ima.c b/security/selinux/ima.c index 03715893ff97..5c7f73cd1117 100644 --- a/security/selinux/ima.c +++ b/security/selinux/ima.c @@ -13,18 +13,73 @@ #include "ima.h" /* - * selinux_ima_measure_state - Measure hash of the SELinux policy + * selinux_ima_collect_state - Read selinux configuration settings * - * @state: selinux state struct + * @state: selinux_state * - * NOTE: This function must be called with policy_mutex held. + * On success returns the configuration settings string. + * On error, returns NULL. 
*/ -void selinux_ima_measure_state(struct selinux_state *state) +static char *selinux_ima_collect_state(struct selinux_state *state) +{ + const char *on = "=1;", *off = "=0;"; + char *buf; + int buf_len, i; + + /* +* Size of the following string including the terminating NULL char +*initialized=0;enforcing=0;checkreqprot=0; +*/ + buf_len = 42; + for (i = 0; i < __POLICYDB_CAPABILITY_MAX; i++) + buf_len += strlen(selinux_policycap_names[i]) + 3; + + buf = kzalloc(buf_len, GFP_KERNEL); + if (!buf) + return NULL; + + strscpy(buf, "initialized", buf_len); + strlcat(buf, selinux_initialized(state) ? on : off, buf_len); + + strlcat(buf, "enforcing", buf_len); + strlcat(buf, enforcing_enabled(state) ? on : off, buf_len); + + strlcat(buf, "checkreqprot", buf_len); + strlcat(buf, checkreqprot_get(state) ? on : off, buf_len); + + for (i = 0; i < __POLICYDB_CAPABILITY_MAX; i++) { + strlcat(buf, selinux_policycap_names[i], buf_len); + strlcat(buf, state->policycap[i] ? on : off, buf_len); + } + + return buf; +} + +/* + * selinux_ima_measure_state_locked - Measure SELinux state and hash of policy + * + * @state: selinux state struct + */ +void selinux_ima_measure_state_locked(struct selinux_state *state) { + char *state_str = NULL; void *policy = NULL; size_t policy_len; int rc = 0; + WARN_ON(!mutex_is_locked(&state->policy_mutex)); + + state_str = selinux_ima_collect_state(state); + if (!state_str) { + pr_err("SELinux: %s: failed to read state.\n", __func__); + return; + } + + ima_measure_critical_data("selinux", "selinux-state", + state_str, strlen(state_str),
Re: [PATCH] pwm: fix semicolon.cocci warnings
On 1/28/21 10:57 PM, Uwe Kleine-König wrote: Hello, On Thu, Jan 28, 2021 at 09:45:37PM +0800, kernel test robot wrote: From: kernel test robot drivers/pwm/pwm-lpc18xx-sct.c:292:2-3: Unneeded semicolon Remove unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci Fixes: e96c0ff4b1e0 ("pwm: Enable compile testing for some of drivers") This looks wrong. e96c0ff4b1e0 only touches drivers/pwm/Kconfig. The ; was introduced by commit 841e6f90bb78 ("pwm: NXP LPC18xx PWM/SCT driver") Right, thank you for the correction, Uwe. Since the patch has been composed by the robot, it has to be fixed in the first place. And regarding this particular change and in general fixes to this type of issues detected by the robot, I don't think that it earns a Fixes tag. CC: Krzysztof Kozlowski Reported-by: kernel test robot Signed-off-by: kernel test robot -- Best wishes, Vladimir
Re: [PATCH] x86: Disable CET instrumentation in the kernel
On 29.01.21 at 18:49, Josh Poimboeuf wrote: > Agreed, stable is a good idea. I think Nikolay saw it with GCC 9. Yes I did, with the default Ubuntu compiler as well as the default gcc-10 compiler: # gcc -v -Q -O2 --help=target | grep protection gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04) COLLECT_GCC_OPTIONS='-v' '-Q' '-O2' '--help=target' '-mtune=generic' '-march=x86-64' /usr/lib/gcc/x86_64-linux-gnu/9/cc1 -v -imultiarch x86_64-linux-gnu help-dummy -dumpbase help-dummy -mtune=generic -march=x86-64 -auxbase help-dummy -O2 -version --help=target -fasynchronous-unwind-tables -fstack-protector-strong -Wformat -Wformat-security -fstack-clash-protection -fcf-protection -o /tmp/ccSecttk.s GNU C17 (Ubuntu 9.3.0-17ubuntu1~20.04) version 9.3.0 (x86_64-linux-gnu) compiled by GNU C version 9.3.0, GMP version 6.2.0, MPFR version 4.0.2, MPC version 1.1.0, isl version isl-0.22.1-GMP It has -fcf-protection turned on by default it seems.
Re: [PATCH v9 1/7] ACPI: scan: Obtain device's desired enumeration power state
On Fri, Jan 29, 2021 at 5:45 PM Sakari Ailus wrote: > > Hi Rafael, > > Thanks for the comments. > > On Fri, Jan 29, 2021 at 03:07:57PM +0100, Rafael J. Wysocki wrote: > > On Fri, Jan 29, 2021 at 12:27 AM Sakari Ailus > > wrote: > > > > > > Store a device's desired enumeration power state in struct > > > acpi_device_power_flags during acpi_device object's initialisation. > > > > > > Signed-off-by: Sakari Ailus > > > --- > > > drivers/acpi/scan.c | 6 ++ > > > include/acpi/acpi_bus.h | 3 ++- > > > 2 files changed, 8 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c > > > index 1d7a02ee45e05..b077c645c9845 100644 > > > --- a/drivers/acpi/scan.c > > > +++ b/drivers/acpi/scan.c > > > @@ -987,6 +987,8 @@ static void acpi_bus_init_power_state(struct > > > acpi_device *device, int state) > > > > > > static void acpi_bus_get_power_flags(struct acpi_device *device) > > > { > > > + unsigned long long pre; > > > + acpi_status status; > > > u32 i; > > > > > > /* Presence of _PS0|_PR0 indicates 'power manageable' */ > > > @@ -1008,6 +1010,10 @@ static void acpi_bus_get_power_flags(struct > > > acpi_device *device) > > > if (acpi_has_method(device->handle, "_DSW")) > > > device->power.flags.dsw_present = 1; > > > > > > + status = acpi_evaluate_integer(device->handle, "_PRE", NULL, > > > &pre); > > > + if (ACPI_SUCCESS(status) && !pre) > > > + device->power.flags.allow_low_power_probe = 1; > > > > While this is what has been discussed and thanks for taking it into > > account, I'm now thinking that it may be cleaner to introduce a new > > object to return the deepest power state of the device in which it can > > be enumerated, say _DSE (Device State for Enumeration) such that 4 > > means D3cold, 3 - D3hot and so on, so the above check can be replaced > > with something like > > > > status = acpi_evaluate_integer(device->handle, "_PRE", NULL, &dse); > > s/_PRE/_DSE/ > > ? Yes, sorry. > > > if (ACPI_FAILURE(status)) > > ACPI_SUCCESS? Yup. 
> > device->power.state_for_enumeration = dse; > > > > And then, it is a matter of comparing ->power.state_for_enumeration > > with ->power.state and putting the device into D0 if the former is > > shallower than the latter. > > > > What do you think? > > Sounds good. How about calling the function e.g. > acpi_device_resume_for_probe(), so runtime PM could be used to resume the > device if the function returns true? I'd rather try to power it up before enabling runtime PM, because in order to do the latter properly, you need to know if the device is active or suspended to start with. So you need something like (pseudo-code) if (this_device_needs_to_be_on(ACPI_COMPANION(dev))) { acpi_device_set_power(ACPI_COMPANION(dev), ACPI_STATE_D0); pm_runtime_set_active(dev); } else { pm_runtime_set_suspended(dev); } and then you can enable PM-runtime.
Re: [PATCH v2] KVM: x86: Allow guests to see MSR_IA32_TSX_CTRL even if tsx=off
On Fri, Jan 29, 2021, Paolo Bonzini wrote: > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 76bce832cade..15733013b266 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -1401,7 +1401,7 @@ static u64 kvm_get_arch_capabilities(void) >*This lets the guest use VERW to clear CPU buffers. This comment should be updated to call out the new TSX_CTRL behavior. /* * On TAA affected systems: * - nothing to do if TSX is disabled on the host. * - we emulate TSX_CTRL if present on the host. *This lets the guest use VERW to clear CPU buffers. */ >*/ > if (!boot_cpu_has(X86_FEATURE_RTM)) > - data &= ~(ARCH_CAP_TAA_NO | ARCH_CAP_TSX_CTRL_MSR); > + data &= ~ARCH_CAP_TAA_NO; Hmm, simply clearing TSX_CTRL will only preserve the host value. Since ARCH_CAPABILITIES is unconditionally emulated by KVM, wouldn't it make sense to unconditionally expose TSX_CTRL as well, as opposed to exposing it only if it's supported in the host? I.e. allow migrating a TSX-disabled guest to a host without TSX. Or am I misunderstanding how TSX_CTRL is checked/used? > else if (!boot_cpu_has_bug(X86_BUG_TAA)) > data |= ARCH_CAP_TAA_NO; > > -- > 2.26.2 >
Re: [PATCH] bus: mvebu-mbus: make iounmap() symmetric with ioremap()
Hi, > On Fri, 29 Jan 2021 17:01:35 +0100 > Gregory CLEMENT wrote: > >> Could you sent me the patch I don't have it in my emails boxes. > > https://lore.kernel.org/lkml/20201112032149.21906-1-chris.pack...@alliedtelesis.co.nz/raw Applied on mvebu/arm Thanks, Gregory > > Thomas > -- > Thomas Petazzoni, CTO, Bootlin > Embedded Linux and Kernel engineering > https://bootlin.com -- Gregory Clement, Bootlin Embedded Linux and Kernel engineering http://bootlin.com
Re: [PATCH v2] x86/debug: Fix DR6 handling
On 1/29/21 3:48 PM, Borislav Petkov wrote: > On Thu, Jan 28, 2021 at 10:16:27PM +0100, Peter Zijlstra wrote: >> >> Tom reported that one of the GDB test-cases failed, and Boris bisected >> it to commit: >> >> d53d9bc0cf78 ("x86/debug: Change thread.debugreg6 to thread.virtual_dr6") >> >> The debugging session led us to commit: >> >> 6c0aca288e72 ("x86: Ignore trap bits on single step exceptions") >> >> It turns out that TF and data breakpoints are both traps and will be >> merged, while instruction breakpoints are faults and will not be >> merged. This means 6c0aca288e72 is wrong, we only need to exclude TF >> and instruction breakpoints while we can merge TF and data >> breakpoints. >> >> Fixes: d53d9bc0cf78 ("x86/debug: Change thread.debugreg6 to >> thread.virtual_dr6") >> Fixes: 6c0aca288e72 ("x86: Ignore trap bits on single step exceptions") >> Reported-by: Tom de Vries >> Bisected-by: Borislav Petkov >> Signed-off-by: Peter Zijlstra (Intel) > > I guess > > Cc: > > Also, > > Reviewed-by: Borislav Petkov > > And gdb testsuite is a bit happier: > > --- before > +++ after > === gdb Summary === > > -# of expected passes70822 > -# of unexpected failures899 > +# of expected passes70852 > +# of unexpected failures869 > # of expected failures 74 > # of known failures 99 > # of untested testcases 114 > > You just fixed 30(!) testcases. > > :-) > Hi Boris, thanks for testing this, and just to confirm: the total number of regressions I see in the gdb testsuite related to watchpoints is indeed 30. Thanks, - Tom
linux-next-20210129: drivers/iommu/intel/dmar.c
on x86_64: ../drivers/iommu/intel/dmar.c: In function 'qi_submit_sync': ../drivers/iommu/intel/dmar.c:1311:3: error: implicit declaration of function 'trace_qi_submit'; did you mean 'ftrace_nmi_exit'? [-Werror=implicit-function-declaration] trace_qi_submit(iommu, desc[i].qw0, desc[i].qw1, ^~~ ftrace_nmi_exit Full randconfig file is attached. -- ~Randy Reported-by: Randy Dunlap config-r7511.gz Description: application/gzip
Re: [PATCH v3 2/3] perf/smmuv3: Add a MODULE_SOFTDEP() to indicate dependency on SMMU
On 2021-01-29 15:34, John Garry wrote: On 29/01/2021 15:12, Robin Murphy wrote: On 2021-01-27 11:32, Zhen Lei wrote: The MODULE_SOFTDEP() gives user space a hint of the loading sequence. And when command "modprobe arm_smmuv3_pmu" is executed, the arm_smmu_v3.ko is automatically loaded in advance. Why do we need this? If probe order doesn't matter when both drivers are built-in, why should module load order? TBH I'm not sure why we even have a Kconfig dependency on ARM_SMMU_V3, given that the drivers operate completely independently :/ Can that Kconfig dependency just be removed? I think that it was added under the idea that there is no point in having the SMMUv3 PMU driver without the SMMUv3 driver. A PMCG *might* be usable for simply counting transactions to measure device activity regardless of its associated SMMU being enabled. Either way, it's not really Kconfig's job to decide what makes sense (beyond the top-level "can this driver *ever* be used on this platform" visibility choices). Imagine if we gave every PCI/USB/etc. device driver an explicit dependency on at least one host controller driver being enabled... Robin.
Re: [PATCH] x86: Disable CET instrumentation in the kernel
On Fri, Jan 29, 2021 at 06:54:08PM +0200, Nikolay Borisov wrote: > > > On 29.01.21 at 18:49, Josh Poimboeuf wrote: > > Agreed, stable is a good idea. I think Nikolay saw it with GCC 9. > > > Yes I did, with the default Ubuntu compiler as well as the default gcc-10 > compiler: > > # gcc -v -Q -O2 --help=target | grep protection > > gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04) > COLLECT_GCC_OPTIONS='-v' '-Q' '-O2' '--help=target' '-mtune=generic' > '-march=x86-64' > /usr/lib/gcc/x86_64-linux-gnu/9/cc1 -v -imultiarch x86_64-linux-gnu > help-dummy -dumpbase help-dummy -mtune=generic -march=x86-64 -auxbase > help-dummy -O2 -version --help=target -fasynchronous-unwind-tables > -fstack-protector-strong -Wformat -Wformat-security -fstack-clash-protection > -fcf-protection -o /tmp/ccSecttk.s > GNU C17 (Ubuntu 9.3.0-17ubuntu1~20.04) version 9.3.0 (x86_64-linux-gnu) > compiled by GNU C version 9.3.0, GMP version 6.2.0, MPFR version 4.0.2, > MPC version 1.1.0, isl version isl-0.22.1-GMP > > > It has -fcf-protection turned on by default it seems. Yup, explains why I didn't see it: gcc version 10.2.1 20201125 (Red Hat 10.2.1-9) (GCC) COLLECT_GCC_OPTIONS='-v' '-Q' '-O2' '--help=target' '-mtune=generic' '-march=x86-64' /usr/libexec/gcc/x86_64-redhat-linux/10/cc1 -v help-dummy -dumpbase help-dummy -mtune=generic -march=x86-64 -auxbase help-dummy -O2 -version --help=target -o /tmp/cclBz55H.s -- Josh
Re: [v5 PATCH 04/11] mm: vmscan: remove memcg_shrinker_map_size
On Fri, Jan 29, 2021 at 3:22 AM Vlastimil Babka wrote: > > On 1/28/21 10:22 PM, Yang Shi wrote: > >> > @@ -266,12 +265,13 @@ int alloc_shrinker_maps(struct mem_cgroup *memcg) > >> > static int expand_shrinker_maps(int new_id) > >> > { > >> > int size, old_size, ret = 0; > >> > + int new_nr_max = new_id + 1; > >> > struct mem_cgroup *memcg; > >> > > >> > - size = DIV_ROUND_UP(new_id + 1, BITS_PER_LONG) * sizeof(unsigned > >> > long); > >> > - old_size = memcg_shrinker_map_size; > >> > + size = (new_nr_max / BITS_PER_LONG + 1) * sizeof(unsigned long); > >> > + old_size = (shrinker_nr_max / BITS_PER_LONG + 1) * sizeof(unsigned > >> > long); > >> > >> What's wrong with using DIV_ROUND_UP() here? > > > > I don't think there is anything wrong with DIV_ROUND_UP. Should be > > just different taste and result in shorter statement. > > IMHO it's not just taste. DIV_ROUND_UP() says what it does and you don't need > to > guess it from the math expression. Also your expression is shorter as it > simply > adds + 1, so if shrinker_nr_max is a multiple of BITS_PER_LONG, there's an > extra > unsigned long that shouldn't be needed. People reading that code will wonder > whether there was some non-obvious intention behind that, and possibly send > cleanup patches. OK, will replace back to DIV_ROUND_UP(). And, a helper macro is introduced in patch #6, will add that helper in this patch and use DIV_ROUND_UP() in the helper. > > >> > >> > if (size <= old_size) > >> > - return 0; > >> > + goto out; > >> > >> Can this even happen? Seems to me it can't, so just remove this? > > > > Yes, it can. The maps use unsigned long value for bitmap, so any > > shrinker ID < 31 would fall into the same unsigned long, so we may see > > size <= old_size, but we need increase shrinker_nr_max since > > expand_shrinker_maps() is called iff id >= shrinker_nr_max. > > Ah, good point.
Re: [PATCH v18 24/25] x86/cet/shstk: Add arch_prctl functions for shadow stack
On 1/27/21 1:25 PM, Yu-cheng Yu wrote: > arch_prctl(ARCH_X86_CET_STATUS, u64 *args) > Get CET feature status. > > The parameter 'args' is a pointer to a user buffer. The kernel returns > the following information: > > *args = shadow stack/IBT status > *(args + 1) = shadow stack base address > *(args + 2) = shadow stack size What's the deal for 32-bit binaries? The in-kernel code looks 64-bit only, but I don't see anything restricting the interface to 64-bit. > +static int copy_status_to_user(struct cet_status *cet, u64 arg2) This has static scope, but it's still awfully generically named. A cet_ prefix would be nice. > +{ > + u64 buf[3] = {0, 0, 0}; > + > + if (cet->shstk_size) { > + buf[0] |= GNU_PROPERTY_X86_FEATURE_1_SHSTK; > + buf[1] = (u64)cet->shstk_base; > + buf[2] = (u64)cet->shstk_size; What's the casting for? > + } > + > + return copy_to_user((u64 __user *)arg2, buf, sizeof(buf)); > +} > + > +int prctl_cet(int option, u64 arg2) > +{ > + struct cet_status *cet; > + unsigned int features; > + > + /* > + * GLIBC's ENOTSUPP == EOPNOTSUPP == 95, and it does not recognize > + * the kernel's ENOTSUPP (524). So return EOPNOTSUPP here. > + */ > + if (!IS_ENABLED(CONFIG_X86_CET)) > + return -EOPNOTSUPP; Let's ignore glibc for a moment. What error code *should* the kernel be returning here? errno(3) says: EOPNOTSUPP Operation not supported on socket (POSIX.1) ... ENOTSUP Operation not supported (POSIX.1) > + cet = &current->thread.cet; > + > + if (option == ARCH_X86_CET_STATUS) > + return copy_status_to_user(cet, arg2); What's the point of doing copy_status_to_user() if the processor doesn't support CET? In other words, shouldn't this be below the CPU feature check? Also, please cast arg2 *here*. It becomes a user pointer here, not at the copy_to_user(). > + if (!static_cpu_has(X86_FEATURE_CET)) > + return -EOPNOTSUPP; So, you went to the trouble of adding a disabled-features.h entry for this. Why not just do: if (cpu_feature_enabled(X86_FEATURE_CET)) ...
instead of the IS_ENABLED() check above? That should get rid of one of these if's. > + switch (option) { > + case ARCH_X86_CET_DISABLE: > + if (cet->locked) > + return -EPERM; > + > + features = (unsigned int)arg2; What's the purpose of this cast? > + if (features & ~GNU_PROPERTY_X86_FEATURE_1_VALID) > + return -EINVAL; > + if (features & GNU_PROPERTY_X86_FEATURE_1_SHSTK) > + cet_disable_shstk(); > + return 0; This doesn't enforce that the high bits of arg2 be 0. Shouldn't we call them reserved and enforce that they be 0? > + case ARCH_X86_CET_LOCK: > + cet->locked = 1; > + return 0; This needs to check for and enforce that arg2==0. > + default: > + return -ENOSYS; > + } > +}
Re: [PATCH] x86: Disable CET instrumentation in the kernel
On Fri, Jan 29, 2021 at 11:03:31AM -0600, Josh Poimboeuf wrote: > On Fri, Jan 29, 2021 at 06:54:08PM +0200, Nikolay Borisov wrote: > > > > > > On 29.01.21 at 18:49, Josh Poimboeuf wrote: > > > Agreed, stable is a good idea. I think Nikolay saw it with GCC 9. > > > > > > Yes I did, with the default Ubuntu compiler as well as the default gcc-10 > > compiler: > > > > # gcc -v -Q -O2 --help=target | grep protection > > > > gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04) > > COLLECT_GCC_OPTIONS='-v' '-Q' '-O2' '--help=target' '-mtune=generic' > > '-march=x86-64' > > /usr/lib/gcc/x86_64-linux-gnu/9/cc1 -v -imultiarch x86_64-linux-gnu > > help-dummy -dumpbase help-dummy -mtune=generic -march=x86-64 -auxbase > > help-dummy -O2 -version --help=target -fasynchronous-unwind-tables > > -fstack-protector-strong -Wformat -Wformat-security > > -fstack-clash-protection -fcf-protection -o /tmp/ccSecttk.s > > GNU C17 (Ubuntu 9.3.0-17ubuntu1~20.04) version 9.3.0 (x86_64-linux-gnu) > > compiled by GNU C version 9.3.0, GMP version 6.2.0, MPFR version 4.0.2, > > MPC version 1.1.0, isl version isl-0.22.1-GMP > > > > > > It has -fcf-protection turned on by default it seems. > > Yup, explains why I didn't see it: > > gcc version 10.2.1 20201125 (Red Hat 10.2.1-9) (GCC) > COLLECT_GCC_OPTIONS='-v' '-Q' '-O2' '--help=target' '-mtune=generic' > '-march=x86-64' > /usr/libexec/gcc/x86_64-redhat-linux/10/cc1 -v help-dummy -dumpbase > help-dummy -mtune=generic -march=x86-64 -auxbase help-dummy -O2 -version > --help=target -o /tmp/cclBz55H.s The fact that you triggered it with an Ubuntu gcc explains why the original patch adding that switch: 29be86d7f9cb ("kbuild: add -fcf-protection=none when using retpoline flags") came from a Canonical. Adding the author to Cc for FYI. Seth, you can find this thread starting here: https://lkml.kernel.org/r/20210128215219.6kct3h2eiustncws@treble Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
[PATCH] dmaengine: xilinx_dma: Alloc tx descriptors GFP_NOWAIT
Use GFP_NOWAIT allocation in xilinx_dma_alloc_tx_descriptor(). This is necessary for compatibility with ALSA, which calls dmaengine_prep_dma_cyclic() from an atomic context. Signed-off-by: Richard Fitzgerald --- drivers/dma/xilinx/xilinx_dma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c index 22faea653ea8..fb046af9ac53 100644 --- a/drivers/dma/xilinx/xilinx_dma.c +++ b/drivers/dma/xilinx/xilinx_dma.c @@ -800,7 +800,7 @@ xilinx_dma_alloc_tx_descriptor(struct xilinx_dma_chan *chan) { struct xilinx_dma_tx_descriptor *desc; - desc = kzalloc(sizeof(*desc), GFP_KERNEL); + desc = kzalloc(sizeof(*desc), GFP_NOWAIT); if (!desc) return NULL; -- 2.20.1
Re: general protection fault in tomoyo_socket_sendmsg_permission
On 2021/01/30 1:05, Shuah Khan wrote: >> Since "general protection fault in tomoyo_socket_sendmsg_permission" is >> caused by >> unexpectedly resetting ud->tcp_socket to NULL without waiting for tx thread >> to >> terminate, tracing the ordering of events is worth knowing. Even adding >> schedule_timeout_uninterruptible() to before kernel_sendmsg() might help. >> > > What about the duplicate bug information that was in my email? > Did you take a look at that? I was not aware of the duplicate bugs. It is interesting that "KASAN: null-ptr-deref Write in event_handler" says that vdev->ud.tcp_tx became NULL at if (vdev->ud.tcp_tx) { /* this location */ kthread_stop_put(vdev->ud.tcp_tx); vdev->ud.tcp_tx = NULL; } which means that somebody else is unexpectedly resetting vdev->ud.tcp_tx to NULL. If memset() from vhci_device_init() from vhci_start() were unexpectedly called, all of tcp_socket, tcp_rx, tcp_tx etc. becomes NULL which can explain these bugs ? I'm inclined to report not only tcp_socket but also other fields when kernel_sendmsg() detected that tcp_socket is NULL...
[PATCH 1/3] dt-bindings: remoteproc: qcom: Add Q6V5 Modem PIL binding for IPQ6018
Add a new modem compatible string for IPQ6018 SoCs Signed-off-by: Gokul Sriram Palanisamy --- Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt b/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt index 69c49c7..7f1d5783 100644 --- a/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt +++ b/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt @@ -9,6 +9,7 @@ on the Qualcomm Hexagon core. Definition: must be one of: "qcom,q6v5-pil", "qcom,ipq8074-wcss-pil" + "qcom,ipq6018-wcss-pil" "qcom,qcs404-wcss-pil" "qcom,msm8916-mss-pil", "qcom,msm8974-mss-pil" @@ -40,6 +41,7 @@ on the Qualcomm Hexagon core. string: qcom,q6v5-pil: qcom,ipq8074-wcss-pil: + qcom,ipq6018-wcss-pil: qcom,qcs404-wcss-pil: qcom,msm8916-mss-pil: qcom,msm8974-mss-pil: @@ -68,6 +70,7 @@ on the Qualcomm Hexagon core. Value type: Definition: The clocks needed depend on the compatible string: qcom,ipq8074-wcss-pil: + qcom,ipq6018-wcss-pil: no clock names required qcom,qcs404-wcss-pil: must be "xo", "gcc_abhs_cbcr", "gcc_abhs_cbcr", @@ -165,6 +168,7 @@ For the compatible string below the following supplies are required: Value type: Definition: The power-domains needed depend on the compatible string: qcom,ipq8074-wcss-pil: + qcom,ipq6018-wcss-pil: no power-domain names required qcom,q6v5-pil: qcom,msm8916-mss-pil: -- 2.7.4
[PATCH 3/3] arm64: dts: ipq6018: Update WCSS PIL driver compatible
Updated WCSS PIL driver node with IPQ6018 specific compatible to enable SoC specific driver data. Signed-off-by: Gokul Sriram Palanisamy --- arch/arm64/boot/dts/qcom/ipq6018.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/qcom/ipq6018.dtsi b/arch/arm64/boot/dts/qcom/ipq6018.dtsi index 9fa5b02..2e6b23b 100644 --- a/arch/arm64/boot/dts/qcom/ipq6018.dtsi +++ b/arch/arm64/boot/dts/qcom/ipq6018.dtsi @@ -477,7 +477,7 @@ }; q6v5_wcss: remoteproc@cd0 { - compatible = "qcom,ipq8074-wcss-pil"; + compatible = "qcom,ipq6018-wcss-pil"; reg = <0x0 0x0cd0 0x0 0x4040>, <0x0 0x004ab000 0x0 0x20>; reg-names = "qdsp6", -- 2.7.4
[PATCH 2/3] remoteproc: qcom: wcss: populate driver data for IPQ6018
Populate hardcoded param using driver data for IPQ6018 SoCs. Signed-off-by: Gokul Sriram Palanisamy --- drivers/remoteproc/qcom_q6v5_wcss.c | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/remoteproc/qcom_q6v5_wcss.c b/drivers/remoteproc/qcom_q6v5_wcss.c index 7c64bfc..bc9531c 100644 --- a/drivers/remoteproc/qcom_q6v5_wcss.c +++ b/drivers/remoteproc/qcom_q6v5_wcss.c @@ -965,7 +965,7 @@ static int q6v5_alloc_memory_region(struct q6v5_wcss *wcss) return 0; } -static int ipq8074_init_clock(struct q6v5_wcss *wcss) +static int ipq_init_clock(struct q6v5_wcss *wcss) { int ret; @@ -1172,7 +1172,7 @@ static int q6v5_wcss_remove(struct platform_device *pdev) } static const struct wcss_data wcss_ipq8074_res_init = { - .init_clock = ipq8074_init_clock, + .init_clock = ipq_init_clock, .q6_firmware_name = "IPQ8074/q6_fw.mdt", .m3_firmware_name = "IPQ8074/m3_fw.mdt", .crash_reason_smem = WCSS_CRASH_REASON, @@ -1185,6 +1185,20 @@ static const struct wcss_data wcss_ipq8074_res_init = { .need_mem_protection = true, }; +static const struct wcss_data wcss_ipq6018_res_init = { + .init_clock = ipq_init_clock, + .q6_firmware_name = "IPQ6018/q6_fw.mdt", + .m3_firmware_name = "IPQ6018/m3_fw.mdt", + .crash_reason_smem = WCSS_CRASH_REASON, + .aon_reset_required = true, + .wcss_q6_reset_required = true, + .bcr_reset_required = false, + .ssr_name = "q6wcss", + .ops = &q6v5_wcss_ipq8074_ops, + .requires_force_stop = true, + .need_mem_protection = true, +}; + static const struct wcss_data wcss_qcs404_res_init = { .init_clock = qcs404_init_clock, .init_regulator = qcs404_init_regulator, @@ -1203,6 +1217,7 @@ static const struct wcss_data wcss_qcs404_res_init = { static const struct of_device_id q6v5_wcss_of_match[] = { { .compatible = "qcom,ipq8074-wcss-pil", .data = &wcss_ipq8074_res_init }, + { .compatible = "qcom,ipq6018-wcss-pil", .data = &wcss_ipq6018_res_init }, { .compatible = "qcom,qcs404-wcss-pil", .data = &wcss_qcs404_res_init }, { }, }; -- 2.7.4
[PATCH 0/3] remoteproc: qcom: q6v5-wcss: Add driver data for IPQ6018
Q6 based WiFi fw loading is supported across different targets, ex: IPQ8074/QCS404. In order to support different fw name for IPQ6018, populate hardcoded param using compatible and driver data. Gokul Sriram Palanisamy (3): dt-bindings: remoteproc: qcom: Add Q6V5 Modem PIL binding for IPQ6018 remoteproc: qcom: wcss: populate driver data for IPQ6018 arm64: dts: ipq6018: Update WCSS PIL driver compatible .../devicetree/bindings/remoteproc/qcom,q6v5.txt | 4 arch/arm64/boot/dts/qcom/ipq6018.dtsi | 2 +- drivers/remoteproc/qcom_q6v5_wcss.c | 19 +-- 3 files changed, 22 insertions(+), 3 deletions(-) -- 2.7.4
Re: [v5 PATCH 02/11] mm: vmscan: consolidate shrinker_maps handling code
On Fri, Jan 29, 2021 at 6:34 AM Kirill Tkhai wrote: > > On 28.01.2021 02:33, Yang Shi wrote: > > The shrinker map management is not purely memcg specific, it is at the > > intersection > > between memory cgroup and shrinkers. It's allocation and assignment of a > > structure, > > and the only memcg bit is the map is being stored in a memcg structure. So > > move the > > shrinker_maps handling code into vmscan.c for tighter integration with > > shrinker code, > > and remove the "memcg_" prefix. There is no functional change. > > > > Signed-off-by: Yang Shi > > --- > > include/linux/memcontrol.h | 12 ++-- > > mm/huge_memory.c | 4 +- > > mm/list_lru.c | 6 +- > > mm/memcontrol.c| 130 + > > mm/vmscan.c| 130 - > > 5 files changed, 142 insertions(+), 140 deletions(-) > > > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > > index eeb0b52203e9..0ee2924991fb 100644 > > --- a/include/linux/memcontrol.h > > +++ b/include/linux/memcontrol.h > > @@ -1581,10 +1581,10 @@ static inline bool > > mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg) > > return false; > > } > > > > -extern int memcg_expand_shrinker_maps(int new_id); > > - > > -extern void memcg_set_shrinker_bit(struct mem_cgroup *memcg, > > -int nid, int shrinker_id); > > +extern int alloc_shrinker_maps(struct mem_cgroup *memcg); > > +extern void free_shrinker_maps(struct mem_cgroup *memcg); > > +extern void set_shrinker_bit(struct mem_cgroup *memcg, > > + int nid, int shrinker_id); > > #else > > #define mem_cgroup_sockets_enabled 0 > > static inline void mem_cgroup_sk_alloc(struct sock *sk) { }; > > @@ -1594,8 +1594,8 @@ static inline bool > > mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg) > > return false; > > } > > > > -static inline void memcg_set_shrinker_bit(struct mem_cgroup *memcg, > > - int nid, int shrinker_id) > > +static inline void set_shrinker_bit(struct mem_cgroup *memcg, > > + int nid, int shrinker_id) > > { > > } > > #endif > > diff --git a/mm/huge_memory.c 
b/mm/huge_memory.c > > index 9237976abe72..05190d7f32ae 100644 > > --- a/mm/huge_memory.c > > +++ b/mm/huge_memory.c > > @@ -2823,8 +2823,8 @@ void deferred_split_huge_page(struct page *page) > > ds_queue->split_queue_len++; > > #ifdef CONFIG_MEMCG > > if (memcg) > > - memcg_set_shrinker_bit(memcg, page_to_nid(page), > > -deferred_split_shrinker.id); > > + set_shrinker_bit(memcg, page_to_nid(page), > > + deferred_split_shrinker.id); > > #endif > > } > > spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); > > diff --git a/mm/list_lru.c b/mm/list_lru.c > > index fe230081690b..628030fa5f69 100644 > > --- a/mm/list_lru.c > > +++ b/mm/list_lru.c > > @@ -125,8 +125,8 @@ bool list_lru_add(struct list_lru *lru, struct > > list_head *item) > > list_add_tail(item, &l->list); > > /* Set shrinker bit if the first element was added */ > > if (!l->nr_items++) > > - memcg_set_shrinker_bit(memcg, nid, > > -lru_shrinker_id(lru)); > > + set_shrinker_bit(memcg, nid, > > + lru_shrinker_id(lru)); > > nlru->nr_items++; > > spin_unlock(&nlru->lock); > > return true; > > @@ -548,7 +548,7 @@ static void memcg_drain_list_lru_node(struct list_lru > > *lru, int nid, > > > > if (src->nr_items) { > > dst->nr_items += src->nr_items; > > - memcg_set_shrinker_bit(dst_memcg, nid, lru_shrinker_id(lru)); > > + set_shrinker_bit(dst_memcg, nid, lru_shrinker_id(lru)); > > src->nr_items = 0; > > } > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index e2de77b5bcc2..f5c9a0d2160b 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -397,130 +397,6 @@ DEFINE_STATIC_KEY_FALSE(memcg_kmem_enabled_key); > > EXPORT_SYMBOL(memcg_kmem_enabled_key); > > #endif > > > > -static int memcg_shrinker_map_size; > > -static DEFINE_MUTEX(memcg_shrinker_map_mutex); > > - > > -static void memcg_free_shrinker_map_rcu(struct rcu_head *head) > > -{ > > - kvfree(container_of(head, struct memcg_shrinker_map, rcu)); > > -} > > - > > -static int memcg_expand_one_shrinker_map(struct mem_cgroup *memcg, > 
> - int size, int old_size) > > -{ > > - struct memcg_shrinker_map *new, *old; > > - int nid; > > - > > - lockdep_assert_held(&memcg_shrinker_map_mutex); > > - > > - for_each_node(nid) { > > - old = rcu_der
[PATCH 0/3] remoteproc: qcom: q6v5-wcss: Add driver data for IPQ6018
Q6-based WiFi firmware loading is supported across different targets, e.g. IPQ8074 and QCS404. To support a different firmware name for IPQ6018, populate the hardcoded parameters from the compatible string and driver data. This series depends on "[PATCH v8] remoteproc: qcom: q6v5-wcss: Add support for secure pil". Gokul Sriram Palanisamy (3): dt-bindings: remoteproc: qcom: Add Q6V5 Modem PIL binding for IPQ6018 remoteproc: qcom: wcss: populate driver data for IPQ6018 arm64: dts: ipq6018: Update WCSS PIL driver compatible .../devicetree/bindings/remoteproc/qcom,q6v5.txt | 4 arch/arm64/boot/dts/qcom/ipq6018.dtsi | 2 +- drivers/remoteproc/qcom_q6v5_wcss.c | 19 +-- 3 files changed, 22 insertions(+), 3 deletions(-) -- 2.7.4
[PATCH 1/3] dt-bindings: remoteproc: qcom: Add Q6V5 Modem PIL binding for IPQ6018
Add a new modem compatible string for IPQ6018 SoCs Signed-off-by: Gokul Sriram Palanisamy --- Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt b/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt index 69c49c7..7f1d5783 100644 --- a/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt +++ b/Documentation/devicetree/bindings/remoteproc/qcom,q6v5.txt @@ -9,6 +9,7 @@ on the Qualcomm Hexagon core. Definition: must be one of: "qcom,q6v5-pil", "qcom,ipq8074-wcss-pil" + "qcom,ipq6018-wcss-pil" "qcom,qcs404-wcss-pil" "qcom,msm8916-mss-pil", "qcom,msm8974-mss-pil" @@ -40,6 +41,7 @@ on the Qualcomm Hexagon core. string: qcom,q6v5-pil: qcom,ipq8074-wcss-pil: + qcom,ipq6018-wcss-pil: qcom,qcs404-wcss-pil: qcom,msm8916-mss-pil: qcom,msm8974-mss-pil: @@ -68,6 +70,7 @@ on the Qualcomm Hexagon core. Value type: Definition: The clocks needed depend on the compatible string: qcom,ipq8074-wcss-pil: + qcom,ipq6018-wcss-pil: no clock names required qcom,qcs404-wcss-pil: must be "xo", "gcc_abhs_cbcr", "gcc_abhs_cbcr", @@ -165,6 +168,7 @@ For the compatible string below the following supplies are required: Value type: Definition: The power-domains needed depend on the compatible string: qcom,ipq8074-wcss-pil: + qcom,ipq6018-wcss-pil: no power-domain names required qcom,q6v5-pil: qcom,msm8916-mss-pil: -- 2.7.4
Re: [PATCH] dt-bindings: Cleanup standard unit properties
On 1/28/21 8:45 PM, Rob Herring wrote: Properties with standard unit suffixes already have a type and don't need type definitions. They also default to a single entry, so 'maxItems: 1' can be dropped. adi,ad5758 is an oddball which defined an enum of arrays. While a valid schema, it is simpler as a whole to only define scalar constraints. Cc: Jean Delvare Cc: Guenter Roeck Cc: Jonathan Cameron Cc: Lars-Peter Clausen Cc: Alexandre Torgue Cc: Dmitry Torokhov Cc: Ulf Hansson Cc: "David S. Miller" Cc: Jakub Kicinski Cc: Sebastian Reichel Cc: Mark Brown Cc: Alexandre Belloni Cc: Greg Kroah-Hartman Cc: Serge Semin Cc: Wolfram Sang Cc: linux-hw...@vger.kernel.org Cc: linux-...@vger.kernel.org Cc: linux-...@vger.kernel.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-in...@vger.kernel.org Cc: linux-...@vger.kernel.org Cc: net...@vger.kernel.org Cc: linux...@vger.kernel.org Cc: linux-...@vger.kernel.org Cc: linux-ser...@vger.kernel.org Cc: alsa-de...@alsa-project.org Cc: linux-watch...@vger.kernel.org Signed-off-by: Rob Herring --- .../devicetree/bindings/arm/cpus.yaml | 1 - .../bindings/extcon/wlf,arizona.yaml | 1 - .../bindings/hwmon/adi,ltc2947.yaml | 1 - .../bindings/hwmon/baikal,bt1-pvt.yaml| 8 ++-- .../devicetree/bindings/hwmon/ti,tmp513.yaml | 1 - .../devicetree/bindings/i2c/i2c-gpio.yaml | 2 - .../bindings/i2c/snps,designware-i2c.yaml | 3 -- .../bindings/iio/adc/maxim,max9611.yaml | 1 - .../bindings/iio/adc/st,stm32-adc.yaml| 1 - .../bindings/iio/adc/ti,palmas-gpadc.yaml | 2 - .../bindings/iio/dac/adi,ad5758.yaml | 41 --- .../bindings/iio/health/maxim,max30100.yaml | 1 - .../input/touchscreen/touchscreen.yaml| 2 - .../bindings/mmc/mmc-controller.yaml | 1 - .../bindings/mmc/mmc-pwrseq-simple.yaml | 2 - .../bindings/net/ethernet-controller.yaml | 2 - .../devicetree/bindings/net/snps,dwmac.yaml | 1 - .../bindings/power/supply/battery.yaml| 3 -- .../bindings/power/supply/bq2515x.yaml| 1 - .../bindings/regulator/dlg,da9121.yaml| 1 - 
.../bindings/regulator/fixed-regulator.yaml | 2 - .../devicetree/bindings/rtc/rtc.yaml | 2 - .../devicetree/bindings/serial/pl011.yaml | 2 - .../devicetree/bindings/sound/sgtl5000.yaml | 2 - .../bindings/watchdog/watchdog.yaml | 1 - 25 files changed, 29 insertions(+), 56 deletions(-) For stm32: Acked-by: Alexandre TORGUE diff --git a/Documentation/devicetree/bindings/arm/cpus.yaml b/Documentation/devicetree/bindings/arm/cpus.yaml index 14cd727d3c4b..f02fd10de604 100644 --- a/Documentation/devicetree/bindings/arm/cpus.yaml +++ b/Documentation/devicetree/bindings/arm/cpus.yaml @@ -232,7 +232,6 @@ properties: by this cpu (see ./idle-states.yaml). capacity-dmips-mhz: -$ref: '/schemas/types.yaml#/definitions/uint32' description: u32 value representing CPU capacity (see ./cpu-capacity.txt) in DMIPS/MHz, relative to highest capacity-dmips-mhz diff --git a/Documentation/devicetree/bindings/extcon/wlf,arizona.yaml b/Documentation/devicetree/bindings/extcon/wlf,arizona.yaml index 5fe784f487c5..efdf59abb2e1 100644 --- a/Documentation/devicetree/bindings/extcon/wlf,arizona.yaml +++ b/Documentation/devicetree/bindings/extcon/wlf,arizona.yaml @@ -85,7 +85,6 @@ properties: wlf,micd-timeout-ms: description: Timeout for microphone detection, specified in milliseconds. -$ref: "/schemas/types.yaml#/definitions/uint32" wlf,micd-force-micbias: description: diff --git a/Documentation/devicetree/bindings/hwmon/adi,ltc2947.yaml b/Documentation/devicetree/bindings/hwmon/adi,ltc2947.yaml index eef614962b10..bf04151b63d2 100644 --- a/Documentation/devicetree/bindings/hwmon/adi,ltc2947.yaml +++ b/Documentation/devicetree/bindings/hwmon/adi,ltc2947.yaml @@ -49,7 +49,6 @@ properties: description: This property controls the Accumulation Dead band which allows to set the level of current below which no accumulation takes place. 
-$ref: /schemas/types.yaml#/definitions/uint32 maximum: 255 default: 0 diff --git a/Documentation/devicetree/bindings/hwmon/baikal,bt1-pvt.yaml b/Documentation/devicetree/bindings/hwmon/baikal,bt1-pvt.yaml index 00a6511354e6..5d3ce641fcde 100644 --- a/Documentation/devicetree/bindings/hwmon/baikal,bt1-pvt.yaml +++ b/Documentation/devicetree/bindings/hwmon/baikal,bt1-pvt.yaml @@ -73,11 +73,9 @@ properties: description: | Temperature sensor trimming factor. It can be used to manually adjust the temperature measurements within 7.130 degrees Celsius. -maxItems: 1 -items: - default: 0 - minimum: 0 - maximum: 7130 +default: 0 +minimum: 0 +maximum: 7130 additionalProperties: false diff --git a/Documentation/devicetree/bindings/hwmon/ti,t
[PATCH 3/3] arm64: dts: ipq6018: Update WCSS PIL driver compatible
Update the WCSS PIL node with an IPQ6018-specific compatible to enable SoC-specific driver data. Signed-off-by: Gokul Sriram Palanisamy --- arch/arm64/boot/dts/qcom/ipq6018.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/qcom/ipq6018.dtsi b/arch/arm64/boot/dts/qcom/ipq6018.dtsi index 9fa5b02..2e6b23b 100644 --- a/arch/arm64/boot/dts/qcom/ipq6018.dtsi +++ b/arch/arm64/boot/dts/qcom/ipq6018.dtsi @@ -477,7 +477,7 @@ }; q6v5_wcss: remoteproc@cd0 { - compatible = "qcom,ipq8074-wcss-pil"; + compatible = "qcom,ipq6018-wcss-pil"; reg = <0x0 0x0cd0 0x0 0x4040>, <0x0 0x004ab000 0x0 0x20>; reg-names = "qdsp6", -- 2.7.4
[PATCH 3/5] dt-bindings: nvmem: Add bindings for rmem driver
From: Nicolas Saenz Julienne Firmware and co-processors might use reserved memory areas to pass data stemming from an nvmem device that is otherwise not accessible to Linux: for example, an EEPROM only physically accessible to firmware, or data only accessible early at boot time. Introduce the dt-bindings for nvmem's rmem. Signed-off-by: Nicolas Saenz Julienne Reviewed-by: Rob Herring Signed-off-by: Srinivas Kandagatla --- .../devicetree/bindings/nvmem/rmem.yaml | 49 +++ 1 file changed, 49 insertions(+) create mode 100644 Documentation/devicetree/bindings/nvmem/rmem.yaml diff --git a/Documentation/devicetree/bindings/nvmem/rmem.yaml b/Documentation/devicetree/bindings/nvmem/rmem.yaml new file mode 100644 index ..1d85a0a30846 --- /dev/null +++ b/Documentation/devicetree/bindings/nvmem/rmem.yaml @@ -0,0 +1,49 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/nvmem/rmem.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Reserved Memory Based nvmem Device + +maintainers: + - Nicolas Saenz Julienne + +allOf: + - $ref: "nvmem.yaml#" + +properties: + compatible: +items: + - enum: + - raspberrypi,bootloader-config + - const: nvmem-rmem + + no-map: +$ref: /schemas/types.yaml#/definitions/flag +description: + Avoid creating a virtual mapping of the region as part of the OS' + standard mapping of system memory. + +required: + - compatible + - no-map + +unevaluatedProperties: false + +examples: + - | +reserved-memory { +#address-cells = <1>; +#size-cells = <1>; + +blconfig: nvram@1000 { +compatible = "raspberrypi,bootloader-config", "nvmem-rmem"; +#address-cells = <1>; +#size-cells = <1>; +reg = <0x1000 0x1000>; +no-map; +}; +}; + +... -- 2.21.0
[PATCH 2/5] nvmem: imx-iim: Use of_device_get_match_data()
From: Fabio Estevam The retrieval of driver data via of_device_get_match_data() can make the code simpler. Use of_device_get_match_data() to simplify the code. Signed-off-by: Fabio Estevam Signed-off-by: Srinivas Kandagatla --- drivers/nvmem/imx-iim.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/drivers/nvmem/imx-iim.c b/drivers/nvmem/imx-iim.c index 701704b87dc9..c86339a7f583 100644 --- a/drivers/nvmem/imx-iim.c +++ b/drivers/nvmem/imx-iim.c @@ -96,7 +96,6 @@ MODULE_DEVICE_TABLE(of, imx_iim_dt_ids); static int imx_iim_probe(struct platform_device *pdev) { - const struct of_device_id *of_id; struct device *dev = &pdev->dev; struct iim_priv *iim; struct nvmem_device *nvmem; @@ -111,11 +110,7 @@ static int imx_iim_probe(struct platform_device *pdev) if (IS_ERR(iim->base)) return PTR_ERR(iim->base); - of_id = of_match_device(imx_iim_dt_ids, dev); - if (!of_id) - return -ENODEV; - - drvdata = of_id->data; + drvdata = of_device_get_match_data(&pdev->dev); iim->clk = devm_clk_get(dev, NULL); if (IS_ERR(iim->clk)) -- 2.21.0
[PATCH 1/5] nvmem: core: Fix a resource leak on error in nvmem_add_cells_from_of()
From: Dan Carpenter This doesn't call of_node_put() on the error path so it leads to a memory leak. Fixes: 0749aa25af82 ("nvmem: core: fix regression in of_nvmem_cell_get()") Signed-off-by: Dan Carpenter Signed-off-by: Srinivas Kandagatla --- drivers/nvmem/core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c index 177f5bf27c6d..68ae6f24b57f 100644 --- a/drivers/nvmem/core.c +++ b/drivers/nvmem/core.c @@ -713,6 +713,7 @@ static int nvmem_add_cells_from_of(struct nvmem_device *nvmem) cell->name, nvmem->stride); /* Cells already added will be freed later. */ kfree_const(cell->name); + of_node_put(cell->np); kfree(cell); return -EINVAL; } -- 2.21.0
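The leak arises because the error path frees the cell with `kfree()` but never drops the node reference the cell holds. Below is a minimal userspace sketch of the get/put discipline the fix restores. `struct node`, `node_get()`, `node_put()`, `struct cell` and `add_cell()` are hypothetical stand-ins for `struct device_node`, `of_node_get()`, `of_node_put()` and the nvmem cell setup, not the nvmem core itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct device_node. */
struct node {
	int refcount;
};

/* Stand-ins for of_node_get() / of_node_put(). */
static struct node *node_get(struct node *np)
{
	np->refcount++;
	return np;
}

static void node_put(struct node *np)
{
	np->refcount--;
}

/* Hypothetical nvmem cell that holds a reference on its DT node. */
struct cell {
	struct node *np;
	int offset;
};

/*
 * Create a cell for @np at @offset. If @offset violates @stride, every
 * reference taken so far is dropped before the cell is freed, which is
 * the shape of the fix: free() alone would leak the node reference.
 */
static int add_cell(struct node *np, int offset, int stride, struct cell **out)
{
	struct cell *c = calloc(1, sizeof(*c));

	if (!c)
		return -ENOMEM;
	c->np = node_get(np);
	c->offset = offset;

	if (offset % stride) {
		node_put(c->np);	/* counterpart of the added of_node_put() */
		free(c);
		return -EINVAL;
	}

	*out = c;
	return 0;
}
```

Each early return releases exactly what was acquired before it, so after a failed `add_cell()` the node's refcount is back where it started.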
[PATCH 4/5] nvmem: Add driver to expose reserved memory as nvmem
From: Nicolas Saenz Julienne Firmware and co-processors might use reserved memory areas to pass data stemming from an nvmem device that is otherwise not accessible to Linux: for example, an EEPROM only physically accessible to firmware, or data only accessible early at boot time. In order to expose this data to other drivers and user-space, the driver models the reserved memory area as an nvmem device. Signed-off-by: Nicolas Saenz Julienne Reviewed-by: Rob Herring Tested-by: Tim Gover Signed-off-by: Srinivas Kandagatla --- drivers/nvmem/Kconfig | 8 drivers/nvmem/Makefile | 2 + drivers/nvmem/rmem.c | 97 ++ drivers/of/platform.c | 1 + 4 files changed, 108 insertions(+) create mode 100644 drivers/nvmem/rmem.c diff --git a/drivers/nvmem/Kconfig b/drivers/nvmem/Kconfig index 954d3b4a52ab..fecc19b884bf 100644 --- a/drivers/nvmem/Kconfig +++ b/drivers/nvmem/Kconfig @@ -270,4 +270,12 @@ config SPRD_EFUSE This driver can also be built as a module. If so, the module will be called nvmem-sprd-efuse. +config NVMEM_RMEM + tristate "Reserved Memory Based Driver Support" + help + This driver maps reserved memory into an nvmem device. It might be + useful to expose information left by firmware in memory. + + This driver can also be built as a module. If so, the module + will be called nvmem-rmem.
endif diff --git a/drivers/nvmem/Makefile b/drivers/nvmem/Makefile index a7c377218341..5376b8e0dae5 100644 --- a/drivers/nvmem/Makefile +++ b/drivers/nvmem/Makefile @@ -55,3 +55,5 @@ obj-$(CONFIG_NVMEM_ZYNQMP)+= nvmem_zynqmp_nvmem.o nvmem_zynqmp_nvmem-y := zynqmp_nvmem.o obj-$(CONFIG_SPRD_EFUSE) += nvmem_sprd_efuse.o nvmem_sprd_efuse-y := sprd-efuse.o +obj-$(CONFIG_NVMEM_RMEM) += nvmem-rmem.o +nvmem-rmem-y := rmem.o diff --git a/drivers/nvmem/rmem.c b/drivers/nvmem/rmem.c new file mode 100644 index ..b11c3c974b3d --- /dev/null +++ b/drivers/nvmem/rmem.c @@ -0,0 +1,97 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2020 Nicolas Saenz Julienne + */ + +#include +#include +#include +#include +#include + +struct rmem { + struct device *dev; + struct nvmem_device *nvmem; + struct reserved_mem *mem; + + phys_addr_t size; +}; + +static int rmem_read(void *context, unsigned int offset, +void *val, size_t bytes) +{ + struct rmem *priv = context; + size_t available = priv->mem->size; + loff_t off = offset; + void *addr; + int count; + + /* +* Only map the reserved memory at this point to avoid potential rogue +* kernel threads inadvertently modifying it. Based on the current +* use cases for this driver, the performance hit isn't a concern. +* Nor is it likely to be, given the nature of the subsystem. Most nvmem +* devices operate over slow buses to begin with. +* +* An alternative would be setting the memory as RO, set_memory_ro(), +* but as of Dec 2020 this isn't possible on arm64.
+*/ addr = memremap(priv->mem->base, available, MEMREMAP_WB); + if (!addr) { + dev_err(priv->dev, "Failed to remap memory region\n"); + return -ENOMEM; + } + + count = memory_read_from_buffer(val, bytes, &off, addr, available); + + memunmap(addr); + + return count; +} + +static int rmem_probe(struct platform_device *pdev) +{ + struct nvmem_config config = { }; + struct device *dev = &pdev->dev; + struct reserved_mem *mem; + struct rmem *priv; + + priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + priv->dev = dev; + + mem = of_reserved_mem_lookup(dev->of_node); + if (!mem) { + dev_err(dev, "Failed to lookup reserved memory\n"); + return -EINVAL; + } + priv->mem = mem; + + config.dev = dev; + config.priv = priv; + config.name = "rmem"; + config.size = mem->size; + config.reg_read = rmem_read; + + return PTR_ERR_OR_ZERO(devm_nvmem_register(dev, &config)); +} + +static const struct of_device_id rmem_match[] = { + { .compatible = "nvmem-rmem", }, + { /* sentinel */ }, +}; +MODULE_DEVICE_TABLE(of, rmem_match); + +static struct platform_driver rmem_driver = { + .probe = rmem_probe, + .driver = { + .name = "rmem", + .of_match_table = rmem_match, + }, +}; +module_platform_driver(rmem_driver); + +MODULE_AUTHOR("Nicolas Saenz Julienne "); +MODULE_DESCRIPTION("Reserved Memory Based nvmem Driver"); +MODULE_LICENSE("GPL"); diff --git a/drivers/of/platform.c b/drivers/of/platform.c index 79bd5f5a1bf1..6699cdbe58b6 100644 --- a/drivers/of/platform.c +++ b/drivers/of/platform.c @@ -511,6 +511,7
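`rmem_read()` relies on `memory_read_from_buffer()` to clamp each nvmem read to the reserved region. Below is a userspace re-implementation of that helper's semantics, sketched for illustration under the hypothetical name `read_from_buffer()`. It uses `long long` in place of the kernel's `loff_t`.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * Copy up to @count bytes from @from (which holds @available bytes),
 * starting at *@ppos, into @to. A read starting at or past the end
 * returns 0, and a read that straddles the end is truncated. *@ppos
 * advances by the number of bytes actually copied.
 */
static ssize_t read_from_buffer(void *to, size_t count, long long *ppos,
				const void *from, size_t available)
{
	long long pos = *ppos;

	if (pos < 0)
		return -EINVAL;
	if ((unsigned long long)pos >= available)
		return 0;
	if (count > available - (size_t)pos)
		count = available - (size_t)pos;
	memcpy(to, (const char *)from + pos, count);
	*ppos = pos + count;
	return (ssize_t)count;
}
```

Because the region size is passed as `available`, an nvmem consumer asking for more bytes than remain gets a short read instead of touching memory beyond the mapping.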
[PATCH 5/5] nvmem: core: skip child nodes not matching binding
From: Ahmad Fatoum The nvmem cell binding applies to all EEPROM child nodes matching "^.*@[0-9a-f]+$" without taking a compatible into account. Linux drivers like at24 go even further: since e888d445ac33 ("nvmem: resolve cells from DT at registration time"), they assume that _all_ at24 EEPROM child nodes are nvmem cells. Since df5f3b6f5357 ("dt-bindings: nvmem: stm32: new property for data access"), additionalProperties: True means other properties are allowed as long as they don't match "^.*@[0-9a-f]+$". The barebox bootloader extends the MTD partitions binding to EEPROMs and can fix up the following device tree node: &eeprom { partitions { compatible = "fixed-partitions"; }; }; This is allowed binding-wise, but drivers that use nvmem_register(), like at24, fail to parse it because the function expects every child node to have a reg property. As a result, the whole EEPROM driver probe fails even though the device tree is correct. Fix this by skipping nodes lacking a reg property instead of returning an error. This effectively makes the drivers adhere to the binding, because all nodes with a unit address must have a reg property and vice versa. Fixes: e888d445ac33 ("nvmem: resolve cells from DT at registration time"). Signed-off-by: Ahmad Fatoum Signed-off-by: Srinivas Kandagatla --- drivers/nvmem/core.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c index 68ae6f24b57f..a5ab1e0c74cf 100644 --- a/drivers/nvmem/core.c +++ b/drivers/nvmem/core.c @@ -682,7 +682,9 @@ static int nvmem_add_cells_from_of(struct nvmem_device *nvmem) for_each_child_of_node(parent, child) { addr = of_get_property(child, "reg", &len); - if (!addr || (len < 2 * sizeof(u32))) { + if (!addr) + continue; + if (len < 2 * sizeof(u32)) { dev_err(dev, "nvmem: invalid reg on %pOF\n", child); return -EINVAL; } -- 2.21.0
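The fix splits one combined check into two decisions: a child without `reg` is not a cell and is skipped, while a child whose `reg` is too short is still an error. Below is a minimal userspace sketch of that loop. `struct child` (where `reg_len < 0` means "no reg property") and `count_cells()` are hypothetical and only mimic the shape of `nvmem_add_cells_from_of()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical child node: reg_len < 0 means "no reg property". */
struct child {
	const char *name;
	int reg_len;	/* length of the reg property in bytes */
};

/*
 * Count the nvmem cells among @n children. Nodes without "reg" (such as a
 * "partitions" container) are skipped, but a "reg" shorter than two u32
 * cells is still rejected as malformed.
 */
static int count_cells(const struct child *children, size_t n)
{
	int cells = 0;

	for (size_t i = 0; i < n; i++) {
		if (children[i].reg_len < 0)
			continue;	/* not a cell: skip instead of failing */
		if ((size_t)children[i].reg_len < 2 * sizeof(uint32_t))
			return -EINVAL;
		cells++;
	}
	return cells;
}
```

The barebox-style `partitions` node is now ignored, while a genuinely malformed cell still makes registration fail.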
Re: [PATCH v2] btrfs: Avoid calling btrfs_get_chunk_map() twice
On Fri, Jan 29, 2021 at 11:22:48AM -0500, Josef Bacik wrote: > On 1/27/21 8:57 AM, Michal Rostecki wrote: > > From: Michal Rostecki > > > > Before this change, the btrfs_get_io_geometry() function was calling > > btrfs_get_chunk_map() to get the extent mapping, necessary for > > calculating the I/O geometry. It was using that extent mapping only > > internally and freeing the pointer after its execution. > > > > That resulted in calling btrfs_get_chunk_map() de facto twice by the > > __btrfs_map_block() function. It was calling btrfs_get_io_geometry() > > first and then calling btrfs_get_chunk_map() directly to get the extent > > mapping, used by the rest of the function. > > > > This change fixes that by passing the extent mapping to the > > btrfs_get_io_geometry() function as an argument. > > > > v2: > > When btrfs_get_chunk_map() returns an error in btrfs_submit_direct(): > > - Use errno_to_blk_status(PTR_ERR(em)) as the status > > - Set em to NULL > > > > Signed-off-by: Michal Rostecki > > This panic'ed all of my test vms in their overnight xfstests runs, the panic > is this > > [ 2449.936502] BTRFS critical (device dm-7): mapping failed logical > 1113825280 bio len 40960 len 24576 > [ 2449.937073] [ cut here ] > [ 2449.937329] kernel BUG at fs/btrfs/volumes.c:6450! 
> [ 2449.937604] invalid opcode: [#1] SMP NOPTI > [ 2449.937855] CPU: 0 PID: 259045 Comm: kworker/u5:0 Not tainted 5.11.0-rc5+ > #122 > [ 2449.938252] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > 1.13.0-2.fc32 04/01/2014 > [ 2449.938713] Workqueue: btrfs-worker-high btrfs_work_helper > [ 2449.939016] RIP: 0010:btrfs_map_bio.cold+0x5a/0x5c > [ 2449.939392] Code: 37 87 ff ff e8 ed d4 8a ff 48 83 c4 18 e9 b5 52 8b ff > 49 89 c8 4c 89 fa 4c 89 f1 48 c7 c6 b0 c0 61 8b 48 89 ef e8 11 87 ff ff <0f> > 0b 4c 89 e7 e8 42 09 86 ff e9 fd 59 8b ff 49 8b 7a 50 44 89 f2 > [ 2449.940402] RSP: :9f24c1637d90 EFLAGS: 00010282 > [ 2449.940689] RAX: 0057 RBX: 90c78ff716b8 RCX: > > [ 2449.941080] RDX: 90c7fbc27ae0 RSI: 90c7fbc19110 RDI: > 90c7fbc19110 > [ 2449.941467] RBP: 90c7911d4000 R08: R09: > > [ 2449.941853] R10: 9f24c1637b48 R11: 8b9723e8 R12: > > [ 2449.942243] R13: R14: a000 R15: > 4263a000 > [ 2449.942632] FS: () GS:90c7fbc0() > knlGS: > [ 2449.943072] CS: 0010 DS: ES: CR0: 80050033 > [ 2449.943386] CR2: 5575163c3080 CR3: 00010ad6c004 CR4: > 00370ef0 > [ 2449.943772] Call Trace: > [ 2449.943915] ? lock_release+0x1c3/0x290 > [ 2449.944135] run_one_async_done+0x3a/0x60 > [ 2449.944360] btrfs_work_helper+0x136/0x520 > [ 2449.944588] process_one_work+0x26e/0x570 > [ 2449.944812] worker_thread+0x55/0x3c0 > [ 2449.945016] ? process_one_work+0x570/0x570 > [ 2449.945250] kthread+0x137/0x150 > [ 2449.945430] ? __kthread_bind_mask+0x60/0x60 > [ 2449.945666] ret_from_fork+0x1f/0x30 > > it happens when you run btrfs/060. Please make sure to run xfstests against > patches before you submit them upstream. Thanks, > > Josef Umm... I ran xfstests against the v1 patch and didn't get that panic. I'll try to reproduce and fix that now. Thanks for the heads up and sorry! Thanks, Michal
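The intent of the patch is that `__btrfs_map_block()` performs the chunk-map lookup once and lends the result to `btrfs_get_io_geometry()`, instead of both functions looking it up. Below is a minimal userspace sketch of that "look up once, pass it down" refactor. `struct chunk_map`, `lookup_map()`, `io_len()` and `map_block()` are hypothetical, and the sketch says nothing about the cause of the btrfs/060 crash reported above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the extent map describing one chunk. */
struct chunk_map {
	uint64_t start;
	uint64_t len;
};

static int lookups;	/* counts how often the (expensive) lookup runs */

/* Hypothetical stand-in for btrfs_get_chunk_map(); NULL if unmapped. */
static struct chunk_map *lookup_map(uint64_t logical)
{
	struct chunk_map *m;

	lookups++;
	if (logical >= (1ULL << 30))
		return NULL;
	m = malloc(sizeof(*m));
	if (m) {
		m->start = 0;
		m->len = 1ULL << 30;
	}
	return m;
}

/* Geometry helper borrows the caller's map instead of looking it up. */
static uint64_t io_len(const struct chunk_map *m, uint64_t logical, uint64_t len)
{
	uint64_t left = m->start + m->len - logical;

	return len < left ? len : left;
}

/* The caller does the single lookup and owns (frees) the result. */
static int map_block(uint64_t logical, uint64_t len, uint64_t *out_len)
{
	struct chunk_map *m = lookup_map(logical);

	if (!m)
		return -1;
	*out_len = io_len(m, logical, len);
	free(m);
	return 0;
}
```

Ownership stays with the caller: the helper never frees the map, which is what makes sharing a single lookup safe.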
[PATCH 0/5] nvmem: patches (set 1) for 5.12
Hi Greg, Here are some nvmem patches for 5.12, which include: - support for the new rmem nvmem provider - an improvement in core to skip invalid nodes, and a fix for a leak - a patch to the imx driver to use of_device_get_match_data() Can you please queue them up for 5.12? Thanks for your help, srini Ahmad Fatoum (1): nvmem: core: skip child nodes not matching binding Dan Carpenter (1): nvmem: core: Fix a resource leak on error in nvmem_add_cells_from_of() Fabio Estevam (1): nvmem: imx-iim: Use of_device_get_match_data() Nicolas Saenz Julienne (2): dt-bindings: nvmem: Add bindings for rmem driver nvmem: Add driver to expose reserved memory as nvmem .../devicetree/bindings/nvmem/rmem.yaml | 49 ++ drivers/nvmem/Kconfig | 8 ++ drivers/nvmem/Makefile| 2 + drivers/nvmem/core.c | 5 +- drivers/nvmem/imx-iim.c | 7 +- drivers/nvmem/rmem.c | 97 +++ drivers/of/platform.c | 1 + 7 files changed, 162 insertions(+), 7 deletions(-) create mode 100644 Documentation/devicetree/bindings/nvmem/rmem.yaml create mode 100644 drivers/nvmem/rmem.c -- 2.21.0
Re: [v5 PATCH 08/11] mm: vmscan: use per memcg nr_deferred of shrinker
On Fri, Jan 29, 2021 at 6:59 AM Kirill Tkhai wrote: > > On 29.01.2021 17:55, Kirill Tkhai wrote: > > On 28.01.2021 02:33, Yang Shi wrote: > >> Use per memcg's nr_deferred for memcg aware shrinkers. The shrinker's > >> nr_deferred > >> will be used in the following cases: > >> 1. Non memcg aware shrinkers > >> 2. !CONFIG_MEMCG > >> 3. memcg is disabled by boot parameter > >> > >> Signed-off-by: Yang Shi > >> --- > >> mm/vmscan.c | 87 - > >> 1 file changed, 73 insertions(+), 14 deletions(-) > >> > >> diff --git a/mm/vmscan.c b/mm/vmscan.c > >> index 20be0db291fe..e1f8960f5cf6 100644 > >> --- a/mm/vmscan.c > >> +++ b/mm/vmscan.c > >> @@ -205,7 +205,8 @@ static int expand_one_shrinker_info(struct mem_cgroup > >> *memcg, > >> > >> for_each_node(nid) { > >> old = rcu_dereference_protected( > >> -mem_cgroup_nodeinfo(memcg, nid)->shrinker_info, true); > >> +mem_cgroup_nodeinfo(memcg, nid)->shrinker_info, > >> +lockdep_is_held(&shrinker_rwsem)); > > > > Won't it better to pack this repeating pattern into helper function, e.g.: > > > > static struct shrinker_info memcg_shrinker_info(struct mem_cgroup *memcg, > > int nid) > > { > > return rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_info, > > lockdep_is_held(&shrinker_rwsem)); > > } > > > > ? > > > > Even shrink_slab_memcg() may want to use it. > > Hm, I see you already introduced a helper in [10/11], but it is used in only > place. > Then, we should use it for all places (introduce the helper earlier). Yes, good point. Will fix in v6. 
> > >> /* Not yet online memcg */ > >> if (!old) > >> return 0; > >> @@ -239,7 +240,8 @@ void free_shrinker_info(struct mem_cgroup *memcg) > >> > >> for_each_node(nid) { > >> pn = mem_cgroup_nodeinfo(memcg, nid); > >> -info = rcu_dereference_protected(pn->shrinker_info, true); > >> +info = rcu_dereference_protected(pn->shrinker_info, > >> + > >> lockdep_is_held(&shrinker_rwsem)); > >> if (info) > >> kvfree(info); > >> rcu_assign_pointer(pn->shrinker_info, NULL); > >> @@ -360,6 +362,27 @@ static void unregister_memcg_shrinker(struct shrinker > >> *shrinker) > >> up_write(&shrinker_rwsem); > >> } > >> > >> +static long count_nr_deferred_memcg(int nid, struct shrinker *shrinker, > >> +struct mem_cgroup *memcg) > >> +{ > >> +struct shrinker_info *info; > >> + > >> +info = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_info, > >> + lockdep_is_held(&shrinker_rwsem)); > >> +return atomic_long_xchg(&info->nr_deferred[shrinker->id], 0); > >> +} > >> + > >> +static long set_nr_deferred_memcg(long nr, int nid, struct shrinker > >> *shrinker, > >> + struct mem_cgroup *memcg) > >> +{ > >> +struct shrinker_info *info; > >> + > >> +info = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_info, > >> + lockdep_is_held(&shrinker_rwsem)); > >> + > >> +return atomic_long_add_return(nr, &info->nr_deferred[shrinker->id]); > >> +} > >> + > >> static bool cgroup_reclaim(struct scan_control *sc) > >> { > >> return sc->target_mem_cgroup; > >> @@ -398,6 +421,18 @@ static void unregister_memcg_shrinker(struct shrinker > >> *shrinker) > >> { > >> } > >> > >> +static long count_nr_deferred_memcg(int nid, struct shrinker *shrinker, > >> +struct mem_cgroup *memcg) > >> +{ > >> +return 0; > >> +} > >> + > >> +static long set_nr_deferred_memcg(long nr, int nid, struct shrinker > >> *shrinker, > >> + struct mem_cgroup *memcg) > >> +{ > >> +return 0; > >> +} > >> + > >> static bool cgroup_reclaim(struct scan_control *sc) > >> { > >> return false; > >> @@ -409,6 +444,39 @@ static bool 
writeback_throttling_sane(struct > >> scan_control *sc) > >> } > >> #endif > >> > >> +static long count_nr_deferred(struct shrinker *shrinker, > >> + struct shrink_control *sc) > >> +{ > >> +int nid = sc->nid; > >> + > >> +if (!(shrinker->flags & SHRINKER_NUMA_AWARE)) > >> +nid = 0; > >> + > >> +if (sc->memcg && > >> +(shrinker->flags & SHRINKER_MEMCG_AWARE)) > >> +return count_nr_deferred_memcg(nid, shrinker, > >> + sc->memcg); > >> + > >> +return atomic_long_xchg(&shrinker->nr_deferred[nid], 0); > >> +} > >> + > >> + > >> +static long set_nr_deferred(long nr, struct shrinker *shrinker, > >> +struct shrink_control *sc) > >> +{ > >> +int nid = sc->nid; > >> + > >> +if (!(shrinker->flags & SHRINK
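In the quoted patch, `count_nr_deferred()` and `set_nr_deferred()` pick one counter. If the shrinker is not NUMA-aware, the node id is folded to 0. Memcg-aware shrinkers called with a memcg use that memcg's per-node slot, and all other cases use the shrinker's global array. Below is a minimal userspace sketch of that selection with C11 atomics. The fixed-size arrays, `deferred_slot()`, `count_deferred()` and `set_deferred()` are hypothetical simplifications, not the vmscan code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_NODES 2
#define NR_SHRINKERS 4

/* Hypothetical, simplified shrinker with its global per-node counters. */
struct shrinker {
	int id;
	bool numa_aware;
	bool memcg_aware;
	atomic_long nr_deferred[NR_NODES];
};

/* Hypothetical memcg holding per-node, per-shrinker deferred counters. */
struct memcg {
	atomic_long nr_deferred[NR_NODES][NR_SHRINKERS];
};

/* Pick the counter that holds deferred work for (shrinker, memcg, nid). */
static atomic_long *deferred_slot(struct shrinker *s, struct memcg *mc, int nid)
{
	if (!s->numa_aware)
		nid = 0;
	if (mc && s->memcg_aware)
		return &mc->nr_deferred[nid][s->id];
	return &s->nr_deferred[nid];
}

/* Take all deferred work, leaving the counter at zero (like xchg(.., 0)). */
static long count_deferred(struct shrinker *s, struct memcg *mc, int nid)
{
	return atomic_exchange(deferred_slot(s, mc, nid), 0);
}

/* Put back unfinished work and return the new total (like add_return). */
static long set_deferred(long nr, struct shrinker *s, struct memcg *mc, int nid)
{
	return atomic_fetch_add(deferred_slot(s, mc, nid), nr) + nr;
}
```

The point of per-memcg slots shows in the isolation: work deferred while reclaiming one memcg is never charged to another memcg's next scan.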
Re: [v5 PATCH 07/11] mm: vmscan: add per memcg shrinker nr_deferred
On Fri, Jan 29, 2021 at 5:00 AM Vlastimil Babka wrote: > > On 1/28/21 12:33 AM, Yang Shi wrote: > > Currently the number of deferred objects are per shrinker, but some slabs, > > for example, > > vfs inode/dentry cache are per memcg, this would result in poor isolation > > among memcgs. > > > > The deferred objects typically are generated by __GFP_NOFS allocations, one > > memcg with > > excessive __GFP_NOFS allocations may blow up deferred objects, then other > > innocent memcgs > > may suffer from over shrink, excessive reclaim latency, etc. > > > > For example, two workloads run in memcgA and memcgB respectively, workload > > in B is vfs > > heavy workload. Workload in A generates excessive deferred objects, then > > B's vfs cache > > might be hit heavily (drop half of caches) by B's limit reclaim or global > > reclaim. > > > > We observed this hit in our production environment which was running vfs > > heavy workload > > shown as the below tracing log: > > > > <...>-409454 [016] 28286961.747146: mm_shrink_slab_start: > > super_cache_scan+0x0/0x1a0 9a83046f3458: > > nid: 1 objects to shrink 3641681686040 gfp_flags > > GFP_HIGHUSER_MOVABLE|__GFP_ZERO pgs_scanned 1 lru_pgs 15721 > > cache items 246404277 delta 31345 total_scan 123202138 > > <...>-409454 [022] 28287105.928018: mm_shrink_slab_end: > > super_cache_scan+0x0/0x1a0 9a83046f3458: > > nid: 1 unused scan count 3641681686040 new scan count 3641798379189 > > total_scan 602 > > last shrinker return val 123186855 > > > > The vfs cache and page cache ration was 10:1 on this machine, and half of > > caches were dropped. > > This also resulted in significant amount of page caches were dropped due to > > inodes eviction. > > > > Make nr_deferred per memcg for memcg aware shrinkers would solve the > > unfairness and bring > > better isolation. > > > > When memcg is not enabled (!CONFIG_MEMCG or memcg disabled), the shrinker's > > nr_deferred > > would be used. 
And non memcg aware shrinkers use shrinker's nr_deferred > > all the time. > > > > Signed-off-by: Yang Shi > > --- > > include/linux/memcontrol.h | 7 +++--- > > mm/vmscan.c| 48 +- > > 2 files changed, 36 insertions(+), 19 deletions(-) > > > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > > index 62b888b88a5f..e0384367e07d 100644 > > --- a/include/linux/memcontrol.h > > +++ b/include/linux/memcontrol.h > > @@ -93,12 +93,13 @@ struct lruvec_stat { > > }; > > > > /* > > - * Bitmap of shrinker::id corresponding to memcg-aware shrinkers, > > - * which have elements charged to this memcg. > > + * Bitmap and deferred work of shrinker::id corresponding to memcg-aware > > + * shrinkers, which have elements charged to this memcg. > > */ > > struct shrinker_info { > > struct rcu_head rcu; > > - unsigned long map[]; > > + unsigned long *map; > > + atomic_long_t *nr_deferred; > > }; > > > > /* > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index 256896d157d4..20be0db291fe 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -187,16 +187,21 @@ static DECLARE_RWSEM(shrinker_rwsem); > > #ifdef CONFIG_MEMCG > > static int shrinker_nr_max; > > > > +#define NR_MAX_TO_SHR_MAP_SIZE(nr_max) \ > > + ((nr_max / BITS_PER_LONG + 1) * sizeof(unsigned long)) > > Could have been part of patch 4 already. 
And yeah, using DIV_ROUND_UP(), as > being hidden in a macro makes the "shorter statement" benefit disappear :) > > > + > > static void free_shrinker_info_rcu(struct rcu_head *head) > > { > > kvfree(container_of(head, struct shrinker_info, rcu)); > > } > > > > static int expand_one_shrinker_info(struct mem_cgroup *memcg, > > -int size, int old_size) > > + int m_size, int d_size, > > + int old_m_size, int old_d_size) > > { > > struct shrinker_info *new, *old; > > int nid; > > + int size = m_size + d_size; > > > > for_each_node(nid) { > > old = rcu_dereference_protected( > > @@ -209,9 +214,15 @@ static int expand_one_shrinker_info(struct mem_cgroup > > *memcg, > > if (!new) > > return -ENOMEM; > > > > - /* Set all old bits, clear all new bits */ > > - memset(new->map, (int)0xff, old_size); > > - memset((void *)new->map + old_size, 0, size - old_size); > > + new->map = (unsigned long *)(new + 1); > > + new->nr_deferred = (void *)new->map + m_size; > > This better be aligned to sizeof(atomic_long_t). Can we be sure about that? Good point. No, if unsigned long is 32 bit on some 64 bit machines. > Also it's all quite ugly and complex. Is it worth it? What about just leaving > map as it is and allocating a nr_deferred array separately, i.e.: > > struct shrinker_info { > struct rcu_head rcu; > atomic_long_t *nr_deferred; //
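The review raises two sizing points. First, `(nr_max / BITS_PER_LONG + 1)` allocates an extra word whenever `nr_max` is an exact multiple of `BITS_PER_LONG`, which `DIV_ROUND_UP()` avoids. Second, an `atomic_long_t` array placed directly after the bitmap is only correctly aligned if its start offset is rounded up. Below is a userspace sketch of both points. `DIV_ROUND_UP` and `BITS_PER_LONG` are redefined locally so it compiles outside the kernel, and `ROUND_UP_TO`, `map_size()`, `map_size_old()` and `deferred_offset()` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
/* Round @x up to a multiple of @a (works for any positive @a). */
#define ROUND_UP_TO(x, a) (DIV_ROUND_UP((x), (a)) * (a))

/* Bytes for a bitmap with one bit per shrinker id, for @nr_max ids. */
static size_t map_size(size_t nr_max)
{
	return DIV_ROUND_UP(nr_max, BITS_PER_LONG) * sizeof(unsigned long);
}

/* Sizing from the quoted macro, for comparison. */
static size_t map_size_old(size_t nr_max)
{
	return (nr_max / BITS_PER_LONG + 1) * sizeof(unsigned long);
}

/*
 * Offset of an nr_deferred[] array placed after a @header-byte header
 * and the bitmap, rounded up so atomic_long accesses are aligned.
 */
static size_t deferred_offset(size_t header, size_t nr_max)
{
	return ROUND_UP_TO(header + map_size(nr_max), alignof(atomic_long));
}
```

Allocating `nr_deferred` separately, as suggested in the review, sidesteps the alignment question entirely, at the cost of a second allocation.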
Re: [net-next PATCH v4 01/15] Documentation: ACPI: DSD: Document MDIO PHY
On Fri, Jan 29, 2021 at 6:44 PM Rafael J. Wysocki wrote: > On Fri, Jan 29, 2021 at 5:37 PM Rafael J. Wysocki wrote: > > On Fri, Jan 29, 2021 at 7:48 AM Calvin Johnson > > wrote: ... > > It would work, but I would introduce a wrapper around the _ADR > > evaluation, something like: > > > > int acpi_get_local_address(acpi_handle handle, u32 *addr) > > { > > unsigned long long adr; > > acpi_status status; > > > > status = acpi_evaluate_integer(handle, METHOD_NAME__ADR, NULL, &adr); > > if (ACPI_FAILURE(status)) > > return -ENODATA; > > > > *addr = (u32)adr; > > return 0; > > } > > > > in drivers/acpi/utils.c and add a static inline stub always returning > > -ENODEV for it for !CONFIG_ACPI. ... > BTW, you may not need the fwnode_get_local_addr() at all then, just > evaluate either the "reg" property for OF or acpi_get_local_address() > for ACPI in the "caller" code directly. A common helper doing this can > be added later. Sounds good to me and it will address your concern about different semantics of reg/_ADR on per driver/subsystem basis. -- With Best Regards, Andy Shevchenko
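The suggested `acpi_get_local_address()` wrapper turns an `_ADR` evaluation into a plain errno-style result and narrows the value to `u32`. Below is a userspace sketch of that contract. `struct fw_node` and `evaluate_adr()` are hypothetical stand-ins for an ACPI handle and `acpi_evaluate_integer()`, and the `!CONFIG_ACPI` stub that always returns `-ENODEV` is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical firmware node: has_adr says whether an _ADR object exists. */
struct fw_node {
	int has_adr;
	unsigned long long adr;
};

/* Hypothetical stand-in for acpi_evaluate_integer(handle, "_ADR", NULL, &adr). */
static int evaluate_adr(const struct fw_node *node, unsigned long long *adr)
{
	if (!node->has_adr)
		return -1;	/* plays the role of an ACPI_FAILURE() status */
	*adr = node->adr;
	return 0;
}

/*
 * Same shape as the suggested wrapper: 0 plus the address on success,
 * -ENODATA when _ADR cannot be evaluated. The value is evaluated into an
 * unsigned long long, so only its low 32 bits reach the caller.
 */
static int get_local_address(const struct fw_node *node, uint32_t *addr)
{
	unsigned long long adr;

	if (evaluate_adr(node, &adr))
		return -ENODATA;
	*addr = (uint32_t)adr;
	return 0;
}
```

A caller such as an MDIO bus scan can then treat "no `_ADR`" and "no `reg`" uniformly as a missing address.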
[RFC v4 1/3] vfio/platform: add support for msi
MSI support for platform devices. MSI is added as a single 'index' with 'count' as the number of MSI(s) supported by the devices. Signed-off-by: Vikas Gupta --- drivers/vfio/platform/Kconfig | 1 + drivers/vfio/platform/vfio_platform_common.c | 95 ++- drivers/vfio/platform/vfio_platform_irq.c | 253 -- drivers/vfio/platform/vfio_platform_private.h | 29 ++ include/uapi/linux/vfio.h | 24 ++ 5 files changed, 373 insertions(+), 29 deletions(-) diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig index dc1a3c44f2c6..d4bbc9f27763 100644 --- a/drivers/vfio/platform/Kconfig +++ b/drivers/vfio/platform/Kconfig @@ -3,6 +3,7 @@ config VFIO_PLATFORM tristate "VFIO support for platform devices" depends on VFIO && EVENTFD && (ARM || ARM64) select VFIO_VIRQFD + select GENERIC_MSI_IRQ_DOMAIN help Support for platform devices with VFIO. This is required to make use of platform devices present on the system using the VFIO diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c index fb4b385191f2..f2b1f0c3bfcc 100644 --- a/drivers/vfio/platform/vfio_platform_common.c +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -16,6 +16,7 @@ #include #include #include +#include #include "vfio_platform_private.h" @@ -28,23 +29,22 @@ static LIST_HEAD(reset_list); static DEFINE_MUTEX(driver_lock); -static vfio_platform_reset_fn_t vfio_platform_lookup_reset(const char *compat, - struct module **module) +static void vfio_platform_lookup_reset(const char *compat, + struct module **module, + struct vfio_platform_reset_node **node) { struct vfio_platform_reset_node *iter; - vfio_platform_reset_fn_t reset_fn = NULL; mutex_lock(&driver_lock); list_for_each_entry(iter, &reset_list, link) { if (!strcmp(iter->compat, compat) && try_module_get(iter->owner)) { *module = iter->owner; - reset_fn = iter->of_reset; + *node = iter; break; } } mutex_unlock(&driver_lock); - return reset_fn; } static int vfio_platform_acpi_probe(struct 
vfio_platform_device *vdev, @@ -112,15 +112,23 @@ static bool vfio_platform_has_reset(struct vfio_platform_device *vdev) static int vfio_platform_get_reset(struct vfio_platform_device *vdev) { + struct vfio_platform_reset_node *node = NULL; + if (VFIO_PLATFORM_IS_ACPI(vdev)) return vfio_platform_acpi_has_reset(vdev) ? 0 : -ENOENT; - vdev->of_reset = vfio_platform_lookup_reset(vdev->compat, - &vdev->reset_module); - if (!vdev->of_reset) { + vfio_platform_lookup_reset(vdev->compat, &vdev->reset_module, + &node); + if (!node) { request_module("vfio-reset:%s", vdev->compat); - vdev->of_reset = vfio_platform_lookup_reset(vdev->compat, - &vdev->reset_module); + vfio_platform_lookup_reset(vdev->compat, &vdev->reset_module, + &node); + } + + if (node) { + vdev->of_reset = node->of_reset; + vdev->of_get_msi = node->of_get_msi; + vdev->of_msi_write = node->of_msi_write; } return vdev->of_reset ? 0 : -ENOENT; @@ -343,9 +351,16 @@ static long vfio_platform_ioctl(void *device_data, } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) { struct vfio_irq_info info; + struct vfio_info_cap caps = { .buf = NULL, .size = 0 }; + int ext_irq_index = vdev->num_irqs - vdev->num_ext_irqs; + unsigned long capsz; + u32 index; minsz = offsetofend(struct vfio_irq_info, count); + /* For backward compatibility, cannot require this */ + capsz = offsetofend(struct vfio_irq_info, cap_offset); + if (copy_from_user(&info, (void __user *)arg, minsz)) return -EFAULT; @@ -355,8 +370,53 @@ static long vfio_platform_ioctl(void *device_data, if (info.index >= vdev->num_irqs) return -EINVAL; - info.flags = vdev->irqs[info.index].flags; - info.count = vdev->irqs[info.index].count; + if (info.argsz >= capsz) + minsz = capsz; + + index = info.index; + + info.flags = vdev->irqs[index].flags; + info.count = vdev->irqs[index].count; + + if (ext_irq_index - index == VFIO_EXT_IR
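The `minsz`/`capsz` handling in the `VFIO_DEVICE_GET_IRQ_INFO` hunk is an extensible-struct pattern: old userspace passes a shorter `argsz` and keeps working, while newer userspace also gets the trailing `cap_offset` field. A minimal sketch, assuming a hypothetical mirror struct whose field order is implied by the patch (the real uAPI hunk is not shown above), and assuming, as in upstream vfio_platform, that an `argsz` smaller than `minsz` is rejected with `-EINVAL` (modelled here as a return of 0).

```c
#include <stddef.h>
#include <stdint.h>

/* Equivalent to the kernel's offsetofend(). */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical mirror of struct vfio_irq_info plus the cap_offset field
 * that the patch reads. */
struct irq_info_sketch {
	uint32_t argsz;
	uint32_t flags;
	uint32_t index;
	uint32_t count;
	uint32_t cap_offset;	/* new trailing field */
};

/* Bytes the ioctl may copy back to userspace, or 0 when argsz is too
 * small (the ioctl would return -EINVAL). */
static size_t irq_info_copy_size(uint32_t argsz)
{
	size_t minsz = offsetofend(struct irq_info_sketch, count);
	/* For backward compatibility, cap_offset cannot be required. */
	size_t capsz = offsetofend(struct irq_info_sketch, cap_offset);

	if (argsz < minsz)
		return 0;
	if (argsz >= capsz)
		minsz = capsz;
	return minsz;
}
```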
[RFC v4 0/3] msi support for platform devices
This RFC adds MSI support for platform devices. The MSI block is added as an ext IRQ alongside the existing wired interrupt implementation. The patchset exports two caps for MSI and the related data needed to configure the MSI source device.

Changes from:
- v3 to v4:
 1) Removed the 'cap' for exporting MSI info to userspace and restored it into the vendor-specific module.
 2) Enabled GENERIC_MSI_IRQ_DOMAIN in Kconfig.
 3) Removed the vendor-specific Broadcom 'msi' module and integrated the MSI-related ops into the 'reset' module for MSI support.
- v2 to v3:
 1) Restored the vendor-specific module to get the maximum number of supported MSIs and to initialize the .count value.
 2) Addressed comments from Eric.
- v1 to v2:
 1) IRQ allocation has been implemented as below:
 |IRQ-0|IRQ-1||IRQ-n|MSI|
 The MSI block has MSI contexts and is implemented as an ext IRQ.
 2) Removed the vendor-specific module for MSI handling, so the previous patch2 and patch3 are no longer required.
 3) MSI-related data is exported to userspace using 'caps'. Please note that the VFIO_IRQ_INFO_CAP_TYPE implementation in include/uapi/linux/vfio.h is taken from Eric's patch https://patchwork.kernel.org/project/kvm/patch/20201116110030.32335-8-eric.au...@redhat.com/
- v0 to v1:
 i) Removed the MSI device flag VFIO_DEVICE_FLAGS_MSI.
 ii) Added MSI(s) at the end of the IRQ list of platform IRQs. The first entry of the MSI block carries the count and flag information.
 IRQ list: IRQs and MSIs are allocated as below.
 Example: if there are 'n' IRQs and 'k' MSIs:
 ---
 |IRQ-0|IRQ-1||IRQ-n|MSI-0|MSI-1|MSI-2|..|MSI-k|
 ---
 MSI-0 will have count=k set and flags set accordingly.
Vikas Gupta (3): vfio/platform: add support for msi vfio/platform: change cleanup order vfio: platform: reset: add msi support drivers/vfio/platform/Kconfig | 1 + .../platform/reset/vfio_platform_bcmflexrm.c | 72 - drivers/vfio/platform/vfio_platform_common.c | 97 +-- drivers/vfio/platform/vfio_platform_irq.c | 253 -- drivers/vfio/platform/vfio_platform_private.h | 29 ++ include/uapi/linux/vfio.h | 24 ++ 6 files changed, 444 insertions(+), 32 deletions(-) -- 2.17.1
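The index layout in the cover letter (wired IRQs first, then in v4 a single extra index whose `count` is the number of MSIs) can be sketched as a classifier over the same `ext_irq_index = num_irqs - num_ext_irqs` arithmetic that patch 1/3 uses. The struct and helper names are hypothetical, and reporting a `count` of 1 for each wired IRQ is an assumption about the existing platform IRQ code.

```c
#include <stdint.h>

/* Hypothetical device description: num_irqs counts every VFIO IRQ index;
 * num_ext_irqs counts the extra indices appended after the wired ones. */
struct irq_layout_sketch {
	uint32_t num_irqs;
	uint32_t num_ext_irqs;
	uint32_t msi_count;	/* value reported in .count for the MSI index */
};

enum irq_kind { IRQ_WIRED, IRQ_MSI, IRQ_INVALID };

static enum irq_kind classify_index(const struct irq_layout_sketch *l,
				    uint32_t index)
{
	uint32_t ext_irq_index = l->num_irqs - l->num_ext_irqs;

	if (index >= l->num_irqs)
		return IRQ_INVALID;	/* the ioctl returns -EINVAL */
	if (index >= ext_irq_index)
		return IRQ_MSI;		/* one index for the whole MSI block */
	return IRQ_WIRED;
}

static uint32_t reported_count(const struct irq_layout_sketch *l,
			       uint32_t index)
{
	switch (classify_index(l, index)) {
	case IRQ_WIRED:
		return 1;
	case IRQ_MSI:
		return l->msi_count;
	default:
		return 0;
	}
}
```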
[RFC v4 2/3] vfio/platform: change cleanup order
In the case of MSI, the vendor-specific MSI module may require region access to handle MSI cleanup, so regions must be cleaned up only after IRQ cleanup. Signed-off-by: Vikas Gupta --- drivers/vfio/platform/vfio_platform_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c index f2b1f0c3bfcc..1cc040e3ed1f 100644 --- a/drivers/vfio/platform/vfio_platform_common.c +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -243,8 +243,8 @@ static void vfio_platform_release(void *device_data) WARN_ON(1); } pm_runtime_put(vdev->device); - vfio_platform_regions_cleanup(vdev); vfio_platform_irq_cleanup(vdev); + vfio_platform_regions_cleanup(vdev); } mutex_unlock(&driver_lock); -- 2.17.1
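Why the order matters can be shown with a toy teardown. In this hypothetical sketch, `irq_cleanup()` stands in for a vendor MSI hook that still needs the MMIO region, so it records whether the region was mapped when it ran; calling it before `regions_cleanup()`, as the patch does, keeps that true.

```c
#include <stdbool.h>

/* Hypothetical device state for the teardown sequence. */
struct dev_sketch {
	bool region_mapped;
	bool msi_cleaned;
	bool msi_cleanup_saw_region;
};

static void irq_cleanup(struct dev_sketch *d)
{
	/* Vendor MSI hook: only safe while the region is still mapped. */
	d->msi_cleanup_saw_region = d->region_mapped;
	d->msi_cleaned = true;
}

static void regions_cleanup(struct dev_sketch *d)
{
	d->region_mapped = false;
}

static void release(struct dev_sketch *d)
{
	irq_cleanup(d);		/* first: may still touch registers */
	regions_cleanup(d);	/* then: unmap */
}
```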
Re: [REGRESSION] "ALSA: HDA: Early Forbid of runtime PM" broke my laptop's internal audio
On Fri, Jan 29, 2021 at 5:17 pm, Takashi Iwai wrote: --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -2217,8 +2217,6 @@ static const struct snd_pci_quirk power_save_denylist[] = { /* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */ SND_PCI_QUIRK(0x1043, 0x8733, "Asus Prime X370-Pro", 0), /* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */ - SND_PCI_QUIRK(0x1558, 0x6504, "Clevo W65_67SB", 0), - /* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */ SND_PCI_QUIRK(0x1028, 0x0497, "Dell Precision T3600", 0), /* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */ /* Note the P55A-UD3 and Z87-D3HP share the subsys id for the HDA dev */ Hi, This patch works fine on my laptop. I have no clue whether that means it's really safe to remove the quirk. I've never noticed any clicking noise myself, but I understand it has been a problem for other System76 laptops. Michael
Re: dax alignment problem on arm64 (and other architectures)
On 1/29/21 4:32 PM, Pavel Tatashin wrote: > On Fri, Jan 29, 2021 at 9:51 AM Joao Martins > wrote: >> >> Hey Pavel, >> >> On 1/29/21 1:50 PM, Pavel Tatashin wrote: Since we last talked about this the enabling for EFI "Special Purpose" / Soft Reserved Memory has gone upstream and instantiates device-dax instances for address ranges marked with EFI_MEMORY_SP attribute. Critically this way of declaring device-dax removes the consideration of it as persistent memory and as such no metadata reservation. So, if you are willing to maintain the metadata external to the device (which seems reasonable for your environment) and have your platform firmware / kernel command line mark it as EFI_CONVENTIONAL_MEMORY + EFI_MEMORY_SP, then these reserve-free dax-devices will surface. >>> >>> Hi Dan, >>> >>> This is cool. Does it allow conversion between devdax and fsdax so DAX >>> aware filesystem can be installed and data can be put there to be >>> preserved across the reboot? >>> >> >> fwiw wrt to the 'preserved across kexec' part, you are going to need >> something conceptually similar to snippet below the scissors mark. >> Alternatively, we could fix kexec userspace to add conventional memory >> ranges (without the SP attribute part) when it sees a Soft-Reserved region. >> But can't tell which one is the right thing to do. > > Hi Joao, > > Is not it just a matter of appending arguments to the kernel parameter > during kexec reboot with Soft-Reserved region specified, or am I > missing something? I understand with fileload kexec syscall we might > accidentally load segments onto reserved region, but with the original > kexec syscall, where we can specify destinations for each segment that > should not be a problem with today's kexec tools. > efi_fake_mem only works with EFI_MEMMAP conventional memory ranges, thus not having an EFI_MEMMAP with RAM ranges means it's a nop for the soft-reserved regions. Unless you are trying to suggest something like: memmap=%+0xefff ...
To mark soft reserved on top of existing RAM? Sadly, I don't know if there's an equivalent for ARM. > I agree that preserving it automatically as you are proposing, would > make more sense, instead of fiddling with kernel parameters and > segment destinations. > > Thank you, > Pasha > >> >> At the moment, HMAT ranges (or those defined with efi_fake_mem=) aren't >> preserved not because of anything special with HMAT, but simply because >> the EFI memmap conventional ram ranges are not preserved (only runtime >> services). And HMAT/efi_fake_mem expects these to be based on EFI memmap. >> [snip]
[RFC v4 3/3] vfio: platform: reset: add msi support
Add msi support for Broadcom FlexRm device. Signed-off-by: Vikas Gupta --- .../platform/reset/vfio_platform_bcmflexrm.c | 72 ++- 1 file changed, 70 insertions(+), 2 deletions(-) diff --git a/drivers/vfio/platform/reset/vfio_platform_bcmflexrm.c b/drivers/vfio/platform/reset/vfio_platform_bcmflexrm.c index 96064ef8f629..6ca4ca12575b 100644 --- a/drivers/vfio/platform/reset/vfio_platform_bcmflexrm.c +++ b/drivers/vfio/platform/reset/vfio_platform_bcmflexrm.c @@ -21,7 +21,9 @@ #include #include #include +#include #include +#include #include "../vfio_platform_private.h" @@ -33,6 +35,9 @@ #define RING_VER 0x000 #define RING_CONTROL 0x034 #define RING_FLUSH_DONE0x038 +#define RING_MSI_ADDR_LS 0x03c +#define RING_MSI_ADDR_MS 0x040 +#define RING_MSI_DATA_VALUE0x064 /* Register RING_CONTROL fields */ #define CONTROL_FLUSH_SHIFT5 @@ -105,8 +110,71 @@ static int vfio_platform_bcmflexrm_reset(struct vfio_platform_device *vdev) return ret; } -module_vfio_reset_handler("brcm,iproc-flexrm-mbox", - vfio_platform_bcmflexrm_reset); +static u32 bcm_num_msi(struct vfio_platform_device *vdev) +{ + struct vfio_platform_region *reg = &vdev->regions[0]; + + return (reg->size / RING_REGS_SIZE); +} + +static void bcm_write_msi(struct vfio_platform_device *vdev, + struct msi_desc *desc, + struct msi_msg *msg) +{ + int i; + int hwirq = -1; + int msi_src; + void __iomem *ring; + struct vfio_platform_region *reg = &vdev->regions[0]; + + if (!reg) + return; + + for (i = 0; i < vdev->num_irqs; i++) + if (vdev->irqs[i].type == VFIO_IRQ_TYPE_MSI) + hwirq = vdev->irqs[i].ctx[0].hwirq; + + if (hwirq < 0) + return; + + msi_src = desc->irq - hwirq; + + if (!reg->ioaddr) { + reg->ioaddr = ioremap(reg->addr, reg->size); + if (!reg->ioaddr) + return; + } + + ring = reg->ioaddr + msi_src * RING_REGS_SIZE; + + writel_relaxed(msg->address_lo, ring + RING_MSI_ADDR_LS); + writel_relaxed(msg->address_hi, ring + RING_MSI_ADDR_MS); + writel_relaxed(msg->data, ring + RING_MSI_DATA_VALUE); +} + +static struct 
vfio_platform_reset_node vfio_platform_bcmflexrm_reset_node = { + .owner = THIS_MODULE, + .compat = "brcm,iproc-flexrm-mbox", + .of_reset = vfio_platform_bcmflexrm_reset, + .of_get_msi = bcm_num_msi, + .of_msi_write = bcm_write_msi +}; + +static int __init vfio_platform_bcmflexrm_reset_module_init(void) +{ + __vfio_platform_register_reset(&vfio_platform_bcmflexrm_reset_node); + + return 0; +} + +static void __exit vfio_platform_bcmflexrm_reset_module_exit(void) +{ + vfio_platform_unregister_reset("brcm,iproc-flexrm-mbox", + vfio_platform_bcmflexrm_reset); +} + +module_init(vfio_platform_bcmflexrm_reset_module_init); +module_exit(vfio_platform_bcmflexrm_reset_module_exit); MODULE_LICENSE("GPL v2"); MODULE_AUTHOR("Anup Patel "); -- 2.17.1
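The register programming in `bcm_write_msi()` can be exercised against a plain byte buffer. This sketch assumes a made-up `SKETCH_RING_REGS_SIZE` (the real `RING_REGS_SIZE` is defined elsewhere in the driver and is not shown here) and uses `memcpy()` in place of `writel_relaxed()`. The register offsets are the ones the patch adds. Unlike the posted code, the sketch also rejects an out-of-range ring index.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_RING_REGS_SIZE	0x100	/* hypothetical; not the driver's value */
#define RING_MSI_ADDR_LS	0x03c
#define RING_MSI_ADDR_MS	0x040
#define RING_MSI_DATA_VALUE	0x064

struct msi_msg_sketch {
	uint32_t address_lo, address_hi, data;
};

/* One MSI per ring in the region, mirroring bcm_num_msi(). */
static uint32_t sketch_num_msi(size_t region_size)
{
	return (uint32_t)(region_size / SKETCH_RING_REGS_SIZE);
}

static void write32(uint8_t *base, size_t off, uint32_t val)
{
	memcpy(base + off, &val, sizeof(val));	/* stands in for writel_relaxed() */
}

static uint32_t read32(const uint8_t *base, size_t off)
{
	uint32_t v;

	memcpy(&v, base + off, sizeof(v));
	return v;
}

/* Program ring (irq - first_hwirq) with the MSI message. */
static int sketch_write_msi(uint8_t *regs, size_t region_size, int irq,
			    int first_hwirq, const struct msi_msg_sketch *msg)
{
	int msi_src = irq - first_hwirq;
	uint8_t *ring;

	/* Bounds check added in this sketch; not present in the posted code. */
	if (msi_src < 0 || (uint32_t)msi_src >= sketch_num_msi(region_size))
		return -1;

	ring = regs + (size_t)msi_src * SKETCH_RING_REGS_SIZE;
	write32(ring, RING_MSI_ADDR_LS, msg->address_lo);
	write32(ring, RING_MSI_ADDR_MS, msg->address_hi);
	write32(ring, RING_MSI_DATA_VALUE, msg->data);
	return 0;
}
```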
[PATCH] platform/x86: dell-wmi-sysman: fix a NULL pointer dereference
An upcoming Dell platform is causing a NULL pointer dereference in dell-wmi-sysman initialization. Validate that the input from BIOS matches correct ACPI types and abort module initialization if it fails. This leads to a memory leak that needs to be cleaned up properly. Signed-off-by: Mario Limonciello --- drivers/platform/x86/dell-wmi-sysman/sysman.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/platform/x86/dell-wmi-sysman/sysman.c b/drivers/platform/x86/dell-wmi-sysman/sysman.c index dc6dd531c996..38b497991071 100644 --- a/drivers/platform/x86/dell-wmi-sysman/sysman.c +++ b/drivers/platform/x86/dell-wmi-sysman/sysman.c @@ -419,13 +419,19 @@ static int init_bios_attributes(int attr_type, const char *guid) return retval; /* need to use specific instance_id and guid combination to get right data */ obj = get_wmiobj_pointer(instance_id, guid); - if (!obj) + if (!obj || obj->type != ACPI_TYPE_PACKAGE) { + release_attributes_data(); return -ENODEV; + } elements = obj->package.elements; mutex_lock(&wmi_priv.mutex); while (elements) { /* sanity checking */ + if (elements[ATTR_NAME].type != ACPI_TYPE_STRING) { + pr_debug("incorrect element type\n"); + goto nextobj; + } if (strlen(elements[ATTR_NAME].string.pointer) == 0) { pr_debug("empty attribute found\n"); goto nextobj; -- 2.25.1
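The validation order that the patch introduces (check that the object is a package, then that the element has the expected type, and only then call `strlen()`) can be sketched with trimmed-down, hypothetical stand-ins for ACPICA's `union acpi_object`. The type values mirror ACPICA's `ACPI_TYPE_INTEGER`, `ACPI_TYPE_STRING`, and `ACPI_TYPE_PACKAGE`; treating element 0 as `ATTR_NAME` is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the relevant ACPICA object type values. */
enum sketch_acpi_type {
	SK_TYPE_INTEGER = 1,
	SK_TYPE_STRING = 2,
	SK_TYPE_PACKAGE = 4,
};

/* Hypothetical, flattened stand-in for union acpi_object. */
struct sketch_obj {
	enum sketch_acpi_type type;
	const char *string;			/* valid only for SK_TYPE_STRING */
	const struct sketch_obj *elements;	/* valid only for SK_TYPE_PACKAGE */
	size_t count;
};

#define SKETCH_ATTR_NAME 0	/* assumed index of the attribute name */

/* Returns the attribute name, or NULL if the BIOS data has the wrong shape. */
static const char *sketch_attr_name(const struct sketch_obj *obj)
{
	if (!obj || obj->type != SK_TYPE_PACKAGE || obj->count == 0)
		return NULL;	/* caller would free and return -ENODEV */
	if (obj->elements[SKETCH_ATTR_NAME].type != SK_TYPE_STRING)
		return NULL;	/* would otherwise read a non-string */
	if (strlen(obj->elements[SKETCH_ATTR_NAME].string) == 0)
		return NULL;	/* empty attribute */
	return obj->elements[SKETCH_ATTR_NAME].string;
}
```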
Re: [PATCH V3 1/5] perf/core: Add PERF_SAMPLE_WEIGHT_STRUCT
On 1/28/2021 5:40 PM, kan.li...@linux.intel.com wrote: From: Kan Liang The current PERF_SAMPLE_WEIGHT sample type is very useful to express the cost of an action represented by the sample. This allows the profiler to scale the samples to be more informative to the programmer. It could also help to locate a hotspot, e.g., when profiling by memory latencies, the expensive loads appear higher up in the histograms. But the current PERF_SAMPLE_WEIGHT sample type is determined by only one factor. This could be a problem if users want two or more factors to contribute to the weight. For example, the Golden Cove core PMU can provide both the instruction latency and the cache latency information as factors for memory profiling. For current X86 platforms, although meminfo::latency is defined as a u64, only the lower 32 bits contain valid data in practice (no memory access could last longer than 4G cycles). The higher 32 bits can be used to store new factors. Add a new sample type, PERF_SAMPLE_WEIGHT_STRUCT, to indicate the new sample weight structure. It shares the same space as the PERF_SAMPLE_WEIGHT sample type. Users can apply either the PERF_SAMPLE_WEIGHT sample type or the PERF_SAMPLE_WEIGHT_STRUCT sample type to retrieve the sample weight, but they cannot apply both sample types simultaneously. Currently, only X86 and PowerPC use the PERF_SAMPLE_WEIGHT sample type. - For PowerPC, nothing changes for the PERF_SAMPLE_WEIGHT sample type, and the new PERF_SAMPLE_WEIGHT_STRUCT sample type has no effect. PowerPC can restructure the weight field similarly later. - For X86, the same value will be dumped for the PERF_SAMPLE_WEIGHT sample type or the PERF_SAMPLE_WEIGHT_STRUCT sample type for now. The following patches will apply the new factors for the PERF_SAMPLE_WEIGHT_STRUCT sample type. The field in the union perf_sample_weight should be shared among different architectures. A generic name is required, but it's hard to abstract a name that applies to all architectures.
For example, on X86, the fields are to store all kinds of latency. While on PowerPC, it stores MMCRA[TECX/TECM], which should not be latency. So a general name prefix 'var$NUM' is used here. Suggested-by: Peter Zijlstra (Intel) Signed-off-by: Kan Liang --- arch/powerpc/perf/core-book3s.c | 2 +- arch/x86/events/intel/ds.c | 17 +++--- include/linux/perf_event.h | 4 ++-- include/uapi/linux/perf_event.h | 49 +++-- kernel/events/core.c| 11 + 5 files changed, 66 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index 28206b1..869d999 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -2195,7 +2195,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val, if (event->attr.sample_type & PERF_SAMPLE_WEIGHT && ppmu->get_mem_weight) - ppmu->get_mem_weight(&data.weight); + ppmu->get_mem_weight(&data.weight.full); if (perf_event_overflow(event, &data, regs)) power_pmu_stop(event, 0); diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 67dbc91..2f54b1f 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -960,7 +960,8 @@ static void adaptive_pebs_record_size_update(void) } #define PERF_PEBS_MEMINFO_TYPE (PERF_SAMPLE_ADDR | PERF_SAMPLE_DATA_SRC | \ - PERF_SAMPLE_PHYS_ADDR | PERF_SAMPLE_WEIGHT | \ + PERF_SAMPLE_PHYS_ADDR | \ + PERF_SAMPLE_WEIGHT_TYPE |\ PERF_SAMPLE_TRANSACTION |\ PERF_SAMPLE_DATA_PAGE_SIZE) @@ -987,7 +988,7 @@ static u64 pebs_update_adaptive_cfg(struct perf_event *event) gprs = (sample_type & PERF_SAMPLE_REGS_INTR) && (attr->sample_regs_intr & PEBS_GP_REGS); - tsx_weight = (sample_type & PERF_SAMPLE_WEIGHT) && + tsx_weight = (sample_type & PERF_SAMPLE_WEIGHT_TYPE) && ((attr->config & INTEL_ARCH_EVENT_MASK) == x86_pmu.rtm_abort_event); @@ -1369,8 +1370,8 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event, /* * Use latency for weight (only avail with PEBS-LL) */ - if (fll && (sample_type 
& PERF_SAMPLE_WEIGHT)) - data->weight = pebs->lat; + if (fll && (sample_type & PERF_SAMPLE_WEIGHT_TYPE)) + data->weight.full = pebs->lat; /* * data.data_src encodes the data source @@ -1462,8 +1463,8 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event, if (x86_pmu.intel_cap.pebs_format >= 2) { /* Only set the TSX we
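One 64-bit weight slot can carry several factors while staying backward compatible: the low 32 bits keep the value that PERF_SAMPLE_WEIGHT users already read (latency on x86), and the upper 32 bits hold new factors. This sketch assumes a 32/16/16 split named after the 'var$NUM' convention (`var1_dw`, `var2_w`, `var3_w`); the union definition itself is cut off in the hunk above, and the helpers are hypothetical. Shifts are used instead of a union so that the result does not depend on byte order.

```c
#include <stdint.h>

/* Pack three factors into one 64-bit weight: var1_dw in bits 0-31,
 * var2_w in bits 32-47, var3_w in bits 48-63 (assumed layout). */
static uint64_t weight_pack(uint32_t var1_dw, uint16_t var2_w, uint16_t var3_w)
{
	return (uint64_t)var1_dw |
	       ((uint64_t)var2_w << 32) |
	       ((uint64_t)var3_w << 48);
}

static uint32_t weight_var1(uint64_t full) { return (uint32_t)full; }
static uint16_t weight_var2(uint64_t full) { return (uint16_t)(full >> 32); }
static uint16_t weight_var3(uint64_t full) { return (uint16_t)(full >> 48); }
```

With the extra factors left at zero, `full` equals the plain 32-bit latency, which is why x86 can report the same value for both sample types for now.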
Re: [PATCH] sched/fair: Rate limit calls to update_blocked_averages() for NOHZ
On Friday 29 Jan 2021 at 11:33:00 (+0100), Vincent Guittot wrote: > On Thu, 28 Jan 2021 at 16:09, Joel Fernandes wrote: > > > > Hi Vincent, > > > > On Thu, Jan 28, 2021 at 8:57 AM Vincent Guittot > > wrote: > > > > On Mon, Jan 25, 2021 at 03:42:41PM +0100, Vincent Guittot wrote: > > > > > On Fri, 22 Jan 2021 at 20:10, Joel Fernandes > > > > > wrote: > > > > > > On Fri, Jan 22, 2021 at 05:56:22PM +0100, Vincent Guittot wrote: > > > > > > > On Fri, 22 Jan 2021 at 16:46, Joel Fernandes (Google) > > > > > > > wrote: > > > > > > > > > > > > > > > > On an octacore ARM64 device running ChromeOS Linux kernel v5.4, > > > > > > > > I found > > > > > > > > that there are a lot of calls to update_blocked_averages(). > > > > > > > > This causes > > > > > > > > the schedule loop to slow down to taking upto 500 micro seconds > > > > > > > > at > > > > > > > > times (due to newidle load balance). I have also seen this > > > > > > > > manifest in > > > > > > > > the periodic balancer. > > > > > > > > > > > > > > > > Closer look shows that the problem is caused by the following > > > > > > > > ingredients: > > > > > > > > 1. If the system has a lot of inactive CGroups (thanks Dietmar > > > > > > > > for > > > > > > > > suggesting to inspect /proc/sched_debug for this), this can make > > > > > > > > __update_blocked_fair() take a long time. > > > > > > > > > > > > > > Inactive cgroups are removed from the list so they should not > > > > > > > impact > > > > > > > the duration > > > > > > > > > > > > I meant blocked CGroups. According to this code, a cfs_rq can be > > > > > > partially > > > > > > decayed and not have any tasks running on it but its load needs to > > > > > > be > > > > > > decayed, correct? That's what I meant by 'inactive'. I can reword > > > > > > it to > > > > > > 'blocked'. > > > > > > > > > > How many blocked cgroups have you got ?
> > > > > > > > I put a counter in for_each_leaf_cfs_rq_safe() { } to count how many > > > > times > > > > this loop runs per new idle balance. When the problem happens I see > > > > this loop > > > > run 35-40 times (for one single instance of newidle balance). So in > > > > total > > > > there are at least these many cfs_rq load updates. > > > > > > Do you mean that you have 35-40 cgroups ? Or the 35-40 includes all CPUs ? > > > > All CPUs. > > > > > > I also see that new idle balance can be called 200-500 times per second. > > > > > > This is not surprising because newidle_balance() is called every time > > > the CPU is about to become idle > > > > Sure. > > > > > > > > > > > > > > * There can be a lot of idle CPU cgroups. Don't > > > > > > let fully > > > > > > * decayed cfs_rqs linger on the list. > > > > > > */ > > > > > > if (cfs_rq_is_decayed(cfs_rq)) > > > > > > list_del_leaf_cfs_rq(cfs_rq); > > > > > > > > > > > > > > 2. The device has a lot of CPUs in a cluster which causes > > > > > > > > schedutil in a > > > > > > > > shared frequency domain configuration to be slower than usual. > > > > > > > > (the load > > > > > > > > > > > > > > What do you mean exactly by it causes schedutil to be slower than > > > > > > > usual ? > > > > > > > > > > > > sugov_next_freq_shared() is order number of CPUs in the a cluster. > > > > > > This > > > > > > system is a 6+2 system with 6 CPUs in a cluster. schedutil shared > > > > > > policy > > > > > > frequency update needs to go through utilization of other CPUs in > > > > > > the > > > > > > cluster. I believe this could be adding to the problem but is not > > > > > > really > > > > > > needed to optimize if we can rate limit the calls to > > > > > > update_blocked_averages > > > > > > to begin with. > > > > > > > > > > Qais mentioned half of the time being used by > > > > > sugov_next_freq_shared(). Are there any frequency changes resulting in > > > > > this call ? 
> > > > > > > > I do not see a frequency update happening at the time of the problem. > > > > However > > > > note that sugov_iowait_boost() does run even if frequency is not being > > > > updated. IIRC, this function is also not that light weight and I am not > > > > sure > > > > if it is a good idea to call this that often. > > > > > > Scheduler can't make any assumption about how often schedutil/cpufreq > > > wants to be called. Some are fast and straightforward and can be > > > called very often to adjust frequency; Others can't handle much > > > updates. The rate limit mechanism in schedutil and io-boost should be > > > there for such purpose. > > > > Sure, I know that's the intention. > > > > > > > > > > average updates also try to update the frequency in schedutil). > > > > > > > > > > > > > > > > 3. The CPU is running at a low frequency causing the > > > > > > > > scheduler/schedutil > > > > > > > > code paths to take longer than when running at a high CPU > > > > > > > > frequency. > > > > > > > > > > > > > > Low frequency usually