Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time
On Thu, Apr 18, 2019 at 5:07 PM Luck, Tony wrote: > > On Fri, Apr 19, 2019 at 01:29:10AM +0200, Borislav Petkov wrote: > > Which reminds me, Tony, I think all those debugging files "pfn" > > and "array" and the one you add now, should all be under a > > CONFIG_RAS_CEC_DEBUG which is default off and used only for development. > > Mind adding that too pls? > > Patch below, on top of previous patch. Note that I didn't move "enable" > into the RAS_CEC_DEBUG code. I think it has some value even on > production systems. It is still in debugfs (which many production > systems don't mount) so I don't see that people are going to be randomly > using it to disable the CEC. For me, "enable" is useful to make mcelog work like before. Please see the other email from me for all the details. For debugfs, I believe many productions mount it, as tracing still uses debugfs rather than tracefs (at least on CentOS7), and rasdaemon also uses trace events to collect different errors. Therefore, I believe your patch is fine as it is. Thanks.
Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time
On Thu, Apr 18, 2019 at 5:26 PM Borislav Petkov wrote: > > Now, if any of that above still doesn't make it clear, please state what > you're trying to achieve and I'll try to help. Sorry that I misled you to believe we don't even enable CONFIG_X86_MCELOG_LEGACY. Here is what we have and what we have tried: 1. We have CONFIG_X86_MCELOG_LEGACY=y 2. We also have CONFIG_RAS=y and CONFIG_RAS_CEC=y 3. mcelog started as a daemon successfully, like before 4. Some real correctable memory errors happened, as logged in dmesg 5. mcelog couldn't receive any of them, reported 0 errors 6. Admin's complained to us as they believe this is a kernel bug 7. We dug into kernel source code and found out CONFIG_RAS hijacks all these errors, by stopping there in the notification chain: static int mce_first_notifier(struct notifier_block *nb, unsigned long val, void *data) { struct mce *m = (struct mce *)data; if (!m) return NOTIFY_DONE; if (cec_add_mce(m)) return NOTIFY_STOP; // <=== Returns and stops here /* Emit the trace record: */ trace_mce_record(m); set_bit(0, _need_notify); mce_notify_irq(); // <=== There is where MCELOG receives return NOTIFY_DONE; } 8. I noticed rasdaemon, and tried to start it instead of mcelog. 9. I injected some memory error and could successfully read them via ras-mc-ctl. To demonstrate what I think we should have, here is the PoC code ONLY to show the idea (please don't judge it): @ -567,12 +567,12 @@ static int mce_first_notifier(struct notifier_block *nb, unsigned long val, void *data) { struct mce *m = (struct mce *)data; + bool consumed; if (!m) return NOTIFY_DONE; - if (cec_add_mce(m)) - return NOTIFY_STOP; + consumed = cec_add_mce(m); /* Emit the trace record: */ trace_mce_record(m); @@ -581,7 +581,7 @@ static int mce_first_notifier(struct notifier_block *nb, unsigned long val, mce_notify_irq(); - return NOTIFY_DONE; + return consumed ? NOTIFY_STOP : NOTIFY_DONE; } With this change, although not even compiled, mcelog should still receive correctable memory errors like before, even when we have CONFIG_RAS_CEC=y. Does this make any sense to you? Thanks!
Re: [PATCH v20 00/28] Intel SGX1 support
On Fri, 19 Apr 2019, Jethro Beekman wrote: > On 2019-04-19 14:34, Thomas Gleixner wrote: > > And how so? You create writeable AND executable memory. That's a nono and > > you can argue in circles, that's not going to change with any of your > > proposed changes. > > On 2019-04-19 14:38, Thomas Gleixner wrote: > > You are working around LSM nothing else and that's just not going to fly. > > Based on your comments, I'm still unsure if we're on the same page with > regards to what I'm proposing. > > Here's a regular non-SGX flow that LSM would likely prevent: > > mmap(PROT_READ|PROT_WRITE) > memcpy() > mmap(PROT_READ|PROT_EXEC) <-- denied by LSM > > Or just something based on regular PT permissions: > > mmap(PROT_READ|PROT_EXEC) > memcpy() <-- SIGSEGV > > Now, the equivalent for SGX: > > mmap(PROT_READ|PROT_WRITE) > ioctl(EADD) > mmap(PROT_READ|PROT_EXEC) <-- denied by LSM This is completely irrelevant, really. The point is that the SGX driver loads and executes arbitrary data which is handed in from user space via an ioctl w/o any chance of verifying where that comes from. What Andy proposed is to open a file with the SGX payload and hand in the file descriptor. That way LSM can decide whether this is allowed or denied based on the file descriptor and whatever the security model/policy is in a particular setup. Right know the SGX driver and its proposed API prevent any form of LSM auditing and whatever permission checks you had in mind won't change that at all. Thanks, tglx
Re: [PATCH] KVM: nVMX: allow tests to use bad virtual-APIC page address
On Mon, Apr 15, 2019 at 03:35:33PM +0200, Paolo Bonzini wrote: > As mentioned in the comment, there are some special cases where we can simply > clear the TPR shadow bit from the CPU-based execution controls in the vmcs02. > Handle them so that we can remove some XFAILs from vmx.flat. > > Signed-off-by: Paolo Bonzini > --- > arch/x86/kvm/vmx/nested.c | 25 - > arch/x86/kvm/vmx/vmx.c| 2 +- > arch/x86/kvm/vmx/vmx.h| 2 ++ > 3 files changed, 19 insertions(+), 10 deletions(-) > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c > index 7ec9bb1dd723..a22af5a85540 100644 > --- a/arch/x86/kvm/vmx/nested.c > +++ b/arch/x86/kvm/vmx/nested.c > @@ -2873,20 +2873,27 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu > *vcpu) > /* >* If translation failed, VM entry will fail because >* prepare_vmcs02 set VIRTUAL_APIC_PAGE_ADDR to -1ull. > - * Failing the vm entry is _not_ what the processor > - * does but it's basically the only possibility we > - * have. We could still enter the guest if CR8 load > - * exits are enabled, CR8 store exits are enabled, and > - * virtualize APIC access is disabled; in this case > - * the processor would never use the TPR shadow and we > - * could simply clear the bit from the execution > - * control. But such a configuration is useless, so > - * let's keep the code simple. >*/ > if (!is_error_page(page)) { > vmx->nested.virtual_apic_page = page; > hpa = page_to_phys(vmx->nested.virtual_apic_page); > vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, hpa); > + } else if (nested_cpu_has(vmcs12, CPU_BASED_CR8_LOAD_EXITING) && > +nested_cpu_has(vmcs12, CPU_BASED_CR8_STORE_EXITING) > && > +!nested_cpu_has2(vmcs12, > SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) { > + /* > + * The processor will never use the TPR shadow, simply > + * clear the bit from the execution control. Such a > + * configuration is useless, but it happens in tests. > + * For any other configuration, failing the vm entry is > + * _not_ what the processor does but it's basically the > + * only possibility we have. > + */ > + vmcs_clear_bits(CPU_BASED_VM_EXEC_CONTROL, > + CPU_BASED_TPR_SHADOW); > + } else { > + printk("bad virtual-APIC page address\n"); > + dump_vmcs(); I don't think we should dump the VMCS here, or have any form of print at all. dump_vmcs() is especially bad as it allows userspace to spam the kernel log at the error level. I haven't actually checked, but I assume the nested consistency check tracing patch would be a better way to debug something of this nature in production. > } > } > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index f8054dc1de65..14cacfd7ffd0 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -5603,7 +5603,7 @@ static void vmx_dump_dtsel(char *name, uint32_t limit) > vmcs_readl(limit + GUEST_GDTR_BASE - GUEST_GDTR_LIMIT)); > } > > -static void dump_vmcs(void) > +void dump_vmcs(void) > { > u32 vmentry_ctl = vmcs_read32(VM_ENTRY_CONTROLS); > u32 vmexit_ctl = vmcs_read32(VM_EXIT_CONTROLS); > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h > index a1e00d0a2482..f879529906b4 100644 > --- a/arch/x86/kvm/vmx/vmx.h > +++ b/arch/x86/kvm/vmx/vmx.h > @@ -517,4 +517,6 @@ static inline void decache_tsc_multiplier(struct vcpu_vmx > *vmx) > vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio); > } > > +void dump_vmcs(void); > + > #endif /* __KVM_X86_VMX_H */ > -- > 1.8.3.1 >
Re: [PATCH 4.19 000/110] 4.19.36-stable review
On Thu, 18 Apr 2019 at 23:29, Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 4.19.36 release. > There are 110 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Sat Apr 20 16:03:29 UTC 2019. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.36-rc1.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-4.19.y > and the diffstat can be found below. > > thanks, > > greg k-h Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386. Summary kernel: 4.19.36-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.19.y git commit: e409a9a7917997e4aa9b49cd5c445208696ee1cf git describe: v4.19.35-111-ge409a9a79179 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.19-oe/build/v4.19.35-111-ge409a9a79179 No regressions (compared to build v4.19.34-102-ge4f859c2cd83) No fixes (compared to build v4.19.34-102-ge4f859c2cd83) Ran 23730 total tests in the following environments and test suites. Environments -- - dragonboard-410c - arm64 - hi6220-hikey - arm64 - i386 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - arm - x86_64 Test Suites --- * install-android-platform-tools-r2600 * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * perf * spectre-meltdown-checker-test * v4l2-compliance * kvm-unit-tests * ltp-open-posix-tests * kselftest-vsyscall-mode-native -- Linaro LKFT https://lkft.linaro.org
Re: [PATCH 5.0 00/93] 5.0.9-stable review
On Thu, 18 Apr 2019 at 23:40, Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 5.0.9 release. > There are 93 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Sat Apr 20 16:03:33 UTC 2019. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.0.9-rc1.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-5.0.y > and the diffstat can be found below. > > thanks, > > greg k-h > Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386. Summary kernel: 5.0.9-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-5.0.y git commit: a6b9c7d1cb06316cd1cea76371029603b6459de8 git describe: v5.0.8-94-ga6b9c7d1cb06 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-5.0-oe/build/v5.0.8-94-ga6b9c7d1cb06 No regressions (compared to build v5.0.7-118-ge6fdfdbfcc6a) No fixes (compared to build v5.0.7-118-ge6fdfdbfcc6a) Ran 23788 total tests in the following environments and test suites. Environments -- - dragonboard-410c - hi6220-hikey - i386 - juno-r2 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - x86 Test Suites --- * install-android-platform-tools-r2600 * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * perf * spectre-meltdown-checker-test * v4l2-compliance * kvm-unit-tests * ltp-fs-tests * ltp-open-posix-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none -- Linaro LKFT https://lkft.linaro.org
Re: mmotm 2019-04-19-14-53 uploaded (objtool)
On 4/19/19 2:53 PM, a...@linux-foundation.org wrote: > The mm-of-the-moment snapshot 2019-04-19-14-53 has been uploaded to > >http://www.ozlabs.org/~akpm/mmotm/ > > mmotm-readme.txt says > > README for mm-of-the-moment: > > http://www.ozlabs.org/~akpm/mmotm/ > > This is a snapshot of my -mm patch queue. Uploaded at random hopefully > more than once a week. on x86_64: CC lib/strncpy_from_user.o lib/strncpy_from_user.o: warning: objtool: strncpy_from_user()+0x315: call to __ubsan_handle_add_overflow() with UACCESS enabled CC lib/strnlen_user.o lib/strnlen_user.o: warning: objtool: strnlen_user()+0x337: call to __ubsan_handle_sub_overflow() with UACCESS enabled obj files are attached. -- ~Randy strncpy_from_user.o Description: application/object strnlen_user.o Description: application/object
[PATCH] power_supply: platform/chrome: wilco_ec: fix ptr_ret.cocci warnings
From: kbuild test robot drivers/power/supply/wilco-charger.c:173:1-3: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci Fixes: 2600e89ec94e ("power_supply: platform/chrome: wilco_ec: Add charging config driver") CC: Nick Crews Signed-off-by: kbuild test robot --- url: https://github.com/0day-ci/linux/commits/Nick-Crews/power_supply-platform-chrome-wilco_ec-Add-charging-config-driver/20190420-041913 wilco-charger.c |5 + 1 file changed, 1 insertion(+), 4 deletions(-) --- a/drivers/power/supply/wilco-charger.c +++ b/drivers/power/supply/wilco-charger.c @@ -170,10 +170,7 @@ static int wilco_charge_probe(struct pla psy_cfg.drv_data = ec; psy = devm_power_supply_register(>dev, _ps_desc, _cfg); - if (IS_ERR(psy)) - return PTR_ERR(psy); - - return 0; + return PTR_ERR_OR_ZERO(psy); } static struct platform_driver wilco_charge_driver = {
[PATCH 3/3] PCI: use pci_*(), dev_*() and pr_*() instead of pci_printk()
Replace pci_printk with pci_*, dev_*() and pr_*() to avoid checkpatch warnings. Signed-off-by: Mohan Kumar --- drivers/pci/bus.c | 3 +-- drivers/pci/pci.c | 14 +++--- drivers/pci/probe.c| 21 + drivers/pci/setup-bus.c| 15 +++ drivers/pci/xen-pcifront.c | 7 +++ 5 files changed, 27 insertions(+), 33 deletions(-) diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c index a742ef5..25e91ae 100644 --- a/drivers/pci/bus.c +++ b/drivers/pci/bus.c @@ -291,8 +291,7 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx) res->end = end; res->flags &= ~IORESOURCE_UNSET; orig_res.flags &= ~IORESOURCE_UNSET; - pci_printk(KERN_DEBUG, dev, "%pR clipped to %pR\n", -_res, res); + pci_info(dev, "%pR clipped to %pR\n", _res, res); return true; } diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index a24a8bc..d439151 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -2789,14 +2789,14 @@ void pci_pm_init(struct pci_dev *dev) dev->d2_support = true; if (dev->d1_support || dev->d2_support) - pci_printk(KERN_DEBUG, dev, "supports%s%s\n", + pci_info(dev, "supports%s%s\n", dev->d1_support ? " D1" : "", dev->d2_support ? " D2" : ""); } pmc &= PCI_PM_CAP_PME_MASK; if (pmc) { - pci_printk(KERN_DEBUG, dev, "PME# supported from%s%s%s%s%s\n", + pci_info(dev, "PME# supported from%s%s%s%s%s\n", (pmc & PCI_PM_CAP_PME_D0) ? " D0" : "", (pmc & PCI_PM_CAP_PME_D1) ? " D1" : "", (pmc & PCI_PM_CAP_PME_D2) ? " D2" : "", @@ -2964,16 +2964,16 @@ static int pci_ea_read(struct pci_dev *dev, int offset) res->flags = flags; if (bei <= PCI_EA_BEI_BAR5) - pci_printk(KERN_DEBUG, dev, "BAR %d: %pR (from Enhanced Allocation, properties %#02x)\n", + pci_info(dev, "BAR %d: %pR (from Enhanced Allocation, properties %#02x)\n", bei, res, prop); else if (bei == PCI_EA_BEI_ROM) - pci_printk(KERN_DEBUG, dev, "ROM: %pR (from Enhanced Allocation, properties %#02x)\n", + pci_info(dev, "ROM: %pR (from Enhanced Allocation, properties %#02x)\n", res, prop); else if (bei >= PCI_EA_BEI_VF_BAR0 && bei <= PCI_EA_BEI_VF_BAR5) - pci_printk(KERN_DEBUG, dev, "VF BAR %d: %pR (from Enhanced Allocation, properties %#02x)\n", + pci_info(dev, "VF BAR %d: %pR (from Enhanced Allocation, properties %#02x)\n", bei - PCI_EA_BEI_VF_BAR0, res, prop); else - pci_printk(KERN_DEBUG, dev, "BEI %d res: %pR (from Enhanced Allocation, properties %#02x)\n", + pci_info(dev, "BEI %d res: %pR (from Enhanced Allocation, properties %#02x)\n", bei, res, prop); out: @@ -4200,7 +4200,7 @@ int pci_set_cacheline_size(struct pci_dev *dev) if (cacheline_size == pci_cache_line_size) return 0; - pci_printk(KERN_DEBUG, dev, "cache line size of %d is not supported\n", + pci_info(dev, "cache line size of %d is not supported\n", pci_cache_line_size << 2); return -EINVAL; diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index eea7847..89fdfd4 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -317,7 +317,7 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type, res->flags = 0; out: if (res->flags) - pci_printk(KERN_DEBUG, dev, "reg 0x%x: %pR\n", pos, res); + pci_info(dev, "reg 0x%x: %pR\n", pos, res); return (res->flags & IORESOURCE_MEM_64) ? 1 : 0; } @@ -435,7 +435,7 @@ static void pci_read_bridge_io(struct pci_bus *child) region.start = base; region.end = limit + io_granularity - 1; pcibios_bus_to_resource(dev->bus, res, ); - pci_printk(KERN_DEBUG, dev, " bridge window %pR\n", res); + pci_info(dev, " bridge window %pR\n", res); } } @@ -457,7 +457,7 @@ static void pci_read_bridge_mmio(struct pci_bus *child) region.start = base; region.end = limit + 0xf; pcibios_bus_to_resource(dev->bus, res, ); - pci_printk(KERN_DEBUG, dev, " bridge window %pR\n", res); + pci_info(dev, " bridge window %pR\n", res); } } @@ -510,7 +510,7 @@ static void pci_read_bridge_mmio_pref(struct pci_bus *child) region.start = base; region.end = limit + 0xf; pcibios_bus_to_resource(dev->bus,
[PATCH 2/3] PCI: Define pr_fmt() and use pr_*() instead of printk()
Define a pr_fmt() macro that convert all of the explicit printk() calls into corresponding pr_*(). Signed-off-by: Mohan Kumar --- drivers/pci/bus.c | 5 - drivers/pci/pci-stub.c | 11 +-- drivers/pci/pci-sysfs.c | 3 ++- drivers/pci/pci.c | 5 +++-- drivers/pci/quirks.c| 9 + drivers/pci/setup-bus.c | 5 +++-- drivers/pci/slot.c | 4 +++- 7 files changed, 25 insertions(+), 17 deletions(-) diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c index 5cb40b2..a742ef5 100644 --- a/drivers/pci/bus.c +++ b/drivers/pci/bus.c @@ -6,6 +6,9 @@ * David Miller (da...@redhat.com) * Ivan Kokshaysky (i...@jurassic.park.msu.ru) */ + +#define pr_fmt(fmt) "PCI: " fmt + #include #include #include @@ -23,7 +26,7 @@ void pci_add_resource_offset(struct list_head *resources, struct resource *res, entry = resource_list_create_entry(res, 0); if (!entry) { - printk(KERN_ERR "PCI: can't add host bridge window %pR\n", res); + pr_err("can't add host bridge window %pR\n", res); return; } diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c index 66f8a59..f946bf9 100644 --- a/drivers/pci/pci-stub.c +++ b/drivers/pci/pci-stub.c @@ -16,6 +16,8 @@ * .../:00:19.0/driver -> ../../../bus/pci/drivers/pci-stub */ +#define pr_fmt(fmt) "pci-stub: " fmt + #include #include @@ -66,20 +68,17 @@ static int __init pci_stub_init(void) , _mask); if (fields < 2) { - printk(KERN_WARNING - "pci-stub: invalid id string \"%s\"\n", id); + pr_warn("invalid id string \"%s\"\n", id); continue; } - printk(KERN_INFO - "pci-stub: add %04X:%04X sub=%04X:%04X cls=%08X/%08X\n", + pr_info("add %04X:%04X sub=%04X:%04X cls=%08X/%08X\n", vendor, device, subvendor, subdevice, class, class_mask); rc = pci_add_dynid(_driver, vendor, device, subvendor, subdevice, class, class_mask, 0); if (rc) - printk(KERN_WARNING - "pci-stub: failed to add dynamic id (%d)\n", rc); + pr_warn("failed to add dynamic id (%d)\n", rc); } return 0; diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index 25794c2..455bf29 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -12,6 +12,7 @@ * Modeled after usb's driverfs.c */ +#define pr_fmt(fmt) "pci: warning: " fmt #include #include @@ -,7 +1112,7 @@ void pci_create_legacy_files(struct pci_bus *b) kfree(b->legacy_io); b->legacy_io = NULL; kzalloc_err: - printk(KERN_WARNING "pci: warning: could not create legacy I/O port and ISA memory resources to sysfs\n"); + pr_warn("could not create legacy I/O port and ISA memory resources to sysfs\n"); return; } diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index f5ff01d..a24a8bc 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -8,6 +8,8 @@ * Copyright 1997 -- 2000 Martin Mares */ +#define pr_fmt(fmt) "PCI: " fmt + #include #include #include @@ -6276,8 +6278,7 @@ static int __init pci_setup(char *str) } else if (!strncmp(str, "disable_acs_redir=", 18)) { disable_acs_redir_param = str + 18; } else { - printk(KERN_ERR "PCI: Unknown option `%s'\n", - str); + pr_err("Unknown option `%s'\n", str); } } str = k; diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 06af0c3..fb802d1 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -12,6 +12,8 @@ * file, where their drivers can use them. */ +#define pr_fmt(fmt) "PCI: " fmt + #include #include #include @@ -159,8 +161,7 @@ static int __init pci_apply_final_quirks(void) u8 tmp; if (pci_cache_line_size) - printk(KERN_DEBUG "PCI: CLS %u bytes\n", - pci_cache_line_size << 2); + pr_info("CLS %u bytes\n", pci_cache_line_size << 2); pci_apply_fixup_final_quirks = true; for_each_pci_dev(dev) { @@ -177,7 +178,7 @@ static int __init pci_apply_final_quirks(void) if (!tmp || cls == tmp) continue; - printk(KERN_DEBUG "PCI: CLS mismatch (%u != %u), using %u bytes\n", + pr_info("CLS mismatch (%u != %u), using %u bytes\n", cls << 2, tmp << 2, pci_dfl_cache_line_size << 2);
[PATCH 1/3] PCI: Cleanup printk logging
Replace printk with pr_* to avoid checkpatch warnings. Signed-off-by: Mohan Kumar --- drivers/pci/pci-acpi.c | 11 --- drivers/pci/quirks.c | 2 +- 2 files changed, 5 insertions(+), 8 deletions(-) diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index e1949f7..3ada026 100644 --- a/drivers/pci/pci-acpi.c +++ b/drivers/pci/pci-acpi.c @@ -140,8 +140,7 @@ static acpi_status decode_type0_hpx_record(union acpi_object *record, hpx->t0->enable_perr = fields[5].integer.value; break; default: - printk(KERN_WARNING - "%s: Type 0 Revision %d record not supported\n", + pr_warn("%s: Type 0 Revision %d record not supported\n", __func__, revision); return AE_ERROR; } @@ -169,8 +168,7 @@ static acpi_status decode_type1_hpx_record(union acpi_object *record, hpx->t1->tot_max_split = fields[4].integer.value; break; default: - printk(KERN_WARNING - "%s: Type 1 Revision %d record not supported\n", + pr_warn("%s: Type 1 Revision %d record not supported\n", __func__, revision); return AE_ERROR; } @@ -211,8 +209,7 @@ static acpi_status decode_type2_hpx_record(union acpi_object *record, hpx->t2->sec_unc_err_mask_or = fields[17].integer.value; break; default: - printk(KERN_WARNING - "%s: Type 2 Revision %d record not supported\n", + pr_warn("%s: Type 2 Revision %d record not supported\n", __func__, revision); return AE_ERROR; } @@ -272,7 +269,7 @@ static acpi_status acpi_run_hpx(acpi_handle handle, struct hotplug_params *hpx) goto exit; break; default: - printk(KERN_ERR "%s: Type %d record not supported\n", + pr_err("%s: Type %d record not supported\n", __func__, type); status = AE_ERROR; goto exit; diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index f9cd4d4..06af0c3 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -2613,7 +2613,7 @@ static void nvbridge_check_legacy_irq_routing(struct pci_dev *dev) pci_read_config_dword(dev, 0x74, ); if (cfg & ((1 << 2) | (1 << 15))) { - printk(KERN_INFO "Rewriting IRQ routing register on MCP55\n"); + pr_info("Rewriting IRQ routing register on MCP55\n"); cfg &= ~((1 << 2) | (1 << 15)); pci_write_config_dword(dev, 0x74, cfg); } -- 2.7.4
Re: [PATCH] random: Document get_random_int() family
On Tue, Mar 26, 2019 at 04:26:54AM +, George Spelvin wrote: > Explain what these functions are for and when they offer > an advantage over get_random_bytes(). > > (We still need documentation on rng_is_initialized(), the > random_ready_callback system, and early boot in general.) > > Signed-off-by: George Spelvin Applied, thanks. - Ted
Re: [PATCH] lib: scatterlist: Fix to support no mapped sg
On 2019/4/19 18:43, Robert Jarzmik wrote: > Zhou Wang writes: > >> In function sg_split, the second sg_calculate_split will return -EINVAL >> when in_mapped_nents is 0. >> >> Indeed there is no need to do second sg_calculate_split and sg_split_mapped >> when in_mapped_nents is 0, as in_mapped_nents indicates no mapped entry in >> original sgl. >> >> Signed-off-by: Zhou Wang > That's right. May I know what is the usecase which led you to this fix ? I posted a patchset to support HiSilicon compression engine: https://lkml.org/lkml/2019/1/23/358. As Herber Xu suggested, it should use new crypto acomp API which uses sgl as data input and output. However, our hardware can not handle compress head for input/output data, so I need split input/output sgl firstly which is not mapped sgl. New version of HiSilicon compression engine driver has not been posted to community yet. I will do this later. > > Acked-by: Robert Jarzmik Many thanks for your review, Zhou > > -- > Robert > > . >
Re: [PATCH -next] sched/fair: Make some functions static
Pls ignore this. On 2019/3/22 22:30, Yue Haibing wrote: > From: YueHaibing > > Fix sparse warnings: > > kernel/sched/fair.c:2596:6: warning: symbol 'task_tick_numa' was not > declared. Should it be static? > kernel/sched/fair.c:3570:6: warning: symbol 'sync_entity_load_avg' was not > declared. Should it be static? > kernel/sched/fair.c:3583:6: warning: symbol 'remove_entity_load_avg' was not > declared. Should it be static? > > Signed-off-by: YueHaibing > --- > kernel/sched/fair.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index fdab7eb..18fb781 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -2593,7 +2593,7 @@ void task_numa_work(struct callback_head *work) > /* > * Drive the periodic memory faults.. > */ > -void task_tick_numa(struct rq *rq, struct task_struct *curr) > +static void task_tick_numa(struct rq *rq, struct task_struct *curr) > { > struct callback_head *work = >numa_work; > u64 period, now; > @@ -3567,7 +3567,7 @@ static inline u64 cfs_rq_last_update_time(struct cfs_rq > *cfs_rq) > * Synchronize entity load avg of dequeued entity without locking > * the previous rq. > */ > -void sync_entity_load_avg(struct sched_entity *se) > +static void sync_entity_load_avg(struct sched_entity *se) > { > struct cfs_rq *cfs_rq = cfs_rq_of(se); > u64 last_update_time; > @@ -3580,7 +3580,7 @@ void sync_entity_load_avg(struct sched_entity *se) > * Task first catches up with cfs_rq, and then subtract > * itself from the cfs_rq (task must be off the queue now). > */ > -void remove_entity_load_avg(struct sched_entity *se) > +static void remove_entity_load_avg(struct sched_entity *se) > { > struct cfs_rq *cfs_rq = cfs_rq_of(se); > unsigned long flags; >
Re: [PATCH] random: fix CRNG initialization when random.trust_cpu=1
On Tue, Mar 19, 2019 at 01:28:47PM -0400, Jon DeVree wrote: > When the system boots with random.trust_cpu=1 it doesn't initialize the > per-NUMA CRNGs because it skips the rest of the CRNG startup code. This > means that the code from 1e7f583af67b ("random: make /dev/urandom scalable > for silly userspace programs") is not used when random.trust_cpu=1. > > crash> dmesg | grep random: > [0.00] random: get_random_bytes called from start_kernel+0x94/0x530 > with crng_init=0 > [0.314029] random: crng done (trusting CPU's manufacturer) > crash> print crng_node_pool > $6 = (struct crng_state **) 0x0 > > After adding the missing call to numa_crng_init() the per-NUMA CRNGs are > initialized again: > > crash> dmesg | grep random: > [0.00] random: get_random_bytes called from start_kernel+0x94/0x530 > with crng_init=0 > [0.314031] random: crng done (trusting CPU's manufacturer) > crash> print crng_node_pool > $1 = (struct crng_state **) 0x9a915f4014a0 > > The call to invalidate_batched_entropy() was also missing. This is > important for architectures like PPC and S390 which only have the > arch_get_random_seed_* functions. > > Fixes: 39a8883a2b98 ("random: add a config option to trust the CPU's hwrng") > Signed-off-by: Jon DeVree Thanks, applied. - Ted
Re: [PATCH -next] regulator: Make symbols static
On 2019/4/20 0:12, Mark Brown wrote: > On Tue, Apr 16, 2019 at 10:41:09PM +0800, Yue Haibing wrote: >> From: YueHaibing >> >> Fix sparse warnings: >> >> drivers/regulator/stm32-pwr.c:35:5: warning: >> symbol 'ready_mask_table' was not declared. Should it be static? >> drivers/regulator/stm32-pwr.c:47:5: warning: >> symbol 'stm32_pwr_reg_is_ready' was not declared. Should it be static? >> drivers/regulator/stm32-pwr.c:57:5: warning: >> symbol 'stm32_pwr_reg_is_enabled' was not declared. Should it be static? > > This doesn't apply against current code, please check and resend (I > think it's just that someone else made similar fixes already but didn't > check properly). Yep, there is already a fix from kbuild test robot: 82f26185a912 ("regulator: ready_mask_table[] can be static") >
Re: [PATCH] Staging: vc04_services: Cleanup in ctrl_set_bitrate()
On 04/19 :49, Stefan Wahren wrote: > Hi Madhumitha, > > Am 19.04.19 um 23:23 schrieb Madhumitha Prabakaran: > > Remove unnecessary variable and replace return type. > > > > Issue suggested by Coccinelle. > > > > Signed-off-by: Madhumitha Prabakaran > > --- > > drivers/staging/vc04_services/bcm2835-camera/controls.c | 7 +-- > > 1 file changed, 1 insertion(+), 6 deletions(-) > > > > diff --git a/drivers/staging/vc04_services/bcm2835-camera/controls.c > > b/drivers/staging/vc04_services/bcm2835-camera/controls.c > > index e39ab5fae724..f87fa7f61db3 100644 > > --- a/drivers/staging/vc04_services/bcm2835-camera/controls.c > > +++ b/drivers/staging/vc04_services/bcm2835-camera/controls.c > > @@ -607,18 +607,13 @@ static int ctrl_set_bitrate(struct bm2835_mmal_dev > > *dev, > > struct v4l2_ctrl *ctrl, > > const struct bm2835_mmal_v4l2_ctrl *mmal_ctrl) > > { > > - int ret; > > struct vchiq_mmal_port *encoder_out; > > > > dev->capture.encode_bitrate = ctrl->val; > > > > encoder_out = >component[MMAL_COMPONENT_VIDEO_ENCODE]->output[0]; > > > > - ret = vchiq_mmal_port_parameter_set(dev->instance, encoder_out, > > - mmal_ctrl->mmal_id, > > - >val, sizeof(ctrl->val)); > > - ret = 0; > > - return ret; > > + return 0; > > i don't understand why we can remove here the function call. I overlooked the function call. I will fix it. > > Stefan > > > } > > > > static int ctrl_set_bitrate_mode(struct bm2835_mmal_dev *dev,
Re: [PATCH] tracing: Fix a memory leak bug
On Fri, Apr 19, 2019 at 9:37 PM Steven Rostedt wrote: > > On Fri, 19 Apr 2019 21:22:59 -0500 > Wenwen Wang wrote: > > > In trace_pid_write(), the buffer for trace parser is allocated through > > kmalloc() in trace_parser_get_init(). Later on, after the buffer is used, > > it is then freed through kfree() in trace_parser_put(). However, it is > > possible that trace_pid_write() is terminated due to unexpected errors, > > e.g., ENOMEM. In that case, the allocated buffer will not be freed, which > > is a memory leak bug. > > > > To fix this issue, free the allocated buffer when an error is encountered. > > Thanks for the patch. Did you find this through manual inspection, > running KASAN or via one of the static analyzers? Thanks for your question, Steve. It was based on a prototype of a research project, which aims to statically detect memory leak bugs in operating system kernels. Wenwen
[PATCH] regulator: sy8106a: Get rid of struct sy8106a
All the fields in struct sy8106a are only used in sy8106a_i2c_probe(), so use local variables instead. Signed-off-by: Axel Lin --- drivers/regulator/sy8106a-regulator.c | 40 +-- 1 file changed, 13 insertions(+), 27 deletions(-) diff --git a/drivers/regulator/sy8106a-regulator.c b/drivers/regulator/sy8106a-regulator.c index 65fbd1f0b612..42e03b2c10a0 100644 --- a/drivers/regulator/sy8106a-regulator.c +++ b/drivers/regulator/sy8106a-regulator.c @@ -22,12 +22,6 @@ */ #define SY8106A_GO_BIT BIT(7) -struct sy8106a { - struct regulator_dev *rdev; - struct regmap *regmap; - u32 fixed_voltage; -}; - static const struct regmap_config sy8106a_regmap_config = { .reg_bits = 8, .val_bits = 8, @@ -70,36 +64,32 @@ static const struct regulator_desc sy8106a_reg = { static int sy8106a_i2c_probe(struct i2c_client *i2c, const struct i2c_device_id *id) { - struct sy8106a *chip; struct device *dev = >dev; - struct regulator_dev *rdev = NULL; + struct regulator_dev *rdev; struct regulator_config config = { }; + struct regmap *regmap; unsigned int reg, vsel; + u32 fixed_voltage; int error; - chip = devm_kzalloc(>dev, sizeof(struct sy8106a), GFP_KERNEL); - if (!chip) - return -ENOMEM; - error = of_property_read_u32(dev->of_node, "silergy,fixed-microvolt", ->fixed_voltage); +_voltage); if (error) return error; - if (chip->fixed_voltage < SY8106A_MIN_MV * 1000 || - chip->fixed_voltage > SY8106A_MAX_MV * 1000) + if (fixed_voltage < SY8106A_MIN_MV * 1000 || + fixed_voltage > SY8106A_MAX_MV * 1000) return -EINVAL; - chip->regmap = devm_regmap_init_i2c(i2c, _regmap_config); - if (IS_ERR(chip->regmap)) { - error = PTR_ERR(chip->regmap); + regmap = devm_regmap_init_i2c(i2c, _regmap_config); + if (IS_ERR(regmap)) { + error = PTR_ERR(regmap); dev_err(dev, "Failed to allocate register map: %d\n", error); return error; } config.dev = >dev; - config.regmap = chip->regmap; - config.driver_data = chip; + config.regmap = regmap; config.of_node = dev->of_node; config.init_data = of_get_regulator_init_data(dev, dev->of_node, @@ -109,15 +99,15 @@ static int sy8106a_i2c_probe(struct i2c_client *i2c, return -ENOMEM; /* Ensure GO_BIT is enabled when probing */ - error = regmap_read(chip->regmap, SY8106A_REG_VOUT1_SEL, ); + error = regmap_read(regmap, SY8106A_REG_VOUT1_SEL, ); if (error) return error; if (!(reg & SY8106A_GO_BIT)) { - vsel = (chip->fixed_voltage / 1000 - SY8106A_MIN_MV) / + vsel = (fixed_voltage / 1000 - SY8106A_MIN_MV) / SY8106A_STEP_MV; - error = regmap_write(chip->regmap, SY8106A_REG_VOUT1_SEL, + error = regmap_write(regmap, SY8106A_REG_VOUT1_SEL, vsel | SY8106A_GO_BIT); if (error) return error; @@ -131,10 +121,6 @@ static int sy8106a_i2c_probe(struct i2c_client *i2c, return error; } - chip->rdev = rdev; - - i2c_set_clientdata(i2c, chip); - return 0; } -- 2.17.1
Re: [RFC PATCH 16/62] f2fs: switch to ->free_inode()
On 2019/4/17 1:52, Al Viro wrote: > From: Al Viro > > Signed-off-by: Al Viro Acked-by: Chao Yu Thanks,
Re: [PATCH] tracing: Fix a memory leak bug
On Fri, 19 Apr 2019 21:22:59 -0500 Wenwen Wang wrote: > In trace_pid_write(), the buffer for trace parser is allocated through > kmalloc() in trace_parser_get_init(). Later on, after the buffer is used, > it is then freed through kfree() in trace_parser_put(). However, it is > possible that trace_pid_write() is terminated due to unexpected errors, > e.g., ENOMEM. In that case, the allocated buffer will not be freed, which > is a memory leak bug. > > To fix this issue, free the allocated buffer when an error is encountered. Thanks for the patch. Did you find this through manual inspection, running KASAN or via one of the static analyzers? -- Steve > > Signed-off-by: Wenwen Wang > --- > kernel/trace/trace.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > index 6c24755..fd12c9c 100644 > --- a/kernel/trace/trace.c > +++ b/kernel/trace/trace.c > @@ -496,8 +496,10 @@ int trace_pid_write(struct trace_pid_list > *filtered_pids, >* not modified. >*/ > pid_list = kmalloc(sizeof(*pid_list), GFP_KERNEL); > - if (!pid_list) > + if (!pid_list) { > + trace_parser_put(); > return -ENOMEM; > + } > > pid_list->pid_max = READ_ONCE(pid_max); > > @@ -507,6 +509,7 @@ int trace_pid_write(struct trace_pid_list > *filtered_pids, > pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3); > if (!pid_list->pids) { > + trace_parser_put(); > kfree(pid_list); > return -ENOMEM; > }
Re: [PATCH 1/6] ARM: ks8695: watchdog: stop using mach/*.h
Hi Arnd, On 16/4/19 6:24 am, Arnd Bergmann wrote: drivers should not rely on machine specific headers but get their information from the platform device. Signed-off-by: Arnd Bergmann I like the whole series, thanks for doing this. I haven't looked at the KS8695 in a long time now. I am not sure that I have any working hardware - but I will have a look around my lab and see if I can find something. I'll get back to you with acks and tested bys soon. Regards Greg --- arch/arm/mach-ks8695/devices.c | 13 - drivers/watchdog/Kconfig | 2 +- drivers/watchdog/ks8695_wdt.c | 30 +- 3 files changed, 30 insertions(+), 15 deletions(-) diff --git a/arch/arm/mach-ks8695/devices.c b/arch/arm/mach-ks8695/devices.c index 61cf20beb45f..57766817d86f 100644 --- a/arch/arm/mach-ks8695/devices.c +++ b/arch/arm/mach-ks8695/devices.c @@ -169,11 +169,22 @@ void __init ks8696_add_device_hpna(void) /* * Watchdog * */ +#define KS8695_TMR_OFFSET (0xF + 0xE400) +#define KS8695_TMR_PA (KS8695_IO_PA + KS8695_TMR_OFFSET) +static struct resource ks8695_wdt_resources[] = { + [0] = { + .name = "tmr", + .start = KS8695_TMR_PA, + .end= KS8695_TMR_PA + 0xf, + .flags = IORESOURCE_MEM, + }, +}; static struct platform_device ks8695_wdt_device = { .name = "ks8695_wdt", .id = -1, - .num_resources = 0, + .resource = ks8695_wdt_resources, + .num_resources = ARRAY_SIZE(ks8695_wdt_resources), }; static void __init ks8695_add_device_watchdog(void) diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig index 242eea859637..046e01daef57 100644 --- a/drivers/watchdog/Kconfig +++ b/drivers/watchdog/Kconfig @@ -397,7 +397,7 @@ config IXP4XX_WATCHDOG config KS8695_WATCHDOG tristate "KS8695 watchdog" - depends on ARCH_KS8695 + depends on ARCH_KS8695 || COMPILE_TEST help Watchdog timer embedded into KS8695 processor. This will reboot your system when the timeout is reached. diff --git a/drivers/watchdog/ks8695_wdt.c b/drivers/watchdog/ks8695_wdt.c index 1e41818a44bc..87c542c2f912 100644 --- a/drivers/watchdog/ks8695_wdt.c +++ b/drivers/watchdog/ks8695_wdt.c @@ -23,10 +23,8 @@ #include #include #include -#include -#define KS8695_TMR_OFFSET (0xF + 0xE400) -#define KS8695_TMR_VA (KS8695_IO_VA + KS8695_TMR_OFFSET) +#define KS8695_CLOCK_RATE 2500 /* * Timer registers @@ -57,6 +55,7 @@ MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default=" static unsigned long ks8695wdt_busy; static DEFINE_SPINLOCK(ks8695_lock); +static void __iomem *tmr_reg; /* . */ @@ -69,8 +68,8 @@ static inline void ks8695_wdt_stop(void) spin_lock(_lock); /* disable timer0 */ - tmcon = __raw_readl(KS8695_TMR_VA + KS8695_TMCON); - __raw_writel(tmcon & ~TMCON_T0EN, KS8695_TMR_VA + KS8695_TMCON); + tmcon = __raw_readl(tmr_reg + KS8695_TMCON); + __raw_writel(tmcon & ~TMCON_T0EN, tmr_reg + KS8695_TMCON); spin_unlock(_lock); } @@ -84,15 +83,15 @@ static inline void ks8695_wdt_start(void) spin_lock(_lock); /* disable timer0 */ - tmcon = __raw_readl(KS8695_TMR_VA + KS8695_TMCON); - __raw_writel(tmcon & ~TMCON_T0EN, KS8695_TMR_VA + KS8695_TMCON); + tmcon = __raw_readl(tmr_reg + KS8695_TMCON); + __raw_writel(tmcon & ~TMCON_T0EN, tmr_reg + KS8695_TMCON); /* program timer0 */ - __raw_writel(tval | T0TC_WATCHDOG, KS8695_TMR_VA + KS8695_T0TC); + __raw_writel(tval | T0TC_WATCHDOG, tmr_reg + KS8695_T0TC); /* re-enable timer0 */ - tmcon = __raw_readl(KS8695_TMR_VA + KS8695_TMCON); - __raw_writel(tmcon | TMCON_T0EN, KS8695_TMR_VA + KS8695_TMCON); + tmcon = __raw_readl(tmr_reg + KS8695_TMCON); + __raw_writel(tmcon | TMCON_T0EN, tmr_reg + KS8695_TMCON); spin_unlock(_lock); } @@ -105,9 +104,9 @@ static inline void ks8695_wdt_reload(void) spin_lock(_lock); /* disable, then re-enable timer0 */ - tmcon = __raw_readl(KS8695_TMR_VA + KS8695_TMCON); - __raw_writel(tmcon & ~TMCON_T0EN, KS8695_TMR_VA + KS8695_TMCON); - __raw_writel(tmcon | TMCON_T0EN, KS8695_TMR_VA + KS8695_TMCON); + tmcon = __raw_readl(tmr_reg + KS8695_TMCON); + __raw_writel(tmcon & ~TMCON_T0EN, tmr_reg + KS8695_TMCON); + __raw_writel(tmcon | TMCON_T0EN, tmr_reg + KS8695_TMCON); spin_unlock(_lock); } @@ -238,6 +237,11 @@ static struct miscdevice ks8695wdt_miscdev = { static int ks8695wdt_probe(struct platform_device *pdev) { int res; +
Re: [tip:core/rseq] rseq/selftests/x86: Work around bogus gcc-8 optimisation
- On Apr 19, 2019, at 5:13 PM, Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote: > - On Apr 19, 2019, at 2:53 PM, Linus Torvalds > torva...@linux-foundation.org > wrote: > >> On Fri, Apr 19, 2019 at 11:46 AM tip-bot for Mathieu Desnoyers >> wrote: >>> >>> rseq/selftests/x86: Work around bogus gcc-8 optimisation >>> >>> At least the following versions of gcc-8: >>> >>> - gcc version 8.0.1 20180414 (experimental) [trunk revision 259383] (Ubuntu >>> 8-20180414-1ubuntu2) >>> - gcc 8.2.0-7ubuntu1 (Ubuntu 18.10 (Cosmic)), >>> >>> generate broken assembler with asm goto that have a thread-local storage >>> "m" input operand on both x86-32 and x86-64. For instance: >> >> Is there a gcc bugzilla for this? Shouldn't that be mentioned here? > > My google-fu did not find any match, so I don't think so. I'll open one > shortly. > >> >> Also, we use "asm goto" together with "m" all the time in the kernel. >> In fact, it's the most common case, with the RMWcc ops being generated >> with that. I realize that we don't use the gcc thread-local storage >> for them (we often do use our *own* thread-local storage), but it >> would be good to have that gcc bugzilla to see why it can only affect >> those user level "__thread" cases. > > Indeed, it would be good to investigate and understand all problematic > scenarios triggering this bug. I've opened a ticket on gcc's bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90193 Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
[PATCH] tracing: Fix a memory leak bug
In trace_pid_write(), the buffer for trace parser is allocated through kmalloc() in trace_parser_get_init(). Later on, after the buffer is used, it is then freed through kfree() in trace_parser_put(). However, it is possible that trace_pid_write() is terminated due to unexpected errors, e.g., ENOMEM. In that case, the allocated buffer will not be freed, which is a memory leak bug. To fix this issue, free the allocated buffer when an error is encountered. Signed-off-by: Wenwen Wang --- kernel/trace/trace.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 6c24755..fd12c9c 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -496,8 +496,10 @@ int trace_pid_write(struct trace_pid_list *filtered_pids, * not modified. */ pid_list = kmalloc(sizeof(*pid_list), GFP_KERNEL); - if (!pid_list) + if (!pid_list) { + trace_parser_put(); return -ENOMEM; + } pid_list->pid_max = READ_ONCE(pid_max); @@ -507,6 +509,7 @@ int trace_pid_write(struct trace_pid_list *filtered_pids, pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3); if (!pid_list->pids) { + trace_parser_put(); kfree(pid_list); return -ENOMEM; } -- 2.7.4
Re: [PATCH 5/6] ARM: ks8695, serial: skip manual tx IRQ ack
Hi Arnd, On 16/4/19 6:24 am, Arnd Bergmann wrote: The TX interrupt is marked as edge triggered, so it will already be acked by the top-level irq code, and does not need the ack in the driver. Removing this avoids a nasty dependency on the regs-irq.h file that is otherwise reserved for the interrupt controller driver. Signed-off-by: Arnd Bergmann --- drivers/tty/serial/serial_ks8695.c | 5 - 1 file changed, 5 deletions(-) diff --git a/drivers/tty/serial/serial_ks8695.c b/drivers/tty/serial/serial_ks8695.c index b461d791188c..6c5e9900e69d 100644 --- a/drivers/tty/serial/serial_ks8695.c +++ b/drivers/tty/serial/serial_ks8695.c @@ -21,7 +21,6 @@ #include #include -#include #if defined(CONFIG_SERIAL_KS8695_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ) #define SUPPORT_SYSRQ @@ -52,8 +51,6 @@ #define UART_GET_BRDR(p) __raw_readl((p)->membase + KS8695_URBD) #define UART_PUT_BRDR(p, c) __raw_writel((c), (p)->membase + KS8695_URBD) -#define KS8695_CLR_TX_INT() __raw_writel(1 << KS8695_IRQ_UART_TX, KS8695_IRQ_VA + KS8695_INTST) - #define UART_DUMMY_LSR_RX 0x100 #define UART_PORT_SIZE(KS8695_USR - KS8695_URRB + 4) @@ -207,7 +204,6 @@ static irqreturn_t ks8695uart_tx_chars(int irq, void *dev_id) unsigned int count; if (port->x_char) { - KS8695_CLR_TX_INT(); UART_PUT_CHAR(port, port->x_char); port->icount.tx++; port->x_char = 0; @@ -221,7 +217,6 @@ static irqreturn_t ks8695uart_tx_chars(int irq, void *dev_id) count = 16; /* fifo size */ while (!uart_circ_empty(xmit) && (count-- > 0)) { - KS8695_CLR_TX_INT(); I haven't looked at the ks8695 in quite a while... But I recall that this was very problematic at the time. Without this being done after each character it was very easy to get the transmitter to "hang" - and stop wanting to send any more characters. I'd like to test this before acking. Regards Greg UART_PUT_CHAR(port, xmit->buf[xmit->tail]); xmit->tail = (xmit->tail + 1) & (UART_XMIT_SIZE - 1);
[PATCH] HID: intel-ish-hid: Add Comet Lake PCI device ID
Added Comet Lake PCI device ID to the supported device list. Signed-off-by: Srinivas Pandruvada --- drivers/hid/intel-ish-hid/ipc/hw-ish.h | 1 + drivers/hid/intel-ish-hid/ipc/pci-ish.c | 1 + 2 files changed, 2 insertions(+) diff --git a/drivers/hid/intel-ish-hid/ipc/hw-ish.h b/drivers/hid/intel-ish-hid/ipc/hw-ish.h index 08a8327dfd22..523c0cbd44a4 100644 --- a/drivers/hid/intel-ish-hid/ipc/hw-ish.h +++ b/drivers/hid/intel-ish-hid/ipc/hw-ish.h @@ -31,6 +31,7 @@ #define CNL_H_DEVICE_ID0xA37C #define ICL_MOBILE_DEVICE_ID 0x34FC #define SPT_H_DEVICE_ID0xA135 +#define CML_LP_DEVICE_ID 0x02FC #defineREVISION_ID_CHT_A0 0x6 #defineREVISION_ID_CHT_Ax_SI 0x0 diff --git a/drivers/hid/intel-ish-hid/ipc/pci-ish.c b/drivers/hid/intel-ish-hid/ipc/pci-ish.c index a6e1ee744f4d..ac0a179daf23 100644 --- a/drivers/hid/intel-ish-hid/ipc/pci-ish.c +++ b/drivers/hid/intel-ish-hid/ipc/pci-ish.c @@ -40,6 +40,7 @@ static const struct pci_device_id ish_pci_tbl[] = { {PCI_DEVICE(PCI_VENDOR_ID_INTEL, CNL_H_DEVICE_ID)}, {PCI_DEVICE(PCI_VENDOR_ID_INTEL, ICL_MOBILE_DEVICE_ID)}, {PCI_DEVICE(PCI_VENDOR_ID_INTEL, SPT_H_DEVICE_ID)}, + {PCI_DEVICE(PCI_VENDOR_ID_INTEL, CML_LP_DEVICE_ID)}, {0, } }; MODULE_DEVICE_TABLE(pci, ish_pci_tbl); -- 2.17.2
Re: [PATCH v2] clk: hi3660: Mark clk_gate_ufs_subsys as critical
On Fri, Apr 19, 2019 at 03:20:42PM -0700, Stephen Boyd wrote: > Quoting Leo Yan (2019-03-20 03:05:08) > > clk_gate_ufs_subsys is a system bus clock, turning off it will > > introduce lockup issue during system suspend flow. Let's mark > > clk_gate_ufs_subsys as critical clock, thus keeps it on during > > system suspend and resume. > > > > Fixes: d374e6fd5088 ("clk: hisilicon: Add clock driver for hi3660 SoC") > > Cc: sta...@vger.kernel.org > > Cc: Zhong Kaihua > > Cc: John Stultz > > Cc: Zhangfei Gao > > Suggested-by: Dong Zhang > > Signed-off-by: Leo Yan > > --- > > Applied to clk-next Thanks!
Re: [PATCH 4.9 00/50] 4.9.170-stable review
On Fri, Apr 19, 2019 at 01:16:37PM -0700, Guenter Roeck wrote: On Fri, Apr 19, 2019 at 12:39:26PM -0700, Guenter Roeck wrote: On Thu, Apr 18, 2019 at 07:57:11PM +0200, Greg Kroah-Hartman wrote: > This is the start of the stable review cycle for the 4.9.170 release. > There are 50 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Sat Apr 20 16:03:22 UTC 2019. > Anything received after that time might be too late. > Build results: total: 172 pass: 170 fail: 2 Failed builds: i386:tools/perf x86_64:tools/perf Caused by "tools include: Adopt linux/bits.h". That isn't really a bug fix, and since it is the last commit in the perf directory it isn't needed either. Reverting it fixes the problem. Indeed. I'll remove it from all branches since it doesn't look like a fix to begin with. -- Thanks, Sasha
Re: [GIT PULL] cgroup fix for v5.1-rc5
The pull request you sent on Fri, 19 Apr 2019 17:28:34 -0700: > git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-5.1-fixes has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/371dd432ab39f7bc55d6ec77d63b430285627e04 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Fri, Apr 19, 2019 at 04:11:37PM -0700, Linus Torvalds wrote: > On Fri, Apr 19, 2019 at 2:20 PM Joel Fernandes wrote: > > > > According to Linus, POLLHUP usually indicates that something is readable: > > Note that if you use the legacy interfaces (ie "select()"), then even > just a plain POLLHUP will always just show as "readable". > > So for a lot of applications, it won't even matter. Returning POLLHUP > will mean it's readable whether POLLIN is set or not (and EPOLLERR > will automatically mean that it's marked both readable and writable). > > In fact, I'd argue that not a lot of people care about the more > esoteric bits at all. If your poll function does POLLIN and POLLOUT, > it's fine. Anything else is specialized enough that most people simply > won't care. Don't overdesign things too much. You need to have a major > reason to care about the other POLL bits. > > It's also worth noting that POLLERR/POLLHUP/POLLNVAL cannot be masked > for "poll()". Even if you only ask for POLLIN/POLLOUT, you will always > get POLLERR/POLLHUP notification. That is again historical behavior, > and it's kind of a "you can't poll a hung up fd". But it once again > means that you should consider POLLHUP to be something *exceptional* > and final, where no further or other state changes can happen or are > relevant. > > That may well work fine for pidfd when the task is gone, but it's > really worth noting that any *normal* state should be about > POLLIN/POLLOUT. People should not think that "POLLHUP sounds like the > appropriate name", they should realize that POLLHUP is basically a > terminal error condition, not a "input is available". > > So just keep that in mind. Got it, thanks a lot for the detailed explanation of the flags. So then I feel I should not return POLLHUP or POLLERR because the task_struct getting reaped is not an error condition. I will simplify the patch and repost it. It seems that would be the best thing to do to serve the usecase. We just want to know when the task exited and a reaped task also counts as one that exited. thanks, - Joel
Re: [PATCH 3/6] y2038: linux: Provide __clock_settime64 implementation
15.04.2019 в 00:08:38 +0200 Lukasz Majewski написал: > +# if defined __NR_clock_settime64 > + /* Make sure that passed __timespec64 struct pad is 0. */ > + struct __timespec64 ts = *tp; > + ts.tv_pad = 0; > + return INLINE_SYSCALL_CALL (clock_settime64, clock_id, ); Isn't kernel supposed to zero out padding on its own? At least comment in kernel's get_timespec64 says so: /* Zero out the padding for 32 bit systems or in compat mode */ if (IS_ENABLED(CONFIG_64BIT_TIME) && in_compat_syscall()) kts.tv_nsec &= 0xUL; The code looks buggy though. It fails to zero out the padding in 32-bit kernels. That part is probably broken since 98f76206b3350 ("compat: Cleanup in_compat_syscall() callers"). And, hmm, is CONFIG_64BIT_TIME enabled anywhere?
Re: [PATCH] kexec_buffer measure
Currently for soft reboot(kexec_file_load) the kernel file and signature is measured by IMA. The cmdline args used to load the kernel is not measured. The boot aggregate that gets calculated will have no change since the EFI loader has not been triggered. Adding the kexec cmdline args measure and kernel version will add some attestable criteria. Following patch v2 set adds support in IMA to measure buffers, and uses the same to measure the cmdline arguments passed in case of kexec. Additional support to measure kernel version will follow in a subsequent patch set. Support for kexec_buffer pass for will be added later for the whole attestation/appraisal to work. Approaches and clarifications: 1) Use of Sig template Appraising buffer entries can't be done unless the buffers are pre calculated and available for appraisal. This raises a problem for cmdline args since the kernel can be loaded with a large set of args. For buffer entries, the appraisal can be deferred to attestation service, the event data is needed in that case. Adding the event data to ima template can be done by either adding a new template or reusing an existing one. -Using a new template entry will only work for buffers and adds no current use case for files, every EXISTING entry for measuring files will use this new template and so all but the buffer measurement line item will have the new columns empty. this will be a huge change pertaining to ima for a not a lot of use cases. 2) Adding a LSM hook We are doing both the command line and kernel version measurement in IMA. Can you please elaborate on how this can be used outside of the scenario? That will help me come back with a better design and code. I am neutral about this. I greatly appreciate the time and your feedback. -Prakhar Srivastava On Mon, Apr 15, 2019 at 2:38 PM Mimi Zohar wrote: > > On Fri, 2019-04-12 at 11:08 -0700, prsriv...@gmail.com wrote: > > From: Prakhar Srivastava > > > > This adds a generic buffer measure ima function hook. > > A patch description should begin with a problem description or > motivation for the patch, before describing the solution. > > > The Buffer passed will be added to the sig field of the template if used. > > Template fields are defined in ima_template.c You can't simply re-use > an existing field like this. > > > The hash(buffer) is added as the IMA measurements, along > > side the buffer itself in hex. > > For cmdline: kernel file name is prefixed to the cmdline to > > distinguish between callers. > > An enum is used to control what all buffers can be written to it. > > > > Verified How: > > Replaced kernel on machine. > > read ima_measurement_log > > called kexec -s > > read ima_measurement_log > > - A new entry for the cmdline passed can be seen. > > - HEX to text verified: > > http://www.unit-conversion.info/texttools/hexadecimal/ > > - Generated Hash for the text buffer and matched it with in IMA log > > The kexec man page provides this information. I don't see the need for > these details here. > > Please refer to section "2) Describe your changes" in > Documentation/process/submitting-patches.rst for instructions on > writing a patch description. > > > > > Signed-off-by: Prakhar Srivastava > > > --- > > include/linux/ima.h | 21 - > > kernel/kexec_core.c | 68 +++ > > kernel/kexec_file.c | 18 > > kernel/kexec_internal.h | 6 +- > > security/integrity/ima/Kconfig| 16 > > security/integrity/ima/ima_main.c | 135 ++ > > security/integrity/integrity.h| 3 + > > 7 files changed, 264 insertions(+), 3 deletions(-) > > > > diff --git a/include/linux/ima.h b/include/linux/ima.h > > index 7f6952f8d6aa..46c3b95b2637 100644 > > --- a/include/linux/ima.h > > +++ b/include/linux/ima.h > > @@ -14,6 +14,14 @@ > > #include > > struct linux_binprm; > > > > +#ifdef CONFIG_IMA_BUFFER_MEASURE > > +enum buffer_id{ > > + KEXEC_CMDLINE = 1 > > + // other buffer ids > > + // KERNEL_VERSION > > There are already two enumerations - kernel_read_file_id and > kernel_load_file_id. Unless there are clear examples of other buffer > data, please don't define a new enumeration. Removed new enums. > > > +}; > > +#endif > > + > > #ifdef CONFIG_IMA > > extern int ima_bprm_check(struct linux_binprm *bprm); > > extern int ima_file_check(struct file *file, int mask, int opened); > > @@ -23,7 +31,10 @@ extern int ima_read_file(struct file *file, enum > > kernel_read_file_id id); > > extern int ima_post_read_file(struct file *file, void *buf, loff_t size, > > enum kernel_read_file_id id); > > extern void ima_post_path_mknod(struct dentry *dentry); > > - > > +#ifdef CONFIG_IMA_BUFFER_MEASURE > > +extern int ima_add_buffer_measure(const void *buff, > > + loff_t size, enum buffer_id id); > > +#endif > > #ifdef CONFIG_IMA_KEXEC > > extern void
[PATCH] dcache: ensure d_flags & d_inode are consistent in lookup_fast()
After extending the size of dentry from 192-bytes to 208-bytes under aarch64, we got oops during the running of xfstests generic/429: Unable to handle kernel NULL pointer dereference at virtual address 0002 CPU: 3 PID: 2725 Comm: t_encrypted_d_r Tainted: G D 5.1.0-rc4 pc : inode_permission+0x28/0x160 lr : link_path_walk.part.11+0x27c/0x528 .. Call trace: inode_permission+0x28/0x160 link_path_walk.part.11+0x27c/0x528 path_lookupat+0x64/0x208 filename_lookup+0xa0/0x178 user_path_at_empty+0x58/0x70 vfs_statx+0x94/0x118 __se_sys_newfstatat+0x58/0x98 __arm64_sys_newfstatat+0x24/0x30 el0_svc_common+0x7c/0x148 el0_svc_handler+0x38/0x88 el0_svc+0x8/0xc If we revert the size extension of dentry, the oops will be gone. However if we just move the d_inode field from the begin of dentry struct to the end of dentry struct (poorly simulate the way how __randomize_layout works), the oops will reoccur. The following scenario illustrates the problem: precondition: * dentry A has just been unlinked and becomes a negative dentry * dentry A is encrypted, so it has d_revalidate hook: fscrypt_d_revalidate() * lookup process is looking A/file, and creation process is creating A lookup process: creation process: lookup_fast __d_lookup_rcu returns dentry A d_revalidate returns -ECHILD d_revalidate again succeed __d_set_inode_and_type dentry->d_inode = inode WRITE_ONCE(dentry->d_flags, flags) d_is_negative(dentry) return false follow_managed doesn't nothing // inconsistent with d_flags d_backing_inode() return NULL nd->inode = NULL may_lookup() // oops occurs inode_permission(nd->inode The root cause is the inconsistency between d_flags & d_inode during the REF-walk in lookup_fast(): d_is_negative(dentry) returns false, but d_backing_inode() still returns a NULL pointer. The RCU-walk path in lookup_fast() uses d_seq to ensure d_flags & d_inode are consistent, and lookup_slow() use inode lock to ensure that, so only the REF-walk path in lookup_fast() is problematic. Fixing it by adding a paired smp_rmb/smp_wmb between the reading/writing of d_inode & d_flags to ensure the consistency. Signed-off-by: Hou Tao --- fs/dcache.c | 2 ++ fs/namei.c | 7 +++ 2 files changed, 9 insertions(+) diff --git a/fs/dcache.c b/fs/dcache.c index aac41adf4743..1eb85f9fcb0f 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -316,6 +316,8 @@ static inline void __d_set_inode_and_type(struct dentry *dentry, unsigned flags; dentry->d_inode = inode; + /* paired with smp_rmb() in lookup_fast() */ + smp_wmb(); flags = READ_ONCE(dentry->d_flags); flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU); flags |= type_flags; diff --git a/fs/namei.c b/fs/namei.c index dede0147b3f6..833f760c70b2 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1628,6 +1628,13 @@ static int lookup_fast(struct nameidata *nd, return -ENOENT; } + /* +* Paired with smp_wmb() in __d_set_inode_and_type() to ensure +* d_backing_inode is not NULL after the checking of d_flags +* in d_is_negative() completes. +*/ + smp_rmb(); + path->mnt = mnt; path->dentry = dentry; err = follow_managed(path, nd); -- 2.16.2.dirty
Re: Urgent Reply
Dear friend, I am the Head of file Department in Africa Develop bank Burkina Faso. I need your sincere cooperation to transfer the sum of $15 million U.S.A dollars to your bank account. I want you stand as next of kin to the deceased. I agree 40% for you and 60% for me. Upon receipt of your reply, I will send to you by email the text of the application. I will not fail to bring to your notice that this transaction is hitch free and that you should not entertain any atom of fear as all required arrangements have been made for the transfer. I am waiting for your immediate response. Yours faithfully, Mr.Frank Yacouba
[GIT PULL] cgroup fix for v5.1-rc5
Hello, Linus. A patch to fix a RCU imbalance error in the devices cgroup configuration error path. Thanks. The following changes since commit 9e98c678c2d6ae3a17cb2de55d17f69dddaa231b: Linux 5.1-rc1 (2019-03-17 14:22:26 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-5.1-fixes for you to fetch changes up to 0fcc4c8c044e117ac126ab6df4138ea9a67fa2a9: device_cgroup: fix RCU imbalance in error case (2019-03-19 10:46:15 -0700) Jann Horn (1): device_cgroup: fix RCU imbalance in error case security/device_cgroup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/security/device_cgroup.c b/security/device_cgroup.c index cd97929fac66..dc28914fa72e 100644 --- a/security/device_cgroup.c +++ b/security/device_cgroup.c @@ -560,7 +560,7 @@ static int propagate_exception(struct dev_cgroup *devcg_root, devcg->behavior == DEVCG_DEFAULT_ALLOW) { rc = dev_exception_add(devcg, ex); if (rc) - break; + return rc; } else { /* * in the other possible cases: -- tejun
[tip:WIP.x86/alternatives 2/9] kernel/jump_label.c:387:10-11: WARNING: return of 0/1 in function 'jump_label_can_update_check' with return type bool
tree: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/alternatives head: 1f30946b1a01baf22df6faf74c0a1e602bb6cac7 commit: 41bef31d0abe29b5888b33b526cacc8a30795318 [2/9] jump_label: Add the jump_label_can_update_check() helper If you fix the issue, kindly add following tag Reported-by: kbuild test robot coccinelle warnings: (new ones prefixed by >>) >> kernel/jump_label.c:387:10-11: WARNING: return of 0/1 in function >> 'jump_label_can_update_check' with return type bool vim +/jump_label_can_update_check +387 kernel/jump_label.c 376 377 bool jump_label_can_update_check(struct jump_entry *entry, bool init) 378 { 379 /* 380 * An entry->code of 0 indicates an entry which has been 381 * disabled because it was in an init text area. 382 */ 383 if (init || !jump_entry_is_init(entry)) { 384 if (!kernel_text_address(jump_entry_code(entry))) { 385 WARN_ONCE(1, "can't patch jump_label at %pS", 386(void *)jump_entry_code(entry)); > 387 return 0; 388 } 389 return 1; 390 } 391 return 0; 392 } 393 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic()
Paul E. McKenney's on April 20, 2019 4:26 am: > On Fri, Apr 19, 2019 at 08:00:17PM +0200, Peter Zijlstra wrote: >> On Fri, Apr 19, 2019 at 01:21:45PM -0400, Alan Stern wrote: >> > Index: usb-devel/Documentation/atomic_t.txt >> > === >> > --- usb-devel.orig/Documentation/atomic_t.txt >> > +++ usb-devel/Documentation/atomic_t.txt >> > @@ -171,7 +171,10 @@ The barriers: >> >smp_mb__{before,after}_atomic() >> > >> > only apply to the RMW ops and can be used to augment/upgrade the ordering >> > -inherent to the used atomic op. These barriers provide a full smp_mb(). >> > +inherent to the used atomic op. Unlike normal smp_mb() barriers, they >> > order >> > +only the RMW op itself against the instructions preceding the >> > +smp_mb__before_atomic() or following the smp_mb__after_atomic(); they do >> > +not order instructions on the other side of the RMW op at all. >> >> Now it is I who is confused; what? >> >> x = 1; >> smp_mb__before_atomic(); >> atomic_add(1, ); >> y = 1; >> >> the stores to both x and y will be ordered as if an smp_mb() where >> there. There is no order between a and y otoh. > > Let's look at x86. And a slightly different example: > > x = 1; > smp_mb__before_atomic(); > atomic_add(1, ); > r1 = y; > > The atomic_add() asm does not have the "memory" constraint, which is > completely legitimate because atomic_add() does not return a value, > and thus guarantees no ordering. The compiler is therefore within > its rights to transform the code into the following: > > x = 1; > smp_mb__before_atomic(); > r1 = y; > atomic_add(1, ); > > But x86's smp_mb__before_atomic() is just a compiler barrier, and > x86 is further allowed to reorder prior stores with later loads. > The CPU can therefore execute this code as follows: > > r1 = y; > x = 1; > smp_mb__before_atomic(); > atomic_add(1, ); > > So in general, the ordering is guaranteed only to the atomic itself, > not to accesses on the other side of the atomic. That's interesting. I don't think that's what all our code expects. I had the same idea as Peter. IIRC the primitive was originally introduced exactly so x86 would not need to have the unnecessary hardware barrier with sequences like smp_mb(); ... atomic_inc(); The "new" semantics are a bit subtle. One option might be just to replace it entirely with atomic_xxx_mb() ? Thanks, Nick
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Sat, Apr 20, 2019 at 1:47 AM Daniel Colascione wrote: > > On Fri, Apr 19, 2019 at 4:12 PM Christian Brauner > wrote: > > > > On Sat, Apr 20, 2019 at 12:46 AM Daniel Colascione > > wrote: > > > > > > On Fri, Apr 19, 2019 at 3:02 PM Christian Brauner > > > wrote: > > > > > > > > On Fri, Apr 19, 2019 at 11:48 PM Christian Brauner > > > > wrote: > > > > > > > > > > On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione > > > > > wrote: > > > > > > > > > > > > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner > > > > > > wrote: > > > > > > > > > > > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione > > > > > > > wrote: > > > > > > > > > > > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner > > > > > > > > > wrote: > > > > > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes > > > > > > > > > > wrote: > > > > > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian > > > > > > > > > > > Brauner wrote: > > > > > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn > > > > > > > > > > > > wrote: > > > > > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov > > > > > > > > > > > > > wrote: > > > > > > > > > > > > >> On 04/16, Joel Fernandes wrote: > > > > > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg > > > > > > > > > > > > >> > Nesterov wrote: > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > Could you explain when it should return POLLIN? > > > > > > > > > > > > >> > > When the whole > > > > > > > > > > > > >process exits? > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't > > > > > > > > > > > > >> > exist anymore, > > > > > > > > > > > > >or when it > > > > > > > > > > > > >> > is in a zombie state and there's no other thread > > > > > > > > > > > > >> > in the thread > > > > > > > > > > > > >group. > > > > > > > > > > > > >> > > > > > > > > > > > > >> IOW, when the whole thread group exits, so it can't > > > > > > > > > > > > >> be used to > > > > > > > > > > > > >monitor sub-threads. > > > > > > > > > > > > >> > > > > > > > > > > > > >> just in case... speaking of this patch it doesn't > > > > > > > > > > > > >> modify > > > > > > > > > > > > >proc_tid_base_operations, > > > > > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, > > > > > > > > > > > > >> but iiuc you are > > > > > > > > > > > > >going to use > > > > > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ? > > > > > > > > > > > > > > > > > > > > > > > > > >I don't think procfs works that way. > > > > > > > > > > > > >/proc/sub-thread-tid has > > > > > > > > > > > > >proc_tgid_base_operations despite not being a thread > > > > > > > > > > > > >group leader. > > > > > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() > > > > > > > > > > > > >in this code can > > > > > > > > > > > > >be hit trivially, and then the code will misbehave. > > > > > > > > > > > > > > > > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to > > > > > > > > > > > > >explicitly bail > > > > > > > > > > > > >out if you're dealing with a thread group leader, or > > > > > > > > > > > > >make the code > > > > > > > > > > > > >work for threads, too. > > > > > > > > > > > > > > > > > > > > > > > > The latter case probably being preferred if this API is > > > > > > > > > > > > supposed to be > > > > > > > > > > > > useable for thread management in userspace. > > > > > > > > > > > > > > > > > > > > > > At the moment, we are not planning to use this for > > > > > > > > > > > sub-thread management. I > > > > > > > > > > > am reworking this patch to only work on clone(2) pidfds > > > > > > > > > > > which makes the above > > > > > > > > > > > > > > > > > > > > Indeed and agreed. > > > > > > > > > > > > > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the > > > > > > > > > > > latest CLONE_PIDFD > > > > > > > > > > > patches, CLONE_THREAD with pidfd is not supported. > > > > > > > > > > > > > > > > > > > > Yes. We have no one asking for it right now and we can > > > > > > > > > > easily add this > > > > > > > > > > later. > > > > > > > > > > > > > > > > > > > > Admittedly I haven't gotten around to reviewing the patches > > > > > > > > > > here yet > > > > > > > > > > completely. But one thing about using POLLIN. FreeBSD is > > > > > > > > > > using POLLHUP > > > > > > > > > > on process exit which I think is nice as well. How about > > > > > > > > > > returning > > > > > > > > > > POLLIN | POLLHUP on process exit? > > > > > > > > > > We already do things like this. For example, when you proxy > > > > > > > > > > between > > > > > > > > > > ttys. If the process that you're reading data from has > > > > > > > > > > exited and closed > > > > > > > > > > it's end you still can't usually simply
[PATCH v2 1/3 RFC] added ima hook for buffer, being enabled as a policy
From: Prakhar Srivastava Signed-off-by: Prakhar Srivastava --- Currently for soft reboot(kexec_file_load) the kernel file and signature is measured by IMA. The cmdline args used to load the kernel is not measured. The boot aggregate that gets calculated will have no change since the EFI loader has not been triggered. Adding the kexec cmdline args measure and kernel version will add some attestable criteria. This adds a new ima hook ima_buffer_check and a policy entry BUFFER_CHECK. This enables buffer has measurements into ima log Documentation/ABI/testing/ima_policy | 1 + include/linux/ima.h | 13 +++- security/integrity/ima/ima.h | 1 + security/integrity/ima/ima_main.c| 95 security/integrity/ima/ima_policy.c | 14 +++- 5 files changed, 122 insertions(+), 2 deletions(-) diff --git a/Documentation/ABI/testing/ima_policy b/Documentation/ABI/testing/ima_policy index bb0f9a135e21..676088c7ab26 100644 --- a/Documentation/ABI/testing/ima_policy +++ b/Documentation/ABI/testing/ima_policy @@ -28,6 +28,7 @@ Description: base: func:= [BPRM_CHECK][MMAP_CHECK][FILE_CHECK][MODULE_CHECK] [FIRMWARE_CHECK] [KEXEC_KERNEL_CHECK] [KEXEC_INITRAMFS_CHECK] + [BUFFER_CHECK] mask:= [[^]MAY_READ] [[^]MAY_WRITE] [[^]MAY_APPEND] [[^]MAY_EXEC] fsmagic:= hex value diff --git a/include/linux/ima.h b/include/linux/ima.h index 7f6952f8d6aa..733d0cb9dedc 100644 --- a/include/linux/ima.h +++ b/include/linux/ima.h @@ -14,6 +14,12 @@ #include struct linux_binprm; +enum __buffer_id { + KERNEL_VERSION, + KEXEC_CMDLINE, + MAX_BUFFER_ID = KEXEC_CMDLINE +} buffer_id; + #ifdef CONFIG_IMA extern int ima_bprm_check(struct linux_binprm *bprm); extern int ima_file_check(struct file *file, int mask, int opened); @@ -23,7 +29,7 @@ extern int ima_read_file(struct file *file, enum kernel_read_file_id id); extern int ima_post_read_file(struct file *file, void *buf, loff_t size, enum kernel_read_file_id id); extern void ima_post_path_mknod(struct dentry *dentry); - +extern void ima_buffer_check(const void *buff, int size, enum buffer_id id); #ifdef CONFIG_IMA_KEXEC extern void ima_add_kexec_buffer(struct kimage *image); #endif @@ -65,6 +71,11 @@ static inline void ima_post_path_mknod(struct dentry *dentry) return; } +static inline void ima_buffer_check(const void *buff, int size, + enum buffer_id id) +{ + return; +} #endif /* CONFIG_IMA */ #ifndef CONFIG_IMA_KEXEC diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h index b563fbd4d122..b71f2f6f7421 100644 --- a/security/integrity/ima/ima.h +++ b/security/integrity/ima/ima.h @@ -181,6 +181,7 @@ enum ima_hooks { FIRMWARE_CHECK, KEXEC_KERNEL_CHECK, KEXEC_INITRAMFS_CHECK, + BUFFER_CHECK, POLICY_CHECK, MAX_CHECK }; diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c index 2aebb7984437..6408cadaadbb 100644 --- a/security/integrity/ima/ima_main.c +++ b/security/integrity/ima/ima_main.c @@ -155,6 +155,84 @@ void ima_file_free(struct file *file) ima_check_last_writer(iint, inode, file); } +/* + * process_buffer_measurement - Measure the buffer passed to ima log. + * (Instead of using the file hash the buffer hash is used). + * @buff - The buffer that needs to be added to the log + * @size - size of buffer(in bytes) + * @id - buffer id, this is differentiator for the various buffers + * that can be measured. + * + * The buffer passed is added to the ima logs. + * If the sig template is used, then the sig field contains the buffer. + * + * On success return 0. + * On error cases surface errors from ima calls. + */ +static int process_buffer_measurement(const void *buff, int size, + enum buffer_id id) +{ + int ret = -EINVAL; + struct ima_template_entry *entry = NULL; + struct integrity_iint_cache tmp_iint, *iint = _iint; + struct ima_event_data event_data = {iint, NULL, NULL, + NULL, 0, NULL}; + struct { + struct ima_digest_data hdr; + char digest[IMA_MAX_DIGEST_SIZE]; + } hash; + char *name = NULL; + int violation = 0; + int pcr = CONFIG_IMA_MEASURE_PCR_IDX; + + if (!buff || size == 0) + goto err_out; + + if (ima_get_action(NULL, 0, BUFFER_CHECK, ) != IMA_MEASURE) + goto err_out; + + switch (buffer_id) { + case KERNEL_VERSION: + name = "Kernel-version"; + break; + case KEXEC_CMDLINE: + name = "Kexec-cmdline"; + break; + default: + goto err_out; + } + +
[PATCH v2 2/3 RFC] use event name instead of enum to make the call generic
From: Prakhar Srivastava Signed-off-by: Prakhar Srivastava --- Currently for soft reboot(kexec_file_load) the kernel file and signature is measured by IMA. The cmdline args used to load the kernel is not measured. The boot aggregate that gets calculated will have no change since the EFI loader has not been triggered. Adding the kexec cmdline args measure and kernel version will add some attestable criteria. remove enums to control type of buffers entries, instead pass the event name to be used. include/linux/ima.h | 10 ++ kernel/kexec_file.c | 3 +++ security/integrity/ima/ima.h | 2 +- security/integrity/ima/ima_main.c | 30 ++ 4 files changed, 16 insertions(+), 29 deletions(-) diff --git a/include/linux/ima.h b/include/linux/ima.h index 733d0cb9dedc..5e41507c57e5 100644 --- a/include/linux/ima.h +++ b/include/linux/ima.h @@ -14,12 +14,6 @@ #include struct linux_binprm; -enum __buffer_id { - KERNEL_VERSION, - KEXEC_CMDLINE, - MAX_BUFFER_ID = KEXEC_CMDLINE -} buffer_id; - #ifdef CONFIG_IMA extern int ima_bprm_check(struct linux_binprm *bprm); extern int ima_file_check(struct file *file, int mask, int opened); @@ -29,7 +23,7 @@ extern int ima_read_file(struct file *file, enum kernel_read_file_id id); extern int ima_post_read_file(struct file *file, void *buf, loff_t size, enum kernel_read_file_id id); extern void ima_post_path_mknod(struct dentry *dentry); -extern void ima_buffer_check(const void *buff, int size, enum buffer_id id); +extern void ima_buffer_check(const void *buff, int size, char *eventname); #ifdef CONFIG_IMA_KEXEC extern void ima_add_kexec_buffer(struct kimage *image); #endif @@ -72,7 +66,7 @@ static inline void ima_post_path_mknod(struct dentry *dentry) } static inline void ima_buffer_check(const void *buff, int size, - enum buffer_id id) + char *eventname) { return; } diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index b118735fea9d..2a5234eb4b28 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -182,6 +182,9 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd, ret = -EINVAL; goto out; } + + ima_buffer_check(image->cmdline_buf, cmdline_len - 1, + "kexec_cmdline"); } /* Call arch image load handlers */ diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h index b71f2f6f7421..fcade3c103ed 100644 --- a/security/integrity/ima/ima.h +++ b/security/integrity/ima/ima.h @@ -181,8 +181,8 @@ enum ima_hooks { FIRMWARE_CHECK, KEXEC_KERNEL_CHECK, KEXEC_INITRAMFS_CHECK, - BUFFER_CHECK, POLICY_CHECK, + BUFFER_CHECK, MAX_CHECK }; diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c index 6408cadaadbb..da82c705a5ed 100644 --- a/security/integrity/ima/ima_main.c +++ b/security/integrity/ima/ima_main.c @@ -160,8 +160,7 @@ void ima_file_free(struct file *file) * (Instead of using the file hash the buffer hash is used). * @buff - The buffer that needs to be added to the log * @size - size of buffer(in bytes) - * @id - buffer id, this is differentiator for the various buffers - * that can be measured. + * @id - eventname, event name to be used for buffer measurement. * * The buffer passed is added to the ima logs. * If the sig template is used, then the sig field contains the buffer. @@ -170,7 +169,7 @@ void ima_file_free(struct file *file) * On error cases surface errors from ima calls. */ static int process_buffer_measurement(const void *buff, int size, - enum buffer_id id) + char *eventname) { int ret = -EINVAL; struct ima_template_entry *entry = NULL; @@ -185,23 +184,13 @@ static int process_buffer_measurement(const void *buff, int size, int violation = 0; int pcr = CONFIG_IMA_MEASURE_PCR_IDX; - if (!buff || size == 0) + if (!buff || size == 0 || !eventname) goto err_out; if (ima_get_action(NULL, 0, BUFFER_CHECK, ) != IMA_MEASURE) goto err_out; - switch (buffer_id) { - case KERNEL_VERSION: - name = "Kernel-version"; - break; - case KEXEC_CMDLINE: - name = "Kexec-cmdline"; - break; - default: - goto err_out; - } - + name = eventname; memset(iint, 0, sizeof(*iint)); memset(, 0, sizeof(hash)); @@ -452,15 +441,16 @@ int ima_read_file(struct file *file, enum kernel_read_file_id read_id) * ima_buffer_check - based on policy, collect & store buffer measurement * @buf: pointer to buffer * @size: size of buffer - * @buffer_id: caller
[PATCH v2 3/3 RFC] since cmdline args can be same for multiple kexec, log entry hash will collide. Prepend the kernel file name to the cmdline args to distinguish between cmdline args passed to subseq
From: Prakhar Srivastava Signed-off-by: Prakhar Srivastava --- Currently for soft reboot(kexec_file_load) the kernel file and signature is measured by IMA. The cmdline args used to load the kernel is not measured. The boot aggregate that gets calculated will have no change since the EFI loader has not been triggered. Adding the kexec cmdline args measure and kernel version will add some attestable criteria. Cmdline args can be same for multiple kexec, log entry hash will collide. Prepend the kernel file name to the cmdline args to distinguish between cmdline args passed to subsequent kexec calls kernel/kexec_core.c | 57 + kernel/kexec_file.c | 14 -- kernel/kexec_internal.h | 3 +++ 3 files changed, 72 insertions(+), 2 deletions(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index ae1a3ba24df5..97b77c780311 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1151,3 +1151,60 @@ void __weak arch_kexec_protect_crashkres(void) void __weak arch_kexec_unprotect_crashkres(void) {} + +/** + * kexec_cmdline_prepend_img_name - prepare the buffer with cmdline + * that needs to be measured + * @outbuf - out buffer that contains the formated string + * @kernel_fd - the file identifier for the kerenel image + * @cmdline_ptr - ptr to the cmdline buffer + * @cmdline_len - len of the buffer. + * + * This generates a buffer in the format Kerenelfilename::cmdline + * + * On success return 0. + * On failure return -EINVAL. + */ +int kexec_cmdline_prepend_img_name(char **outbuf, int kernel_fd, + const char *cmdline_ptr, + unsigned long cmdline_len) +{ + int ret = -EINVAL; + struct fd f = {}; + int size = 0; + char *buf = NULL; + char delimiter[] = "::"; + + if (!outbuf || !cmdline_ptr) + goto out; + + f = fdget(kernel_fd); + if (!f.file) + goto out; + + size = (f.file->f_path.dentry->d_name.len + cmdline_len - 1+ + ARRAY_SIZE(delimiter)) - 1; + + buf = kzalloc(size, GFP_KERNEL); + if (!buf) + goto out; + + memcpy(buf, f.file->f_path.dentry->d_name.name, + f.file->f_path.dentry->d_name.len); + memcpy(buf + f.file->f_path.dentry->d_name.len, + delimiter, ARRAY_SIZE(delimiter) - 1); + memcpy(buf + f.file->f_path.dentry->d_name.len + + ARRAY_SIZE(delimiter) - 1, + cmdline_ptr, cmdline_len - 1); + + *outbuf = buf; + ret = size; + + pr_debug("kexec cmdline buff: %s\n", buf); + +out: + if (f.file) + fdput(f); + + return ret; +} diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 2a5234eb4b28..a487491d55b9 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -126,6 +126,8 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd, int ret = 0; void *ldata; loff_t size; + char *buff_to_measure = NULL; + int buff_to_measure_size = 0; ret = kernel_read_file_from_fd(kernel_fd, >kernel_buf, , INT_MAX, READING_KEXEC_IMAGE); @@ -183,8 +185,13 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd, goto out; } - ima_buffer_check(image->cmdline_buf, cmdline_len - 1, - "kexec_cmdline"); + /* IMA measures the cmdline args passed to the next kernel*/ + buff_to_measure_size = kexec_cmdline_prepend_img_name(_to_measure, + kernel_fd, image->cmdline_buf, image->cmdline_buf_len); + + ima_buffer_check(buff_to_measure, buff_to_measure_size, + "kexec_cmdline"); + } /* Call arch image load handlers */ @@ -200,6 +207,9 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd, /* In case of error, free up all allocated memory in this function */ if (ret) kimage_file_post_load_cleanup(image); + + kfree(buff_to_measure); + return ret; } diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h index 799a8a452187..4d34a8ef4637 100644 --- a/kernel/kexec_internal.h +++ b/kernel/kexec_internal.h @@ -11,6 +11,9 @@ int kimage_load_segment(struct kimage *image, struct kexec_segment *segment); void kimage_terminate(struct kimage *image); int kimage_is_destination_range(struct kimage *image, unsigned long start, unsigned long end); +int kexec_cmdline_prepend_img_name(char **outbuf, int kernel_fd, + const char *cmdline_ptr, + unsigned long cmdline_len); extern struct mutex kexec_mutex; -- 2.17.1
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Sat, Apr 20, 2019 at 1:30 AM Daniel Colascione wrote: > > On Fri, Apr 19, 2019 at 4:02 PM Christian Brauner > wrote: > > > > On Sat, Apr 20, 2019 at 12:35 AM Daniel Colascione > > wrote: > > > > > > On Fri, Apr 19, 2019 at 2:48 PM Christian Brauner > > > wrote: > > > > > > > > On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione > > > > wrote: > > > > > > > > > > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner > > > > > wrote: > > > > > > > > > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione > > > > > > wrote: > > > > > > > > > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes > > > > > > > wrote: > > > > > > > > > > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner > > > > > > > > wrote: > > > > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes > > > > > > > > > wrote: > > > > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner > > > > > > > > > > wrote: > > > > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn > > > > > > > > > > > wrote: > > > > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov > > > > > > > > > > > > wrote: > > > > > > > > > > > >> On 04/16, Joel Fernandes wrote: > > > > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg > > > > > > > > > > > >> > Nesterov wrote: > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > Could you explain when it should return POLLIN? > > > > > > > > > > > >> > > When the whole > > > > > > > > > > > >process exits? > > > > > > > > > > > >> > > > > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't > > > > > > > > > > > >> > exist anymore, > > > > > > > > > > > >or when it > > > > > > > > > > > >> > is in a zombie state and there's no other thread in > > > > > > > > > > > >> > the thread > > > > > > > > > > > >group. > > > > > > > > > > > >> > > > > > > > > > > > >> IOW, when the whole thread group exits, so it can't be > > > > > > > > > > > >> used to > > > > > > > > > > > >monitor sub-threads. > > > > > > > > > > > >> > > > > > > > > > > > >> just in case... speaking of this patch it doesn't > > > > > > > > > > > >> modify > > > > > > > > > > > >proc_tid_base_operations, > > > > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but > > > > > > > > > > > >> iiuc you are > > > > > > > > > > > >going to use > > > > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ? > > > > > > > > > > > > > > > > > > > > > > > >I don't think procfs works that way. > > > > > > > > > > > >/proc/sub-thread-tid has > > > > > > > > > > > >proc_tgid_base_operations despite not being a thread > > > > > > > > > > > >group leader. > > > > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in > > > > > > > > > > > >this code can > > > > > > > > > > > >be hit trivially, and then the code will misbehave. > > > > > > > > > > > > > > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to > > > > > > > > > > > >explicitly bail > > > > > > > > > > > >out if you're dealing with a thread group leader, or > > > > > > > > > > > >make the code > > > > > > > > > > > >work for threads, too. > > > > > > > > > > > > > > > > > > > > > > The latter case probably being preferred if this API is > > > > > > > > > > > supposed to be > > > > > > > > > > > useable for thread management in userspace. > > > > > > > > > > > > > > > > > > > > At the moment, we are not planning to use this for > > > > > > > > > > sub-thread management. I > > > > > > > > > > am reworking this patch to only work on clone(2) pidfds > > > > > > > > > > which makes the above > > > > > > > > > > > > > > > > > > Indeed and agreed. > > > > > > > > > > > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the > > > > > > > > > > latest CLONE_PIDFD > > > > > > > > > > patches, CLONE_THREAD with pidfd is not supported. > > > > > > > > > > > > > > > > > > Yes. We have no one asking for it right now and we can easily > > > > > > > > > add this > > > > > > > > > later. > > > > > > > > > > > > > > > > > > Admittedly I haven't gotten around to reviewing the patches > > > > > > > > > here yet > > > > > > > > > completely. But one thing about using POLLIN. FreeBSD is > > > > > > > > > using POLLHUP > > > > > > > > > on process exit which I think is nice as well. How about > > > > > > > > > returning > > > > > > > > > POLLIN | POLLHUP on process exit? > > > > > > > > > We already do things like this. For example, when you proxy > > > > > > > > > between > > > > > > > > > ttys. If the process that you're reading data from has exited > > > > > > > > > and closed > > > > > > > > > it's end you still can't usually simply exit because it might > > > > > > > > > have still > > > > > > > > > buffered data that you want to read. The way one can deal > > > > > > > > > with this > > > > > > > > > from userspace is that you can observe a (POLLHUP | POLLIN) > > > > > > > > > event and > > > >
[PATCHv2] use event name instead of enum to make the call generic
From: Prakhar Srivastava Signed-off-by: Prakhar Srivastava --- remove enaums to control type of buffers entries, instead pass the event name to be used. include/linux/ima.h | 10 ++ kernel/kexec_file.c | 3 +++ security/integrity/ima/ima.h | 2 +- security/integrity/ima/ima_main.c | 30 ++ 4 files changed, 16 insertions(+), 29 deletions(-) diff --git a/include/linux/ima.h b/include/linux/ima.h index 733d0cb9dedc..5e41507c57e5 100644 --- a/include/linux/ima.h +++ b/include/linux/ima.h @@ -14,12 +14,6 @@ #include struct linux_binprm; -enum __buffer_id { - KERNEL_VERSION, - KEXEC_CMDLINE, - MAX_BUFFER_ID = KEXEC_CMDLINE -} buffer_id; - #ifdef CONFIG_IMA extern int ima_bprm_check(struct linux_binprm *bprm); extern int ima_file_check(struct file *file, int mask, int opened); @@ -29,7 +23,7 @@ extern int ima_read_file(struct file *file, enum kernel_read_file_id id); extern int ima_post_read_file(struct file *file, void *buf, loff_t size, enum kernel_read_file_id id); extern void ima_post_path_mknod(struct dentry *dentry); -extern void ima_buffer_check(const void *buff, int size, enum buffer_id id); +extern void ima_buffer_check(const void *buff, int size, char *eventname); #ifdef CONFIG_IMA_KEXEC extern void ima_add_kexec_buffer(struct kimage *image); #endif @@ -72,7 +66,7 @@ static inline void ima_post_path_mknod(struct dentry *dentry) } static inline void ima_buffer_check(const void *buff, int size, - enum buffer_id id) + char *eventname) { return; } diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index b118735fea9d..2a5234eb4b28 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -182,6 +182,9 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd, ret = -EINVAL; goto out; } + + ima_buffer_check(image->cmdline_buf, cmdline_len - 1, + "kexec_cmdline"); } /* Call arch image load handlers */ diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h index b71f2f6f7421..fcade3c103ed 100644 --- a/security/integrity/ima/ima.h +++ b/security/integrity/ima/ima.h @@ -181,8 +181,8 @@ enum ima_hooks { FIRMWARE_CHECK, KEXEC_KERNEL_CHECK, KEXEC_INITRAMFS_CHECK, - BUFFER_CHECK, POLICY_CHECK, + BUFFER_CHECK, MAX_CHECK }; diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c index 6408cadaadbb..da82c705a5ed 100644 --- a/security/integrity/ima/ima_main.c +++ b/security/integrity/ima/ima_main.c @@ -160,8 +160,7 @@ void ima_file_free(struct file *file) * (Instead of using the file hash the buffer hash is used). * @buff - The buffer that needs to be added to the log * @size - size of buffer(in bytes) - * @id - buffer id, this is differentiator for the various buffers - * that can be measured. + * @id - eventname, event name to be used for buffer measurement. * * The buffer passed is added to the ima logs. * If the sig template is used, then the sig field contains the buffer. @@ -170,7 +169,7 @@ void ima_file_free(struct file *file) * On error cases surface errors from ima calls. */ static int process_buffer_measurement(const void *buff, int size, - enum buffer_id id) + char *eventname) { int ret = -EINVAL; struct ima_template_entry *entry = NULL; @@ -185,23 +184,13 @@ static int process_buffer_measurement(const void *buff, int size, int violation = 0; int pcr = CONFIG_IMA_MEASURE_PCR_IDX; - if (!buff || size == 0) + if (!buff || size == 0 || !eventname) goto err_out; if (ima_get_action(NULL, 0, BUFFER_CHECK, ) != IMA_MEASURE) goto err_out; - switch (buffer_id) { - case KERNEL_VERSION: - name = "Kernel-version"; - break; - case KEXEC_CMDLINE: - name = "Kexec-cmdline"; - break; - default: - goto err_out; - } - + name = eventname; memset(iint, 0, sizeof(*iint)); memset(, 0, sizeof(hash)); @@ -452,15 +441,16 @@ int ima_read_file(struct file *file, enum kernel_read_file_id read_id) * ima_buffer_check - based on policy, collect & store buffer measurement * @buf: pointer to buffer * @size: size of buffer - * @buffer_id: caller identifier + * @eventname: caller identifier * * Buffers can only be measured, not appraised. The buffer identifier - * is used as the measurement list entry name (eg. boot_cmdline). + * is used as the measurement list entry name (eg. boot_cmdline, + * kernel_version). */ -void ima_buffer_check(const void *buf, int size, enum buffer_id id) +void
[PATCHv2] added ima hook for buffer, being enabled as a policy
From: Prakhar Srivastava Signed-off-by: Prakhar Srivastava --- This adds a new ima hook ima_buffer_check and a policy entry BUFFER_CHECK. This enables buffer has measurements into ima log Documentation/ABI/testing/ima_policy | 1 + include/linux/ima.h | 13 +++- security/integrity/ima/ima.h | 1 + security/integrity/ima/ima_main.c| 95 security/integrity/ima/ima_policy.c | 14 +++- 5 files changed, 122 insertions(+), 2 deletions(-) diff --git a/Documentation/ABI/testing/ima_policy b/Documentation/ABI/testing/ima_policy index bb0f9a135e21..676088c7ab26 100644 --- a/Documentation/ABI/testing/ima_policy +++ b/Documentation/ABI/testing/ima_policy @@ -28,6 +28,7 @@ Description: base: func:= [BPRM_CHECK][MMAP_CHECK][FILE_CHECK][MODULE_CHECK] [FIRMWARE_CHECK] [KEXEC_KERNEL_CHECK] [KEXEC_INITRAMFS_CHECK] + [BUFFER_CHECK] mask:= [[^]MAY_READ] [[^]MAY_WRITE] [[^]MAY_APPEND] [[^]MAY_EXEC] fsmagic:= hex value diff --git a/include/linux/ima.h b/include/linux/ima.h index 7f6952f8d6aa..733d0cb9dedc 100644 --- a/include/linux/ima.h +++ b/include/linux/ima.h @@ -14,6 +14,12 @@ #include struct linux_binprm; +enum __buffer_id { + KERNEL_VERSION, + KEXEC_CMDLINE, + MAX_BUFFER_ID = KEXEC_CMDLINE +} buffer_id; + #ifdef CONFIG_IMA extern int ima_bprm_check(struct linux_binprm *bprm); extern int ima_file_check(struct file *file, int mask, int opened); @@ -23,7 +29,7 @@ extern int ima_read_file(struct file *file, enum kernel_read_file_id id); extern int ima_post_read_file(struct file *file, void *buf, loff_t size, enum kernel_read_file_id id); extern void ima_post_path_mknod(struct dentry *dentry); - +extern void ima_buffer_check(const void *buff, int size, enum buffer_id id); #ifdef CONFIG_IMA_KEXEC extern void ima_add_kexec_buffer(struct kimage *image); #endif @@ -65,6 +71,11 @@ static inline void ima_post_path_mknod(struct dentry *dentry) return; } +static inline void ima_buffer_check(const void *buff, int size, + enum buffer_id id) +{ + return; +} #endif /* CONFIG_IMA */ #ifndef CONFIG_IMA_KEXEC diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h index b563fbd4d122..b71f2f6f7421 100644 --- a/security/integrity/ima/ima.h +++ b/security/integrity/ima/ima.h @@ -181,6 +181,7 @@ enum ima_hooks { FIRMWARE_CHECK, KEXEC_KERNEL_CHECK, KEXEC_INITRAMFS_CHECK, + BUFFER_CHECK, POLICY_CHECK, MAX_CHECK }; diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c index 2aebb7984437..6408cadaadbb 100644 --- a/security/integrity/ima/ima_main.c +++ b/security/integrity/ima/ima_main.c @@ -155,6 +155,84 @@ void ima_file_free(struct file *file) ima_check_last_writer(iint, inode, file); } +/* + * process_buffer_measurement - Measure the buffer passed to ima log. + * (Instead of using the file hash the buffer hash is used). + * @buff - The buffer that needs to be added to the log + * @size - size of buffer(in bytes) + * @id - buffer id, this is differentiator for the various buffers + * that can be measured. + * + * The buffer passed is added to the ima logs. + * If the sig template is used, then the sig field contains the buffer. + * + * On success return 0. + * On error cases surface errors from ima calls. + */ +static int process_buffer_measurement(const void *buff, int size, + enum buffer_id id) +{ + int ret = -EINVAL; + struct ima_template_entry *entry = NULL; + struct integrity_iint_cache tmp_iint, *iint = _iint; + struct ima_event_data event_data = {iint, NULL, NULL, + NULL, 0, NULL}; + struct { + struct ima_digest_data hdr; + char digest[IMA_MAX_DIGEST_SIZE]; + } hash; + char *name = NULL; + int violation = 0; + int pcr = CONFIG_IMA_MEASURE_PCR_IDX; + + if (!buff || size == 0) + goto err_out; + + if (ima_get_action(NULL, 0, BUFFER_CHECK, ) != IMA_MEASURE) + goto err_out; + + switch (buffer_id) { + case KERNEL_VERSION: + name = "Kernel-version"; + break; + case KEXEC_CMDLINE: + name = "Kexec-cmdline"; + break; + default: + goto err_out; + } + + memset(iint, 0, sizeof(*iint)); + memset(, 0, sizeof(hash)); + + event_data.filename = name; + + iint->ima_hash = + iint->ima_hash->algo = ima_hash_algo; + iint->ima_hash->length = hash_digest_size[ima_hash_algo]; + + ret = ima_calc_buffer_hash(buff, size, iint->ima_hash); + if (ret < 0) +
[PATCHv2] since cmdline args can be same for multiple kexec, log entry hash will collide. Prepend the kernel file name to the cmdline args to distinguish between cmdline args passed to subsequent kexe
From: Prakhar Srivastava Signed-off-by: Prakhar Srivastava --- since cmdline args can be same for multiple kexec, log entry hash will collide. Prepend the kernel file name to the cmdline args to distinguish between cmdline args passed to subsequent kexec calls kernel/kexec_core.c | 57 + kernel/kexec_file.c | 14 -- kernel/kexec_internal.h | 3 +++ 3 files changed, 72 insertions(+), 2 deletions(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index ae1a3ba24df5..97b77c780311 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1151,3 +1151,60 @@ void __weak arch_kexec_protect_crashkres(void) void __weak arch_kexec_unprotect_crashkres(void) {} + +/** + * kexec_cmdline_prepend_img_name - prepare the buffer with cmdline + * that needs to be measured + * @outbuf - out buffer that contains the formated string + * @kernel_fd - the file identifier for the kerenel image + * @cmdline_ptr - ptr to the cmdline buffer + * @cmdline_len - len of the buffer. + * + * This generates a buffer in the format Kerenelfilename::cmdline + * + * On success return 0. + * On failure return -EINVAL. + */ +int kexec_cmdline_prepend_img_name(char **outbuf, int kernel_fd, + const char *cmdline_ptr, + unsigned long cmdline_len) +{ + int ret = -EINVAL; + struct fd f = {}; + int size = 0; + char *buf = NULL; + char delimiter[] = "::"; + + if (!outbuf || !cmdline_ptr) + goto out; + + f = fdget(kernel_fd); + if (!f.file) + goto out; + + size = (f.file->f_path.dentry->d_name.len + cmdline_len - 1+ + ARRAY_SIZE(delimiter)) - 1; + + buf = kzalloc(size, GFP_KERNEL); + if (!buf) + goto out; + + memcpy(buf, f.file->f_path.dentry->d_name.name, + f.file->f_path.dentry->d_name.len); + memcpy(buf + f.file->f_path.dentry->d_name.len, + delimiter, ARRAY_SIZE(delimiter) - 1); + memcpy(buf + f.file->f_path.dentry->d_name.len + + ARRAY_SIZE(delimiter) - 1, + cmdline_ptr, cmdline_len - 1); + + *outbuf = buf; + ret = size; + + pr_debug("kexec cmdline buff: %s\n", buf); + +out: + if (f.file) + fdput(f); + + return ret; +} diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 2a5234eb4b28..a487491d55b9 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -126,6 +126,8 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd, int ret = 0; void *ldata; loff_t size; + char *buff_to_measure = NULL; + int buff_to_measure_size = 0; ret = kernel_read_file_from_fd(kernel_fd, >kernel_buf, , INT_MAX, READING_KEXEC_IMAGE); @@ -183,8 +185,13 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd, goto out; } - ima_buffer_check(image->cmdline_buf, cmdline_len - 1, - "kexec_cmdline"); + /* IMA measures the cmdline args passed to the next kernel*/ + buff_to_measure_size = kexec_cmdline_prepend_img_name(_to_measure, + kernel_fd, image->cmdline_buf, image->cmdline_buf_len); + + ima_buffer_check(buff_to_measure, buff_to_measure_size, + "kexec_cmdline"); + } /* Call arch image load handlers */ @@ -200,6 +207,9 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd, /* In case of error, free up all allocated memory in this function */ if (ret) kimage_file_post_load_cleanup(image); + + kfree(buff_to_measure); + return ret; } diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h index 799a8a452187..4d34a8ef4637 100644 --- a/kernel/kexec_internal.h +++ b/kernel/kexec_internal.h @@ -11,6 +11,9 @@ int kimage_load_segment(struct kimage *image, struct kexec_segment *segment); void kimage_terminate(struct kimage *image); int kimage_is_destination_range(struct kimage *image, unsigned long start, unsigned long end); +int kexec_cmdline_prepend_img_name(char **outbuf, int kernel_fd, + const char *cmdline_ptr, + unsigned long cmdline_len); extern struct mutex kexec_mutex; -- 2.17.1
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Fri, Apr 19, 2019 at 4:12 PM Christian Brauner wrote: > > On Sat, Apr 20, 2019 at 12:46 AM Daniel Colascione wrote: > > > > On Fri, Apr 19, 2019 at 3:02 PM Christian Brauner > > wrote: > > > > > > On Fri, Apr 19, 2019 at 11:48 PM Christian Brauner > > > wrote: > > > > > > > > On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione > > > > wrote: > > > > > > > > > > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner > > > > > wrote: > > > > > > > > > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione > > > > > > wrote: > > > > > > > > > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes > > > > > > > wrote: > > > > > > > > > > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner > > > > > > > > wrote: > > > > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes > > > > > > > > > wrote: > > > > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner > > > > > > > > > > wrote: > > > > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn > > > > > > > > > > > wrote: > > > > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov > > > > > > > > > > > > wrote: > > > > > > > > > > > >> On 04/16, Joel Fernandes wrote: > > > > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg > > > > > > > > > > > >> > Nesterov wrote: > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > Could you explain when it should return POLLIN? > > > > > > > > > > > >> > > When the whole > > > > > > > > > > > >process exits? > > > > > > > > > > > >> > > > > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't > > > > > > > > > > > >> > exist anymore, > > > > > > > > > > > >or when it > > > > > > > > > > > >> > is in a zombie state and there's no other thread in > > > > > > > > > > > >> > the thread > > > > > > > > > > > >group. > > > > > > > > > > > >> > > > > > > > > > > > >> IOW, when the whole thread group exits, so it can't be > > > > > > > > > > > >> used to > > > > > > > > > > > >monitor sub-threads. > > > > > > > > > > > >> > > > > > > > > > > > >> just in case... speaking of this patch it doesn't > > > > > > > > > > > >> modify > > > > > > > > > > > >proc_tid_base_operations, > > > > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but > > > > > > > > > > > >> iiuc you are > > > > > > > > > > > >going to use > > > > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ? > > > > > > > > > > > > > > > > > > > > > > > >I don't think procfs works that way. > > > > > > > > > > > >/proc/sub-thread-tid has > > > > > > > > > > > >proc_tgid_base_operations despite not being a thread > > > > > > > > > > > >group leader. > > > > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in > > > > > > > > > > > >this code can > > > > > > > > > > > >be hit trivially, and then the code will misbehave. > > > > > > > > > > > > > > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to > > > > > > > > > > > >explicitly bail > > > > > > > > > > > >out if you're dealing with a thread group leader, or > > > > > > > > > > > >make the code > > > > > > > > > > > >work for threads, too. > > > > > > > > > > > > > > > > > > > > > > The latter case probably being preferred if this API is > > > > > > > > > > > supposed to be > > > > > > > > > > > useable for thread management in userspace. > > > > > > > > > > > > > > > > > > > > At the moment, we are not planning to use this for > > > > > > > > > > sub-thread management. I > > > > > > > > > > am reworking this patch to only work on clone(2) pidfds > > > > > > > > > > which makes the above > > > > > > > > > > > > > > > > > > Indeed and agreed. > > > > > > > > > > > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the > > > > > > > > > > latest CLONE_PIDFD > > > > > > > > > > patches, CLONE_THREAD with pidfd is not supported. > > > > > > > > > > > > > > > > > > Yes. We have no one asking for it right now and we can easily > > > > > > > > > add this > > > > > > > > > later. > > > > > > > > > > > > > > > > > > Admittedly I haven't gotten around to reviewing the patches > > > > > > > > > here yet > > > > > > > > > completely. But one thing about using POLLIN. FreeBSD is > > > > > > > > > using POLLHUP > > > > > > > > > on process exit which I think is nice as well. How about > > > > > > > > > returning > > > > > > > > > POLLIN | POLLHUP on process exit? > > > > > > > > > We already do things like this. For example, when you proxy > > > > > > > > > between > > > > > > > > > ttys. If the process that you're reading data from has exited > > > > > > > > > and closed > > > > > > > > > it's end you still can't usually simply exit because it might > > > > > > > > > have still > > > > > > > > > buffered data that you want to read. The way one can deal > > > > > > > > > with this > > > > > > > > > from userspace is that you can observe a (POLLHUP | POLLIN) > > > > > > > > > event and > > > > >
Re: [GIT PULL] percpu changes for v5.1-rc6
The pull request you sent on Fri, 19 Apr 2019 17:29:18 -0400: > git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu.git for-5.1-fixes has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/4c3f49ae1306c05e91211c06feddfd0a4a57fabd Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Fri, Apr 19, 2019 at 4:20 PM Christian Brauner wrote: > > On Sat, Apr 20, 2019 at 1:11 AM Linus Torvalds > wrote: > > > > It's also worth noting that POLLERR/POLLHUP/POLLNVAL cannot be masked > > for "poll()". Even if you only ask for POLLIN/POLLOUT, you will always > > get POLLERR/POLLHUP notification. That is again historical behavior, > > and it's kind of a "you can't poll a hung up fd". But it once again > > means that you should consider POLLHUP to be something *exceptional* > > and final, where no further or other state changes can happen or are > > relevant. > > Which kind of makes sense for process exit. So the historical behavior > here is in our favor and having POLLIN | POLLHUP rather fitting. > It just seems right that POLLHUP indicates "there can be > no more state transitions". Note that that is *not* true of process exit. The final state transition isn't "exit", it is actually "process has been reaped". That's the point where data no longer exists. Arguably "exit()" just means "pidfd is now readable - you can read the status". That sounds very much like a normal POLLIN condition to me, since the whole *point* of read() on pidfd is presumably to read the status. Now, if you want to have other state transitions (ie read execve/fork/whatever state details), then you could say that _those_ state transitions are just POLLIN, but that the exit state transition is POLLIN | POLLHUP. But logically to me it still smells like the process being reaped should be POLLHUP. You could also say that the execve/fork/whatever state is out of band data, and use EPOLLRDBAND for it. But in fact EPOLLPRI might be better for that, because that works well with select() (ei if you want to select for execve/fork, you use the "ex" bitmask). That said, if FreeBSD already has something like this, and people actually have code that uses it, there's also simply a strong argument for "don't be needlessly different". Linus
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Fri, Apr 19, 2019 at 4:33 PM Linus Torvalds wrote: > > On Fri, Apr 19, 2019 at 4:20 PM Christian Brauner > wrote: > > > > On Sat, Apr 20, 2019 at 1:11 AM Linus Torvalds > > wrote: > > > > > > It's also worth noting that POLLERR/POLLHUP/POLLNVAL cannot be masked > > > for "poll()". Even if you only ask for POLLIN/POLLOUT, you will always > > > get POLLERR/POLLHUP notification. That is again historical behavior, > > > and it's kind of a "you can't poll a hung up fd". But it once again > > > means that you should consider POLLHUP to be something *exceptional* > > > and final, where no further or other state changes can happen or are > > > relevant. > > > > Which kind of makes sense for process exit. So the historical behavior > > here is in our favor and having POLLIN | POLLHUP rather fitting. > > It just seems right that POLLHUP indicates "there can be > > no more state transitions". > > Note that that is *not* true of process exit. > > The final state transition isn't "exit", it is actually "process has > been reaped". That's the point where data no longer exists. FWIW, I think the exit status should be available via pidfd even after process reaping. A non-parent holder of a pidfd has no ability to control when the parent reaps the child, or even if reaping is necessary at all --- the parent could make SIGCHLD SIG_IGN. Someone trying to read exit status via a pidfd shouldn't fail to get that exit status just because he lost the race with a concurrent waitpid().
[RESEND PATCH] mm/hmm: Fix initial PFN for hugetlbfs pages
From: Ralph Campbell The mmotm patch [1] adds hugetlbfs support for HMM but the initial PFN used to fill the HMM range->pfns[] array doesn't properly compute the starting PFN offset. This can be tested by running test-hugetlbfs-read from [2]. Fix the PFN offset by adjusting the page offset by the device's page size. Andrew, this should probably be squashed into Jerome's patch. [1] https://marc.info/?l=linux-mm=155432003506068=2 ("mm/hmm: mirror hugetlbfs (snapshoting, faulting and DMA mapping)") [2] https://gitlab.freedesktop.org/glisse/svm-cl-tests Signed-off-by: Ralph Campbell Cc: Jérôme Glisse Cc: Ira Weiny Cc: John Hubbard Cc: Dan Williams Cc: Arnd Bergmann Cc: Balbir Singh Cc: Dan Carpenter Cc: Matthew Wilcox Cc: Souptick Joarder Cc: Andrew Morton --- mm/hmm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/hmm.c b/mm/hmm.c index def451a56c3e..fcf8e4fb5770 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -868,7 +868,7 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask, goto unlock; } - pfn = pte_pfn(entry) + (start & mask); + pfn = pte_pfn(entry) + ((start & mask) >> range->page_shift); for (; addr < end; addr += size, i++, pfn += pfn_inc) range->pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags; -- 2.20.1
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Fri, Apr 19, 2019 at 4:02 PM Christian Brauner wrote: > > On Sat, Apr 20, 2019 at 12:35 AM Daniel Colascione wrote: > > > > On Fri, Apr 19, 2019 at 2:48 PM Christian Brauner > > wrote: > > > > > > On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione > > > wrote: > > > > > > > > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner > > > > wrote: > > > > > > > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione > > > > > wrote: > > > > > > > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes > > > > > > wrote: > > > > > > > > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote: > > > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote: > > > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner > > > > > > > > > wrote: > > > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn > > > > > > > > > > wrote: > > > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov > > > > > > > > > > > wrote: > > > > > > > > > > >> On 04/16, Joel Fernandes wrote: > > > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg > > > > > > > > > > >> > Nesterov wrote: > > > > > > > > > > >> > > > > > > > > > > > > >> > > Could you explain when it should return POLLIN? When > > > > > > > > > > >> > > the whole > > > > > > > > > > >process exits? > > > > > > > > > > >> > > > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't > > > > > > > > > > >> > exist anymore, > > > > > > > > > > >or when it > > > > > > > > > > >> > is in a zombie state and there's no other thread in > > > > > > > > > > >> > the thread > > > > > > > > > > >group. > > > > > > > > > > >> > > > > > > > > > > >> IOW, when the whole thread group exits, so it can't be > > > > > > > > > > >> used to > > > > > > > > > > >monitor sub-threads. > > > > > > > > > > >> > > > > > > > > > > >> just in case... speaking of this patch it doesn't modify > > > > > > > > > > >proc_tid_base_operations, > > > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but > > > > > > > > > > >> iiuc you are > > > > > > > > > > >going to use > > > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ? > > > > > > > > > > > > > > > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid > > > > > > > > > > >has > > > > > > > > > > >proc_tgid_base_operations despite not being a thread group > > > > > > > > > > >leader. > > > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in > > > > > > > > > > >this code can > > > > > > > > > > >be hit trivially, and then the code will misbehave. > > > > > > > > > > > > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to > > > > > > > > > > >explicitly bail > > > > > > > > > > >out if you're dealing with a thread group leader, or make > > > > > > > > > > >the code > > > > > > > > > > >work for threads, too. > > > > > > > > > > > > > > > > > > > > The latter case probably being preferred if this API is > > > > > > > > > > supposed to be > > > > > > > > > > useable for thread management in userspace. > > > > > > > > > > > > > > > > > > At the moment, we are not planning to use this for sub-thread > > > > > > > > > management. I > > > > > > > > > am reworking this patch to only work on clone(2) pidfds which > > > > > > > > > makes the above > > > > > > > > > > > > > > > > Indeed and agreed. > > > > > > > > > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the > > > > > > > > > latest CLONE_PIDFD > > > > > > > > > patches, CLONE_THREAD with pidfd is not supported. > > > > > > > > > > > > > > > > Yes. We have no one asking for it right now and we can easily > > > > > > > > add this > > > > > > > > later. > > > > > > > > > > > > > > > > Admittedly I haven't gotten around to reviewing the patches > > > > > > > > here yet > > > > > > > > completely. But one thing about using POLLIN. FreeBSD is using > > > > > > > > POLLHUP > > > > > > > > on process exit which I think is nice as well. How about > > > > > > > > returning > > > > > > > > POLLIN | POLLHUP on process exit? > > > > > > > > We already do things like this. For example, when you proxy > > > > > > > > between > > > > > > > > ttys. If the process that you're reading data from has exited > > > > > > > > and closed > > > > > > > > it's end you still can't usually simply exit because it might > > > > > > > > have still > > > > > > > > buffered data that you want to read. The way one can deal with > > > > > > > > this > > > > > > > > from userspace is that you can observe a (POLLHUP | POLLIN) > > > > > > > > event and > > > > > > > > you keep on reading until you only observe a POLLHUP without a > > > > > > > > POLLIN > > > > > > > > event at which point you know you have read > > > > > > > > all data. > > > > > > > > I like the semantics for pidfds as well as it would indicate: > > > > > > > > - POLLHUP -> process has exited > > > > > > > > - POLLIN ->
[PATCH] Makefile: add kselftest-build target
One of the use-cases is building kselftest and then installing. Adding a target to just build the tests from the main Makefile will help users and makes it easier. Signed-off-by: Shuah Khan --- Makefile | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Makefile b/Makefile index 15c8251d4d5e..38a2ce3c343f 100644 --- a/Makefile +++ b/Makefile @@ -1195,6 +1195,9 @@ endif # --- # Kernel selftest +PHONY += kselftest-build +kselftest-build: + $(Q)$(MAKE) -C $(srctree)/tools/testing/selftests all PHONY += kselftest kselftest: $(Q)$(MAKE) -C $(srctree)/tools/testing/selftests run_tests -- 2.17.1
[PATCH] selftests: build and run gpio when output directory is the src dir
Build and run gpio when output directory is the src dir. gpio has dependency on tools/gpio and builds tools/gpio objects in the src directory in all cases making the src repo dirty even when object relocation is specified. This fixes the following commands from generating gpio objects in the source repository: make O=dir kselftest export KBUILD_OUTPUT=dir; make kselftest make O=dir -C tools/testing/selftests expoert KBUILD_OUTPUT=dir; make -C tools/testing/selftests The following still don't work (need fixing in gpio Makefile): make O=dir kselftest TARGETS="gpio" export KBUILD_OUTPUT=dir; make kselftest TARGETS="gpio" make O=dir -C tools/testing/selftests TARGETS="gpio" expoert KBUILD_OUTPUT=dir; make -C tools/testing/selftests TARGETS="gpio" Signed-off-by: Shuah Khan --- tools/testing/selftests/Makefile | 9 + 1 file changed, 9 insertions(+) diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index b43c5f41fb3e..f2ebf8cf4686 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -99,6 +99,15 @@ ARCH ?= $(SUBARCH) export KSFT_KHDR_INSTALL_DONE := 1 export BUILD +# build and run gpio when output directory is the src dir. +# gpio has dependency on tools/gpio and builds tools/gpio +# objects in the src directory in all cases making the src +# repo dirty even when objects are relocated. +ifneq (1,$(DEFAULT_INSTALL_HDR_PATH)) + TMP := $(filter-out gpio, $(TARGETS)) + TARGETS := $(TMP) +endif + # set default goal to all, so make without a target runs all, even when # all isn't the first target in the file. .DEFAULT_GOAL := all -- 2.17.1
Re: [RFC][PATCH 13/16] sched: Add core wide task selection and scheduling.
On 4/19/19 1:40 AM, Ingo Molnar wrote: * Subhra Mazumdar wrote: I see similar improvement with this patch as removing the condition I earlier mentioned. So that's not needed. I also included the patch for the priority fix. For 2 DB instances, HT disabling stands at -22% for 32 users (from earlier emails). 1 DB instance users baseline %idle core_sched %idle 16 1 84 -4.9% 84 24 1 76 -6.7% 75 32 1 69 -2.4% 69 2 DB instance users baseline %idle core_sched %idle 16 1 66 -19.5% 69 24 1 54 -9.8% 57 32 1 42 -27.2% 48 So HT disabling slows down the 2DB instance by -22%, while core-sched slows it down by -27.2%? Would it be possible to see all the results in two larger tables (1 DB instance and 2 DB instance) so that we can compare the performance of the 3 kernel variants with each other: - "vanilla +HT": Hyperthreading enabled, vanilla scheduler - "vanilla -HT": Hyperthreading disabled, vanilla scheduler - "core_sched": Hyperthreading enabled, core-scheduling enabled ? Thanks, Ingo Following are the numbers. Disabling HT gives improvement in some cases. 1 DB instance users vanilla+HT core_sched vanilla-HT 16 1 -4.9% -11.7% 24 1 -6.7% +13.7% 32 1 -2.4% +8% 2 DB instance users vanilla+HT core_sched vanilla-HT 16 1 -19.5% +5.6% 24 1 -9.8% +3.5% 32 1 -27.2% -22.8%
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Sat, Apr 20, 2019 at 1:11 AM Linus Torvalds wrote: > > On Fri, Apr 19, 2019 at 2:20 PM Joel Fernandes wrote: > > > > According to Linus, POLLHUP usually indicates that something is readable: > > Note that if you use the legacy interfaces (ie "select()"), then even > just a plain POLLHUP will always just show as "readable". > > So for a lot of applications, it won't even matter. Returning POLLHUP > will mean it's readable whether POLLIN is set or not (and EPOLLERR > will automatically mean that it's marked both readable and writable). > > In fact, I'd argue that not a lot of people care about the more > esoteric bits at all. If your poll function does POLLIN and POLLOUT, > it's fine. Anything else is specialized enough that most people simply > won't care. Don't overdesign things too much. You need to have a major > reason to care about the other POLL bits. > > It's also worth noting that POLLERR/POLLHUP/POLLNVAL cannot be masked > for "poll()". Even if you only ask for POLLIN/POLLOUT, you will always > get POLLERR/POLLHUP notification. That is again historical behavior, > and it's kind of a "you can't poll a hung up fd". But it once again > means that you should consider POLLHUP to be something *exceptional* > and final, where no further or other state changes can happen or are > relevant. Which kind of makes sense for process exit. So the historical behavior here is in our favor and having POLLIN | POLLHUP rather fitting. It just seems right that POLLHUP indicates "there can be no more state transitions". Christian
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Sat, Apr 20, 2019 at 12:46 AM Daniel Colascione wrote: > > On Fri, Apr 19, 2019 at 3:02 PM Christian Brauner > wrote: > > > > On Fri, Apr 19, 2019 at 11:48 PM Christian Brauner > > wrote: > > > > > > On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione > > > wrote: > > > > > > > > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner > > > > wrote: > > > > > > > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione > > > > > wrote: > > > > > > > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes > > > > > > wrote: > > > > > > > > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote: > > > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote: > > > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner > > > > > > > > > wrote: > > > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn > > > > > > > > > > wrote: > > > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov > > > > > > > > > > > wrote: > > > > > > > > > > >> On 04/16, Joel Fernandes wrote: > > > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg > > > > > > > > > > >> > Nesterov wrote: > > > > > > > > > > >> > > > > > > > > > > > > >> > > Could you explain when it should return POLLIN? When > > > > > > > > > > >> > > the whole > > > > > > > > > > >process exits? > > > > > > > > > > >> > > > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't > > > > > > > > > > >> > exist anymore, > > > > > > > > > > >or when it > > > > > > > > > > >> > is in a zombie state and there's no other thread in > > > > > > > > > > >> > the thread > > > > > > > > > > >group. > > > > > > > > > > >> > > > > > > > > > > >> IOW, when the whole thread group exits, so it can't be > > > > > > > > > > >> used to > > > > > > > > > > >monitor sub-threads. > > > > > > > > > > >> > > > > > > > > > > >> just in case... speaking of this patch it doesn't modify > > > > > > > > > > >proc_tid_base_operations, > > > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but > > > > > > > > > > >> iiuc you are > > > > > > > > > > >going to use > > > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ? > > > > > > > > > > > > > > > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid > > > > > > > > > > >has > > > > > > > > > > >proc_tgid_base_operations despite not being a thread group > > > > > > > > > > >leader. > > > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in > > > > > > > > > > >this code can > > > > > > > > > > >be hit trivially, and then the code will misbehave. > > > > > > > > > > > > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to > > > > > > > > > > >explicitly bail > > > > > > > > > > >out if you're dealing with a thread group leader, or make > > > > > > > > > > >the code > > > > > > > > > > >work for threads, too. > > > > > > > > > > > > > > > > > > > > The latter case probably being preferred if this API is > > > > > > > > > > supposed to be > > > > > > > > > > useable for thread management in userspace. > > > > > > > > > > > > > > > > > > At the moment, we are not planning to use this for sub-thread > > > > > > > > > management. I > > > > > > > > > am reworking this patch to only work on clone(2) pidfds which > > > > > > > > > makes the above > > > > > > > > > > > > > > > > Indeed and agreed. > > > > > > > > > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the > > > > > > > > > latest CLONE_PIDFD > > > > > > > > > patches, CLONE_THREAD with pidfd is not supported. > > > > > > > > > > > > > > > > Yes. We have no one asking for it right now and we can easily > > > > > > > > add this > > > > > > > > later. > > > > > > > > > > > > > > > > Admittedly I haven't gotten around to reviewing the patches > > > > > > > > here yet > > > > > > > > completely. But one thing about using POLLIN. FreeBSD is using > > > > > > > > POLLHUP > > > > > > > > on process exit which I think is nice as well. How about > > > > > > > > returning > > > > > > > > POLLIN | POLLHUP on process exit? > > > > > > > > We already do things like this. For example, when you proxy > > > > > > > > between > > > > > > > > ttys. If the process that you're reading data from has exited > > > > > > > > and closed > > > > > > > > it's end you still can't usually simply exit because it might > > > > > > > > have still > > > > > > > > buffered data that you want to read. The way one can deal with > > > > > > > > this > > > > > > > > from userspace is that you can observe a (POLLHUP | POLLIN) > > > > > > > > event and > > > > > > > > you keep on reading until you only observe a POLLHUP without a > > > > > > > > POLLIN > > > > > > > > event at which point you know you have read > > > > > > > > all data. > > > > > > > > I like the semantics for pidfds as well as it would indicate: > > > > > > > > - POLLHUP -> process has exited > > > > > > > > - POLLIN
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Fri, Apr 19, 2019 at 2:20 PM Joel Fernandes wrote: > > According to Linus, POLLHUP usually indicates that something is readable: Note that if you use the legacy interfaces (ie "select()"), then even just a plain POLLHUP will always just show as "readable". So for a lot of applications, it won't even matter. Returning POLLHUP will mean it's readable whether POLLIN is set or not (and EPOLLERR will automatically mean that it's marked both readable and writable). In fact, I'd argue that not a lot of people care about the more esoteric bits at all. If your poll function does POLLIN and POLLOUT, it's fine. Anything else is specialized enough that most people simply won't care. Don't overdesign things too much. You need to have a major reason to care about the other POLL bits. It's also worth noting that POLLERR/POLLHUP/POLLNVAL cannot be masked for "poll()". Even if you only ask for POLLIN/POLLOUT, you will always get POLLERR/POLLHUP notification. That is again historical behavior, and it's kind of a "you can't poll a hung up fd". But it once again means that you should consider POLLHUP to be something *exceptional* and final, where no further or other state changes can happen or are relevant. That may well work fine for pidfd when the task is gone, but it's really worth noting that any *normal* state should be about POLLIN/POLLOUT. People should not think that "POLLHUP sounds like the appropriate name", they should realize that POLLHUP is basically a terminal error condition, not a "input is available". So just keep that in mind. Linus
Re: [PATCH 2/3] ARM: at91: move platform-specific asm-offset.h to arch/arm/mach-at91
On Sat, Apr 20, 2019 at 4:03 AM Ludovic Desroches wrote: > > On Mon, Apr 15, 2019 at 05:14:50PM +0200, Alexandre Belloni wrote: > > External E-Mail > > > > > > On 08/04/2019 16:54:26+0900, Masahiro Yamada wrote: > > > is only generated and included > > > by arch/arm/mach-at91/, so it does not need to reside in the > > > globally visible include/generated/. > > > > > > I moved and renamed it to arch/arm/mach-at91/pm_data-offsets.h > > > since the prefix 'at91_' is just redundant in mach-at91/. > > > > > > Signed-off-by: Masahiro Yamada > > Acked-by: Alexandre Belloni > > > > Applied in at91-soc. Let me know if it's an issue, I plan to do the PR > soon. There is one minor issue. If you apply 2/3 (this one) alone, arch/arm/mach-at91/pm_data-offsets.h is not cleaned. 1/3 fixes the "make clean" issue: https://lkml.org/lkml/2019/4/8/153 That is why I sent this as a series in order to avoid the regression of cleaning. Thanks. Masahiro Yamada > Regards > > Ludovic > > > > --- > > > > > > Can this be applied to ARM-SOC tree in a series? > > > (with Ack from the platform sub-maintainer.) > > > > > > at91_pm_data-offsets.h header does not need to reside in > > > include/generated/, but you may ask > > > "Why must it get out of include/generated/?" > > > > > > My main motivation is to avoid a race condition in the currently > > > proposed patch: > > > > > > https://lore.kernel.org/patchwork/patch/1052763/ > > > > > > This patch tries to embed some build artifacts into the kernel. > > > > > > If arch/arm/mach-at91/ and kernel/ are built at the same time, > > > it may embed a truncated file. > > > > > > > > > arch/arm/mach-at91/.gitignore | 1 + > > > arch/arm/mach-at91/Makefile | 5 +++-- > > > arch/arm/mach-at91/pm_suspend.S | 2 +- > > > 3 files changed, 5 insertions(+), 3 deletions(-) > > > create mode 100644 arch/arm/mach-at91/.gitignore > > > > > > diff --git a/arch/arm/mach-at91/.gitignore b/arch/arm/mach-at91/.gitignore > > > new file mode 100644 > > > index ..2ecd6f51c8a9 > > > --- /dev/null > > > +++ b/arch/arm/mach-at91/.gitignore > > > @@ -0,0 +1 @@ > > > +pm_data-offsets.h > > > diff --git a/arch/arm/mach-at91/Makefile b/arch/arm/mach-at91/Makefile > > > index 31b61f0e1c07..de64301dcff2 100644 > > > --- a/arch/arm/mach-at91/Makefile > > > +++ b/arch/arm/mach-at91/Makefile > > > @@ -19,9 +19,10 @@ ifeq ($(CONFIG_PM_DEBUG),y) > > > CFLAGS_pm.o += -DDEBUG > > > endif > > > > > > -include/generated/at91_pm_data-offsets.h: > > > arch/arm/mach-at91/pm_data-offsets.s FORCE > > > +$(obj)/pm_data-offsets.h: $(obj)/pm_data-offsets.s FORCE > > > $(call filechk,offsets,__PM_DATA_OFFSETS_H__) > > > > > > -arch/arm/mach-at91/pm_suspend.o: include/generated/at91_pm_data-offsets.h > > > +$(obj)/pm_suspend.o: $(obj)/pm_data-offsets.h > > > > > > targets += pm_data-offsets.s > > > +clean-files += pm_data-offsets.h > > > diff --git a/arch/arm/mach-at91/pm_suspend.S > > > b/arch/arm/mach-at91/pm_suspend.S > > > index bfe1c4d06901..a31c1b20f3fa 100644 > > > --- a/arch/arm/mach-at91/pm_suspend.S > > > +++ b/arch/arm/mach-at91/pm_suspend.S > > > @@ -14,7 +14,7 @@ > > > #include > > > #include > > > #include "pm.h" > > > -#include "generated/at91_pm_data-offsets.h" > > > +#include "pm_data-offsets.h" > > > > > > #defineSRAMC_SELF_FRESH_ACTIVE 0x01 > > > #defineSRAMC_SELF_FRESH_EXIT 0x00 > > > -- > > > 2.17.1 > > > > > > > -- > > Alexandre Belloni, Bootlin > > Embedded Linux and Kernel engineering > > https://bootlin.com > > -- Best Regards Masahiro Yamada
Re: Detecting x86 LAPIC timer frequency from CPUID data
On Fri, 19 Apr 2019 22:52:01 +0200 (CEST) Thomas Gleixner wrote: > On Fri, 19 Apr 2019, Jacob Pan wrote: > > On Fri, 19 Apr 2019 10:57:10 +0200 (CEST) > > Thomas Gleixner wrote: > > > On Fri, 19 Apr 2019, Daniel Drake wrote: > > > > 0x7F vs 0x7FFF, is that intentional? > > > > > > I don't think so. Looks like a failed copy and paste. Cc'ed > > > Jacob, he might know. > > > > > At the time of v2.6.35 both places use 0x7F. But later this > > patch increased the latter to 0x7FFF but forgot the first part. > > So I guess it is not exactly a failed copy and paste. > > > > commit 4aed89d6b515b9185351706ca95cd712c9d8d6a3 > > Author: Pierre Tardy > > Date: Thu Jan 6 16:23:29 2011 +0100 > > > > x86, lapic-timer: Increase the max_delta to 31 bits > > Indeed. Thanks for digging that up! > > tglx How about a fix like this? I should have taken your advice 9 years ago to avoid duplicated code :) https://lkml.org/lkml/2010/5/11/499 >From 18450bb67e09f5b472f1ed313d7f87a983cb0ac1 Mon Sep 17 00:00:00 2001 From: Jacob Pan Date: Fri, 19 Apr 2019 15:56:06 -0700 Subject: [PATCH] x86/apic: Fix duplicated lapic timer calculation Local APIC timer clockevent parameters can be calculated based on platform specific methods. However the code is mostly duplicated with the interrupt based calibration. This causes further updates prone to mistakes. Fixes: 4aed89d ("x86, lapic-timer: Increase the max_delta to 31 bits") Signed-off-by: Jacob Pan --- arch/x86/kernel/apic/apic.c | 46 ++--- 1 file changed, 27 insertions(+), 19 deletions(-) diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index b7bcdd7..b2ef91c 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -802,6 +802,24 @@ calibrate_by_pmtimer(long deltapm, long *delta, long *deltatsc) return 0; } +static int __init calculate_lapic_clockevent(void) +{ + if (!lapic_timer_frequency) + return -1; + + /* Calculate the scaled math multiplication factor */ + lapic_clockevent.mult = div_sc(lapic_timer_frequency/APIC_DIVISOR, + TICK_NSEC, lapic_clockevent.shift); + lapic_clockevent.max_delta_ns = + clockevent_delta2ns(0x7FFF, _clockevent); + lapic_clockevent.max_delta_ticks = 0x7FFF; + lapic_clockevent.min_delta_ns = + clockevent_delta2ns(0xF, _clockevent); + lapic_clockevent.min_delta_ticks = 0xF; + + return 0; +} + static int __init calibrate_APIC_clock(void) { struct clock_event_device *levt = this_cpu_ptr(_events); @@ -818,18 +836,17 @@ static int __init calibrate_APIC_clock(void) if (boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER)) { return 0; - } else if (lapic_timer_frequency) { + } + + if (!calculate_lapic_clockevent()) { apic_printk(APIC_VERBOSE, "lapic timer already calibrated %d\n", lapic_timer_frequency); - lapic_clockevent.mult = div_sc(lapic_timer_frequency/APIC_DIVISOR, - TICK_NSEC, lapic_clockevent.shift); - lapic_clockevent.max_delta_ns = - clockevent_delta2ns(0x7F, _clockevent); - lapic_clockevent.max_delta_ticks = 0x7F; - lapic_clockevent.min_delta_ns = - clockevent_delta2ns(0xF, _clockevent); - lapic_clockevent.min_delta_ticks = 0xF; + /* +* Newer platforms provide early calibration methods must have always +* running local APIC timers, no need for boradcast timer. +*/ lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY; + return 0; } @@ -869,17 +886,8 @@ static int __init calibrate_APIC_clock(void) pm_referenced = !calibrate_by_pmtimer(lapic_cal_pm2 - lapic_cal_pm1, , ); - /* Calculate the scaled math multiplication factor */ - lapic_clockevent.mult = div_sc(delta, TICK_NSEC * LAPIC_CAL_LOOPS, - lapic_clockevent.shift); - lapic_clockevent.max_delta_ns = - clockevent_delta2ns(0x7FFF, _clockevent); - lapic_clockevent.max_delta_ticks = 0x7FFF; - lapic_clockevent.min_delta_ns = - clockevent_delta2ns(0xF, _clockevent); - lapic_clockevent.min_delta_ticks = 0xF; - lapic_timer_frequency = (delta * APIC_DIVISOR) / LAPIC_CAL_LOOPS; + calculate_lapic_clockevent(); apic_printk(APIC_VERBOSE, ". delta %ld\n", delta); apic_printk(APIC_VERBOSE, ". mult: %u\n", lapic_clockevent.mult); -- 2.7.4
Re: [PATCH v4 2/2] power_supply: platform/chrome: wilco_ec: Add charging config driver
Hi Nick, Thank you for the patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v5.1-rc5 next-20190418] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Nick-Crews/power_supply-Add-more-charge-types-and-CHARGE_CONTROL_-properties/20190420-045118 config: i386-randconfig-x017-201915 (attached as .config) compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=i386 If you fix the issue, kindly add following tag Reported-by: kbuild test robot All errors (new ones prefixed by >>): drivers/platform/chrome/wilco_ec/core.c: In function 'wilco_ec_probe': >> drivers/platform/chrome/wilco_ec/core.c:106:33: error: 'struct >> wilco_ec_device' has no member named 'kbbl_pdev'; did you mean 'rtc_pdev'? platform_device_unregister(ec->kbbl_pdev); ^ rtc_pdev drivers/platform/chrome/wilco_ec/core.c:105:1: warning: label 'unregister_kbbl' defined but not used [-Wunused-label] unregister_kbbl: ^~~ vim +106 drivers/platform/chrome/wilco_ec/core.c 40 41 static int wilco_ec_probe(struct platform_device *pdev) 42 { 43 struct device *dev = >dev; 44 struct wilco_ec_device *ec; 45 int ret; 46 47 ec = devm_kzalloc(dev, sizeof(*ec), GFP_KERNEL); 48 if (!ec) 49 return -ENOMEM; 50 51 platform_set_drvdata(pdev, ec); 52 ec->dev = dev; 53 mutex_init(>mailbox_lock); 54 55 /* Largest data buffer size requirement is extended data response */ 56 ec->data_size = sizeof(struct wilco_ec_response) + 57 EC_MAILBOX_DATA_SIZE_EXTENDED; 58 ec->data_buffer = devm_kzalloc(dev, ec->data_size, GFP_KERNEL); 59 if (!ec->data_buffer) 60 return -ENOMEM; 61 62 /* Prepare access to IO regions provided by ACPI */ 63 ec->io_data = wilco_get_resource(pdev, 0); /* Host Data */ 64 ec->io_command = wilco_get_resource(pdev, 1); /* Host Command */ 65 ec->io_packet = wilco_get_resource(pdev, 2);/* MEC EMI */ 66 if (!ec->io_data || !ec->io_command || !ec->io_packet) 67 return -ENODEV; 68 69 /* Initialize cros_ec register interface for communication */ 70 cros_ec_lpc_mec_init(ec->io_packet->start, 71 ec->io_packet->start + EC_MAILBOX_DATA_SIZE); 72 73 /* 74 * Register a child device that will be found by the debugfs driver. 75 * Ignore failure. 76 */ 77 ec->debugfs_pdev = platform_device_register_data(dev, 78 "wilco-ec-debugfs", 79 PLATFORM_DEVID_AUTO, 80 NULL, 0); 81 82 /* Register a child device that will be found by the RTC driver. */ 83 ec->rtc_pdev = platform_device_register_data(dev, "rtc-wilco-ec", 84 PLATFORM_DEVID_AUTO, 85 NULL, 0); 86 if (IS_ERR(ec->rtc_pdev)) { 87 dev_err(dev, "Failed to create RTC platform device\n"); 88 ret = PTR_ERR(ec->rtc_pdev); 89 goto unregister_debugfs; 90 } 91 92 /* Register child device to be found by charging config driver. */ 93 ec->charging_pdev = platform_device_register_data(dev, 94 "wilco-ec-charging", 95 PLATFORM_DEVID_AUTO, 96NULL, 0); 97 if (IS_ERR(ec->charging_pdev)) { 98 dev_err(dev, "Failed to create charging platform device\n"); 99 ret = PTR_ERR(ec->charging_pdev); 100 goto unregister_rtc; 101 } 102 103 return 0; 104 105 unregister_kbbl: > 106 platform_device_unregister(ec->kbbl_pdev); 107 unregister_rtc: 108 platform_device_unregister(ec->rtc_pdev); 109 unregister_debugfs: 110 if (ec->debugfs_pdev) 111 platform_device_unregister(ec->debugfs_pdev); 112 cros_ec_lpc_mec_destroy(); 113 return ret; 114 } 115 --- 0-DAY kernel test infrastructure
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Sat, Apr 20, 2019 at 12:35 AM Daniel Colascione wrote: > > On Fri, Apr 19, 2019 at 2:48 PM Christian Brauner > wrote: > > > > On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione > > wrote: > > > > > > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner > > > wrote: > > > > > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione > > > > wrote: > > > > > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes > > > > > wrote: > > > > > > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote: > > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote: > > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner > > > > > > > > wrote: > > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn > > > > > > > > > wrote: > > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov > > > > > > > > > > wrote: > > > > > > > > > >> On 04/16, Joel Fernandes wrote: > > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov > > > > > > > > > >> > wrote: > > > > > > > > > >> > > > > > > > > > > > >> > > Could you explain when it should return POLLIN? When > > > > > > > > > >> > > the whole > > > > > > > > > >process exits? > > > > > > > > > >> > > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't exist > > > > > > > > > >> > anymore, > > > > > > > > > >or when it > > > > > > > > > >> > is in a zombie state and there's no other thread in the > > > > > > > > > >> > thread > > > > > > > > > >group. > > > > > > > > > >> > > > > > > > > > >> IOW, when the whole thread group exits, so it can't be > > > > > > > > > >> used to > > > > > > > > > >monitor sub-threads. > > > > > > > > > >> > > > > > > > > > >> just in case... speaking of this patch it doesn't modify > > > > > > > > > >proc_tid_base_operations, > > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc > > > > > > > > > >> you are > > > > > > > > > >going to use > > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ? > > > > > > > > > > > > > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid has > > > > > > > > > >proc_tgid_base_operations despite not being a thread group > > > > > > > > > >leader. > > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this > > > > > > > > > >code can > > > > > > > > > >be hit trivially, and then the code will misbehave. > > > > > > > > > > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to > > > > > > > > > >explicitly bail > > > > > > > > > >out if you're dealing with a thread group leader, or make > > > > > > > > > >the code > > > > > > > > > >work for threads, too. > > > > > > > > > > > > > > > > > > The latter case probably being preferred if this API is > > > > > > > > > supposed to be > > > > > > > > > useable for thread management in userspace. > > > > > > > > > > > > > > > > At the moment, we are not planning to use this for sub-thread > > > > > > > > management. I > > > > > > > > am reworking this patch to only work on clone(2) pidfds which > > > > > > > > makes the above > > > > > > > > > > > > > > Indeed and agreed. > > > > > > > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the > > > > > > > > latest CLONE_PIDFD > > > > > > > > patches, CLONE_THREAD with pidfd is not supported. > > > > > > > > > > > > > > Yes. We have no one asking for it right now and we can easily add > > > > > > > this > > > > > > > later. > > > > > > > > > > > > > > Admittedly I haven't gotten around to reviewing the patches here > > > > > > > yet > > > > > > > completely. But one thing about using POLLIN. FreeBSD is using > > > > > > > POLLHUP > > > > > > > on process exit which I think is nice as well. How about returning > > > > > > > POLLIN | POLLHUP on process exit? > > > > > > > We already do things like this. For example, when you proxy > > > > > > > between > > > > > > > ttys. If the process that you're reading data from has exited and > > > > > > > closed > > > > > > > it's end you still can't usually simply exit because it might > > > > > > > have still > > > > > > > buffered data that you want to read. The way one can deal with > > > > > > > this > > > > > > > from userspace is that you can observe a (POLLHUP | POLLIN) > > > > > > > event and > > > > > > > you keep on reading until you only observe a POLLHUP without a > > > > > > > POLLIN > > > > > > > event at which point you know you have read > > > > > > > all data. > > > > > > > I like the semantics for pidfds as well as it would indicate: > > > > > > > - POLLHUP -> process has exited > > > > > > > - POLLIN -> information can be read > > > > > > > > > > > > Actually I think a bit different about this, in my opinion the > > > > > > pidfd should > > > > > > always be readable (we would store the exit status somewhere in the > > > > > > future > > > > > > which would be readable, even after task_struct is dead). So
DISCONTIGMEM is deprecated
On Fri, Apr 19, 2019 at 10:43:35AM +0100, Mel Gorman wrote: > DISCONTIG is essentially deprecated and even parisc plans to move to > SPARSEMEM so there is no need to be fancy, this patch simply disables > watermark boosting by default on DISCONTIGMEM. I don't think parisc is the only arch which uses DISCONTIGMEM for !NUMA scenarios. Grepping the arch/ directories shows: alpha (does support NUMA, but also non-NUMA DISCONTIGMEM) arc (for supporting more than 1GB of memory) ia64 (looks complicated ...) m68k (for multiple chunks of memory) mips (does support NUMA but also non-NUMA) parisc (both NUMA and non-NUMA) I'm not sure that these architecture maintainers even know that DISCONTIGMEM is deprecated. Adding linux-arch to the cc.
Re: [PATCH v4 14/16] locking/rwsem: Guard against making count negative
On Fri, Apr 19, 2019 at 02:02:07PM +0200, Peter Zijlstra wrote: > On Fri, Apr 19, 2019 at 12:26:47PM +0200, Peter Zijlstra wrote: > > I thought of a horrible horrible alternative: > > Hurm, that's broken as heck. Let me try again. So I can't make that scheme work, it all ends up wanting to have cmpxchg(). Do we have a performance comparison somewhere of xadd vs cmpxchg readers? I tried looking in the old threads, but I can't seem to locate it. We need new instructions :/ Or more clever than I can muster just now.
[PATCH 2/2] docs: hwmon: remove the extension from .rst files
On almost all places, we're including ReST files without the extension. Let's remove the extension here as well, in order to use just one standard. Suggested-by: Jani Nikula --- Documentation/hwmon/index.rst | 316 +- 1 file changed, 158 insertions(+), 158 deletions(-) diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst index 893804414510..bc74c43e3e2c 100644 --- a/Documentation/hwmon/index.rst +++ b/Documentation/hwmon/index.rst @@ -5,11 +5,11 @@ Linux Hardware Monitoring .. toctree:: :maxdepth: 1 - hwmon-kernel-api.rst - pmbus-core.rst - submitting-patches.rst - sysfs-interface.rst - userspace-tools.rst + hwmon-kernel-api + pmbus-core + submitting-patches + sysfs-interface + userspace-tools Hardware Monitoring Kernel Drivers == @@ -17,159 +17,159 @@ Hardware Monitoring Kernel Drivers .. toctree:: :maxdepth: 1 - ab8500.rst - abituguru.rst - abituguru3.rst - abx500.rst - acpi_power_meter.rst - ad7314.rst - adc128d818.rst - adm1021.rst - adm1025.rst - adm1026.rst - adm1031.rst - adm1275.rst - adm9240.rst - ads1015.rst - ads7828.rst - adt7410.rst - adt7411.rst - adt7462.rst - adt7470.rst - adt7475.rst - amc6821.rst - asb100.rst - asc7621.rst - aspeed-pwm-tacho.rst - coretemp.rst - da9052.rst - da9055.rst - dme1737.rst - ds1621.rst - ds620.rst - emc1403.rst - emc2103.rst - emc6w201.rst - f71805f.rst - f71882fg.rst - fam15h_power.rst - ftsteutates.rst - g760a.rst - g762.rst - gl518sm.rst - hih6130.rst - ibmaem.rst - ibm-cffps.rst - ibmpowernv.rst - ina209.rst - ina2xx.rst - ina3221.rst - ir35221.rst - it87.rst - jc42.rst - k10temp.rst - k8temp.rst - lineage-pem.rst - lm25066.rst - lm63.rst - lm70.rst - lm73.rst - lm75.rst - lm77.rst - lm78.rst - lm80.rst - lm83.rst - lm85.rst - lm87.rst - lm90.rst - lm92.rst - lm93.rst - lm95234.rst - lm95245.rst - ltc2945.rst - ltc2978.rst - ltc2990.rst - ltc3815.rst - ltc4151.rst - ltc4215.rst - ltc4245.rst - ltc4260.rst - ltc4261.rst - max16064.rst - max16065.rst - max1619.rst - max1668.rst - max197.rst - max20751.rst - max31722.rst - max31785.rst - max31790.rst - max34440.rst - max6639.rst - max6642.rst - max6650.rst - max6697.rst - max8688.rst - mc13783-adc.rst - mcp3021.rst - menf21bmc.rst - mlxreg-fan.rst - nct6683.rst - nct6775.rst - nct7802.rst - nct7904.rst - npcm750-pwm-fan.rst - nsa320.rst - ntc_thermistor.rst - occ.rst - pc87360.rst - pc87427.rst - pcf8591.rst - pmbus.rst - powr1220.rst - pwm-fan.rst - raspberrypi-hwmon.rst - sch5627.rst - sch5636.rst - scpi-hwmon.rst - sht15.rst - sht21.rst - sht3x.rst - shtc1.rst - sis5595.rst - smm665.rst - smsc47b397.rst - smsc47m192.rst - smsc47m1.rst - tc654.rst - tc74.rst - thmc50.rst - tmp102.rst - tmp103.rst - tmp108.rst - tmp401.rst - tmp421.rst - tps40422.rst - twl4030-madc-hwmon.rst - ucd9000.rst - ucd9200.rst - vexpress.rst - via686a.rst - vt1211.rst - w83627ehf.rst - w83627hf.rst - w83773g.rst - w83781d.rst - w83791d.rst - w83792d.rst - w83793.rst - w83795.rst - w83l785ts.rst - w83l786ng.rst - wm831x.rst - wm8350.rst - xgene-hwmon.rst - zl6100.rst + ab8500 + abituguru + abituguru3 + abx500 + acpi_power_meter + ad7314 + adc128d818 + adm1021 + adm1025 + adm1026 + adm1031 + adm1275 + adm9240 + ads1015 + ads7828 + adt7410 + adt7411 + adt7462 + adt7470 + adt7475 + amc6821 + asb100 + asc7621 + aspeed-pwm-tacho + coretemp + da9052 + da9055 + dme1737 + ds1621 + ds620 + emc1403 + emc2103 + emc6w201 + f71805f + f71882fg + fam15h_power + ftsteutates + g760a + g762 + gl518sm + hih6130 + ibmaem + ibm-cffps + ibmpowernv + ina209 + ina2xx + ina3221 + ir35221 + it87 + jc42 + k10temp + k8temp + lineage-pem + lm25066 + lm63 + lm70 + lm73 + lm75 + lm77 + lm78 + lm80 + lm83 + lm85 + lm87 + lm90 + lm92 + lm93 + lm95234 + lm95245 + ltc2945 + ltc2978 + ltc2990 + ltc3815 + ltc4151 + ltc4215 + ltc4245 + ltc4260 + ltc4261 + max16064 + max16065 + max1619 + max1668 + max197 + max20751 + max31722 + max31785 + max31790 + max34440 + max6639 + max6642 + max6650 + max6697 + max8688 + mc13783-adc + mcp3021 + menf21bmc + mlxreg-fan + nct6683 + nct6775 + nct7802 + nct7904 + npcm750-pwm-fan + nsa320 + ntc_thermistor + occ + pc87360 + pc87427 + pcf8591 + pmbus + powr1220 + pwm-fan + raspberrypi-hwmon + sch5627 + sch5636 + scpi-hwmon + sht15 + sht21 + sht3x + shtc1 + sis5595 + smm665 + smsc47b397 + smsc47m192 + smsc47m1 + tc654 + tc74 + thmc50 + tmp102 + tmp103 + tmp108 +
Re: [patch V2 28/29] stacktrace: Provide common infrastructure
On Thu, Apr 18, 2019 at 05:42:55PM +0200, Thomas Gleixner wrote: > On Thu, 18 Apr 2019, Josh Poimboeuf wrote: > > Another idea I had (but never got a chance to work on) was to extend the > > x86 unwind interface to all arches. So instead of the callbacks, each > > arch would implement something like this API: > I surely thought about that, but after staring at all incarnations of > arch/*/stacktrace.c I just gave up. > > Aside of that quite some archs already have callback based unwinders > because they use them for more than stacktracing and just have a single > implementation of that loop. > > I'm fine either way. We can start with x86 and then let archs convert over > their stuff, but I wouldn't hold my breath that this will be completed in > the forseeable future. I suggested the same to Thomas early on, and I even spend the time to convert some $random arch to the iterator interface, and while it is indeed entirely feasible, it is _far_ more work. The callback thing OTOH is flexible enough to do what we want to do now, and allows converting most archs to it without too much pain (as Thomas said, many archs are already in this form and only need minor API adjustments), which gets us in a far better place than we are now. And we can always go to iterators later on. But I think getting the generic unwinder improved across all archs is a really important first step here.
Re: [PATCH v4 14/16] locking/rwsem: Guard against making count negative
On Fri, Apr 19, 2019 at 12:26:47PM +0200, Peter Zijlstra wrote: > I thought of a horrible horrible alternative: Hurm, that's broken as heck. Let me try again.
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Fri, Apr 19, 2019 at 3:02 PM Christian Brauner wrote: > > On Fri, Apr 19, 2019 at 11:48 PM Christian Brauner > wrote: > > > > On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione > > wrote: > > > > > > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner > > > wrote: > > > > > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione > > > > wrote: > > > > > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes > > > > > wrote: > > > > > > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote: > > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote: > > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner > > > > > > > > wrote: > > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn > > > > > > > > > wrote: > > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov > > > > > > > > > > wrote: > > > > > > > > > >> On 04/16, Joel Fernandes wrote: > > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov > > > > > > > > > >> > wrote: > > > > > > > > > >> > > > > > > > > > > > >> > > Could you explain when it should return POLLIN? When > > > > > > > > > >> > > the whole > > > > > > > > > >process exits? > > > > > > > > > >> > > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't exist > > > > > > > > > >> > anymore, > > > > > > > > > >or when it > > > > > > > > > >> > is in a zombie state and there's no other thread in the > > > > > > > > > >> > thread > > > > > > > > > >group. > > > > > > > > > >> > > > > > > > > > >> IOW, when the whole thread group exits, so it can't be > > > > > > > > > >> used to > > > > > > > > > >monitor sub-threads. > > > > > > > > > >> > > > > > > > > > >> just in case... speaking of this patch it doesn't modify > > > > > > > > > >proc_tid_base_operations, > > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc > > > > > > > > > >> you are > > > > > > > > > >going to use > > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ? > > > > > > > > > > > > > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid has > > > > > > > > > >proc_tgid_base_operations despite not being a thread group > > > > > > > > > >leader. > > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this > > > > > > > > > >code can > > > > > > > > > >be hit trivially, and then the code will misbehave. > > > > > > > > > > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to > > > > > > > > > >explicitly bail > > > > > > > > > >out if you're dealing with a thread group leader, or make > > > > > > > > > >the code > > > > > > > > > >work for threads, too. > > > > > > > > > > > > > > > > > > The latter case probably being preferred if this API is > > > > > > > > > supposed to be > > > > > > > > > useable for thread management in userspace. > > > > > > > > > > > > > > > > At the moment, we are not planning to use this for sub-thread > > > > > > > > management. I > > > > > > > > am reworking this patch to only work on clone(2) pidfds which > > > > > > > > makes the above > > > > > > > > > > > > > > Indeed and agreed. > > > > > > > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the > > > > > > > > latest CLONE_PIDFD > > > > > > > > patches, CLONE_THREAD with pidfd is not supported. > > > > > > > > > > > > > > Yes. We have no one asking for it right now and we can easily add > > > > > > > this > > > > > > > later. > > > > > > > > > > > > > > Admittedly I haven't gotten around to reviewing the patches here > > > > > > > yet > > > > > > > completely. But one thing about using POLLIN. FreeBSD is using > > > > > > > POLLHUP > > > > > > > on process exit which I think is nice as well. How about returning > > > > > > > POLLIN | POLLHUP on process exit? > > > > > > > We already do things like this. For example, when you proxy > > > > > > > between > > > > > > > ttys. If the process that you're reading data from has exited and > > > > > > > closed > > > > > > > it's end you still can't usually simply exit because it might > > > > > > > have still > > > > > > > buffered data that you want to read. The way one can deal with > > > > > > > this > > > > > > > from userspace is that you can observe a (POLLHUP | POLLIN) > > > > > > > event and > > > > > > > you keep on reading until you only observe a POLLHUP without a > > > > > > > POLLIN > > > > > > > event at which point you know you have read > > > > > > > all data. > > > > > > > I like the semantics for pidfds as well as it would indicate: > > > > > > > - POLLHUP -> process has exited > > > > > > > - POLLIN -> information can be read > > > > > > > > > > > > Actually I think a bit different about this, in my opinion the > > > > > > pidfd should > > > > > > always be readable (we would store the exit status somewhere in the > > > > > > future > > > > > > which would be readable, even after task_struct is dead). So
Re: [PATCHv2] selftests/efivarfs: clean up test files from test_create*()
Hello, Commit f8a0590f fix some part of this issue. I will send out V3 base on this commit in linux-kselftest next branch. Thank you. On Fri, Apr 19, 2019 at 9:20 PM shuah wrote: > > On 4/19/19 7:12 AM, Po-Hsu Lin wrote: > > Test files created by test_create*() tests will stay in the > > $efivarfs_mount directory unless the system was rebooted. > > > > When the tester tries to run this efivarfs test again on the same > > system, the immutable characteristics in that directory will cause some > > "Operation not permitted" noises and a false-positve test result to the > > test_create_read() test. > > > > > > running test_create > > > > ./efivarfs.sh: line 59: > > /sys/firmware/efi/efivars/test_create-210be57c-9849-4fc7-a635-e6382d1aec27: > > Operation not permitted > >[PASS] > > > > running test_create_empty > > > > ./efivarfs.sh: line 78: > > /sys/firmware/efi/efivars/test_create_empty-210be57c-9849-4fc7-a635-e6382d1aec27: > > Operation not permitted > > [PASS] > > > > running test_create_read > > > > open(O_WRONLY): Operation not permitted > > [FAIL] > > > > > > Create a file_cleanup() to remove those test files in the end of each > > test to solve this issue. > > > > Also, use this function to replace the existing file removal code. > > > > Link: https://bugs.launchpad.net/bugs/1809704 > > > > Signed-off-by: Po-Hsu Lin > > --- > > Thanks for the patch. There is another patch that does the same in > linux-kselftest next branch. > > Please check to see if that fixes the problem you are seeing. > > thanks, > -- Shuah >
[PATCHv2] selftests/efivarfs: clean up test files from test_create*()
Test files created by test_create*() tests will stay in the $efivarfs_mount directory unless the system was rebooted. When the tester tries to run this efivarfs test again on the same system, the immutable characteristics in that directory will cause some "Operation not permitted" noises and a false-positve test result to the test_create_read() test. running test_create ./efivarfs.sh: line 59: /sys/firmware/efi/efivars/test_create-210be57c-9849-4fc7-a635-e6382d1aec27: Operation not permitted [PASS] running test_create_empty ./efivarfs.sh: line 78: /sys/firmware/efi/efivars/test_create_empty-210be57c-9849-4fc7-a635-e6382d1aec27: Operation not permitted [PASS] running test_create_read open(O_WRONLY): Operation not permitted [FAIL] Create a file_cleanup() to remove those test files in the end of each test to solve this issue. Also, use this function to replace the existing file removal code. Link: https://bugs.launchpad.net/bugs/1809704 Signed-off-by: Po-Hsu Lin --- tools/testing/selftests/efivarfs/efivarfs.sh | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) mode change 100755 => 100644 tools/testing/selftests/efivarfs/efivarfs.sh diff --git a/tools/testing/selftests/efivarfs/efivarfs.sh b/tools/testing/selftests/efivarfs/efivarfs.sh old mode 100755 new mode 100644 index a47029a..14fa6fe --- a/tools/testing/selftests/efivarfs/efivarfs.sh +++ b/tools/testing/selftests/efivarfs/efivarfs.sh @@ -7,6 +7,12 @@ test_guid=210be57c-9849-4fc7-a635-e6382d1aec27 # Kselftest framework requirement - SKIP code is 4. ksft_skip=4 +file_cleanup() +{ + chattr -i $1 + rm $1 +} + check_prereqs() { local msg="skip all tests:" @@ -58,8 +64,10 @@ test_create() if [ $(stat -c %s $file) -ne 5 ]; then echo "$file has invalid size" >&2 + file_cleanup $file exit 1 fi + file_cleanup $file } test_create_empty() @@ -72,12 +80,14 @@ test_create_empty() echo "$file can not be created without writing" >&2 exit 1 fi + file_cleanup $file } test_create_read() { local file=$efivarfs_mount/$FUNCNAME-$test_guid ./create-read $file + file_cleanup $file } test_delete() @@ -94,8 +104,7 @@ test_delete() rm $file 2>/dev/null if [ $? -ne 0 ]; then - chattr -i $file - rm $file + file_cleanup $file fi if [ -e $file ]; then @@ -152,8 +161,7 @@ test_valid_filenames() else rm $file 2>/dev/null if [ $? -ne 0 ]; then - chattr -i $file - rm $file + file_cleanup $file fi fi done @@ -189,8 +197,7 @@ test_invalid_filenames() echo "Creating $file should have failed" >&2 rm $file 2>/dev/null if [ $? -ne 0 ]; then - chattr -i $file - rm $file + file_cleanup $file fi ret=1 fi -- 2.7.4
[PATCHv3] selftests/efivarfs: clean up test files from test_create*()
Test files created by test_create() and test_create_empty() tests will stay in the $efivarfs_mount directory until the system was rebooted. When the tester tries to run this efivarfs test again on the same system, the immutable characteristics in that directory will cause some "Operation not permitted" noises, and a false-positve test result as the file was created in previous run. running test_create ./efivarfs.sh: line 59: /sys/firmware/efi/efivars/test_create-210be57c-9849-4fc7-a635-e6382d1aec27: Operation not permitted [PASS] running test_create_empty ./efivarfs.sh: line 78: /sys/firmware/efi/efivars/test_create_empty-210be57c-9849-4fc7-a635-e6382d1aec27: Operation not permitted [PASS] Create a file_cleanup() to remove those test files in the end of each test to solve this issue. For the test_create_read, we can move the clean up task to the end of the test to ensure the system is clean. Also, use this function to replace the existing file removal code. Signed-off-by: Po-Hsu Lin --- tools/testing/selftests/efivarfs/efivarfs.sh | 32 +++- 1 file changed, 13 insertions(+), 19 deletions(-) diff --git a/tools/testing/selftests/efivarfs/efivarfs.sh b/tools/testing/selftests/efivarfs/efivarfs.sh index d386610..a90f394 100755 --- a/tools/testing/selftests/efivarfs/efivarfs.sh +++ b/tools/testing/selftests/efivarfs/efivarfs.sh @@ -7,6 +7,12 @@ test_guid=210be57c-9849-4fc7-a635-e6382d1aec27 # Kselftest framework requirement - SKIP code is 4. ksft_skip=4 +file_cleanup() +{ + chattr -i $1 + rm -f $1 +} + check_prereqs() { local msg="skip all tests:" @@ -58,8 +64,10 @@ test_create() if [ $(stat -c %s $file) -ne 5 ]; then echo "$file has invalid size" >&2 + file_cleanup $file exit 1 fi + file_cleanup $file } test_create_empty() @@ -72,16 +80,14 @@ test_create_empty() echo "$file can not be created without writing" >&2 exit 1 fi + file_cleanup $file } test_create_read() { local file=$efivarfs_mount/$FUNCNAME-$test_guid - if [ -f $file]; then - chattr -i $file - rm -rf $file - fi ./create-read $file + file_cleanup $file } test_delete() @@ -96,11 +102,7 @@ test_delete() exit 1 fi - rm $file 2>/dev/null - if [ $? -ne 0 ]; then - chattr -i $file - rm $file - fi + file_cleanup $file if [ -e $file ]; then echo "$file couldn't be deleted" >&2 @@ -154,11 +156,7 @@ test_valid_filenames() echo "$file could not be created" >&2 ret=1 else - rm $file 2>/dev/null - if [ $? -ne 0 ]; then - chattr -i $file - rm $file - fi + file_cleanup $file fi done @@ -191,11 +189,7 @@ test_invalid_filenames() if [ -e $file ]; then echo "Creating $file should have failed" >&2 - rm $file 2>/dev/null - if [ $? -ne 0 ]; then - chattr -i $file - rm $file - fi + file_cleanup $file ret=1 fi done -- 2.7.4
Re: [PATCH] ACPI / LPSS: Don't skip late system PM ops for hibernate on BYT/CHT
On 4/18/19 5:42 AM, Hans de Goede wrote: >> On 4/8/19 2:16 AM, Hans de Goede wrote:> >>> >>> Hmm, interesting so you have hibernation working on a T100TA >>> (with 5.0 + 02e45646d53b reverted), right ? >>> > Still since my patch is regressing things for you I will try to > take a look at this and see if I can reproduce and come up with > a fix. But this is not going to be a high priority thing for me to > work on. > > In the mean time I've gone ahead and submitted my version of the > fix for the problem Kai-Heng was seeing, since that does not seem > to make your problem worse; and it will be good to get that problem > fixed. > > Regards, > > Hans > I've managed to find a way around the i2c_designware timeout issues on the T100TA's. The key is to NOT set DPM_FLAG_SMART_SUSPEND, which was added in the 02e45646d53b commit. To test that I've started with a 5.1-rc5 kernel, applied your recent patch to acpi_lpss.c, then apply the following patch of mine, removing DPM_FLAG_SMART_SUSPEND. (For the T100 hardware I need to apply some other patches as well but those are not related to the i2c-designware or acpi issues addressed here.) On a resume from hibernation I still see one error: "i2c_designware 80860F41:00: Error i2c_dw_xfer called while suspended" but I no longer get the i2c_designware timeouts, and audio does now work after the resume. Removing DPM_FLAG_SMART_SUSPEND may not be what you want for other hardware, but perhaps this will give you a clue as to what is going wrong with hibernate/resume on the T100TA's. With the above now working I've also experimented with my systemd hibernate script and at least in limited testing I only need to remove the brcmfmac and hci_uart drivers before hibernate. I no longer need to remove sound drivers or the other ones shown in my earlier script. Without more testing I'm not sure at what point in kernel development those other driver removals stopped being necessary. Let me know if I can do any other tests to help. Bob Howell --- diff -uprN linux_original/drivers/i2c/busses/i2c-designware-platdrv.c linux/drivers/i2c/busses/i2c-designware-platdrv.c --- linux_original/drivers/i2c/busses/i2c-designware-platdrv.c 2019-04-14 16:17:41.0 -0600 +++ linux/drivers/i2c/busses/i2c-designware-platdrv.c 2019-04-18 22:45:58.836246889 -0600 @@ -367,7 +367,6 @@ static int dw_i2c_plat_probe(struct plat dev_pm_set_driver_flags(>dev, DPM_FLAG_SMART_PREPARE | - DPM_FLAG_SMART_SUSPEND | DPM_FLAG_LEAVE_SUSPENDED); /* The code below assumes runtime PM to be disabled. */ ---
[PATCH] mm/hmm: Fix initial PFN for hugetlbfs pages
From: Ralph Campbell The mmotm patch [1] adds hugetlbfs support for HMM but the initial PFN used to fill the HMM range->pfns[] array doesn't properly compute the starting PFN offset. This can be tested by running test-hugetlbfs-read from [2]. Fix the PFN offset by adjusting the page offset by the device's page size. Andrew, this should probably be squashed into Jerome's patch. [1] https://marc.info/?l=linux-mm=155432003506068=2 ("mm/hmm: mirror hugetlbfs (snapshoting, faulting and DMA mapping)") [2] https://gitlab.freedesktop.org/glisse/svm-cl-tests Signed-off-by: Ralph Campbell Cc: Jérôme Glisse Cc: Ira Weiny Cc: John Hubbard Cc: Dan Williams Cc: Arnd Bergmann Cc: Balbir Singh Cc: Dan Carpenter Cc: Matthew Wilcox Cc: Souptick Joarder Cc: Andrew Morton --- mm/hmm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/hmm.c b/mm/hmm.c index def451a56c3e..fcf8e4fb5770 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -868,7 +868,7 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask, goto unlock; } - pfn = pte_pfn(entry) + (start & mask); + pfn = pte_pfn(entry) + ((start & mask) >> range->page_shift); for (; addr < end; addr += size, i++, pfn += pfn_inc) range->pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags; -- 2.20.1
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Fri, Apr 19, 2019 at 3:18 PM Christian Brauner wrote: > > On Sat, Apr 20, 2019 at 12:08 AM Daniel Colascione wrote: > > > > On Fri, Apr 19, 2019 at 2:45 PM Joel Fernandes > > wrote: > > > > > > On Fri, Apr 19, 2019 at 02:24:09PM -0700, Daniel Colascione wrote: > > > > On Fri, Apr 19, 2019 at 2:20 PM Joel Fernandes > > > > wrote: > > > > > > > > > > On Fri, Apr 19, 2019 at 10:57:11PM +0200, Christian Brauner wrote: > > > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione > > > > > > wrote: > > > > > > > > > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes > > > > > > > wrote: > > > > > > > > > > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner > > > > > > > > wrote: > > > > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes > > > > > > > > > wrote: > > > > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner > > > > > > > > > > wrote: > > > > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn > > > > > > > > > > > wrote: > > > > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov > > > > > > > > > > > > wrote: > > > > > > > > > > > >> On 04/16, Joel Fernandes wrote: > > > > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg > > > > > > > > > > > >> > Nesterov wrote: > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > Could you explain when it should return POLLIN? > > > > > > > > > > > >> > > When the whole > > > > > > > > > > > >process exits? > > > > > > > > > > > >> > > > > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't > > > > > > > > > > > >> > exist anymore, > > > > > > > > > > > >or when it > > > > > > > > > > > >> > is in a zombie state and there's no other thread in > > > > > > > > > > > >> > the thread > > > > > > > > > > > >group. > > > > > > > > > > > >> > > > > > > > > > > > >> IOW, when the whole thread group exits, so it can't be > > > > > > > > > > > >> used to > > > > > > > > > > > >monitor sub-threads. > > > > > > > > > > > >> > > > > > > > > > > > >> just in case... speaking of this patch it doesn't > > > > > > > > > > > >> modify > > > > > > > > > > > >proc_tid_base_operations, > > > > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but > > > > > > > > > > > >> iiuc you are > > > > > > > > > > > >going to use > > > > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ? > > > > > > > > > > > > > > > > > > > > > > > >I don't think procfs works that way. > > > > > > > > > > > >/proc/sub-thread-tid has > > > > > > > > > > > >proc_tgid_base_operations despite not being a thread > > > > > > > > > > > >group leader. > > > > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in > > > > > > > > > > > >this code can > > > > > > > > > > > >be hit trivially, and then the code will misbehave. > > > > > > > > > > > > > > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to > > > > > > > > > > > >explicitly bail > > > > > > > > > > > >out if you're dealing with a thread group leader, or > > > > > > > > > > > >make the code > > > > > > > > > > > >work for threads, too. > > > > > > > > > > > > > > > > > > > > > > The latter case probably being preferred if this API is > > > > > > > > > > > supposed to be > > > > > > > > > > > useable for thread management in userspace. > > > > > > > > > > > > > > > > > > > > At the moment, we are not planning to use this for > > > > > > > > > > sub-thread management. I > > > > > > > > > > am reworking this patch to only work on clone(2) pidfds > > > > > > > > > > which makes the above > > > > > > > > > > > > > > > > > > Indeed and agreed. > > > > > > > > > > > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the > > > > > > > > > > latest CLONE_PIDFD > > > > > > > > > > patches, CLONE_THREAD with pidfd is not supported. > > > > > > > > > > > > > > > > > > Yes. We have no one asking for it right now and we can easily > > > > > > > > > add this > > > > > > > > > later. > > > > > > > > > > > > > > > > > > Admittedly I haven't gotten around to reviewing the patches > > > > > > > > > here yet > > > > > > > > > completely. But one thing about using POLLIN. FreeBSD is > > > > > > > > > using POLLHUP > > > > > > > > > on process exit which I think is nice as well. How about > > > > > > > > > returning > > > > > > > > > POLLIN | POLLHUP on process exit? > > > > > > > > > We already do things like this. For example, when you proxy > > > > > > > > > between > > > > > > > > > ttys. If the process that you're reading data from has exited > > > > > > > > > and closed > > > > > > > > > it's end you still can't usually simply exit because it might > > > > > > > > > have still > > > > > > > > > buffered data that you want to read. The way one can deal > > > > > > > > > with this > > > > > > > > > from userspace is that you can observe a (POLLHUP | POLLIN) > > > > > > > > > event and > > > > > > > > > you keep on reading
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Fri, Apr 19, 2019 at 2:48 PM Christian Brauner wrote: > > On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione wrote: > > > > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner > > wrote: > > > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione > > > wrote: > > > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes > > > > wrote: > > > > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote: > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote: > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner wrote: > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn > > > > > > > > wrote: > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov > > > > > > > > > wrote: > > > > > > > > >> On 04/16, Joel Fernandes wrote: > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov > > > > > > > > >> > wrote: > > > > > > > > >> > > > > > > > > > > >> > > Could you explain when it should return POLLIN? When the > > > > > > > > >> > > whole > > > > > > > > >process exits? > > > > > > > > >> > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't exist > > > > > > > > >> > anymore, > > > > > > > > >or when it > > > > > > > > >> > is in a zombie state and there's no other thread in the > > > > > > > > >> > thread > > > > > > > > >group. > > > > > > > > >> > > > > > > > > >> IOW, when the whole thread group exits, so it can't be used > > > > > > > > >> to > > > > > > > > >monitor sub-threads. > > > > > > > > >> > > > > > > > > >> just in case... speaking of this patch it doesn't modify > > > > > > > > >proc_tid_base_operations, > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc > > > > > > > > >> you are > > > > > > > > >going to use > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ? > > > > > > > > > > > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid has > > > > > > > > >proc_tgid_base_operations despite not being a thread group > > > > > > > > >leader. > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this > > > > > > > > >code can > > > > > > > > >be hit trivially, and then the code will misbehave. > > > > > > > > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to > > > > > > > > >explicitly bail > > > > > > > > >out if you're dealing with a thread group leader, or make the > > > > > > > > >code > > > > > > > > >work for threads, too. > > > > > > > > > > > > > > > > The latter case probably being preferred if this API is > > > > > > > > supposed to be > > > > > > > > useable for thread management in userspace. > > > > > > > > > > > > > > At the moment, we are not planning to use this for sub-thread > > > > > > > management. I > > > > > > > am reworking this patch to only work on clone(2) pidfds which > > > > > > > makes the above > > > > > > > > > > > > Indeed and agreed. > > > > > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the latest > > > > > > > CLONE_PIDFD > > > > > > > patches, CLONE_THREAD with pidfd is not supported. > > > > > > > > > > > > Yes. We have no one asking for it right now and we can easily add > > > > > > this > > > > > > later. > > > > > > > > > > > > Admittedly I haven't gotten around to reviewing the patches here yet > > > > > > completely. But one thing about using POLLIN. FreeBSD is using > > > > > > POLLHUP > > > > > > on process exit which I think is nice as well. How about returning > > > > > > POLLIN | POLLHUP on process exit? > > > > > > We already do things like this. For example, when you proxy between > > > > > > ttys. If the process that you're reading data from has exited and > > > > > > closed > > > > > > it's end you still can't usually simply exit because it might have > > > > > > still > > > > > > buffered data that you want to read. The way one can deal with this > > > > > > from userspace is that you can observe a (POLLHUP | POLLIN) event > > > > > > and > > > > > > you keep on reading until you only observe a POLLHUP without a > > > > > > POLLIN > > > > > > event at which point you know you have read > > > > > > all data. > > > > > > I like the semantics for pidfds as well as it would indicate: > > > > > > - POLLHUP -> process has exited > > > > > > - POLLIN -> information can be read > > > > > > > > > > Actually I think a bit different about this, in my opinion the pidfd > > > > > should > > > > > always be readable (we would store the exit status somewhere in the > > > > > future > > > > > which would be readable, even after task_struct is dead). So I was > > > > > thinking > > > > > we always return EPOLLIN. If process has not exited, then it blocks. > > > > > > > > ITYM that a pidfd polls as readable *once a task exits* and stays > > > > readable forever. Before a task exit, a poll on a pidfd should *not* > > > > yield POLLIN and reading that pidfd should *not* complete
Re: [PATCH] KVM: nVMX: allow tests to use bad virtual-APIC page address
On 04/15/2019 06:35 AM, Paolo Bonzini wrote: As mentioned in the comment, there are some special cases where we can simply clear the TPR shadow bit from the CPU-based execution controls in the vmcs02. Handle them so that we can remove some XFAILs from vmx.flat. Signed-off-by: Paolo Bonzini --- arch/x86/kvm/vmx/nested.c | 25 - arch/x86/kvm/vmx/vmx.c| 2 +- arch/x86/kvm/vmx/vmx.h| 2 ++ 3 files changed, 19 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 7ec9bb1dd723..a22af5a85540 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -2873,20 +2873,27 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu) /* * If translation failed, VM entry will fail because * prepare_vmcs02 set VIRTUAL_APIC_PAGE_ADDR to -1ull. -* Failing the vm entry is _not_ what the processor -* does but it's basically the only possibility we -* have. We could still enter the guest if CR8 load -* exits are enabled, CR8 store exits are enabled, and -* virtualize APIC access is disabled; in this case -* the processor would never use the TPR shadow and we -* could simply clear the bit from the execution -* control. But such a configuration is useless, so -* let's keep the code simple. */ if (!is_error_page(page)) { vmx->nested.virtual_apic_page = page; hpa = page_to_phys(vmx->nested.virtual_apic_page); vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, hpa); + } else if (nested_cpu_has(vmcs12, CPU_BASED_CR8_LOAD_EXITING) && + nested_cpu_has(vmcs12, CPU_BASED_CR8_STORE_EXITING) && + !nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) { + /* +* The processor will never use the TPR shadow, simply +* clear the bit from the execution control. Such a +* configuration is useless, but it happens in tests. +* For any other configuration, failing the vm entry is +* _not_ what the processor does but it's basically the +* only possibility we have. +*/ + vmcs_clear_bits(CPU_BASED_VM_EXEC_CONTROL, + CPU_BASED_TPR_SHADOW); + } else { + printk("bad virtual-APIC page address\n"); + dump_vmcs(); } } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index f8054dc1de65..14cacfd7ffd0 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -5603,7 +5603,7 @@ static void vmx_dump_dtsel(char *name, uint32_t limit) vmcs_readl(limit + GUEST_GDTR_BASE - GUEST_GDTR_LIMIT)); } -static void dump_vmcs(void) +void dump_vmcs(void) { u32 vmentry_ctl = vmcs_read32(VM_ENTRY_CONTROLS); u32 vmexit_ctl = vmcs_read32(VM_EXIT_CONTROLS); diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index a1e00d0a2482..f879529906b4 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -517,4 +517,6 @@ static inline void decache_tsc_multiplier(struct vcpu_vmx *vmx) vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio); } +void dump_vmcs(void); + #endif /* __KVM_X86_VMX_H */ Reviewed-by: Krish Sadhukhan
Re: [PATCH v2] clk: hi3660: Mark clk_gate_ufs_subsys as critical
Quoting Leo Yan (2019-03-20 03:05:08) > clk_gate_ufs_subsys is a system bus clock, turning off it will > introduce lockup issue during system suspend flow. Let's mark > clk_gate_ufs_subsys as critical clock, thus keeps it on during > system suspend and resume. > > Fixes: d374e6fd5088 ("clk: hisilicon: Add clock driver for hi3660 SoC") > Cc: sta...@vger.kernel.org > Cc: Zhong Kaihua > Cc: John Stultz > Cc: Zhangfei Gao > Suggested-by: Dong Zhang > Signed-off-by: Leo Yan > --- Applied to clk-next
Re: [PATCH RESEND] clk: tegra: Don't enable already enabled PLLs
Quoting Dmitry Osipenko (2019-04-19 04:42:26) > Initially Common Clock Framework isn't aware of the clock-enable status, > this results in enabling of clocks that were enabled by bootloader. This > is not a big deal for a regular clock-gates, but for PLL's it may have > some unpleasant consequences. Thus re-enabling PLLX (the main CPU parent > clock) may result in extra long period of PLL re-locking. > > Acked-by: Peter De Schrijver > Signed-off-by: Dmitry Osipenko > --- Applied to clk-next
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Sat, Apr 20, 2019 at 12:08 AM Daniel Colascione wrote: > > On Fri, Apr 19, 2019 at 2:45 PM Joel Fernandes wrote: > > > > On Fri, Apr 19, 2019 at 02:24:09PM -0700, Daniel Colascione wrote: > > > On Fri, Apr 19, 2019 at 2:20 PM Joel Fernandes > > > wrote: > > > > > > > > On Fri, Apr 19, 2019 at 10:57:11PM +0200, Christian Brauner wrote: > > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione > > > > > wrote: > > > > > > > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes > > > > > > wrote: > > > > > > > > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote: > > > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote: > > > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner > > > > > > > > > wrote: > > > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn > > > > > > > > > > wrote: > > > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov > > > > > > > > > > > wrote: > > > > > > > > > > >> On 04/16, Joel Fernandes wrote: > > > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg > > > > > > > > > > >> > Nesterov wrote: > > > > > > > > > > >> > > > > > > > > > > > > >> > > Could you explain when it should return POLLIN? When > > > > > > > > > > >> > > the whole > > > > > > > > > > >process exits? > > > > > > > > > > >> > > > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't > > > > > > > > > > >> > exist anymore, > > > > > > > > > > >or when it > > > > > > > > > > >> > is in a zombie state and there's no other thread in > > > > > > > > > > >> > the thread > > > > > > > > > > >group. > > > > > > > > > > >> > > > > > > > > > > >> IOW, when the whole thread group exits, so it can't be > > > > > > > > > > >> used to > > > > > > > > > > >monitor sub-threads. > > > > > > > > > > >> > > > > > > > > > > >> just in case... speaking of this patch it doesn't modify > > > > > > > > > > >proc_tid_base_operations, > > > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but > > > > > > > > > > >> iiuc you are > > > > > > > > > > >going to use > > > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ? > > > > > > > > > > > > > > > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid > > > > > > > > > > >has > > > > > > > > > > >proc_tgid_base_operations despite not being a thread group > > > > > > > > > > >leader. > > > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in > > > > > > > > > > >this code can > > > > > > > > > > >be hit trivially, and then the code will misbehave. > > > > > > > > > > > > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to > > > > > > > > > > >explicitly bail > > > > > > > > > > >out if you're dealing with a thread group leader, or make > > > > > > > > > > >the code > > > > > > > > > > >work for threads, too. > > > > > > > > > > > > > > > > > > > > The latter case probably being preferred if this API is > > > > > > > > > > supposed to be > > > > > > > > > > useable for thread management in userspace. > > > > > > > > > > > > > > > > > > At the moment, we are not planning to use this for sub-thread > > > > > > > > > management. I > > > > > > > > > am reworking this patch to only work on clone(2) pidfds which > > > > > > > > > makes the above > > > > > > > > > > > > > > > > Indeed and agreed. > > > > > > > > > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the > > > > > > > > > latest CLONE_PIDFD > > > > > > > > > patches, CLONE_THREAD with pidfd is not supported. > > > > > > > > > > > > > > > > Yes. We have no one asking for it right now and we can easily > > > > > > > > add this > > > > > > > > later. > > > > > > > > > > > > > > > > Admittedly I haven't gotten around to reviewing the patches > > > > > > > > here yet > > > > > > > > completely. But one thing about using POLLIN. FreeBSD is using > > > > > > > > POLLHUP > > > > > > > > on process exit which I think is nice as well. How about > > > > > > > > returning > > > > > > > > POLLIN | POLLHUP on process exit? > > > > > > > > We already do things like this. For example, when you proxy > > > > > > > > between > > > > > > > > ttys. If the process that you're reading data from has exited > > > > > > > > and closed > > > > > > > > it's end you still can't usually simply exit because it might > > > > > > > > have still > > > > > > > > buffered data that you want to read. The way one can deal with > > > > > > > > this > > > > > > > > from userspace is that you can observe a (POLLHUP | POLLIN) > > > > > > > > event and > > > > > > > > you keep on reading until you only observe a POLLHUP without a > > > > > > > > POLLIN > > > > > > > > event at which point you know you have read > > > > > > > > all data. > > > > > > > > I like the semantics for pidfds as well as it would indicate: > > > > > > > > - POLLHUP -> process has exited > > > > > > > > - POLLIN -> information can be
[PATCH 2/2] ARM: errata: add support for A12/A17 errata CR711784
This adds a code for turning on chicken bit 11, which appears to avoid a potential CPU deadlock that could occur. The exact set of instruction needed to trigger this errata is not totaly known but we have a high level of confidence that the problem is fixed by setting chicken bit 11. All details are in http://crbug.com/711784 This erratum has no known number and thus I have tagged it CR711784 (after the Chrome OS bug number). I have created separate A12 / A17 configs to match how the rest of the A12 / A17 errata is handled. Signed-off-by: Douglas Anderson --- arch/arm/Kconfig | 18 ++ arch/arm/mm/proc-v7.S | 10 ++ 2 files changed, 28 insertions(+) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 4376fe74f95e..34ec9039206b 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1181,6 +1181,13 @@ config ARM_ERRATA_857271 hang. The workaround is expected to have a negligible performance impact. +config ARM_ERRATA_CR711784_A12 + bool "ARM errata: A12: conditional instructions can lead to a CPU hang" + depends on CPU_V7 + help + This option enables the workaround for a Cortex-A12 erratum without a + number. The problems are best described in https://crbug.com/711784 + config ARM_ERRATA_852421 bool "ARM errata: A17: DMB ST might fail to create order between stores" depends on CPU_V7 @@ -1212,6 +1219,17 @@ config ARM_ERRATA_857272 config option from the A12 erratum due to the way errata are checked for and handled. +config ARM_ERRATA_CR711784_A17 + bool "ARM errata: A17: conditional instructions can lead to a CPU hang" + depends on CPU_V7 + help + This option enables the workaround for a Cortex-A17 erratum without a + number. The problems are best described in https://crbug.com/711784 + This erratum is not known to be fixed in any A17 revision. + This is identical to Cortex-A12 erratum CR711784. It is a separate + config option from the A12 erratum due to the way errata are checked + for and handled. + endmenu source "arch/arm/common/Kconfig" diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S index cd2accbab844..a5156ea734ee 100644 --- a/arch/arm/mm/proc-v7.S +++ b/arch/arm/mm/proc-v7.S @@ -396,6 +396,11 @@ __ca12_errata: mrc p15, 0, r10, c15, c0, 1 @ read diagnostic register orr r10, r10, #1 << 10 @ set bit #10 mcr p15, 0, r10, c15, c0, 1 @ write diagnostic register +#endif +#ifdef CONFIG_ARM_ERRATA_CR711784_A12 + mrc p15, 0, r10, c15, c0, 1 @ read diagnostic register + orr r10, r10, #1 << 11 @ set bit #11 + mcr p15, 0, r10, c15, c0, 1 @ write diagnostic register #endif b __errata_finish @@ -416,6 +421,11 @@ __ca17_errata: mrc p15, 0, r10, c15, c0, 1 @ read diagnostic register orr r10, r10, #1 << 10 @ set bit #10 mcr p15, 0, r10, c15, c0, 1 @ write diagnostic register +#endif +#ifdef CONFIG_ARM_ERRATA_CR711784_A17 + mrc p15, 0, r10, c15, c0, 1 @ read diagnostic register + orr r10, r10, #1 << 11 @ set bit #11 + mcr p15, 0, r10, c15, c0, 1 @ write diagnostic register #endif b __errata_finish -- 2.21.0.593.g511ec345e18-goog
[PATCH 1/2] ARM: errata: Workaround errata A12 857271 / A17 857272
From: Sonny Rao This adds support for working around errata A12 857271 / A17 857272. These errata were causing hangs on rk3288-based Chromebooks and it was confirmed that this workaround fixed the problems. In the Chrome OS 3.14 kernel [1] this erratum was known as ERRATA_FOOBAR due to lack of an official number from ARM (though the workaround of setting chicken bit 10 came from ARM). In the meantime ARM came up with official errata numbers but never published the workaround upstream. Let's actually get the workaround landed. [1] https://crrev.com/c/342753 Signed-off-by: Sonny Rao Signed-off-by: Douglas Anderson --- arch/arm/Kconfig | 19 +++ arch/arm/mm/proc-v7.S | 10 ++ 2 files changed, 29 insertions(+) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index b509cd338219..4376fe74f95e 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1172,6 +1172,15 @@ config ARM_ERRATA_825619 DMB NSHST or DMB ISHST instruction followed by a mix of Cacheable and Device/Strongly-Ordered loads and stores might cause deadlock +config ARM_ERRATA_857271 + bool "ARM errata: A12: CPU might deadlock under some very rare internal conditions" + depends on CPU_V7 + help + This option enables the workaround for the 857271 Cortex-A12 + (all revs) erratum. Under very rare timing conditions, the CPU might + hang. The workaround is expected to have a negligible performance + impact. + config ARM_ERRATA_852421 bool "ARM errata: A17: DMB ST might fail to create order between stores" depends on CPU_V7 @@ -1193,6 +1202,16 @@ config ARM_ERRATA_852423 config option from the A12 erratum due to the way errata are checked for and handled. +config ARM_ERRATA_857272 + bool "ARM errata: A17: CPU might deadlock under some very rare internal conditions" + depends on CPU_V7 + help + This option enables the workaround for the 857272 Cortex-A17 erratum. + This erratum is not known to be fixed in any A17 revision. + This is identical to Cortex-A12 erratum 857271. It is a separate + config option from the A12 erratum due to the way errata are checked + for and handled. + endmenu source "arch/arm/common/Kconfig" diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S index 339eb17c9808..cd2accbab844 100644 --- a/arch/arm/mm/proc-v7.S +++ b/arch/arm/mm/proc-v7.S @@ -391,6 +391,11 @@ __ca12_errata: mrc p15, 0, r10, c15, c0, 1 @ read diagnostic register orr r10, r10, #1 << 24 @ set bit #24 mcr p15, 0, r10, c15, c0, 1 @ write diagnostic register +#endif +#ifdef CONFIG_ARM_ERRATA_857271 + mrc p15, 0, r10, c15, c0, 1 @ read diagnostic register + orr r10, r10, #1 << 10 @ set bit #10 + mcr p15, 0, r10, c15, c0, 1 @ write diagnostic register #endif b __errata_finish @@ -406,6 +411,11 @@ __ca17_errata: mrcle p15, 0, r10, c15, c0, 1 @ read diagnostic register orrle r10, r10, #1 << 12 @ set bit #12 mcrle p15, 0, r10, c15, c0, 1 @ write diagnostic register +#endif +#ifdef CONFIG_ARM_ERRATA_857272 + mrc p15, 0, r10, c15, c0, 1 @ read diagnostic register + orr r10, r10, #1 << 10 @ set bit #10 + mcr p15, 0, r10, c15, c0, 1 @ write diagnostic register #endif b __errata_finish -- 2.21.0.593.g511ec345e18-goog
Re: [PATCH v4 8/9] clk: Allow parents to be specified via clkspec index
Quoting Stephen Boyd (2019-04-12 11:31:49) > Some clk providers are simple DT nodes that only have a 'clocks' > property without having an associated 'clock-names' property. In these > cases, we want to let these clk providers point to their parent clks > without having to dereference the 'clocks' property at probe time to > figure out the parent's globally unique clk name. Let's add an 'index' > property to the parent_data structure so that clk providers can indicate > that their parent is a particular index in the 'clocks' DT property. > > Cc: Miquel Raynal > Cc: Jerome Brunet > Cc: Russell King > Cc: Michael Turquette > Cc: Jeffrey Hugo > Cc: Chen-Yu Tsai > Signed-off-by: Stephen Boyd > --- Applied to clk-next
Re: [PATCH v4 4/9] driver core: Let dev_of_node() accept a NULL dev
Quoting Stephen Boyd (2019-04-12 11:31:45) > We'd like to chain this in places where the 'dev' argument might be > NULL. Let this function take a NULL 'dev' so this can work. > > Cc: Miquel Raynal > Cc: Jerome Brunet > Cc: Russell King > Cc: Michael Turquette > Cc: Jeffrey Hugo > Cc: Chen-Yu Tsai > Cc: Greg Kroah-Hartman > Cc: Rob Herring > Signed-off-by: Stephen Boyd > --- Applied to clk-next
Re: [PATCH v4 7/9] clk: Look for parents with clkdev based clk_lookups
Quoting Stephen Boyd (2019-04-12 11:31:48) > In addition to looking for DT based parents, support clkdev based > clk_lookups. This should allow non-DT based clk drivers to participate > in the parent lookup process. > > Cc: Miquel Raynal > Cc: Jerome Brunet > Cc: Russell King > Cc: Michael Turquette > Cc: Jeffrey Hugo > Cc: Chen-Yu Tsai > Signed-off-by: Stephen Boyd > --- Applied to clk-next
Re: [PATCH v4 9/9] clk: fixed-factor: Let clk framework find parent
Quoting Stephen Boyd (2019-04-12 11:31:50) > Convert this driver to a more modern way of specifying parents now that > we have a way to specify clk parents by DT index. This lets us nicely > avoid a problem where a parent clk name isn't know because the parent > clk hasn't been registered yet. > > Cc: Miquel Raynal > Cc: Jerome Brunet > Cc: Russell King > Cc: Michael Turquette > Cc: Jeffrey Hugo > Cc: Chen-Yu Tsai > Signed-off-by: Stephen Boyd > --- Applied to clk-next
Re: [PATCH v4 5/9] clk: Add of_clk_hw_register() API for early clk drivers
Quoting Stephen Boyd (2019-04-12 11:31:46) > In some circumstances drivers register clks early and don't have access > to a struct device because the device model isn't initialized yet. Add > an API to let drivers register clks associated with a struct device_node > so that these drivers can participate in getting parent clks through DT. > > Cc: Miquel Raynal > Cc: Jerome Brunet > Cc: Russell King > Cc: Michael Turquette > Cc: Jeffrey Hugo > Cc: Chen-Yu Tsai > Cc: Rob Herring > Signed-off-by: Stephen Boyd > --- Applied to clk-next
Re: [PATCH v4 6/9] clk: Allow parents to be specified without string names
Quoting Stephen Boyd (2019-04-12 11:31:47) > The common clk framework is lacking in ability to describe the clk > topology without specifying strings for every possible parent-child > link. There are a few drawbacks to the current approach: > > 1) String comparisons are used for everything, including describing > topologies that are 'local' to a single clock controller. > > 2) clk providers (e.g. i2c clk drivers) need to create globally unique > clk names to avoid collisions in the clk namespace, leading to awkward > name generation code in various clk drivers. > > 3) DT bindings may not fully describe the clk topology and linkages > between clk controllers because drivers can easily rely on globally unique > strings to describe connections between clks. > > This leads to confusing DT bindings, complicated clk name generation > code, and inefficient string comparisons during clk registration just so > that the clk framework can detect the topology of the clk tree. > Furthermore, some drivers call clk_get() and then __clk_get_name() to > extract the globally unique clk name just so they can specify the parent > of the clk they're registering. We have of_clk_parent_fill() but that > mostly only works for single clks registered from a DT node, which isn't > the norm. Let's simplify this all by introducing two new ways of > specifying clk parents. > > The first method is an array of pointers to clk_hw structures > corresponding to the parents at that index. This works for clks that are > registered when we have access to all the clk_hw pointers for the > parents. > > The second method is a mix of clk_hw pointers and strings of local and > global parent clk names. If the .fw_name member of the map is set we'll > look for that clk by performing a DT based lookup of the device the clk > is registered with and the .name specified in the map. If that fails, > we'll fallback to the .name member and perform a global clk name lookup > like we've always done before. > > Using either one of these new methods is entirely optional. Existing > drivers will continue to work, and they can migrate to this new approach > as they see fit. Eventually, we'll want to get rid of the 'parent_names' > array in struct clk_init_data and use one of these new methods instead. > > Cc: Miquel Raynal > Cc: Jerome Brunet > Cc: Russell King > Cc: Michael Turquette > Cc: Jeffrey Hugo > Cc: Chen-Yu Tsai > Cc: Rob Herring > Signed-off-by: Stephen Boyd > --- Applied to clk-next
Re: [PATCH v4 3/9] clk: Prepare for clk registration API that uses DT nodes
Quoting Stephen Boyd (2019-04-12 11:31:44) > Split out the body of the clk_register() function so it can be shared > between the different types of registration APIs (DT, device). > > Cc: Miquel Raynal > Cc: Jerome Brunet > Cc: Russell King > Cc: Michael Turquette > Cc: Jeffrey Hugo > Cc: Chen-Yu Tsai > Cc: Rob Herring > Signed-off-by: Stephen Boyd > --- Applied to clk-next
Re: [PATCH v4 2/9] clkdev: Move clk creation outside of 'clocks_mutex'
Quoting Stephen Boyd (2019-04-12 11:31:43) > We don't need to hold the 'clocks_mutex' here when we're creating a clk > pointer from a clk_lookup structure. Instead, we just need to make sure > that the lookup doesn't go away while we dereference the lookup pointer > to extract the clk_hw pointer out of it. Let's move things around > slightly so that we have a new function to get the clk_hw out of the > lookup with the right locking and then chain the two together for what > used to be __clk_get_sys(). > > Cc: Miquel Raynal > Cc: Jerome Brunet > Cc: Russell King > Cc: Michael Turquette > Cc: Jeffrey Hugo > Cc: Chen-Yu Tsai > Signed-off-by: Stephen Boyd > --- Applied to clk-next
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Fri, Apr 19, 2019 at 2:45 PM Joel Fernandes wrote: > > On Fri, Apr 19, 2019 at 02:24:09PM -0700, Daniel Colascione wrote: > > On Fri, Apr 19, 2019 at 2:20 PM Joel Fernandes > > wrote: > > > > > > On Fri, Apr 19, 2019 at 10:57:11PM +0200, Christian Brauner wrote: > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione > > > > wrote: > > > > > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes > > > > > wrote: > > > > > > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote: > > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote: > > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner > > > > > > > > wrote: > > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn > > > > > > > > > wrote: > > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov > > > > > > > > > > wrote: > > > > > > > > > >> On 04/16, Joel Fernandes wrote: > > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov > > > > > > > > > >> > wrote: > > > > > > > > > >> > > > > > > > > > > > >> > > Could you explain when it should return POLLIN? When > > > > > > > > > >> > > the whole > > > > > > > > > >process exits? > > > > > > > > > >> > > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't exist > > > > > > > > > >> > anymore, > > > > > > > > > >or when it > > > > > > > > > >> > is in a zombie state and there's no other thread in the > > > > > > > > > >> > thread > > > > > > > > > >group. > > > > > > > > > >> > > > > > > > > > >> IOW, when the whole thread group exits, so it can't be > > > > > > > > > >> used to > > > > > > > > > >monitor sub-threads. > > > > > > > > > >> > > > > > > > > > >> just in case... speaking of this patch it doesn't modify > > > > > > > > > >proc_tid_base_operations, > > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc > > > > > > > > > >> you are > > > > > > > > > >going to use > > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ? > > > > > > > > > > > > > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid has > > > > > > > > > >proc_tgid_base_operations despite not being a thread group > > > > > > > > > >leader. > > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this > > > > > > > > > >code can > > > > > > > > > >be hit trivially, and then the code will misbehave. > > > > > > > > > > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to > > > > > > > > > >explicitly bail > > > > > > > > > >out if you're dealing with a thread group leader, or make > > > > > > > > > >the code > > > > > > > > > >work for threads, too. > > > > > > > > > > > > > > > > > > The latter case probably being preferred if this API is > > > > > > > > > supposed to be > > > > > > > > > useable for thread management in userspace. > > > > > > > > > > > > > > > > At the moment, we are not planning to use this for sub-thread > > > > > > > > management. I > > > > > > > > am reworking this patch to only work on clone(2) pidfds which > > > > > > > > makes the above > > > > > > > > > > > > > > Indeed and agreed. > > > > > > > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the > > > > > > > > latest CLONE_PIDFD > > > > > > > > patches, CLONE_THREAD with pidfd is not supported. > > > > > > > > > > > > > > Yes. We have no one asking for it right now and we can easily add > > > > > > > this > > > > > > > later. > > > > > > > > > > > > > > Admittedly I haven't gotten around to reviewing the patches here > > > > > > > yet > > > > > > > completely. But one thing about using POLLIN. FreeBSD is using > > > > > > > POLLHUP > > > > > > > on process exit which I think is nice as well. How about returning > > > > > > > POLLIN | POLLHUP on process exit? > > > > > > > We already do things like this. For example, when you proxy > > > > > > > between > > > > > > > ttys. If the process that you're reading data from has exited and > > > > > > > closed > > > > > > > it's end you still can't usually simply exit because it might > > > > > > > have still > > > > > > > buffered data that you want to read. The way one can deal with > > > > > > > this > > > > > > > from userspace is that you can observe a (POLLHUP | POLLIN) > > > > > > > event and > > > > > > > you keep on reading until you only observe a POLLHUP without a > > > > > > > POLLIN > > > > > > > event at which point you know you have read > > > > > > > all data. > > > > > > > I like the semantics for pidfds as well as it would indicate: > > > > > > > - POLLHUP -> process has exited > > > > > > > - POLLIN -> information can be read > > > > > > > > > > > > Actually I think a bit different about this, in my opinion the > > > > > > pidfd should > > > > > > always be readable (we would store the exit status somewhere in the > > > > > > future > > > > > > which would be readable, even after task_struct is dead). So I was > > >
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Fri, Apr 19, 2019 at 11:48 PM Christian Brauner wrote: > > On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione wrote: > > > > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner > > wrote: > > > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione > > > wrote: > > > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes > > > > wrote: > > > > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote: > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote: > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner wrote: > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn > > > > > > > > wrote: > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov > > > > > > > > > wrote: > > > > > > > > >> On 04/16, Joel Fernandes wrote: > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov > > > > > > > > >> > wrote: > > > > > > > > >> > > > > > > > > > > >> > > Could you explain when it should return POLLIN? When the > > > > > > > > >> > > whole > > > > > > > > >process exits? > > > > > > > > >> > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't exist > > > > > > > > >> > anymore, > > > > > > > > >or when it > > > > > > > > >> > is in a zombie state and there's no other thread in the > > > > > > > > >> > thread > > > > > > > > >group. > > > > > > > > >> > > > > > > > > >> IOW, when the whole thread group exits, so it can't be used > > > > > > > > >> to > > > > > > > > >monitor sub-threads. > > > > > > > > >> > > > > > > > > >> just in case... speaking of this patch it doesn't modify > > > > > > > > >proc_tid_base_operations, > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc > > > > > > > > >> you are > > > > > > > > >going to use > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ? > > > > > > > > > > > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid has > > > > > > > > >proc_tgid_base_operations despite not being a thread group > > > > > > > > >leader. > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this > > > > > > > > >code can > > > > > > > > >be hit trivially, and then the code will misbehave. > > > > > > > > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to > > > > > > > > >explicitly bail > > > > > > > > >out if you're dealing with a thread group leader, or make the > > > > > > > > >code > > > > > > > > >work for threads, too. > > > > > > > > > > > > > > > > The latter case probably being preferred if this API is > > > > > > > > supposed to be > > > > > > > > useable for thread management in userspace. > > > > > > > > > > > > > > At the moment, we are not planning to use this for sub-thread > > > > > > > management. I > > > > > > > am reworking this patch to only work on clone(2) pidfds which > > > > > > > makes the above > > > > > > > > > > > > Indeed and agreed. > > > > > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the latest > > > > > > > CLONE_PIDFD > > > > > > > patches, CLONE_THREAD with pidfd is not supported. > > > > > > > > > > > > Yes. We have no one asking for it right now and we can easily add > > > > > > this > > > > > > later. > > > > > > > > > > > > Admittedly I haven't gotten around to reviewing the patches here yet > > > > > > completely. But one thing about using POLLIN. FreeBSD is using > > > > > > POLLHUP > > > > > > on process exit which I think is nice as well. How about returning > > > > > > POLLIN | POLLHUP on process exit? > > > > > > We already do things like this. For example, when you proxy between > > > > > > ttys. If the process that you're reading data from has exited and > > > > > > closed > > > > > > it's end you still can't usually simply exit because it might have > > > > > > still > > > > > > buffered data that you want to read. The way one can deal with this > > > > > > from userspace is that you can observe a (POLLHUP | POLLIN) event > > > > > > and > > > > > > you keep on reading until you only observe a POLLHUP without a > > > > > > POLLIN > > > > > > event at which point you know you have read > > > > > > all data. > > > > > > I like the semantics for pidfds as well as it would indicate: > > > > > > - POLLHUP -> process has exited > > > > > > - POLLIN -> information can be read > > > > > > > > > > Actually I think a bit different about this, in my opinion the pidfd > > > > > should > > > > > always be readable (we would store the exit status somewhere in the > > > > > future > > > > > which would be readable, even after task_struct is dead). So I was > > > > > thinking > > > > > we always return EPOLLIN. If process has not exited, then it blocks. > > > > > > > > ITYM that a pidfd polls as readable *once a task exits* and stays > > > > readable forever. Before a task exit, a poll on a pidfd should *not* > > > > yield POLLIN and reading that pidfd should *not* complete
Re: [PATCH v1 2/3] PCI / ACPI: Remove the need for 'struct hotplug_params'
On Fri, Apr 19, 2019 at 03:07:45PM -0500, Bjorn Helgaas wrote: > On Fri, Feb 08, 2019 at 10:24:12AM -0600, Alexandru Gagniuc wrote: > > We used to first parse all the _HPP and _HPX tables before using the > > information to program registers of PCIe devices. Up until HPX type 2, > > there was only one structure of each type, so we could cheat and store > > it on the stack. > > > > With HPX type 3 we get an arbitrary number of entries, so the above > > model doesn't scale that well. Instead of parsing all tables at once, > > parse and program each entry separately. For _HPP and _HPX 0 thru 2, > > this is functionally equivalent. The change enables the upcoming _HPX3 > > to integrate more easily. > > I think this is tremendous! It's going to simplify this code > dramatically. Two comments below. > > static void pci_configure_device(struct pci_dev *dev) > > { > > - struct hotplug_params hpp; > > - int ret; > > + static const struct hotplug_program_ops hp_ops = { > > + .program_type0 = program_hpp_type0, > > + .program_type1 = program_hpp_type1, > > + .program_type2 = program_hpp_type2, > > + }; > > What if we just moved program_hpp_type0(), etc from probe.c to > pci-acpi.c? The only reason I see to have it in probe.c is for > pci_default_type0, and I think that is a pretty obtuse way of doing > default configuration. I would have no problem at all just hardcoding > those defaults in probe.c and then potentially having them overwritten > by _HPP/_HPX. Actually, never mind about this. This would be a perfect project for mentoring a Linux newbie. I'll merge this series as-is and any restructuring/cleanup can happen later, since it's not related to this series anyway. Bjorn
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Fri, Apr 19, 2019 at 11:20 PM Joel Fernandes wrote: > > On Fri, Apr 19, 2019 at 10:57:11PM +0200, Christian Brauner wrote: > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione > > wrote: > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes > > > wrote: > > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote: > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote: > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner wrote: > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn > > > > > > > wrote: > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov > > > > > > > >wrote: > > > > > > > >> On 04/16, Joel Fernandes wrote: > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov > > > > > > > >> > wrote: > > > > > > > >> > > > > > > > > > >> > > Could you explain when it should return POLLIN? When the > > > > > > > >> > > whole > > > > > > > >process exits? > > > > > > > >> > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't exist > > > > > > > >> > anymore, > > > > > > > >or when it > > > > > > > >> > is in a zombie state and there's no other thread in the > > > > > > > >> > thread > > > > > > > >group. > > > > > > > >> > > > > > > > >> IOW, when the whole thread group exits, so it can't be used to > > > > > > > >monitor sub-threads. > > > > > > > >> > > > > > > > >> just in case... speaking of this patch it doesn't modify > > > > > > > >proc_tid_base_operations, > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc you > > > > > > > >> are > > > > > > > >going to use > > > > > > > >> the anonymous file returned by CLONE_PIDFD ? > > > > > > > > > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid has > > > > > > > >proc_tgid_base_operations despite not being a thread group > > > > > > > >leader. > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this > > > > > > > >code can > > > > > > > >be hit trivially, and then the code will misbehave. > > > > > > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to explicitly > > > > > > > >bail > > > > > > > >out if you're dealing with a thread group leader, or make the > > > > > > > >code > > > > > > > >work for threads, too. > > > > > > > > > > > > > > The latter case probably being preferred if this API is supposed > > > > > > > to be > > > > > > > useable for thread management in userspace. > > > > > > > > > > > > At the moment, we are not planning to use this for sub-thread > > > > > > management. I > > > > > > am reworking this patch to only work on clone(2) pidfds which makes > > > > > > the above > > > > > > > > > > Indeed and agreed. > > > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the latest > > > > > > CLONE_PIDFD > > > > > > patches, CLONE_THREAD with pidfd is not supported. > > > > > > > > > > Yes. We have no one asking for it right now and we can easily add this > > > > > later. > > > > > > > > > > Admittedly I haven't gotten around to reviewing the patches here yet > > > > > completely. But one thing about using POLLIN. FreeBSD is using POLLHUP > > > > > on process exit which I think is nice as well. How about returning > > > > > POLLIN | POLLHUP on process exit? > > > > > We already do things like this. For example, when you proxy between > > > > > ttys. If the process that you're reading data from has exited and > > > > > closed > > > > > it's end you still can't usually simply exit because it might have > > > > > still > > > > > buffered data that you want to read. The way one can deal with this > > > > > from userspace is that you can observe a (POLLHUP | POLLIN) event and > > > > > you keep on reading until you only observe a POLLHUP without a POLLIN > > > > > event at which point you know you have read > > > > > all data. > > > > > I like the semantics for pidfds as well as it would indicate: > > > > > - POLLHUP -> process has exited > > > > > - POLLIN -> information can be read > > > > > > > > Actually I think a bit different about this, in my opinion the pidfd > > > > should > > > > always be readable (we would store the exit status somewhere in the > > > > future > > > > which would be readable, even after task_struct is dead). So I was > > > > thinking > > > > we always return EPOLLIN. If process has not exited, then it blocks. > > > > > > ITYM that a pidfd polls as readable *once a task exits* and stays > > > readable forever. Before a task exit, a poll on a pidfd should *not* > > > yield POLLIN and reading that pidfd should *not* complete immediately. > > > There's no way that, having observed POLLIN on a pidfd, you should > > > ever then *not* see POLLIN on that pidfd in the future --- it's a > > > one-way transition from not-ready-to-get-exit-status to > > > ready-to-get-exit-status. > > > > What do you consider interesting state transitions? A listener on a pidfd > > in
Re: [PATCH v20 00/28] Intel SGX1 support
On 2019-04-19 14:34, Thomas Gleixner wrote: > And how so? You create writeable AND executable memory. That's a nono and > you can argue in circles, that's not going to change with any of your > proposed changes. On 2019-04-19 14:38, Thomas Gleixner wrote: > You are working around LSM nothing else and that's just not going to fly. Based on your comments, I'm still unsure if we're on the same page with regards to what I'm proposing. Here's a regular non-SGX flow that LSM would likely prevent: mmap(PROT_READ|PROT_WRITE) memcpy() mmap(PROT_READ|PROT_EXEC) <-- denied by LSM Or just something based on regular PT permissions: mmap(PROT_READ|PROT_EXEC) memcpy() <-- SIGSEGV Now, the equivalent for SGX: mmap(PROT_READ|PROT_WRITE) ioctl(EADD) mmap(PROT_READ|PROT_EXEC) <-- denied by LSM This works fine with v20 as-is. However, consider the equivalent of the PT-based flow: mmap(PROT_READ|PROT_EXEC) ioctl(EADD) <-- no error! It's not me that's working around the LSM, it's the SGX driver! It's writing to memory that's not marked writable! The fundamental issue here is that the SGX instruction set has several instructions that bypass the page table permission bits, and this is (naturally) confusing to any kind of reference monitor like the LSM framework. You can come up with similar scenarios that involve PROT_READ|PROT_WRITE|PROT_EXEC or ptrace(PTRACE_POKETEXT). So, clearly, the proper way to fix this failure of complete mediation is by enforcing appropriate page-table permissions even on the SGX instructions that don't do it themselves. Just make any implicit memory access look like a regular memory access and now everyone is on the same page (pun intended). -- Jethro Beekman | Fortanix
mmotm 2019-04-19-14-53 uploaded
The mm-of-the-moment snapshot 2019-04-19-14-53 has been uploaded to http://www.ozlabs.org/~akpm/mmotm/ mmotm-readme.txt says README for mm-of-the-moment: http://www.ozlabs.org/~akpm/mmotm/ This is a snapshot of my -mm patch queue. Uploaded at random hopefully more than once a week. You will need quilt to apply these patches to the latest Linus release (5.x or 5.x-rcY). The series file is in broken-out.tar.gz and is duplicated in http://ozlabs.org/~akpm/mmotm/series The file broken-out.tar.gz contains two datestamp files: .DATE and .DATE--mm-dd-hh-mm-ss. Both contain the string -mm-dd-hh-mm-ss, followed by the base kernel version against which this patch series is to be applied. This tree is partially included in linux-next. To see which patches are included in linux-next, consult the `series' file. Only the patches within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in linux-next. A full copy of the full kernel tree with the linux-next and mmotm patches already applied is available through git within an hour of the mmotm release. Individual mmotm releases are tagged. The master branch always points to the latest release, so it's constantly rebasing. http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/ The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second) contains daily snapshots of the -mm tree. It is updated more frequently than mmotm, and is untested. A git copy of this tree is available at http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/ and use of this tree is similar to http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/, described above. This mmotm tree contains the following patches against 5.1-rc5: (patches marked "*" will be included in linux-next) origin.patch * mm-add-sys-kernel-slab-cache-cache_dma32.patch * userfaultfd-use-rcu-to-free-the-task-struct-when-fork-fails.patch * mm-memory_hotplug-drop-memory-device-reference-after-find_memory_block.patch * zram-pass-down-the-bvec-we-need-to-read-into-in-the-work-struct.patch * lib-kconfigdebug-fix-build-error-without-config_block.patch * lib-test_vmalloc-do-not-create-cpumask_t-variable-on-stack.patch * mm-do-not-boost-watermarks-to-avoid-fragmentation-for-the-discontig-memory-model.patch * mm-page_alloc-always-use-a-captured-page-regardless-of-compaction-result.patch * prctl-fix-false-positive-in-validate_prctl_map.patch * scripts-spellingtxt-add-more-typos-to-spellingtxt-and-sort.patch * arch-sh-boards-mach-dreamcast-irqc-remove-duplicate-header.patch * debugobjects-move-printk-out-of-db-lock-critical-sections.patch * ocfs2-use-common-file-type-conversion.patch * ocfs2-fix-ocfs2-read-inode-data-panic-in-ocfs2_iget.patch * ocfs2-clear-zero-in-unaligned-direct-io.patch * ocfs2-clear-zero-in-unaligned-direct-io-checkpatch-fixes.patch * ocfs2-wait-for-recovering-done-after-direct-unlock-request.patch * ocfs2-checkpoint-appending-truncate-log-transaction-before-flushing.patch * ramfs-support-o_tmpfile.patch mm.patch * list-add-function-list_rotate_to_front.patch * slob-respect-list_head-abstraction-layer.patch * slob-use-slab_list-instead-of-lru.patch * slub-add-comments-to-endif-pre-processor-macros.patch * slub-use-slab_list-instead-of-lru.patch * slab-use-slab_list-instead-of-lru.patch * mm-remove-stale-comment-from-page-struct.patch * slub-remove-useless-kmem_cache_debug-before-remove_full.patch * mm-slab-remove-unneed-check-in-cpuup_canceled.patch * slub-update-the-comment-about-slab-frozen.patch * slab-fix-an-infinite-loop-in-leaks_show.patch * slab-fix-an-infinite-loop-in-leaks_show-fix.patch * mm-vmscan-drop-zone-id-from-kswapd-tracepoints.patch * mm-cma_debugc-fix-the-break-condition-in-cma_maxchunk_get.patch * userfaultfd-sysctl-add-vmunprivileged_userfaultfd.patch * userfaultfd-sysctl-add-vmunprivileged_userfaultfd-fix.patch * page-cache-store-only-head-pages-in-i_pages.patch * page-cache-store-only-head-pages-in-i_pages-fix.patch * page-cache-store-only-head-pages-in-i_pages-fix-fix.patch * mm-page_alloc-disallow-__gfp_comp-in-alloc_pages_exact.patch * mm-move-recent_rotated-pages-calculation-to-shrink_inactive_list.patch * mm-move-nr_deactivate-accounting-to-shrink_active_list.patch * mm-move-nr_deactivate-accounting-to-shrink_active_list-fix.patch * mm-remove-pages_to_free-argument-of-move_active_pages_to_lru.patch * mm-generalize-putback-scan-functions.patch * mm-gup-replace-get_user_pages_longterm-with-foll_longterm.patch * mm-gup-replace-get_user_pages_longterm-with-foll_longterm-v3.patch * mm-gup-change-write-parameter-to-flags-in-fast-walk.patch * mm-gup-change-gup-fast-to-use-flags-rather-than-a-write-bool.patch * mm-gup-add-foll_longterm-capability-to-gup-fast.patch * mm-gup-add-foll_longterm-capability-to-gup-fast-v3.patch * ib-hfi1-use-the-new-foll_longterm-flag-to-get_user_pages_fast.patch * ib-hfi1-use-the-new-foll_longterm-flag-to-get_user_pages_fast-v3.patch * ib-qib-use-the-new-foll_longterm-flag-to-get_user_pages_fast.patch *
Re: [PATCH] Staging: vc04_services: Cleanup in ctrl_set_bitrate()
Hi Madhumitha, Am 19.04.19 um 23:23 schrieb Madhumitha Prabakaran: > Remove unnecessary variable and replace return type. > > Issue suggested by Coccinelle. > > Signed-off-by: Madhumitha Prabakaran > --- > drivers/staging/vc04_services/bcm2835-camera/controls.c | 7 +-- > 1 file changed, 1 insertion(+), 6 deletions(-) > > diff --git a/drivers/staging/vc04_services/bcm2835-camera/controls.c > b/drivers/staging/vc04_services/bcm2835-camera/controls.c > index e39ab5fae724..f87fa7f61db3 100644 > --- a/drivers/staging/vc04_services/bcm2835-camera/controls.c > +++ b/drivers/staging/vc04_services/bcm2835-camera/controls.c > @@ -607,18 +607,13 @@ static int ctrl_set_bitrate(struct bm2835_mmal_dev *dev, > struct v4l2_ctrl *ctrl, > const struct bm2835_mmal_v4l2_ctrl *mmal_ctrl) > { > - int ret; > struct vchiq_mmal_port *encoder_out; > > dev->capture.encode_bitrate = ctrl->val; > > encoder_out = >component[MMAL_COMPONENT_VIDEO_ENCODE]->output[0]; > > - ret = vchiq_mmal_port_parameter_set(dev->instance, encoder_out, > - mmal_ctrl->mmal_id, > - >val, sizeof(ctrl->val)); > - ret = 0; > - return ret; > + return 0; i don't understand why we can remove here the function call. Stefan > } > > static int ctrl_set_bitrate_mode(struct bm2835_mmal_dev *dev,
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione wrote: > > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner > wrote: > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione > > wrote: > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes > > > wrote: > > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote: > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote: > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner wrote: > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn > > > > > > > wrote: > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov > > > > > > > >wrote: > > > > > > > >> On 04/16, Joel Fernandes wrote: > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov > > > > > > > >> > wrote: > > > > > > > >> > > > > > > > > > >> > > Could you explain when it should return POLLIN? When the > > > > > > > >> > > whole > > > > > > > >process exits? > > > > > > > >> > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't exist > > > > > > > >> > anymore, > > > > > > > >or when it > > > > > > > >> > is in a zombie state and there's no other thread in the > > > > > > > >> > thread > > > > > > > >group. > > > > > > > >> > > > > > > > >> IOW, when the whole thread group exits, so it can't be used to > > > > > > > >monitor sub-threads. > > > > > > > >> > > > > > > > >> just in case... speaking of this patch it doesn't modify > > > > > > > >proc_tid_base_operations, > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc you > > > > > > > >> are > > > > > > > >going to use > > > > > > > >> the anonymous file returned by CLONE_PIDFD ? > > > > > > > > > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid has > > > > > > > >proc_tgid_base_operations despite not being a thread group > > > > > > > >leader. > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this > > > > > > > >code can > > > > > > > >be hit trivially, and then the code will misbehave. > > > > > > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to explicitly > > > > > > > >bail > > > > > > > >out if you're dealing with a thread group leader, or make the > > > > > > > >code > > > > > > > >work for threads, too. > > > > > > > > > > > > > > The latter case probably being preferred if this API is supposed > > > > > > > to be > > > > > > > useable for thread management in userspace. > > > > > > > > > > > > At the moment, we are not planning to use this for sub-thread > > > > > > management. I > > > > > > am reworking this patch to only work on clone(2) pidfds which makes > > > > > > the above > > > > > > > > > > Indeed and agreed. > > > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the latest > > > > > > CLONE_PIDFD > > > > > > patches, CLONE_THREAD with pidfd is not supported. > > > > > > > > > > Yes. We have no one asking for it right now and we can easily add this > > > > > later. > > > > > > > > > > Admittedly I haven't gotten around to reviewing the patches here yet > > > > > completely. But one thing about using POLLIN. FreeBSD is using POLLHUP > > > > > on process exit which I think is nice as well. How about returning > > > > > POLLIN | POLLHUP on process exit? > > > > > We already do things like this. For example, when you proxy between > > > > > ttys. If the process that you're reading data from has exited and > > > > > closed > > > > > it's end you still can't usually simply exit because it might have > > > > > still > > > > > buffered data that you want to read. The way one can deal with this > > > > > from userspace is that you can observe a (POLLHUP | POLLIN) event and > > > > > you keep on reading until you only observe a POLLHUP without a POLLIN > > > > > event at which point you know you have read > > > > > all data. > > > > > I like the semantics for pidfds as well as it would indicate: > > > > > - POLLHUP -> process has exited > > > > > - POLLIN -> information can be read > > > > > > > > Actually I think a bit different about this, in my opinion the pidfd > > > > should > > > > always be readable (we would store the exit status somewhere in the > > > > future > > > > which would be readable, even after task_struct is dead). So I was > > > > thinking > > > > we always return EPOLLIN. If process has not exited, then it blocks. > > > > > > ITYM that a pidfd polls as readable *once a task exits* and stays > > > readable forever. Before a task exit, a poll on a pidfd should *not* > > > yield POLLIN and reading that pidfd should *not* complete immediately. > > > There's no way that, having observed POLLIN on a pidfd, you should > > > ever then *not* see POLLIN on that pidfd in the future --- it's a > > > one-way transition from not-ready-to-get-exit-status to > > > ready-to-get-exit-status. > > > > What do you consider interesting state transitions? A listener on a pidfd > > in
Re: [PATCH RFC 1/2] Add polling support to pidfd
On Fri, Apr 19, 2019 at 02:24:09PM -0700, Daniel Colascione wrote: > On Fri, Apr 19, 2019 at 2:20 PM Joel Fernandes wrote: > > > > On Fri, Apr 19, 2019 at 10:57:11PM +0200, Christian Brauner wrote: > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione > > > wrote: > > > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes > > > > wrote: > > > > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote: > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote: > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner wrote: > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn > > > > > > > > wrote: > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov > > > > > > > > > wrote: > > > > > > > > >> On 04/16, Joel Fernandes wrote: > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov > > > > > > > > >> > wrote: > > > > > > > > >> > > > > > > > > > > >> > > Could you explain when it should return POLLIN? When the > > > > > > > > >> > > whole > > > > > > > > >process exits? > > > > > > > > >> > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't exist > > > > > > > > >> > anymore, > > > > > > > > >or when it > > > > > > > > >> > is in a zombie state and there's no other thread in the > > > > > > > > >> > thread > > > > > > > > >group. > > > > > > > > >> > > > > > > > > >> IOW, when the whole thread group exits, so it can't be used > > > > > > > > >> to > > > > > > > > >monitor sub-threads. > > > > > > > > >> > > > > > > > > >> just in case... speaking of this patch it doesn't modify > > > > > > > > >proc_tid_base_operations, > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc > > > > > > > > >> you are > > > > > > > > >going to use > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ? > > > > > > > > > > > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid has > > > > > > > > >proc_tgid_base_operations despite not being a thread group > > > > > > > > >leader. > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this > > > > > > > > >code can > > > > > > > > >be hit trivially, and then the code will misbehave. > > > > > > > > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to > > > > > > > > >explicitly bail > > > > > > > > >out if you're dealing with a thread group leader, or make the > > > > > > > > >code > > > > > > > > >work for threads, too. > > > > > > > > > > > > > > > > The latter case probably being preferred if this API is > > > > > > > > supposed to be > > > > > > > > useable for thread management in userspace. > > > > > > > > > > > > > > At the moment, we are not planning to use this for sub-thread > > > > > > > management. I > > > > > > > am reworking this patch to only work on clone(2) pidfds which > > > > > > > makes the above > > > > > > > > > > > > Indeed and agreed. > > > > > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the latest > > > > > > > CLONE_PIDFD > > > > > > > patches, CLONE_THREAD with pidfd is not supported. > > > > > > > > > > > > Yes. We have no one asking for it right now and we can easily add > > > > > > this > > > > > > later. > > > > > > > > > > > > Admittedly I haven't gotten around to reviewing the patches here yet > > > > > > completely. But one thing about using POLLIN. FreeBSD is using > > > > > > POLLHUP > > > > > > on process exit which I think is nice as well. How about returning > > > > > > POLLIN | POLLHUP on process exit? > > > > > > We already do things like this. For example, when you proxy between > > > > > > ttys. If the process that you're reading data from has exited and > > > > > > closed > > > > > > it's end you still can't usually simply exit because it might have > > > > > > still > > > > > > buffered data that you want to read. The way one can deal with this > > > > > > from userspace is that you can observe a (POLLHUP | POLLIN) event > > > > > > and > > > > > > you keep on reading until you only observe a POLLHUP without a > > > > > > POLLIN > > > > > > event at which point you know you have read > > > > > > all data. > > > > > > I like the semantics for pidfds as well as it would indicate: > > > > > > - POLLHUP -> process has exited > > > > > > - POLLIN -> information can be read > > > > > > > > > > Actually I think a bit different about this, in my opinion the pidfd > > > > > should > > > > > always be readable (we would store the exit status somewhere in the > > > > > future > > > > > which would be readable, even after task_struct is dead). So I was > > > > > thinking > > > > > we always return EPOLLIN. If process has not exited, then it blocks. > > > > > > > > ITYM that a pidfd polls as readable *once a task exits* and stays > > > > readable forever. Before a task exit, a poll on a pidfd should *not* > > > > yield POLLIN and reading that pidfd should *not* complete
Re: [PATCH v20 00/28] Intel SGX1 support
On Fri, 19 Apr 2019, Jethro Beekman wrote: > On 2019-04-19 14:31, Andy Lutomirski wrote: > > I do think we need to follow LSM rules. But my bigger point is that > > there are policies that don’t allow JIT at all. I think we should > > arrange the SGX API so it’s still usable when such a policy is in > > effect. > I don't think we need to arrange that right now. This patch set needs to > be merged after more than 2 years of development. I'd like to avoid We merge stuff when it is ready and not when someone declares that it needs to be merged. > introducing any more big changes. Let's just do what I described to make > LSM not broken, which is a minimal change to the current approach. We > can adjust the API later to support the use case you describe. You are working around LSM nothing else and that's just not going to fly. Thanks, tglx
Re: [PATCH v20 00/28] Intel SGX1 support
On 2019-04-19 14:31, Andy Lutomirski wrote: > I do think we need to follow LSM rules. But my bigger point is that there > are policies that don’t allow JIT at all. I think we should arrange the SGX > API so it’s still usable when such a policy is in effect. I don't think we need to arrange that right now. This patch set needs to be merged after more than 2 years of development. I'd like to avoid introducing any more big changes. Let's just do what I described to make LSM not broken, which is a minimal change to the current approach. We can adjust the API later to support the use case you describe. -- Jethro Beekman | Fortanix
Re: [PATCH v20 00/28] Intel SGX1 support
On Fri, 19 Apr 2019, Jethro Beekman wrote: > On 2019-04-19 14:15, Andy Lutomirski wrote: > > With plain mmap() + mprotect(), the LSM will prevent you from making > > memory that *was* writable executable. This is by design and SELinux > > supports it. I don’t remember the name of the associated SELinux > > permission off the top of my head. > > > > If we start enforcing equivalent rules on SGX, then the current API > > will simply not allow enclaves to be loaded — no matter how you slice > > it, loading an enclave with the current API is indistinguishable from > > making arbitrary data executable. > > > Yes this is exactly what I intended here: a very simple change that > stops SGX from confusing LSM. Just by enforcing that everything that > looks like a memory write (EADD, EAUG, EDBGWR, etc.) actually requires > write permissions, reality and LSM should be on the same page. And how so? You create writeable AND executable memory. That's a nono and you can argue in circles, that's not going to change with any of your proposed changes. Andy clearly made a proposal which solves it in a proper way. Thanks, tglx