Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time

2019-04-19 Thread Cong Wang
On Thu, Apr 18, 2019 at 5:07 PM Luck, Tony  wrote:
>
> On Fri, Apr 19, 2019 at 01:29:10AM +0200, Borislav Petkov wrote:
> > Which reminds me, Tony, I think all those debugging files "pfn"
> > and "array" and the one you add now, should all be under a
> > CONFIG_RAS_CEC_DEBUG which is default off and used only for development.
> > Mind adding that too pls?
>
> Patch below, on top of previous patch.  Note that I didn't move "enable"
> into the RAS_CEC_DEBUG code. I think it has some value even on
> production systems.  It is still in debugfs (which many production
> systems don't mount) so I don't see that people are going to be randomly
> using it to disable the CEC.

For me, "enable" is useful to make mcelog work like before. Please see the
other email from me for all the details.

For debugfs, I believe many productions mount it, as tracing still uses debugfs
rather than tracefs (at least on CentOS7), and rasdaemon also uses trace
events to collect different errors.

Therefore, I believe your patch is fine as it is.

Thanks.


Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time

2019-04-19 Thread Cong Wang
On Thu, Apr 18, 2019 at 5:26 PM Borislav Petkov  wrote:
>
> Now, if any of that above still doesn't make it clear, please state what
> you're trying to achieve and I'll try to help.

Sorry that I misled you to believe we don't even enable
CONFIG_X86_MCELOG_LEGACY. Here is what we have and
what we have tried:

1. We have CONFIG_X86_MCELOG_LEGACY=y

2. We also have CONFIG_RAS=y and CONFIG_RAS_CEC=y

3. mcelog started as a daemon successfully, like before

4. Some real correctable memory errors happened, as logged in
dmesg

5. mcelog couldn't receive any of them, reported 0 errors

6. Admin's complained to us as they believe this is a kernel bug

7. We dug into kernel source code and found out CONFIG_RAS
hijacks all these errors, by stopping there in the notification chain:

static int mce_first_notifier(struct notifier_block *nb, unsigned long val,
  void *data)
{
struct mce *m = (struct mce *)data;

if (!m)
return NOTIFY_DONE;

if (cec_add_mce(m))
return NOTIFY_STOP; // <=== Returns and stops here

/* Emit the trace record: */
trace_mce_record(m);

set_bit(0, _need_notify);

mce_notify_irq(); // <=== There is where MCELOG receives

return NOTIFY_DONE;
}


8. I noticed rasdaemon, and tried to start it instead of mcelog.

9. I injected some memory error and could successfully read them
via ras-mc-ctl.

To demonstrate what I think we should have, here is the PoC code
ONLY to show the idea (please don't judge it):


@ -567,12 +567,12 @@ static int mce_first_notifier(struct
notifier_block *nb, unsigned long val,
  void *data)
 {
struct mce *m = (struct mce *)data;
+   bool consumed;

if (!m)
return NOTIFY_DONE;

-   if (cec_add_mce(m))
-   return NOTIFY_STOP;
+   consumed = cec_add_mce(m);

/* Emit the trace record: */
trace_mce_record(m);
@@ -581,7 +581,7 @@ static int mce_first_notifier(struct
notifier_block *nb, unsigned long val,

mce_notify_irq();

-   return NOTIFY_DONE;
+   return consumed ? NOTIFY_STOP : NOTIFY_DONE;
 }

With this change, although not even compiled, mcelog should still
receive correctable memory errors like before, even when we have
CONFIG_RAS_CEC=y.

Does this make any sense to you?

Thanks!


Re: [PATCH v20 00/28] Intel SGX1 support

2019-04-19 Thread Thomas Gleixner
On Fri, 19 Apr 2019, Jethro Beekman wrote:
> On 2019-04-19 14:34, Thomas Gleixner wrote:
> > And how so? You create writeable AND executable memory. That's a nono and
> > you can argue in circles, that's not going to change with any of your
> > proposed changes.
> 
> On 2019-04-19 14:38, Thomas Gleixner wrote:
> > You are working around LSM nothing else and that's just not going to fly.
> 
> Based on your comments, I'm still unsure if we're on the same page with
> regards to what I'm proposing.
> 
> Here's a regular non-SGX flow that LSM would likely prevent:
> 
> mmap(PROT_READ|PROT_WRITE)
> memcpy()
> mmap(PROT_READ|PROT_EXEC) <-- denied by LSM
> 
> Or just something based on regular PT permissions:
> 
> mmap(PROT_READ|PROT_EXEC)
> memcpy() <-- SIGSEGV
> 
> Now, the equivalent for SGX:
> 
> mmap(PROT_READ|PROT_WRITE)
> ioctl(EADD)
> mmap(PROT_READ|PROT_EXEC) <-- denied by LSM

This is completely irrelevant, really.

The point is that the SGX driver loads and executes arbitrary data which is
handed in from user space via an ioctl w/o any chance of verifying where
that comes from.

What Andy proposed is to open a file with the SGX payload and hand in the
file descriptor. That way LSM can decide whether this is allowed or denied
based on the file descriptor and whatever the security model/policy is in a
particular setup.

Right know the SGX driver and its proposed API prevent any form of LSM
auditing and whatever permission checks you had in mind won't change that
at all.

Thanks,

tglx


Re: [PATCH] KVM: nVMX: allow tests to use bad virtual-APIC page address

2019-04-19 Thread Sean Christopherson
On Mon, Apr 15, 2019 at 03:35:33PM +0200, Paolo Bonzini wrote:
> As mentioned in the comment, there are some special cases where we can simply
> clear the TPR shadow bit from the CPU-based execution controls in the vmcs02.
> Handle them so that we can remove some XFAILs from vmx.flat.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  arch/x86/kvm/vmx/nested.c | 25 -
>  arch/x86/kvm/vmx/vmx.c|  2 +-
>  arch/x86/kvm/vmx/vmx.h|  2 ++
>  3 files changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 7ec9bb1dd723..a22af5a85540 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2873,20 +2873,27 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu 
> *vcpu)
>   /*
>* If translation failed, VM entry will fail because
>* prepare_vmcs02 set VIRTUAL_APIC_PAGE_ADDR to -1ull.
> -  * Failing the vm entry is _not_ what the processor
> -  * does but it's basically the only possibility we
> -  * have.  We could still enter the guest if CR8 load
> -  * exits are enabled, CR8 store exits are enabled, and
> -  * virtualize APIC access is disabled; in this case
> -  * the processor would never use the TPR shadow and we
> -  * could simply clear the bit from the execution
> -  * control.  But such a configuration is useless, so
> -  * let's keep the code simple.
>*/
>   if (!is_error_page(page)) {
>   vmx->nested.virtual_apic_page = page;
>   hpa = page_to_phys(vmx->nested.virtual_apic_page);
>   vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, hpa);
> + } else if (nested_cpu_has(vmcs12, CPU_BASED_CR8_LOAD_EXITING) &&
> +nested_cpu_has(vmcs12, CPU_BASED_CR8_STORE_EXITING) 
> &&
> +!nested_cpu_has2(vmcs12, 
> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) {
> + /*
> +  * The processor will never use the TPR shadow, simply
> +  * clear the bit from the execution control.  Such a
> +  * configuration is useless, but it happens in tests.
> +  * For any other configuration, failing the vm entry is
> +  * _not_ what the processor does but it's basically the
> +  * only possibility we have.
> +  */
> + vmcs_clear_bits(CPU_BASED_VM_EXEC_CONTROL,
> + CPU_BASED_TPR_SHADOW);
> + } else {
> + printk("bad virtual-APIC page address\n");
> + dump_vmcs();

I don't think we should dump the VMCS here, or have any form of print
at all.  dump_vmcs() is especially bad as it allows userspace to spam
the kernel log at the error level.  I haven't actually checked, but I
assume the nested consistency check tracing patch would be a better
way to debug something of this nature in production.

>   }
>   }
>  
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index f8054dc1de65..14cacfd7ffd0 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -5603,7 +5603,7 @@ static void vmx_dump_dtsel(char *name, uint32_t limit)
>  vmcs_readl(limit + GUEST_GDTR_BASE - GUEST_GDTR_LIMIT));
>  }
>  
> -static void dump_vmcs(void)
> +void dump_vmcs(void)
>  {
>   u32 vmentry_ctl = vmcs_read32(VM_ENTRY_CONTROLS);
>   u32 vmexit_ctl = vmcs_read32(VM_EXIT_CONTROLS);
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index a1e00d0a2482..f879529906b4 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -517,4 +517,6 @@ static inline void decache_tsc_multiplier(struct vcpu_vmx 
> *vmx)
>   vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
>  }
>  
> +void dump_vmcs(void);
> +
>  #endif /* __KVM_X86_VMX_H */
> -- 
> 1.8.3.1
> 


Re: [PATCH 4.19 000/110] 4.19.36-stable review

2019-04-19 Thread Naresh Kamboju
On Thu, 18 Apr 2019 at 23:29, Greg Kroah-Hartman
 wrote:
>
> This is the start of the stable review cycle for the 4.19.36 release.
> There are 110 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sat Apr 20 16:03:29 UTC 2019.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.36-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.19.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Summary


kernel: 4.19.36-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.19.y
git commit: e409a9a7917997e4aa9b49cd5c445208696ee1cf
git describe: v4.19.35-111-ge409a9a79179
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.19-oe/build/v4.19.35-111-ge409a9a79179


No regressions (compared to build v4.19.34-102-ge4f859c2cd83)


No fixes (compared to build v4.19.34-102-ge4f859c2cd83)

Ran 23730 total tests in the following environments and test suites.

Environments
--
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
---
* install-android-platform-tools-r2600
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* perf
* spectre-meltdown-checker-test
* v4l2-compliance
* kvm-unit-tests
* ltp-open-posix-tests
* kselftest-vsyscall-mode-native

-- 
Linaro LKFT
https://lkft.linaro.org


Re: [PATCH 5.0 00/93] 5.0.9-stable review

2019-04-19 Thread Naresh Kamboju
On Thu, 18 Apr 2019 at 23:40, Greg Kroah-Hartman
 wrote:
>
> This is the start of the stable review cycle for the 5.0.9 release.
> There are 93 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sat Apr 20 16:03:33 UTC 2019.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.0.9-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-5.0.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Summary


kernel: 5.0.9-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-5.0.y
git commit: a6b9c7d1cb06316cd1cea76371029603b6459de8
git describe: v5.0.8-94-ga6b9c7d1cb06
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-5.0-oe/build/v5.0.8-94-ga6b9c7d1cb06

No regressions (compared to build v5.0.7-118-ge6fdfdbfcc6a)

No fixes (compared to build v5.0.7-118-ge6fdfdbfcc6a)

Ran 23788 total tests in the following environments and test suites.

Environments
--
- dragonboard-410c
- hi6220-hikey
- i386
- juno-r2
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15
- x86

Test Suites
---
* install-android-platform-tools-r2600
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* perf
* spectre-meltdown-checker-test
* v4l2-compliance
* kvm-unit-tests
* ltp-fs-tests
* ltp-open-posix-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none

-- 
Linaro LKFT
https://lkft.linaro.org


Re: mmotm 2019-04-19-14-53 uploaded (objtool)

2019-04-19 Thread Randy Dunlap
On 4/19/19 2:53 PM, a...@linux-foundation.org wrote:
> The mm-of-the-moment snapshot 2019-04-19-14-53 has been uploaded to
> 
>http://www.ozlabs.org/~akpm/mmotm/
> 
> mmotm-readme.txt says
> 
> README for mm-of-the-moment:
> 
> http://www.ozlabs.org/~akpm/mmotm/
> 
> This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
> more than once a week.

on x86_64:

  CC  lib/strncpy_from_user.o
lib/strncpy_from_user.o: warning: objtool: strncpy_from_user()+0x315: call to 
__ubsan_handle_add_overflow() with UACCESS enabled
  CC  lib/strnlen_user.o
lib/strnlen_user.o: warning: objtool: strnlen_user()+0x337: call to 
__ubsan_handle_sub_overflow() with UACCESS enabled



obj files are attached.


-- 
~Randy


strncpy_from_user.o
Description: application/object


strnlen_user.o
Description: application/object


[PATCH] power_supply: platform/chrome: wilco_ec: fix ptr_ret.cocci warnings

2019-04-19 Thread kbuild test robot
From: kbuild test robot 

drivers/power/supply/wilco-charger.c:173:1-3: WARNING: PTR_ERR_OR_ZERO can be 
used


 Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

Fixes: 2600e89ec94e ("power_supply: platform/chrome: wilco_ec: Add charging 
config driver")
CC: Nick Crews 
Signed-off-by: kbuild test robot 
---

url:
https://github.com/0day-ci/linux/commits/Nick-Crews/power_supply-platform-chrome-wilco_ec-Add-charging-config-driver/20190420-041913

 wilco-charger.c |5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/drivers/power/supply/wilco-charger.c
+++ b/drivers/power/supply/wilco-charger.c
@@ -170,10 +170,7 @@ static int wilco_charge_probe(struct pla
 
psy_cfg.drv_data = ec;
psy = devm_power_supply_register(>dev, _ps_desc, _cfg);
-   if (IS_ERR(psy))
-   return PTR_ERR(psy);
-
-   return 0;
+   return PTR_ERR_OR_ZERO(psy);
 }
 
 static struct platform_driver wilco_charge_driver = {


[PATCH 3/3] PCI: use pci_*(), dev_*() and pr_*() instead of pci_printk()

2019-04-19 Thread Mohan Kumar
Replace pci_printk with pci_*, dev_*() and pr_*() to avoid checkpatch
warnings.

Signed-off-by: Mohan Kumar 
---
 drivers/pci/bus.c  |  3 +--
 drivers/pci/pci.c  | 14 +++---
 drivers/pci/probe.c| 21 +
 drivers/pci/setup-bus.c| 15 +++
 drivers/pci/xen-pcifront.c |  7 +++
 5 files changed, 27 insertions(+), 33 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index a742ef5..25e91ae 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -291,8 +291,7 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx)
res->end = end;
res->flags &= ~IORESOURCE_UNSET;
orig_res.flags &= ~IORESOURCE_UNSET;
-   pci_printk(KERN_DEBUG, dev, "%pR clipped to %pR\n",
-_res, res);
+   pci_info(dev, "%pR clipped to %pR\n", _res, res);
 
return true;
}
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index a24a8bc..d439151 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2789,14 +2789,14 @@ void pci_pm_init(struct pci_dev *dev)
dev->d2_support = true;
 
if (dev->d1_support || dev->d2_support)
-   pci_printk(KERN_DEBUG, dev, "supports%s%s\n",
+   pci_info(dev, "supports%s%s\n",
   dev->d1_support ? " D1" : "",
   dev->d2_support ? " D2" : "");
}
 
pmc &= PCI_PM_CAP_PME_MASK;
if (pmc) {
-   pci_printk(KERN_DEBUG, dev, "PME# supported from%s%s%s%s%s\n",
+   pci_info(dev, "PME# supported from%s%s%s%s%s\n",
 (pmc & PCI_PM_CAP_PME_D0) ? " D0" : "",
 (pmc & PCI_PM_CAP_PME_D1) ? " D1" : "",
 (pmc & PCI_PM_CAP_PME_D2) ? " D2" : "",
@@ -2964,16 +2964,16 @@ static int pci_ea_read(struct pci_dev *dev, int offset)
res->flags = flags;
 
if (bei <= PCI_EA_BEI_BAR5)
-   pci_printk(KERN_DEBUG, dev, "BAR %d: %pR (from Enhanced 
Allocation, properties %#02x)\n",
+   pci_info(dev, "BAR %d: %pR (from Enhanced Allocation, 
properties %#02x)\n",
   bei, res, prop);
else if (bei == PCI_EA_BEI_ROM)
-   pci_printk(KERN_DEBUG, dev, "ROM: %pR (from Enhanced 
Allocation, properties %#02x)\n",
+   pci_info(dev, "ROM: %pR (from Enhanced Allocation, properties 
%#02x)\n",
   res, prop);
else if (bei >= PCI_EA_BEI_VF_BAR0 && bei <= PCI_EA_BEI_VF_BAR5)
-   pci_printk(KERN_DEBUG, dev, "VF BAR %d: %pR (from Enhanced 
Allocation, properties %#02x)\n",
+   pci_info(dev, "VF BAR %d: %pR (from Enhanced Allocation, 
properties %#02x)\n",
   bei - PCI_EA_BEI_VF_BAR0, res, prop);
else
-   pci_printk(KERN_DEBUG, dev, "BEI %d res: %pR (from Enhanced 
Allocation, properties %#02x)\n",
+   pci_info(dev, "BEI %d res: %pR (from Enhanced Allocation, 
properties %#02x)\n",
   bei, res, prop);
 
 out:
@@ -4200,7 +4200,7 @@ int pci_set_cacheline_size(struct pci_dev *dev)
if (cacheline_size == pci_cache_line_size)
return 0;
 
-   pci_printk(KERN_DEBUG, dev, "cache line size of %d is not supported\n",
+   pci_info(dev, "cache line size of %d is not supported\n",
   pci_cache_line_size << 2);
 
return -EINVAL;
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index eea7847..89fdfd4 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -317,7 +317,7 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type 
type,
res->flags = 0;
 out:
if (res->flags)
-   pci_printk(KERN_DEBUG, dev, "reg 0x%x: %pR\n", pos, res);
+   pci_info(dev, "reg 0x%x: %pR\n", pos, res);
 
return (res->flags & IORESOURCE_MEM_64) ? 1 : 0;
 }
@@ -435,7 +435,7 @@ static void pci_read_bridge_io(struct pci_bus *child)
region.start = base;
region.end = limit + io_granularity - 1;
pcibios_bus_to_resource(dev->bus, res, );
-   pci_printk(KERN_DEBUG, dev, "  bridge window %pR\n", res);
+   pci_info(dev, "  bridge window %pR\n", res);
}
 }
 
@@ -457,7 +457,7 @@ static void pci_read_bridge_mmio(struct pci_bus *child)
region.start = base;
region.end = limit + 0xf;
pcibios_bus_to_resource(dev->bus, res, );
-   pci_printk(KERN_DEBUG, dev, "  bridge window %pR\n", res);
+   pci_info(dev, "  bridge window %pR\n", res);
}
 }
 
@@ -510,7 +510,7 @@ static void pci_read_bridge_mmio_pref(struct pci_bus *child)
region.start = base;
region.end = limit + 0xf;
pcibios_bus_to_resource(dev->bus, 

[PATCH 2/3] PCI: Define pr_fmt() and use pr_*() instead of printk()

2019-04-19 Thread Mohan Kumar
Define a pr_fmt() macro that convert all of the explicit printk() calls
into corresponding pr_*().

Signed-off-by: Mohan Kumar 
---
 drivers/pci/bus.c   |  5 -
 drivers/pci/pci-stub.c  | 11 +--
 drivers/pci/pci-sysfs.c |  3 ++-
 drivers/pci/pci.c   |  5 +++--
 drivers/pci/quirks.c|  9 +
 drivers/pci/setup-bus.c |  5 +++--
 drivers/pci/slot.c  |  4 +++-
 7 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 5cb40b2..a742ef5 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -6,6 +6,9 @@
  * David Miller (da...@redhat.com)
  * Ivan Kokshaysky (i...@jurassic.park.msu.ru)
  */
+
+#define pr_fmt(fmt) "PCI: " fmt
+
 #include 
 #include 
 #include 
@@ -23,7 +26,7 @@ void pci_add_resource_offset(struct list_head *resources, 
struct resource *res,
 
entry = resource_list_create_entry(res, 0);
if (!entry) {
-   printk(KERN_ERR "PCI: can't add host bridge window %pR\n", res);
+   pr_err("can't add host bridge window %pR\n", res);
return;
}
 
diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
index 66f8a59..f946bf9 100644
--- a/drivers/pci/pci-stub.c
+++ b/drivers/pci/pci-stub.c
@@ -16,6 +16,8 @@
  * .../:00:19.0/driver -> ../../../bus/pci/drivers/pci-stub
  */
 
+#define pr_fmt(fmt) "pci-stub: " fmt
+
 #include 
 #include 
 
@@ -66,20 +68,17 @@ static int __init pci_stub_init(void)
, _mask);
 
if (fields < 2) {
-   printk(KERN_WARNING
-  "pci-stub: invalid id string \"%s\"\n", id);
+   pr_warn("invalid id string \"%s\"\n", id);
continue;
}
 
-   printk(KERN_INFO
-  "pci-stub: add %04X:%04X sub=%04X:%04X cls=%08X/%08X\n",
+   pr_info("add %04X:%04X sub=%04X:%04X cls=%08X/%08X\n",
   vendor, device, subvendor, subdevice, class, class_mask);
 
rc = pci_add_dynid(_driver, vendor, device,
   subvendor, subdevice, class, class_mask, 0);
if (rc)
-   printk(KERN_WARNING
-  "pci-stub: failed to add dynamic id (%d)\n", rc);
+   pr_warn("failed to add dynamic id (%d)\n", rc);
}
 
return 0;
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 25794c2..455bf29 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -12,6 +12,7 @@
  * Modeled after usb's driverfs.c
  */
 
+#define pr_fmt(fmt) "pci: warning: " fmt
 
 #include 
 #include 
@@ -,7 +1112,7 @@ void pci_create_legacy_files(struct pci_bus *b)
kfree(b->legacy_io);
b->legacy_io = NULL;
 kzalloc_err:
-   printk(KERN_WARNING "pci: warning: could not create legacy I/O port and 
ISA memory resources to sysfs\n");
+   pr_warn("could not create legacy I/O port and ISA memory resources to 
sysfs\n");
return;
 }
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f5ff01d..a24a8bc 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -8,6 +8,8 @@
  * Copyright 1997 -- 2000 Martin Mares 
  */
 
+#define pr_fmt(fmt) "PCI: " fmt
+
 #include 
 #include 
 #include 
@@ -6276,8 +6278,7 @@ static int __init pci_setup(char *str)
} else if (!strncmp(str, "disable_acs_redir=", 18)) {
disable_acs_redir_param = str + 18;
} else {
-   printk(KERN_ERR "PCI: Unknown option `%s'\n",
-   str);
+   pr_err("Unknown option `%s'\n", str);
}
}
str = k;
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 06af0c3..fb802d1 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -12,6 +12,8 @@
  * file, where their drivers can use them.
  */
 
+#define pr_fmt(fmt) "PCI: " fmt
+
 #include 
 #include 
 #include 
@@ -159,8 +161,7 @@ static int __init pci_apply_final_quirks(void)
u8 tmp;
 
if (pci_cache_line_size)
-   printk(KERN_DEBUG "PCI: CLS %u bytes\n",
-  pci_cache_line_size << 2);
+   pr_info("CLS %u bytes\n", pci_cache_line_size << 2);
 
pci_apply_fixup_final_quirks = true;
for_each_pci_dev(dev) {
@@ -177,7 +178,7 @@ static int __init pci_apply_final_quirks(void)
if (!tmp || cls == tmp)
continue;
 
-   printk(KERN_DEBUG "PCI: CLS mismatch (%u != %u), using 
%u bytes\n",
+   pr_info("CLS mismatch (%u != %u), using %u bytes\n",
   cls << 2, tmp << 2,
   pci_dfl_cache_line_size << 2);

[PATCH 1/3] PCI: Cleanup printk logging

2019-04-19 Thread Mohan Kumar
Replace printk with pr_* to avoid checkpatch warnings.

Signed-off-by: Mohan Kumar 
---
 drivers/pci/pci-acpi.c | 11 ---
 drivers/pci/quirks.c   |  2 +-
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index e1949f7..3ada026 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -140,8 +140,7 @@ static acpi_status decode_type0_hpx_record(union 
acpi_object *record,
hpx->t0->enable_perr = fields[5].integer.value;
break;
default:
-   printk(KERN_WARNING
-  "%s: Type 0 Revision %d record not supported\n",
+   pr_warn("%s: Type 0 Revision %d record not supported\n",
   __func__, revision);
return AE_ERROR;
}
@@ -169,8 +168,7 @@ static acpi_status decode_type1_hpx_record(union 
acpi_object *record,
hpx->t1->tot_max_split = fields[4].integer.value;
break;
default:
-   printk(KERN_WARNING
-  "%s: Type 1 Revision %d record not supported\n",
+   pr_warn("%s: Type 1 Revision %d record not supported\n",
   __func__, revision);
return AE_ERROR;
}
@@ -211,8 +209,7 @@ static acpi_status decode_type2_hpx_record(union 
acpi_object *record,
hpx->t2->sec_unc_err_mask_or   = fields[17].integer.value;
break;
default:
-   printk(KERN_WARNING
-  "%s: Type 2 Revision %d record not supported\n",
+   pr_warn("%s: Type 2 Revision %d record not supported\n",
   __func__, revision);
return AE_ERROR;
}
@@ -272,7 +269,7 @@ static acpi_status acpi_run_hpx(acpi_handle handle, struct 
hotplug_params *hpx)
goto exit;
break;
default:
-   printk(KERN_ERR "%s: Type %d record not supported\n",
+   pr_err("%s: Type %d record not supported\n",
   __func__, type);
status = AE_ERROR;
goto exit;
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index f9cd4d4..06af0c3 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2613,7 +2613,7 @@ static void nvbridge_check_legacy_irq_routing(struct 
pci_dev *dev)
pci_read_config_dword(dev, 0x74, );
 
if (cfg & ((1 << 2) | (1 << 15))) {
-   printk(KERN_INFO "Rewriting IRQ routing register on MCP55\n");
+   pr_info("Rewriting IRQ routing register on MCP55\n");
cfg &= ~((1 << 2) | (1 << 15));
pci_write_config_dword(dev, 0x74, cfg);
}
-- 
2.7.4



Re: [PATCH] random: Document get_random_int() family

2019-04-19 Thread Theodore Ts'o
On Tue, Mar 26, 2019 at 04:26:54AM +, George Spelvin wrote:
> Explain what these functions are for and when they offer
> an advantage over get_random_bytes().
> 
> (We still need documentation on rng_is_initialized(), the
> random_ready_callback system, and early boot in general.)
> 
> Signed-off-by: George Spelvin 

Applied, thanks.

- Ted


Re: [PATCH] lib: scatterlist: Fix to support no mapped sg

2019-04-19 Thread Zhou Wang
On 2019/4/19 18:43, Robert Jarzmik wrote:
> Zhou Wang  writes:
> 
>> In function sg_split, the second sg_calculate_split will return -EINVAL
>> when in_mapped_nents is 0.
>>
>> Indeed there is no need to do second sg_calculate_split and sg_split_mapped
>> when in_mapped_nents is 0, as in_mapped_nents indicates no mapped entry in
>> original sgl.
>>
>> Signed-off-by: Zhou Wang 
> That's right. May I know what is the usecase which led you to this fix ?

I posted a patchset to support HiSilicon compression engine:
https://lkml.org/lkml/2019/1/23/358. As Herber Xu suggested, it should use
new crypto acomp API which uses sgl as data input and output. However, our
hardware can not handle compress head for input/output data, so I need split
input/output sgl firstly which is not mapped sgl.

New version of HiSilicon compression engine driver has not been posted to
community yet. I will do this later.

> 
> Acked-by: Robert Jarzmik 

Many thanks for your review,
Zhou

> 
> --
> Robert
> 
> .
> 



Re: [PATCH -next] sched/fair: Make some functions static

2019-04-19 Thread YueHaibing
Pls ignore this.

On 2019/3/22 22:30, Yue Haibing wrote:
> From: YueHaibing 
> 
> Fix sparse warnings:
> 
> kernel/sched/fair.c:2596:6: warning: symbol 'task_tick_numa' was not 
> declared. Should it be static?
> kernel/sched/fair.c:3570:6: warning: symbol 'sync_entity_load_avg' was not 
> declared. Should it be static?
> kernel/sched/fair.c:3583:6: warning: symbol 'remove_entity_load_avg' was not 
> declared. Should it be static?
> 
> Signed-off-by: YueHaibing 
> ---
>  kernel/sched/fair.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fdab7eb..18fb781 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2593,7 +2593,7 @@ void task_numa_work(struct callback_head *work)
>  /*
>   * Drive the periodic memory faults..
>   */
> -void task_tick_numa(struct rq *rq, struct task_struct *curr)
> +static void task_tick_numa(struct rq *rq, struct task_struct *curr)
>  {
>   struct callback_head *work = >numa_work;
>   u64 period, now;
> @@ -3567,7 +3567,7 @@ static inline u64 cfs_rq_last_update_time(struct cfs_rq 
> *cfs_rq)
>   * Synchronize entity load avg of dequeued entity without locking
>   * the previous rq.
>   */
> -void sync_entity_load_avg(struct sched_entity *se)
> +static void sync_entity_load_avg(struct sched_entity *se)
>  {
>   struct cfs_rq *cfs_rq = cfs_rq_of(se);
>   u64 last_update_time;
> @@ -3580,7 +3580,7 @@ void sync_entity_load_avg(struct sched_entity *se)
>   * Task first catches up with cfs_rq, and then subtract
>   * itself from the cfs_rq (task must be off the queue now).
>   */
> -void remove_entity_load_avg(struct sched_entity *se)
> +static void remove_entity_load_avg(struct sched_entity *se)
>  {
>   struct cfs_rq *cfs_rq = cfs_rq_of(se);
>   unsigned long flags;
> 



Re: [PATCH] random: fix CRNG initialization when random.trust_cpu=1

2019-04-19 Thread Theodore Ts'o
On Tue, Mar 19, 2019 at 01:28:47PM -0400, Jon DeVree wrote:
> When the system boots with random.trust_cpu=1 it doesn't initialize the
> per-NUMA CRNGs because it skips the rest of the CRNG startup code. This
> means that the code from 1e7f583af67b ("random: make /dev/urandom scalable
> for silly userspace programs") is not used when random.trust_cpu=1.
> 
> crash> dmesg | grep random:
> [0.00] random: get_random_bytes called from start_kernel+0x94/0x530 
> with crng_init=0
> [0.314029] random: crng done (trusting CPU's manufacturer)
> crash> print crng_node_pool
> $6 = (struct crng_state **) 0x0
> 
> After adding the missing call to numa_crng_init() the per-NUMA CRNGs are
> initialized again:
> 
> crash> dmesg | grep random:
> [0.00] random: get_random_bytes called from start_kernel+0x94/0x530 
> with crng_init=0
> [0.314031] random: crng done (trusting CPU's manufacturer)
> crash> print crng_node_pool
> $1 = (struct crng_state **) 0x9a915f4014a0
> 
> The call to invalidate_batched_entropy() was also missing. This is
> important for architectures like PPC and S390 which only have the
> arch_get_random_seed_* functions.
> 
> Fixes: 39a8883a2b98 ("random: add a config option to trust the CPU's hwrng")
> Signed-off-by: Jon DeVree 

Thanks, applied.

- Ted


Re: [PATCH -next] regulator: Make symbols static

2019-04-19 Thread YueHaibing
On 2019/4/20 0:12, Mark Brown wrote:
> On Tue, Apr 16, 2019 at 10:41:09PM +0800, Yue Haibing wrote:
>> From: YueHaibing 
>>
>> Fix sparse warnings:
>>
>> drivers/regulator/stm32-pwr.c:35:5: warning:
>>  symbol 'ready_mask_table' was not declared. Should it be static?
>> drivers/regulator/stm32-pwr.c:47:5: warning:
>>  symbol 'stm32_pwr_reg_is_ready' was not declared. Should it be static?
>> drivers/regulator/stm32-pwr.c:57:5: warning:
>>  symbol 'stm32_pwr_reg_is_enabled' was not declared. Should it be static?
> 
> This doesn't apply against current code, please check and resend (I
> think it's just that someone else made similar fixes already but didn't
> check properly).

Yep, there is already a fix from kbuild test robot:

82f26185a912 ("regulator: ready_mask_table[] can be static")

> 



Re: [PATCH] Staging: vc04_services: Cleanup in ctrl_set_bitrate()

2019-04-19 Thread Madhumitha Prabakaran
On 04/19  :49, Stefan Wahren wrote:
> Hi Madhumitha,
> 
> Am 19.04.19 um 23:23 schrieb Madhumitha Prabakaran:
> > Remove unnecessary variable and replace return type.
> >
> > Issue suggested by Coccinelle.
> >
> > Signed-off-by: Madhumitha Prabakaran 
> > ---
> >  drivers/staging/vc04_services/bcm2835-camera/controls.c | 7 +--
> >  1 file changed, 1 insertion(+), 6 deletions(-)
> >
> > diff --git a/drivers/staging/vc04_services/bcm2835-camera/controls.c 
> > b/drivers/staging/vc04_services/bcm2835-camera/controls.c
> > index e39ab5fae724..f87fa7f61db3 100644
> > --- a/drivers/staging/vc04_services/bcm2835-camera/controls.c
> > +++ b/drivers/staging/vc04_services/bcm2835-camera/controls.c
> > @@ -607,18 +607,13 @@ static int ctrl_set_bitrate(struct bm2835_mmal_dev 
> > *dev,
> > struct v4l2_ctrl *ctrl,
> > const struct bm2835_mmal_v4l2_ctrl *mmal_ctrl)
> >  {
> > -   int ret;
> > struct vchiq_mmal_port *encoder_out;
> >  
> > dev->capture.encode_bitrate = ctrl->val;
> >  
> > encoder_out = >component[MMAL_COMPONENT_VIDEO_ENCODE]->output[0];
> >  
> > -   ret = vchiq_mmal_port_parameter_set(dev->instance, encoder_out,
> > -   mmal_ctrl->mmal_id,
> > -   >val, sizeof(ctrl->val));
> > -   ret = 0;
> > -   return ret;
> > +   return 0;
> 
> i don't understand why we can remove here the function call.

I overlooked the function call. I will fix it.

> 
> Stefan
> 
> >  }
> >  
> >  static int ctrl_set_bitrate_mode(struct bm2835_mmal_dev *dev,


Re: [PATCH] tracing: Fix a memory leak bug

2019-04-19 Thread Wenwen Wang
On Fri, Apr 19, 2019 at 9:37 PM Steven Rostedt  wrote:
>
> On Fri, 19 Apr 2019 21:22:59 -0500
> Wenwen Wang  wrote:
>
> > In trace_pid_write(), the buffer for trace parser is allocated through
> > kmalloc() in trace_parser_get_init(). Later on, after the buffer is used,
> > it is then freed through kfree() in trace_parser_put(). However, it is
> > possible that trace_pid_write() is terminated due to unexpected errors,
> > e.g., ENOMEM. In that case, the allocated buffer will not be freed, which
> > is a memory leak bug.
> >
> > To fix this issue, free the allocated buffer when an error is encountered.
>
> Thanks for the patch. Did you find this through manual inspection,
> running KASAN or via one of the static analyzers?

Thanks for your question, Steve. It was based on a prototype of a
research project, which aims to statically detect memory leak bugs in
operating system kernels.

Wenwen


[PATCH] regulator: sy8106a: Get rid of struct sy8106a

2019-04-19 Thread Axel Lin
All the fields in struct sy8106a are only used in sy8106a_i2c_probe(), so
use local variables instead.

Signed-off-by: Axel Lin 
---
 drivers/regulator/sy8106a-regulator.c | 40 +--
 1 file changed, 13 insertions(+), 27 deletions(-)

diff --git a/drivers/regulator/sy8106a-regulator.c 
b/drivers/regulator/sy8106a-regulator.c
index 65fbd1f0b612..42e03b2c10a0 100644
--- a/drivers/regulator/sy8106a-regulator.c
+++ b/drivers/regulator/sy8106a-regulator.c
@@ -22,12 +22,6 @@
  */
 #define SY8106A_GO_BIT BIT(7)
 
-struct sy8106a {
-   struct regulator_dev *rdev;
-   struct regmap *regmap;
-   u32 fixed_voltage;
-};
-
 static const struct regmap_config sy8106a_regmap_config = {
.reg_bits = 8,
.val_bits = 8,
@@ -70,36 +64,32 @@ static const struct regulator_desc sy8106a_reg = {
 static int sy8106a_i2c_probe(struct i2c_client *i2c,
const struct i2c_device_id *id)
 {
-   struct sy8106a *chip;
struct device *dev = >dev;
-   struct regulator_dev *rdev = NULL;
+   struct regulator_dev *rdev;
struct regulator_config config = { };
+   struct regmap *regmap;
unsigned int reg, vsel;
+   u32 fixed_voltage;
int error;
 
-   chip = devm_kzalloc(>dev, sizeof(struct sy8106a), GFP_KERNEL);
-   if (!chip)
-   return -ENOMEM;
-
error = of_property_read_u32(dev->of_node, "silergy,fixed-microvolt",
->fixed_voltage);
+_voltage);
if (error)
return error;
 
-   if (chip->fixed_voltage < SY8106A_MIN_MV * 1000 ||
-   chip->fixed_voltage > SY8106A_MAX_MV * 1000)
+   if (fixed_voltage < SY8106A_MIN_MV * 1000 ||
+   fixed_voltage > SY8106A_MAX_MV * 1000)
return -EINVAL;
 
-   chip->regmap = devm_regmap_init_i2c(i2c, _regmap_config);
-   if (IS_ERR(chip->regmap)) {
-   error = PTR_ERR(chip->regmap);
+   regmap = devm_regmap_init_i2c(i2c, _regmap_config);
+   if (IS_ERR(regmap)) {
+   error = PTR_ERR(regmap);
dev_err(dev, "Failed to allocate register map: %d\n", error);
return error;
}
 
config.dev = >dev;
-   config.regmap = chip->regmap;
-   config.driver_data = chip;
+   config.regmap = regmap;
 
config.of_node = dev->of_node;
config.init_data = of_get_regulator_init_data(dev, dev->of_node,
@@ -109,15 +99,15 @@ static int sy8106a_i2c_probe(struct i2c_client *i2c,
return -ENOMEM;
 
/* Ensure GO_BIT is enabled when probing */
-   error = regmap_read(chip->regmap, SY8106A_REG_VOUT1_SEL, );
+   error = regmap_read(regmap, SY8106A_REG_VOUT1_SEL, );
if (error)
return error;
 
if (!(reg & SY8106A_GO_BIT)) {
-   vsel = (chip->fixed_voltage / 1000 - SY8106A_MIN_MV) /
+   vsel = (fixed_voltage / 1000 - SY8106A_MIN_MV) /
   SY8106A_STEP_MV;
 
-   error = regmap_write(chip->regmap, SY8106A_REG_VOUT1_SEL,
+   error = regmap_write(regmap, SY8106A_REG_VOUT1_SEL,
 vsel | SY8106A_GO_BIT);
if (error)
return error;
@@ -131,10 +121,6 @@ static int sy8106a_i2c_probe(struct i2c_client *i2c,
return error;
}
 
-   chip->rdev = rdev;
-
-   i2c_set_clientdata(i2c, chip);
-
return 0;
 }
 
-- 
2.17.1



Re: [RFC PATCH 16/62] f2fs: switch to ->free_inode()

2019-04-19 Thread Chao Yu
On 2019/4/17 1:52, Al Viro wrote:
> From: Al Viro 
> 
> Signed-off-by: Al Viro 

Acked-by: Chao Yu 

Thanks,


Re: [PATCH] tracing: Fix a memory leak bug

2019-04-19 Thread Steven Rostedt
On Fri, 19 Apr 2019 21:22:59 -0500
Wenwen Wang  wrote:

> In trace_pid_write(), the buffer for trace parser is allocated through
> kmalloc() in trace_parser_get_init(). Later on, after the buffer is used,
> it is then freed through kfree() in trace_parser_put(). However, it is
> possible that trace_pid_write() is terminated due to unexpected errors,
> e.g., ENOMEM. In that case, the allocated buffer will not be freed, which
> is a memory leak bug.
> 
> To fix this issue, free the allocated buffer when an error is encountered.

Thanks for the patch. Did you find this through manual inspection,
running KASAN or via one of the static analyzers?

-- Steve

> 
> Signed-off-by: Wenwen Wang 
> ---
>  kernel/trace/trace.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 6c24755..fd12c9c 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -496,8 +496,10 @@ int trace_pid_write(struct trace_pid_list
> *filtered_pids,
>* not modified.
>*/
>   pid_list = kmalloc(sizeof(*pid_list), GFP_KERNEL);
> - if (!pid_list)
> + if (!pid_list) {
> + trace_parser_put();
>   return -ENOMEM;
> + }
>  
>   pid_list->pid_max = READ_ONCE(pid_max);
>  
> @@ -507,6 +509,7 @@ int trace_pid_write(struct trace_pid_list
> *filtered_pids, 
>   pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3);
>   if (!pid_list->pids) {
> + trace_parser_put();
>   kfree(pid_list);
>   return -ENOMEM;
>   }



Re: [PATCH 1/6] ARM: ks8695: watchdog: stop using mach/*.h

2019-04-19 Thread Greg Ungerer

Hi Arnd,

On 16/4/19 6:24 am, Arnd Bergmann wrote:

drivers should not rely on machine specific headers but
get their information from the platform device.

Signed-off-by: Arnd Bergmann 


I like the whole series, thanks for doing this.

I haven't looked at the KS8695 in a long time now. I am not sure
that I have any working hardware - but I will have a look around my lab
and see if I can find something.

I'll get back to you with acks and tested bys soon.

Regards
Greg




---
  arch/arm/mach-ks8695/devices.c | 13 -
  drivers/watchdog/Kconfig   |  2 +-
  drivers/watchdog/ks8695_wdt.c  | 30 +-
  3 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/arch/arm/mach-ks8695/devices.c b/arch/arm/mach-ks8695/devices.c
index 61cf20beb45f..57766817d86f 100644
--- a/arch/arm/mach-ks8695/devices.c
+++ b/arch/arm/mach-ks8695/devices.c
@@ -169,11 +169,22 @@ void __init ks8696_add_device_hpna(void)
  /* 
   *  Watchdog
   *  */
+#define KS8695_TMR_OFFSET  (0xF + 0xE400)
+#define KS8695_TMR_PA  (KS8695_IO_PA + KS8695_TMR_OFFSET)
+static struct resource ks8695_wdt_resources[] = {
+   [0] = {
+   .name   = "tmr",
+   .start  = KS8695_TMR_PA,
+   .end= KS8695_TMR_PA + 0xf,
+   .flags  = IORESOURCE_MEM,
+   },
+};
  
  static struct platform_device ks8695_wdt_device = {

.name   = "ks8695_wdt",
.id = -1,
-   .num_resources  = 0,
+   .resource   = ks8695_wdt_resources,
+   .num_resources  = ARRAY_SIZE(ks8695_wdt_resources),
  };
  
  static void __init ks8695_add_device_watchdog(void)

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 242eea859637..046e01daef57 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -397,7 +397,7 @@ config IXP4XX_WATCHDOG
  
  config KS8695_WATCHDOG

tristate "KS8695 watchdog"
-   depends on ARCH_KS8695
+   depends on ARCH_KS8695 || COMPILE_TEST
help
  Watchdog timer embedded into KS8695 processor. This will reboot your
  system when the timeout is reached.
diff --git a/drivers/watchdog/ks8695_wdt.c b/drivers/watchdog/ks8695_wdt.c
index 1e41818a44bc..87c542c2f912 100644
--- a/drivers/watchdog/ks8695_wdt.c
+++ b/drivers/watchdog/ks8695_wdt.c
@@ -23,10 +23,8 @@
  #include 
  #include 
  #include 
-#include 
  
-#define KS8695_TMR_OFFSET	(0xF + 0xE400)

-#define KS8695_TMR_VA  (KS8695_IO_VA + KS8695_TMR_OFFSET)
+#define KS8695_CLOCK_RATE  2500
  
  /*

   * Timer registers
@@ -57,6 +55,7 @@ MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once 
started (default="
  
  static unsigned long ks8695wdt_busy;

  static DEFINE_SPINLOCK(ks8695_lock);
+static void __iomem *tmr_reg;
  
  /* . */
  
@@ -69,8 +68,8 @@ static inline void ks8695_wdt_stop(void)
  
  	spin_lock(_lock);

/* disable timer0 */
-   tmcon = __raw_readl(KS8695_TMR_VA + KS8695_TMCON);
-   __raw_writel(tmcon & ~TMCON_T0EN, KS8695_TMR_VA + KS8695_TMCON);
+   tmcon = __raw_readl(tmr_reg + KS8695_TMCON);
+   __raw_writel(tmcon & ~TMCON_T0EN, tmr_reg + KS8695_TMCON);
spin_unlock(_lock);
  }
  
@@ -84,15 +83,15 @@ static inline void ks8695_wdt_start(void)
  
  	spin_lock(_lock);

/* disable timer0 */
-   tmcon = __raw_readl(KS8695_TMR_VA + KS8695_TMCON);
-   __raw_writel(tmcon & ~TMCON_T0EN, KS8695_TMR_VA + KS8695_TMCON);
+   tmcon = __raw_readl(tmr_reg + KS8695_TMCON);
+   __raw_writel(tmcon & ~TMCON_T0EN, tmr_reg + KS8695_TMCON);
  
  	/* program timer0 */

-   __raw_writel(tval | T0TC_WATCHDOG, KS8695_TMR_VA + KS8695_T0TC);
+   __raw_writel(tval | T0TC_WATCHDOG, tmr_reg + KS8695_T0TC);
  
  	/* re-enable timer0 */

-   tmcon = __raw_readl(KS8695_TMR_VA + KS8695_TMCON);
-   __raw_writel(tmcon | TMCON_T0EN, KS8695_TMR_VA + KS8695_TMCON);
+   tmcon = __raw_readl(tmr_reg + KS8695_TMCON);
+   __raw_writel(tmcon | TMCON_T0EN, tmr_reg + KS8695_TMCON);
spin_unlock(_lock);
  }
  
@@ -105,9 +104,9 @@ static inline void ks8695_wdt_reload(void)
  
  	spin_lock(_lock);

/* disable, then re-enable timer0 */
-   tmcon = __raw_readl(KS8695_TMR_VA + KS8695_TMCON);
-   __raw_writel(tmcon & ~TMCON_T0EN, KS8695_TMR_VA + KS8695_TMCON);
-   __raw_writel(tmcon | TMCON_T0EN, KS8695_TMR_VA + KS8695_TMCON);
+   tmcon = __raw_readl(tmr_reg + KS8695_TMCON);
+   __raw_writel(tmcon & ~TMCON_T0EN, tmr_reg + KS8695_TMCON);
+   __raw_writel(tmcon | TMCON_T0EN, tmr_reg + KS8695_TMCON);
spin_unlock(_lock);
  }
  
@@ -238,6 +237,11 @@ static struct miscdevice ks8695wdt_miscdev = {

  static int ks8695wdt_probe(struct platform_device *pdev)
  {
int res;
+   

Re: [tip:core/rseq] rseq/selftests/x86: Work around bogus gcc-8 optimisation

2019-04-19 Thread Mathieu Desnoyers
- On Apr 19, 2019, at 5:13 PM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:

> - On Apr 19, 2019, at 2:53 PM, Linus Torvalds 
> torva...@linux-foundation.org
> wrote:
> 
>> On Fri, Apr 19, 2019 at 11:46 AM tip-bot for Mathieu Desnoyers
>>  wrote:
>>>
>>> rseq/selftests/x86: Work around bogus gcc-8 optimisation
>>>
>>> At least the following versions of gcc-8:
>>>
>>>  - gcc version 8.0.1 20180414 (experimental) [trunk revision 259383] (Ubuntu
>>>  8-20180414-1ubuntu2)
>>>  - gcc 8.2.0-7ubuntu1 (Ubuntu 18.10 (Cosmic)),
>>>
>>> generate broken assembler with asm goto that have a thread-local storage
>>> "m" input operand on both x86-32 and x86-64. For instance:
>> 
>> Is there a gcc bugzilla for this? Shouldn't that be mentioned here?
> 
> My google-fu did not find any match, so I don't think so. I'll open one
> shortly.
> 
>> 
>> Also, we use "asm goto" together with "m" all the time in the kernel.
>> In fact, it's the most common case, with the RMWcc ops being generated
>> with that. I realize that we don't use the gcc thread-local storage
>> for them (we often do use our *own* thread-local storage), but it
>> would be good to have that gcc bugzilla to see why it can only affect
>> those user level "__thread" cases.
> 
> Indeed, it would be good to investigate and understand all problematic
> scenarios triggering this bug.

I've opened a ticket on gcc's bugzilla:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90193

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[PATCH] tracing: Fix a memory leak bug

2019-04-19 Thread Wenwen Wang
In trace_pid_write(), the buffer for trace parser is allocated through
kmalloc() in trace_parser_get_init(). Later on, after the buffer is used,
it is then freed through kfree() in trace_parser_put(). However, it is
possible that trace_pid_write() is terminated due to unexpected errors,
e.g., ENOMEM. In that case, the allocated buffer will not be freed, which
is a memory leak bug.

To fix this issue, free the allocated buffer when an error is encountered.

Signed-off-by: Wenwen Wang 
---
 kernel/trace/trace.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6c24755..fd12c9c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -496,8 +496,10 @@ int trace_pid_write(struct trace_pid_list *filtered_pids,
 * not modified.
 */
pid_list = kmalloc(sizeof(*pid_list), GFP_KERNEL);
-   if (!pid_list)
+   if (!pid_list) {
+   trace_parser_put();
return -ENOMEM;
+   }
 
pid_list->pid_max = READ_ONCE(pid_max);
 
@@ -507,6 +509,7 @@ int trace_pid_write(struct trace_pid_list *filtered_pids,
 
pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3);
if (!pid_list->pids) {
+   trace_parser_put();
kfree(pid_list);
return -ENOMEM;
}
-- 
2.7.4



Re: [PATCH 5/6] ARM: ks8695, serial: skip manual tx IRQ ack

2019-04-19 Thread Greg Ungerer

Hi Arnd,

On 16/4/19 6:24 am, Arnd Bergmann wrote:

The TX interrupt is marked as edge triggered, so it will
already be acked by the top-level irq code, and does not
need the ack in the driver.

Removing this avoids a nasty dependency on the regs-irq.h
file that is otherwise reserved for the interrupt controller
driver.

Signed-off-by: Arnd Bergmann 
---
  drivers/tty/serial/serial_ks8695.c | 5 -
  1 file changed, 5 deletions(-)

diff --git a/drivers/tty/serial/serial_ks8695.c 
b/drivers/tty/serial/serial_ks8695.c
index b461d791188c..6c5e9900e69d 100644
--- a/drivers/tty/serial/serial_ks8695.c
+++ b/drivers/tty/serial/serial_ks8695.c
@@ -21,7 +21,6 @@
  #include 
  
  #include 

-#include 
  
  #if defined(CONFIG_SERIAL_KS8695_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)

  #define SUPPORT_SYSRQ
@@ -52,8 +51,6 @@
  #define UART_GET_BRDR(p)  __raw_readl((p)->membase + KS8695_URBD)
  #define UART_PUT_BRDR(p, c)   __raw_writel((c), (p)->membase + KS8695_URBD)
  
-#define KS8695_CLR_TX_INT()	__raw_writel(1 << KS8695_IRQ_UART_TX, KS8695_IRQ_VA + KS8695_INTST)

-
  #define UART_DUMMY_LSR_RX 0x100
  #define UART_PORT_SIZE(KS8695_USR - KS8695_URRB + 4)
  
@@ -207,7 +204,6 @@ static irqreturn_t ks8695uart_tx_chars(int irq, void *dev_id)

unsigned int count;
  
  	if (port->x_char) {

-   KS8695_CLR_TX_INT();
UART_PUT_CHAR(port, port->x_char);
port->icount.tx++;
port->x_char = 0;
@@ -221,7 +217,6 @@ static irqreturn_t ks8695uart_tx_chars(int irq, void 
*dev_id)
  
  	count = 16;	/* fifo size */

while (!uart_circ_empty(xmit) && (count-- > 0)) {
-   KS8695_CLR_TX_INT();


I haven't looked at the ks8695 in quite a while...
But I recall that this was very problematic at the time. Without this being
done after each character it was very easy to get the transmitter to "hang" -
and stop wanting to send any more characters.

I'd like to test this before acking.

Regards
Greg



UART_PUT_CHAR(port, xmit->buf[xmit->tail]);
  
  		xmit->tail = (xmit->tail + 1) & (UART_XMIT_SIZE - 1);




[PATCH] HID: intel-ish-hid: Add Comet Lake PCI device ID

2019-04-19 Thread Srinivas Pandruvada
Added Comet Lake PCI device ID to the supported device list.

Signed-off-by: Srinivas Pandruvada 
---
 drivers/hid/intel-ish-hid/ipc/hw-ish.h  | 1 +
 drivers/hid/intel-ish-hid/ipc/pci-ish.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/hid/intel-ish-hid/ipc/hw-ish.h 
b/drivers/hid/intel-ish-hid/ipc/hw-ish.h
index 08a8327dfd22..523c0cbd44a4 100644
--- a/drivers/hid/intel-ish-hid/ipc/hw-ish.h
+++ b/drivers/hid/intel-ish-hid/ipc/hw-ish.h
@@ -31,6 +31,7 @@
 #define CNL_H_DEVICE_ID0xA37C
 #define ICL_MOBILE_DEVICE_ID   0x34FC
 #define SPT_H_DEVICE_ID0xA135
+#define CML_LP_DEVICE_ID   0x02FC
 
 #defineREVISION_ID_CHT_A0  0x6
 #defineREVISION_ID_CHT_Ax_SI   0x0
diff --git a/drivers/hid/intel-ish-hid/ipc/pci-ish.c 
b/drivers/hid/intel-ish-hid/ipc/pci-ish.c
index a6e1ee744f4d..ac0a179daf23 100644
--- a/drivers/hid/intel-ish-hid/ipc/pci-ish.c
+++ b/drivers/hid/intel-ish-hid/ipc/pci-ish.c
@@ -40,6 +40,7 @@ static const struct pci_device_id ish_pci_tbl[] = {
{PCI_DEVICE(PCI_VENDOR_ID_INTEL, CNL_H_DEVICE_ID)},
{PCI_DEVICE(PCI_VENDOR_ID_INTEL, ICL_MOBILE_DEVICE_ID)},
{PCI_DEVICE(PCI_VENDOR_ID_INTEL, SPT_H_DEVICE_ID)},
+   {PCI_DEVICE(PCI_VENDOR_ID_INTEL, CML_LP_DEVICE_ID)},
{0, }
 };
 MODULE_DEVICE_TABLE(pci, ish_pci_tbl);
-- 
2.17.2



Re: [PATCH v2] clk: hi3660: Mark clk_gate_ufs_subsys as critical

2019-04-19 Thread Leo Yan
On Fri, Apr 19, 2019 at 03:20:42PM -0700, Stephen Boyd wrote:
> Quoting Leo Yan (2019-03-20 03:05:08)
> > clk_gate_ufs_subsys is a system bus clock, turning off it will
> > introduce lockup issue during system suspend flow.  Let's mark
> > clk_gate_ufs_subsys as critical clock, thus keeps it on during
> > system suspend and resume.
> > 
> > Fixes: d374e6fd5088 ("clk: hisilicon: Add clock driver for hi3660 SoC")
> > Cc: sta...@vger.kernel.org
> > Cc: Zhong Kaihua 
> > Cc: John Stultz 
> > Cc: Zhangfei Gao 
> > Suggested-by: Dong Zhang 
> > Signed-off-by: Leo Yan 
> > ---
> 
> Applied to clk-next

Thanks!


Re: [PATCH 4.9 00/50] 4.9.170-stable review

2019-04-19 Thread Sasha Levin

On Fri, Apr 19, 2019 at 01:16:37PM -0700, Guenter Roeck wrote:

On Fri, Apr 19, 2019 at 12:39:26PM -0700, Guenter Roeck wrote:

On Thu, Apr 18, 2019 at 07:57:11PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.170 release.
> There are 50 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sat Apr 20 16:03:22 UTC 2019.
> Anything received after that time might be too late.
>
Build results:
total: 172 pass: 170 fail: 2
Failed builds:
i386:tools/perf
x86_64:tools/perf


Caused by "tools include: Adopt linux/bits.h". That isn't really
a bug fix, and since it is the last commit in the perf directory
it isn't needed either. Reverting it fixes the problem.


Indeed. I'll remove it from all branches since it doesn't look like a
fix to begin with.

--
Thanks,
Sasha


Re: [GIT PULL] cgroup fix for v5.1-rc5

2019-04-19 Thread pr-tracker-bot
The pull request you sent on Fri, 19 Apr 2019 17:28:34 -0700:

> git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-5.1-fixes

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/371dd432ab39f7bc55d6ec77d63b430285627e04

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Joel Fernandes
On Fri, Apr 19, 2019 at 04:11:37PM -0700, Linus Torvalds wrote:
> On Fri, Apr 19, 2019 at 2:20 PM Joel Fernandes  wrote:
> >
> > According to Linus, POLLHUP usually indicates that something is readable:
> 
> Note that if you use the legacy interfaces (ie "select()"), then even
> just a plain POLLHUP will always just show as "readable".
> 
> So for a lot of applications, it won't even matter. Returning POLLHUP
> will mean it's readable whether POLLIN is set or not (and EPOLLERR
> will automatically mean that it's marked both readable and writable).
> 
> In fact, I'd argue that not a lot of people care about the more
> esoteric bits at all. If your poll function does POLLIN and POLLOUT,
> it's fine. Anything else is specialized enough that most people simply
> won't care. Don't overdesign things too much. You need to have a major
> reason to care about the other POLL bits.
> 
> It's also worth noting that POLLERR/POLLHUP/POLLNVAL cannot be masked
> for "poll()". Even if you only ask for POLLIN/POLLOUT, you will always
> get POLLERR/POLLHUP notification. That is again historical behavior,
> and it's kind of a "you can't poll a hung up fd". But it once again
> means that you should consider POLLHUP to be something *exceptional*
> and final, where no further or other state changes can happen or are
> relevant.
> 
> That may well work fine for pidfd when the task is gone, but it's
> really worth noting that any *normal* state should be about
> POLLIN/POLLOUT. People should not think that "POLLHUP sounds like the
> appropriate name", they should realize that POLLHUP is basically a
> terminal error condition, not a "input is available".
> 
> So just keep that in mind.

Got it, thanks a lot for the detailed explanation of the flags. So then I
feel I should not return POLLHUP or POLLERR because the task_struct getting
reaped is not an error condition. I will simplify the patch and repost it. It
seems that would be the best thing to do to serve the usecase. We just want
to know when the task exited and a reaped task also counts as one that
exited.

thanks,

 - Joel



Re: [PATCH 3/6] y2038: linux: Provide __clock_settime64 implementation

2019-04-19 Thread Stepan Golosunov
15.04.2019 в 00:08:38 +0200 Lukasz Majewski написал:
> +# if defined __NR_clock_settime64
> +  /* Make sure that passed __timespec64 struct pad is 0.  */
> +  struct __timespec64 ts = *tp;
> +  ts.tv_pad = 0;
> +  return INLINE_SYSCALL_CALL (clock_settime64, clock_id, );

Isn't kernel supposed to zero out padding on its own?
At least comment in kernel's get_timespec64 says so:

/* Zero out the padding for 32 bit systems or in compat mode */
if (IS_ENABLED(CONFIG_64BIT_TIME) && in_compat_syscall())
kts.tv_nsec &= 0xUL;

The code looks buggy though. It fails to zero out the padding in
32-bit kernels. That part is probably broken since
98f76206b3350 ("compat: Cleanup in_compat_syscall() callers").

And, hmm, is CONFIG_64BIT_TIME enabled anywhere?


Re: [PATCH] kexec_buffer measure

2019-04-19 Thread prakhar srivastava
Currently for soft reboot(kexec_file_load) the kernel file and
signature is measured by IMA. The cmdline args used to load the kernel
is not measured.
The boot aggregate that gets calculated will have no change since the
EFI loader has not been triggered.
Adding the kexec cmdline args measure and kernel version will add some
attestable criteria.

Following patch v2 set adds support in IMA to measure buffers, and
uses the same to measure the cmdline arguments passed in case of
kexec.
Additional support to measure kernel version will follow in a
subsequent patch set.
Support for kexec_buffer pass for will be added later for
the whole attestation/appraisal to work.

Approaches and clarifications:
1) Use of Sig template
Appraising buffer entries can't be done unless the buffers are pre
calculated and available for appraisal. This raises a problem for
cmdline args since the kernel can be loaded with a large set of args.
For buffer entries, the appraisal can be deferred to attestation
service, the event data is needed in that case.
Adding the event data to ima template can be done by either adding a
new template or reusing an existing one.
-Using a new template entry will only work for buffers and adds no
current use case for files, every EXISTING entry for measuring files
will use this new template
and so all but the buffer measurement line item will have the new columns empty.
this will be a huge change pertaining to ima for a not a lot of use cases.

2) Adding a LSM hook
We are doing both the command line and kernel version measurement in IMA.
Can you please elaborate on how this can be used outside of the scenario?
That will help me come back with a better design and code. I am
neutral about this.

I greatly appreciate the time and your feedback.
-Prakhar Srivastava

On Mon, Apr 15, 2019 at 2:38 PM Mimi Zohar  wrote:
>
> On Fri, 2019-04-12 at 11:08 -0700, prsriv...@gmail.com wrote:
> > From: Prakhar Srivastava 
> >
> > This adds a generic buffer measure ima function hook.
>
> A patch description should begin with a problem description or
> motivation for the patch, before describing the solution.
>
> > The Buffer passed will be added to the sig field of the template if used.
>
> Template fields are defined in ima_template.c  You can't simply re-use
> an existing field like this.
>
> > The hash(buffer) is added as the IMA measurements, along 
> > side the buffer itself in hex.
> >   For cmdline: kernel file name is prefixed to the cmdline to 
> > distinguish between callers.
> > An enum is used to control what all buffers can be written to it.
> >
> > Verified How:
> > Replaced kernel on machine.
> > read ima_measurement_log
> > called kexec -s 
> > read ima_measurement_log
> >  - A new entry for the cmdline passed can be seen.
> >  - HEX to text verified: 
> > http://www.unit-conversion.info/texttools/hexadecimal/
> >  - Generated Hash for the text buffer and matched it with in IMA log
>
> The kexec man page provides this information. I don't see the need for
> these details here.
>
> Please refer to section "2) Describe your changes" in
> Documentation/process/submitting-patches.rst for instructions on
> writing a patch description.
>
> >
> > Signed-off-by: Prakhar Srivastava 
>
> > ---
> >  include/linux/ima.h   |  21 -
> >  kernel/kexec_core.c   |  68 +++
> >  kernel/kexec_file.c   |  18 
> >  kernel/kexec_internal.h   |   6 +-
> >  security/integrity/ima/Kconfig|  16 
> >  security/integrity/ima/ima_main.c | 135 ++
> >  security/integrity/integrity.h|   3 +
> >  7 files changed, 264 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/ima.h b/include/linux/ima.h
> > index 7f6952f8d6aa..46c3b95b2637 100644
> > --- a/include/linux/ima.h
> > +++ b/include/linux/ima.h
> > @@ -14,6 +14,14 @@
> >  #include 
> >  struct linux_binprm;
> >
> > +#ifdef CONFIG_IMA_BUFFER_MEASURE
> > +enum buffer_id{
> > + KEXEC_CMDLINE = 1
> > + // other buffer ids
> > + // KERNEL_VERSION
>
> There are already two enumerations - kernel_read_file_id and
> kernel_load_file_id.  Unless there are clear examples of other buffer
> data, please don't define a new enumeration.
Removed new enums.
>
> > +};
> > +#endif
> > +
> >  #ifdef CONFIG_IMA
> >  extern int ima_bprm_check(struct linux_binprm *bprm);
> >  extern int ima_file_check(struct file *file, int mask, int opened);
> > @@ -23,7 +31,10 @@ extern int ima_read_file(struct file *file, enum 
> > kernel_read_file_id id);
> >  extern int ima_post_read_file(struct file *file, void *buf, loff_t size,
> > enum kernel_read_file_id id);
> >  extern void ima_post_path_mknod(struct dentry *dentry);
> > -
> > +#ifdef CONFIG_IMA_BUFFER_MEASURE
> > +extern int ima_add_buffer_measure(const void *buff,
> > + loff_t size, enum buffer_id id);
> > +#endif
> >  #ifdef CONFIG_IMA_KEXEC
> >  extern void 

[PATCH] dcache: ensure d_flags & d_inode are consistent in lookup_fast()

2019-04-19 Thread Hou Tao
After extending the size of dentry from 192-bytes to 208-bytes
under aarch64, we got oops during the running of xfstests generic/429:

  Unable to handle kernel NULL pointer dereference at virtual address 
0002
  CPU: 3 PID: 2725 Comm: t_encrypted_d_r Tainted: G  D   5.1.0-rc4
  pc : inode_permission+0x28/0x160
  lr : link_path_walk.part.11+0x27c/0x528
  ..
  Call trace:
   inode_permission+0x28/0x160
   link_path_walk.part.11+0x27c/0x528
   path_lookupat+0x64/0x208
   filename_lookup+0xa0/0x178
   user_path_at_empty+0x58/0x70
   vfs_statx+0x94/0x118
   __se_sys_newfstatat+0x58/0x98
   __arm64_sys_newfstatat+0x24/0x30
   el0_svc_common+0x7c/0x148
   el0_svc_handler+0x38/0x88
   el0_svc+0x8/0xc

If we revert the size extension of dentry, the oops will be gone.
However if we just move the d_inode field from the begin of dentry
struct to the end of dentry struct (poorly simulate the way how
__randomize_layout works), the oops will reoccur.

The following scenario illustrates the problem:

precondition:
* dentry A has just been unlinked and becomes a negative dentry
* dentry A is encrypted, so it has d_revalidate hook: fscrypt_d_revalidate()
* lookup process is looking A/file, and creation process is creating A

lookup process: creation process:

lookup_fast
__d_lookup_rcu returns dentry A

d_revalidate returns -ECHILD

d_revalidate again succeed
__d_set_inode_and_type
dentry->d_inode = inode
WRITE_ONCE(dentry->d_flags, flags)

d_is_negative(dentry) return false
follow_managed doesn't nothing
// inconsistent with d_flags
d_backing_inode() return NULL
nd->inode = NULL

may_lookup()
// oops occurs
inode_permission(nd->inode

The root cause is the inconsistency between d_flags & d_inode
during the REF-walk in lookup_fast(): d_is_negative(dentry)
returns false, but d_backing_inode() still returns a NULL pointer.

The RCU-walk path in lookup_fast() uses d_seq to ensure d_flags & d_inode
are consistent, and lookup_slow() use inode lock to ensure that, so only
the REF-walk path in lookup_fast() is problematic.

Fixing it by adding a paired smp_rmb/smp_wmb between the reading/writing
of d_inode & d_flags to ensure the consistency.

Signed-off-by: Hou Tao 
---
 fs/dcache.c | 2 ++
 fs/namei.c  | 7 +++
 2 files changed, 9 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index aac41adf4743..1eb85f9fcb0f 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -316,6 +316,8 @@ static inline void __d_set_inode_and_type(struct dentry 
*dentry,
unsigned flags;
 
dentry->d_inode = inode;
+   /* paired with smp_rmb() in lookup_fast() */
+   smp_wmb();
flags = READ_ONCE(dentry->d_flags);
flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
flags |= type_flags;
diff --git a/fs/namei.c b/fs/namei.c
index dede0147b3f6..833f760c70b2 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1628,6 +1628,13 @@ static int lookup_fast(struct nameidata *nd,
return -ENOENT;
}
 
+   /*
+* Paired with smp_wmb() in __d_set_inode_and_type() to ensure
+* d_backing_inode is not NULL after the checking of d_flags
+* in d_is_negative() completes.
+*/
+   smp_rmb();
+
path->mnt = mnt;
path->dentry = dentry;
err = follow_managed(path, nd);
-- 
2.16.2.dirty



Re: Urgent Reply

2019-04-19 Thread Mr.Yacouba Frank
Dear friend,

I am the Head of file Department in Africa Develop bank Burkina Faso.
I need your sincere cooperation to transfer the sum of $15 million
U.S.A dollars to your bank account.

I want you stand as next of kin to the deceased. I agree 40% for you
and 60% for me. Upon receipt of your reply, I will send to you by
email the text of the application. I will not fail to bring to your
notice that this transaction is hitch free and that you should not
entertain any atom of fear as all required arrangements have been made
for the transfer.

I am waiting for your immediate response.

Yours faithfully,
Mr.Frank Yacouba


[GIT PULL] cgroup fix for v5.1-rc5

2019-04-19 Thread Tejun Heo
Hello, Linus.

A patch to fix a RCU imbalance error in the devices cgroup
configuration error path.

Thanks.

The following changes since commit 9e98c678c2d6ae3a17cb2de55d17f69dddaa231b:

  Linux 5.1-rc1 (2019-03-17 14:22:26 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-5.1-fixes

for you to fetch changes up to 0fcc4c8c044e117ac126ab6df4138ea9a67fa2a9:

  device_cgroup: fix RCU imbalance in error case (2019-03-19 10:46:15 -0700)


Jann Horn (1):
  device_cgroup: fix RCU imbalance in error case

 security/device_cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index cd97929fac66..dc28914fa72e 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -560,7 +560,7 @@ static int propagate_exception(struct dev_cgroup 
*devcg_root,
devcg->behavior == DEVCG_DEFAULT_ALLOW) {
rc = dev_exception_add(devcg, ex);
if (rc)
-   break;
+   return rc;
} else {
/*
 * in the other possible cases:

-- 
tejun


[tip:WIP.x86/alternatives 2/9] kernel/jump_label.c:387:10-11: WARNING: return of 0/1 in function 'jump_label_can_update_check' with return type bool

2019-04-19 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
WIP.x86/alternatives
head:   1f30946b1a01baf22df6faf74c0a1e602bb6cac7
commit: 41bef31d0abe29b5888b33b526cacc8a30795318 [2/9] jump_label: Add the 
jump_label_can_update_check() helper

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 



coccinelle warnings: (new ones prefixed by >>)

>> kernel/jump_label.c:387:10-11: WARNING: return of 0/1 in function 
>> 'jump_label_can_update_check' with return type bool

vim +/jump_label_can_update_check +387 kernel/jump_label.c

   376  
   377  bool jump_label_can_update_check(struct jump_entry *entry, bool init)
   378  {
   379  /*
   380   * An entry->code of 0 indicates an entry which has been
   381   * disabled because it was in an init text area.
   382   */
   383  if (init || !jump_entry_is_init(entry)) {
   384  if (!kernel_text_address(jump_entry_code(entry))) {
   385  WARN_ONCE(1, "can't patch jump_label at %pS",
   386(void *)jump_entry_code(entry));
 > 387  return 0;
   388  }
   389  return 1;
   390  }
   391  return 0;
   392  }
   393  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic()

2019-04-19 Thread Nicholas Piggin
Paul E. McKenney's on April 20, 2019 4:26 am:
> On Fri, Apr 19, 2019 at 08:00:17PM +0200, Peter Zijlstra wrote:
>> On Fri, Apr 19, 2019 at 01:21:45PM -0400, Alan Stern wrote:
>> > Index: usb-devel/Documentation/atomic_t.txt
>> > ===
>> > --- usb-devel.orig/Documentation/atomic_t.txt
>> > +++ usb-devel/Documentation/atomic_t.txt
>> > @@ -171,7 +171,10 @@ The barriers:
>> >smp_mb__{before,after}_atomic()
>> >  
>> >  only apply to the RMW ops and can be used to augment/upgrade the ordering
>> > -inherent to the used atomic op. These barriers provide a full smp_mb().
>> > +inherent to the used atomic op. Unlike normal smp_mb() barriers, they 
>> > order
>> > +only the RMW op itself against the instructions preceding the
>> > +smp_mb__before_atomic() or following the smp_mb__after_atomic(); they do
>> > +not order instructions on the other side of the RMW op at all.
>> 
>> Now it is I who is confused; what?
>> 
>>  x = 1;
>>  smp_mb__before_atomic();
>>  atomic_add(1, );
>>  y = 1;
>> 
>> the stores to both x and y will be ordered as if an smp_mb() where
>> there. There is no order between a and y otoh.
> 
> Let's look at x86.  And a slightly different example:
> 
>   x = 1;
>   smp_mb__before_atomic();
>   atomic_add(1, );
>   r1 = y;
> 
> The atomic_add() asm does not have the "memory" constraint, which is
> completely legitimate because atomic_add() does not return a value,
> and thus guarantees no ordering.  The compiler is therefore within
> its rights to transform the code into the following:
> 
>   x = 1;
>   smp_mb__before_atomic();
>   r1 = y;
>   atomic_add(1, );
> 
> But x86's smp_mb__before_atomic() is just a compiler barrier, and
> x86 is further allowed to reorder prior stores with later loads.
> The CPU can therefore execute this code as follows:
> 
>   r1 = y;
>   x = 1;
>   smp_mb__before_atomic();
>   atomic_add(1, );
> 
> So in general, the ordering is guaranteed only to the atomic itself,
> not to accesses on the other side of the atomic.

That's interesting. I don't think that's what all our code expects.
I had the same idea as Peter.

IIRC the primitive was originally introduced exactly so x86 would not
need to have the unnecessary hardware barrier with sequences like

  smp_mb();
  ...
  atomic_inc();

The "new" semantics are a bit subtle. One option might be just to
replace it entirely with atomic_xxx_mb() ?

Thanks,
Nick



Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Christian Brauner
On Sat, Apr 20, 2019 at 1:47 AM Daniel Colascione  wrote:
>
> On Fri, Apr 19, 2019 at 4:12 PM Christian Brauner  
> wrote:
> >
> > On Sat, Apr 20, 2019 at 12:46 AM Daniel Colascione  
> > wrote:
> > >
> > > On Fri, Apr 19, 2019 at 3:02 PM Christian Brauner  
> > > wrote:
> > > >
> > > > On Fri, Apr 19, 2019 at 11:48 PM Christian Brauner 
> > > >  wrote:
> > > > >
> > > > > On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione 
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner 
> > > > > > > > > wrote:
> > > > > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes 
> > > > > > > > > > wrote:
> > > > > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian 
> > > > > > > > > > > Brauner wrote:
> > > > > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov 
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >> On 04/16, Joel Fernandes wrote:
> > > > > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg 
> > > > > > > > > > > > >> > Nesterov wrote:
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > Could you explain when it should return POLLIN? 
> > > > > > > > > > > > >> > > When the whole
> > > > > > > > > > > > >process exits?
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't 
> > > > > > > > > > > > >> > exist anymore,
> > > > > > > > > > > > >or when it
> > > > > > > > > > > > >> > is in a zombie state and there's no other thread 
> > > > > > > > > > > > >> > in the thread
> > > > > > > > > > > > >group.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> IOW, when the whole thread group exits, so it can't 
> > > > > > > > > > > > >> be used to
> > > > > > > > > > > > >monitor sub-threads.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> just in case... speaking of this patch it doesn't 
> > > > > > > > > > > > >> modify
> > > > > > > > > > > > >proc_tid_base_operations,
> > > > > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, 
> > > > > > > > > > > > >> but iiuc you are
> > > > > > > > > > > > >going to use
> > > > > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ?
> > > > > > > > > > > > >
> > > > > > > > > > > > >I don't think procfs works that way. 
> > > > > > > > > > > > >/proc/sub-thread-tid has
> > > > > > > > > > > > >proc_tgid_base_operations despite not being a thread 
> > > > > > > > > > > > >group leader.
> > > > > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() 
> > > > > > > > > > > > >in this code can
> > > > > > > > > > > > >be hit trivially, and then the code will misbehave.
> > > > > > > > > > > > >
> > > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to 
> > > > > > > > > > > > >explicitly bail
> > > > > > > > > > > > >out if you're dealing with a thread group leader, or 
> > > > > > > > > > > > >make the code
> > > > > > > > > > > > >work for threads, too.
> > > > > > > > > > > >
> > > > > > > > > > > > The latter case probably being preferred if this API is 
> > > > > > > > > > > > supposed to be
> > > > > > > > > > > > useable for thread management in userspace.
> > > > > > > > > > >
> > > > > > > > > > > At the moment, we are not planning to use this for 
> > > > > > > > > > > sub-thread management. I
> > > > > > > > > > > am reworking this patch to only work on clone(2) pidfds 
> > > > > > > > > > > which makes the above
> > > > > > > > > >
> > > > > > > > > > Indeed and agreed.
> > > > > > > > > >
> > > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the 
> > > > > > > > > > > latest CLONE_PIDFD
> > > > > > > > > > > patches, CLONE_THREAD with pidfd is not supported.
> > > > > > > > > >
> > > > > > > > > > Yes. We have no one asking for it right now and we can 
> > > > > > > > > > easily add this
> > > > > > > > > > later.
> > > > > > > > > >
> > > > > > > > > > Admittedly I haven't gotten around to reviewing the patches 
> > > > > > > > > > here yet
> > > > > > > > > > completely. But one thing about using POLLIN. FreeBSD is 
> > > > > > > > > > using POLLHUP
> > > > > > > > > > on process exit which I think is nice as well. How about 
> > > > > > > > > > returning
> > > > > > > > > > POLLIN | POLLHUP on process exit?
> > > > > > > > > > We already do things like this. For example, when you proxy 
> > > > > > > > > > between
> > > > > > > > > > ttys. If the process that you're reading data from has 
> > > > > > > > > > exited and closed
> > > > > > > > > > it's end you still can't usually simply 

[PATCH v2 1/3 RFC] added ima hook for buffer, being enabled as a policy

2019-04-19 Thread Prakhar Srivastava
From: Prakhar Srivastava 

Signed-off-by: Prakhar Srivastava 
---
Currently for soft reboot(kexec_file_load) the kernel file and
signature is measured by IMA. The cmdline args used to load the kernel
is not measured.
The boot aggregate that gets calculated will have no change since the
EFI loader has not been triggered.
Adding the kexec cmdline args measure and kernel version will add some
attestable criteria.

This adds a new ima hook ima_buffer_check and a policy entry BUFFER_CHECK.
This enables buffer has measurements into ima log

 Documentation/ABI/testing/ima_policy |  1 +
 include/linux/ima.h  | 13 +++-
 security/integrity/ima/ima.h |  1 +
 security/integrity/ima/ima_main.c| 95 
 security/integrity/ima/ima_policy.c  | 14 +++-
 5 files changed, 122 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/ima_policy 
b/Documentation/ABI/testing/ima_policy
index bb0f9a135e21..676088c7ab26 100644
--- a/Documentation/ABI/testing/ima_policy
+++ b/Documentation/ABI/testing/ima_policy
@@ -28,6 +28,7 @@ Description:
base:   func:= 
[BPRM_CHECK][MMAP_CHECK][FILE_CHECK][MODULE_CHECK]
[FIRMWARE_CHECK]
[KEXEC_KERNEL_CHECK] [KEXEC_INITRAMFS_CHECK]
+   [BUFFER_CHECK]
mask:= [[^]MAY_READ] [[^]MAY_WRITE] [[^]MAY_APPEND]
   [[^]MAY_EXEC]
fsmagic:= hex value
diff --git a/include/linux/ima.h b/include/linux/ima.h
index 7f6952f8d6aa..733d0cb9dedc 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
@@ -14,6 +14,12 @@
 #include 
 struct linux_binprm;
 
+enum __buffer_id {
+   KERNEL_VERSION,
+   KEXEC_CMDLINE,
+   MAX_BUFFER_ID = KEXEC_CMDLINE
+} buffer_id;
+
 #ifdef CONFIG_IMA
 extern int ima_bprm_check(struct linux_binprm *bprm);
 extern int ima_file_check(struct file *file, int mask, int opened);
@@ -23,7 +29,7 @@ extern int ima_read_file(struct file *file, enum 
kernel_read_file_id id);
 extern int ima_post_read_file(struct file *file, void *buf, loff_t size,
  enum kernel_read_file_id id);
 extern void ima_post_path_mknod(struct dentry *dentry);
-
+extern void ima_buffer_check(const void *buff, int size, enum buffer_id id);
 #ifdef CONFIG_IMA_KEXEC
 extern void ima_add_kexec_buffer(struct kimage *image);
 #endif
@@ -65,6 +71,11 @@ static inline void ima_post_path_mknod(struct dentry *dentry)
return;
 }
 
+static inline void ima_buffer_check(const void *buff, int size,
+   enum buffer_id id)
+{
+   return;
+}
 #endif /* CONFIG_IMA */
 
 #ifndef CONFIG_IMA_KEXEC
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index b563fbd4d122..b71f2f6f7421 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -181,6 +181,7 @@ enum ima_hooks {
FIRMWARE_CHECK,
KEXEC_KERNEL_CHECK,
KEXEC_INITRAMFS_CHECK,
+   BUFFER_CHECK,
POLICY_CHECK,
MAX_CHECK
 };
diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index 2aebb7984437..6408cadaadbb 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -155,6 +155,84 @@ void ima_file_free(struct file *file)
ima_check_last_writer(iint, inode, file);
 }
 
+/*
+ * process_buffer_measurement - Measure the buffer passed to ima log.
+ * (Instead of using the file hash the buffer hash is used).
+ * @buff - The buffer that needs to be added to the log
+ * @size - size of buffer(in bytes)
+ * @id - buffer id, this is differentiator for the various buffers
+ * that can be measured.
+ *
+ * The buffer passed is added to the ima logs.
+ * If the sig template is used, then the sig field contains the buffer.
+ *
+ * On success return 0.
+ * On error cases surface errors from ima calls.
+ */
+static int process_buffer_measurement(const void *buff, int size,
+   enum buffer_id id)
+{
+   int ret = -EINVAL;
+   struct ima_template_entry *entry = NULL;
+   struct integrity_iint_cache tmp_iint, *iint = _iint;
+   struct ima_event_data event_data = {iint, NULL, NULL,
+   NULL, 0, NULL};
+   struct {
+   struct ima_digest_data hdr;
+   char digest[IMA_MAX_DIGEST_SIZE];
+   } hash;
+   char *name = NULL;
+   int violation = 0;
+   int pcr = CONFIG_IMA_MEASURE_PCR_IDX;
+
+   if (!buff || size ==  0)
+   goto err_out;
+
+   if (ima_get_action(NULL, 0, BUFFER_CHECK, ) != IMA_MEASURE)
+   goto err_out;
+
+   switch (buffer_id) {
+   case KERNEL_VERSION:
+   name = "Kernel-version";
+   break;
+   case KEXEC_CMDLINE:
+   name = "Kexec-cmdline";
+   break;
+   default:
+   goto err_out;
+   }
+
+   

[PATCH v2 2/3 RFC] use event name instead of enum to make the call generic

2019-04-19 Thread Prakhar Srivastava
From: Prakhar Srivastava 

Signed-off-by: Prakhar Srivastava 
---

Currently for soft reboot(kexec_file_load) the kernel file and
signature is measured by IMA. The cmdline args used to load the kernel
is not measured.
The boot aggregate that gets calculated will have no change since the
EFI loader has not been triggered.
Adding the kexec cmdline args measure and kernel version will add some
attestable criteria.

remove enums to control type of buffers entries, instead pass the event name to 
be used.

 include/linux/ima.h   | 10 ++
 kernel/kexec_file.c   |  3 +++
 security/integrity/ima/ima.h  |  2 +-
 security/integrity/ima/ima_main.c | 30 ++
 4 files changed, 16 insertions(+), 29 deletions(-)

diff --git a/include/linux/ima.h b/include/linux/ima.h
index 733d0cb9dedc..5e41507c57e5 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
@@ -14,12 +14,6 @@
 #include 
 struct linux_binprm;
 
-enum __buffer_id {
-   KERNEL_VERSION,
-   KEXEC_CMDLINE,
-   MAX_BUFFER_ID = KEXEC_CMDLINE
-} buffer_id;
-
 #ifdef CONFIG_IMA
 extern int ima_bprm_check(struct linux_binprm *bprm);
 extern int ima_file_check(struct file *file, int mask, int opened);
@@ -29,7 +23,7 @@ extern int ima_read_file(struct file *file, enum 
kernel_read_file_id id);
 extern int ima_post_read_file(struct file *file, void *buf, loff_t size,
  enum kernel_read_file_id id);
 extern void ima_post_path_mknod(struct dentry *dentry);
-extern void ima_buffer_check(const void *buff, int size, enum buffer_id id);
+extern void ima_buffer_check(const void *buff, int size, char *eventname);
 #ifdef CONFIG_IMA_KEXEC
 extern void ima_add_kexec_buffer(struct kimage *image);
 #endif
@@ -72,7 +66,7 @@ static inline void ima_post_path_mknod(struct dentry *dentry)
 }
 
 static inline void ima_buffer_check(const void *buff, int size,
-   enum buffer_id id)
+   char *eventname)
 {
return;
 }
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index b118735fea9d..2a5234eb4b28 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -182,6 +182,9 @@ kimage_file_prepare_segments(struct kimage *image, int 
kernel_fd, int initrd_fd,
ret = -EINVAL;
goto out;
}
+
+   ima_buffer_check(image->cmdline_buf, cmdline_len - 1,
+   "kexec_cmdline");
}
 
/* Call arch image load handlers */
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index b71f2f6f7421..fcade3c103ed 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -181,8 +181,8 @@ enum ima_hooks {
FIRMWARE_CHECK,
KEXEC_KERNEL_CHECK,
KEXEC_INITRAMFS_CHECK,
-   BUFFER_CHECK,
POLICY_CHECK,
+   BUFFER_CHECK,
MAX_CHECK
 };
 
diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index 6408cadaadbb..da82c705a5ed 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -160,8 +160,7 @@ void ima_file_free(struct file *file)
  * (Instead of using the file hash the buffer hash is used).
  * @buff - The buffer that needs to be added to the log
  * @size - size of buffer(in bytes)
- * @id - buffer id, this is differentiator for the various buffers
- * that can be measured.
+ * @id - eventname, event name to be used for buffer measurement.
  *
  * The buffer passed is added to the ima logs.
  * If the sig template is used, then the sig field contains the buffer.
@@ -170,7 +169,7 @@ void ima_file_free(struct file *file)
  * On error cases surface errors from ima calls.
  */
 static int process_buffer_measurement(const void *buff, int size,
-   enum buffer_id id)
+   char *eventname)
 {
int ret = -EINVAL;
struct ima_template_entry *entry = NULL;
@@ -185,23 +184,13 @@ static int process_buffer_measurement(const void *buff, 
int size,
int violation = 0;
int pcr = CONFIG_IMA_MEASURE_PCR_IDX;
 
-   if (!buff || size ==  0)
+   if (!buff || size ==  0 || !eventname)
goto err_out;
 
if (ima_get_action(NULL, 0, BUFFER_CHECK, ) != IMA_MEASURE)
goto err_out;
 
-   switch (buffer_id) {
-   case KERNEL_VERSION:
-   name = "Kernel-version";
-   break;
-   case KEXEC_CMDLINE:
-   name = "Kexec-cmdline";
-   break;
-   default:
-   goto err_out;
-   }
-
+   name = eventname;
memset(iint, 0, sizeof(*iint));
memset(, 0, sizeof(hash));
 
@@ -452,15 +441,16 @@ int ima_read_file(struct file *file, enum 
kernel_read_file_id read_id)
  * ima_buffer_check - based on policy, collect & store buffer measurement
  * @buf: pointer to buffer
  * @size: size of buffer
- * @buffer_id: caller 

[PATCH v2 3/3 RFC] since cmdline args can be same for multiple kexec, log entry hash will collide. Prepend the kernel file name to the cmdline args to distinguish between cmdline args passed to subseq

2019-04-19 Thread Prakhar Srivastava
From: Prakhar Srivastava 

Signed-off-by: Prakhar Srivastava 
---

Currently for soft reboot(kexec_file_load) the kernel file and
signature is measured by IMA. The cmdline args used to load the kernel
is not measured.
The boot aggregate that gets calculated will have no change since the
EFI loader has not been triggered.
Adding the kexec cmdline args measure and kernel version will add some
attestable criteria.

Cmdline args can be same for multiple kexec, log entry
hash will collide. Prepend the kernel file name to the cmdline args to
distinguish between cmdline args passed to subsequent kexec calls

 kernel/kexec_core.c | 57 +
 kernel/kexec_file.c | 14 --
 kernel/kexec_internal.h |  3 +++
 3 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index ae1a3ba24df5..97b77c780311 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1151,3 +1151,60 @@ void __weak arch_kexec_protect_crashkres(void)
 
 void __weak arch_kexec_unprotect_crashkres(void)
 {}
+
+/**
+ * kexec_cmdline_prepend_img_name - prepare the buffer with cmdline
+ * that needs to be measured
+ * @outbuf - out buffer that contains the formated string
+ * @kernel_fd - the file identifier for the kerenel image
+ * @cmdline_ptr - ptr to the cmdline buffer
+ * @cmdline_len - len of the buffer.
+ *
+ * This generates a buffer in the format Kerenelfilename::cmdline
+ *
+ * On success return 0.
+ * On failure return -EINVAL.
+ */
+int kexec_cmdline_prepend_img_name(char **outbuf, int kernel_fd,
+   const char *cmdline_ptr,
+   unsigned long cmdline_len)
+{
+   int ret = -EINVAL;
+   struct fd f = {};
+   int size = 0;
+   char *buf = NULL;
+   char delimiter[] = "::";
+
+   if (!outbuf || !cmdline_ptr)
+   goto out;
+
+   f = fdget(kernel_fd);
+   if (!f.file)
+   goto out;
+
+   size = (f.file->f_path.dentry->d_name.len + cmdline_len - 1+
+   ARRAY_SIZE(delimiter)) - 1;
+
+   buf = kzalloc(size, GFP_KERNEL);
+   if (!buf)
+   goto out;
+
+   memcpy(buf, f.file->f_path.dentry->d_name.name,
+   f.file->f_path.dentry->d_name.len);
+   memcpy(buf + f.file->f_path.dentry->d_name.len,
+   delimiter, ARRAY_SIZE(delimiter) - 1);
+   memcpy(buf + f.file->f_path.dentry->d_name.len +
+   ARRAY_SIZE(delimiter) - 1,
+   cmdline_ptr, cmdline_len - 1);
+
+   *outbuf = buf;
+   ret = size;
+
+   pr_debug("kexec cmdline buff: %s\n", buf);
+
+out:
+   if (f.file)
+   fdput(f);
+
+   return ret;
+}
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 2a5234eb4b28..a487491d55b9 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -126,6 +126,8 @@ kimage_file_prepare_segments(struct kimage *image, int 
kernel_fd, int initrd_fd,
int ret = 0;
void *ldata;
loff_t size;
+   char *buff_to_measure = NULL;
+   int buff_to_measure_size = 0;
 
ret = kernel_read_file_from_fd(kernel_fd, >kernel_buf,
   , INT_MAX, READING_KEXEC_IMAGE);
@@ -183,8 +185,13 @@ kimage_file_prepare_segments(struct kimage *image, int 
kernel_fd, int initrd_fd,
goto out;
}
 
-   ima_buffer_check(image->cmdline_buf, cmdline_len - 1,
-   "kexec_cmdline");
+   /* IMA measures the cmdline args passed to the next kernel*/
+   buff_to_measure_size = 
kexec_cmdline_prepend_img_name(_to_measure,
+   kernel_fd, image->cmdline_buf, image->cmdline_buf_len);
+
+   ima_buffer_check(buff_to_measure, buff_to_measure_size,
+   "kexec_cmdline");
+
}
 
/* Call arch image load handlers */
@@ -200,6 +207,9 @@ kimage_file_prepare_segments(struct kimage *image, int 
kernel_fd, int initrd_fd,
/* In case of error, free up all allocated memory in this function */
if (ret)
kimage_file_post_load_cleanup(image);
+
+   kfree(buff_to_measure);
+
return ret;
 }
 
diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
index 799a8a452187..4d34a8ef4637 100644
--- a/kernel/kexec_internal.h
+++ b/kernel/kexec_internal.h
@@ -11,6 +11,9 @@ int kimage_load_segment(struct kimage *image, struct 
kexec_segment *segment);
 void kimage_terminate(struct kimage *image);
 int kimage_is_destination_range(struct kimage *image,
unsigned long start, unsigned long end);
+int kexec_cmdline_prepend_img_name(char **outbuf, int kernel_fd,
+   const char *cmdline_ptr,
+   unsigned long cmdline_len);
 
 extern struct mutex kexec_mutex;
 
-- 
2.17.1



Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Christian Brauner
On Sat, Apr 20, 2019 at 1:30 AM Daniel Colascione  wrote:
>
> On Fri, Apr 19, 2019 at 4:02 PM Christian Brauner  
> wrote:
> >
> > On Sat, Apr 20, 2019 at 12:35 AM Daniel Colascione  
> > wrote:
> > >
> > > On Fri, Apr 19, 2019 at 2:48 PM Christian Brauner  
> > > wrote:
> > > >
> > > > On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione  
> > > > wrote:
> > > > >
> > > > > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner 
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner 
> > > > > > > > wrote:
> > > > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes 
> > > > > > > > > wrote:
> > > > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner 
> > > > > > > > > > wrote:
> > > > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >> On 04/16, Joel Fernandes wrote:
> > > > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg 
> > > > > > > > > > > >> > Nesterov wrote:
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > Could you explain when it should return POLLIN? 
> > > > > > > > > > > >> > > When the whole
> > > > > > > > > > > >process exits?
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't 
> > > > > > > > > > > >> > exist anymore,
> > > > > > > > > > > >or when it
> > > > > > > > > > > >> > is in a zombie state and there's no other thread in 
> > > > > > > > > > > >> > the thread
> > > > > > > > > > > >group.
> > > > > > > > > > > >>
> > > > > > > > > > > >> IOW, when the whole thread group exits, so it can't be 
> > > > > > > > > > > >> used to
> > > > > > > > > > > >monitor sub-threads.
> > > > > > > > > > > >>
> > > > > > > > > > > >> just in case... speaking of this patch it doesn't 
> > > > > > > > > > > >> modify
> > > > > > > > > > > >proc_tid_base_operations,
> > > > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but 
> > > > > > > > > > > >> iiuc you are
> > > > > > > > > > > >going to use
> > > > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ?
> > > > > > > > > > > >
> > > > > > > > > > > >I don't think procfs works that way. 
> > > > > > > > > > > >/proc/sub-thread-tid has
> > > > > > > > > > > >proc_tgid_base_operations despite not being a thread 
> > > > > > > > > > > >group leader.
> > > > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in 
> > > > > > > > > > > >this code can
> > > > > > > > > > > >be hit trivially, and then the code will misbehave.
> > > > > > > > > > > >
> > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to 
> > > > > > > > > > > >explicitly bail
> > > > > > > > > > > >out if you're dealing with a thread group leader, or 
> > > > > > > > > > > >make the code
> > > > > > > > > > > >work for threads, too.
> > > > > > > > > > >
> > > > > > > > > > > The latter case probably being preferred if this API is 
> > > > > > > > > > > supposed to be
> > > > > > > > > > > useable for thread management in userspace.
> > > > > > > > > >
> > > > > > > > > > At the moment, we are not planning to use this for 
> > > > > > > > > > sub-thread management. I
> > > > > > > > > > am reworking this patch to only work on clone(2) pidfds 
> > > > > > > > > > which makes the above
> > > > > > > > >
> > > > > > > > > Indeed and agreed.
> > > > > > > > >
> > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the 
> > > > > > > > > > latest CLONE_PIDFD
> > > > > > > > > > patches, CLONE_THREAD with pidfd is not supported.
> > > > > > > > >
> > > > > > > > > Yes. We have no one asking for it right now and we can easily 
> > > > > > > > > add this
> > > > > > > > > later.
> > > > > > > > >
> > > > > > > > > Admittedly I haven't gotten around to reviewing the patches 
> > > > > > > > > here yet
> > > > > > > > > completely. But one thing about using POLLIN. FreeBSD is 
> > > > > > > > > using POLLHUP
> > > > > > > > > on process exit which I think is nice as well. How about 
> > > > > > > > > returning
> > > > > > > > > POLLIN | POLLHUP on process exit?
> > > > > > > > > We already do things like this. For example, when you proxy 
> > > > > > > > > between
> > > > > > > > > ttys. If the process that you're reading data from has exited 
> > > > > > > > > and closed
> > > > > > > > > it's end you still can't usually simply exit because it might 
> > > > > > > > > have still
> > > > > > > > > buffered data that you want to read.  The way one can deal 
> > > > > > > > > with this
> > > > > > > > > from  userspace is that you can observe a (POLLHUP | POLLIN) 
> > > > > > > > > event and
> > > > 

[PATCHv2] use event name instead of enum to make the call generic

2019-04-19 Thread Prakhar Srivastava
From: Prakhar Srivastava 

Signed-off-by: Prakhar Srivastava 
---

remove enaums to control type of buffers entries, instead pass the event name 
to be used.

 include/linux/ima.h   | 10 ++
 kernel/kexec_file.c   |  3 +++
 security/integrity/ima/ima.h  |  2 +-
 security/integrity/ima/ima_main.c | 30 ++
 4 files changed, 16 insertions(+), 29 deletions(-)

diff --git a/include/linux/ima.h b/include/linux/ima.h
index 733d0cb9dedc..5e41507c57e5 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
@@ -14,12 +14,6 @@
 #include 
 struct linux_binprm;
 
-enum __buffer_id {
-   KERNEL_VERSION,
-   KEXEC_CMDLINE,
-   MAX_BUFFER_ID = KEXEC_CMDLINE
-} buffer_id;
-
 #ifdef CONFIG_IMA
 extern int ima_bprm_check(struct linux_binprm *bprm);
 extern int ima_file_check(struct file *file, int mask, int opened);
@@ -29,7 +23,7 @@ extern int ima_read_file(struct file *file, enum 
kernel_read_file_id id);
 extern int ima_post_read_file(struct file *file, void *buf, loff_t size,
  enum kernel_read_file_id id);
 extern void ima_post_path_mknod(struct dentry *dentry);
-extern void ima_buffer_check(const void *buff, int size, enum buffer_id id);
+extern void ima_buffer_check(const void *buff, int size, char *eventname);
 #ifdef CONFIG_IMA_KEXEC
 extern void ima_add_kexec_buffer(struct kimage *image);
 #endif
@@ -72,7 +66,7 @@ static inline void ima_post_path_mknod(struct dentry *dentry)
 }
 
 static inline void ima_buffer_check(const void *buff, int size,
-   enum buffer_id id)
+   char *eventname)
 {
return;
 }
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index b118735fea9d..2a5234eb4b28 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -182,6 +182,9 @@ kimage_file_prepare_segments(struct kimage *image, int 
kernel_fd, int initrd_fd,
ret = -EINVAL;
goto out;
}
+
+   ima_buffer_check(image->cmdline_buf, cmdline_len - 1,
+   "kexec_cmdline");
}
 
/* Call arch image load handlers */
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index b71f2f6f7421..fcade3c103ed 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -181,8 +181,8 @@ enum ima_hooks {
FIRMWARE_CHECK,
KEXEC_KERNEL_CHECK,
KEXEC_INITRAMFS_CHECK,
-   BUFFER_CHECK,
POLICY_CHECK,
+   BUFFER_CHECK,
MAX_CHECK
 };
 
diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index 6408cadaadbb..da82c705a5ed 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -160,8 +160,7 @@ void ima_file_free(struct file *file)
  * (Instead of using the file hash the buffer hash is used).
  * @buff - The buffer that needs to be added to the log
  * @size - size of buffer(in bytes)
- * @id - buffer id, this is differentiator for the various buffers
- * that can be measured.
+ * @id - eventname, event name to be used for buffer measurement.
  *
  * The buffer passed is added to the ima logs.
  * If the sig template is used, then the sig field contains the buffer.
@@ -170,7 +169,7 @@ void ima_file_free(struct file *file)
  * On error cases surface errors from ima calls.
  */
 static int process_buffer_measurement(const void *buff, int size,
-   enum buffer_id id)
+   char *eventname)
 {
int ret = -EINVAL;
struct ima_template_entry *entry = NULL;
@@ -185,23 +184,13 @@ static int process_buffer_measurement(const void *buff, 
int size,
int violation = 0;
int pcr = CONFIG_IMA_MEASURE_PCR_IDX;
 
-   if (!buff || size ==  0)
+   if (!buff || size ==  0 || !eventname)
goto err_out;
 
if (ima_get_action(NULL, 0, BUFFER_CHECK, ) != IMA_MEASURE)
goto err_out;
 
-   switch (buffer_id) {
-   case KERNEL_VERSION:
-   name = "Kernel-version";
-   break;
-   case KEXEC_CMDLINE:
-   name = "Kexec-cmdline";
-   break;
-   default:
-   goto err_out;
-   }
-
+   name = eventname;
memset(iint, 0, sizeof(*iint));
memset(, 0, sizeof(hash));
 
@@ -452,15 +441,16 @@ int ima_read_file(struct file *file, enum 
kernel_read_file_id read_id)
  * ima_buffer_check - based on policy, collect & store buffer measurement
  * @buf: pointer to buffer
  * @size: size of buffer
- * @buffer_id: caller identifier
+ * @eventname: caller identifier
  *
  * Buffers can only be measured, not appraised.  The buffer identifier
- * is used as the measurement list entry name (eg. boot_cmdline).
+ * is used as the measurement list entry name (eg. boot_cmdline,
+ * kernel_version).
  */
-void ima_buffer_check(const void *buf, int size, enum buffer_id id)
+void 

[PATCHv2] added ima hook for buffer, being enabled as a policy

2019-04-19 Thread Prakhar Srivastava
From: Prakhar Srivastava 

Signed-off-by: Prakhar Srivastava 
---

This adds a new ima hook ima_buffer_check and a policy entry BUFFER_CHECK.
This enables buffer has measurements into ima log

 Documentation/ABI/testing/ima_policy |  1 +
 include/linux/ima.h  | 13 +++-
 security/integrity/ima/ima.h |  1 +
 security/integrity/ima/ima_main.c| 95 
 security/integrity/ima/ima_policy.c  | 14 +++-
 5 files changed, 122 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/ima_policy 
b/Documentation/ABI/testing/ima_policy
index bb0f9a135e21..676088c7ab26 100644
--- a/Documentation/ABI/testing/ima_policy
+++ b/Documentation/ABI/testing/ima_policy
@@ -28,6 +28,7 @@ Description:
base:   func:= 
[BPRM_CHECK][MMAP_CHECK][FILE_CHECK][MODULE_CHECK]
[FIRMWARE_CHECK]
[KEXEC_KERNEL_CHECK] [KEXEC_INITRAMFS_CHECK]
+   [BUFFER_CHECK]
mask:= [[^]MAY_READ] [[^]MAY_WRITE] [[^]MAY_APPEND]
   [[^]MAY_EXEC]
fsmagic:= hex value
diff --git a/include/linux/ima.h b/include/linux/ima.h
index 7f6952f8d6aa..733d0cb9dedc 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
@@ -14,6 +14,12 @@
 #include 
 struct linux_binprm;
 
+enum __buffer_id {
+   KERNEL_VERSION,
+   KEXEC_CMDLINE,
+   MAX_BUFFER_ID = KEXEC_CMDLINE
+} buffer_id;
+
 #ifdef CONFIG_IMA
 extern int ima_bprm_check(struct linux_binprm *bprm);
 extern int ima_file_check(struct file *file, int mask, int opened);
@@ -23,7 +29,7 @@ extern int ima_read_file(struct file *file, enum 
kernel_read_file_id id);
 extern int ima_post_read_file(struct file *file, void *buf, loff_t size,
  enum kernel_read_file_id id);
 extern void ima_post_path_mknod(struct dentry *dentry);
-
+extern void ima_buffer_check(const void *buff, int size, enum buffer_id id);
 #ifdef CONFIG_IMA_KEXEC
 extern void ima_add_kexec_buffer(struct kimage *image);
 #endif
@@ -65,6 +71,11 @@ static inline void ima_post_path_mknod(struct dentry *dentry)
return;
 }
 
+static inline void ima_buffer_check(const void *buff, int size,
+   enum buffer_id id)
+{
+   return;
+}
 #endif /* CONFIG_IMA */
 
 #ifndef CONFIG_IMA_KEXEC
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index b563fbd4d122..b71f2f6f7421 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -181,6 +181,7 @@ enum ima_hooks {
FIRMWARE_CHECK,
KEXEC_KERNEL_CHECK,
KEXEC_INITRAMFS_CHECK,
+   BUFFER_CHECK,
POLICY_CHECK,
MAX_CHECK
 };
diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index 2aebb7984437..6408cadaadbb 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -155,6 +155,84 @@ void ima_file_free(struct file *file)
ima_check_last_writer(iint, inode, file);
 }
 
+/*
+ * process_buffer_measurement - Measure the buffer passed to ima log.
+ * (Instead of using the file hash the buffer hash is used).
+ * @buff - The buffer that needs to be added to the log
+ * @size - size of buffer(in bytes)
+ * @id - buffer id, this is differentiator for the various buffers
+ * that can be measured.
+ *
+ * The buffer passed is added to the ima logs.
+ * If the sig template is used, then the sig field contains the buffer.
+ *
+ * On success return 0.
+ * On error cases surface errors from ima calls.
+ */
+static int process_buffer_measurement(const void *buff, int size,
+   enum buffer_id id)
+{
+   int ret = -EINVAL;
+   struct ima_template_entry *entry = NULL;
+   struct integrity_iint_cache tmp_iint, *iint = _iint;
+   struct ima_event_data event_data = {iint, NULL, NULL,
+   NULL, 0, NULL};
+   struct {
+   struct ima_digest_data hdr;
+   char digest[IMA_MAX_DIGEST_SIZE];
+   } hash;
+   char *name = NULL;
+   int violation = 0;
+   int pcr = CONFIG_IMA_MEASURE_PCR_IDX;
+
+   if (!buff || size ==  0)
+   goto err_out;
+
+   if (ima_get_action(NULL, 0, BUFFER_CHECK, ) != IMA_MEASURE)
+   goto err_out;
+
+   switch (buffer_id) {
+   case KERNEL_VERSION:
+   name = "Kernel-version";
+   break;
+   case KEXEC_CMDLINE:
+   name = "Kexec-cmdline";
+   break;
+   default:
+   goto err_out;
+   }
+
+   memset(iint, 0, sizeof(*iint));
+   memset(, 0, sizeof(hash));
+
+   event_data.filename = name;
+
+   iint->ima_hash = 
+   iint->ima_hash->algo = ima_hash_algo;
+   iint->ima_hash->length = hash_digest_size[ima_hash_algo];
+
+   ret = ima_calc_buffer_hash(buff, size, iint->ima_hash);
+   if (ret < 0)
+   

[PATCHv2] since cmdline args can be same for multiple kexec, log entry hash will collide. Prepend the kernel file name to the cmdline args to distinguish between cmdline args passed to subsequent kexe

2019-04-19 Thread Prakhar Srivastava
From: Prakhar Srivastava 

Signed-off-by: Prakhar Srivastava 
---
since cmdline args can be same for multiple kexec, log entry
hash will collide. Prepend the kernel file name to the cmdline args to
distinguish between cmdline args passed to subsequent kexec calls

 kernel/kexec_core.c | 57 +
 kernel/kexec_file.c | 14 --
 kernel/kexec_internal.h |  3 +++
 3 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index ae1a3ba24df5..97b77c780311 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1151,3 +1151,60 @@ void __weak arch_kexec_protect_crashkres(void)
 
 void __weak arch_kexec_unprotect_crashkres(void)
 {}
+
+/**
+ * kexec_cmdline_prepend_img_name - prepare the buffer with cmdline
+ * that needs to be measured
+ * @outbuf - out buffer that contains the formated string
+ * @kernel_fd - the file identifier for the kerenel image
+ * @cmdline_ptr - ptr to the cmdline buffer
+ * @cmdline_len - len of the buffer.
+ *
+ * This generates a buffer in the format Kerenelfilename::cmdline
+ *
+ * On success return 0.
+ * On failure return -EINVAL.
+ */
+int kexec_cmdline_prepend_img_name(char **outbuf, int kernel_fd,
+   const char *cmdline_ptr,
+   unsigned long cmdline_len)
+{
+   int ret = -EINVAL;
+   struct fd f = {};
+   int size = 0;
+   char *buf = NULL;
+   char delimiter[] = "::";
+
+   if (!outbuf || !cmdline_ptr)
+   goto out;
+
+   f = fdget(kernel_fd);
+   if (!f.file)
+   goto out;
+
+   size = (f.file->f_path.dentry->d_name.len + cmdline_len - 1+
+   ARRAY_SIZE(delimiter)) - 1;
+
+   buf = kzalloc(size, GFP_KERNEL);
+   if (!buf)
+   goto out;
+
+   memcpy(buf, f.file->f_path.dentry->d_name.name,
+   f.file->f_path.dentry->d_name.len);
+   memcpy(buf + f.file->f_path.dentry->d_name.len,
+   delimiter, ARRAY_SIZE(delimiter) - 1);
+   memcpy(buf + f.file->f_path.dentry->d_name.len +
+   ARRAY_SIZE(delimiter) - 1,
+   cmdline_ptr, cmdline_len - 1);
+
+   *outbuf = buf;
+   ret = size;
+
+   pr_debug("kexec cmdline buff: %s\n", buf);
+
+out:
+   if (f.file)
+   fdput(f);
+
+   return ret;
+}
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 2a5234eb4b28..a487491d55b9 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -126,6 +126,8 @@ kimage_file_prepare_segments(struct kimage *image, int 
kernel_fd, int initrd_fd,
int ret = 0;
void *ldata;
loff_t size;
+   char *buff_to_measure = NULL;
+   int buff_to_measure_size = 0;
 
ret = kernel_read_file_from_fd(kernel_fd, >kernel_buf,
   , INT_MAX, READING_KEXEC_IMAGE);
@@ -183,8 +185,13 @@ kimage_file_prepare_segments(struct kimage *image, int 
kernel_fd, int initrd_fd,
goto out;
}
 
-   ima_buffer_check(image->cmdline_buf, cmdline_len - 1,
-   "kexec_cmdline");
+   /* IMA measures the cmdline args passed to the next kernel*/
+   buff_to_measure_size = 
kexec_cmdline_prepend_img_name(_to_measure,
+   kernel_fd, image->cmdline_buf, image->cmdline_buf_len);
+
+   ima_buffer_check(buff_to_measure, buff_to_measure_size,
+   "kexec_cmdline");
+
}
 
/* Call arch image load handlers */
@@ -200,6 +207,9 @@ kimage_file_prepare_segments(struct kimage *image, int 
kernel_fd, int initrd_fd,
/* In case of error, free up all allocated memory in this function */
if (ret)
kimage_file_post_load_cleanup(image);
+
+   kfree(buff_to_measure);
+
return ret;
 }
 
diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
index 799a8a452187..4d34a8ef4637 100644
--- a/kernel/kexec_internal.h
+++ b/kernel/kexec_internal.h
@@ -11,6 +11,9 @@ int kimage_load_segment(struct kimage *image, struct 
kexec_segment *segment);
 void kimage_terminate(struct kimage *image);
 int kimage_is_destination_range(struct kimage *image,
unsigned long start, unsigned long end);
+int kexec_cmdline_prepend_img_name(char **outbuf, int kernel_fd,
+   const char *cmdline_ptr,
+   unsigned long cmdline_len);
 
 extern struct mutex kexec_mutex;
 
-- 
2.17.1



Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Daniel Colascione
On Fri, Apr 19, 2019 at 4:12 PM Christian Brauner  wrote:
>
> On Sat, Apr 20, 2019 at 12:46 AM Daniel Colascione  wrote:
> >
> > On Fri, Apr 19, 2019 at 3:02 PM Christian Brauner  
> > wrote:
> > >
> > > On Fri, Apr 19, 2019 at 11:48 PM Christian Brauner  
> > > wrote:
> > > >
> > > > On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione  
> > > > wrote:
> > > > >
> > > > > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner 
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner 
> > > > > > > > wrote:
> > > > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes 
> > > > > > > > > wrote:
> > > > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner 
> > > > > > > > > > wrote:
> > > > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >> On 04/16, Joel Fernandes wrote:
> > > > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg 
> > > > > > > > > > > >> > Nesterov wrote:
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > Could you explain when it should return POLLIN? 
> > > > > > > > > > > >> > > When the whole
> > > > > > > > > > > >process exits?
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't 
> > > > > > > > > > > >> > exist anymore,
> > > > > > > > > > > >or when it
> > > > > > > > > > > >> > is in a zombie state and there's no other thread in 
> > > > > > > > > > > >> > the thread
> > > > > > > > > > > >group.
> > > > > > > > > > > >>
> > > > > > > > > > > >> IOW, when the whole thread group exits, so it can't be 
> > > > > > > > > > > >> used to
> > > > > > > > > > > >monitor sub-threads.
> > > > > > > > > > > >>
> > > > > > > > > > > >> just in case... speaking of this patch it doesn't 
> > > > > > > > > > > >> modify
> > > > > > > > > > > >proc_tid_base_operations,
> > > > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but 
> > > > > > > > > > > >> iiuc you are
> > > > > > > > > > > >going to use
> > > > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ?
> > > > > > > > > > > >
> > > > > > > > > > > >I don't think procfs works that way. 
> > > > > > > > > > > >/proc/sub-thread-tid has
> > > > > > > > > > > >proc_tgid_base_operations despite not being a thread 
> > > > > > > > > > > >group leader.
> > > > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in 
> > > > > > > > > > > >this code can
> > > > > > > > > > > >be hit trivially, and then the code will misbehave.
> > > > > > > > > > > >
> > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to 
> > > > > > > > > > > >explicitly bail
> > > > > > > > > > > >out if you're dealing with a thread group leader, or 
> > > > > > > > > > > >make the code
> > > > > > > > > > > >work for threads, too.
> > > > > > > > > > >
> > > > > > > > > > > The latter case probably being preferred if this API is 
> > > > > > > > > > > supposed to be
> > > > > > > > > > > useable for thread management in userspace.
> > > > > > > > > >
> > > > > > > > > > At the moment, we are not planning to use this for 
> > > > > > > > > > sub-thread management. I
> > > > > > > > > > am reworking this patch to only work on clone(2) pidfds 
> > > > > > > > > > which makes the above
> > > > > > > > >
> > > > > > > > > Indeed and agreed.
> > > > > > > > >
> > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the 
> > > > > > > > > > latest CLONE_PIDFD
> > > > > > > > > > patches, CLONE_THREAD with pidfd is not supported.
> > > > > > > > >
> > > > > > > > > Yes. We have no one asking for it right now and we can easily 
> > > > > > > > > add this
> > > > > > > > > later.
> > > > > > > > >
> > > > > > > > > Admittedly I haven't gotten around to reviewing the patches 
> > > > > > > > > here yet
> > > > > > > > > completely. But one thing about using POLLIN. FreeBSD is 
> > > > > > > > > using POLLHUP
> > > > > > > > > on process exit which I think is nice as well. How about 
> > > > > > > > > returning
> > > > > > > > > POLLIN | POLLHUP on process exit?
> > > > > > > > > We already do things like this. For example, when you proxy 
> > > > > > > > > between
> > > > > > > > > ttys. If the process that you're reading data from has exited 
> > > > > > > > > and closed
> > > > > > > > > it's end you still can't usually simply exit because it might 
> > > > > > > > > have still
> > > > > > > > > buffered data that you want to read.  The way one can deal 
> > > > > > > > > with this
> > > > > > > > > from  userspace is that you can observe a (POLLHUP | POLLIN) 
> > > > > > > > > event and
> > > > > 

Re: [GIT PULL] percpu changes for v5.1-rc6

2019-04-19 Thread pr-tracker-bot
The pull request you sent on Fri, 19 Apr 2019 17:29:18 -0400:

> git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu.git for-5.1-fixes

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/4c3f49ae1306c05e91211c06feddfd0a4a57fabd

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Linus Torvalds
On Fri, Apr 19, 2019 at 4:20 PM Christian Brauner  wrote:
>
> On Sat, Apr 20, 2019 at 1:11 AM Linus Torvalds
>  wrote:
> >
> > It's also worth noting that POLLERR/POLLHUP/POLLNVAL cannot be masked
> > for "poll()". Even if you only ask for POLLIN/POLLOUT, you will always
> > get POLLERR/POLLHUP notification. That is again historical behavior,
> > and it's kind of a "you can't poll a hung up fd". But it once again
> > means that you should consider POLLHUP to be something *exceptional*
> > and final, where no further or other state changes can happen or are
> > relevant.
>
> Which kind of makes sense for process exit. So the historical behavior
> here is in our favor and having POLLIN | POLLHUP rather fitting.
> It just seems right that POLLHUP indicates "there can be
> no more state transitions".

Note that that is *not* true of process exit.

The final state transition isn't "exit", it is actually "process has
been reaped".  That's the point where data no longer exists.

Arguably "exit()" just means "pidfd is now readable - you can read the
status". That sounds very much like a normal POLLIN condition to me,
since the whole *point* of read() on pidfd is presumably to read the
status.

Now, if you want to have other state transitions (ie read
execve/fork/whatever state details), then you could say that _those_
state transitions are just POLLIN, but that the exit state transition
is POLLIN | POLLHUP. But logically to me it still smells like the
process being reaped should be POLLHUP.

You could also say that the execve/fork/whatever state is out of band
data, and use EPOLLRDBAND for it.

But in fact EPOLLPRI might be better for that, because that works well
with select() (ei if you want to select for execve/fork, you use the
"ex" bitmask).

That said, if FreeBSD already has something like this, and people
actually have code that uses it, there's also simply a strong argument
for "don't be needlessly different".

  Linus


Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Daniel Colascione
On Fri, Apr 19, 2019 at 4:33 PM Linus Torvalds
 wrote:
>
> On Fri, Apr 19, 2019 at 4:20 PM Christian Brauner  
> wrote:
> >
> > On Sat, Apr 20, 2019 at 1:11 AM Linus Torvalds
> >  wrote:
> > >
> > > It's also worth noting that POLLERR/POLLHUP/POLLNVAL cannot be masked
> > > for "poll()". Even if you only ask for POLLIN/POLLOUT, you will always
> > > get POLLERR/POLLHUP notification. That is again historical behavior,
> > > and it's kind of a "you can't poll a hung up fd". But it once again
> > > means that you should consider POLLHUP to be something *exceptional*
> > > and final, where no further or other state changes can happen or are
> > > relevant.
> >
> > Which kind of makes sense for process exit. So the historical behavior
> > here is in our favor and having POLLIN | POLLHUP rather fitting.
> > It just seems right that POLLHUP indicates "there can be
> > no more state transitions".
>
> Note that that is *not* true of process exit.
>
> The final state transition isn't "exit", it is actually "process has
> been reaped".  That's the point where data no longer exists.

FWIW, I think the exit status should be available via pidfd even after
process reaping. A non-parent holder of a pidfd has no ability to
control when the parent reaps the child, or even if reaping is
necessary at all --- the parent could make SIGCHLD SIG_IGN. Someone
trying to read exit status via a pidfd shouldn't fail to get that exit
status just because he lost the race with a concurrent waitpid().


[RESEND PATCH] mm/hmm: Fix initial PFN for hugetlbfs pages

2019-04-19 Thread rcampbell
From: Ralph Campbell 

The mmotm patch [1] adds hugetlbfs support for HMM but the initial
PFN used to fill the HMM range->pfns[] array doesn't properly
compute the starting PFN offset.
This can be tested by running test-hugetlbfs-read from [2].

Fix the PFN offset by adjusting the page offset by the device's
page size.

Andrew, this should probably be squashed into Jerome's patch.

[1] https://marc.info/?l=linux-mm=155432003506068=2
("mm/hmm: mirror hugetlbfs (snapshoting, faulting and DMA mapping)")
[2] https://gitlab.freedesktop.org/glisse/svm-cl-tests

Signed-off-by: Ralph Campbell 
Cc: Jérôme Glisse 
Cc: Ira Weiny 
Cc: John Hubbard 
Cc: Dan Williams 
Cc: Arnd Bergmann 
Cc: Balbir Singh 
Cc: Dan Carpenter 
Cc: Matthew Wilcox 
Cc: Souptick Joarder 
Cc: Andrew Morton 
---
 mm/hmm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index def451a56c3e..fcf8e4fb5770 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -868,7 +868,7 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned 
long hmask,
goto unlock;
}
 
-   pfn = pte_pfn(entry) + (start & mask);
+   pfn = pte_pfn(entry) + ((start & mask) >> range->page_shift);
for (; addr < end; addr += size, i++, pfn += pfn_inc)
range->pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
 cpu_flags;
-- 
2.20.1



Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Daniel Colascione
On Fri, Apr 19, 2019 at 4:02 PM Christian Brauner  wrote:
>
> On Sat, Apr 20, 2019 at 12:35 AM Daniel Colascione  wrote:
> >
> > On Fri, Apr 19, 2019 at 2:48 PM Christian Brauner  
> > wrote:
> > >
> > > On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione  
> > > wrote:
> > > >
> > > > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner 
> > > >  wrote:
> > > > >
> > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione 
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote:
> > > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote:
> > > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner 
> > > > > > > > > wrote:
> > > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn 
> > > > > > > > > >  wrote:
> > > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov 
> > > > > > > > > > > wrote:
> > > > > > > > > > >> On 04/16, Joel Fernandes wrote:
> > > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg 
> > > > > > > > > > >> > Nesterov wrote:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > Could you explain when it should return POLLIN? When 
> > > > > > > > > > >> > > the whole
> > > > > > > > > > >process exits?
> > > > > > > > > > >> >
> > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't 
> > > > > > > > > > >> > exist anymore,
> > > > > > > > > > >or when it
> > > > > > > > > > >> > is in a zombie state and there's no other thread in 
> > > > > > > > > > >> > the thread
> > > > > > > > > > >group.
> > > > > > > > > > >>
> > > > > > > > > > >> IOW, when the whole thread group exits, so it can't be 
> > > > > > > > > > >> used to
> > > > > > > > > > >monitor sub-threads.
> > > > > > > > > > >>
> > > > > > > > > > >> just in case... speaking of this patch it doesn't modify
> > > > > > > > > > >proc_tid_base_operations,
> > > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but 
> > > > > > > > > > >> iiuc you are
> > > > > > > > > > >going to use
> > > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ?
> > > > > > > > > > >
> > > > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid 
> > > > > > > > > > >has
> > > > > > > > > > >proc_tgid_base_operations despite not being a thread group 
> > > > > > > > > > >leader.
> > > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in 
> > > > > > > > > > >this code can
> > > > > > > > > > >be hit trivially, and then the code will misbehave.
> > > > > > > > > > >
> > > > > > > > > > >@Joel: I think you'll have to either rewrite this to 
> > > > > > > > > > >explicitly bail
> > > > > > > > > > >out if you're dealing with a thread group leader, or make 
> > > > > > > > > > >the code
> > > > > > > > > > >work for threads, too.
> > > > > > > > > >
> > > > > > > > > > The latter case probably being preferred if this API is 
> > > > > > > > > > supposed to be
> > > > > > > > > > useable for thread management in userspace.
> > > > > > > > >
> > > > > > > > > At the moment, we are not planning to use this for sub-thread 
> > > > > > > > > management. I
> > > > > > > > > am reworking this patch to only work on clone(2) pidfds which 
> > > > > > > > > makes the above
> > > > > > > >
> > > > > > > > Indeed and agreed.
> > > > > > > >
> > > > > > > > > discussion about /proc a bit unnecessary I think. Per the 
> > > > > > > > > latest CLONE_PIDFD
> > > > > > > > > patches, CLONE_THREAD with pidfd is not supported.
> > > > > > > >
> > > > > > > > Yes. We have no one asking for it right now and we can easily 
> > > > > > > > add this
> > > > > > > > later.
> > > > > > > >
> > > > > > > > Admittedly I haven't gotten around to reviewing the patches 
> > > > > > > > here yet
> > > > > > > > completely. But one thing about using POLLIN. FreeBSD is using 
> > > > > > > > POLLHUP
> > > > > > > > on process exit which I think is nice as well. How about 
> > > > > > > > returning
> > > > > > > > POLLIN | POLLHUP on process exit?
> > > > > > > > We already do things like this. For example, when you proxy 
> > > > > > > > between
> > > > > > > > ttys. If the process that you're reading data from has exited 
> > > > > > > > and closed
> > > > > > > > it's end you still can't usually simply exit because it might 
> > > > > > > > have still
> > > > > > > > buffered data that you want to read.  The way one can deal with 
> > > > > > > > this
> > > > > > > > from  userspace is that you can observe a (POLLHUP | POLLIN) 
> > > > > > > > event and
> > > > > > > > you keep on reading until you only observe a POLLHUP without a 
> > > > > > > > POLLIN
> > > > > > > > event at which point you know you have read
> > > > > > > > all data.
> > > > > > > > I like the semantics for pidfds as well as it would indicate:
> > > > > > > > - POLLHUP -> process has exited
> > > > > > > > - POLLIN  -> 

[PATCH] Makefile: add kselftest-build target

2019-04-19 Thread Shuah Khan
One of the use-cases is building kselftest and then installing.
Adding a target to just build the tests from the main Makefile
will help users and makes it easier.

Signed-off-by: Shuah Khan 
---
 Makefile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Makefile b/Makefile
index 15c8251d4d5e..38a2ce3c343f 100644
--- a/Makefile
+++ b/Makefile
@@ -1195,6 +1195,9 @@ endif
 # ---
 # Kernel selftest
 
+PHONY += kselftest-build
+kselftest-build:
+   $(Q)$(MAKE) -C $(srctree)/tools/testing/selftests all
 PHONY += kselftest
 kselftest:
$(Q)$(MAKE) -C $(srctree)/tools/testing/selftests run_tests
-- 
2.17.1



[PATCH] selftests: build and run gpio when output directory is the src dir

2019-04-19 Thread Shuah Khan
Build and run gpio when output directory is the src dir.  gpio has
dependency on tools/gpio and builds tools/gpio objects in the src
directory in all cases making the src repo dirty even when object
relocation is specified.

This fixes the following commands from generating gpio objects in
the source repository:

make O=dir kselftest
export KBUILD_OUTPUT=dir; make kselftest
make O=dir -C tools/testing/selftests
expoert KBUILD_OUTPUT=dir; make -C tools/testing/selftests

The following still don't work (need fixing in gpio Makefile):
make O=dir kselftest TARGETS="gpio"
export KBUILD_OUTPUT=dir; make kselftest TARGETS="gpio"
make O=dir -C tools/testing/selftests TARGETS="gpio"
expoert KBUILD_OUTPUT=dir; make -C tools/testing/selftests TARGETS="gpio"

Signed-off-by: Shuah Khan 
---
 tools/testing/selftests/Makefile | 9 +
 1 file changed, 9 insertions(+)

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index b43c5f41fb3e..f2ebf8cf4686 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -99,6 +99,15 @@ ARCH   ?= $(SUBARCH)
 export KSFT_KHDR_INSTALL_DONE := 1
 export BUILD
 
+# build and run gpio when output directory is the src dir.
+# gpio has dependency on tools/gpio and builds tools/gpio
+# objects in the src directory in all cases making the src
+# repo dirty even when objects are relocated.
+ifneq (1,$(DEFAULT_INSTALL_HDR_PATH))
+   TMP := $(filter-out gpio, $(TARGETS))
+   TARGETS := $(TMP)
+endif
+
 # set default goal to all, so make without a target runs all, even when
 # all isn't the first target in the file.
 .DEFAULT_GOAL := all
-- 
2.17.1



Re: [RFC][PATCH 13/16] sched: Add core wide task selection and scheduling.

2019-04-19 Thread Subhra Mazumdar



On 4/19/19 1:40 AM, Ingo Molnar wrote:

* Subhra Mazumdar  wrote:


I see similar improvement with this patch as removing the condition I
earlier mentioned. So that's not needed. I also included the patch for the
priority fix. For 2 DB instances, HT disabling stands at -22% for 32 users
(from earlier emails).


1 DB instance

users  baseline   %idle    core_sched %idle
16 1  84   -4.9% 84
24 1  76   -6.7% 75
32 1  69   -2.4% 69

2 DB instance

users  baseline   %idle    core_sched %idle
16 1  66   -19.5% 69
24 1  54   -9.8% 57
32 1  42   -27.2%    48

So HT disabling slows down the 2DB instance by -22%, while core-sched
slows it down by -27.2%?

Would it be possible to see all the results in two larger tables (1 DB
instance and 2 DB instance) so that we can compare the performance of the
3 kernel variants with each other:

  - "vanilla +HT": Hyperthreading enabled,  vanilla scheduler
  - "vanilla -HT": Hyperthreading disabled, vanilla scheduler
  - "core_sched":  Hyperthreading enabled,  core-scheduling enabled

?

Thanks,

Ingo

Following are the numbers. Disabling HT gives improvement in some cases.

1 DB instance

users  vanilla+HT   core_sched vanilla-HT
16 1    -4.9% -11.7%
24 1    -6.7% +13.7%
32 1    -2.4% +8%

2 DB instance

users  vanilla+HT   core_sched vanilla-HT
16 1    -19.5% +5.6%
24 1    -9.8% +3.5%
32 1    -27.2%    -22.8%


Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Christian Brauner
On Sat, Apr 20, 2019 at 1:11 AM Linus Torvalds
 wrote:
>
> On Fri, Apr 19, 2019 at 2:20 PM Joel Fernandes  wrote:
> >
> > According to Linus, POLLHUP usually indicates that something is readable:
>
> Note that if you use the legacy interfaces (ie "select()"), then even
> just a plain POLLHUP will always just show as "readable".
>
> So for a lot of applications, it won't even matter. Returning POLLHUP
> will mean it's readable whether POLLIN is set or not (and EPOLLERR
> will automatically mean that it's marked both readable and writable).
>
> In fact, I'd argue that not a lot of people care about the more
> esoteric bits at all. If your poll function does POLLIN and POLLOUT,
> it's fine. Anything else is specialized enough that most people simply
> won't care. Don't overdesign things too much. You need to have a major
> reason to care about the other POLL bits.
>
> It's also worth noting that POLLERR/POLLHUP/POLLNVAL cannot be masked
> for "poll()". Even if you only ask for POLLIN/POLLOUT, you will always
> get POLLERR/POLLHUP notification. That is again historical behavior,
> and it's kind of a "you can't poll a hung up fd". But it once again
> means that you should consider POLLHUP to be something *exceptional*
> and final, where no further or other state changes can happen or are
> relevant.

Which kind of makes sense for process exit. So the historical behavior
here is in our favor and having POLLIN | POLLHUP rather fitting.
It just seems right that POLLHUP indicates "there can be
no more state transitions".

Christian


Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Christian Brauner
On Sat, Apr 20, 2019 at 12:46 AM Daniel Colascione  wrote:
>
> On Fri, Apr 19, 2019 at 3:02 PM Christian Brauner  
> wrote:
> >
> > On Fri, Apr 19, 2019 at 11:48 PM Christian Brauner  
> > wrote:
> > >
> > > On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione  
> > > wrote:
> > > >
> > > > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner 
> > > >  wrote:
> > > > >
> > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione 
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote:
> > > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote:
> > > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner 
> > > > > > > > > wrote:
> > > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn 
> > > > > > > > > >  wrote:
> > > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov 
> > > > > > > > > > > wrote:
> > > > > > > > > > >> On 04/16, Joel Fernandes wrote:
> > > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg 
> > > > > > > > > > >> > Nesterov wrote:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > Could you explain when it should return POLLIN? When 
> > > > > > > > > > >> > > the whole
> > > > > > > > > > >process exits?
> > > > > > > > > > >> >
> > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't 
> > > > > > > > > > >> > exist anymore,
> > > > > > > > > > >or when it
> > > > > > > > > > >> > is in a zombie state and there's no other thread in 
> > > > > > > > > > >> > the thread
> > > > > > > > > > >group.
> > > > > > > > > > >>
> > > > > > > > > > >> IOW, when the whole thread group exits, so it can't be 
> > > > > > > > > > >> used to
> > > > > > > > > > >monitor sub-threads.
> > > > > > > > > > >>
> > > > > > > > > > >> just in case... speaking of this patch it doesn't modify
> > > > > > > > > > >proc_tid_base_operations,
> > > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but 
> > > > > > > > > > >> iiuc you are
> > > > > > > > > > >going to use
> > > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ?
> > > > > > > > > > >
> > > > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid 
> > > > > > > > > > >has
> > > > > > > > > > >proc_tgid_base_operations despite not being a thread group 
> > > > > > > > > > >leader.
> > > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in 
> > > > > > > > > > >this code can
> > > > > > > > > > >be hit trivially, and then the code will misbehave.
> > > > > > > > > > >
> > > > > > > > > > >@Joel: I think you'll have to either rewrite this to 
> > > > > > > > > > >explicitly bail
> > > > > > > > > > >out if you're dealing with a thread group leader, or make 
> > > > > > > > > > >the code
> > > > > > > > > > >work for threads, too.
> > > > > > > > > >
> > > > > > > > > > The latter case probably being preferred if this API is 
> > > > > > > > > > supposed to be
> > > > > > > > > > useable for thread management in userspace.
> > > > > > > > >
> > > > > > > > > At the moment, we are not planning to use this for sub-thread 
> > > > > > > > > management. I
> > > > > > > > > am reworking this patch to only work on clone(2) pidfds which 
> > > > > > > > > makes the above
> > > > > > > >
> > > > > > > > Indeed and agreed.
> > > > > > > >
> > > > > > > > > discussion about /proc a bit unnecessary I think. Per the 
> > > > > > > > > latest CLONE_PIDFD
> > > > > > > > > patches, CLONE_THREAD with pidfd is not supported.
> > > > > > > >
> > > > > > > > Yes. We have no one asking for it right now and we can easily 
> > > > > > > > add this
> > > > > > > > later.
> > > > > > > >
> > > > > > > > Admittedly I haven't gotten around to reviewing the patches 
> > > > > > > > here yet
> > > > > > > > completely. But one thing about using POLLIN. FreeBSD is using 
> > > > > > > > POLLHUP
> > > > > > > > on process exit which I think is nice as well. How about 
> > > > > > > > returning
> > > > > > > > POLLIN | POLLHUP on process exit?
> > > > > > > > We already do things like this. For example, when you proxy 
> > > > > > > > between
> > > > > > > > ttys. If the process that you're reading data from has exited 
> > > > > > > > and closed
> > > > > > > > it's end you still can't usually simply exit because it might 
> > > > > > > > have still
> > > > > > > > buffered data that you want to read.  The way one can deal with 
> > > > > > > > this
> > > > > > > > from  userspace is that you can observe a (POLLHUP | POLLIN) 
> > > > > > > > event and
> > > > > > > > you keep on reading until you only observe a POLLHUP without a 
> > > > > > > > POLLIN
> > > > > > > > event at which point you know you have read
> > > > > > > > all data.
> > > > > > > > I like the semantics for pidfds as well as it would indicate:
> > > > > > > > - POLLHUP -> process has exited
> > > > > > > > - POLLIN 

Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Linus Torvalds
On Fri, Apr 19, 2019 at 2:20 PM Joel Fernandes  wrote:
>
> According to Linus, POLLHUP usually indicates that something is readable:

Note that if you use the legacy interfaces (ie "select()"), then even
just a plain POLLHUP will always just show as "readable".

So for a lot of applications, it won't even matter. Returning POLLHUP
will mean it's readable whether POLLIN is set or not (and EPOLLERR
will automatically mean that it's marked both readable and writable).

In fact, I'd argue that not a lot of people care about the more
esoteric bits at all. If your poll function does POLLIN and POLLOUT,
it's fine. Anything else is specialized enough that most people simply
won't care. Don't overdesign things too much. You need to have a major
reason to care about the other POLL bits.

It's also worth noting that POLLERR/POLLHUP/POLLNVAL cannot be masked
for "poll()". Even if you only ask for POLLIN/POLLOUT, you will always
get POLLERR/POLLHUP notification. That is again historical behavior,
and it's kind of a "you can't poll a hung up fd". But it once again
means that you should consider POLLHUP to be something *exceptional*
and final, where no further or other state changes can happen or are
relevant.

That may well work fine for pidfd when the task is gone, but it's
really worth noting that any *normal* state should be about
POLLIN/POLLOUT. People should not think that "POLLHUP sounds like the
appropriate name", they should realize that POLLHUP is basically a
terminal error condition, not a "input is available".

So just keep that in mind.

Linus


Re: [PATCH 2/3] ARM: at91: move platform-specific asm-offset.h to arch/arm/mach-at91

2019-04-19 Thread Masahiro Yamada
On Sat, Apr 20, 2019 at 4:03 AM Ludovic Desroches
 wrote:
>
> On Mon, Apr 15, 2019 at 05:14:50PM +0200, Alexandre Belloni wrote:
> > External E-Mail
> >
> >
> > On 08/04/2019 16:54:26+0900, Masahiro Yamada wrote:
> > >  is only generated and included
> > > by arch/arm/mach-at91/, so it does not need to reside in the
> > > globally visible include/generated/.
> > >
> > > I moved and renamed it to arch/arm/mach-at91/pm_data-offsets.h
> > > since the prefix 'at91_' is just redundant in mach-at91/.
> > >
> > > Signed-off-by: Masahiro Yamada 
> > Acked-by: Alexandre Belloni 
> >
>
> Applied in at91-soc. Let me know if it's an issue, I plan to do the PR
> soon.


There is one minor issue.

If you apply 2/3 (this one) alone,
arch/arm/mach-at91/pm_data-offsets.h is not cleaned.


1/3 fixes the "make clean" issue:
https://lkml.org/lkml/2019/4/8/153


That is why I sent this as a series
in order to avoid the regression of cleaning.



Thanks.

Masahiro Yamada


> Regards
>
> Ludovic
>
> > > ---
> > >
> > > Can this be applied to ARM-SOC tree in a series?
> > > (with Ack from the platform sub-maintainer.)
> > >
> > > at91_pm_data-offsets.h header does not need to reside in
> > > include/generated/, but you may ask
> > > "Why must it get out of include/generated/?"
> > >
> > > My main motivation is to avoid a race condition in the currently
> > > proposed patch:
> > >
> > > https://lore.kernel.org/patchwork/patch/1052763/
> > >
> > > This patch tries to embed some build artifacts into the kernel.
> > >
> > > If arch/arm/mach-at91/ and kernel/ are built at the same time,
> > > it may embed a truncated file.
> > >
> > >
> > >  arch/arm/mach-at91/.gitignore   | 1 +
> > >  arch/arm/mach-at91/Makefile | 5 +++--
> > >  arch/arm/mach-at91/pm_suspend.S | 2 +-
> > >  3 files changed, 5 insertions(+), 3 deletions(-)
> > >  create mode 100644 arch/arm/mach-at91/.gitignore
> > >
> > > diff --git a/arch/arm/mach-at91/.gitignore b/arch/arm/mach-at91/.gitignore
> > > new file mode 100644
> > > index ..2ecd6f51c8a9
> > > --- /dev/null
> > > +++ b/arch/arm/mach-at91/.gitignore
> > > @@ -0,0 +1 @@
> > > +pm_data-offsets.h
> > > diff --git a/arch/arm/mach-at91/Makefile b/arch/arm/mach-at91/Makefile
> > > index 31b61f0e1c07..de64301dcff2 100644
> > > --- a/arch/arm/mach-at91/Makefile
> > > +++ b/arch/arm/mach-at91/Makefile
> > > @@ -19,9 +19,10 @@ ifeq ($(CONFIG_PM_DEBUG),y)
> > >  CFLAGS_pm.o += -DDEBUG
> > >  endif
> > >
> > > -include/generated/at91_pm_data-offsets.h: 
> > > arch/arm/mach-at91/pm_data-offsets.s FORCE
> > > +$(obj)/pm_data-offsets.h: $(obj)/pm_data-offsets.s FORCE
> > > $(call filechk,offsets,__PM_DATA_OFFSETS_H__)
> > >
> > > -arch/arm/mach-at91/pm_suspend.o: include/generated/at91_pm_data-offsets.h
> > > +$(obj)/pm_suspend.o: $(obj)/pm_data-offsets.h
> > >
> > >  targets += pm_data-offsets.s
> > > +clean-files += pm_data-offsets.h
> > > diff --git a/arch/arm/mach-at91/pm_suspend.S 
> > > b/arch/arm/mach-at91/pm_suspend.S
> > > index bfe1c4d06901..a31c1b20f3fa 100644
> > > --- a/arch/arm/mach-at91/pm_suspend.S
> > > +++ b/arch/arm/mach-at91/pm_suspend.S
> > > @@ -14,7 +14,7 @@
> > >  #include 
> > >  #include 
> > >  #include "pm.h"
> > > -#include "generated/at91_pm_data-offsets.h"
> > > +#include "pm_data-offsets.h"
> > >
> > >  #defineSRAMC_SELF_FRESH_ACTIVE 0x01
> > >  #defineSRAMC_SELF_FRESH_EXIT   0x00
> > > --
> > > 2.17.1
> > >
> >
> > --
> > Alexandre Belloni, Bootlin
> > Embedded Linux and Kernel engineering
> > https://bootlin.com
> >



-- 
Best Regards
Masahiro Yamada


Re: Detecting x86 LAPIC timer frequency from CPUID data

2019-04-19 Thread Jacob Pan
On Fri, 19 Apr 2019 22:52:01 +0200 (CEST)
Thomas Gleixner  wrote:

> On Fri, 19 Apr 2019, Jacob Pan wrote:
> > On Fri, 19 Apr 2019 10:57:10 +0200 (CEST)
> > Thomas Gleixner  wrote:  
> > > On Fri, 19 Apr 2019, Daniel Drake wrote:  
> > > > 0x7F vs 0x7FFF, is that intentional?
> > > 
> > > I don't think so. Looks like a failed copy and paste. Cc'ed
> > > Jacob, he might know.
> > >   
> > At the time of v2.6.35 both places use 0x7F. But later this
> > patch increased the latter to 0x7FFF but forgot the first part.
> > So I guess it is not exactly a failed copy and paste.
> > 
> > commit 4aed89d6b515b9185351706ca95cd712c9d8d6a3
> > Author: Pierre Tardy 
> > Date:   Thu Jan 6 16:23:29 2011 +0100
> > 
> > x86, lapic-timer: Increase the max_delta to 31 bits  
> 
> Indeed. Thanks for digging that up!
> 
>   tglx

How about a fix like this? I should have taken your advice 9 years ago
to avoid duplicated code :)
https://lkml.org/lkml/2010/5/11/499

>From 18450bb67e09f5b472f1ed313d7f87a983cb0ac1 Mon Sep 17 00:00:00 2001
From: Jacob Pan 
Date: Fri, 19 Apr 2019 15:56:06 -0700
Subject: [PATCH] x86/apic: Fix duplicated lapic timer calculation

Local APIC timer clockevent parameters can be calculated based on
platform specific methods. However the code is mostly duplicated
with the interrupt based calibration. This causes further updates
prone to mistakes.

Fixes: 4aed89d ("x86, lapic-timer: Increase the max_delta to 31 bits")

Signed-off-by: Jacob Pan 
---
 arch/x86/kernel/apic/apic.c | 46
++--- 1 file changed, 27
insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index b7bcdd7..b2ef91c 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -802,6 +802,24 @@ calibrate_by_pmtimer(long deltapm, long *delta,
long *deltatsc) return 0;
 }
 
+static int __init calculate_lapic_clockevent(void)
+{
+   if (!lapic_timer_frequency)
+   return -1;
+
+   /* Calculate the scaled math multiplication factor */
+   lapic_clockevent.mult =
div_sc(lapic_timer_frequency/APIC_DIVISOR,
+   TICK_NSEC,
lapic_clockevent.shift);
+   lapic_clockevent.max_delta_ns =
+   clockevent_delta2ns(0x7FFF, _clockevent);
+   lapic_clockevent.max_delta_ticks = 0x7FFF;
+   lapic_clockevent.min_delta_ns =
+   clockevent_delta2ns(0xF, _clockevent);
+   lapic_clockevent.min_delta_ticks = 0xF;
+
+   return 0;
+}
+
 static int __init calibrate_APIC_clock(void)
 {
struct clock_event_device *levt = this_cpu_ptr(_events);
@@ -818,18 +836,17 @@ static int __init calibrate_APIC_clock(void)
 
if (boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER)) {
return 0;
-   } else if (lapic_timer_frequency) {
+   }
+
+   if (!calculate_lapic_clockevent()) {
apic_printk(APIC_VERBOSE, "lapic timer already
calibrated %d\n", lapic_timer_frequency);
-   lapic_clockevent.mult =
div_sc(lapic_timer_frequency/APIC_DIVISOR,
-   TICK_NSEC,
lapic_clockevent.shift);
-   lapic_clockevent.max_delta_ns =
-   clockevent_delta2ns(0x7F,
_clockevent);
-   lapic_clockevent.max_delta_ticks = 0x7F;
-   lapic_clockevent.min_delta_ns =
-   clockevent_delta2ns(0xF, _clockevent);
-   lapic_clockevent.min_delta_ticks = 0xF;
+   /*
+* Newer platforms provide early calibration methods
must have always
+* running local APIC timers, no need for boradcast
timer.
+*/
lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
+
return 0;
}
 
@@ -869,17 +886,8 @@ static int __init calibrate_APIC_clock(void)
pm_referenced = !calibrate_by_pmtimer(lapic_cal_pm2 -
lapic_cal_pm1, , );
 
-   /* Calculate the scaled math multiplication factor */
-   lapic_clockevent.mult = div_sc(delta, TICK_NSEC *
LAPIC_CAL_LOOPS,
-  lapic_clockevent.shift);
-   lapic_clockevent.max_delta_ns =
-   clockevent_delta2ns(0x7FFF, _clockevent);
-   lapic_clockevent.max_delta_ticks = 0x7FFF;
-   lapic_clockevent.min_delta_ns =
-   clockevent_delta2ns(0xF, _clockevent);
-   lapic_clockevent.min_delta_ticks = 0xF;
-
lapic_timer_frequency = (delta * APIC_DIVISOR) /
LAPIC_CAL_LOOPS;
+   calculate_lapic_clockevent();
 
apic_printk(APIC_VERBOSE, ". delta %ld\n", delta);
apic_printk(APIC_VERBOSE, ". mult: %u\n",
lapic_clockevent.mult);
-- 
2.7.4



Re: [PATCH v4 2/2] power_supply: platform/chrome: wilco_ec: Add charging config driver

2019-04-19 Thread kbuild test robot
Hi Nick,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.1-rc5 next-20190418]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Nick-Crews/power_supply-Add-more-charge-types-and-CHARGE_CONTROL_-properties/20190420-045118
config: i386-randconfig-x017-201915 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 


All errors (new ones prefixed by >>):

   drivers/platform/chrome/wilco_ec/core.c: In function 'wilco_ec_probe':
>> drivers/platform/chrome/wilco_ec/core.c:106:33: error: 'struct 
>> wilco_ec_device' has no member named 'kbbl_pdev'; did you mean 'rtc_pdev'?
 platform_device_unregister(ec->kbbl_pdev);
^
rtc_pdev
   drivers/platform/chrome/wilco_ec/core.c:105:1: warning: label 
'unregister_kbbl' defined but not used [-Wunused-label]
unregister_kbbl:
^~~

vim +106 drivers/platform/chrome/wilco_ec/core.c

40  
41  static int wilco_ec_probe(struct platform_device *pdev)
42  {
43  struct device *dev = >dev;
44  struct wilco_ec_device *ec;
45  int ret;
46  
47  ec = devm_kzalloc(dev, sizeof(*ec), GFP_KERNEL);
48  if (!ec)
49  return -ENOMEM;
50  
51  platform_set_drvdata(pdev, ec);
52  ec->dev = dev;
53  mutex_init(>mailbox_lock);
54  
55  /* Largest data buffer size requirement is extended data 
response */
56  ec->data_size = sizeof(struct wilco_ec_response) +
57  EC_MAILBOX_DATA_SIZE_EXTENDED;
58  ec->data_buffer = devm_kzalloc(dev, ec->data_size, GFP_KERNEL);
59  if (!ec->data_buffer)
60  return -ENOMEM;
61  
62  /* Prepare access to IO regions provided by ACPI */
63  ec->io_data = wilco_get_resource(pdev, 0);  /* Host Data */
64  ec->io_command = wilco_get_resource(pdev, 1);   /* Host Command 
*/
65  ec->io_packet = wilco_get_resource(pdev, 2);/* MEC EMI */
66  if (!ec->io_data || !ec->io_command || !ec->io_packet)
67  return -ENODEV;
68  
69  /* Initialize cros_ec register interface for communication */
70  cros_ec_lpc_mec_init(ec->io_packet->start,
71   ec->io_packet->start + 
EC_MAILBOX_DATA_SIZE);
72  
73  /*
74   * Register a child device that will be found by the debugfs 
driver.
75   * Ignore failure.
76   */
77  ec->debugfs_pdev = platform_device_register_data(dev,
78   
"wilco-ec-debugfs",
79   
PLATFORM_DEVID_AUTO,
80   NULL, 0);
81  
82  /* Register a child device that will be found by the RTC 
driver. */
83  ec->rtc_pdev = platform_device_register_data(dev, 
"rtc-wilco-ec",
84   
PLATFORM_DEVID_AUTO,
85   NULL, 0);
86  if (IS_ERR(ec->rtc_pdev)) {
87  dev_err(dev, "Failed to create RTC platform device\n");
88  ret = PTR_ERR(ec->rtc_pdev);
89  goto unregister_debugfs;
90  }
91  
92  /* Register child device to be found by charging config driver. 
*/
93  ec->charging_pdev = platform_device_register_data(dev,
94
"wilco-ec-charging",
95
PLATFORM_DEVID_AUTO,
96NULL, 0);
97  if (IS_ERR(ec->charging_pdev)) {
98  dev_err(dev, "Failed to create charging platform 
device\n");
99  ret = PTR_ERR(ec->charging_pdev);
   100  goto unregister_rtc;
   101  }
   102  
   103  return 0;
   104  
   105  unregister_kbbl:
 > 106  platform_device_unregister(ec->kbbl_pdev);
   107  unregister_rtc:
   108  platform_device_unregister(ec->rtc_pdev);
   109  unregister_debugfs:
   110  if (ec->debugfs_pdev)
   111  platform_device_unregister(ec->debugfs_pdev);
   112  cros_ec_lpc_mec_destroy();
   113  return ret;
   114  }
   115  

---
0-DAY kernel test infrastructure   

Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Christian Brauner
On Sat, Apr 20, 2019 at 12:35 AM Daniel Colascione  wrote:
>
> On Fri, Apr 19, 2019 at 2:48 PM Christian Brauner  
> wrote:
> >
> > On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione  
> > wrote:
> > >
> > > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner  
> > > wrote:
> > > >
> > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione  
> > > > wrote:
> > > > >
> > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes 
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote:
> > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote:
> > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner 
> > > > > > > > wrote:
> > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn 
> > > > > > > > >  wrote:
> > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov 
> > > > > > > > > > wrote:
> > > > > > > > > >> On 04/16, Joel Fernandes wrote:
> > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov 
> > > > > > > > > >> > wrote:
> > > > > > > > > >> > >
> > > > > > > > > >> > > Could you explain when it should return POLLIN? When 
> > > > > > > > > >> > > the whole
> > > > > > > > > >process exits?
> > > > > > > > > >> >
> > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't exist 
> > > > > > > > > >> > anymore,
> > > > > > > > > >or when it
> > > > > > > > > >> > is in a zombie state and there's no other thread in the 
> > > > > > > > > >> > thread
> > > > > > > > > >group.
> > > > > > > > > >>
> > > > > > > > > >> IOW, when the whole thread group exits, so it can't be 
> > > > > > > > > >> used to
> > > > > > > > > >monitor sub-threads.
> > > > > > > > > >>
> > > > > > > > > >> just in case... speaking of this patch it doesn't modify
> > > > > > > > > >proc_tid_base_operations,
> > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc 
> > > > > > > > > >> you are
> > > > > > > > > >going to use
> > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ?
> > > > > > > > > >
> > > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid has
> > > > > > > > > >proc_tgid_base_operations despite not being a thread group 
> > > > > > > > > >leader.
> > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this 
> > > > > > > > > >code can
> > > > > > > > > >be hit trivially, and then the code will misbehave.
> > > > > > > > > >
> > > > > > > > > >@Joel: I think you'll have to either rewrite this to 
> > > > > > > > > >explicitly bail
> > > > > > > > > >out if you're dealing with a thread group leader, or make 
> > > > > > > > > >the code
> > > > > > > > > >work for threads, too.
> > > > > > > > >
> > > > > > > > > The latter case probably being preferred if this API is 
> > > > > > > > > supposed to be
> > > > > > > > > useable for thread management in userspace.
> > > > > > > >
> > > > > > > > At the moment, we are not planning to use this for sub-thread 
> > > > > > > > management. I
> > > > > > > > am reworking this patch to only work on clone(2) pidfds which 
> > > > > > > > makes the above
> > > > > > >
> > > > > > > Indeed and agreed.
> > > > > > >
> > > > > > > > discussion about /proc a bit unnecessary I think. Per the 
> > > > > > > > latest CLONE_PIDFD
> > > > > > > > patches, CLONE_THREAD with pidfd is not supported.
> > > > > > >
> > > > > > > Yes. We have no one asking for it right now and we can easily add 
> > > > > > > this
> > > > > > > later.
> > > > > > >
> > > > > > > Admittedly I haven't gotten around to reviewing the patches here 
> > > > > > > yet
> > > > > > > completely. But one thing about using POLLIN. FreeBSD is using 
> > > > > > > POLLHUP
> > > > > > > on process exit which I think is nice as well. How about returning
> > > > > > > POLLIN | POLLHUP on process exit?
> > > > > > > We already do things like this. For example, when you proxy 
> > > > > > > between
> > > > > > > ttys. If the process that you're reading data from has exited and 
> > > > > > > closed
> > > > > > > it's end you still can't usually simply exit because it might 
> > > > > > > have still
> > > > > > > buffered data that you want to read.  The way one can deal with 
> > > > > > > this
> > > > > > > from  userspace is that you can observe a (POLLHUP | POLLIN) 
> > > > > > > event and
> > > > > > > you keep on reading until you only observe a POLLHUP without a 
> > > > > > > POLLIN
> > > > > > > event at which point you know you have read
> > > > > > > all data.
> > > > > > > I like the semantics for pidfds as well as it would indicate:
> > > > > > > - POLLHUP -> process has exited
> > > > > > > - POLLIN  -> information can be read
> > > > > >
> > > > > > Actually I think a bit different about this, in my opinion the 
> > > > > > pidfd should
> > > > > > always be readable (we would store the exit status somewhere in the 
> > > > > > future
> > > > > > which would be readable, even after task_struct is dead). So 

DISCONTIGMEM is deprecated

2019-04-19 Thread Matthew Wilcox
On Fri, Apr 19, 2019 at 10:43:35AM +0100, Mel Gorman wrote:
> DISCONTIG is essentially deprecated and even parisc plans to move to
> SPARSEMEM so there is no need to be fancy, this patch simply disables
> watermark boosting by default on DISCONTIGMEM.

I don't think parisc is the only arch which uses DISCONTIGMEM for !NUMA
scenarios.  Grepping the arch/ directories shows:

alpha (does support NUMA, but also non-NUMA DISCONTIGMEM)
arc (for supporting more than 1GB of memory)
ia64 (looks complicated ...)
m68k (for multiple chunks of memory)
mips (does support NUMA but also non-NUMA)
parisc (both NUMA and non-NUMA)

I'm not sure that these architecture maintainers even know that DISCONTIGMEM
is deprecated.  Adding linux-arch to the cc.


Re: [PATCH v4 14/16] locking/rwsem: Guard against making count negative

2019-04-19 Thread Peter Zijlstra
On Fri, Apr 19, 2019 at 02:02:07PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 19, 2019 at 12:26:47PM +0200, Peter Zijlstra wrote:
> > I thought of a horrible horrible alternative:
> 
> Hurm, that's broken as heck. Let me try again.

So I can't make that scheme work, it all ends up wanting to have
cmpxchg().

Do we have a performance comparison somewhere of xadd vs cmpxchg
readers? I tried looking in the old threads, but I can't seem to locate
it.

We need new instructions :/ Or more clever than I can muster just now.


[PATCH 2/2] docs: hwmon: remove the extension from .rst files

2019-04-19 Thread Mauro Carvalho Chehab
On almost all places, we're including ReST files without the
extension.

Let's remove the extension here as well, in order to use just
one standard.

Suggested-by: Jani Nikula 
---
 Documentation/hwmon/index.rst | 316 +-
 1 file changed, 158 insertions(+), 158 deletions(-)

diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
index 893804414510..bc74c43e3e2c 100644
--- a/Documentation/hwmon/index.rst
+++ b/Documentation/hwmon/index.rst
@@ -5,11 +5,11 @@ Linux Hardware Monitoring
 .. toctree::
:maxdepth: 1
 
-   hwmon-kernel-api.rst
-   pmbus-core.rst
-   submitting-patches.rst
-   sysfs-interface.rst
-   userspace-tools.rst
+   hwmon-kernel-api
+   pmbus-core
+   submitting-patches
+   sysfs-interface
+   userspace-tools
 
 Hardware Monitoring Kernel Drivers
 ==
@@ -17,159 +17,159 @@ Hardware Monitoring Kernel Drivers
 .. toctree::
:maxdepth: 1
 
-   ab8500.rst
-   abituguru.rst
-   abituguru3.rst
-   abx500.rst
-   acpi_power_meter.rst
-   ad7314.rst
-   adc128d818.rst
-   adm1021.rst
-   adm1025.rst
-   adm1026.rst
-   adm1031.rst
-   adm1275.rst
-   adm9240.rst
-   ads1015.rst
-   ads7828.rst
-   adt7410.rst
-   adt7411.rst
-   adt7462.rst
-   adt7470.rst
-   adt7475.rst
-   amc6821.rst
-   asb100.rst
-   asc7621.rst
-   aspeed-pwm-tacho.rst
-   coretemp.rst
-   da9052.rst
-   da9055.rst
-   dme1737.rst
-   ds1621.rst
-   ds620.rst
-   emc1403.rst
-   emc2103.rst
-   emc6w201.rst
-   f71805f.rst
-   f71882fg.rst
-   fam15h_power.rst
-   ftsteutates.rst
-   g760a.rst
-   g762.rst
-   gl518sm.rst
-   hih6130.rst
-   ibmaem.rst
-   ibm-cffps.rst
-   ibmpowernv.rst
-   ina209.rst
-   ina2xx.rst
-   ina3221.rst
-   ir35221.rst
-   it87.rst
-   jc42.rst
-   k10temp.rst
-   k8temp.rst
-   lineage-pem.rst
-   lm25066.rst
-   lm63.rst
-   lm70.rst
-   lm73.rst
-   lm75.rst
-   lm77.rst
-   lm78.rst
-   lm80.rst
-   lm83.rst
-   lm85.rst
-   lm87.rst
-   lm90.rst
-   lm92.rst
-   lm93.rst
-   lm95234.rst
-   lm95245.rst
-   ltc2945.rst
-   ltc2978.rst
-   ltc2990.rst
-   ltc3815.rst
-   ltc4151.rst
-   ltc4215.rst
-   ltc4245.rst
-   ltc4260.rst
-   ltc4261.rst
-   max16064.rst
-   max16065.rst
-   max1619.rst
-   max1668.rst
-   max197.rst
-   max20751.rst
-   max31722.rst
-   max31785.rst
-   max31790.rst
-   max34440.rst
-   max6639.rst
-   max6642.rst
-   max6650.rst
-   max6697.rst
-   max8688.rst
-   mc13783-adc.rst
-   mcp3021.rst
-   menf21bmc.rst
-   mlxreg-fan.rst
-   nct6683.rst
-   nct6775.rst
-   nct7802.rst
-   nct7904.rst
-   npcm750-pwm-fan.rst
-   nsa320.rst
-   ntc_thermistor.rst
-   occ.rst
-   pc87360.rst
-   pc87427.rst
-   pcf8591.rst
-   pmbus.rst
-   powr1220.rst
-   pwm-fan.rst
-   raspberrypi-hwmon.rst
-   sch5627.rst
-   sch5636.rst
-   scpi-hwmon.rst
-   sht15.rst
-   sht21.rst
-   sht3x.rst
-   shtc1.rst
-   sis5595.rst
-   smm665.rst
-   smsc47b397.rst
-   smsc47m192.rst
-   smsc47m1.rst
-   tc654.rst
-   tc74.rst
-   thmc50.rst
-   tmp102.rst
-   tmp103.rst
-   tmp108.rst
-   tmp401.rst
-   tmp421.rst
-   tps40422.rst
-   twl4030-madc-hwmon.rst
-   ucd9000.rst
-   ucd9200.rst
-   vexpress.rst
-   via686a.rst
-   vt1211.rst
-   w83627ehf.rst
-   w83627hf.rst
-   w83773g.rst
-   w83781d.rst
-   w83791d.rst
-   w83792d.rst
-   w83793.rst
-   w83795.rst
-   w83l785ts.rst
-   w83l786ng.rst
-   wm831x.rst
-   wm8350.rst
-   xgene-hwmon.rst
-   zl6100.rst
+   ab8500
+   abituguru
+   abituguru3
+   abx500
+   acpi_power_meter
+   ad7314
+   adc128d818
+   adm1021
+   adm1025
+   adm1026
+   adm1031
+   adm1275
+   adm9240
+   ads1015
+   ads7828
+   adt7410
+   adt7411
+   adt7462
+   adt7470
+   adt7475
+   amc6821
+   asb100
+   asc7621
+   aspeed-pwm-tacho
+   coretemp
+   da9052
+   da9055
+   dme1737
+   ds1621
+   ds620
+   emc1403
+   emc2103
+   emc6w201
+   f71805f
+   f71882fg
+   fam15h_power
+   ftsteutates
+   g760a
+   g762
+   gl518sm
+   hih6130
+   ibmaem
+   ibm-cffps
+   ibmpowernv
+   ina209
+   ina2xx
+   ina3221
+   ir35221
+   it87
+   jc42
+   k10temp
+   k8temp
+   lineage-pem
+   lm25066
+   lm63
+   lm70
+   lm73
+   lm75
+   lm77
+   lm78
+   lm80
+   lm83
+   lm85
+   lm87
+   lm90
+   lm92
+   lm93
+   lm95234
+   lm95245
+   ltc2945
+   ltc2978
+   ltc2990
+   ltc3815
+   ltc4151
+   ltc4215
+   ltc4245
+   ltc4260
+   ltc4261
+   max16064
+   max16065
+   max1619
+   max1668
+   max197
+   max20751
+   max31722
+   max31785
+   max31790
+   max34440
+   max6639
+   max6642
+   max6650
+   max6697
+   max8688
+   mc13783-adc
+   mcp3021
+   menf21bmc
+   mlxreg-fan
+   nct6683
+   nct6775
+   nct7802
+   nct7904
+   npcm750-pwm-fan
+   nsa320
+   ntc_thermistor
+   occ
+   pc87360
+   pc87427
+   pcf8591
+   pmbus
+   powr1220
+   pwm-fan
+   raspberrypi-hwmon
+   sch5627
+   sch5636
+   scpi-hwmon
+   sht15
+   sht21
+   sht3x
+   shtc1
+   sis5595
+   smm665
+   smsc47b397
+   smsc47m192
+   smsc47m1
+   tc654
+   tc74
+   thmc50
+   tmp102
+   tmp103
+   tmp108
+ 

Re: [patch V2 28/29] stacktrace: Provide common infrastructure

2019-04-19 Thread Peter Zijlstra
On Thu, Apr 18, 2019 at 05:42:55PM +0200, Thomas Gleixner wrote:
> On Thu, 18 Apr 2019, Josh Poimboeuf wrote:

> > Another idea I had (but never got a chance to work on) was to extend the
> > x86 unwind interface to all arches.  So instead of the callbacks, each
> > arch would implement something like this API:

> I surely thought about that, but after staring at all incarnations of
> arch/*/stacktrace.c I just gave up.
> 
> Aside of that quite some archs already have callback based unwinders
> because they use them for more than stacktracing and just have a single
> implementation of that loop.
> 
> I'm fine either way. We can start with x86 and then let archs convert over
> their stuff, but I wouldn't hold my breath that this will be completed in
> the forseeable future.

I suggested the same to Thomas early on, and I even spend the time to
convert some $random arch to the iterator interface, and while it is
indeed entirely feasible, it is _far_ more work.

The callback thing OTOH is flexible enough to do what we want to do now,
and allows converting most archs to it without too much pain (as Thomas
said, many archs are already in this form and only need minor API
adjustments), which gets us in a far better place than we are now.

And we can always go to iterators later on. But I think getting the
generic unwinder improved across all archs is a really important first
step here.


Re: [PATCH v4 14/16] locking/rwsem: Guard against making count negative

2019-04-19 Thread Peter Zijlstra
On Fri, Apr 19, 2019 at 12:26:47PM +0200, Peter Zijlstra wrote:
> I thought of a horrible horrible alternative:

Hurm, that's broken as heck. Let me try again.


Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Daniel Colascione
On Fri, Apr 19, 2019 at 3:02 PM Christian Brauner  wrote:
>
> On Fri, Apr 19, 2019 at 11:48 PM Christian Brauner  
> wrote:
> >
> > On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione  
> > wrote:
> > >
> > > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner  
> > > wrote:
> > > >
> > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione  
> > > > wrote:
> > > > >
> > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes 
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote:
> > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote:
> > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner 
> > > > > > > > wrote:
> > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn 
> > > > > > > > >  wrote:
> > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov 
> > > > > > > > > > wrote:
> > > > > > > > > >> On 04/16, Joel Fernandes wrote:
> > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov 
> > > > > > > > > >> > wrote:
> > > > > > > > > >> > >
> > > > > > > > > >> > > Could you explain when it should return POLLIN? When 
> > > > > > > > > >> > > the whole
> > > > > > > > > >process exits?
> > > > > > > > > >> >
> > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't exist 
> > > > > > > > > >> > anymore,
> > > > > > > > > >or when it
> > > > > > > > > >> > is in a zombie state and there's no other thread in the 
> > > > > > > > > >> > thread
> > > > > > > > > >group.
> > > > > > > > > >>
> > > > > > > > > >> IOW, when the whole thread group exits, so it can't be 
> > > > > > > > > >> used to
> > > > > > > > > >monitor sub-threads.
> > > > > > > > > >>
> > > > > > > > > >> just in case... speaking of this patch it doesn't modify
> > > > > > > > > >proc_tid_base_operations,
> > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc 
> > > > > > > > > >> you are
> > > > > > > > > >going to use
> > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ?
> > > > > > > > > >
> > > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid has
> > > > > > > > > >proc_tgid_base_operations despite not being a thread group 
> > > > > > > > > >leader.
> > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this 
> > > > > > > > > >code can
> > > > > > > > > >be hit trivially, and then the code will misbehave.
> > > > > > > > > >
> > > > > > > > > >@Joel: I think you'll have to either rewrite this to 
> > > > > > > > > >explicitly bail
> > > > > > > > > >out if you're dealing with a thread group leader, or make 
> > > > > > > > > >the code
> > > > > > > > > >work for threads, too.
> > > > > > > > >
> > > > > > > > > The latter case probably being preferred if this API is 
> > > > > > > > > supposed to be
> > > > > > > > > useable for thread management in userspace.
> > > > > > > >
> > > > > > > > At the moment, we are not planning to use this for sub-thread 
> > > > > > > > management. I
> > > > > > > > am reworking this patch to only work on clone(2) pidfds which 
> > > > > > > > makes the above
> > > > > > >
> > > > > > > Indeed and agreed.
> > > > > > >
> > > > > > > > discussion about /proc a bit unnecessary I think. Per the 
> > > > > > > > latest CLONE_PIDFD
> > > > > > > > patches, CLONE_THREAD with pidfd is not supported.
> > > > > > >
> > > > > > > Yes. We have no one asking for it right now and we can easily add 
> > > > > > > this
> > > > > > > later.
> > > > > > >
> > > > > > > Admittedly I haven't gotten around to reviewing the patches here 
> > > > > > > yet
> > > > > > > completely. But one thing about using POLLIN. FreeBSD is using 
> > > > > > > POLLHUP
> > > > > > > on process exit which I think is nice as well. How about returning
> > > > > > > POLLIN | POLLHUP on process exit?
> > > > > > > We already do things like this. For example, when you proxy 
> > > > > > > between
> > > > > > > ttys. If the process that you're reading data from has exited and 
> > > > > > > closed
> > > > > > > it's end you still can't usually simply exit because it might 
> > > > > > > have still
> > > > > > > buffered data that you want to read.  The way one can deal with 
> > > > > > > this
> > > > > > > from  userspace is that you can observe a (POLLHUP | POLLIN) 
> > > > > > > event and
> > > > > > > you keep on reading until you only observe a POLLHUP without a 
> > > > > > > POLLIN
> > > > > > > event at which point you know you have read
> > > > > > > all data.
> > > > > > > I like the semantics for pidfds as well as it would indicate:
> > > > > > > - POLLHUP -> process has exited
> > > > > > > - POLLIN  -> information can be read
> > > > > >
> > > > > > Actually I think a bit different about this, in my opinion the 
> > > > > > pidfd should
> > > > > > always be readable (we would store the exit status somewhere in the 
> > > > > > future
> > > > > > which would be readable, even after task_struct is dead). So 

Re: [PATCHv2] selftests/efivarfs: clean up test files from test_create*()

2019-04-19 Thread Po-Hsu Lin
Hello,

Commit f8a0590f fix some part of this issue.
I will send out V3 base on this commit in linux-kselftest next branch.

Thank you.


On Fri, Apr 19, 2019 at 9:20 PM shuah  wrote:
>
> On 4/19/19 7:12 AM, Po-Hsu Lin wrote:
> > Test files created by test_create*() tests will stay in the
> > $efivarfs_mount directory unless the system was rebooted.
> >
> > When the tester tries to run this efivarfs test again on the same
> > system, the immutable characteristics in that directory will cause some
> > "Operation not permitted" noises and a false-positve test result to the
> > test_create_read() test.
> >
> >  
> >  running test_create
> >  
> >  ./efivarfs.sh: line 59: 
> > /sys/firmware/efi/efivars/test_create-210be57c-9849-4fc7-a635-e6382d1aec27: 
> > Operation not permitted
> >[PASS]
> >  
> >  running test_create_empty
> >  
> >  ./efivarfs.sh: line 78: 
> > /sys/firmware/efi/efivars/test_create_empty-210be57c-9849-4fc7-a635-e6382d1aec27:
> >  Operation not permitted
> >   [PASS]
> >  
> >  running test_create_read
> >  
> >  open(O_WRONLY): Operation not permitted
> >   [FAIL]
> >  
> >
> > Create a file_cleanup() to remove those test files in the end of each
> > test to solve this issue.
> >
> > Also, use this function to replace the existing file removal code.
> >
> > Link: https://bugs.launchpad.net/bugs/1809704
> >
> > Signed-off-by: Po-Hsu Lin 
> > ---
>
> Thanks for the patch. There is another patch that does the same in
> linux-kselftest next branch.
>
> Please check to see if that fixes the problem you are seeing.
>
> thanks,
> -- Shuah
>


[PATCHv2] selftests/efivarfs: clean up test files from test_create*()

2019-04-19 Thread Po-Hsu Lin
Test files created by test_create*() tests will stay in the
$efivarfs_mount directory unless the system was rebooted.

When the tester tries to run this efivarfs test again on the same
system, the immutable characteristics in that directory will cause some
"Operation not permitted" noises and a false-positve test result to the
test_create_read() test.


running test_create

./efivarfs.sh: line 59: 
/sys/firmware/efi/efivars/test_create-210be57c-9849-4fc7-a635-e6382d1aec27: 
Operation not permitted
  [PASS]

running test_create_empty

./efivarfs.sh: line 78: 
/sys/firmware/efi/efivars/test_create_empty-210be57c-9849-4fc7-a635-e6382d1aec27:
 Operation not permitted
 [PASS]

running test_create_read

open(O_WRONLY): Operation not permitted
 [FAIL]


Create a file_cleanup() to remove those test files in the end of each
test to solve this issue.

Also, use this function to replace the existing file removal code.

Link: https://bugs.launchpad.net/bugs/1809704

Signed-off-by: Po-Hsu Lin 
---
 tools/testing/selftests/efivarfs/efivarfs.sh | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)
 mode change 100755 => 100644 tools/testing/selftests/efivarfs/efivarfs.sh

diff --git a/tools/testing/selftests/efivarfs/efivarfs.sh 
b/tools/testing/selftests/efivarfs/efivarfs.sh
old mode 100755
new mode 100644
index a47029a..14fa6fe
--- a/tools/testing/selftests/efivarfs/efivarfs.sh
+++ b/tools/testing/selftests/efivarfs/efivarfs.sh
@@ -7,6 +7,12 @@ test_guid=210be57c-9849-4fc7-a635-e6382d1aec27
 # Kselftest framework requirement - SKIP code is 4.
 ksft_skip=4
 
+file_cleanup()
+{
+   chattr -i $1
+   rm $1
+}
+
 check_prereqs()
 {
local msg="skip all tests:"
@@ -58,8 +64,10 @@ test_create()
 
if [ $(stat -c %s $file) -ne 5 ]; then
echo "$file has invalid size" >&2
+   file_cleanup $file
exit 1
fi
+   file_cleanup $file
 }
 
 test_create_empty()
@@ -72,12 +80,14 @@ test_create_empty()
echo "$file can not be created without writing" >&2
exit 1
fi
+   file_cleanup $file
 }
 
 test_create_read()
 {
local file=$efivarfs_mount/$FUNCNAME-$test_guid
./create-read $file
+   file_cleanup $file
 }
 
 test_delete()
@@ -94,8 +104,7 @@ test_delete()
 
rm $file 2>/dev/null
if [ $? -ne 0 ]; then
-   chattr -i $file
-   rm $file
+   file_cleanup $file
fi
 
if [ -e $file ]; then
@@ -152,8 +161,7 @@ test_valid_filenames()
else
rm $file 2>/dev/null
if [ $? -ne 0 ]; then
-   chattr -i $file
-   rm $file
+   file_cleanup $file
fi
fi
done
@@ -189,8 +197,7 @@ test_invalid_filenames()
echo "Creating $file should have failed" >&2
rm $file 2>/dev/null
if [ $? -ne 0 ]; then
-   chattr -i $file
-   rm $file
+   file_cleanup $file
fi
ret=1
fi
-- 
2.7.4



[PATCHv3] selftests/efivarfs: clean up test files from test_create*()

2019-04-19 Thread Po-Hsu Lin
Test files created by test_create() and test_create_empty() tests will
stay in the $efivarfs_mount directory until the system was rebooted.

When the tester tries to run this efivarfs test again on the same
system, the immutable characteristics in that directory will cause some
"Operation not permitted" noises, and a false-positve test result as the
file was created in previous run.

running test_create

./efivarfs.sh: line 59: 
/sys/firmware/efi/efivars/test_create-210be57c-9849-4fc7-a635-e6382d1aec27: 
Operation not permitted
  [PASS]

running test_create_empty

./efivarfs.sh: line 78: 
/sys/firmware/efi/efivars/test_create_empty-210be57c-9849-4fc7-a635-e6382d1aec27:
 Operation not permitted
 [PASS]


Create a file_cleanup() to remove those test files in the end of each
test to solve this issue.

For the test_create_read, we can move the clean up task to the end of
the test to ensure the system is clean.

Also, use this function to replace the existing file removal code.

Signed-off-by: Po-Hsu Lin 
---
 tools/testing/selftests/efivarfs/efivarfs.sh | 32 +++-
 1 file changed, 13 insertions(+), 19 deletions(-)

diff --git a/tools/testing/selftests/efivarfs/efivarfs.sh 
b/tools/testing/selftests/efivarfs/efivarfs.sh
index d386610..a90f394 100755
--- a/tools/testing/selftests/efivarfs/efivarfs.sh
+++ b/tools/testing/selftests/efivarfs/efivarfs.sh
@@ -7,6 +7,12 @@ test_guid=210be57c-9849-4fc7-a635-e6382d1aec27
 # Kselftest framework requirement - SKIP code is 4.
 ksft_skip=4
 
+file_cleanup()
+{
+   chattr -i $1
+   rm -f $1
+}
+
 check_prereqs()
 {
local msg="skip all tests:"
@@ -58,8 +64,10 @@ test_create()
 
if [ $(stat -c %s $file) -ne 5 ]; then
echo "$file has invalid size" >&2
+   file_cleanup $file
exit 1
fi
+   file_cleanup $file
 }
 
 test_create_empty()
@@ -72,16 +80,14 @@ test_create_empty()
echo "$file can not be created without writing" >&2
exit 1
fi
+   file_cleanup $file
 }
 
 test_create_read()
 {
local file=$efivarfs_mount/$FUNCNAME-$test_guid
-   if [ -f $file]; then
-   chattr -i $file
-   rm -rf $file
-   fi
./create-read $file
+   file_cleanup $file
 }
 
 test_delete()
@@ -96,11 +102,7 @@ test_delete()
exit 1
fi
 
-   rm $file 2>/dev/null
-   if [ $? -ne 0 ]; then
-   chattr -i $file
-   rm $file
-   fi
+   file_cleanup $file
 
if [ -e $file ]; then
echo "$file couldn't be deleted" >&2
@@ -154,11 +156,7 @@ test_valid_filenames()
echo "$file could not be created" >&2
ret=1
else
-   rm $file 2>/dev/null
-   if [ $? -ne 0 ]; then
-   chattr -i $file
-   rm $file
-   fi
+   file_cleanup $file
fi
done
 
@@ -191,11 +189,7 @@ test_invalid_filenames()
 
if [ -e $file ]; then
echo "Creating $file should have failed" >&2
-   rm $file 2>/dev/null
-   if [ $? -ne 0 ]; then
-   chattr -i $file
-   rm $file
-   fi
+   file_cleanup $file
ret=1
fi
done
-- 
2.7.4



Re: [PATCH] ACPI / LPSS: Don't skip late system PM ops for hibernate on BYT/CHT

2019-04-19 Thread Robert R. Howell
On 4/18/19 5:42 AM, Hans de Goede wrote:

>> On 4/8/19 2:16 AM, Hans de Goede wrote:>
>>>
>>> Hmm, interesting so you have hibernation working on a T100TA
>>> (with 5.0 + 02e45646d53b reverted), right ?
>>>

> Still since my patch is regressing things for you I will try to
> take a look at this and see if I can reproduce and come up with
> a fix. But this is not going to be a high priority thing for me to
> work on.
> 
> In the mean time I've gone ahead and submitted my version of the
> fix for the problem Kai-Heng was seeing, since that does not seem
> to make your problem worse; and it will be good to get that problem
> fixed.
> 
> Regards,
> 
> Hans
> 

I've managed to find a way around the i2c_designware timeout issues
on the T100TA's.  The key is to NOT set DPM_FLAG_SMART_SUSPEND,
which was added in the 02e45646d53b commit.

To test that I've started with a 5.1-rc5 kernel, applied your recent patch
to acpi_lpss.c, then apply the following patch of mine, removing 
DPM_FLAG_SMART_SUSPEND.  (For the T100 hardware I need to apply some 
other patches as well but those are not related to the i2c-designware or 
acpi issues addressed here.)  

On a resume from hibernation I still see one error:
  "i2c_designware 80860F41:00: Error i2c_dw_xfer called while suspended"
but I no longer get the i2c_designware timeouts, and audio does now work
after the resume.

Removing DPM_FLAG_SMART_SUSPEND may not be what you want for other 
hardware, but perhaps this will give you a clue as to what is going
wrong with hibernate/resume on the T100TA's.

With the above now working I've also experimented with my systemd
hibernate script and at least in limited testing I only need to
remove the brcmfmac and hci_uart drivers before hibernate.
I no longer need to remove sound drivers or the other ones shown in 
my earlier script.  Without more testing I'm not sure at what point
in kernel development those other driver removals stopped being necessary.

Let me know if I can do any other tests to help.

Bob Howell



---
diff -uprN linux_original/drivers/i2c/busses/i2c-designware-platdrv.c 
linux/drivers/i2c/busses/i2c-designware-platdrv.c
--- linux_original/drivers/i2c/busses/i2c-designware-platdrv.c  2019-04-14 
16:17:41.0 -0600
+++ linux/drivers/i2c/busses/i2c-designware-platdrv.c   2019-04-18 
22:45:58.836246889 -0600
@@ -367,7 +367,6 @@ static int dw_i2c_plat_probe(struct plat

dev_pm_set_driver_flags(>dev,
DPM_FLAG_SMART_PREPARE |
-   DPM_FLAG_SMART_SUSPEND |
DPM_FLAG_LEAVE_SUSPENDED);

/* The code below assumes runtime PM to be disabled. */
---


[PATCH] mm/hmm: Fix initial PFN for hugetlbfs pages

2019-04-19 Thread rcampbell
From: Ralph Campbell 

The mmotm patch [1] adds hugetlbfs support for HMM but the initial
PFN used to fill the HMM range->pfns[] array doesn't properly
compute the starting PFN offset.
This can be tested by running test-hugetlbfs-read from [2].

Fix the PFN offset by adjusting the page offset by the device's
page size.

Andrew, this should probably be squashed into Jerome's patch.

[1] https://marc.info/?l=linux-mm=155432003506068=2
("mm/hmm: mirror hugetlbfs (snapshoting, faulting and DMA mapping)")
[2] https://gitlab.freedesktop.org/glisse/svm-cl-tests

Signed-off-by: Ralph Campbell 
Cc: Jérôme Glisse 
Cc: Ira Weiny 
Cc: John Hubbard 
Cc: Dan Williams 
Cc: Arnd Bergmann 
Cc: Balbir Singh 
Cc: Dan Carpenter 
Cc: Matthew Wilcox 
Cc: Souptick Joarder 
Cc: Andrew Morton 
---
 mm/hmm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index def451a56c3e..fcf8e4fb5770 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -868,7 +868,7 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned 
long hmask,
goto unlock;
}
 
-   pfn = pte_pfn(entry) + (start & mask);
+   pfn = pte_pfn(entry) + ((start & mask) >> range->page_shift);
for (; addr < end; addr += size, i++, pfn += pfn_inc)
range->pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
 cpu_flags;
-- 
2.20.1



Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Daniel Colascione
On Fri, Apr 19, 2019 at 3:18 PM Christian Brauner  wrote:
>
> On Sat, Apr 20, 2019 at 12:08 AM Daniel Colascione  wrote:
> >
> > On Fri, Apr 19, 2019 at 2:45 PM Joel Fernandes  
> > wrote:
> > >
> > > On Fri, Apr 19, 2019 at 02:24:09PM -0700, Daniel Colascione wrote:
> > > > On Fri, Apr 19, 2019 at 2:20 PM Joel Fernandes  
> > > > wrote:
> > > > >
> > > > > On Fri, Apr 19, 2019 at 10:57:11PM +0200, Christian Brauner wrote:
> > > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner 
> > > > > > > > wrote:
> > > > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes 
> > > > > > > > > wrote:
> > > > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner 
> > > > > > > > > > wrote:
> > > > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >> On 04/16, Joel Fernandes wrote:
> > > > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg 
> > > > > > > > > > > >> > Nesterov wrote:
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > Could you explain when it should return POLLIN? 
> > > > > > > > > > > >> > > When the whole
> > > > > > > > > > > >process exits?
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't 
> > > > > > > > > > > >> > exist anymore,
> > > > > > > > > > > >or when it
> > > > > > > > > > > >> > is in a zombie state and there's no other thread in 
> > > > > > > > > > > >> > the thread
> > > > > > > > > > > >group.
> > > > > > > > > > > >>
> > > > > > > > > > > >> IOW, when the whole thread group exits, so it can't be 
> > > > > > > > > > > >> used to
> > > > > > > > > > > >monitor sub-threads.
> > > > > > > > > > > >>
> > > > > > > > > > > >> just in case... speaking of this patch it doesn't 
> > > > > > > > > > > >> modify
> > > > > > > > > > > >proc_tid_base_operations,
> > > > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but 
> > > > > > > > > > > >> iiuc you are
> > > > > > > > > > > >going to use
> > > > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ?
> > > > > > > > > > > >
> > > > > > > > > > > >I don't think procfs works that way. 
> > > > > > > > > > > >/proc/sub-thread-tid has
> > > > > > > > > > > >proc_tgid_base_operations despite not being a thread 
> > > > > > > > > > > >group leader.
> > > > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in 
> > > > > > > > > > > >this code can
> > > > > > > > > > > >be hit trivially, and then the code will misbehave.
> > > > > > > > > > > >
> > > > > > > > > > > >@Joel: I think you'll have to either rewrite this to 
> > > > > > > > > > > >explicitly bail
> > > > > > > > > > > >out if you're dealing with a thread group leader, or 
> > > > > > > > > > > >make the code
> > > > > > > > > > > >work for threads, too.
> > > > > > > > > > >
> > > > > > > > > > > The latter case probably being preferred if this API is 
> > > > > > > > > > > supposed to be
> > > > > > > > > > > useable for thread management in userspace.
> > > > > > > > > >
> > > > > > > > > > At the moment, we are not planning to use this for 
> > > > > > > > > > sub-thread management. I
> > > > > > > > > > am reworking this patch to only work on clone(2) pidfds 
> > > > > > > > > > which makes the above
> > > > > > > > >
> > > > > > > > > Indeed and agreed.
> > > > > > > > >
> > > > > > > > > > discussion about /proc a bit unnecessary I think. Per the 
> > > > > > > > > > latest CLONE_PIDFD
> > > > > > > > > > patches, CLONE_THREAD with pidfd is not supported.
> > > > > > > > >
> > > > > > > > > Yes. We have no one asking for it right now and we can easily 
> > > > > > > > > add this
> > > > > > > > > later.
> > > > > > > > >
> > > > > > > > > Admittedly I haven't gotten around to reviewing the patches 
> > > > > > > > > here yet
> > > > > > > > > completely. But one thing about using POLLIN. FreeBSD is 
> > > > > > > > > using POLLHUP
> > > > > > > > > on process exit which I think is nice as well. How about 
> > > > > > > > > returning
> > > > > > > > > POLLIN | POLLHUP on process exit?
> > > > > > > > > We already do things like this. For example, when you proxy 
> > > > > > > > > between
> > > > > > > > > ttys. If the process that you're reading data from has exited 
> > > > > > > > > and closed
> > > > > > > > > it's end you still can't usually simply exit because it might 
> > > > > > > > > have still
> > > > > > > > > buffered data that you want to read.  The way one can deal 
> > > > > > > > > with this
> > > > > > > > > from  userspace is that you can observe a (POLLHUP | POLLIN) 
> > > > > > > > > event and
> > > > > > > > > you keep on reading 

Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Daniel Colascione
On Fri, Apr 19, 2019 at 2:48 PM Christian Brauner  wrote:
>
> On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione  wrote:
> >
> > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner  
> > wrote:
> > >
> > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione  
> > > wrote:
> > > >
> > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes 
> > > >  wrote:
> > > > >
> > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote:
> > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote:
> > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner wrote:
> > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn 
> > > > > > > >  wrote:
> > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov 
> > > > > > > > > wrote:
> > > > > > > > >> On 04/16, Joel Fernandes wrote:
> > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov 
> > > > > > > > >> > wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > Could you explain when it should return POLLIN? When the 
> > > > > > > > >> > > whole
> > > > > > > > >process exits?
> > > > > > > > >> >
> > > > > > > > >> > It returns POLLIN when the task is dead or doesn't exist 
> > > > > > > > >> > anymore,
> > > > > > > > >or when it
> > > > > > > > >> > is in a zombie state and there's no other thread in the 
> > > > > > > > >> > thread
> > > > > > > > >group.
> > > > > > > > >>
> > > > > > > > >> IOW, when the whole thread group exits, so it can't be used 
> > > > > > > > >> to
> > > > > > > > >monitor sub-threads.
> > > > > > > > >>
> > > > > > > > >> just in case... speaking of this patch it doesn't modify
> > > > > > > > >proc_tid_base_operations,
> > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc 
> > > > > > > > >> you are
> > > > > > > > >going to use
> > > > > > > > >> the anonymous file returned by CLONE_PIDFD ?
> > > > > > > > >
> > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid has
> > > > > > > > >proc_tgid_base_operations despite not being a thread group 
> > > > > > > > >leader.
> > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this 
> > > > > > > > >code can
> > > > > > > > >be hit trivially, and then the code will misbehave.
> > > > > > > > >
> > > > > > > > >@Joel: I think you'll have to either rewrite this to 
> > > > > > > > >explicitly bail
> > > > > > > > >out if you're dealing with a thread group leader, or make the 
> > > > > > > > >code
> > > > > > > > >work for threads, too.
> > > > > > > >
> > > > > > > > The latter case probably being preferred if this API is 
> > > > > > > > supposed to be
> > > > > > > > useable for thread management in userspace.
> > > > > > >
> > > > > > > At the moment, we are not planning to use this for sub-thread 
> > > > > > > management. I
> > > > > > > am reworking this patch to only work on clone(2) pidfds which 
> > > > > > > makes the above
> > > > > >
> > > > > > Indeed and agreed.
> > > > > >
> > > > > > > discussion about /proc a bit unnecessary I think. Per the latest 
> > > > > > > CLONE_PIDFD
> > > > > > > patches, CLONE_THREAD with pidfd is not supported.
> > > > > >
> > > > > > Yes. We have no one asking for it right now and we can easily add 
> > > > > > this
> > > > > > later.
> > > > > >
> > > > > > Admittedly I haven't gotten around to reviewing the patches here yet
> > > > > > completely. But one thing about using POLLIN. FreeBSD is using 
> > > > > > POLLHUP
> > > > > > on process exit which I think is nice as well. How about returning
> > > > > > POLLIN | POLLHUP on process exit?
> > > > > > We already do things like this. For example, when you proxy between
> > > > > > ttys. If the process that you're reading data from has exited and 
> > > > > > closed
> > > > > > it's end you still can't usually simply exit because it might have 
> > > > > > still
> > > > > > buffered data that you want to read.  The way one can deal with this
> > > > > > from  userspace is that you can observe a (POLLHUP | POLLIN) event 
> > > > > > and
> > > > > > you keep on reading until you only observe a POLLHUP without a 
> > > > > > POLLIN
> > > > > > event at which point you know you have read
> > > > > > all data.
> > > > > > I like the semantics for pidfds as well as it would indicate:
> > > > > > - POLLHUP -> process has exited
> > > > > > - POLLIN  -> information can be read
> > > > >
> > > > > Actually I think a bit different about this, in my opinion the pidfd 
> > > > > should
> > > > > always be readable (we would store the exit status somewhere in the 
> > > > > future
> > > > > which would be readable, even after task_struct is dead). So I was 
> > > > > thinking
> > > > > we always return EPOLLIN.  If process has not exited, then it blocks.
> > > >
> > > > ITYM that a pidfd polls as readable *once a task exits* and stays
> > > > readable forever. Before a task exit, a poll on a pidfd should *not*
> > > > yield POLLIN and reading that pidfd should *not* complete 

Re: [PATCH] KVM: nVMX: allow tests to use bad virtual-APIC page address

2019-04-19 Thread Krish Sadhukhan




On 04/15/2019 06:35 AM, Paolo Bonzini wrote:

As mentioned in the comment, there are some special cases where we can simply
clear the TPR shadow bit from the CPU-based execution controls in the vmcs02.
Handle them so that we can remove some XFAILs from vmx.flat.

Signed-off-by: Paolo Bonzini 
---
  arch/x86/kvm/vmx/nested.c | 25 -
  arch/x86/kvm/vmx/vmx.c|  2 +-
  arch/x86/kvm/vmx/vmx.h|  2 ++
  3 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7ec9bb1dd723..a22af5a85540 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2873,20 +2873,27 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu 
*vcpu)
/*
 * If translation failed, VM entry will fail because
 * prepare_vmcs02 set VIRTUAL_APIC_PAGE_ADDR to -1ull.
-* Failing the vm entry is _not_ what the processor
-* does but it's basically the only possibility we
-* have.  We could still enter the guest if CR8 load
-* exits are enabled, CR8 store exits are enabled, and
-* virtualize APIC access is disabled; in this case
-* the processor would never use the TPR shadow and we
-* could simply clear the bit from the execution
-* control.  But such a configuration is useless, so
-* let's keep the code simple.
 */
if (!is_error_page(page)) {
vmx->nested.virtual_apic_page = page;
hpa = page_to_phys(vmx->nested.virtual_apic_page);
vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, hpa);
+   } else if (nested_cpu_has(vmcs12, CPU_BASED_CR8_LOAD_EXITING) &&
+  nested_cpu_has(vmcs12, CPU_BASED_CR8_STORE_EXITING) 
&&
+  !nested_cpu_has2(vmcs12, 
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) {
+   /*
+* The processor will never use the TPR shadow, simply
+* clear the bit from the execution control.  Such a
+* configuration is useless, but it happens in tests.
+* For any other configuration, failing the vm entry is
+* _not_ what the processor does but it's basically the
+* only possibility we have.
+*/
+   vmcs_clear_bits(CPU_BASED_VM_EXEC_CONTROL,
+   CPU_BASED_TPR_SHADOW);
+   } else {
+   printk("bad virtual-APIC page address\n");
+   dump_vmcs();
}
}
  
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c

index f8054dc1de65..14cacfd7ffd0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5603,7 +5603,7 @@ static void vmx_dump_dtsel(char *name, uint32_t limit)
   vmcs_readl(limit + GUEST_GDTR_BASE - GUEST_GDTR_LIMIT));
  }
  
-static void dump_vmcs(void)

+void dump_vmcs(void)
  {
u32 vmentry_ctl = vmcs_read32(VM_ENTRY_CONTROLS);
u32 vmexit_ctl = vmcs_read32(VM_EXIT_CONTROLS);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index a1e00d0a2482..f879529906b4 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -517,4 +517,6 @@ static inline void decache_tsc_multiplier(struct vcpu_vmx 
*vmx)
vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
  }
  
+void dump_vmcs(void);

+
  #endif /* __KVM_X86_VMX_H */


Reviewed-by: Krish Sadhukhan 


Re: [PATCH v2] clk: hi3660: Mark clk_gate_ufs_subsys as critical

2019-04-19 Thread Stephen Boyd
Quoting Leo Yan (2019-03-20 03:05:08)
> clk_gate_ufs_subsys is a system bus clock, turning off it will
> introduce lockup issue during system suspend flow.  Let's mark
> clk_gate_ufs_subsys as critical clock, thus keeps it on during
> system suspend and resume.
> 
> Fixes: d374e6fd5088 ("clk: hisilicon: Add clock driver for hi3660 SoC")
> Cc: sta...@vger.kernel.org
> Cc: Zhong Kaihua 
> Cc: John Stultz 
> Cc: Zhangfei Gao 
> Suggested-by: Dong Zhang 
> Signed-off-by: Leo Yan 
> ---

Applied to clk-next



Re: [PATCH RESEND] clk: tegra: Don't enable already enabled PLLs

2019-04-19 Thread Stephen Boyd
Quoting Dmitry Osipenko (2019-04-19 04:42:26)
> Initially Common Clock Framework isn't aware of the clock-enable status,
> this results in enabling of clocks that were enabled by bootloader. This
> is not a big deal for a regular clock-gates, but for PLL's it may have
> some unpleasant consequences. Thus re-enabling PLLX (the main CPU parent
> clock) may result in extra long period of PLL re-locking.
> 
> Acked-by: Peter De Schrijver 
> Signed-off-by: Dmitry Osipenko 
> ---

Applied to clk-next



Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Christian Brauner
On Sat, Apr 20, 2019 at 12:08 AM Daniel Colascione  wrote:
>
> On Fri, Apr 19, 2019 at 2:45 PM Joel Fernandes  wrote:
> >
> > On Fri, Apr 19, 2019 at 02:24:09PM -0700, Daniel Colascione wrote:
> > > On Fri, Apr 19, 2019 at 2:20 PM Joel Fernandes  
> > > wrote:
> > > >
> > > > On Fri, Apr 19, 2019 at 10:57:11PM +0200, Christian Brauner wrote:
> > > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione 
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote:
> > > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote:
> > > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner 
> > > > > > > > > wrote:
> > > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn 
> > > > > > > > > >  wrote:
> > > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov 
> > > > > > > > > > > wrote:
> > > > > > > > > > >> On 04/16, Joel Fernandes wrote:
> > > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg 
> > > > > > > > > > >> > Nesterov wrote:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > Could you explain when it should return POLLIN? When 
> > > > > > > > > > >> > > the whole
> > > > > > > > > > >process exits?
> > > > > > > > > > >> >
> > > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't 
> > > > > > > > > > >> > exist anymore,
> > > > > > > > > > >or when it
> > > > > > > > > > >> > is in a zombie state and there's no other thread in 
> > > > > > > > > > >> > the thread
> > > > > > > > > > >group.
> > > > > > > > > > >>
> > > > > > > > > > >> IOW, when the whole thread group exits, so it can't be 
> > > > > > > > > > >> used to
> > > > > > > > > > >monitor sub-threads.
> > > > > > > > > > >>
> > > > > > > > > > >> just in case... speaking of this patch it doesn't modify
> > > > > > > > > > >proc_tid_base_operations,
> > > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but 
> > > > > > > > > > >> iiuc you are
> > > > > > > > > > >going to use
> > > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ?
> > > > > > > > > > >
> > > > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid 
> > > > > > > > > > >has
> > > > > > > > > > >proc_tgid_base_operations despite not being a thread group 
> > > > > > > > > > >leader.
> > > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in 
> > > > > > > > > > >this code can
> > > > > > > > > > >be hit trivially, and then the code will misbehave.
> > > > > > > > > > >
> > > > > > > > > > >@Joel: I think you'll have to either rewrite this to 
> > > > > > > > > > >explicitly bail
> > > > > > > > > > >out if you're dealing with a thread group leader, or make 
> > > > > > > > > > >the code
> > > > > > > > > > >work for threads, too.
> > > > > > > > > >
> > > > > > > > > > The latter case probably being preferred if this API is 
> > > > > > > > > > supposed to be
> > > > > > > > > > useable for thread management in userspace.
> > > > > > > > >
> > > > > > > > > At the moment, we are not planning to use this for sub-thread 
> > > > > > > > > management. I
> > > > > > > > > am reworking this patch to only work on clone(2) pidfds which 
> > > > > > > > > makes the above
> > > > > > > >
> > > > > > > > Indeed and agreed.
> > > > > > > >
> > > > > > > > > discussion about /proc a bit unnecessary I think. Per the 
> > > > > > > > > latest CLONE_PIDFD
> > > > > > > > > patches, CLONE_THREAD with pidfd is not supported.
> > > > > > > >
> > > > > > > > Yes. We have no one asking for it right now and we can easily 
> > > > > > > > add this
> > > > > > > > later.
> > > > > > > >
> > > > > > > > Admittedly I haven't gotten around to reviewing the patches 
> > > > > > > > here yet
> > > > > > > > completely. But one thing about using POLLIN. FreeBSD is using 
> > > > > > > > POLLHUP
> > > > > > > > on process exit which I think is nice as well. How about 
> > > > > > > > returning
> > > > > > > > POLLIN | POLLHUP on process exit?
> > > > > > > > We already do things like this. For example, when you proxy 
> > > > > > > > between
> > > > > > > > ttys. If the process that you're reading data from has exited 
> > > > > > > > and closed
> > > > > > > > it's end you still can't usually simply exit because it might 
> > > > > > > > have still
> > > > > > > > buffered data that you want to read.  The way one can deal with 
> > > > > > > > this
> > > > > > > > from  userspace is that you can observe a (POLLHUP | POLLIN) 
> > > > > > > > event and
> > > > > > > > you keep on reading until you only observe a POLLHUP without a 
> > > > > > > > POLLIN
> > > > > > > > event at which point you know you have read
> > > > > > > > all data.
> > > > > > > > I like the semantics for pidfds as well as it would indicate:
> > > > > > > > - POLLHUP -> process has exited
> > > > > > > > - POLLIN  -> information can be 

[PATCH 2/2] ARM: errata: add support for A12/A17 errata CR711784

2019-04-19 Thread Douglas Anderson
This adds a code for turning on chicken bit 11, which appears to avoid
a potential CPU deadlock that could occur.  The exact set of
instruction needed to trigger this errata is not totaly known but we
have a high level of confidence that the problem is fixed by setting
chicken bit 11.

All details are in http://crbug.com/711784

This erratum has no known number and thus I have tagged it CR711784
(after the Chrome OS bug number).  I have created separate A12 / A17
configs to match how the rest of the A12 / A17 errata is handled.

Signed-off-by: Douglas Anderson 
---

 arch/arm/Kconfig  | 18 ++
 arch/arm/mm/proc-v7.S | 10 ++
 2 files changed, 28 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 4376fe74f95e..34ec9039206b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1181,6 +1181,13 @@ config ARM_ERRATA_857271
  hang. The workaround is expected to have a negligible performance
  impact.
 
+config ARM_ERRATA_CR711784_A12
+   bool "ARM errata: A12: conditional instructions can lead to a CPU hang"
+   depends on CPU_V7
+   help
+ This option enables the workaround for a Cortex-A12 erratum without a
+ number. The problems are best described in https://crbug.com/711784
+
 config ARM_ERRATA_852421
bool "ARM errata: A17: DMB ST might fail to create order between stores"
depends on CPU_V7
@@ -1212,6 +1219,17 @@ config ARM_ERRATA_857272
  config option from the A12 erratum due to the way errata are checked
  for and handled.
 
+config ARM_ERRATA_CR711784_A17
+   bool "ARM errata: A17: conditional instructions can lead to a CPU hang"
+   depends on CPU_V7
+   help
+ This option enables the workaround for a Cortex-A17 erratum without a
+ number. The problems are best described in https://crbug.com/711784
+ This erratum is not known to be fixed in any A17 revision.
+ This is identical to Cortex-A12 erratum CR711784.  It is a separate
+ config option from the A12 erratum due to the way errata are checked
+ for and handled.
+
 endmenu
 
 source "arch/arm/common/Kconfig"
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index cd2accbab844..a5156ea734ee 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -396,6 +396,11 @@ __ca12_errata:
mrc p15, 0, r10, c15, c0, 1 @ read diagnostic register
orr r10, r10, #1 << 10  @ set bit #10
mcr p15, 0, r10, c15, c0, 1 @ write diagnostic register
+#endif
+#ifdef CONFIG_ARM_ERRATA_CR711784_A12
+   mrc p15, 0, r10, c15, c0, 1 @ read diagnostic register
+   orr r10, r10, #1 << 11  @ set bit #11
+   mcr p15, 0, r10, c15, c0, 1 @ write diagnostic register
 #endif
b   __errata_finish
 
@@ -416,6 +421,11 @@ __ca17_errata:
mrc p15, 0, r10, c15, c0, 1 @ read diagnostic register
orr r10, r10, #1 << 10  @ set bit #10
mcr p15, 0, r10, c15, c0, 1 @ write diagnostic register
+#endif
+#ifdef CONFIG_ARM_ERRATA_CR711784_A17
+   mrc p15, 0, r10, c15, c0, 1 @ read diagnostic register
+   orr r10, r10, #1 << 11  @ set bit #11
+   mcr p15, 0, r10, c15, c0, 1 @ write diagnostic register
 #endif
b   __errata_finish
 
-- 
2.21.0.593.g511ec345e18-goog



[PATCH 1/2] ARM: errata: Workaround errata A12 857271 / A17 857272

2019-04-19 Thread Douglas Anderson
From: Sonny Rao 

This adds support for working around errata A12 857271 / A17 857272.
These errata were causing hangs on rk3288-based Chromebooks and it was
confirmed that this workaround fixed the problems.  In the Chrome OS
3.14 kernel [1] this erratum was known as ERRATA_FOOBAR due to lack of
an official number from ARM (though the workaround of setting chicken
bit 10 came from ARM).  In the meantime ARM came up with official
errata numbers but never published the workaround upstream.

Let's actually get the workaround landed.

[1] https://crrev.com/c/342753

Signed-off-by: Sonny Rao 
Signed-off-by: Douglas Anderson 
---

 arch/arm/Kconfig  | 19 +++
 arch/arm/mm/proc-v7.S | 10 ++
 2 files changed, 29 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index b509cd338219..4376fe74f95e 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1172,6 +1172,15 @@ config ARM_ERRATA_825619
  DMB NSHST or DMB ISHST instruction followed by a mix of Cacheable
  and Device/Strongly-Ordered loads and stores might cause deadlock
 
+config ARM_ERRATA_857271
+   bool "ARM errata: A12: CPU might deadlock under some very rare internal 
conditions"
+   depends on CPU_V7
+   help
+ This option enables the workaround for the 857271 Cortex-A12
+ (all revs) erratum. Under very rare timing conditions, the CPU might
+ hang. The workaround is expected to have a negligible performance
+ impact.
+
 config ARM_ERRATA_852421
bool "ARM errata: A17: DMB ST might fail to create order between stores"
depends on CPU_V7
@@ -1193,6 +1202,16 @@ config ARM_ERRATA_852423
  config option from the A12 erratum due to the way errata are checked
  for and handled.
 
+config ARM_ERRATA_857272
+   bool "ARM errata: A17: CPU might deadlock under some very rare internal 
conditions"
+   depends on CPU_V7
+   help
+ This option enables the workaround for the 857272 Cortex-A17 erratum.
+ This erratum is not known to be fixed in any A17 revision.
+ This is identical to Cortex-A12 erratum 857271.  It is a separate
+ config option from the A12 erratum due to the way errata are checked
+ for and handled.
+
 endmenu
 
 source "arch/arm/common/Kconfig"
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 339eb17c9808..cd2accbab844 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -391,6 +391,11 @@ __ca12_errata:
mrc p15, 0, r10, c15, c0, 1 @ read diagnostic register
orr r10, r10, #1 << 24  @ set bit #24
mcr p15, 0, r10, c15, c0, 1 @ write diagnostic register
+#endif
+#ifdef CONFIG_ARM_ERRATA_857271
+   mrc p15, 0, r10, c15, c0, 1 @ read diagnostic register
+   orr r10, r10, #1 << 10  @ set bit #10
+   mcr p15, 0, r10, c15, c0, 1 @ write diagnostic register
 #endif
b   __errata_finish
 
@@ -406,6 +411,11 @@ __ca17_errata:
mrcle   p15, 0, r10, c15, c0, 1 @ read diagnostic register
orrle   r10, r10, #1 << 12  @ set bit #12
mcrle   p15, 0, r10, c15, c0, 1 @ write diagnostic register
+#endif
+#ifdef CONFIG_ARM_ERRATA_857272
+   mrc p15, 0, r10, c15, c0, 1 @ read diagnostic register
+   orr r10, r10, #1 << 10  @ set bit #10
+   mcr p15, 0, r10, c15, c0, 1 @ write diagnostic register
 #endif
b   __errata_finish
 
-- 
2.21.0.593.g511ec345e18-goog



Re: [PATCH v4 8/9] clk: Allow parents to be specified via clkspec index

2019-04-19 Thread Stephen Boyd
Quoting Stephen Boyd (2019-04-12 11:31:49)
> Some clk providers are simple DT nodes that only have a 'clocks'
> property without having an associated 'clock-names' property. In these
> cases, we want to let these clk providers point to their parent clks
> without having to dereference the 'clocks' property at probe time to
> figure out the parent's globally unique clk name. Let's add an 'index'
> property to the parent_data structure so that clk providers can indicate
> that their parent is a particular index in the 'clocks' DT property.
> 
> Cc: Miquel Raynal 
> Cc: Jerome Brunet 
> Cc: Russell King 
> Cc: Michael Turquette 
> Cc: Jeffrey Hugo 
> Cc: Chen-Yu Tsai 
> Signed-off-by: Stephen Boyd 
> ---

Applied to clk-next



Re: [PATCH v4 4/9] driver core: Let dev_of_node() accept a NULL dev

2019-04-19 Thread Stephen Boyd
Quoting Stephen Boyd (2019-04-12 11:31:45)
> We'd like to chain this in places where the 'dev' argument might be
> NULL. Let this function take a NULL 'dev' so this can work.
> 
> Cc: Miquel Raynal 
> Cc: Jerome Brunet 
> Cc: Russell King 
> Cc: Michael Turquette 
> Cc: Jeffrey Hugo 
> Cc: Chen-Yu Tsai 
> Cc: Greg Kroah-Hartman 
> Cc: Rob Herring 
> Signed-off-by: Stephen Boyd 
> ---

Applied to clk-next



Re: [PATCH v4 7/9] clk: Look for parents with clkdev based clk_lookups

2019-04-19 Thread Stephen Boyd
Quoting Stephen Boyd (2019-04-12 11:31:48)
> In addition to looking for DT based parents, support clkdev based
> clk_lookups. This should allow non-DT based clk drivers to participate
> in the parent lookup process.
> 
> Cc: Miquel Raynal 
> Cc: Jerome Brunet 
> Cc: Russell King 
> Cc: Michael Turquette 
> Cc: Jeffrey Hugo 
> Cc: Chen-Yu Tsai 
> Signed-off-by: Stephen Boyd 
> ---

Applied to clk-next



Re: [PATCH v4 9/9] clk: fixed-factor: Let clk framework find parent

2019-04-19 Thread Stephen Boyd
Quoting Stephen Boyd (2019-04-12 11:31:50)
> Convert this driver to a more modern way of specifying parents now that
> we have a way to specify clk parents by DT index. This lets us nicely
> avoid a problem where a parent clk name isn't know because the parent
> clk hasn't been registered yet.
> 
> Cc: Miquel Raynal 
> Cc: Jerome Brunet 
> Cc: Russell King 
> Cc: Michael Turquette 
> Cc: Jeffrey Hugo 
> Cc: Chen-Yu Tsai 
> Signed-off-by: Stephen Boyd 
> ---

Applied to clk-next



Re: [PATCH v4 5/9] clk: Add of_clk_hw_register() API for early clk drivers

2019-04-19 Thread Stephen Boyd
Quoting Stephen Boyd (2019-04-12 11:31:46)
> In some circumstances drivers register clks early and don't have access
> to a struct device because the device model isn't initialized yet. Add
> an API to let drivers register clks associated with a struct device_node
> so that these drivers can participate in getting parent clks through DT.
> 
> Cc: Miquel Raynal 
> Cc: Jerome Brunet 
> Cc: Russell King 
> Cc: Michael Turquette 
> Cc: Jeffrey Hugo 
> Cc: Chen-Yu Tsai 
> Cc: Rob Herring 
> Signed-off-by: Stephen Boyd 
> ---

Applied to clk-next



Re: [PATCH v4 6/9] clk: Allow parents to be specified without string names

2019-04-19 Thread Stephen Boyd
Quoting Stephen Boyd (2019-04-12 11:31:47)
> The common clk framework is lacking in ability to describe the clk
> topology without specifying strings for every possible parent-child
> link. There are a few drawbacks to the current approach:
> 
>  1) String comparisons are used for everything, including describing
>  topologies that are 'local' to a single clock controller.
> 
>  2) clk providers (e.g. i2c clk drivers) need to create globally unique
>  clk names to avoid collisions in the clk namespace, leading to awkward
>  name generation code in various clk drivers.
> 
>  3) DT bindings may not fully describe the clk topology and linkages
>  between clk controllers because drivers can easily rely on globally unique
>  strings to describe connections between clks.
> 
> This leads to confusing DT bindings, complicated clk name generation
> code, and inefficient string comparisons during clk registration just so
> that the clk framework can detect the topology of the clk tree.
> Furthermore, some drivers call clk_get() and then __clk_get_name() to
> extract the globally unique clk name just so they can specify the parent
> of the clk they're registering. We have of_clk_parent_fill() but that
> mostly only works for single clks registered from a DT node, which isn't
> the norm. Let's simplify this all by introducing two new ways of
> specifying clk parents.
> 
> The first method is an array of pointers to clk_hw structures
> corresponding to the parents at that index. This works for clks that are
> registered when we have access to all the clk_hw pointers for the
> parents.
> 
> The second method is a mix of clk_hw pointers and strings of local and
> global parent clk names. If the .fw_name member of the map is set we'll
> look for that clk by performing a DT based lookup of the device the clk
> is registered with and the .name specified in the map. If that fails,
> we'll fallback to the .name member and perform a global clk name lookup
> like we've always done before.
> 
> Using either one of these new methods is entirely optional. Existing
> drivers will continue to work, and they can migrate to this new approach
> as they see fit. Eventually, we'll want to get rid of the 'parent_names'
> array in struct clk_init_data and use one of these new methods instead.
> 
> Cc: Miquel Raynal 
> Cc: Jerome Brunet 
> Cc: Russell King 
> Cc: Michael Turquette 
> Cc: Jeffrey Hugo 
> Cc: Chen-Yu Tsai 
> Cc: Rob Herring 
> Signed-off-by: Stephen Boyd 
> ---

Applied to clk-next



Re: [PATCH v4 3/9] clk: Prepare for clk registration API that uses DT nodes

2019-04-19 Thread Stephen Boyd
Quoting Stephen Boyd (2019-04-12 11:31:44)
> Split out the body of the clk_register() function so it can be shared
> between the different types of registration APIs (DT, device).
> 
> Cc: Miquel Raynal 
> Cc: Jerome Brunet 
> Cc: Russell King 
> Cc: Michael Turquette 
> Cc: Jeffrey Hugo 
> Cc: Chen-Yu Tsai 
> Cc: Rob Herring 
> Signed-off-by: Stephen Boyd 
> ---

Applied to clk-next



Re: [PATCH v4 2/9] clkdev: Move clk creation outside of 'clocks_mutex'

2019-04-19 Thread Stephen Boyd
Quoting Stephen Boyd (2019-04-12 11:31:43)
> We don't need to hold the 'clocks_mutex' here when we're creating a clk
> pointer from a clk_lookup structure. Instead, we just need to make sure
> that the lookup doesn't go away while we dereference the lookup pointer
> to extract the clk_hw pointer out of it. Let's move things around
> slightly so that we have a new function to get the clk_hw out of the
> lookup with the right locking and then chain the two together for what
> used to be __clk_get_sys().
> 
> Cc: Miquel Raynal 
> Cc: Jerome Brunet 
> Cc: Russell King 
> Cc: Michael Turquette 
> Cc: Jeffrey Hugo 
> Cc: Chen-Yu Tsai 
> Signed-off-by: Stephen Boyd 
> ---

Applied to clk-next



Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Daniel Colascione
On Fri, Apr 19, 2019 at 2:45 PM Joel Fernandes  wrote:
>
> On Fri, Apr 19, 2019 at 02:24:09PM -0700, Daniel Colascione wrote:
> > On Fri, Apr 19, 2019 at 2:20 PM Joel Fernandes  
> > wrote:
> > >
> > > On Fri, Apr 19, 2019 at 10:57:11PM +0200, Christian Brauner wrote:
> > > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione  
> > > > wrote:
> > > > >
> > > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes 
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote:
> > > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote:
> > > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner 
> > > > > > > > wrote:
> > > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn 
> > > > > > > > >  wrote:
> > > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov 
> > > > > > > > > > wrote:
> > > > > > > > > >> On 04/16, Joel Fernandes wrote:
> > > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov 
> > > > > > > > > >> > wrote:
> > > > > > > > > >> > >
> > > > > > > > > >> > > Could you explain when it should return POLLIN? When 
> > > > > > > > > >> > > the whole
> > > > > > > > > >process exits?
> > > > > > > > > >> >
> > > > > > > > > >> > It returns POLLIN when the task is dead or doesn't exist 
> > > > > > > > > >> > anymore,
> > > > > > > > > >or when it
> > > > > > > > > >> > is in a zombie state and there's no other thread in the 
> > > > > > > > > >> > thread
> > > > > > > > > >group.
> > > > > > > > > >>
> > > > > > > > > >> IOW, when the whole thread group exits, so it can't be 
> > > > > > > > > >> used to
> > > > > > > > > >monitor sub-threads.
> > > > > > > > > >>
> > > > > > > > > >> just in case... speaking of this patch it doesn't modify
> > > > > > > > > >proc_tid_base_operations,
> > > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc 
> > > > > > > > > >> you are
> > > > > > > > > >going to use
> > > > > > > > > >> the anonymous file returned by CLONE_PIDFD ?
> > > > > > > > > >
> > > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid has
> > > > > > > > > >proc_tgid_base_operations despite not being a thread group 
> > > > > > > > > >leader.
> > > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this 
> > > > > > > > > >code can
> > > > > > > > > >be hit trivially, and then the code will misbehave.
> > > > > > > > > >
> > > > > > > > > >@Joel: I think you'll have to either rewrite this to 
> > > > > > > > > >explicitly bail
> > > > > > > > > >out if you're dealing with a thread group leader, or make 
> > > > > > > > > >the code
> > > > > > > > > >work for threads, too.
> > > > > > > > >
> > > > > > > > > The latter case probably being preferred if this API is 
> > > > > > > > > supposed to be
> > > > > > > > > useable for thread management in userspace.
> > > > > > > >
> > > > > > > > At the moment, we are not planning to use this for sub-thread 
> > > > > > > > management. I
> > > > > > > > am reworking this patch to only work on clone(2) pidfds which 
> > > > > > > > makes the above
> > > > > > >
> > > > > > > Indeed and agreed.
> > > > > > >
> > > > > > > > discussion about /proc a bit unnecessary I think. Per the 
> > > > > > > > latest CLONE_PIDFD
> > > > > > > > patches, CLONE_THREAD with pidfd is not supported.
> > > > > > >
> > > > > > > Yes. We have no one asking for it right now and we can easily add 
> > > > > > > this
> > > > > > > later.
> > > > > > >
> > > > > > > Admittedly I haven't gotten around to reviewing the patches here 
> > > > > > > yet
> > > > > > > completely. But one thing about using POLLIN. FreeBSD is using 
> > > > > > > POLLHUP
> > > > > > > on process exit which I think is nice as well. How about returning
> > > > > > > POLLIN | POLLHUP on process exit?
> > > > > > > We already do things like this. For example, when you proxy 
> > > > > > > between
> > > > > > > ttys. If the process that you're reading data from has exited and 
> > > > > > > closed
> > > > > > > it's end you still can't usually simply exit because it might 
> > > > > > > have still
> > > > > > > buffered data that you want to read.  The way one can deal with 
> > > > > > > this
> > > > > > > from  userspace is that you can observe a (POLLHUP | POLLIN) 
> > > > > > > event and
> > > > > > > you keep on reading until you only observe a POLLHUP without a 
> > > > > > > POLLIN
> > > > > > > event at which point you know you have read
> > > > > > > all data.
> > > > > > > I like the semantics for pidfds as well as it would indicate:
> > > > > > > - POLLHUP -> process has exited
> > > > > > > - POLLIN  -> information can be read
> > > > > >
> > > > > > Actually I think a bit different about this, in my opinion the 
> > > > > > pidfd should
> > > > > > always be readable (we would store the exit status somewhere in the 
> > > > > > future
> > > > > > which would be readable, even after task_struct is dead). So I was 
> > > 

Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Christian Brauner
On Fri, Apr 19, 2019 at 11:48 PM Christian Brauner  wrote:
>
> On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione  wrote:
> >
> > On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner  
> > wrote:
> > >
> > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione  
> > > wrote:
> > > >
> > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes 
> > > >  wrote:
> > > > >
> > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote:
> > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote:
> > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner wrote:
> > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn 
> > > > > > > >  wrote:
> > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov 
> > > > > > > > > wrote:
> > > > > > > > >> On 04/16, Joel Fernandes wrote:
> > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov 
> > > > > > > > >> > wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > Could you explain when it should return POLLIN? When the 
> > > > > > > > >> > > whole
> > > > > > > > >process exits?
> > > > > > > > >> >
> > > > > > > > >> > It returns POLLIN when the task is dead or doesn't exist 
> > > > > > > > >> > anymore,
> > > > > > > > >or when it
> > > > > > > > >> > is in a zombie state and there's no other thread in the 
> > > > > > > > >> > thread
> > > > > > > > >group.
> > > > > > > > >>
> > > > > > > > >> IOW, when the whole thread group exits, so it can't be used 
> > > > > > > > >> to
> > > > > > > > >monitor sub-threads.
> > > > > > > > >>
> > > > > > > > >> just in case... speaking of this patch it doesn't modify
> > > > > > > > >proc_tid_base_operations,
> > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc 
> > > > > > > > >> you are
> > > > > > > > >going to use
> > > > > > > > >> the anonymous file returned by CLONE_PIDFD ?
> > > > > > > > >
> > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid has
> > > > > > > > >proc_tgid_base_operations despite not being a thread group 
> > > > > > > > >leader.
> > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this 
> > > > > > > > >code can
> > > > > > > > >be hit trivially, and then the code will misbehave.
> > > > > > > > >
> > > > > > > > >@Joel: I think you'll have to either rewrite this to 
> > > > > > > > >explicitly bail
> > > > > > > > >out if you're dealing with a thread group leader, or make the 
> > > > > > > > >code
> > > > > > > > >work for threads, too.
> > > > > > > >
> > > > > > > > The latter case probably being preferred if this API is 
> > > > > > > > supposed to be
> > > > > > > > useable for thread management in userspace.
> > > > > > >
> > > > > > > At the moment, we are not planning to use this for sub-thread 
> > > > > > > management. I
> > > > > > > am reworking this patch to only work on clone(2) pidfds which 
> > > > > > > makes the above
> > > > > >
> > > > > > Indeed and agreed.
> > > > > >
> > > > > > > discussion about /proc a bit unnecessary I think. Per the latest 
> > > > > > > CLONE_PIDFD
> > > > > > > patches, CLONE_THREAD with pidfd is not supported.
> > > > > >
> > > > > > Yes. We have no one asking for it right now and we can easily add 
> > > > > > this
> > > > > > later.
> > > > > >
> > > > > > Admittedly I haven't gotten around to reviewing the patches here yet
> > > > > > completely. But one thing about using POLLIN. FreeBSD is using 
> > > > > > POLLHUP
> > > > > > on process exit which I think is nice as well. How about returning
> > > > > > POLLIN | POLLHUP on process exit?
> > > > > > We already do things like this. For example, when you proxy between
> > > > > > ttys. If the process that you're reading data from has exited and 
> > > > > > closed
> > > > > > it's end you still can't usually simply exit because it might have 
> > > > > > still
> > > > > > buffered data that you want to read.  The way one can deal with this
> > > > > > from  userspace is that you can observe a (POLLHUP | POLLIN) event 
> > > > > > and
> > > > > > you keep on reading until you only observe a POLLHUP without a 
> > > > > > POLLIN
> > > > > > event at which point you know you have read
> > > > > > all data.
> > > > > > I like the semantics for pidfds as well as it would indicate:
> > > > > > - POLLHUP -> process has exited
> > > > > > - POLLIN  -> information can be read
> > > > >
> > > > > Actually I think a bit different about this, in my opinion the pidfd 
> > > > > should
> > > > > always be readable (we would store the exit status somewhere in the 
> > > > > future
> > > > > which would be readable, even after task_struct is dead). So I was 
> > > > > thinking
> > > > > we always return EPOLLIN.  If process has not exited, then it blocks.
> > > >
> > > > ITYM that a pidfd polls as readable *once a task exits* and stays
> > > > readable forever. Before a task exit, a poll on a pidfd should *not*
> > > > yield POLLIN and reading that pidfd should *not* complete 

Re: [PATCH v1 2/3] PCI / ACPI: Remove the need for 'struct hotplug_params'

2019-04-19 Thread Bjorn Helgaas
On Fri, Apr 19, 2019 at 03:07:45PM -0500, Bjorn Helgaas wrote:
> On Fri, Feb 08, 2019 at 10:24:12AM -0600, Alexandru Gagniuc wrote:
> > We used to first parse all the _HPP and _HPX tables before using the
> > information to program registers of PCIe devices. Up until HPX type 2,
> > there was only one structure of each type, so we could cheat and store
> > it on the stack.
> > 
> > With HPX type 3 we get an arbitrary number of entries, so the above
> > model doesn't scale that well. Instead of parsing all tables at once,
> > parse and program each entry separately. For _HPP and _HPX 0 thru 2,
> > this is functionally equivalent. The change enables the upcoming _HPX3
> > to integrate more easily.
> 
> I think this is tremendous!  It's going to simplify this code
> dramatically.  Two comments below.

> >  static void pci_configure_device(struct pci_dev *dev)
> >  {
> > -   struct hotplug_params hpp;
> > -   int ret;
> > +   static const struct hotplug_program_ops hp_ops = {
> > +   .program_type0 = program_hpp_type0,
> > +   .program_type1 = program_hpp_type1,
> > +   .program_type2 = program_hpp_type2,
> > +   };
> 
> What if we just moved program_hpp_type0(), etc from probe.c to
> pci-acpi.c?  The only reason I see to have it in probe.c is for
> pci_default_type0, and I think that is a pretty obtuse way of doing
> default configuration.  I would have no problem at all just hardcoding
> those defaults in probe.c and then potentially having them overwritten
> by _HPP/_HPX.

Actually, never mind about this.  This would be a perfect project for
mentoring a Linux newbie.

I'll merge this series as-is and any restructuring/cleanup can happen
later, since it's not related to this series anyway.

Bjorn


Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Christian Brauner
On Fri, Apr 19, 2019 at 11:20 PM Joel Fernandes  wrote:
>
> On Fri, Apr 19, 2019 at 10:57:11PM +0200, Christian Brauner wrote:
> > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione  
> > wrote:
> > >
> > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes  
> > > wrote:
> > > >
> > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote:
> > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote:
> > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner wrote:
> > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn 
> > > > > > >  wrote:
> > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov  
> > > > > > > >wrote:
> > > > > > > >> On 04/16, Joel Fernandes wrote:
> > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov 
> > > > > > > >> > wrote:
> > > > > > > >> > >
> > > > > > > >> > > Could you explain when it should return POLLIN? When the 
> > > > > > > >> > > whole
> > > > > > > >process exits?
> > > > > > > >> >
> > > > > > > >> > It returns POLLIN when the task is dead or doesn't exist 
> > > > > > > >> > anymore,
> > > > > > > >or when it
> > > > > > > >> > is in a zombie state and there's no other thread in the 
> > > > > > > >> > thread
> > > > > > > >group.
> > > > > > > >>
> > > > > > > >> IOW, when the whole thread group exits, so it can't be used to
> > > > > > > >monitor sub-threads.
> > > > > > > >>
> > > > > > > >> just in case... speaking of this patch it doesn't modify
> > > > > > > >proc_tid_base_operations,
> > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc you 
> > > > > > > >> are
> > > > > > > >going to use
> > > > > > > >> the anonymous file returned by CLONE_PIDFD ?
> > > > > > > >
> > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid has
> > > > > > > >proc_tgid_base_operations despite not being a thread group 
> > > > > > > >leader.
> > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this 
> > > > > > > >code can
> > > > > > > >be hit trivially, and then the code will misbehave.
> > > > > > > >
> > > > > > > >@Joel: I think you'll have to either rewrite this to explicitly 
> > > > > > > >bail
> > > > > > > >out if you're dealing with a thread group leader, or make the 
> > > > > > > >code
> > > > > > > >work for threads, too.
> > > > > > >
> > > > > > > The latter case probably being preferred if this API is supposed 
> > > > > > > to be
> > > > > > > useable for thread management in userspace.
> > > > > >
> > > > > > At the moment, we are not planning to use this for sub-thread 
> > > > > > management. I
> > > > > > am reworking this patch to only work on clone(2) pidfds which makes 
> > > > > > the above
> > > > >
> > > > > Indeed and agreed.
> > > > >
> > > > > > discussion about /proc a bit unnecessary I think. Per the latest 
> > > > > > CLONE_PIDFD
> > > > > > patches, CLONE_THREAD with pidfd is not supported.
> > > > >
> > > > > Yes. We have no one asking for it right now and we can easily add this
> > > > > later.
> > > > >
> > > > > Admittedly I haven't gotten around to reviewing the patches here yet
> > > > > completely. But one thing about using POLLIN. FreeBSD is using POLLHUP
> > > > > on process exit which I think is nice as well. How about returning
> > > > > POLLIN | POLLHUP on process exit?
> > > > > We already do things like this. For example, when you proxy between
> > > > > ttys. If the process that you're reading data from has exited and 
> > > > > closed
> > > > > it's end you still can't usually simply exit because it might have 
> > > > > still
> > > > > buffered data that you want to read.  The way one can deal with this
> > > > > from  userspace is that you can observe a (POLLHUP | POLLIN) event and
> > > > > you keep on reading until you only observe a POLLHUP without a POLLIN
> > > > > event at which point you know you have read
> > > > > all data.
> > > > > I like the semantics for pidfds as well as it would indicate:
> > > > > - POLLHUP -> process has exited
> > > > > - POLLIN  -> information can be read
> > > >
> > > > Actually I think a bit different about this, in my opinion the pidfd 
> > > > should
> > > > always be readable (we would store the exit status somewhere in the 
> > > > future
> > > > which would be readable, even after task_struct is dead). So I was 
> > > > thinking
> > > > we always return EPOLLIN.  If process has not exited, then it blocks.
> > >
> > > ITYM that a pidfd polls as readable *once a task exits* and stays
> > > readable forever. Before a task exit, a poll on a pidfd should *not*
> > > yield POLLIN and reading that pidfd should *not* complete immediately.
> > > There's no way that, having observed POLLIN on a pidfd, you should
> > > ever then *not* see POLLIN on that pidfd in the future --- it's a
> > > one-way transition from not-ready-to-get-exit-status to
> > > ready-to-get-exit-status.
> >
> > What do you consider interesting state transitions? A listener on a pidfd
> > in 

Re: [PATCH v20 00/28] Intel SGX1 support

2019-04-19 Thread Jethro Beekman
On 2019-04-19 14:34, Thomas Gleixner wrote:
> And how so? You create writeable AND executable memory. That's a nono and
> you can argue in circles, that's not going to change with any of your
> proposed changes.

On 2019-04-19 14:38, Thomas Gleixner wrote:
> You are working around LSM nothing else and that's just not going to fly.

Based on your comments, I'm still unsure if we're on the same page with
regards to what I'm proposing.

Here's a regular non-SGX flow that LSM would likely prevent:

mmap(PROT_READ|PROT_WRITE)
memcpy()
mmap(PROT_READ|PROT_EXEC) <-- denied by LSM

Or just something based on regular PT permissions:

mmap(PROT_READ|PROT_EXEC)
memcpy() <-- SIGSEGV

Now, the equivalent for SGX:

mmap(PROT_READ|PROT_WRITE)
ioctl(EADD)
mmap(PROT_READ|PROT_EXEC) <-- denied by LSM

This works fine with v20 as-is. However, consider the equivalent of the
PT-based flow:

mmap(PROT_READ|PROT_EXEC)
ioctl(EADD) <-- no error!

It's not me that's working around the LSM, it's the SGX driver! It's
writing to memory that's not marked writable! The fundamental issue here
is that the SGX instruction set has several instructions that bypass the
page table permission bits, and this is (naturally) confusing to any
kind of reference monitor like the LSM framework. You can come up with
similar scenarios that involve PROT_READ|PROT_WRITE|PROT_EXEC or
ptrace(PTRACE_POKETEXT). So, clearly, the proper way to fix this failure
of complete mediation is by enforcing appropriate page-table permissions
even on the SGX instructions that don't do it themselves. Just make any
implicit memory access look like a regular memory access and now
everyone is on the same page (pun intended).

--
Jethro Beekman | Fortanix


mmotm 2019-04-19-14-53 uploaded

2019-04-19 Thread akpm
The mm-of-the-moment snapshot 2019-04-19-14-53 has been uploaded to

   http://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

http://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (5.x
or 5.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
http://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE--mm-dd-hh-mm-ss.  Both contain the string -mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.


A full copy of the full kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/



The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is available at

http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/

and use of this tree is similar to
http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/, described above.


This mmotm tree contains the following patches against 5.1-rc5:
(patches marked "*" will be included in linux-next)

  origin.patch
* mm-add-sys-kernel-slab-cache-cache_dma32.patch
* userfaultfd-use-rcu-to-free-the-task-struct-when-fork-fails.patch
* mm-memory_hotplug-drop-memory-device-reference-after-find_memory_block.patch
* zram-pass-down-the-bvec-we-need-to-read-into-in-the-work-struct.patch
* lib-kconfigdebug-fix-build-error-without-config_block.patch
* lib-test_vmalloc-do-not-create-cpumask_t-variable-on-stack.patch
* 
mm-do-not-boost-watermarks-to-avoid-fragmentation-for-the-discontig-memory-model.patch
* mm-page_alloc-always-use-a-captured-page-regardless-of-compaction-result.patch
* prctl-fix-false-positive-in-validate_prctl_map.patch
* scripts-spellingtxt-add-more-typos-to-spellingtxt-and-sort.patch
* arch-sh-boards-mach-dreamcast-irqc-remove-duplicate-header.patch
* debugobjects-move-printk-out-of-db-lock-critical-sections.patch
* ocfs2-use-common-file-type-conversion.patch
* ocfs2-fix-ocfs2-read-inode-data-panic-in-ocfs2_iget.patch
* ocfs2-clear-zero-in-unaligned-direct-io.patch
* ocfs2-clear-zero-in-unaligned-direct-io-checkpatch-fixes.patch
* ocfs2-wait-for-recovering-done-after-direct-unlock-request.patch
* ocfs2-checkpoint-appending-truncate-log-transaction-before-flushing.patch
* ramfs-support-o_tmpfile.patch
  mm.patch
* list-add-function-list_rotate_to_front.patch
* slob-respect-list_head-abstraction-layer.patch
* slob-use-slab_list-instead-of-lru.patch
* slub-add-comments-to-endif-pre-processor-macros.patch
* slub-use-slab_list-instead-of-lru.patch
* slab-use-slab_list-instead-of-lru.patch
* mm-remove-stale-comment-from-page-struct.patch
* slub-remove-useless-kmem_cache_debug-before-remove_full.patch
* mm-slab-remove-unneed-check-in-cpuup_canceled.patch
* slub-update-the-comment-about-slab-frozen.patch
* slab-fix-an-infinite-loop-in-leaks_show.patch
* slab-fix-an-infinite-loop-in-leaks_show-fix.patch
* mm-vmscan-drop-zone-id-from-kswapd-tracepoints.patch
* mm-cma_debugc-fix-the-break-condition-in-cma_maxchunk_get.patch
* userfaultfd-sysctl-add-vmunprivileged_userfaultfd.patch
* userfaultfd-sysctl-add-vmunprivileged_userfaultfd-fix.patch
* page-cache-store-only-head-pages-in-i_pages.patch
* page-cache-store-only-head-pages-in-i_pages-fix.patch
* page-cache-store-only-head-pages-in-i_pages-fix-fix.patch
* mm-page_alloc-disallow-__gfp_comp-in-alloc_pages_exact.patch
* mm-move-recent_rotated-pages-calculation-to-shrink_inactive_list.patch
* mm-move-nr_deactivate-accounting-to-shrink_active_list.patch
* mm-move-nr_deactivate-accounting-to-shrink_active_list-fix.patch
* mm-remove-pages_to_free-argument-of-move_active_pages_to_lru.patch
* mm-generalize-putback-scan-functions.patch
* mm-gup-replace-get_user_pages_longterm-with-foll_longterm.patch
* mm-gup-replace-get_user_pages_longterm-with-foll_longterm-v3.patch
* mm-gup-change-write-parameter-to-flags-in-fast-walk.patch
* mm-gup-change-gup-fast-to-use-flags-rather-than-a-write-bool.patch
* mm-gup-add-foll_longterm-capability-to-gup-fast.patch
* mm-gup-add-foll_longterm-capability-to-gup-fast-v3.patch
* ib-hfi1-use-the-new-foll_longterm-flag-to-get_user_pages_fast.patch
* ib-hfi1-use-the-new-foll_longterm-flag-to-get_user_pages_fast-v3.patch
* ib-qib-use-the-new-foll_longterm-flag-to-get_user_pages_fast.patch
* 

Re: [PATCH] Staging: vc04_services: Cleanup in ctrl_set_bitrate()

2019-04-19 Thread Stefan Wahren
Hi Madhumitha,

Am 19.04.19 um 23:23 schrieb Madhumitha Prabakaran:
> Remove unnecessary variable and replace return type.
>
> Issue suggested by Coccinelle.
>
> Signed-off-by: Madhumitha Prabakaran 
> ---
>  drivers/staging/vc04_services/bcm2835-camera/controls.c | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/drivers/staging/vc04_services/bcm2835-camera/controls.c 
> b/drivers/staging/vc04_services/bcm2835-camera/controls.c
> index e39ab5fae724..f87fa7f61db3 100644
> --- a/drivers/staging/vc04_services/bcm2835-camera/controls.c
> +++ b/drivers/staging/vc04_services/bcm2835-camera/controls.c
> @@ -607,18 +607,13 @@ static int ctrl_set_bitrate(struct bm2835_mmal_dev *dev,
>   struct v4l2_ctrl *ctrl,
>   const struct bm2835_mmal_v4l2_ctrl *mmal_ctrl)
>  {
> - int ret;
>   struct vchiq_mmal_port *encoder_out;
>  
>   dev->capture.encode_bitrate = ctrl->val;
>  
>   encoder_out = >component[MMAL_COMPONENT_VIDEO_ENCODE]->output[0];
>  
> - ret = vchiq_mmal_port_parameter_set(dev->instance, encoder_out,
> - mmal_ctrl->mmal_id,
> - >val, sizeof(ctrl->val));
> - ret = 0;
> - return ret;
> + return 0;

i don't understand why we can remove here the function call.

Stefan

>  }
>  
>  static int ctrl_set_bitrate_mode(struct bm2835_mmal_dev *dev,


Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Christian Brauner
On Fri, Apr 19, 2019 at 11:21 PM Daniel Colascione  wrote:
>
> On Fri, Apr 19, 2019 at 1:57 PM Christian Brauner  
> wrote:
> >
> > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione  
> > wrote:
> > >
> > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes  
> > > wrote:
> > > >
> > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote:
> > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote:
> > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner wrote:
> > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn 
> > > > > > >  wrote:
> > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov  
> > > > > > > >wrote:
> > > > > > > >> On 04/16, Joel Fernandes wrote:
> > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov 
> > > > > > > >> > wrote:
> > > > > > > >> > >
> > > > > > > >> > > Could you explain when it should return POLLIN? When the 
> > > > > > > >> > > whole
> > > > > > > >process exits?
> > > > > > > >> >
> > > > > > > >> > It returns POLLIN when the task is dead or doesn't exist 
> > > > > > > >> > anymore,
> > > > > > > >or when it
> > > > > > > >> > is in a zombie state and there's no other thread in the 
> > > > > > > >> > thread
> > > > > > > >group.
> > > > > > > >>
> > > > > > > >> IOW, when the whole thread group exits, so it can't be used to
> > > > > > > >monitor sub-threads.
> > > > > > > >>
> > > > > > > >> just in case... speaking of this patch it doesn't modify
> > > > > > > >proc_tid_base_operations,
> > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc you 
> > > > > > > >> are
> > > > > > > >going to use
> > > > > > > >> the anonymous file returned by CLONE_PIDFD ?
> > > > > > > >
> > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid has
> > > > > > > >proc_tgid_base_operations despite not being a thread group 
> > > > > > > >leader.
> > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this 
> > > > > > > >code can
> > > > > > > >be hit trivially, and then the code will misbehave.
> > > > > > > >
> > > > > > > >@Joel: I think you'll have to either rewrite this to explicitly 
> > > > > > > >bail
> > > > > > > >out if you're dealing with a thread group leader, or make the 
> > > > > > > >code
> > > > > > > >work for threads, too.
> > > > > > >
> > > > > > > The latter case probably being preferred if this API is supposed 
> > > > > > > to be
> > > > > > > useable for thread management in userspace.
> > > > > >
> > > > > > At the moment, we are not planning to use this for sub-thread 
> > > > > > management. I
> > > > > > am reworking this patch to only work on clone(2) pidfds which makes 
> > > > > > the above
> > > > >
> > > > > Indeed and agreed.
> > > > >
> > > > > > discussion about /proc a bit unnecessary I think. Per the latest 
> > > > > > CLONE_PIDFD
> > > > > > patches, CLONE_THREAD with pidfd is not supported.
> > > > >
> > > > > Yes. We have no one asking for it right now and we can easily add this
> > > > > later.
> > > > >
> > > > > Admittedly I haven't gotten around to reviewing the patches here yet
> > > > > completely. But one thing about using POLLIN. FreeBSD is using POLLHUP
> > > > > on process exit which I think is nice as well. How about returning
> > > > > POLLIN | POLLHUP on process exit?
> > > > > We already do things like this. For example, when you proxy between
> > > > > ttys. If the process that you're reading data from has exited and 
> > > > > closed
> > > > > it's end you still can't usually simply exit because it might have 
> > > > > still
> > > > > buffered data that you want to read.  The way one can deal with this
> > > > > from  userspace is that you can observe a (POLLHUP | POLLIN) event and
> > > > > you keep on reading until you only observe a POLLHUP without a POLLIN
> > > > > event at which point you know you have read
> > > > > all data.
> > > > > I like the semantics for pidfds as well as it would indicate:
> > > > > - POLLHUP -> process has exited
> > > > > - POLLIN  -> information can be read
> > > >
> > > > Actually I think a bit different about this, in my opinion the pidfd 
> > > > should
> > > > always be readable (we would store the exit status somewhere in the 
> > > > future
> > > > which would be readable, even after task_struct is dead). So I was 
> > > > thinking
> > > > we always return EPOLLIN.  If process has not exited, then it blocks.
> > >
> > > ITYM that a pidfd polls as readable *once a task exits* and stays
> > > readable forever. Before a task exit, a poll on a pidfd should *not*
> > > yield POLLIN and reading that pidfd should *not* complete immediately.
> > > There's no way that, having observed POLLIN on a pidfd, you should
> > > ever then *not* see POLLIN on that pidfd in the future --- it's a
> > > one-way transition from not-ready-to-get-exit-status to
> > > ready-to-get-exit-status.
> >
> > What do you consider interesting state transitions? A listener on a pidfd
> > in 

Re: [PATCH RFC 1/2] Add polling support to pidfd

2019-04-19 Thread Joel Fernandes
On Fri, Apr 19, 2019 at 02:24:09PM -0700, Daniel Colascione wrote:
> On Fri, Apr 19, 2019 at 2:20 PM Joel Fernandes  wrote:
> >
> > On Fri, Apr 19, 2019 at 10:57:11PM +0200, Christian Brauner wrote:
> > > On Fri, Apr 19, 2019 at 10:34 PM Daniel Colascione  
> > > wrote:
> > > >
> > > > On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes 
> > > >  wrote:
> > > > >
> > > > > On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote:
> > > > > > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote:
> > > > > > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner wrote:
> > > > > > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn 
> > > > > > > >  wrote:
> > > > > > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov 
> > > > > > > > > wrote:
> > > > > > > > >> On 04/16, Joel Fernandes wrote:
> > > > > > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov 
> > > > > > > > >> > wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > Could you explain when it should return POLLIN? When the 
> > > > > > > > >> > > whole
> > > > > > > > >process exits?
> > > > > > > > >> >
> > > > > > > > >> > It returns POLLIN when the task is dead or doesn't exist 
> > > > > > > > >> > anymore,
> > > > > > > > >or when it
> > > > > > > > >> > is in a zombie state and there's no other thread in the 
> > > > > > > > >> > thread
> > > > > > > > >group.
> > > > > > > > >>
> > > > > > > > >> IOW, when the whole thread group exits, so it can't be used 
> > > > > > > > >> to
> > > > > > > > >monitor sub-threads.
> > > > > > > > >>
> > > > > > > > >> just in case... speaking of this patch it doesn't modify
> > > > > > > > >proc_tid_base_operations,
> > > > > > > > >> so you can't poll("/proc/sub-thread-tid") anyway, but iiuc 
> > > > > > > > >> you are
> > > > > > > > >going to use
> > > > > > > > >> the anonymous file returned by CLONE_PIDFD ?
> > > > > > > > >
> > > > > > > > >I don't think procfs works that way. /proc/sub-thread-tid has
> > > > > > > > >proc_tgid_base_operations despite not being a thread group 
> > > > > > > > >leader.
> > > > > > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this 
> > > > > > > > >code can
> > > > > > > > >be hit trivially, and then the code will misbehave.
> > > > > > > > >
> > > > > > > > >@Joel: I think you'll have to either rewrite this to 
> > > > > > > > >explicitly bail
> > > > > > > > >out if you're dealing with a thread group leader, or make the 
> > > > > > > > >code
> > > > > > > > >work for threads, too.
> > > > > > > >
> > > > > > > > The latter case probably being preferred if this API is 
> > > > > > > > supposed to be
> > > > > > > > useable for thread management in userspace.
> > > > > > >
> > > > > > > At the moment, we are not planning to use this for sub-thread 
> > > > > > > management. I
> > > > > > > am reworking this patch to only work on clone(2) pidfds which 
> > > > > > > makes the above
> > > > > >
> > > > > > Indeed and agreed.
> > > > > >
> > > > > > > discussion about /proc a bit unnecessary I think. Per the latest 
> > > > > > > CLONE_PIDFD
> > > > > > > patches, CLONE_THREAD with pidfd is not supported.
> > > > > >
> > > > > > Yes. We have no one asking for it right now and we can easily add 
> > > > > > this
> > > > > > later.
> > > > > >
> > > > > > Admittedly I haven't gotten around to reviewing the patches here yet
> > > > > > completely. But one thing about using POLLIN. FreeBSD is using 
> > > > > > POLLHUP
> > > > > > on process exit which I think is nice as well. How about returning
> > > > > > POLLIN | POLLHUP on process exit?
> > > > > > We already do things like this. For example, when you proxy between
> > > > > > ttys. If the process that you're reading data from has exited and 
> > > > > > closed
> > > > > > it's end you still can't usually simply exit because it might have 
> > > > > > still
> > > > > > buffered data that you want to read.  The way one can deal with this
> > > > > > from  userspace is that you can observe a (POLLHUP | POLLIN) event 
> > > > > > and
> > > > > > you keep on reading until you only observe a POLLHUP without a 
> > > > > > POLLIN
> > > > > > event at which point you know you have read
> > > > > > all data.
> > > > > > I like the semantics for pidfds as well as it would indicate:
> > > > > > - POLLHUP -> process has exited
> > > > > > - POLLIN  -> information can be read
> > > > >
> > > > > Actually I think a bit different about this, in my opinion the pidfd 
> > > > > should
> > > > > always be readable (we would store the exit status somewhere in the 
> > > > > future
> > > > > which would be readable, even after task_struct is dead). So I was 
> > > > > thinking
> > > > > we always return EPOLLIN.  If process has not exited, then it blocks.
> > > >
> > > > ITYM that a pidfd polls as readable *once a task exits* and stays
> > > > readable forever. Before a task exit, a poll on a pidfd should *not*
> > > > yield POLLIN and reading that pidfd should *not* complete 

Re: [PATCH v20 00/28] Intel SGX1 support

2019-04-19 Thread Thomas Gleixner
On Fri, 19 Apr 2019, Jethro Beekman wrote:
> On 2019-04-19 14:31, Andy Lutomirski wrote:
> > I do think we need to follow LSM rules.  But my bigger point is that
> > there are policies that don’t allow JIT at all. I think we should
> > arrange the SGX API so it’s still usable when such a policy is in
> > effect.

> I don't think we need to arrange that right now. This patch set needs to
> be merged after more than 2 years of development. I'd like to avoid

We merge stuff when it is ready and not when someone declares that it needs
to be merged.

> introducing any more big changes. Let's just do what I described to make
> LSM not broken, which is a minimal change to the current approach. We
> can adjust the API later to support the use case you describe.

You are working around LSM nothing else and that's just not going to fly.

Thanks,

tglx


Re: [PATCH v20 00/28] Intel SGX1 support

2019-04-19 Thread Jethro Beekman
On 2019-04-19 14:31, Andy Lutomirski wrote:
> I do think we need to follow LSM rules.  But my bigger point is that there 
> are policies that don’t allow JIT at all. I think we should arrange the SGX 
> API so it’s still usable when such a policy is in effect.

I don't think we need to arrange that right now. This patch set needs to
be merged after more than 2 years of development. I'd like to avoid
introducing any more big changes. Let's just do what I described to make
LSM not broken, which is a minimal change to the current approach. We
can adjust the API later to support the use case you describe.

--
Jethro Beekman | Fortanix



Re: [PATCH v20 00/28] Intel SGX1 support

2019-04-19 Thread Thomas Gleixner
On Fri, 19 Apr 2019, Jethro Beekman wrote:
> On 2019-04-19 14:15, Andy Lutomirski wrote:
> > With plain mmap() + mprotect(), the LSM will prevent you from making
> > memory that *was* writable executable.  This is by design and SELinux
> > supports it.  I don’t remember the name of the associated SELinux
> > permission off the top of my head.
> >
> > If we start enforcing equivalent rules on SGX, then the current API
> > will simply not allow enclaves to be loaded — no matter how you slice
> > it, loading an enclave with the current API is indistinguishable from
> > making arbitrary data executable.
> >
> Yes this is exactly what I intended here: a very simple change that
> stops SGX from confusing LSM. Just by enforcing that everything that
> looks like a memory write (EADD, EAUG, EDBGWR, etc.) actually requires
> write permissions, reality and LSM should be on the same page.

And how so? You create writeable AND executable memory. That's a nono and
you can argue in circles, that's not going to change with any of your
proposed changes. Andy clearly made a proposal which solves it in a proper
way.

Thanks,

tglx

  1   2   3   4   5   6   >