Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends

2020-09-23 Thread Thomas Gleixner
On Wed, Sep 23 2020 at 17:12, Steven Rostedt wrote:
> On Wed, 23 Sep 2020 22:55:54 +0200
> Then scratch the idea of having anonymous local_lock() and just bring
> local_lock in directly? Then have a kmap local lock, which would only
> block those that need to do a kmap.

That's still going to end up in lock ordering nightmares and you lose
the ability to use kmap_local from arbitrary contexts which was again
one of the goals of this exercise.

Aside from that, you're imposing reentrancy protections on something which
does not need them in the first place.

> Now as for migration disabled nesting, at least now we would have
> groupings of this, and perhaps the theorists can handle that. I mean,
> how is this much different that having a bunch of tasks blocked on a
> mutex with the owner is pinned on a CPU?
>
> migrate_disable() is a BKL of pinning affinity.

No. That's just wrong. preempt disable is a concurrency control,
i.e. it protects against reentrancy on a given CPU. But it's a CPU-global
protection, which means that it's not protecting a specific code path.

Contrary to preempt disable, migrate disable is not protecting against
reentrancy on a given CPU. It's a temporary restriction to the scheduler
on placement.

The fact that disabling preemption implicitly disables migration does
not make them semantically equivalent.
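
A minimal sketch of the difference (illustrative only, not taken from this
series; migrate_disable()/migrate_enable() as provided by the scheduler/RT
code):

#include <linux/preempt.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(int, example_counter);

static void example(void)
{
        preempt_disable();      /* concurrency control: nothing else runs
                                 * on this CPU, so the non-atomic per-CPU
                                 * op below is safe against reentrancy */
        __this_cpu_inc(example_counter);
        preempt_enable();

        migrate_disable();      /* placement control only: the task stays
                                 * on this CPU but can still be preempted,
                                 * so there is no reentrancy protection */
        /* ... code which merely has to stay on the current CPU ... */
        migrate_enable();
}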

> If we only have local_lock() available (even on !RT), then it makes
> the blocking in groups. At least this way you could grep for all the
> different local_locks in the system and plug that into the algorithm
> for WCS, just like one would with a bunch of mutexes.

You cannot do that on RT at all where migrate disable is substituting
preempt disable in spin and rw locks. The result would be the same as
with a !RT kernel just with horribly bad performance.

That means the stacking problem has to be solved anyway.

So why on earth do you want to create yet another special duct tape case
for kmap_local() which proliferates inconsistency instead of aiming for
consistency across all preemption models?

Thanks,

tglx


Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference

2020-09-23 Thread Huang, Ying
Rafael Aquini  writes:

>> 
>> If there's a race, we should fix the race.  But the code path for
>> swapcache insertion is,
>> 
>> add_to_swap()
>>   get_swap_page() /* Return if fails to allocate */
>>   add_to_swap_cache()
>> SetPageSwapCache()
>> 
>> While the code path to split THP is,
>> 
>> split_huge_page_to_list()
>>   if PageSwapCache()
>> split_swap_cluster()
>> 
>> Both code paths are protected by the page lock.  So there should be some
>> other reasons to trigger the bug.
>
> As mentioned above, no they seem to not be protected (at least, not the
> same page, depending on the case). While add_to_swap() will assure a 
> page_lock on the compound head, split_huge_page_to_list() does not.
>

int split_huge_page_to_list(struct page *page, struct list_head *list)
{
struct page *head = compound_head(page);
struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
struct deferred_split *ds_queue = get_deferred_split_queue(head);
struct anon_vma *anon_vma = NULL;
struct address_space *mapping = NULL;
int count, mapcount, extra_pins, ret;
unsigned long flags;
pgoff_t end;

VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
VM_BUG_ON_PAGE(!PageLocked(head), head);

I found that there is a page lock check in split_huge_page_to_list().

Best Regards,
Huang, Ying


Re: [PATCH v13 2/2] Add PWM fan controller driver for LGM SoC

2020-09-23 Thread Uwe Kleine-König
Hello,

(hmm, Thierry already announced that he has taken this patch, so my review
is late.)

On Tue, Sep 15, 2020 at 04:23:37PM +0800, Rahul Tanwar wrote:
> Intel Lightning Mountain(LGM) SoC contains a PWM fan controller.
> This PWM controller does not have any other consumer, it is a
> dedicated PWM controller for fan attached to the system. Add
> driver for this PWM fan controller.
> 
> Signed-off-by: Rahul Tanwar 
> Reviewed-by: Andy Shevchenko 
> ---
>  drivers/pwm/Kconfig |  11 ++
>  drivers/pwm/Makefile|   1 +
>  drivers/pwm/pwm-intel-lgm.c | 246 
> 
>  3 files changed, 258 insertions(+)
>  create mode 100644 drivers/pwm/pwm-intel-lgm.c
> 
> diff --git a/drivers/pwm/Kconfig b/drivers/pwm/Kconfig
> index 7dbcf6973d33..4949c51fe90b 100644
> --- a/drivers/pwm/Kconfig
> +++ b/drivers/pwm/Kconfig
> @@ -232,6 +232,17 @@ config PWM_IMX_TPM
> To compile this driver as a module, choose M here: the module
> will be called pwm-imx-tpm.
>  
> +config PWM_INTEL_LGM
> + tristate "Intel LGM PWM support"
> + depends on HAS_IOMEM
> + depends on (OF && X86) || COMPILE_TEST
> + select REGMAP_MMIO
> + help
> +   Generic PWM fan controller driver for LGM SoC.
> +
> +   To compile this driver as a module, choose M here: the module
> +   will be called pwm-intel-lgm.
> +
>  config PWM_IQS620A
>   tristate "Azoteq IQS620A PWM support"
>   depends on MFD_IQS62X || COMPILE_TEST
> diff --git a/drivers/pwm/Makefile b/drivers/pwm/Makefile
> index 2c2ba0a03557..e9431b151694 100644
> --- a/drivers/pwm/Makefile
> +++ b/drivers/pwm/Makefile
> @@ -20,6 +20,7 @@ obj-$(CONFIG_PWM_IMG)   += pwm-img.o
>  obj-$(CONFIG_PWM_IMX1)   += pwm-imx1.o
>  obj-$(CONFIG_PWM_IMX27)  += pwm-imx27.o
>  obj-$(CONFIG_PWM_IMX_TPM)+= pwm-imx-tpm.o
> +obj-$(CONFIG_PWM_INTEL_LGM)  += pwm-intel-lgm.o
>  obj-$(CONFIG_PWM_IQS620A)+= pwm-iqs620a.o
>  obj-$(CONFIG_PWM_JZ4740) += pwm-jz4740.o
>  obj-$(CONFIG_PWM_LP3943) += pwm-lp3943.o
> diff --git a/drivers/pwm/pwm-intel-lgm.c b/drivers/pwm/pwm-intel-lgm.c
> new file mode 100644
> index ..ea3df75a5971
> --- /dev/null
> +++ b/drivers/pwm/pwm-intel-lgm.c
> @@ -0,0 +1,246 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2020 Intel Corporation.
> + *
> + * Limitations:
> + * - The hardware supports fixed period which is dependent on 2/3 or 4
> + *   wire fan mode.

The driver now hardcodes 2-wire mode. IMHO that is worth mentioning.

> +static void lgm_clk_disable(void *data)
> +{
> + struct lgm_pwm_chip *pc = data;
> +
> + clk_disable_unprepare(pc->clk);
> +}
> +
> +static int lgm_clk_enable(struct device *dev, struct lgm_pwm_chip *pc)
> +{
> + int ret;
> +
> + ret = clk_prepare_enable(pc->clk);
> + if (ret)
> + return ret;
> +
> + return devm_add_action_or_reset(dev, lgm_clk_disable, pc);
> +}

My first reflex here was to point out that lgm_clk_disable() isn't the
counterpart to lgm_clk_enable() and so lgm_clk_disable() needs adaptation.
On a second look this is correct, and so I think the function names are
wrong. The usual naming would be to use _release instead of _disable.
Having said that, the enable function could be named devm_clk_enable and
live in drivers/clk/clk-devres.c. (Or devm_clk_get_enabled()?)
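
For illustration, such a helper could look roughly like this (sketch only;
the helper and its name are made up here and not an existing API):

#include <linux/clk.h>
#include <linux/device.h>

static void devm_clk_release(void *clk)
{
        clk_disable_unprepare(clk);
}

static int devm_clk_prepare_enable(struct device *dev, struct clk *clk)
{
        int ret;

        ret = clk_prepare_enable(clk);
        if (ret)
                return ret;

        /* undo the enable automatically when the device goes away */
        return devm_add_action_or_reset(dev, devm_clk_release, clk);
}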

> +static void lgm_reset_control_assert(void *data)
> +{
> + struct lgm_pwm_chip *pc = data;
> +
> + reset_control_assert(pc->rst);
> +}
> +
> +static int lgm_reset_control_deassert(struct device *dev, struct 
> lgm_pwm_chip *pc)
> +{
> + int ret;
> +
> + ret = reset_control_deassert(pc->rst);
> + if (ret)
> + return ret;
> +
> + return devm_add_action_or_reset(dev, lgm_reset_control_assert, pc);
> +}

A similar comment applies here.

> +static int lgm_pwm_probe(struct platform_device *pdev)
> +{
> + struct device *dev = &pdev->dev;
> + struct lgm_pwm_chip *pc;
> + void __iomem *io_base;
> + int ret;
> +
> + pc = devm_kzalloc(dev, sizeof(*pc), GFP_KERNEL);
> + if (!pc)
> + return -ENOMEM;
> +
> + platform_set_drvdata(pdev, pc);
> +
> + io_base = devm_platform_ioremap_resource(pdev, 0);
> + if (IS_ERR(io_base))
> + return PTR_ERR(io_base);
> +
> + pc->regmap = devm_regmap_init_mmio(dev, io_base, 
> &lgm_pwm_regmap_config);
> + if (IS_ERR(pc->regmap))
> + return dev_err_probe(dev, PTR_ERR(pc->regmap),
> +  "failed to init register map\n");
> +
> + pc->clk = devm_clk_get(dev, NULL);
> + if (IS_ERR(pc->clk))
> + return dev_err_probe(dev, PTR_ERR(pc->clk), "failed to get 
> clock\n");
> +
> + ret = lgm_clk_enable(dev, pc);
> + if (ret) {
> + dev_err(dev, "failed to enable clock\n");

You used dev_err_probe four times for six error paths. I wonder why you
didn't use it here (and below for a failing pwmchip

[PATCH v4 1/2] Add UFFD_USER_MODE_ONLY

2020-09-23 Thread Lokesh Gidra
userfaultfd handles page faults from both user and kernel code.
Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes
the resulting userfaultfd object refuse to handle faults from kernel
mode, treating these faults as if SIGBUS were always raised, causing
the kernel code to fail with EFAULT.

A future patch adds a knob allowing administrators to give some
processes the ability to create userfaultfd file objects only if they
pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
will exploit userfaultfd's ability to delay kernel page faults to open
timing windows for future exploits.
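
For context, a userspace caller opting into the restriction would look
roughly like this (illustrative snippet, not part of the patch; error
handling trimmed):

#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/userfaultfd.h>

int main(void)
{
        /* Ask for a userfaultfd that handles user-mode faults only;
         * kernel-mode faults then behave as if SIGBUS was raised. */
        int uffd = syscall(__NR_userfaultfd,
                           O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);

        if (uffd < 0)
                return 1;

        /* ... UFFDIO_API handshake and UFFDIO_REGISTER as usual ... */
        return 0;
}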

Signed-off-by: Daniel Colascione 
Signed-off-by: Lokesh Gidra 
---
 fs/userfaultfd.c | 6 +-
 include/uapi/linux/userfaultfd.h | 9 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0e4a3837da52..3191434057f3 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -405,6 +405,9 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned 
long reason)
 
if (ctx->features & UFFD_FEATURE_SIGBUS)
goto out;
+   if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
+   ctx->flags & UFFD_USER_MODE_ONLY)
+   goto out;
 
/*
 * If it's already released don't get it. This avoids to loop
@@ -1975,10 +1978,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
BUG_ON(!current->mm);
 
/* Check the UFFD_* constants for consistency.  */
+   BUILD_BUG_ON(UFFD_USER_MODE_ONLY & UFFD_SHARED_FCNTL_FLAGS);
BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
 
-   if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
+   if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | UFFD_USER_MODE_ONLY))
return -EINVAL;
 
ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index e7e98bde221f..5f2d88212f7c 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -257,4 +257,13 @@ struct uffdio_writeprotect {
__u64 mode;
 };
 
+/*
+ * Flags for the userfaultfd(2) system call itself.
+ */
+
+/*
+ * Create a userfaultfd that can handle page faults only in user mode.
+ */
+#define UFFD_USER_MODE_ONLY 1
+
 #endif /* _LINUX_USERFAULTFD_H */
-- 
2.28.0.681.g6f77f65b4e-goog



[PATCH v4 0/2] Control over userfaultfd kernel-fault handling

2020-09-23 Thread Lokesh Gidra
This patch series is split from [1]. The other series enables SELinux
support for userfaultfd file descriptors so that its creation and
movement can be controlled.

It has been demonstrated on various occasions that suspending kernel
code execution for an arbitrary amount of time at any access to
userspace memory (copy_from_user()/copy_to_user()/...) can be exploited
to change the intended behavior of the kernel. For instance, handling
page faults in kernel-mode using userfaultfd has been exploited in [2, 3].
Likewise, FUSE, which is similar to userfaultfd in this respect, has been
exploited in [4, 5] for a similar outcome.

This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object. It then adds a 'user-mode only' option to
the unprivileged_userfaultfd sysctl knob to require unprivileged
callers to use this new flag.

The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to
enhance security vulnerabilities by lengthening the race window in
kernel code.

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] 
https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808

Changes since v3:

  - Modified the meaning of value '0' of unprivileged_userfaultfd
sysctl knob. Setting this knob to '0' now allows unprivileged users
to use userfaultfd, but can handle page faults in user-mode only.
  - The default value of unprivileged_userfaultfd sysctl knob is changed
to '0'.

Changes since v2:

  - Removed 'uffd_flags' and directly used 'UFFD_USER_MODE_ONLY' in
userfaultfd().

Changes since v1:

  - Added external references to the threats from allowing unprivileged
users to handle page faults from kernel-mode.
  - Removed the new sysctl knob restricting handling of page
faults from kernel-mode, and added an option for the same
in the existing 'unprivileged_userfaultfd' knob.

Lokesh Gidra (2):
  Add UFFD_USER_MODE_ONLY
  Add user-mode only option to unprivileged_userfaultfd sysctl knob

 Documentation/admin-guide/sysctl/vm.rst | 15 ++-
 fs/userfaultfd.c| 12 +---
 include/uapi/linux/userfaultfd.h|  9 +
 3 files changed, 28 insertions(+), 8 deletions(-)

-- 
2.28.0.681.g6f77f65b4e-goog



[PATCH v4 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob

2020-09-23 Thread Lokesh Gidra
With this change, when the knob is set to 0, it still allows unprivileged
users to call userfaultfd, like when it is set to 1, but with the
restriction that only page faults from user mode can be handled.
In this mode, an unprivileged user (without the CAP_SYS_PTRACE capability)
must pass UFFD_USER_MODE_ONLY to userfaultfd or the API will fail with
EPERM.

This enables administrators to reduce the likelihood that
an attacker with access to userfaultfd can delay faulting kernel
code to widen timing windows for other exploits.

The default value of this knob is changed to 0. This is required for
correct functioning of pipe mutex. However, this will fail postcopy
live migration, which will be unnoticeable to the VM guests. To avoid
this, set 'vm.userfault = 1' in /sys/sysctl.conf. For more details,
refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/20200904033438.gi9...@redhat.com/

Signed-off-by: Lokesh Gidra 
---
 Documentation/admin-guide/sysctl/vm.rst | 15 ++-
 fs/userfaultfd.c|  6 --
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst 
b/Documentation/admin-guide/sysctl/vm.rst
index 4b9d2e8e9142..4263d38c3c21 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -871,12 +871,17 @@ file-backed pages is less than the high watermark in a 
zone.
 unprivileged_userfaultfd
 
 
-This flag controls whether unprivileged users can use the userfaultfd
-system calls.  Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+This flag controls the mode in which unprivileged users can use the
+userfaultfd system calls. Set this to 0 to restrict unprivileged users
+to handle page faults in user mode only. In this case, users without
+SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
+succeed. Prohibiting use of userfaultfd for handling faults from kernel
+mode may make certain vulnerabilities more difficult to exploit.
 
-The default value is 1.
+Set this to 1 to allow unprivileged users to use the userfaultfd system
+calls without any restrictions.
+
+The default value is 0.
 
 
 user_reserve_kbytes
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 3191434057f3..3816c11a986a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -28,7 +28,7 @@
 #include 
 #include 
 
-int sysctl_unprivileged_userfaultfd __read_mostly = 1;
+int sysctl_unprivileged_userfaultfd __read_mostly;
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
 
@@ -1972,7 +1972,9 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
struct userfaultfd_ctx *ctx;
int fd;
 
-   if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
+   if (!sysctl_unprivileged_userfaultfd &&
+   (flags & UFFD_USER_MODE_ONLY) == 0 &&
+   !capable(CAP_SYS_PTRACE))
return -EPERM;
 
BUG_ON(!current->mm);
-- 
2.28.0.681.g6f77f65b4e-goog



[PATCH] clk/qcom: fix spelling typo

2020-09-23 Thread Wang Qing
Fix the comment typo: "compliment" -> "complement".

Signed-off-by: Wang Qing 
---
 drivers/clk/qcom/clk-alpha-pll.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/clk/qcom/clk-alpha-pll.c b/drivers/clk/qcom/clk-alpha-pll.c
index 26139ef..5644311
--- a/drivers/clk/qcom/clk-alpha-pll.c
+++ b/drivers/clk/qcom/clk-alpha-pll.c
@@ -609,7 +609,7 @@ static unsigned long
 alpha_huayra_pll_calc_rate(u64 prate, u32 l, u32 a)
 {
/*
-* a contains 16 bit alpha_val in two’s compliment number in the range
+* a contains 16 bit alpha_val in two’s complement number in the range
 * of [-0.5, 0.5).
 */
if (a >= BIT(PLL_HUAYRA_ALPHA_WIDTH - 1))
@@ -641,7 +641,7 @@ alpha_huayra_pll_round_rate(unsigned long rate, unsigned 
long prate,
quotient++;
 
/*
-* alpha_val should be in two’s compliment number in the range
+* alpha_val should be in two’s complement number in the range
 * of [-0.5, 0.5) so if quotient >= 0.5 then increment the l value
 * since alpha value will be subtracted in this case.
 */
@@ -666,7 +666,7 @@ alpha_pll_huayra_recalc_rate(struct clk_hw *hw, unsigned 
long parent_rate)
regmap_read(pll->clkr.regmap, PLL_ALPHA_VAL(pll), &alpha);
/*
 * Depending upon alpha_mode, it can be treated as M/N value or
-* as a two’s compliment number. When alpha_mode=1,
+* as a two’s complement number. When alpha_mode=1,
 * pll_alpha_val<15:8>=M and pll_apla_val<7:0>=N
 *
 *  Fout=FIN*(L+(M/N))
@@ -674,12 +674,12 @@ alpha_pll_huayra_recalc_rate(struct clk_hw *hw, unsigned 
long parent_rate)
 * M is a signed number (-128 to 127) and N is unsigned
 * (0 to 255). M/N has to be within +/-0.5.
 *
-* When alpha_mode=0, it is a two’s compliment number in the
+* When alpha_mode=0, it is a two’s complement number in the
 * range [-0.5, 0.5).
 *
 *  Fout=FIN*(L+(alpha_val)/2^16)
 *
-* where alpha_val is two’s compliment number.
+* where alpha_val is two’s complement number.
 */
if (!(ctl & PLL_ALPHA_MODE))
return alpha_huayra_pll_calc_rate(rate, l, alpha);
-- 
2.7.4



Re: [PATCH 00/10] rpmsg: Make RPMSG name service modular

2020-09-23 Thread Guennadi Liakhovetski
Hi Mathieu,

Sorry for the delayed response. After I sent my message I subscribed to
remoteproc, and it seems that during the transition some messages were
only delivered via the list and not directly to me, or something similar
happened.

On Tue, Sep 22, 2020 at 01:12:41PM -0600, Mathieu Poirier wrote:
> Good day Guennadi,
> 
> On Tue, 22 Sep 2020 at 02:09, Guennadi Liakhovetski
>  wrote:
> >
> > Hi Mathieu,
> >
> > Thanks for the patches. I'm trying to understand the concept of
> > this approach and I'm probably failing at that. It seems to me
> > that this patch set is making the NS announcement service to a
> > separate RPMsg device and I don't understand the reasoning for
> > doing this. As far as I understand namespace announcements
> > belong to RPMsg devices / channels, they create a dedicated
> > endpoint on them with a fixed pre-defined address. But they
> > don't form a separate RPMsg device. I think the current
> > virtio_rpmsg_bus.c has that correctly: for each rpmsg device /
> > channel multiple endpoints can be created, where the NS
> > service is one of them. It's just an endpoing of an rpmsg
> > device, not a complete separate device. Have I misunderstood
> > anything?
> 
> This patchset does not introduce any new features - the end result in
> terms of functionality is exactly the same.  It is also a carbon copy
> of the work introduced by Arnaud (hence reusing his patches), with the
> exception that the code is presented in a slightly different order to
> allow for a complete dissociation of RPMSG name service from the
> virtIO transport mechanic.
> 
> To make that happen rpmsg device specific byte conversion operations
> had to be introduced in struct rpmsg_device_ops and the explicit
> creation of an rpmsg_device associated with the name service (that
> wasn't needed when name service was welded to virtIO).  But
> associating a rpmsg_device to the name service doesn't change anything
> - RPMSG devices are created the same way when name service messages
> are received from the host or the remote processor.

Yes, the current rpmsg-virtio code does create *one* rpmsg device when 
an NS announcement arrives. Whereas with this patch set the first rpmsg 
device would be created to probe the NS service driver, and the next one 
would still be created, following the code borrowed from rpmsg-virtio, 
when an NS announcement arrives. And I don't see how those two devices 
make sense, sorry. I understand one device per channel, but two devices, 
of which one exists for a single endpoint only while other endpoints 
don't create devices of their own, don't seem very logical to me.
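
To illustrate what I mean by "just an endpoint": the announcements arrive
on a fixed, pre-defined address of an already existing rpmsg device,
roughly like the sketch below (rpmsg_ns_cb and RPMSG_NS_ADDR stand in for
the name-service callback and address; this is not code from either patch
set):

        struct rpmsg_channel_info ns_chinfo = {
                .src = RPMSG_NS_ADDR,
                .dst = RPMSG_ADDR_ANY,
        };
        struct rpmsg_endpoint *ns_ept;

        /* one more endpoint on an existing device, not a new device */
        ns_ept = rpmsg_create_ept(rpdev, rpmsg_ns_cb, NULL, ns_chinfo);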

Thanks
Guennadi

> To prove my theory I ran the rpmsg_client_sample.c and it just worked,
> no changes to client code needed.
> 
> Let's keep talking, it's the only way we'll get through this.
> 
> Mathieu
> 
> >
> > Thanks
> > Guennadi
> >
> > On Mon, Sep 21, 2020 at 06:09:50PM -0600, Mathieu Poirier wrote:
> > > Hi all,
> > >
> > > After looking at Guennadi[1] and Arnaud's patchsets[2] it became
> > > clear that we need to go back to a generic rpmsg_ns_msg structure
> > > if we wanted to make progress.  To do that some of the work from
> > > Arnaud had to be modified in a way that common name service
> > > functionality was transport agnostic.
> > >
> > > This patchset is based on Arnaud's work but also include a patch
> > > from Guennadi and some input from me.  It should serve as a
> > > foundation for the next revision of [1].
> > >
> > > Applies on rpmsg-next (4e3dda0bc603) and tested on stm32mp157. I
> > > did not test the modularisation.
> > >
> > > Comments and feedback would be greatly appreciated.
> > >
> > > Thanks,
> > > Mathieu
> > >
> > > [1]. 
> > > https://patchwork.kernel.org/project/linux-remoteproc/list/?series=346593
> > > [2]. 
> > > https://patchwork.kernel.org/project/linux-remoteproc/list/?series=338335
> > >
> > > Arnaud Pouliquen (5):
> > >   rpmsg: virtio: rename rpmsg_create_channel
> > >   rpmsg: core: Add channel creation internal API
> > >   rpmsg: virtio: Add rpmsg channel device ops
> > >   rpmsg: Turn name service into a stand alone driver
> > >   rpmsg: virtio: use rpmsg ns device for the ns announcement
> > >
> > > Guennadi Liakhovetski (1):
> > >   rpmsg: Move common structures and defines to headers
> > >
> > > Mathieu Poirier (4):
> > >   rpmsg: virtio: Move virtio RPMSG structures to private header
> > >   rpmsg: core: Add RPMSG byte conversion operations
> > >   rpmsg: virtio: Make endianness conversion virtIO specific
> > >   rpmsg: ns: Make Name service module transport agnostic
> > >
> > >  drivers/rpmsg/Kconfig|   9 +
> > >  drivers/rpmsg/Makefile   |   1 +
> > >  drivers/rpmsg/rpmsg_core.c   |  96 +++
> > >  drivers/rpmsg/rpmsg_internal.h   | 102 +++
> > >  drivers/rpmsg/rpmsg_ns.c | 108 
> > >  drivers/rpmsg/virtio_rpmsg_bus.c | 284 +--
> > >  include/linux/rpmsg_ns.h |  83 +

Re: [PATCH 1/2 v2] iio: event: use short-hand variable in iio_device_{un}register_eventset functions

2020-09-23 Thread Alexandru Ardelean
On Wed, Sep 23, 2020 at 11:13 PM Jonathan Cameron  wrote:
>
> On Mon, 21 Sep 2020 13:31:55 +0300
> Alexandru Ardelean  wrote:
>
> > With the recent 'iio_dev_opaque' variable name, these two functions are
> > looking a bit ugly.
> >
> > This change uses an 'ev_int' variable for the
> > iio_device_{un}register_eventset functions to make the code a little easier
> > to read.
> >
> > Signed-off-by: Alexandru Ardelean 
>
> Seems sensible.  Series applied to the togreg branch of iio.git and pushed 
> out as
> testing.  Not sure if this will make it into a final pull request for this
> cycle or not. Kind of depends what Linus says on Sunday about whether we are
> going to see an rc8.
>

No hurry from my side on when this goes in.
This is part of a longer series of things to do with the whole
multiple-IIO-buffers-per-IIO-device work.
I might need to take care [again] so that I don't block myself
with too many small/parallel series.

> Thanks,
>
> Jonathan
>
> > ---
> >
> > Changelog v1 -> v2:
> > * move 'iio_dev_opaque->event_interface = ev_int;' assigment right after
> >   allocation to avoid crash; 'iio_dev_opaque->event_interface' is accessed
> >   after init
> >
> >  drivers/iio/industrialio-event.c | 50 +++-
> >  1 file changed, 24 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/iio/industrialio-event.c 
> > b/drivers/iio/industrialio-event.c
> > index 2ab4d4c44427..a85919eb7c4a 100644
> > --- a/drivers/iio/industrialio-event.c
> > +++ b/drivers/iio/industrialio-event.c
> > @@ -477,6 +477,7 @@ static const char *iio_event_group_name = "events";
> >  int iio_device_register_eventset(struct iio_dev *indio_dev)
> >  {
> >   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
> > + struct iio_event_interface *ev_int;
> >   struct iio_dev_attr *p;
> >   int ret = 0, attrcount_orig = 0, attrcount, attrn;
> >   struct attribute **attr;
> > @@ -485,14 +486,15 @@ int iio_device_register_eventset(struct iio_dev 
> > *indio_dev)
> > iio_check_for_dynamic_events(indio_dev)))
> >   return 0;
> >
> > - iio_dev_opaque->event_interface =
> > - kzalloc(sizeof(struct iio_event_interface), GFP_KERNEL);
> > - if (iio_dev_opaque->event_interface == NULL)
> > + ev_int = kzalloc(sizeof(struct iio_event_interface), GFP_KERNEL);
> > + if (ev_int == NULL)
> >   return -ENOMEM;
> >
> > - INIT_LIST_HEAD(&iio_dev_opaque->event_interface->dev_attr_list);
> > + iio_dev_opaque->event_interface = ev_int;
> > +
> > + INIT_LIST_HEAD(&ev_int->dev_attr_list);
> >
> > - iio_setup_ev_int(iio_dev_opaque->event_interface);
> > + iio_setup_ev_int(ev_int);
> >   if (indio_dev->info->event_attrs != NULL) {
> >   attr = indio_dev->info->event_attrs->attrs;
> >   while (*attr++ != NULL)
> > @@ -506,34 +508,29 @@ int iio_device_register_eventset(struct iio_dev 
> > *indio_dev)
> >   attrcount += ret;
> >   }
> >
> > - iio_dev_opaque->event_interface->group.name = iio_event_group_name;
> > - iio_dev_opaque->event_interface->group.attrs = kcalloc(attrcount + 1,
> > -   
> > sizeof(iio_dev_opaque->event_interface->group.attrs[0]),
> > -   GFP_KERNEL);
> > - if (iio_dev_opaque->event_interface->group.attrs == NULL) {
> > + ev_int->group.name = iio_event_group_name;
> > + ev_int->group.attrs = kcalloc(attrcount + 1,
> > +   sizeof(ev_int->group.attrs[0]),
> > +   GFP_KERNEL);
> > + if (ev_int->group.attrs == NULL) {
> >   ret = -ENOMEM;
> >   goto error_free_setup_event_lines;
> >   }
> >   if (indio_dev->info->event_attrs)
> > - memcpy(iio_dev_opaque->event_interface->group.attrs,
> > + memcpy(ev_int->group.attrs,
> >  indio_dev->info->event_attrs->attrs,
> > -sizeof(iio_dev_opaque->event_interface->group.attrs[0])
> > -*attrcount_orig);
> > +sizeof(ev_int->group.attrs[0]) * attrcount_orig);
> >   attrn = attrcount_orig;
> >   /* Add all elements from the list. */
> > - list_for_each_entry(p,
> > - &iio_dev_opaque->event_interface->dev_attr_list,
> > - l)
> > - iio_dev_opaque->event_interface->group.attrs[attrn++] =
> > - &p->dev_attr.attr;
> > - indio_dev->groups[indio_dev->groupcounter++] =
> > - &iio_dev_opaque->event_interface->group;
> > + list_for_each_entry(p, &ev_int->dev_attr_list, l)
> > + ev_int->group.attrs[attrn++] = &p->dev_attr.attr;
> > + indio_dev->groups[indio_dev->groupcounter++] = &ev_int->group;
> >
> >   return 0;
> >
> >  error_free_setup_event_lines:
> > - 
> > iio_free_chan_devattr_

[PATCH 11/13] bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag

2020-09-23 Thread Christoph Hellwig
The BDI_CAP_STABLE_WRITES is one of the few bits of information in the
backing_dev_info shared between the block drivers and the writeback code.
To help untangling the dependency replace it with a queue flag and a
superblock flag derived from it.  This also helps with the case of e.g.
a file system requiring stable writes due to its own checksumming, but
not forcing it on other users of the block device like the swap code.

One downside is that we can't support the stable_pages_required bdi
attribute in sysfs anymore.  It is replaced with a queue attribute which
is also writable for easier testing.
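
With this in place a block driver asks for stable pages on its queue,
while a file system that needs them for its own checksumming marks its
superblock, roughly (sketch; see the individual hunks for the exact
changes and flag names):

        /* block driver side */
        blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q);

        /* file system side, using the new superblock flag */
        sb->s_iflags |= SB_I_STABLE_WRITES;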

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Reviewed-by: Johannes Thumshirn 
---
 block/blk-integrity.c |  4 ++--
 block/blk-mq-debugfs.c|  1 +
 block/blk-sysfs.c |  3 +++
 drivers/block/rbd.c   |  2 +-
 drivers/block/zram/zram_drv.c |  2 +-
 drivers/md/dm-table.c |  6 +++---
 drivers/md/raid5.c|  8 
 drivers/mmc/core/queue.c  |  3 +--
 drivers/nvme/host/core.c  |  3 +--
 drivers/nvme/host/multipath.c | 10 +++---
 drivers/scsi/iscsi_tcp.c  |  4 ++--
 fs/super.c|  2 ++
 include/linux/backing-dev.h   |  6 --
 include/linux/blkdev.h|  3 +++
 include/linux/fs.h|  1 +
 mm/backing-dev.c  |  7 +++
 mm/page-writeback.c   |  2 +-
 mm/swapfile.c |  2 +-
 18 files changed, 33 insertions(+), 36 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index c03705cbb9c9f2..2b36a8f9b81390 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -408,7 +408,7 @@ void blk_integrity_register(struct gendisk *disk, struct 
blk_integrity *template
bi->tuple_size = template->tuple_size;
bi->tag_size = template->tag_size;
 
-   disk->queue->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
+   blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, disk->queue);
 
 #ifdef CONFIG_BLK_INLINE_ENCRYPTION
if (disk->queue->ksm) {
@@ -428,7 +428,7 @@ EXPORT_SYMBOL(blk_integrity_register);
  */
 void blk_integrity_unregister(struct gendisk *disk)
 {
-   disk->queue->backing_dev_info->capabilities &= ~BDI_CAP_STABLE_WRITES;
+   blk_queue_flag_clear(QUEUE_FLAG_STABLE_WRITES, disk->queue);
memset(&disk->queue->integrity, 0, sizeof(struct blk_integrity));
 }
 EXPORT_SYMBOL(blk_integrity_unregister);
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 645b7f800cb827..3094542e12ae0f 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -116,6 +116,7 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(SAME_FORCE),
QUEUE_FLAG_NAME(DEAD),
QUEUE_FLAG_NAME(INIT_DONE),
+   QUEUE_FLAG_NAME(STABLE_WRITES),
QUEUE_FLAG_NAME(POLL),
QUEUE_FLAG_NAME(WC),
QUEUE_FLAG_NAME(FUA),
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 869ed21a9edcab..76b54c7750b07e 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -287,6 +287,7 @@ queue_##name##_store(struct request_queue *q, const char 
*page, size_t count) \
 QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
 QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
 QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
+QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
 static ssize_t queue_zoned_show(struct request_queue *q, char *page)
@@ -613,6 +614,7 @@ static struct queue_sysfs_entry queue_hw_sector_size_entry 
= {
 QUEUE_RW_ENTRY(queue_nonrot, "rotational");
 QUEUE_RW_ENTRY(queue_iostats, "iostats");
 QUEUE_RW_ENTRY(queue_random, "add_random");
+QUEUE_RW_ENTRY(queue_stable_writes, "stable_writes");
 
 static struct attribute *queue_attrs[] = {
&queue_requests_entry.attr,
@@ -645,6 +647,7 @@ static struct attribute *queue_attrs[] = {
&queue_nomerges_entry.attr,
&queue_rq_affinity_entry.attr,
&queue_iostats_entry.attr,
+   &queue_stable_writes_entry.attr,
&queue_random_entry.attr,
&queue_poll_entry.attr,
&queue_wc_entry.attr,
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 5d3923c0997ce0..cf5b016358cdab 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -5022,7 +5022,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
}
 
if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
-   q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
+   blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q);
 
/*
 * disk_release() expects a queue ref from add_disk() and will
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index e21ca844d7c291..bff3d4021c18e1 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1955,7 +1955,7 @@ static int zram_add(void)
if (ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE)
blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX

bdi cleanups v7

2020-09-23 Thread Christoph Hellwig
Hi Jens,

this series contains a bunch of different BDI cleanups.  The biggest item
is to isolate block drivers from the BDI in preparation for changing the
lifetime of the block device BDI in a follow-up series.

Changes since v6:
 - add a new blk_queue_update_readahead helper and use it in stacking
   drivers
 - improve another commit log

Changes since v5:
 - improve a commit message
 - improve the stable_writes deprecation printk
 - drop "drbd: remove RB_CONGESTED_REMOTE"
 - drop a few hunks that add a local variable in a otherwise unchanged
   file due to changes in the previous revisions
 - keep updating ->io_pages in queue_max_sectors_store
 - set an optimal I/O size in aoe
 - inherit the optimal I/O size in bcache

Changes since v4:
 - add a back a prematurely removed assignment in dm-table.c
 - pick up a few reviews from Johannes that got lost

Changes since v3:
 - rebased on the lasted block tree, which has some of the prep
   changes merged
 - extend the ->ra_pages changes to ->io_pages
 - move initializing ->ra_pages and ->io_pages for block devices to
   blk_register_queue

Changes since v2:
 - fix a rw_page return value check
 - fix up various changelogs

Changes since v1:
 - rebased to the for-5.9/block-merge branch
 - explicitly set the readahead to 0 for ubifs, vboxsf and mtd
 - split the zram block_device operations
 - let rw_page users fall back to bios in swap_readpage


Diffstat:


[PATCH 12/13] bdi: invert BDI_CAP_NO_ACCT_WB

2020-09-23 Thread Christoph Hellwig
Replace BDI_CAP_NO_ACCT_WB with a positive BDI_CAP_WRITEBACK_ACCT to
make the checks more obvious.  Also remove the pointless
bdi_cap_account_writeback wrapper that just obfuscates the check.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Reviewed-by: Johannes Thumshirn 
---
 fs/fuse/inode.c |  3 ++-
 include/linux/backing-dev.h | 13 +++--
 mm/backing-dev.c|  1 +
 mm/page-writeback.c |  4 ++--
 4 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 17b00670fb539e..581329203d6860 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1050,7 +1050,8 @@ static int fuse_bdi_init(struct fuse_conn *fc, struct 
super_block *sb)
return err;
 
/* fuse does it's own writeback accounting */
-   sb->s_bdi->capabilities = BDI_CAP_NO_ACCT_WB | BDI_CAP_STRICTLIMIT;
+   sb->s_bdi->capabilities &= ~BDI_CAP_WRITEBACK_ACCT;
+   sb->s_bdi->capabilities |= BDI_CAP_STRICTLIMIT;
 
/*
 * For a single fuse filesystem use max 1% of dirty +
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 5da4ea3dd0cc5c..b217344a2c63be 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -120,17 +120,17 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, 
unsigned int max_ratio);
  *
  * BDI_CAP_NO_ACCT_DIRTY:  Dirty pages shouldn't contribute to accounting
  * BDI_CAP_NO_WRITEBACK:   Don't write pages back
- * BDI_CAP_NO_ACCT_WB: Don't automatically account writeback pages
+ * BDI_CAP_WRITEBACK_ACCT: Automatically account writeback pages
  * BDI_CAP_STRICTLIMIT:Keep number of dirty pages below bdi threshold.
  */
 #define BDI_CAP_NO_ACCT_DIRTY  0x0001
 #define BDI_CAP_NO_WRITEBACK   0x0002
-#define BDI_CAP_NO_ACCT_WB 0x0004
+#define BDI_CAP_WRITEBACK_ACCT 0x0004
 #define BDI_CAP_STRICTLIMIT0x0010
 #define BDI_CAP_CGROUP_WRITEBACK 0x0020
 
 #define BDI_CAP_NO_ACCT_AND_WRITEBACK \
-   (BDI_CAP_NO_WRITEBACK | BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_ACCT_WB)
+   (BDI_CAP_NO_WRITEBACK | BDI_CAP_NO_ACCT_DIRTY)
 
 extern struct backing_dev_info noop_backing_dev_info;
 
@@ -179,13 +179,6 @@ static inline bool bdi_cap_account_dirty(struct 
backing_dev_info *bdi)
return !(bdi->capabilities & BDI_CAP_NO_ACCT_DIRTY);
 }
 
-static inline bool bdi_cap_account_writeback(struct backing_dev_info *bdi)
-{
-   /* Paranoia: BDI_CAP_NO_WRITEBACK implies BDI_CAP_NO_ACCT_WB */
-   return !(bdi->capabilities & (BDI_CAP_NO_ACCT_WB |
- BDI_CAP_NO_WRITEBACK));
-}
-
 static inline bool mapping_cap_writeback_dirty(struct address_space *mapping)
 {
return bdi_cap_writeback_dirty(inode_to_bdi(mapping->host));
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 8e3802bf03a968..df18f0088dd3f5 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -745,6 +745,7 @@ struct backing_dev_info *bdi_alloc(int node_id)
kfree(bdi);
return NULL;
}
+   bdi->capabilities = BDI_CAP_WRITEBACK_ACCT;
bdi->ra_pages = VM_READAHEAD_PAGES;
bdi->io_pages = VM_READAHEAD_PAGES;
return bdi;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index e9c36521461aaa..0139f9622a92da 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2738,7 +2738,7 @@ int test_clear_page_writeback(struct page *page)
if (ret) {
__xa_clear_mark(&mapping->i_pages, page_index(page),
PAGECACHE_TAG_WRITEBACK);
-   if (bdi_cap_account_writeback(bdi)) {
+   if (bdi->capabilities & BDI_CAP_WRITEBACK_ACCT) {
struct bdi_writeback *wb = inode_to_wb(inode);
 
dec_wb_stat(wb, WB_WRITEBACK);
@@ -2791,7 +2791,7 @@ int __test_set_page_writeback(struct page *page, bool 
keep_write)
   PAGECACHE_TAG_WRITEBACK);
 
xas_set_mark(&xas, PAGECACHE_TAG_WRITEBACK);
-   if (bdi_cap_account_writeback(bdi))
+   if (bdi->capabilities & BDI_CAP_WRITEBACK_ACCT)
inc_wb_stat(inode_to_wb(inode), WB_WRITEBACK);
 
/*
-- 
2.28.0



[PATCH 10/13] mm: use SWP_SYNCHRONOUS_IO more intelligently

2020-09-23 Thread Christoph Hellwig
There is no point in trying to call bdev_read_page if SWP_SYNCHRONOUS_IO
is not set, as the device won't support it.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Reviewed-by: Johannes Thumshirn 
---
 mm/page_io.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index e485a6e8a6cddb..b199b87e0aa92b 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -403,15 +403,17 @@ int swap_readpage(struct page *page, bool synchronous)
goto out;
}
 
-   ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
-   if (!ret) {
-   if (trylock_page(page)) {
-   swap_slot_free_notify(page);
-   unlock_page(page);
-   }
+   if (sis->flags & SWP_SYNCHRONOUS_IO) {
+   ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
+   if (!ret) {
+   if (trylock_page(page)) {
+   swap_slot_free_notify(page);
+   unlock_page(page);
+   }
 
-   count_vm_event(PSWPIN);
-   goto out;
+   count_vm_event(PSWPIN);
+   goto out;
+   }
}
 
ret = 0;
-- 
2.28.0



[PATCH 13/13] bdi: replace BDI_CAP_NO_{WRITEBACK,ACCT_DIRTY} with a single flag

2020-09-23 Thread Christoph Hellwig
Replace the two negative flags that are always used together with a
single positive flag that indicates the writeback capability instead
of two related non-capabilities.  Also remove the pointless wrappers
that just check the flag.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Reviewed-by: Johannes Thumshirn 
---
 fs/9p/vfs_file.c|  2 +-
 fs/fs-writeback.c   |  7 +++---
 include/linux/backing-dev.h | 48 -
 mm/backing-dev.c|  6 ++---
 mm/filemap.c|  4 ++--
 mm/memcontrol.c |  2 +-
 mm/memory-failure.c |  2 +-
 mm/migrate.c|  2 +-
 mm/mmap.c   |  2 +-
 mm/page-writeback.c | 12 +-
 10 files changed, 29 insertions(+), 58 deletions(-)

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 3576123d82990e..6ecf863bfa2f4b 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -625,7 +625,7 @@ static void v9fs_mmap_vm_close(struct vm_area_struct *vma)
 
inode = file_inode(vma->vm_file);
 
-   if (!mapping_cap_writeback_dirty(inode->i_mapping))
+   if (!mapping_can_writeback(inode->i_mapping))
wbc.nr_to_write = 0;
 
might_sleep();
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 149227160ff0b0..d4f84a2fe0878e 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2321,7 +2321,7 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 
wb = locked_inode_to_wb_and_lock_list(inode);
 
-   WARN(bdi_cap_writeback_dirty(wb->bdi) &&
+   WARN((wb->bdi->capabilities & BDI_CAP_WRITEBACK) &&
 !test_bit(WB_registered, &wb->state),
 "bdi-%s not registered\n", bdi_dev_name(wb->bdi));
 
@@ -2346,7 +2346,8 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 * to make sure background write-back happens
 * later.
 */
-   if (bdi_cap_writeback_dirty(wb->bdi) && wakeup_bdi)
+   if (wakeup_bdi &&
+   (wb->bdi->capabilities & BDI_CAP_WRITEBACK))
wb_wakeup_delayed(wb);
return;
}
@@ -2581,7 +2582,7 @@ int write_inode_now(struct inode *inode, int sync)
.range_end = LLONG_MAX,
};
 
-   if (!mapping_cap_writeback_dirty(inode->i_mapping))
+   if (!mapping_can_writeback(inode->i_mapping))
wbc.nr_to_write = 0;
 
might_sleep();
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index b217344a2c63be..44df4fcef65c1e 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -110,27 +110,14 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, 
unsigned int max_ratio);
 /*
  * Flags in backing_dev_info::capability
  *
- * The first three flags control whether dirty pages will contribute to the
- * VM's accounting and whether writepages() should be called for dirty pages
- * (something that would not, for example, be appropriate for ramfs)
- *
- * WARNING: these flags are closely related and should not normally be
- * used separately.  The BDI_CAP_NO_ACCT_AND_WRITEBACK combines these
- * three flags into a single convenience macro.
- *
- * BDI_CAP_NO_ACCT_DIRTY:  Dirty pages shouldn't contribute to accounting
- * BDI_CAP_NO_WRITEBACK:   Don't write pages back
- * BDI_CAP_WRITEBACK_ACCT: Automatically account writeback pages
- * BDI_CAP_STRICTLIMIT:Keep number of dirty pages below bdi threshold.
+ * BDI_CAP_WRITEBACK:  Supports dirty page writeback, and dirty pages
+ * should contribute to accounting
+ * BDI_CAP_WRITEBACK_ACCT: Automatically account writeback pages
+ * BDI_CAP_STRICTLIMIT:Keep number of dirty pages below bdi 
threshold
  */
-#define BDI_CAP_NO_ACCT_DIRTY  0x0001
-#define BDI_CAP_NO_WRITEBACK   0x0002
-#define BDI_CAP_WRITEBACK_ACCT 0x0004
-#define BDI_CAP_STRICTLIMIT0x0010
-#define BDI_CAP_CGROUP_WRITEBACK 0x0020
-
-#define BDI_CAP_NO_ACCT_AND_WRITEBACK \
-   (BDI_CAP_NO_WRITEBACK | BDI_CAP_NO_ACCT_DIRTY)
+#define BDI_CAP_WRITEBACK  (1 << 0)
+#define BDI_CAP_WRITEBACK_ACCT (1 << 1)
+#define BDI_CAP_STRICTLIMIT(1 << 2)
 
 extern struct backing_dev_info noop_backing_dev_info;
 
@@ -169,24 +156,9 @@ static inline int wb_congested(struct bdi_writeback *wb, 
int cong_bits)
 long congestion_wait(int sync, long timeout);
 long wait_iff_congested(int sync, long timeout);
 
-static inline bool bdi_cap_writeback_dirty(struct backing_dev_info *bdi)
-{
-   return !(bdi->capabilities & BDI_CAP_NO_WRITEBACK);
-}
-
-static inline bool bdi_cap_account_dirty(struct backing_dev_info *bdi)
-{
-   return !(bdi->capabilities & BDI_CAP_NO_ACCT_DIRTY);
-}
-
-static inline bool mapping_cap_writ

[PATCH 05/13] bdi: initialize ->ra_pages and ->io_pages in bdi_init

2020-09-23 Thread Christoph Hellwig
Set up a readahead size by default, as very few users have a good
reason to change it.  This means coda, ecryptfs, and orangefs now
set up the values they were previously missing, while ubifs,
mtd and vboxsf manually set it to 0 to avoid readahead.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Acked-by: David Sterba  [btrfs]
Acked-by: Richard Weinberger  [ubifs, mtd]
---
 block/blk-core.c  | 2 --
 drivers/mtd/mtdcore.c | 2 ++
 fs/9p/vfs_super.c | 6 --
 fs/afs/super.c| 1 -
 fs/btrfs/disk-io.c| 1 -
 fs/fuse/inode.c   | 1 -
 fs/nfs/super.c| 9 +
 fs/ubifs/super.c  | 2 ++
 fs/vboxsf/super.c | 2 ++
 mm/backing-dev.c  | 2 ++
 10 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ca3f0f00c9435f..865d39e5be2b28 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -538,8 +538,6 @@ struct request_queue *blk_alloc_queue(int node_id)
if (!q->stats)
goto fail_stats;
 
-   q->backing_dev_info->ra_pages = VM_READAHEAD_PAGES;
-   q->backing_dev_info->io_pages = VM_READAHEAD_PAGES;
q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
q->node = node_id;
 
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 7d930569a7dfb7..b5e5d3140f578e 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -2196,6 +2196,8 @@ static struct backing_dev_info * __init mtd_bdi_init(char 
*name)
bdi = bdi_alloc(NUMA_NO_NODE);
if (!bdi)
return ERR_PTR(-ENOMEM);
+   bdi->ra_pages = 0;
+   bdi->io_pages = 0;
 
/*
 * We put '-0' suffix to the name to get the same name format as we
diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 74df32be4c6a52..e34fa20acf612e 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -80,8 +80,10 @@ v9fs_fill_super(struct super_block *sb, struct 
v9fs_session_info *v9ses,
if (ret)
return ret;
 
-   if (v9ses->cache)
-   sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
+   if (!v9ses->cache) {
+   sb->s_bdi->ra_pages = 0;
+   sb->s_bdi->io_pages = 0;
+   }
 
sb->s_flags |= SB_ACTIVE | SB_DIRSYNC;
if (!v9ses->cache)
diff --git a/fs/afs/super.c b/fs/afs/super.c
index b552357b1d1379..3a40ee752c1e3f 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -456,7 +456,6 @@ static int afs_fill_super(struct super_block *sb, struct 
afs_fs_context *ctx)
ret = super_setup_bdi(sb);
if (ret)
return ret;
-   sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
 
/* allocate the root inode and dentry */
if (as->dyn_root) {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f6bba7eb1fa171..047934cea25efa 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3092,7 +3092,6 @@ int __cold open_ctree(struct super_block *sb, struct 
btrfs_fs_devices *fs_device
}
 
sb->s_bdi->capabilities |= BDI_CAP_CGROUP_WRITEBACK;
-   sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super);
sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index bba747520e9b08..17b00670fb539e 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1049,7 +1049,6 @@ static int fuse_bdi_init(struct fuse_conn *fc, struct 
super_block *sb)
if (err)
return err;
 
-   sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
/* fuse does it's own writeback accounting */
sb->s_bdi->capabilities = BDI_CAP_NO_ACCT_WB | BDI_CAP_STRICTLIMIT;
 
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 7a70287f21a2c1..f943e37853fa25 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -1200,13 +1200,6 @@ static void nfs_get_cache_cookie(struct super_block *sb,
 }
 #endif
 
-static void nfs_set_readahead(struct backing_dev_info *bdi,
- unsigned long iomax_pages)
-{
-   bdi->ra_pages = VM_READAHEAD_PAGES;
-   bdi->io_pages = iomax_pages;
-}
-
 int nfs_get_tree_common(struct fs_context *fc)
 {
struct nfs_fs_context *ctx = nfs_fc2context(fc);
@@ -1251,7 +1244,7 @@ int nfs_get_tree_common(struct fs_context *fc)
 MINOR(server->s_dev));
if (error)
goto error_splat_super;
-   nfs_set_readahead(s->s_bdi, server->rpages);
+   s->s_bdi->io_pages = server->rpages;
server->super = s;
}
 
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index a2420c900275a8..fbddb2a1c03f5e 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -2177,6 +2177,8 @@ static int ubifs_fill_super(struct super_block *sb, void 
*data, int silent)
   c->vi.vol_id);
if (err)
goto out_close;
+   sb->s_bdi->ra_pages = 0;
+

[PATCH 04/13] aoe: set an optimal I/O size

2020-09-23 Thread Christoph Hellwig
aoe forces a larger readahead size, but any reason to do larger I/O
is not limited to readahead.  Also set the optimal I/O size, and
remove the local constants in favor of just using SZ_2M.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
---
 drivers/block/aoe/aoeblk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 5ca7216e9e01f3..d8cfc233e64b93 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -347,7 +347,6 @@ aoeblk_gdalloc(void *vp)
mempool_t *mp;
struct request_queue *q;
struct blk_mq_tag_set *set;
-   enum { KB = 1024, MB = KB * KB, READ_AHEAD = 2 * MB, };
ulong flags;
int late = 0;
int err;
@@ -407,7 +406,8 @@ aoeblk_gdalloc(void *vp)
WARN_ON(d->gd);
WARN_ON(d->flags & DEVFL_UP);
blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
-   q->backing_dev_info->ra_pages = READ_AHEAD / PAGE_SIZE;
+   q->backing_dev_info->ra_pages = SZ_2M / PAGE_SIZE;
+   blk_queue_io_opt(q, SZ_2M);
d->bufpool = mp;
d->blkq = gd->queue = q;
q->queuedata = d;
-- 
2.28.0



[PATCH 09/13] bdi: remove BDI_CAP_SYNCHRONOUS_IO

2020-09-23 Thread Christoph Hellwig
BDI_CAP_SYNCHRONOUS_IO is only checked in the swap code, and used to
decide if ->rw_page can be used on a block device.  Just check for
the method instead.  The only complication is that zram needs a second
set of block_device_operations as it can switch between modes that
actually support ->rw_page and those that don't.
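
The swap side then boils down to a check along these lines (sketch of the
idea; see the mm/swapfile.c hunk for the exact change):

        /* the device counts as synchronous iff the driver has ->rw_page */
        if (p->bdev && p->bdev->bd_disk->fops->rw_page)
                p->flags |= SWP_SYNCHRONOUS_IO;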

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Reviewed-by: Johannes Thumshirn 
---
 drivers/block/brd.c   |  1 -
 drivers/block/zram/zram_drv.c | 19 +--
 drivers/nvdimm/btt.c  |  2 --
 drivers/nvdimm/pmem.c |  1 -
 include/linux/backing-dev.h   |  9 -
 mm/swapfile.c |  2 +-
 6 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 2723a70eb85593..cc49a921339f77 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -403,7 +403,6 @@ static struct brd_device *brd_alloc(int i)
disk->flags = GENHD_FL_EXT_DEVT;
sprintf(disk->disk_name, "ram%d", i);
set_capacity(disk, rd_size * 2);
-   brd->brd_queue->backing_dev_info->capabilities |= 
BDI_CAP_SYNCHRONOUS_IO;
 
/* Tell the block layer that this is not a rotational device */
blk_queue_flag_set(QUEUE_FLAG_NONROT, brd->brd_queue);
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 91ccfe444525b4..e21ca844d7c291 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -52,6 +52,9 @@ static unsigned int num_devices = 1;
  */
 static size_t huge_class_size;
 
+static const struct block_device_operations zram_devops;
+static const struct block_device_operations zram_wb_devops;
+
 static void zram_free_page(struct zram *zram, size_t index);
 static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
u32 index, int offset, struct bio *bio);
@@ -408,8 +411,7 @@ static void reset_bdev(struct zram *zram)
zram->backing_dev = NULL;
zram->old_block_size = 0;
zram->bdev = NULL;
-   zram->disk->queue->backing_dev_info->capabilities |=
-   BDI_CAP_SYNCHRONOUS_IO;
+   zram->disk->fops = &zram_devops;
kvfree(zram->bitmap);
zram->bitmap = NULL;
 }
@@ -529,8 +531,7 @@ static ssize_t backing_dev_store(struct device *dev,
 * freely but in fact, IO is going on so finally could cause
 * use-after-free when the IO is really done.
 */
-   zram->disk->queue->backing_dev_info->capabilities &=
-   ~BDI_CAP_SYNCHRONOUS_IO;
+   zram->disk->fops = &zram_wb_devops;
up_write(&zram->init_lock);
 
pr_info("setup backing device %s\n", file_name);
@@ -1820,6 +1821,13 @@ static const struct block_device_operations zram_devops 
= {
.owner = THIS_MODULE
 };
 
+static const struct block_device_operations zram_wb_devops = {
+   .open = zram_open,
+   .submit_bio = zram_submit_bio,
+   .swap_slot_free_notify = zram_slot_free_notify,
+   .owner = THIS_MODULE
+};
+
 static DEVICE_ATTR_WO(compact);
 static DEVICE_ATTR_RW(disksize);
 static DEVICE_ATTR_RO(initstate);
@@ -1947,8 +1955,7 @@ static int zram_add(void)
if (ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE)
blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);
 
-   zram->disk->queue->backing_dev_info->capabilities |=
-   (BDI_CAP_STABLE_WRITES | BDI_CAP_SYNCHRONOUS_IO);
+   zram->disk->queue->backing_dev_info->capabilities |= 
BDI_CAP_STABLE_WRITES;
device_add_disk(NULL, zram->disk, zram_disk_attr_groups);
 
strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 0d710140bf93be..12ff6f8784ac11 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1537,8 +1537,6 @@ static int btt_blk_init(struct btt *btt)
btt->btt_disk->private_data = btt;
btt->btt_disk->queue = btt->btt_queue;
btt->btt_disk->flags = GENHD_FL_EXT_DEVT;
-   btt->btt_disk->queue->backing_dev_info->capabilities |=
-   BDI_CAP_SYNCHRONOUS_IO;
 
blk_queue_logical_block_size(btt->btt_queue, btt->sector_size);
blk_queue_max_hw_sectors(btt->btt_queue, UINT_MAX);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 140cf3b9000c60..1711fdfd8d2816 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -475,7 +475,6 @@ static int pmem_attach_disk(struct device *dev,
disk->queue = q;
disk->flags = GENHD_FL_EXT_DEVT;
disk->private_data  = pmem;
-   disk->queue->backing_dev_info->capabilities |= BDI_CAP_SYNCHRONOUS_IO;
nvdimm_namespace_disk_name(ndns, disk->disk_name);
set_capacity(disk, (pmem->size - pmem->pfn_pad - pmem->data_offset)
/ 512);
diff --git a/include/linux/backing-dev.h b/include/li

[PATCH 03/13] bcache: inherit the optimal I/O size

2020-09-23 Thread Christoph Hellwig
Inherit the optimal I/O size setting just like the readahead window,
as any reason to do larger I/O does not apply to just readahead.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Acked-by: Coly Li 
---
 drivers/md/bcache/super.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 1bbdc410ee3c51..48113005ed86ad 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1430,6 +1430,8 @@ static int cached_dev_init(struct cached_dev *dc, 
unsigned int block_size)
dc->disk.disk->queue->backing_dev_info->ra_pages =
max(dc->disk.disk->queue->backing_dev_info->ra_pages,
q->backing_dev_info->ra_pages);
+   blk_queue_io_opt(dc->disk.disk->queue,
+   max(queue_io_opt(dc->disk.disk->queue), queue_io_opt(q)));
 
atomic_set(&dc->io_errors, 0);
dc->io_disable = false;
-- 
2.28.0



[PATCH 08/13] bdi: remove BDI_CAP_CGROUP_WRITEBACK

2020-09-23 Thread Christoph Hellwig
Just checking SB_I_CGROUPWB for cgroup writeback support is enough.
Either the file system allocates its own bdi (e.g. btrfs), in which case
it is known to support cgroup writeback, or the bdi comes from the block
layer, which always supports cgroup writeback.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Reviewed-by: Johannes Thumshirn 
---
 block/blk-core.c| 1 -
 fs/btrfs/disk-io.c  | 1 -
 include/linux/backing-dev.h | 8 +++-
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 865d39e5be2b28..1cc4fa6bc7fe1f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -538,7 +538,6 @@ struct request_queue *blk_alloc_queue(int node_id)
if (!q->stats)
goto fail_stats;
 
-   q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
q->node = node_id;
 
atomic_set(&q->nr_active_requests_shared_sbitmap, 0);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 047934cea25efa..e24927bddd5829 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3091,7 +3091,6 @@ int __cold open_ctree(struct super_block *sb, struct 
btrfs_fs_devices *fs_device
goto fail_sb_buffer;
}
 
-   sb->s_bdi->capabilities |= BDI_CAP_CGROUP_WRITEBACK;
sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super);
sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 0b06b2d26c9aa3..52583b6f2ea05d 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -123,7 +123,6 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, 
unsigned int max_ratio);
  * BDI_CAP_NO_ACCT_WB: Don't automatically account writeback pages
  * BDI_CAP_STRICTLIMIT:Keep number of dirty pages below bdi threshold.
  *
- * BDI_CAP_CGROUP_WRITEBACK: Supports cgroup-aware writeback.
  * BDI_CAP_SYNCHRONOUS_IO: Device is so fast that asynchronous IO would be
  *inefficient.
  */
@@ -233,9 +232,9 @@ int inode_congested(struct inode *inode, int cong_bits);
  * inode_cgwb_enabled - test whether cgroup writeback is enabled on an inode
  * @inode: inode of interest
  *
- * cgroup writeback requires support from both the bdi and filesystem.
- * Also, both memcg and iocg have to be on the default hierarchy.  Test
- * whether all conditions are met.
+ * Cgroup writeback requires support from the filesystem.  Also, both memcg and
+ * iocg have to be on the default hierarchy.  Test whether all conditions are
+ * met.
  *
  * Note that the test result may change dynamically on the same inode
  * depending on how memcg and iocg are configured.
@@ -247,7 +246,6 @@ static inline bool inode_cgwb_enabled(struct inode *inode)
return cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
cgroup_subsys_on_dfl(io_cgrp_subsys) &&
bdi_cap_account_dirty(bdi) &&
-   (bdi->capabilities & BDI_CAP_CGROUP_WRITEBACK) &&
(inode->i_sb->s_iflags & SB_I_CGROUPWB);
 }
 
-- 
2.28.0



[PATCH 07/13] block: lift setting the readahead size into the block layer

2020-09-23 Thread Christoph Hellwig
Drivers shouldn't really mess with the readahead size, as that is a VM
concept.  Instead set it based on the optimal I/O size by lifting the
algorithm from the md driver when registering the disk.  Also set
bdi->io_pages there as well by applying the same scheme based on
max_sectors.  To ensure the limits work well for stacking drivers a
new helper is added to update the readahead limits from the block
limits, which is also called from disk_stack_limits.

Signed-off-by: Christoph Hellwig 
Acked-by: Coly Li 
Reviewed-by: Johannes Thumshirn 
---
 block/blk-settings.c | 18 --
 block/blk-sysfs.c|  2 ++
 drivers/block/aoe/aoeblk.c   |  1 -
 drivers/block/drbd/drbd_nl.c | 10 +-
 drivers/md/bcache/super.c|  3 ---
 drivers/md/dm-table.c|  3 +--
 drivers/md/raid0.c   | 16 
 drivers/md/raid10.c  | 24 +---
 drivers/md/raid5.c   | 13 +
 drivers/nvme/host/core.c |  1 +
 include/linux/blkdev.h   |  1 +
 11 files changed, 24 insertions(+), 68 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 5ea3de48afba22..4f6eb4bb17236a 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -372,6 +372,19 @@ void blk_queue_alignment_offset(struct request_queue *q, 
unsigned int offset)
 }
 EXPORT_SYMBOL(blk_queue_alignment_offset);
 
+void blk_queue_update_readahead(struct request_queue *q)
+{
+   /*
+* For read-ahead of large files to be effective, we need to read ahead
+* at least twice the optimal I/O size.
+*/
+   q->backing_dev_info->ra_pages =
+   max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
+   q->backing_dev_info->io_pages =
+   queue_max_sectors(q) >> (PAGE_SHIFT - 9);
+}
+EXPORT_SYMBOL_GPL(blk_queue_update_readahead);
+
 /**
  * blk_limits_io_min - set minimum request size for a device
  * @limits: the queue limits
@@ -450,6 +463,8 @@ EXPORT_SYMBOL(blk_limits_io_opt);
 void blk_queue_io_opt(struct request_queue *q, unsigned int opt)
 {
blk_limits_io_opt(&q->limits, opt);
+   q->backing_dev_info->ra_pages =
+   max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
 }
 EXPORT_SYMBOL(blk_queue_io_opt);
 
@@ -631,8 +646,7 @@ void disk_stack_limits(struct gendisk *disk, struct 
block_device *bdev,
   top, bottom);
}
 
-   t->backing_dev_info->io_pages =
-   t->limits.max_sectors >> (PAGE_SHIFT - 9);
+   blk_queue_update_readahead(disk->queue);
 }
 EXPORT_SYMBOL(disk_stack_limits);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 81722cdcf0cb21..869ed21a9edcab 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -854,6 +854,8 @@ int blk_register_queue(struct gendisk *disk)
percpu_ref_switch_to_percpu(&q->q_usage_counter);
}
 
+   blk_queue_update_readahead(q);
+
ret = blk_trace_init_sysfs(dev);
if (ret)
return ret;
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index d8cfc233e64b93..c34e71b0c4a98c 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -406,7 +406,6 @@ aoeblk_gdalloc(void *vp)
WARN_ON(d->gd);
WARN_ON(d->flags & DEVFL_UP);
blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
-   q->backing_dev_info->ra_pages = SZ_2M / PAGE_SIZE;
blk_queue_io_opt(q, SZ_2M);
d->bufpool = mp;
d->blkq = gd->queue = q;
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index aaff5bde391506..54a4930c04fe07 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1362,15 +1362,7 @@ static void drbd_setup_queue_param(struct drbd_device 
*device, struct drbd_backi
 
if (b) {
blk_stack_limits(&q->limits, &b->limits, 0);
-
-   if (q->backing_dev_info->ra_pages !=
-   b->backing_dev_info->ra_pages) {
-   drbd_info(device, "Adjusting my ra_pages to backing device's (%lu -> %lu)\n",
-q->backing_dev_info->ra_pages,
-b->backing_dev_info->ra_pages);
-   q->backing_dev_info->ra_pages =
-   b->backing_dev_info->ra_pages;
-   }
+   blk_queue_update_readahead(q);
}
fixup_discard_if_not_supported(q);
fixup_write_zeroes(device, q);
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 48113005ed86ad..6bfa771673623e 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1427,9 +1427,6 @@ static int cached_dev_init(struct cached_dev *dc, 
unsigned int block_size)
if (ret)
return ret;
 
-   dc->disk.disk->queue->backing_dev_info->ra_pages =
-   max(dc->disk.disk->queue->backing_dev_info->ra_pages,
-   
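
For a rough feel of the numbers behind blk_queue_update_readahead(): the
standalone sketch below simply redoes its arithmetic for a hypothetical
device with a 1 MiB optimal I/O size and a 1024-sector max_sectors limit on
4 KiB pages; the constants are illustrative, not taken from any real device,
and this is plain userspace code rather than the kernel helper itself.

/* Standalone sketch (not kernel code) of the readahead arithmetic above. */
#include <stdio.h>

#define PAGE_SHIFT		12
#define PAGE_SIZE		(1UL << PAGE_SHIFT)
#define VM_READAHEAD_PAGES	(128 * 1024 / PAGE_SIZE)	/* 32 pages */

int main(void)
{
	unsigned long io_opt = 1024 * 1024;	/* hypothetical 1 MiB optimal I/O size */
	unsigned long max_sectors = 1024;	/* hypothetical limit, in 512-byte sectors */
	unsigned long ra_pages, io_pages;

	/* read ahead at least twice the optimal I/O size, but never less
	 * than the default readahead window */
	ra_pages = io_opt * 2 / PAGE_SIZE;
	if (ra_pages < VM_READAHEAD_PAGES)
		ra_pages = VM_READAHEAD_PAGES;

	/* io_pages mirrors max_sectors, converted from sectors to pages */
	io_pages = max_sectors >> (PAGE_SHIFT - 9);

	/* prints: ra_pages=512 io_pages=128 (i.e. 2 MiB readahead, 512 KiB I/O) */
	printf("ra_pages=%lu io_pages=%lu\n", ra_pages, io_pages);
	return 0;
}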

[PATCH 01/13] fs: remove the unused SB_I_MULTIROOT flag

2020-09-23 Thread Christoph Hellwig
The last user of SB_I_MULTIROOT disappeared with commit f2aedb713c28
("NFS: Add fs_context support.").

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Reviewed-by: Johannes Thumshirn 
---
 fs/namei.c | 4 ++--
 include/linux/fs.h | 1 -
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e99e2a9da0f7de..f1eb8ccd2be958 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -568,8 +568,8 @@ static bool path_connected(struct vfsmount *mnt, struct 
dentry *dentry)
 {
struct super_block *sb = mnt->mnt_sb;
 
-   /* Bind mounts and multi-root filesystems can have disconnected paths */
-   if (!(sb->s_iflags & SB_I_MULTIROOT) && (mnt->mnt_root == sb->s_root))
+   /* Bind mounts can have disconnected paths */
+   if (mnt->mnt_root == sb->s_root)
return true;
 
return is_subdir(dentry, mnt->mnt_root);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae003a082c..fbd74df5ce5f34 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1385,7 +1385,6 @@ extern int send_sigurg(struct fown_struct *fown);
 #define SB_I_CGROUPWB  0x0001  /* cgroup-aware writeback enabled */
 #define SB_I_NOEXEC0x0002  /* Ignore executables on this fs */
 #define SB_I_NODEV 0x0004  /* Ignore devices on this fs */
-#define SB_I_MULTIROOT 0x0008  /* Multiple roots to the dentry tree */
 
 /* sb->s_iflags to limit user namespace mounts */
 #define SB_I_USERNS_VISIBLE0x0010 /* fstype already mounted */
-- 
2.28.0



[PATCH 02/13] drbd: remove dead code in device_to_statistics

2020-09-23 Thread Christoph Hellwig
Ever since the switch to blk-mq, a lower device not used for VM
writeback will not be marked congested, so the check will never
trigger.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Reviewed-by: Johannes Thumshirn 
---
 drivers/block/drbd/drbd_nl.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 43c8ae4d9fca81..aaff5bde391506 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -3370,7 +3370,6 @@ static void device_to_statistics(struct device_statistics 
*s,
if (get_ldev(device)) {
struct drbd_md *md = &device->ldev->md;
u64 *history_uuids = (u64 *)s->history_uuids;
-   struct request_queue *q;
int n;
 
spin_lock_irq(&md->uuid_lock);
@@ -3384,11 +3383,6 @@ static void device_to_statistics(struct 
device_statistics *s,
spin_unlock_irq(&md->uuid_lock);
 
s->dev_disk_flags = md->flags;
-   q = bdev_get_queue(device->ldev->backing_bdev);
-   s->dev_lower_blocked =
-   bdi_congested(q->backing_dev_info,
- (1 << WB_async_congested) |
- (1 << WB_sync_congested));
put_ldev(device);
}
s->dev_size = drbd_get_capacity(device->this_bdev);
-- 
2.28.0



Re: [PATCH] KVM: SVM: Add a dedicated INVD intercept routine

2020-09-23 Thread Paolo Bonzini
On 23/09/20 22:40, Tom Lendacky wrote:
>>> +static int invd_interception(struct vcpu_svm *svm)
>>> +{
>>> +   /*
>>> +* Can't do emulation on an SEV guest and INVD is emulated
>>> +* as a NOP, so just skip the instruction.
>>> +*/
>>> +   return (sev_guest(svm->vcpu.kvm))
>>> +   ? kvm_skip_emulated_instruction(&svm->vcpu)
>>> +   : kvm_emulate_instruction(&svm->vcpu, 0);
>>
>> Is there any reason not to do kvm_skip_emulated_instruction() for both SEV
>> and legacy?  VMX has the same odd kvm_emulate_instruction() call, but AFAICT
>> that's completely unecessary, i.e. VMX can also convert to a straight skip.
> 
> You could, I just figured I'd leave the legacy behavior just in case. Not
> that I can think of a reason that behavior would ever change.

Yeah, let's do skip for both SVM and VMX.

Paolo
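
For reference, the straight-skip handler Paolo is suggesting would reduce to
something like the sketch below (illustrative only, not the final patch; the
VMX handler would be simplified the same way):

static int invd_interception(struct vcpu_svm *svm)
{
	/* INVD is emulated as a NOP for every guest, so just skip it. */
	return kvm_skip_emulated_instruction(&svm->vcpu);
}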



[PATCH v3 4/4] venus: put dummy vote on video-mem path after last session release

2020-09-23 Thread Mansur Alisha Shaik
As per the current implementation, the video driver unvotes the "video-mem"
path for the last video session during vdec_session_release().
When we try to suspend the device during video playback, we see video clock
warnings since the votes were already removed in vdec_session_release().

Corrected this by putting a dummy vote on the "video-mem" path after the last
video session release and unvoting it during suspend.

Fixes: 7482a983d ("media: venus: redesign clocks and pm domains control")
Signed-off-by: Mansur Alisha Shaik 
---
Changes in v3:
- Added fixes tag

 drivers/media/platform/qcom/venus/pm_helpers.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/media/platform/qcom/venus/pm_helpers.c 
b/drivers/media/platform/qcom/venus/pm_helpers.c
index 57877ea..ca09ea8 100644
--- a/drivers/media/platform/qcom/venus/pm_helpers.c
+++ b/drivers/media/platform/qcom/venus/pm_helpers.c
@@ -212,6 +212,16 @@ static int load_scale_bw(struct venus_core *core)
}
mutex_unlock(&core->lock);
 
+   /*
+* keep minimum bandwidth vote for "video-mem" path,
+* so that clks can be disabled during vdec_session_release().
+* Actual bandwidth drop will be done during device suspend
+* so that device can power down without any warnings.
+*/
+
+   if (!total_avg && !total_peak)
+   total_avg = kbps_to_icc(1000);
+
dev_dbg(core->dev, VDBGL "total: avg_bw: %u, peak_bw: %u\n",
total_avg, total_peak);
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member 
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH 06/13] md: update the optimal I/O size on reshape

2020-09-23 Thread Christoph Hellwig
The raid5 and raid10 drivers currently update the read-ahead size,
but not the optimal I/O size on reshape.  To prepare for deriving the
read-ahead size from the optimal I/O size make sure it is updated
as well.

Signed-off-by: Christoph Hellwig 
Acked-by: Song Liu 
Reviewed-by: Johannes Thumshirn 
---
 drivers/md/raid10.c | 22 ++
 drivers/md/raid5.c  | 10 --
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index e8fa327339171c..9956a04ac13bd6 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3703,10 +3703,20 @@ static struct r10conf *setup_conf(struct mddev *mddev)
return ERR_PTR(err);
 }
 
+static void raid10_set_io_opt(struct r10conf *conf)
+{
+   int raid_disks = conf->geo.raid_disks;
+
+   if (!(conf->geo.raid_disks % conf->geo.near_copies))
+   raid_disks /= conf->geo.near_copies;
+   blk_queue_io_opt(conf->mddev->queue, (conf->mddev->chunk_sectors << 9) *
+raid_disks);
+}
+
 static int raid10_run(struct mddev *mddev)
 {
struct r10conf *conf;
-   int i, disk_idx, chunk_size;
+   int i, disk_idx;
struct raid10_info *disk;
struct md_rdev *rdev;
sector_t size;
@@ -3742,18 +3752,13 @@ static int raid10_run(struct mddev *mddev)
mddev->thread = conf->thread;
conf->thread = NULL;
 
-   chunk_size = mddev->chunk_sectors << 9;
if (mddev->queue) {
blk_queue_max_discard_sectors(mddev->queue,
  mddev->chunk_sectors);
blk_queue_max_write_same_sectors(mddev->queue, 0);
blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
-   blk_queue_io_min(mddev->queue, chunk_size);
-   if (conf->geo.raid_disks % conf->geo.near_copies)
-   blk_queue_io_opt(mddev->queue, chunk_size * conf->geo.raid_disks);
-   else
-   blk_queue_io_opt(mddev->queue, chunk_size *
-(conf->geo.raid_disks / conf->geo.near_copies));
+   blk_queue_io_min(mddev->queue, mddev->chunk_sectors << 9);
+   raid10_set_io_opt(conf);
}
 
rdev_for_each(rdev, mddev) {
@@ -4727,6 +4732,7 @@ static void end_reshape(struct r10conf *conf)
stripe /= conf->geo.near_copies;
if (conf->mddev->queue->backing_dev_info->ra_pages < 2 * stripe)
conf->mddev->queue->backing_dev_info->ra_pages = 2 * stripe;
+   raid10_set_io_opt(conf);
}
conf->fullsync = 0;
 }
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 225380efd1e24f..9a7d1250894ef1 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7232,6 +7232,12 @@ static int only_parity(int raid_disk, int algo, int 
raid_disks, int max_degraded
return 0;
 }
 
+static void raid5_set_io_opt(struct r5conf *conf)
+{
+   blk_queue_io_opt(conf->mddev->queue, (conf->chunk_sectors << 9) *
+(conf->raid_disks - conf->max_degraded));
+}
+
 static int raid5_run(struct mddev *mddev)
 {
struct r5conf *conf;
@@ -7521,8 +7527,7 @@ static int raid5_run(struct mddev *mddev)
 
chunk_size = mddev->chunk_sectors << 9;
blk_queue_io_min(mddev->queue, chunk_size);
-   blk_queue_io_opt(mddev->queue, chunk_size *
-(conf->raid_disks - conf->max_degraded));
+   raid5_set_io_opt(conf);
mddev->queue->limits.raid_partial_stripes_expensive = 1;
/*
 * We can only discard a whole stripe. It doesn't make sense to
@@ -8115,6 +8120,7 @@ static void end_reshape(struct r5conf *conf)
   / PAGE_SIZE);
if (conf->mddev->queue->backing_dev_info->ra_pages < 2 * stripe)
conf->mddev->queue->backing_dev_info->ra_pages = 2 * stripe;
+   raid5_set_io_opt(conf);
}
}
 }
-- 
2.28.0
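
As a worked example of the two helpers above (numbers are illustrative only):
with a 512 KiB chunk, a 6-disk RAID5 array (max_degraded = 1) ends up with
io_opt = 512 KiB * (6 - 1) = 2.5 MiB, while a 4-disk RAID10 array with
near_copies = 2 gets io_opt = 512 KiB * (4 / 2) = 1 MiB. The standalone
sketch below just mirrors that arithmetic outside the kernel.

/* Standalone sketch of the io_opt arithmetic used by raid5/raid10 above. */
#include <stdio.h>

int main(void)
{
	unsigned long chunk = 512 * 1024;	/* hypothetical 512 KiB chunk */
	int raid_disks, near_copies;
	unsigned long raid5_io_opt, raid10_io_opt;

	/* raid5: 6 disks, one of which is parity (max_degraded = 1) */
	raid5_io_opt = chunk * (6 - 1);

	/* raid10: 4 disks, near_copies = 2 divides evenly, so scale down */
	raid_disks = 4;
	near_copies = 2;
	if (!(raid_disks % near_copies))
		raid_disks /= near_copies;
	raid10_io_opt = chunk * raid_disks;

	printf("raid5 io_opt  = %lu KiB\n", raid5_io_opt / 1024);	/* 2560 */
	printf("raid10 io_opt = %lu KiB\n", raid10_io_opt / 1024);	/* 1024 */
	return 0;
}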



[PATCH v3 2/4] venus: core: vote for video-mem path

2020-09-23 Thread Mansur Alisha Shaik
Currently the video driver votes for the venus0-ebi path during buffer
processing, with the average bandwidth of all the instances, and unvotes
during session release.

While video is streaming, when we try to do XO-SD using the command
"echo mem > /sys/power/state", the device does not enter the suspend state
and the interconnect summary still shows votes for venus0-ebi.

Corrected this by voting for the venus0-ebi path in venus_runtime_resume()
and unvoting during venus_runtime_suspend().

Fixes: 7482a983d ("media: venus: redesign clocks and pm domains control")
Signed-off-by: Mansur Alisha Shaik 
---
Changes in v3:
- Addressed review comments by Stephen Boyd

 drivers/media/platform/qcom/venus/core.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/media/platform/qcom/venus/core.c 
b/drivers/media/platform/qcom/venus/core.c
index 52a3886..fa363b8 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -363,7 +363,18 @@ static __maybe_unused int venus_runtime_suspend(struct 
device *dev)
 
ret = icc_set_bw(core->cpucfg_path, 0, 0);
if (ret)
-   return ret;
+   goto err_cpucfg_path;
+
+   ret = icc_set_bw(core->video_path, 0, 0);
+   if (ret)
+   goto err_video_path;
+
+   return ret;
+
+err_video_path:
+   icc_set_bw(core->cpucfg_path, kbps_to_icc(1000), 0);
+err_cpucfg_path:
+   pm_ops->core_power(dev, POWER_ON);
 
return ret;
 }
@@ -374,6 +385,10 @@ static __maybe_unused int venus_runtime_resume(struct 
device *dev)
const struct venus_pm_ops *pm_ops = core->pm_ops;
int ret;
 
+   ret = icc_set_bw(core->video_path, 0, kbps_to_icc(1000));
+   if (ret)
+   return ret;
+
ret = icc_set_bw(core->cpucfg_path, 0, kbps_to_icc(1000));
if (ret)
return ret;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member 
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH v3 3/4] venus: core: vote with average bandwidth and peak bandwidth as zero

2020-09-23 Thread Mansur Alisha Shaik
As per the bandwidth table, the video driver should vote with the average
bandwidth for the "video-mem" and "cpu-cfg" paths, as the peak bandwidth is
zero in the bandwidth table.

Fixes: 7482a983d ("media: venus: redesign clocks and pm domains control")
Signed-off-by: Mansur Alisha Shaik 
---
Changes in v3:
- Added fixes tag

 drivers/media/platform/qcom/venus/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/media/platform/qcom/venus/core.c 
b/drivers/media/platform/qcom/venus/core.c
index fa363b8..d5bfd6f 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -385,11 +385,11 @@ static __maybe_unused int venus_runtime_resume(struct 
device *dev)
const struct venus_pm_ops *pm_ops = core->pm_ops;
int ret;
 
-   ret = icc_set_bw(core->video_path, 0, kbps_to_icc(1000));
+   ret = icc_set_bw(core->video_path, kbps_to_icc(2), 0);
if (ret)
return ret;
 
-   ret = icc_set_bw(core->cpucfg_path, 0, kbps_to_icc(1000));
+   ret = icc_set_bw(core->cpucfg_path, kbps_to_icc(1000), 0);
if (ret)
return ret;
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member 
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH v3 1/4] venus: core: change clk enable and disable order in resume and suspend

2020-09-23 Thread Mansur Alisha Shaik
Currently the video driver votes after clk enable and unvotes before
clk disable. This is incorrect; the video driver should vote before
clk enable and unvote after clk disable.

Corrected this by changing the order of clk enable and clk disable.

Fixes: 7482a983d ("media: venus: redesign clocks and pm domains control")
Signed-off-by: Mansur Alisha Shaik 
Reviewed-by: Stephen Boyd 
---
 drivers/media/platform/qcom/venus/core.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/media/platform/qcom/venus/core.c 
b/drivers/media/platform/qcom/venus/core.c
index 6103aaf..52a3886 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -355,13 +355,16 @@ static __maybe_unused int venus_runtime_suspend(struct 
device *dev)
if (ret)
return ret;
 
+   if (pm_ops->core_power) {
+   ret = pm_ops->core_power(dev, POWER_OFF);
+   if (ret)
+   return ret;
+   }
+
ret = icc_set_bw(core->cpucfg_path, 0, 0);
if (ret)
return ret;
 
-   if (pm_ops->core_power)
-   ret = pm_ops->core_power(dev, POWER_OFF);
-
return ret;
 }
 
@@ -371,16 +374,16 @@ static __maybe_unused int venus_runtime_resume(struct 
device *dev)
const struct venus_pm_ops *pm_ops = core->pm_ops;
int ret;
 
+   ret = icc_set_bw(core->cpucfg_path, 0, kbps_to_icc(1000));
+   if (ret)
+   return ret;
+
if (pm_ops->core_power) {
ret = pm_ops->core_power(dev, POWER_ON);
if (ret)
return ret;
}
 
-   ret = icc_set_bw(core->cpucfg_path, 0, kbps_to_icc(1000));
-   if (ret)
-   return ret;
-
return hfi_core_resume(core, false);
 }
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member 
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH v3 3/6] clk: axi-clkgen: add support for ZynqMP (UltraScale)

2020-09-23 Thread Alexandru Ardelean
From: Dragos Bogdan 

This IP core also works and is supported on the Xilinx ZynqMP (UltraScale)
FPGA boards.
This patch enables the driver to be available on these platforms as well.

Signed-off-by: Dragos Bogdan 
Signed-off-by: Alexandru Ardelean 
---
 drivers/clk/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
index 4026fac9fac3..44353f257fe2 100644
--- a/drivers/clk/Kconfig
+++ b/drivers/clk/Kconfig
@@ -239,7 +239,7 @@ config CLK_TWL6040
 
 config COMMON_CLK_AXI_CLKGEN
tristate "AXI clkgen driver"
-   depends on ARCH_ZYNQ || MICROBLAZE || COMPILE_TEST
+   depends on ARCH_ZYNQ || ARCH_ZYNQMP || MICROBLAZE || COMPILE_TEST
help
  Support for the Analog Devices axi-clkgen pcore clock generator for Xilinx
  FPGAs. It is commonly used in Analog Devices' reference designs.
-- 
2.25.1



[PATCH v3 6/6] clk: axi-clkgen: Add support for FPGA info

2020-09-23 Thread Alexandru Ardelean
From: Mircea Caprioru 

This patch adds support for VCO maximum and minimum ranges in accordance
with the FPGA speed grade, voltage, device package, technology and family.
This new information is extracted from two new registers implemented in the
IP core: ADI_REG_FPGA_INFO and ADI_REG_FPGA_VOLTAGE, whose definitions live
in the 'include/linux/fpga/adi-axi-common.h' file as they are common to all
ADI FPGA cores.

Signed-off-by: Mircea Caprioru 
Signed-off-by: Alexandru Ardelean 
---
 drivers/clk/clk-axi-clkgen.c | 67 +++-
 1 file changed, 59 insertions(+), 8 deletions(-)

diff --git a/drivers/clk/clk-axi-clkgen.c b/drivers/clk/clk-axi-clkgen.c
index 6ffc19e9d850..b03ea28270cb 100644
--- a/drivers/clk/clk-axi-clkgen.c
+++ b/drivers/clk/clk-axi-clkgen.c
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -49,6 +50,7 @@
 struct axi_clkgen {
void __iomem *base;
struct clk_hw clk_hw;
+   unsigned int pcore_version;
 };
 
 static uint32_t axi_clkgen_lookup_filter(unsigned int m)
@@ -101,15 +103,15 @@ static uint32_t axi_clkgen_lookup_lock(unsigned int m)
 }
 
 #ifdef ARCH_ZYNQMP
-static const unsigned int fpfd_min = 1;
-static const unsigned int fpfd_max = 45;
-static const unsigned int fvco_min = 80;
-static const unsigned int fvco_max = 160;
+static unsigned int fpfd_min = 1;
+static unsigned int fpfd_max = 45;
+static unsigned int fvco_min = 80;
+static unsigned int fvco_max = 160;
 #else
-static const unsigned int fpfd_min = 1;
-static const unsigned int fpfd_max = 30;
-static const unsigned int fvco_min = 60;
-static const unsigned int fvco_max = 120;
+static unsigned int fpfd_min = 1;
+static unsigned int fpfd_max = 30;
+static unsigned int fvco_min = 60;
+static unsigned int fvco_max = 120;
 #endif
 
 static void axi_clkgen_calc_params(unsigned long fin, unsigned long fout,
@@ -229,6 +231,49 @@ static void axi_clkgen_read(struct axi_clkgen *axi_clkgen,
*val = readl(axi_clkgen->base + reg);
 }
 
+static void axi_clkgen_setup_ranges(struct axi_clkgen *axi_clkgen)
+{
+   unsigned int reg_value;
+   unsigned int tech, family, speed_grade, voltage;
+
+   axi_clkgen_read(axi_clkgen, ADI_AXI_REG_FPGA_INFO, ®_value);
+   tech = ADI_AXI_INFO_FPGA_TECH(reg_value);
+   family = ADI_AXI_INFO_FPGA_FAMILY(reg_value);
+   speed_grade = ADI_AXI_INFO_FPGA_SPEED_GRADE(reg_value);
+
+   axi_clkgen_read(axi_clkgen, ADI_AXI_REG_FPGA_VOLTAGE, ®_value);
+   voltage = ADI_AXI_INFO_FPGA_VOLTAGE(reg_value);
+
+   switch (speed_grade) {
+   case ADI_AXI_FPGA_SPEED_GRADE_XILINX_1 ... ADI_AXI_FPGA_SPEED_GRADE_XILINX_1LV:
+   fvco_max = 120;
+   fpfd_max = 45;
+   break;
+   case ADI_AXI_FPGA_SPEED_GRADE_XILINX_2 ... ADI_AXI_FPGA_SPEED_GRADE_XILINX_2LV:
+   fvco_max = 144;
+   fpfd_max = 50;
+   if ((family == ADI_AXI_FPGA_FAMILY_XILINX_KINTEX) |
+   (family == ADI_AXI_FPGA_FAMILY_XILINX_ARTIX)) {
+   if (voltage < 950) {
+   fvco_max = 120;
+   fpfd_max = 45;
+   }
+   }
+   break;
+   case ADI_AXI_FPGA_SPEED_GRADE_XILINX_3:
+   fvco_max = 160;
+   fpfd_max = 55;
+   break;
+   default:
+   break;
+   };
+
+   if (tech == ADI_AXI_FPGA_TECH_XILINX_ULTRASCALE_PLUS) {
+   fvco_max = 160;
+   fvco_min = 80;
+   }
+}
+
 static int axi_clkgen_wait_non_busy(struct axi_clkgen *axi_clkgen)
 {
unsigned int timeout = 1;
@@ -524,6 +569,12 @@ static int axi_clkgen_probe(struct platform_device *pdev)
if (IS_ERR(axi_clkgen->base))
return PTR_ERR(axi_clkgen->base);
 
+   axi_clkgen_read(axi_clkgen, ADI_AXI_REG_VERSION,
+   &axi_clkgen->pcore_version);
+
+   if (ADI_AXI_PCORE_VER_MAJOR(axi_clkgen->pcore_version) > 0x04)
+   axi_clkgen_setup_ranges(axi_clkgen);
+
init.num_parents = of_clk_get_parent_count(pdev->dev.of_node);
if (init.num_parents < 1 || init.num_parents > 2)
return -EINVAL;
-- 
2.25.1
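
To make the register layout concrete, the standalone sketch below decodes a
made-up ADI_REG_FPGA_INFO read-back with the field macros introduced in patch
5/6 of this series (technology in bits 31:24, family in 23:16, speed grade in
15:8, device package in 7:0); the value 0x02020a03 is purely illustrative.

/* Standalone sketch: decoding a hypothetical FPGA info register value. */
#include <stdio.h>

#define ADI_AXI_INFO_FPGA_TECH(info)		(((info) >> 24) & 0xff)
#define ADI_AXI_INFO_FPGA_FAMILY(info)		(((info) >> 16) & 0xff)
#define ADI_AXI_INFO_FPGA_SPEED_GRADE(info)	(((info) >> 8) & 0xff)
#define ADI_AXI_INFO_FPGA_DEV_PACKAGE(info)	((info) & 0xff)

int main(void)
{
	unsigned int info = 0x02020a03;	/* made-up read-back value */

	/* here: tech 2 (UltraScale), family 2 (Kintex), speed grade 10 (-1),
	 * device package 3 (FF), per the definitions in adi-axi-common.h */
	printf("tech=%u family=%u speed_grade=%u package=%u\n",
	       ADI_AXI_INFO_FPGA_TECH(info),
	       ADI_AXI_INFO_FPGA_FAMILY(info),
	       ADI_AXI_INFO_FPGA_SPEED_GRADE(info),
	       ADI_AXI_INFO_FPGA_DEV_PACKAGE(info));
	return 0;
}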



[PATCH v3 0/4] Venus - change clk enable, disable order and change bw values

2020-09-23 Thread Mansur Alisha Shaik
The intention of this patchset is to correct the clock enable and disable
order and to vote for the venus-ebi and cpucfg paths with average bandwidth
instead of peak bandwidth, since with the current implementation we are
seeing clock-related warnings during XO-SD and device suspend while video
is playing.

Mansur Alisha Shaik (4):
  venus: core: change clk enable and disable order in resume and suspend
  venus: core: vote for video-mem path
  venus: core: vote with average bandwidth and peak bandwidth as zero
  venus: put dummy vote on video-mem path after last session release

 drivers/media/platform/qcom/venus/core.c   | 32 --
 drivers/media/platform/qcom/venus/pm_helpers.c | 10 
 2 files changed, 35 insertions(+), 7 deletions(-)

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member 
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH v3 2/6] clk: axi-clkgen: Set power bits for fractional mode

2020-09-23 Thread Alexandru Ardelean
From: Lars-Peter Clausen 

Using the fractional dividers requires some additional power bits to be
set.

The fractional power bits are not documented and the current heuristic
for setting them seems to be insufficient for some cases. Just always set all
the fractional power bits when in fractional mode.

Signed-off-by: Lars-Peter Clausen 
Signed-off-by: Alexandru Ardelean 
---
 drivers/clk/clk-axi-clkgen.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/clk/clk-axi-clkgen.c b/drivers/clk/clk-axi-clkgen.c
index 1df03cc6d089..14d803e6af62 100644
--- a/drivers/clk/clk-axi-clkgen.c
+++ b/drivers/clk/clk-axi-clkgen.c
@@ -37,6 +37,7 @@
 #define MMCM_REG_LOCK1 0x18
 #define MMCM_REG_LOCK2 0x19
 #define MMCM_REG_LOCK3 0x1a
+#define MMCM_REG_POWER 0x28
 #define MMCM_REG_FILTER1   0x4e
 #define MMCM_REG_FILTER2   0x4f
 
@@ -320,6 +321,7 @@ static int axi_clkgen_set_rate(struct clk_hw *clk_hw,
struct axi_clkgen *axi_clkgen = clk_hw_to_axi_clkgen(clk_hw);
unsigned int d, m, dout;
struct axi_clkgen_div_params params;
+   uint32_t power = 0;
uint32_t filter;
uint32_t lock;
 
@@ -331,6 +333,11 @@ static int axi_clkgen_set_rate(struct clk_hw *clk_hw,
if (d == 0 || dout == 0 || m == 0)
return -EINVAL;
 
+   if ((dout & 0x7) != 0 || (m & 0x7) != 0)
+   power |= 0x9800;
+
+   axi_clkgen_mmcm_write(axi_clkgen, MMCM_REG_POWER, power, 0x9800);
+
filter = axi_clkgen_lookup_filter(m - 1);
lock = axi_clkgen_lookup_lock(m - 1);
 
-- 
2.25.1



Re: [PATCH] KVM: Enable hardware before doing arch VM initialization

2020-09-23 Thread Paolo Bonzini
On 24/09/20 08:31, Huacai Chen wrote:
> Hi, Sean,
> 
> On Thu, Sep 24, 2020 at 3:00 AM Sean Christopherson
>  wrote:
>>
>> Swap the order of hardware_enable_all() and kvm_arch_init_vm() to
>> accommodate Intel's Trust Domain Extension (TDX), which needs VMX to be
>> fully enabled during VM init in order to make SEAMCALLs.
>>
>> This also provides consistent ordering between kvm_create_vm() and
>> kvm_destroy_vm() with respect to calling kvm_arch_destroy_vm() and
>> hardware_disable_all().
> Do you means that hardware_enable_all() enable VMX, kvm_arch_init_vm()
> enable TDX, and TDX depends on VMX enabled at first? If so, can TDX be
> also enabled at hardware_enable_all()?

kvm_arch_init_vm() enables TDX *for the VM*, and to do that it needs VMX
instructions (specifically SEAMCALL, which is a hypervisor->"ultravisor"
call).  Because that action is VM-specific it cannot be done in
hardware_enable_all().

Paolo

> The swapping seems not affect MIPS, but I observed a fact:
> kvm_arch_hardware_enable() not only be called at
> hardware_enable_all(), but also be called at kvm_starting_cpu(). Even
> if you swap the order, new starting CPUs are not enabled VMX before
> kvm_arch_init_vm(). (Maybe I am wrong because I'm not familiar with
> VMX/TDX).
> 
> Huacai
>>
>> Cc: Marc Zyngier 
>> Cc: James Morse 
>> Cc: Julien Thierry 
>> Cc: Suzuki K Poulose 
>> Cc: linux-arm-ker...@lists.infradead.org
>> Cc: Huacai Chen 
>> Cc: Aleksandar Markovic 
>> Cc: linux-m...@vger.kernel.org
>> Cc: Paul Mackerras 
>> Cc: kvm-...@vger.kernel.org
>> Cc: Christian Borntraeger 
>> Cc: Janosch Frank 
>> Cc: David Hildenbrand 
>> Cc: Cornelia Huck 
>> Cc: Claudio Imbrenda 
>> Cc: Vitaly Kuznetsov 
>> Cc: Wanpeng Li 
>> Cc: Jim Mattson 
>> Cc: Joerg Roedel 
>> Signed-off-by: Sean Christopherson 
>> ---
>>
>> Obviously not required until the TDX series comes along, but IMO KVM
>> should be consistent with respect to enabling and disabling virt support
>> in hardware.
>>
>> Tested only on Intel hardware.  Unless I missed something, this only
>> affects x86, Arm and MIPS as hardware enabling is a nop for s390 and PPC.
>> Arm looks safe (based on my mostly clueless reading of the code), but I
>> have no idea if this will cause problem for MIPS, which is doing all kinds
>> of things in hardware_enable() that I don't pretend to fully understand.
>>
>>  virt/kvm/kvm_main.c | 16 
>>  1 file changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index cf88233b819a..58fa19bcfc90 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -766,7 +766,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
>> struct kvm_memslots *slots = kvm_alloc_memslots();
>>
>> if (!slots)
>> -   goto out_err_no_arch_destroy_vm;
>> +   goto out_err_no_disable;
>> /* Generations must be different for each address space. */
>> slots->generation = i;
>> rcu_assign_pointer(kvm->memslots[i], slots);
>> @@ -776,19 +776,19 @@ static struct kvm *kvm_create_vm(unsigned long type)
>> rcu_assign_pointer(kvm->buses[i],
>> kzalloc(sizeof(struct kvm_io_bus), 
>> GFP_KERNEL_ACCOUNT));
>> if (!kvm->buses[i])
>> -   goto out_err_no_arch_destroy_vm;
>> +   goto out_err_no_disable;
>> }
>>
>> kvm->max_halt_poll_ns = halt_poll_ns;
>>
>> -   r = kvm_arch_init_vm(kvm, type);
>> -   if (r)
>> -   goto out_err_no_arch_destroy_vm;
>> -
>> r = hardware_enable_all();
>> if (r)
>> goto out_err_no_disable;
>>
>> +   r = kvm_arch_init_vm(kvm, type);
>> +   if (r)
>> +   goto out_err_no_arch_destroy_vm;
>> +
>>  #ifdef CONFIG_HAVE_KVM_IRQFD
>> INIT_HLIST_HEAD(&kvm->irq_ack_notifier_list);
>>  #endif
>> @@ -815,10 +815,10 @@ static struct kvm *kvm_create_vm(unsigned long type)
>> mmu_notifier_unregister(&kvm->mmu_notifier, current->mm);
>>  #endif
>>  out_err_no_mmu_notifier:
>> -   hardware_disable_all();
>> -out_err_no_disable:
>> kvm_arch_destroy_vm(kvm);
>>  out_err_no_arch_destroy_vm:
>> +   hardware_disable_all();
>> +out_err_no_disable:
>> WARN_ON_ONCE(!refcount_dec_and_test(&kvm->users_count));
>> for (i = 0; i < KVM_NR_BUSES; i++)
>> kfree(kvm_get_bus(kvm, i));
>> --
>> 2.28.0
>>
> 



[PATCH v3 4/6] clk: axi-clkgen: Respect ZYNQMP PFD/VCO frequency limits

2020-09-23 Thread Alexandru Ardelean
From: Mathias Tausen 

Since axi-clkgen is now supported on ZYNQMP, make sure the max/min
frequencies of the PFD and VCO are respected.

Signed-off-by: Mathias Tausen 
Signed-off-by: Alexandru Ardelean 
---
 drivers/clk/clk-axi-clkgen.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/clk/clk-axi-clkgen.c b/drivers/clk/clk-axi-clkgen.c
index 14d803e6af62..6ffc19e9d850 100644
--- a/drivers/clk/clk-axi-clkgen.c
+++ b/drivers/clk/clk-axi-clkgen.c
@@ -100,10 +100,17 @@ static uint32_t axi_clkgen_lookup_lock(unsigned int m)
return 0x1f1f00fa;
 }
 
+#ifdef ARCH_ZYNQMP
+static const unsigned int fpfd_min = 1;
+static const unsigned int fpfd_max = 45;
+static const unsigned int fvco_min = 80;
+static const unsigned int fvco_max = 160;
+#else
 static const unsigned int fpfd_min = 1;
 static const unsigned int fpfd_max = 30;
 static const unsigned int fvco_min = 60;
 static const unsigned int fvco_max = 120;
+#endif
 
 static void axi_clkgen_calc_params(unsigned long fin, unsigned long fout,
unsigned int *best_d, unsigned int *best_m, unsigned int *best_dout)
-- 
2.25.1



[PATCH v3 1/6] clk: axi-clkgen: Add support for fractional dividers

2020-09-23 Thread Alexandru Ardelean
From: Lars-Peter Clausen 

The axi-clkgen has (optional) fractional dividers on the output clock
divider and feedback clock divider path. Utilizing the fractional dividers
allows for a better resolution of the output clock, being able to
synthesize more frequencies.

Rework the driver to support the fractional register fields, both
for setting a new rate as well as reading back the current rate from the
hardware.

For setting the rate if no perfect divider settings were found in
non-fractional mode try again in fractional mode and see if better settings
can be found. This appears to be the recommended mode of operation.

Signed-off-by: Lars-Peter Clausen 
Signed-off-by: Alexandru Ardelean 
---
 drivers/clk/clk-axi-clkgen.c | 180 +--
 1 file changed, 129 insertions(+), 51 deletions(-)

diff --git a/drivers/clk/clk-axi-clkgen.c b/drivers/clk/clk-axi-clkgen.c
index 96f351785b41..1df03cc6d089 100644
--- a/drivers/clk/clk-axi-clkgen.c
+++ b/drivers/clk/clk-axi-clkgen.c
@@ -27,8 +27,10 @@
 
 #define AXI_CLKGEN_V2_DRP_STATUS_BUSY  BIT(16)
 
+#define MMCM_REG_CLKOUT5_2 0x07
 #define MMCM_REG_CLKOUT0_1 0x08
 #define MMCM_REG_CLKOUT0_2 0x09
+#define MMCM_REG_CLKOUT6_2 0x13
 #define MMCM_REG_CLK_FB1   0x14
 #define MMCM_REG_CLK_FB2   0x15
 #define MMCM_REG_CLK_DIV   0x16
@@ -40,6 +42,7 @@
 
 #define MMCM_CLKOUT_NOCOUNTBIT(6)
 
+#define MMCM_CLK_DIV_DIVIDEBIT(11)
 #define MMCM_CLK_DIV_NOCOUNT   BIT(12)
 
 struct axi_clkgen {
@@ -107,6 +110,8 @@ static void axi_clkgen_calc_params(unsigned long fin, 
unsigned long fout,
unsigned long d, d_min, d_max, _d_min, _d_max;
unsigned long m, m_min, m_max;
unsigned long f, dout, best_f, fvco;
+   unsigned long fract_shift = 0;
+   unsigned long fvco_min_fract, fvco_max_fract;
 
fin /= 1000;
fout /= 1000;
@@ -119,42 +124,89 @@ static void axi_clkgen_calc_params(unsigned long fin, 
unsigned long fout,
d_min = max_t(unsigned long, DIV_ROUND_UP(fin, fpfd_max), 1);
d_max = min_t(unsigned long, fin / fpfd_min, 80);
 
-   m_min = max_t(unsigned long, DIV_ROUND_UP(fvco_min, fin) * d_min, 1);
-   m_max = min_t(unsigned long, fvco_max * d_max / fin, 64);
+again:
+   fvco_min_fract = fvco_min << fract_shift;
+   fvco_max_fract = fvco_max << fract_shift;
+
+   m_min = max_t(unsigned long, DIV_ROUND_UP(fvco_min_fract, fin) * d_min, 1);
+   m_max = min_t(unsigned long, fvco_max_fract * d_max / fin, 64 << fract_shift);
 
for (m = m_min; m <= m_max; m++) {
-   _d_min = max(d_min, DIV_ROUND_UP(fin * m, fvco_max));
-   _d_max = min(d_max, fin * m / fvco_min);
+   _d_min = max(d_min, DIV_ROUND_UP(fin * m, fvco_max_fract));
+   _d_max = min(d_max, fin * m / fvco_min_fract);
 
for (d = _d_min; d <= _d_max; d++) {
fvco = fin * m / d;
 
dout = DIV_ROUND_CLOSEST(fvco, fout);
-   dout = clamp_t(unsigned long, dout, 1, 128);
+   dout = clamp_t(unsigned long, dout, 1, 128 << fract_shift);
f = fvco / dout;
if (abs(f - fout) < abs(best_f - fout)) {
best_f = f;
*best_d = d;
-   *best_m = m;
-   *best_dout = dout;
+   *best_m = m << (3 - fract_shift);
+   *best_dout = dout << (3 - fract_shift);
if (best_f == fout)
return;
}
}
}
+
+   /* Lets see if we find a better setting in fractional mode */
+   if (fract_shift == 0) {
+   fract_shift = 3;
+   goto again;
+   }
 }
 
-static void axi_clkgen_calc_clk_params(unsigned int divider, unsigned int *low,
-   unsigned int *high, unsigned int *edge, unsigned int *nocount)
+struct axi_clkgen_div_params {
+   unsigned int low;
+   unsigned int high;
+   unsigned int edge;
+   unsigned int nocount;
+   unsigned int frac_en;
+   unsigned int frac;
+   unsigned int frac_wf_f;
+   unsigned int frac_wf_r;
+   unsigned int frac_phase;
+};
+
+static void axi_clkgen_calc_clk_params(unsigned int divider,
+   unsigned int frac_divider, struct axi_clkgen_div_params *params)
 {
-   if (divider == 1)
-   *nocount = 1;
-   else
-   *nocount = 0;
 
-   *high = divider / 2;
-   *edge = divider % 2;
-   *low = divider - *high;
+   memset(params, 0x0, sizeof(*params));
+
+   if (divider == 1) {
+   params->nocount = 1;
+   return;
+   }
+
+   if (frac_divider == 0) {
+   params->high = divider / 2;
+   params->edge = divider % 2;
+  

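To get a rough feel for why the 1/8-step dividers help (all numbers here are
illustrative; the real search in axi_clkgen_calc_params() also honours the
PFD/VCO limits and varies D and M as well): with fin = 125 MHz, D = 2 and
M = 10 the VCO sits at 625 MHz, and integer output dividers can only produce
125.000 or 104.167 MHz, while eighth steps of the output divider fill in
that gap.

/* Standalone sketch of the extra resolution from 1/8-step output dividers. */
#include <stdio.h>

int main(void)
{
	double fvco = 125.0 / 2 * 10;	/* 625 MHz VCO from fin = 125 MHz, D = 2, M = 10 */
	int eighths;

	for (eighths = 40; eighths <= 48; eighths++)	/* O from 5.000 to 6.000 */
		printf("O = %.3f -> fout = %.3f MHz\n",
		       eighths / 8.0, fvco / (eighths / 8.0));
	return 0;
}
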
[PATCH v3 5/6] include: fpga: adi-axi-common.h: add definitions for supported FPGAs

2020-09-23 Thread Alexandru Ardelean
From: Mircea Caprioru 

All (newer) FPGA IP cores supported by Analog Devices store information in
the synthesized designs. This information describes various parameters,
including the family of boards on which this is deployed, speed-grade, and
so on.

Currently, some of these definitions are deployed mostly on Xilinx boards,
but they have been considered also for FPGA boards from other vendors.

The register definitions are described at this link:
  https://wiki.analog.com/resources/fpga/docs/hdl/regmap
(the 'Base (common to all cores)' section).

Acked-by: Moritz Fischer 
Signed-off-by: Mircea Caprioru 
Signed-off-by: Alexandru Ardelean 
---
 include/linux/fpga/adi-axi-common.h | 103 
 1 file changed, 103 insertions(+)

diff --git a/include/linux/fpga/adi-axi-common.h 
b/include/linux/fpga/adi-axi-common.h
index 141ac3f251e6..1a7f18e3a384 100644
--- a/include/linux/fpga/adi-axi-common.h
+++ b/include/linux/fpga/adi-axi-common.h
@@ -13,6 +13,9 @@
 
 #define ADI_AXI_REG_VERSION0x
 
+#define ADI_AXI_REG_FPGA_INFO  0x001C
+#define ADI_AXI_REG_FPGA_VOLTAGE   0x0140
+
 #define ADI_AXI_PCORE_VER(major, minor, patch) \
(((major) << 16) | ((minor) << 8) | (patch))
 
@@ -20,4 +23,104 @@
 #define ADI_AXI_PCORE_VER_MINOR(version)   (((version) >> 8) & 0xff)
 #define ADI_AXI_PCORE_VER_PATCH(version)   ((version) & 0xff)
 
+#define ADI_AXI_INFO_FPGA_VOLTAGE(val) ((val) & 0x)
+
+#define ADI_AXI_INFO_FPGA_TECH(info)   (((info) >> 24) & 0xff)
+#define ADI_AXI_INFO_FPGA_FAMILY(info) (((info) >> 16) & 0xff)
+#define ADI_AXI_INFO_FPGA_SPEED_GRADE(info)(((info) >> 8) & 0xff)
+#define ADI_AXI_INFO_FPGA_DEV_PACKAGE(info)((info) & 0xff)
+
+/**
+ * FPGA Technology definitions
+ */
+#define ADI_AXI_FPGA_TECH_XILINX_UNKNOWN   0
+#define ADI_AXI_FPGA_TECH_XILINS_SERIES7   1
+#define ADI_AXI_FPGA_TECH_XILINX_ULTRASCALE2
+#define ADI_AXI_FPGA_TECH_XILINX_ULTRASCALE_PLUS   3
+
+#define ADI_AXI_FPGA_TECH_INTEL_UNKNOWN100
+#define ADI_AXI_FPGA_TECH_INTEL_CYCLONE_5  101
+#define ADI_AXI_FPGA_TECH_INTEL_CYCLONE_10 102
+#define ADI_AXI_FPGA_TECH_INTEL_ARRIA_10   103
+#define ADI_AXI_FPGA_TECH_INTEL_STRATIX_10 104
+
+/**
+ * FPGA Family definitions
+ */
+#define ADI_AXI_FPGA_FAMILY_UNKNOWN0
+
+#define ADI_AXI_FPGA_FAMILY_XILINX_ARTIX   1
+#define ADI_AXI_FPGA_FAMILY_XILINX_KINTEX  2
+#define ADI_AXI_FPGA_FAMILY_XILINX_VIRTEX  3
+#define ADI_AXI_FPGA_FAMILY_XILINX_ZYNQ4
+
+#define ADI_AXI_FPGA_FAMILY_INTEL_SX   1
+#define ADI_AXI_FPGA_FAMILY_INTEL_GX   2
+#define ADI_AXI_FPGA_FAMILY_INTEL_GT   3
+#define ADI_AXI_FPGA_FAMILY_INTEL_GZ   4
+
+/**
+ * FPGA Speed-grade definitions
+ */
+#define ADI_AXI_FPGA_SPEED_GRADE_UNKNOWN   0
+
+#define ADI_AXI_FPGA_SPEED_GRADE_XILINX_1  10
+#define ADI_AXI_FPGA_SPEED_GRADE_XILINX_1L 11
+#define ADI_AXI_FPGA_SPEED_GRADE_XILINX_1H 12
+#define ADI_AXI_FPGA_SPEED_GRADE_XILINX_1HV13
+#define ADI_AXI_FPGA_SPEED_GRADE_XILINX_1LV14
+#define ADI_AXI_FPGA_SPEED_GRADE_XILINX_2  20
+#define ADI_AXI_FPGA_SPEED_GRADE_XILINX_2L 21
+#define ADI_AXI_FPGA_SPEED_GRADE_XILINX_2LV22
+#define ADI_AXI_FPGA_SPEED_GRADE_XILINX_3  30
+
+#define ADI_AXI_FPGA_SPEED_GRADE_INTEL_1   1
+#define ADI_AXI_FPGA_SPEED_GRADE_INTEL_2   2
+#define ADI_AXI_FPGA_SPEED_GRADE_INTEL_3   3
+#define ADI_AXI_FPGA_SPEED_GRADE_INTEL_4   4
+#define ADI_AXI_FPGA_SPEED_GRADE_INTEL_5   5
+#define ADI_AXI_FPGA_SPEED_GRADE_INTEL_6   6
+#define ADI_AXI_FPGA_SPEED_GRADE_INTEL_7   7
+#define ADI_AXI_FPGA_SPEED_GRADE_INTEL_8   8
+#define ADI_AXI_FPGA_SPEED_GRADE_INTEL_9   9
+
+/**
+ * FPGA Device Package definitions
+ */
+#define ADI_AXI_FPGA_DEV_PACKAGE_UNKNOWN   0
+
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_RF 1
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_FL 2
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_FF 3
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_FB 4
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_HC 5
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_FH 6
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_CS 7
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_CP 8
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_FT 9
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_FG 10
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_SB 11
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_RB 12
+#define ADI_AXI_FPGA_DEV_PACKAGE_XILINX_RS 13
+#define ADI_AXI_FPGA_DEV_PACKAGE_XI

[PATCH] net/ethernet/broadcom: fix spelling typo

2020-09-23 Thread Wang Qing
Fix the comment typo: "compliment" -> "complement".

Signed-off-by: Wang Qing 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h
index bfc0e45..5caa75b
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h
@@ -284,12 +284,12 @@
 #define CCM_REG_GR_ARB_TYPE 0xd015c
 /* [RW 2] Load (FIC0) channel group priority. The lowest priority is 0; the
highest priority is 3. It is supposed; that the Store channel priority is
-   the compliment to 4 of the rest priorities - Aggregation channel; Load
+   the complement to 4 of the rest priorities - Aggregation channel; Load
(FIC0) channel and Load (FIC1). */
 #define CCM_REG_GR_LD0_PR   0xd0164
 /* [RW 2] Load (FIC1) channel group priority. The lowest priority is 0; the
highest priority is 3. It is supposed; that the Store channel priority is
-   the compliment to 4 of the rest priorities - Aggregation channel; Load
+   the complement to 4 of the rest priorities - Aggregation channel; Load
(FIC0) channel and Load (FIC1). */
 #define CCM_REG_GR_LD1_PR   0xd0168
 /* [RW 2] General flags index. */
@@ -4489,11 +4489,11 @@
 #define TCM_REG_GR_ARB_TYPE 0x50114
 /* [RW 2] Load (FIC0) channel group priority. The lowest priority is 0; the
highest priority is 3. It is supposed that the Store channel is the
-   compliment of the other 3 groups. */
+   complement of the other 3 groups. */
 #define TCM_REG_GR_LD0_PR   0x5011c
 /* [RW 2] Load (FIC1) channel group priority. The lowest priority is 0; the
highest priority is 3. It is supposed that the Store channel is the
-   compliment of the other 3 groups. */
+   complement of the other 3 groups. */
 #define TCM_REG_GR_LD1_PR   0x50120
 /* [RW 4] The number of double REG-pairs; loaded from the STORM context and
sent to STORM; for a specific connection type. The double REG-pairs are
@@ -5020,11 +5020,11 @@
 #define UCM_REG_GR_ARB_TYPE 0xe0144
 /* [RW 2] Load (FIC0) channel group priority. The lowest priority is 0; the
highest priority is 3. It is supposed that the Store channel group is
-   compliment to the others. */
+   complement to the others. */
 #define UCM_REG_GR_LD0_PR   0xe014c
 /* [RW 2] Load (FIC1) channel group priority. The lowest priority is 0; the
highest priority is 3. It is supposed that the Store channel group is
-   compliment to the others. */
+   complement to the others. */
 #define UCM_REG_GR_LD1_PR   0xe0150
 /* [RW 2] The queue index for invalidate counter flag decision. */
 #define UCM_REG_INV_CFLG_Q  0xe00e4
@@ -5523,11 +5523,11 @@
 #define XCM_REG_GR_ARB_TYPE 0x2020c
 /* [RW 2] Load (FIC0) channel group priority. The lowest priority is 0; the
highest priority is 3. It is supposed that the Channel group is the
-   compliment of the other 3 groups. */
+   complement of the other 3 groups. */
 #define XCM_REG_GR_LD0_PR   0x20214
 /* [RW 2] Load (FIC1) channel group priority. The lowest priority is 0; the
highest priority is 3. It is supposed that the Channel group is the
-   compliment of the other 3 groups. */
+   complement of the other 3 groups. */
 #define XCM_REG_GR_LD1_PR   0x20218
 /* [RW 1] Input nig0 Interface enable. If 0 - the valid input is
disregarded; acknowledge output is deasserted; all other signals are
-- 
2.7.4



[PATCH v3 0/6] clk: axi-clk-gen: misc updates to the driver

2020-09-23 Thread Alexandru Ardelean
These patches synchronize the driver with the current state in the
Analog Devices Linux tree:
  https://github.com/analogdevicesinc/linux/

They have been in the tree for about 2-3, so they did receive some
testing.

Highlights are:
* Add support for fractional dividers (Lars-Peter Clausen)
* Enable support for ZynqMP (UltraScale) (Dragos Bogdan)
* Support frequency limits for ZynqMP (Mathias Tausen)
  - And continued by Mircea Caprioru, to read them from the IP cores

Changelog v2 -> v3:
* for patch 'include: fpga: adi-axi-common.h: add definitions for supported 
FPGAs'
  - fix whitespace found by checkpatch
  - add 'Acked-by: Moritz Fischer '

Changelog v1 -> v2:
- in patch 'include: fpga: adi-axi-common.h: add definitions for supported 
FPGAs'
  * converted enums to #define
  * added Intel FPGA definitions
  * added Device-Package definitions
  * added INTEL / XILINX in the define names
 definitions according to:
 
https://github.com/analogdevicesinc/hdl/blob/4e438261aa319b1dda4c593c155218a93b1d869b/library/scripts/adi_intel_device_info_enc.tcl
 
https://github.com/analogdevicesinc/hdl/blob/4e438261aa319b1dda4c593c155218a93b1d869b/library/scripts/adi_xilinx_device_info_enc.tcl

Dragos Bogdan (1):
  clk: axi-clkgen: add support for ZynqMP (UltraScale)

Lars-Peter Clausen (2):
  clk: axi-clkgen: Add support for fractional dividers
  clk: axi-clkgen: Set power bits for fractional mode

Mathias Tausen (1):
  clk: axi-clkgen: Respect ZYNQMP PFD/VCO frequency limits

Mircea Caprioru (2):
  include: fpga: adi-axi-common.h: add definitions for supported FPGAs
  clk: axi-clkgen: Add support for FPGA info

 drivers/clk/Kconfig |   2 +-
 drivers/clk/clk-axi-clkgen.c| 253 ++--
 include/linux/fpga/adi-axi-common.h | 103 +++
 3 files changed, 302 insertions(+), 56 deletions(-)

-- 
2.25.1



Re: [PATCH] leds: lp50xx: Fix an error handling path in 'lp50xx_probe_dt()'

2020-09-23 Thread Dan Carpenter
On Wed, Sep 23, 2020 at 08:49:56PM +0200, Christophe JAILLET wrote:
> Le 23/09/2020 à 15:35, Dan Carpenter a écrit :
> > I've added Heikki Krogerus to the CC list because my question is mostly
> > about commit 59abd83672f7 ("drivers: base: Introducing software nodes to
> > the firmware node framework").
> > 
> > I have been trying to teach Smatch to understand reference counting so
> > it can discover these kinds of bugs automatically.
> > 
> > I don't know how software_node_get_next_child() can work when it doesn't
> > call kobject_get().  This sort of bug would have been caught in testing
> > because it affects the success path so I must be reading the code wrong.
> > 
> 
> I had the same reading of the code and thought that I was missing something
> somewhere.
> 
> There is the same question about 'acpi_get_next_subnode' which is also a
> '.get_next_child_node' function, without any ref counting, if I'm correct.
> 

Yeah, but there aren't any ->get/put() ops for the acpi_get_next_subnode()
stuff so it's not a problem.  (Presumably there is some other sort of
refcounting policy there).

regards,
dan carpenter
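
For context, the refcounting pattern the thread is asking about usually looks
like the fragment below: the child-node iterator is expected to take a
reference that each iteration (or an early exit) must drop. The error
condition and the device pointer are placeholders, and whether
software_node_get_next_child() actually takes that reference is exactly the
open question here.

struct fwnode_handle *child;

device_for_each_child_node(dev, child) {
	if (some_error_condition) {
		/* drop the reference held by the iterator before bailing out */
		fwnode_handle_put(child);
		return -EINVAL;
	}
}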



Re: [RFC PATCH 1/9] misc: Add Surface Aggregator subsystem

2020-09-23 Thread Greg Kroah-Hartman
On Wed, Sep 23, 2020 at 10:34:23PM +0200, Maximilian Luz wrote:
> In short: Concurrent execution of the counter functions works, as far as
> I can tell at least, and, as you see by the long answer, I have to spend
> some time and think about the duplicate-value problem (again). If you've
> managed to read through this wall of text (sorry about that) and you
> have any ideas/preferences, please let me know.

No, this all answers my question really well, thanks, what you have now
is fine, no need to change it.

thanks,

greg k-h


[PATCH RESEND] sched/fair: Fix wrong cpu selecting from isolated domain

2020-09-23 Thread Xunlei Pang
We've met problems in our production environment where tasks with a full
cpumask (e.g. from being put into a cpuset or having full affinity set)
were occasionally migrated to our isolated cpus.

After some analysis, we found that it is due to the current
select_idle_smt() not considering the sched_domain mask.

Steps to reproduce on my 31-CPU hyperthreads machine:
1. with boot parameter: "isolcpus=domain,2-31"
   (thread lists: 0,16 and 1,17)
2. cgcreate -g cpu:test; cgexec -g cpu:test "test_threads"
3. some threads will be migrated to the isolated cpu16~17.

Fix it by checking the valid domain mask in select_idle_smt().

Fixes: 10e2f1acd010 ("sched/core: Rewrite and improve select_idle_siblings()")
Reported-by: Wetp Zhang 
Reviewed-by: Jiang Biao 
Signed-off-by: Xunlei Pang 
---
 kernel/sched/fair.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1a68a05..fa942c4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6075,7 +6075,7 @@ static int select_idle_core(struct task_struct *p, struct 
sched_domain *sd, int
 /*
  * Scan the local SMT mask for idle CPUs.
  */
-static int select_idle_smt(struct task_struct *p, int target)
+static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
 {
int cpu;
 
@@ -6083,7 +6083,8 @@ static int select_idle_smt(struct task_struct *p, int 
target)
return -1;
 
for_each_cpu(cpu, cpu_smt_mask(target)) {
-   if (!cpumask_test_cpu(cpu, p->cpus_ptr))
+   if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
+   !cpumask_test_cpu(cpu, sched_domain_span(sd)))
continue;
if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
return cpu;
@@ -6099,7 +6100,7 @@ static inline int select_idle_core(struct task_struct *p, 
struct sched_domain *s
return -1;
 }
 
-static inline int select_idle_smt(struct task_struct *p, int target)
+static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
 {
return -1;
 }
@@ -6274,7 +6275,7 @@ static int select_idle_sibling(struct task_struct *p, int 
prev, int target)
if ((unsigned)i < nr_cpumask_bits)
return i;
 
-   i = select_idle_smt(p, target);
+   i = select_idle_smt(p, sd, target);
if ((unsigned)i < nr_cpumask_bits)
return i;
 
-- 
1.8.3.1



[PATCH] iwlwifi: mvm: Increase session protection duration for association

2020-09-23 Thread Kai-Heng Feng
Sometimes the Intel AX201 fails to associate with the AP:
[  839.290042] wlp0s20f3: authenticate with xx:xx:xx:xx:xx:xx
[  839.291737] wlp0s20f3: send auth to xx:xx:xx:xx:xx:xx (try 1/3)
[  839.350010] wlp0s20f3: send auth to xx:xx:xx:xx:xx:xx (try 2/3)
[  839.360826] wlp0s20f3: authenticated
[  839.363205] wlp0s20f3: associate with xx:xx:xx:xx:xx:xx (try 1/3)
[  839.370342] wlp0s20f3: RX AssocResp from xx:xx:xx:xx:xx:xx (capab=0x431 
status=0 aid=12)
[  839.378925] wlp0s20f3: associated
[  839.431788] wlp0s20f3: deauthenticated from xx:xx:xx:xx:xx:xx (Reason: 
2=PREV_AUTH_NOT_VALID)

It fails because the EAPOL exchange hasn't finished. Increasing the session
protection duration to 1200 TU eliminates the problem.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209237
Signed-off-by: Kai-Heng Feng 
---
 drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c 
b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
index 9374c85c5caf..54acd9a68955 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
@@ -3297,13 +3297,13 @@ static void iwl_mvm_mac_mgd_prepare_tx(struct 
ieee80211_hw *hw,
 * session for a much longer time since the firmware will internally
 * create two events: a 300TU one with a very high priority that
 * won't be fragmented which should be enough for 99% of the cases,
-* and another one (which we configure here to be 900TU long) which
+* and another one (which we configure here to be 1200TU long) which
 * will have a slightly lower priority, but more importantly, can be
 * fragmented so that it'll allow other activities to run.
 */
if (fw_has_capa(&mvm->fw->ucode_capa,
IWL_UCODE_TLV_CAPA_SESSION_PROT_CMD))
-   iwl_mvm_schedule_session_protection(mvm, vif, 900,
+   iwl_mvm_schedule_session_protection(mvm, vif, 1200,
min_duration, false);
else
iwl_mvm_protect_session(mvm, vif, duration,
-- 
2.17.1



Re: [RFC PATCH 8/9] surface_aggregator: Add DebugFS interface

2020-09-23 Thread Greg Kroah-Hartman
On Thu, Sep 24, 2020 at 12:06:54AM +0200, Maximilian Luz wrote:
> On 9/23/20 8:29 PM, Greg Kroah-Hartman wrote:
> > On Wed, Sep 23, 2020 at 08:03:38PM +0200, Maximilian Luz wrote:
> > > On 9/23/20 6:14 PM, Greg Kroah-Hartman wrote:
> 
> [...]
> 
> > > So the -EFAULT returned by put_user should have precedence? I was aiming
> > > for "in case it fails, return with the first error".
> > 
> > -EFAULT trumps everything :)
> 
> Perfect, thanks!
> 
> > > > Listen, I'm all for doing whatever you want in debugfs, but why are you
> > > > doing random ioctls here?  Why not just read/write a file to do what you
> > > > need/want to do here instead?
> > > 
> > > Two reasons, mostly: First, the IOCTL allows me to execute requests in
> > > parallel with just one open file descriptor and not having to maintain
> > > some sort of back-buffer to wait around until the reader gets to reading
> > > the thing. I've used that for stress-testing the EC communication in the
> > > past, which had some issues (dropping bytes, invalid CRCs, ...) under
> > > heavy(-ish) load. Second, I'm considering adding support for events to
> > > this device in the future by having user-space receive events by reading
> > > from the device. Events would also be enabled or disabled via an IOCTL.
> > > That could be implemented in a second device though. Events were also my
> > > main reason for adding a version to this interface: Discerning between
> > > one that has event support and one that has not.
> > 
> > A misc device can also do this, much simpler, right?  Why not use that?
> 
> Sorry to ask so many questions, just want to make sure I understand you
> correctly:
> 
>  - So you suggest I go with a misc device instead of putting this into
>debugfs?

Yes.

>  - And I keep the IOCTL?

If you need it, although the interface Arnd says might be much simpler
(read/write)

>  - Can I still tell people to not use it and that it's not my fault if a
>change in the interface breaks their tools if it's not in debugfs?

Yes :)

>  - Also load it via a separate module (module_misc_device, I assume)?

That works.

> One reason why the platform_device approach is practical in this
> scenario is that I can leverage the driver core to defer probing and
> thus defer creating the device if the controller isn't there yet.

That's fine, and is a nice abuse of the platform driver interface.  I
say "abuse" because we really don't have a simpler way to do this at the
moment, but this really isn't a platform device...

> Similarly, the driver is automatically unbound if the controller goes
> away and the device should be destroyed. All of this should currently be
> handled via the device link created by ssam_client_bind() (unless I
> really misunderstood those).

That all is fine, just create the misc device when your driver binds to
the device, just like you create the debugfs file entries today.
There's no difference except you get a "real" char device node instead
of a debugfs file.

> I should be able to handle that by having the device refuse to open the
> file if the controller isn't there. Holding the state-lock during the
> request execution should ensure that the controller doesn't get shut
> down.

Nah, no need for that, again, keep the platform driver/device and then
create the misc device (and remove it) where you are creating/removing
the debugfs files.

> > A simple misc device would make it very simple and easy to do instead,
> > why not do that?
> 
> Again, I considered the probe deferring of the platform driver fairly
> handy (in addition to having the implicit debugfs warning of "don't rely
> on this"), but if you prefer me implementing this as misc device, I'll
> do that.

The "joy" of creating a user api is that no matter how much you tell
people "do not depend on this", they will, so no matter the file being
in debugfs, or a misc device, you might be stuck with it for forever,
sorry.

thanks,

greg k-h
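
As a rough illustration of what is being suggested (all names are made up,
and the real driver would wire in its own open/read/ioctl handlers and its
controller lookup), registering the character device from the existing
platform driver probe could look something like this:

#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/miscdevice.h>
#include <linux/fs.h>

static const struct file_operations ssam_dbg_fops = {
	.owner	= THIS_MODULE,
	/* .open / .read / .unlocked_ioctl handlers would go here */
};

static struct miscdevice ssam_dbg_misc = {
	.minor	= MISC_DYNAMIC_MINOR,
	.name	= "surface_aggregator_dbg",	/* made-up node name */
	.fops	= &ssam_dbg_fops,
};

static int ssam_dbg_probe(struct platform_device *pdev)
{
	/* controller lookup / ssam_client_bind() happens here, as today */
	return misc_register(&ssam_dbg_misc);
}

static int ssam_dbg_remove(struct platform_device *pdev)
{
	misc_deregister(&ssam_dbg_misc);
	return 0;
}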


linux-next: manual merge of the nvdimm tree with the vfs tree

2020-09-23 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the nvdimm tree got a conflict in:

  lib/iov_iter.c

between commit:

  e33ea6e5ba6a ("x86/uaccess: Use pointer masking to limit uaccess speculation")

from the vfs tree and commit:

  0a78de3d4b7b ("x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, 
kernel}()")

from the nvdimm tree.

I fixed it up (I just used the latter, but I suspect that more work is
needed) and can carry the fix as necessary. This is now fixed as far as
linux-next is concerned, but any non trivial conflicts should be mentioned
to your upstream maintainer when your tree is submitted for merging.
You may also want to consider cooperating with the maintainer of the
conflicting tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell




[PATCH] sound/soc/codecs: fix spelling typo in comments

2020-09-23 Thread Wang Qing
Modify the comment typo: "compliment" -> "complement".

Signed-off-by: Wang Qing 
---
 sound/soc/codecs/ak4458.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/codecs/ak4458.h b/sound/soc/codecs/ak4458.h
index f906215..e43144c
--- a/sound/soc/codecs/ak4458.h
+++ b/sound/soc/codecs/ak4458.h
@@ -49,7 +49,7 @@
 
 /* DIF21 0
  *  x  1 0 MSB justified  Figure 3 (default)
- *  x  1 1 I2S Compliment  Figure 4
+ *  x  1 1 I2S Complement  Figure 4
  */
 #define AK4458_DIF_SHIFT   1
 #define AK4458_DIF_MASKGENMASK(3, 1)
-- 
2.7.4



Re: [PATCH] rpadlpar_io:Add MODULE_DESCRIPTION entries to kernel modules

2020-09-23 Thread Oliver O'Halloran
On Thu, Sep 24, 2020 at 3:15 PM Mamatha Inamdar
 wrote:
>
> This patch adds a brief MODULE_DESCRIPTION to rpadlpar_io kernel modules
> (descriptions taken from Kconfig file)
>
> Signed-off-by: Mamatha Inamdar 
> ---
>  drivers/pci/hotplug/rpadlpar_core.c |1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/pci/hotplug/rpadlpar_core.c 
> b/drivers/pci/hotplug/rpadlpar_core.c
> index f979b70..bac65ed 100644
> --- a/drivers/pci/hotplug/rpadlpar_core.c
> +++ b/drivers/pci/hotplug/rpadlpar_core.c
> @@ -478,3 +478,4 @@ static void __exit rpadlpar_io_exit(void)
>  module_init(rpadlpar_io_init);
>  module_exit(rpadlpar_io_exit);
>  MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("RPA Dynamic Logical Partitioning driver for I/O slots");

RPA as a spec was superseded by PAPR in the early 2000s. Can we rename
this already?

The only potential problem I can see is scripts doing: modprobe
rpadlpar_io or similar

However, we should be able to fix that with a module alias.
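A minimal sketch of the module-alias approach (the new file name is hypothetical):

#include <linux/module.h>

/* in the renamed module's source, e.g. papr_dlpar_io.c (hypothetical name) */
MODULE_ALIAS("rpadlpar_io");	/* keeps "modprobe rpadlpar_io" working */
MODULE_DESCRIPTION("PAPR Dynamic Logical Partitioning driver for I/O slots");
MODULE_LICENSE("GPL");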

Oliver


[PATCH -next] MIPS: OCTEON: fix error - use 'ret' after remove it

2020-09-23 Thread Qinglang Miao
Variable 'ret' was removed in commit 0ee69c589ec ("MIPS: OCTEON:
use devm_platform_ioremap_resource") but is still being used in a
devm_release_mem_region() call, which is unneeded anyway. So remove
this line to fix the error.

Fixes: 0ee69c589ec ("MIPS: OCTEON: use devm_platform_ioremap_resource")
Reported-by: kernel test robot 
Signed-off-by: Qinglang Miao 
---
 arch/mips/cavium-octeon/octeon-usb.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/mips/cavium-octeon/octeon-usb.c 
b/arch/mips/cavium-octeon/octeon-usb.c
index 97f6dc31e1b4..987a94cbf3d0 100644
--- a/arch/mips/cavium-octeon/octeon-usb.c
+++ b/arch/mips/cavium-octeon/octeon-usb.c
@@ -534,8 +534,6 @@ static int __init dwc3_octeon_device_init(void)
dev_info(&pdev->dev, "clocks initialized.\n");
mutex_unlock(&dwc3_octeon_clocks_mutex);
devm_iounmap(&pdev->dev, base);
-   devm_release_mem_region(&pdev->dev, res->start,
-   resource_size(res));
}
} while (node != NULL);
 
-- 
2.23.0



Re: [PATCH] clk: rockchip: Initialize hw to error to avoid undefined behavior

2020-09-23 Thread Heiko Stübner
On Thursday, 24 September 2020 at 02:44:41 CEST, Stephen Boyd wrote:
> We can get down to this return value from ERR_CAST() without
> initializing hw. Set it to -ENOMEM so that we always return something
> sane.
> 
> Fixes the following smatch warning:
> 
> drivers/clk/rockchip/clk-half-divider.c:228 rockchip_clk_register_halfdiv() 
> error: uninitialized symbol 'hw'.
> drivers/clk/rockchip/clk-half-divider.c:228 rockchip_clk_register_halfdiv() 
> warn: passing zero to 'ERR_CAST'
> 
> Cc: Elaine Zhang 
> Cc: Heiko Stuebner 
> Fixes: 956060a52795 ("clk: rockchip: add support for half divider")
> Signed-off-by: Stephen Boyd 

Reviewed-by: Heiko Stuebner 


> ---
>  drivers/clk/rockchip/clk-half-divider.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/clk/rockchip/clk-half-divider.c 
> b/drivers/clk/rockchip/clk-half-divider.c
> index e97fd3dfbae7..ccd5c270c213 100644
> --- a/drivers/clk/rockchip/clk-half-divider.c
> +++ b/drivers/clk/rockchip/clk-half-divider.c
> @@ -166,7 +166,7 @@ struct clk *rockchip_clk_register_halfdiv(const char 
> *name,
> unsigned long flags,
> spinlock_t *lock)
>  {
> - struct clk_hw *hw;
> + struct clk_hw *hw = ERR_PTR(-ENOMEM);
>   struct clk_mux *mux = NULL;
>   struct clk_gate *gate = NULL;
>   struct clk_divider *div = NULL;
> 
> base-commit: ca52a47af60f791b08a540a8e14d8f5751ee63e9
> 






Re: [PATCH v20 00/15] Introduce Data Access MONitor (DAMON)

2020-09-23 Thread SeongJae Park
On Wed, 23 Sep 2020 10:04:57 -0700 Shakeel Butt  wrote:

> On Mon, Aug 17, 2020 at 3:52 AM SeongJae Park  wrote:
> >
> > From: SeongJae Park 
> >
> > Changes from Previous Version
> > =
> >
> > - Place 'CREATE_TRACE_POINTS' after '#include' statements (Steven Rostedt)
> > - Support large record file (Alkaid)
> > - Place 'put_pid()' of virtual monitoring targets in 'cleanup' callback
> > - Avoid conflict between concurrent DAMON users
> > - Update evaluation result document
> >
> > Introduction
> > 
> >
> > DAMON is a data access monitoring framework subsystem for the Linux kernel.
> > The core mechanisms of DAMON called 'region based sampling' and 'adaptive
> > regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
> > patchset for the detail) make it
> >
> >  - accurate (The monitored information is useful for DRAM level memory
> >    management. It might not be appropriate for Cache-level accuracy, though.),
> >  - light-weight (The monitoring overhead is low enough to be applied online
> >while making no impact on the performance of the target workloads.), and
> >  - scalable (the upper-bound of the instrumentation overhead is controllable
> >regardless of the size of target workloads.).
> >
> > Using this framework, therefore, the kernel's core memory management 
> > mechanisms
> > such as reclamation and THP can be optimized for better memory management.  
> > The
> > experimental memory management optimization works that incur high
> > instrumentation overhead will be able to have another try.  In user space,
> > meanwhile, users who have some special workloads will be able to write
> > personalized tools or applications for deeper understanding and specialized
> > optimizations of their systems.
> >
> > Evaluations
> > ===
> >
> > We evaluated DAMON's overhead, monitoring quality and usefulness using 25
> > realistic workloads on my QEMU/KVM based virtual machine running a kernel 
> > that
> > v20 DAMON patchset is applied.
> >
> > DAMON is lightweight.  It increases system memory usage by 0.12% and slows
> > target workloads down by 1.39%.
> >
> > DAMON is accurate and useful for memory management optimizations.  An
> > experimental DAMON-based operation scheme for THP, 'ethp', removes 88.16% of
> > THP memory overheads while preserving 88.73% of THP speedup.  Another
> > experimental DAMON-based 'proactive reclamation' implementation, 'prcl',
> > reduces 91.34% of resident sets and 25.59% of system memory footprint 
> > while
> > incurring only 1.58% runtime overhead in the best case (parsec3/freqmine).
> >
> > NOTE that the experimental THP optimization and proactive reclamation are 
> > not
> > for production but just only for proof of concepts.
> >
> > Please refer to the official document[1] or "Documentation/admin-guide/mm: 
> > Add
> > a document for DAMON" patch in this patchset for detailed evaluation setup 
> > and
> > results.
> >
> > [1] 
> > https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
> >
> 
> 
> Hi SeongJae,
> 
> Sorry for the late response. I will start looking at this series in
> more detail in the next couple of weeks.

Thank you so much!

> I have a couple of high level comments for now.
> 
> 1) Please explain in the cover letter why someone should prefer to use
> DAMON instead of Page Idle Tracking.

In short, because DAMON provides an overhead-quality tradeoff and allows use of
various monitoring primitives other than only the PG_Idle and PTE Accessed bits.
I will explain this in detail in the cover letter of the next version of this
patchset.

> 
> 2) Also add what features Page Idle Tracking provides which the first
> version of DAMON does not provide (like page level tracking, physical
> or unmapped memory tracking e.t.c) and tell if you plan to add such
> features to DAMON in future. Basically giving reasons to not block the
> current version of DAMON until it is feature-rich.

In short, DAMON will provide only virtual address space monitoring by default,
but I believe the lack of those features is not a blocker because DAMON is
expandable to support them.  Also, I will make DAMON co-exist with Idle Page
Tracking again.  I will post
another RFC patchset for this soon.  Again, I will describe this in detail in
the next version of the cover letter.

> 
> 3) I think in the first mergeable version of DAMON, I would prefer to
> have support to control (create/delete/account) the DAMON context. You
> already have a RFC series on it. I would like to have that series part
> of this one.

Ok, I will apply it here.


Thanks,
SeongJae Park


linux-next: manual merge of the nvdimm tree with the vfs tree

2020-09-23 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the nvdimm tree got conflicts in:

  arch/x86/include/asm/uaccess_64.h

between commit:

  e33ea6e5ba6a ("x86/uaccess: Use pointer masking to limit uaccess speculation")

from the vfs tree and commit:

  0a78de3d4b7b ("x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, 
kernel}()")

from the nvdimm tree.

I fixed it up (the latter just removed copy_to_user_mcsafe from this file,
so I did that) and can carry the fix as necessary. This is now fixed as
far as linux-next is concerned, but any non trivial conflicts should be
mentioned to your upstream maintainer when your tree is submitted for
merging.  You may also want to consider cooperating with the maintainer
of the conflicting tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell




Re: [PATCH 2/2] printk: Make the console flush configurable in hotplug path

2020-09-23 Thread Sergey Senozhatsky
On (20/09/23 17:08), Prasad Sodagudi wrote:
> From: Mohammed Khajapasha 
> 
> The thread which initiates the hot plug can get scheduled
> out while trying to acquire the console lock,
> thus increasing the hot plug latency. This option
> allows selectively disabling the console flush and
> in turn reducing the hot plug latency.

It can schedule out or get preempted pretty much anywhere at any
time. printk->console_lock is not special in this regard. What am
I missing?

-ss


Re: [RFC PATCH 0/3] KVM: Introduce "VM bugged" concept

2020-09-23 Thread Christian Borntraeger



On 24.09.20 00:45, Sean Christopherson wrote:
> This series introduces a concept we've discussed a few times in x86 land.
> The crux of the problem is that x86 has a few cases where KVM could
> theoretically encounter a software or hardware bug deep in a call stack
> without any sane way to propagate the error out to userspace.
> 
> Another use case would be for scenarios where letting the VM live will
> do more harm than good, e.g. we've been using KVM_BUG_ON for early TDX
> enabling as botching anything related to secure paging all but guarantees
> there will be a flood of WARNs and error messages because lower level PTE
> operations will fail if an upper level operation failed.
> 
> The basic idea is to WARN_ONCE if a bug is encountered, kick all vCPUs out
> to userspace, and mark the VM as bugged so that no ioctls() can be issued
> on the VM or its devices/vCPUs.
> 
> RFC as I've done nowhere near enough testing to verify that rejecting the
> ioctls(), evicting running vCPUs, etc... works as intended.

I like the idea. Especially when we add a common "understanding" in QEMU
across all platforms. That would then even allow an error to be propagated.
> 
> Sean Christopherson (3):
>   KVM: Export kvm_make_all_cpus_request() for use in marking VMs as
> bugged
>   KVM: Add infrastructure and macro to mark VM as bugged
>   KVM: x86: Use KVM_BUG/KVM_BUG_ON to handle bugs that are fatal to the
> VM
> 
>  arch/x86/kvm/svm/svm.c   |  2 +-
>  arch/x86/kvm/vmx/vmx.c   | 23 
>  arch/x86/kvm/x86.c   |  4 
>  include/linux/kvm_host.h | 45 
>  virt/kvm/kvm_main.c  | 11 +-
>  5 files changed, 61 insertions(+), 24 deletions(-)
> 
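As a rough illustration of the "VM bugged" concept described in the cover letter
above, the helper could look something like the sketch below (hypothetical names,
not the actual patch contents):

/*
 * Sketch only: the vm_bugged field and KVM_REQ_VM_BUGGED request are
 * hypothetical names, not existing KVM APIs.
 */
static inline void kvm_vm_bugged(struct kvm *kvm)
{
	kvm->vm_bugged = true;
	kvm_make_all_cpus_request(kvm, KVM_REQ_VM_BUGGED);
}

#define KVM_BUG_ON(cond, kvm)					\
({								\
	bool __ret = !!(cond);					\
								\
	if (WARN_ON_ONCE(__ret && !(kvm)->vm_bugged))		\
		kvm_vm_bugged(kvm);				\
	unlikely(__ret);					\
})

/* Each ioctl entry point would then bail out early, e.g.:
 *	if (kvm->vm_bugged)
 *		return -EIO;
 */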


[PATCH] power: fix spelling typo

2020-09-23 Thread Wang Qing
Modify the comment typo: "compliment" -> "complement".

Signed-off-by: Wang Qing 
---
 drivers/power/supply/ab8500_fg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/power/supply/ab8500_fg.c b/drivers/power/supply/ab8500_fg.c
index 7eec415..592a73d
--- a/drivers/power/supply/ab8500_fg.c
+++ b/drivers/power/supply/ab8500_fg.c
@@ -653,7 +653,7 @@ int ab8500_fg_inst_curr_finalize(struct ab8500_fg *di, int 
*res)
 
/*
 * negative value for Discharging
-* convert 2's compliment into decimal
+* convert 2's complement into decimal
 */
if (high & 0x10)
val = (low | (high << 8) | 0xE000);
@@ -781,7 +781,7 @@ static void ab8500_fg_acc_cur_work(struct work_struct *work)
if (ret < 0)
goto exit;
 
-   /* Check for sign bit in case of negative value, 2's compliment */
+   /* Check for sign bit in case of negative value, 2's complement */
if (high & 0x10)
val = (low | (med << 8) | (high << 16) | 0xFFE0);
else
-- 
2.7.4



RE: [EXT] Re: [PATCH] net: fec: Keep device numbering consistent with datasheet

2020-09-23 Thread Andy Duan
From: David Miller  Sent: Thursday, September 24, 2020 
4:32 AM
> From: Stefan Riedmueller 
> Date: Wed, 23 Sep 2020 16:25:28 +0200
> 
> > From: Christian Hemp 
> >
> > Make use of device tree alias for device enumeration to keep the
> > device order consistent with the naming in the datasheet.
> >
> > Otherwise for the i.MX 6UL/ULL the ENET1 interface is enumerated as
> > eth1 and ENET2 as eth0.
> >
> > Signed-off-by: Christian Hemp 
> > Signed-off-by: Stefan Riedmueller 
> 
> Device naming and ordering for networking devices was never, ever,
> guaranteed.
> 
> Use udev or similar.
> 
> > @@ -3691,6 +3692,10 @@ fec_probe(struct platform_device *pdev)
> >
> >   ndev->max_mtu = PKT_MAXBUF_SIZE - ETH_HLEN - ETH_FCS_LEN;
> >
> > + eth_id = of_alias_get_id(pdev->dev.of_node, "ethernet");
> > + if (eth_id >= 0)
> > + sprintf(ndev->name, "eth%d", eth_id);
> 
> You can't ever just write into ndev->name, what if another networking device 
> is
> already using that name?
> 
> This change is incorrect on many levels.

David is correct.

For example, on imx8DXL ethernet0 is EQOS TSN and ethernet1 is FEC.
EQOS TSN is another driver and is registered early, so its dev->name is eth0.
So the patch would bring a conflict in such a case.

Andy


Re: [PATCH v2 2/2] USB: misc: Add onboard_usb_hub driver

2020-09-23 Thread Greg Kroah-Hartman
On Wed, Sep 23, 2020 at 03:25:45PM -0700, Matthias Kaehlcke wrote:
> On Mon, Sep 21, 2020 at 06:18:37PM -0700, Matthias Kaehlcke wrote:
> > On Sun, Sep 20, 2020 at 04:17:20PM +0200, Greg Kroah-Hartman wrote:
> > > On Thu, Sep 17, 2020 at 11:46:22AM -0700, Matthias Kaehlcke wrote:
> > > >
> > > > ...
> > > >
> > > > +static int __init onboard_hub_init(void)
> > > > +{
> > > > +   int rc;
> > > > +
> > > > +   rc = platform_driver_register(&onboard_hub_driver);
> > > > +   if (rc)
> > > > +   return rc;
> > > > +
> > > > +   return usb_register_device_driver(&onboard_hub_usbdev_driver, 
> > > > THIS_MODULE);
> > > 
> > > No unwinding of the platform driver register if this fails?
> > 
> > Right, will add unwinding.
> > 
> > > And THIS_MODULE should not be needed, did we get the api wrong here?
> > 
> > It seems you suggest to use usb_register() instead, SGTM
> 
> Actually usb_register() is for registering a struct usb_driver, however
> this is a struct usb_device_driver, there doesn't seem to be a
> registration function/macro that doesn't require THIS_MODULE. Please
> provide a pointer if I'm wrong.

You are correct, I was just making a meta-comment that we got this api
wrong when adding it to the kernel and need to fix it up so that you do
not have to manually pass in the module owner.  i.e. make it much like
usb_register() does.
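For reference, usb_register() hides the owner behind a macro in <linux/usb.h>;
an analogous wrapper for device drivers might look like the sketch below (the
second macro is hypothetical, not an existing API):

/* existing helper in include/linux/usb.h */
#define usb_register(driver) \
	usb_register_driver(driver, THIS_MODULE, KBUILD_MODNAME)

/* hypothetical analogue for struct usb_device_driver */
#define usb_register_device(driver) \
	usb_register_device_driver(driver, THIS_MODULE)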

thanks,

greg k-h


[PATCH] null_blk: synchronization fix for zoned device

2020-09-23 Thread Kanchan Joshi
Parallel write, read, and zone-mgmt operations accessing/altering zone
state and the write pointer may race with each other. Avoid the situation
by using a new spinlock for the zoned device.
Concurrent zone appends (on a zone) returning the same write pointer is
another issue avoided by this lock.

Signed-off-by: Kanchan Joshi 
---
 drivers/block/null_blk.h   |  1 +
 drivers/block/null_blk_zoned.c | 84 +++---
 2 files changed, 58 insertions(+), 27 deletions(-)

diff --git a/drivers/block/null_blk.h b/drivers/block/null_blk.h
index daed4a9c3436..b3f4d62e7c38 100644
--- a/drivers/block/null_blk.h
+++ b/drivers/block/null_blk.h
@@ -44,6 +44,7 @@ struct nullb_device {
unsigned int nr_zones;
struct blk_zone *zones;
sector_t zone_size_sects;
+   spinlock_t zlock;
 
unsigned long size; /* device size in MB */
unsigned long completion_nsec; /* time in ns to complete a request */
diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c
index 3d25c9ad2383..04fbf267703a 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk_zoned.c
@@ -45,6 +45,7 @@ int null_init_zoned_dev(struct nullb_device *dev, struct 
request_queue *q)
if (!dev->zones)
return -ENOMEM;
 
+   spin_lock_init(&dev->zlock);
if (dev->zone_nr_conv >= dev->nr_zones) {
dev->zone_nr_conv = dev->nr_zones - 1;
pr_info("changed the number of conventional zones to %u",
@@ -124,6 +125,7 @@ int null_report_zones(struct gendisk *disk, sector_t sector,
nr_zones = min(nr_zones, dev->nr_zones - first_zone);
trace_nullb_report_zones(nullb, nr_zones);
 
+   spin_lock_irq(&dev->zlock);
for (i = 0; i < nr_zones; i++) {
/*
 * Stacked DM target drivers will remap the zone information by
@@ -134,10 +136,13 @@ int null_report_zones(struct gendisk *disk, sector_t 
sector,
memcpy(&zone, &dev->zones[first_zone + i],
   sizeof(struct blk_zone));
error = cb(&zone, i, data);
-   if (error)
+   if (error) {
+   spin_unlock_irq(&dev->zlock);
return error;
+   }
}
 
+   spin_unlock_irq(&dev->zlock);
return nr_zones;
 }
 
@@ -147,16 +152,24 @@ size_t null_zone_valid_read_len(struct nullb *nullb,
struct nullb_device *dev = nullb->dev;
struct blk_zone *zone = &dev->zones[null_zone_no(dev, sector)];
unsigned int nr_sectors = len >> SECTOR_SHIFT;
+   size_t ret = 0;
 
+   spin_lock_irq(&dev->zlock);
/* Read must be below the write pointer position */
if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL ||
-   sector + nr_sectors <= zone->wp)
-   return len;
+   sector + nr_sectors <= zone->wp) {
+   ret = len;
+   goto out_unlock;
+   }
 
if (sector > zone->wp)
-   return 0;
+   goto out_unlock;
+
+   ret = (zone->wp - sector) << SECTOR_SHIFT;
 
-   return (zone->wp - sector) << SECTOR_SHIFT;
+out_unlock:
+   spin_unlock_irq(&dev->zlock);
+   return ret;
 }
 
 static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
@@ -165,17 +178,19 @@ static blk_status_t null_zone_write(struct nullb_cmd 
*cmd, sector_t sector,
struct nullb_device *dev = cmd->nq->dev;
unsigned int zno = null_zone_no(dev, sector);
struct blk_zone *zone = &dev->zones[zno];
-   blk_status_t ret;
+   blk_status_t ret = BLK_STS_OK;
 
trace_nullb_zone_op(cmd, zno, zone->cond);
 
if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
return null_process_cmd(cmd, REQ_OP_WRITE, sector, nr_sectors);
 
+   spin_lock_irq(&dev->zlock);
switch (zone->cond) {
case BLK_ZONE_COND_FULL:
/* Cannot write to a full zone */
-   return BLK_STS_IOERR;
+   ret = BLK_STS_IOERR;
+   break;
case BLK_ZONE_COND_EMPTY:
case BLK_ZONE_COND_IMP_OPEN:
case BLK_ZONE_COND_EXP_OPEN:
@@ -193,27 +208,33 @@ static blk_status_t null_zone_write(struct nullb_cmd 
*cmd, sector_t sector,
else
cmd->rq->__sector = sector;
} else if (sector != zone->wp) {
-   return BLK_STS_IOERR;
+   ret = BLK_STS_IOERR;
+   break;
}
 
-   if (zone->wp + nr_sectors > zone->start + zone->capacity)
-   return BLK_STS_IOERR;
+   if (zone->wp + nr_sectors > zone->start + zone->capacity) {
+   ret = BLK_STS_IOERR;
+   break;
+   }
 
if (zone->cond != BLK_ZONE_COND_EXP_OPEN)
zone->cond = BLK_ZONE_COND_IMP_OPEN;
 
ret = null

Re: [PATCH 5/5] perf test: Add expand cgroup event test

2020-09-23 Thread Namhyung Kim
On Thu, Sep 24, 2020 at 7:36 AM Ian Rogers  wrote:
>
> On Tue, Sep 22, 2020 at 7:00 PM Namhyung Kim  wrote:
> >
> > It'll expand given events for cgroups A, B and C.
> >
> >   $ ./perf test -v expansion
> >   69: Event expansion for cgroups  :
> >   --- start ---
> >   test child forked, pid 983140
> >   metric expr 1 / IPC for CPI
> >   metric expr instructions / cycles for IPC
> >   found event instructions
> >   found event cycles
> >   adding {instructions,cycles}:W
> >   copying metric event for cgroup 'A': instructions (idx=0)
> >   copying metric event for cgroup 'B': instructions (idx=0)
> >   copying metric event for cgroup 'C': instructions (idx=0)
> >   test child finished with 0
> >    end 
> >   Event expansion for cgroups: Ok
> >
> > Cc: John Garry 
> > Signed-off-by: Namhyung Kim 
> > ---
[SNIP]
> Should this be #ifdef HAVE_LIBPFM ?

Do you mean the below function?
Actually I thought about it and ended up not using it.
Please see below..

>
> > +static int expand_libpfm_events(void)
> > +{
> > +   int ret;
> > +   struct evlist *evlist;
> > +   struct rblist metric_events;
> > +   const char event_str[] = "UNHALTED_CORE_CYCLES";
> > +   struct option opt = {
> > +   .value = &evlist,
> > +   };
> > +
> > +   symbol_conf.event_group = true;
> > +
> > +   evlist = evlist__new();
> > +   TEST_ASSERT_VAL("failed to get evlist", evlist);
> > +
> > +   ret = parse_libpfm_events_option(&opt, event_str, 0);
> > +   if (ret < 0) {
> > +   pr_debug("failed to parse libpfm event '%s', err %d\n",
> > +event_str, ret);
> > +   goto out;
> > +   }
> > +   if (perf_evlist__empty(evlist)) {
> > +   pr_debug("libpfm was not enabled\n");
> > +   goto out;
> > +   }

That's handled here.  The parse_libpfm_events_option()
will return 0 if HAVE_LIBPFM is not defined so evlist will be empty.
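(Roughly, the HAVE_LIBPFM fallback stub looks like the sketch below; quoted from
memory, so check tools/perf/util/pfm.h for the exact definition.)

#ifdef HAVE_LIBPFM
int parse_libpfm_events_option(const struct option *opt, const char *str,
			       int unset);
#else
static inline int parse_libpfm_events_option(const struct option *opt __maybe_unused,
					     const char *str __maybe_unused,
					     int unset __maybe_unused)
{
	return 0;	/* no events are added, so the evlist stays empty */
}
#endif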

Thanks
Namhyung

> > +
> > +   rblist__init(&metric_events);
> > +   ret = test_expand_events(evlist, &metric_events);
> > +out:
> > +   evlist__delete(evlist);
> > +   return ret;
> > +}
> > +


[PATCH] arm64: dts: rockchip: disable USB type-c DisplayPort

2020-09-23 Thread Jian-Hong Pan
The cdn-dp sub-driver fails to probe the device on the PINEBOOK Pro.

kernel: cdn-dp fec0.dp: [drm:cdn_dp_probe [rockchipdrm]] *ERROR* missing 
extcon or phy
kernel: cdn-dp: probe of fec0.dp failed with error -22

Then, the device halts all of the DRM related device jobs. For example,
the operations: vop_component_ops, vop_component_ops and
rockchip_dp_component_ops cannot be bound to corresponding devices. So,
Xorg cannot find the correct DRM device.

The USB type-C DisplayPort does not work for now. So, disable the
DisplayPort node until the type-C phy work has been done.

Link: https://patchwork.kernel.org/patch/11794141/#23639877
Signed-off-by: Jian-Hong Pan 
---
 arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts 
b/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts
index 06d48338c836..d624c595c533 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts
@@ -380,7 +380,7 @@ mains_charger: dc-charger {
 };
 
 &cdn_dp {
-   status = "okay";
+   status = "disabled";
 };
 
 &cpu_b0 {
-- 
2.28.0



[PATCH] USB: serial: pl2303: add device-id for HP GC device

2020-09-23 Thread Scott Chen
This adds a device ID for the HP LD381, which is a PL2303GC-based device.

Signed-off-by: Scott Chen 
---
 drivers/usb/serial/pl2303.c | 1 +
 drivers/usb/serial/pl2303.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/usb/serial/pl2303.c b/drivers/usb/serial/pl2303.c
index 048452d8a4a4..be8067017eaa 100644
--- a/drivers/usb/serial/pl2303.c
+++ b/drivers/usb/serial/pl2303.c
@@ -100,6 +100,7 @@ static const struct usb_device_id id_table[] = {
{ USB_DEVICE(HP_VENDOR_ID, HP_LD220_PRODUCT_ID) },
{ USB_DEVICE(HP_VENDOR_ID, HP_LD220TA_PRODUCT_ID) },
{ USB_DEVICE(HP_VENDOR_ID, HP_LD381_PRODUCT_ID) },
+   { USB_DEVICE(HP_VENDOR_ID, HP_LD381GC_PRODUCT_ID) },
{ USB_DEVICE(HP_VENDOR_ID, HP_LD960_PRODUCT_ID) },
{ USB_DEVICE(HP_VENDOR_ID, HP_LD960TA_PRODUCT_ID) },
{ USB_DEVICE(HP_VENDOR_ID, HP_LCM220_PRODUCT_ID) },
diff --git a/drivers/usb/serial/pl2303.h b/drivers/usb/serial/pl2303.h
index 7d3090ee7e0c..b0f399a8c628 100644
--- a/drivers/usb/serial/pl2303.h
+++ b/drivers/usb/serial/pl2303.h
@@ -127,6 +127,7 @@
 
 /* Hewlett-Packard POS Pole Displays */
 #define HP_VENDOR_ID   0x03f0
+#define HP_LD381GC_PRODUCT_ID   0x0183
 #define HP_LM920_PRODUCT_ID0x026b
 #define HP_TD620_PRODUCT_ID0x0956
 #define HP_LD960_PRODUCT_ID0x0b39
-- 
2.17.1



[PATCH] rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config

2020-09-23 Thread Neeraj Upadhyay
Clarify the "x" in rcuox/N naming in RCU_NOCB_CPU config
description.

Signed-off-by: Neeraj Upadhyay 
---
 kernel/rcu/Kconfig | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index b71e21f..5b22747 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -227,11 +227,12 @@ config RCU_NOCB_CPU
  specified at boot time by the rcu_nocbs parameter.  For each
  such CPU, a kthread ("rcuox/N") will be created to invoke
  callbacks, where the "N" is the CPU being offloaded, and where
- the "p" for RCU-preempt (PREEMPTION kernels) and "s" for RCU-sched
- (!PREEMPTION kernels).  Nothing prevents this kthread from running
- on the specified CPUs, but (1) the kthreads may be preempted
- between each callback, and (2) affinity or cgroups can be used
- to force the kthreads to run on whatever set of CPUs is desired.
+ the "x" is "p" for RCU-preempt (PREEMPTION kernels) and "s" for
+ RCU-sched (!PREEMPTION kernels).  Nothing prevents this kthread
+ from running on the specified CPUs, but (1) the kthreads may be
+ preempted between each callback, and (2) affinity or cgroups can
+ be used to force the kthreads to run on whatever set of CPUs is
+ desired.
 
  Say Y here if you want to help to debug reduced OS jitter.
  Say N here if you are unsure.
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH 2/2] printk: Make the console flush configurable in hotplug path

2020-09-23 Thread Greg KH
On Wed, Sep 23, 2020 at 05:08:32PM -0700, Prasad Sodagudi wrote:
> From: Mohammed Khajapasha 
> 
> The thread which initiates the hot plug can get scheduled
> out while trying to acquire the console lock,
> thus increasing the hot plug latency. This option
> allows selectively disabling the console flush and
> in turn reducing the hot plug latency.
> 
> Signed-off-by: Mohammed Khajapasha 
> Signed-off-by: Prasad Sodagudi 
> ---
>  init/Kconfig   | 10 ++
>  kernel/printk/printk.c | 10 --
>  2 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index d6a0b31..9ce39ba 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -699,6 +699,16 @@ config LOG_BUF_SHIFT
>13 =>  8 KB
>12 =>  4 KB
>  
> +config CONSOLE_FLUSH_ON_HOTPLUG
> + bool "Enable console flush configurable in hot plug code path"
> + depends on HOTPLUG_CPU
> + def_bool n

n is the default, no need to list it.

> + help
> + In cpu hot plug path console lock acquire and release causes the
> + console to flush. If console lock is not free hot plug latency
> + increases. So make console flush configurable in hot plug path
> + and default disabled to help in cpu hot plug latencies.

Why would you not want this option?

Why isn't this just a bugfix?

> +
>  config LOG_CPU_MAX_BUF_SHIFT
>   int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
>   depends on SMP
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 9b75f6b..f02d3ef 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2283,6 +2283,8 @@ void resume_console(void)
>   console_unlock();
>  }
>  
> +#ifdef CONFIG_CONSOLE_FLUSH_ON_HOTPLUG
> +
>  /**
>   * console_cpu_notify - print deferred console messages after CPU hotplug
>   * @cpu: unused
> @@ -2302,6 +2304,8 @@ static int console_cpu_notify(unsigned int cpu)
>   return 0;
>  }
>  
> +#endif
> +
>  /**
>   * console_lock - lock the console system for exclusive use.
>   *
> @@ -2974,7 +2978,7 @@ void __init console_init(void)
>  static int __init printk_late_init(void)
>  {
>   struct console *con;
> - int ret;
> + int ret = 0;
>  
>   for_each_console(con) {
>   if (!(con->flags & CON_BOOT))
> @@ -2996,13 +3000,15 @@ static int __init printk_late_init(void)
>   unregister_console(con);
>   }
>   }
> +#ifdef CONFIG_CONSOLE_FLUSH_ON_HOTPLUG

#ifdef in .c code is a mess to maintain.

>   ret = cpuhp_setup_state_nocalls(CPUHP_PRINTK_DEAD, "printk:dead", NULL,
>   console_cpu_notify);
>   WARN_ON(ret < 0);
>   ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "printk:online",
>   console_cpu_notify, NULL);
>   WARN_ON(ret < 0);
> - return 0;
> +#endif

What happens if we don't make these calls entirely?  Why not just remove
them as who wants extra latency for their system?

thanks,

greg k-h


Re: [PATCH 1/2] genirq/cpuhotplug: Reduce logging level for couple of prints

2020-09-23 Thread Greg KH
On Wed, Sep 23, 2020 at 05:08:31PM -0700, Prasad Sodagudi wrote:
> During cpu hot plug stress testing, a couple of messages
> continuously flooding the console cause timer
> migration delays. Delayed timer migration from the hot-plugged
> core causes device instability with the watchdog. So reduce
> the log level for a couple of prints in the cpu hot plug flow.
> 
> Signed-off-by: Prasad Sodagudi 
> ---
>  arch/arm64/kernel/smp.c | 2 +-
>  kernel/irq/cpuhotplug.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 355ee9e..08da6e3 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -338,7 +338,7 @@ void __cpu_die(unsigned int cpu)
>   pr_crit("CPU%u: cpu didn't die\n", cpu);
>   return;
>   }
> - pr_notice("CPU%u: shutdown\n", cpu);
> + pr_info("CPU%u: shutdown\n", cpu);
>  
>   /*
>* Now that the dying CPU is beyond the point of no return w.r.t.
> diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
> index 02236b1..82802e0 100644
> --- a/kernel/irq/cpuhotplug.c
> +++ b/kernel/irq/cpuhotplug.c
> @@ -42,7 +42,7 @@ static inline bool irq_needs_fixup(struct irq_data *d)
>* If this happens then there was a missed IRQ fixup at some
>* point. Warn about it and enforce fixup.
>*/
> - pr_warn("Eff. affinity %*pbl of IRQ %u contains only offline 
> CPUs after offlining CPU %u\n",
> + pr_info("Eff. affinity %*pbl of IRQ %u contains only offline 
> CPUs after offlining CPU %u\n",
>   cpumask_pr_args(m), d->irq, cpu);
>   return true;
>   }
> @@ -166,7 +166,7 @@ void irq_migrate_all_off_this_cpu(void)
>   raw_spin_unlock(&desc->lock);
>  
>   if (affinity_broken) {
> - pr_warn_ratelimited("IRQ %u: no longer affine to 
> CPU%u\n",
> + pr_info_ratelimited("IRQ %u: no longer affine to 
> CPU%u\n",
>   irq, smp_processor_id());
>   }
>   }
> -- 
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 

Reviewed-by: Greg Kroah-Hartman 


Re: [PATCH] KVM: Enable hardware before doing arch VM initialization

2020-09-23 Thread Huacai Chen
Hi, Sean,

On Thu, Sep 24, 2020 at 3:00 AM Sean Christopherson
 wrote:
>
> Swap the order of hardware_enable_all() and kvm_arch_init_vm() to
> accommodate Intel's Trust Domain Extension (TDX), which needs VMX to be
> fully enabled during VM init in order to make SEAMCALLs.
>
> This also provides consistent ordering between kvm_create_vm() and
> kvm_destroy_vm() with respect to calling kvm_arch_destroy_vm() and
> hardware_disable_all().
Do you mean that hardware_enable_all() enables VMX, kvm_arch_init_vm()
enables TDX, and TDX depends on VMX being enabled first? If so, can TDX
also be enabled in hardware_enable_all()?

The swap seems not to affect MIPS, but I observed a fact:
kvm_arch_hardware_enable() is not only called from
hardware_enable_all(), but is also called from kvm_starting_cpu(). Even
if you swap the order, newly started CPUs do not have VMX enabled before
kvm_arch_init_vm(). (Maybe I am wrong because I'm not familiar with
VMX/TDX.)

Huacai
>
> Cc: Marc Zyngier 
> Cc: James Morse 
> Cc: Julien Thierry 
> Cc: Suzuki K Poulose 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: Huacai Chen 
> Cc: Aleksandar Markovic 
> Cc: linux-m...@vger.kernel.org
> Cc: Paul Mackerras 
> Cc: kvm-...@vger.kernel.org
> Cc: Christian Borntraeger 
> Cc: Janosch Frank 
> Cc: David Hildenbrand 
> Cc: Cornelia Huck 
> Cc: Claudio Imbrenda 
> Cc: Vitaly Kuznetsov 
> Cc: Wanpeng Li 
> Cc: Jim Mattson 
> Cc: Joerg Roedel 
> Signed-off-by: Sean Christopherson 
> ---
>
> Obviously not required until the TDX series comes along, but IMO KVM
> should be consistent with respect to enabling and disabling virt support
> in hardware.
>
> Tested only on Intel hardware.  Unless I missed something, this only
> affects x86, Arm and MIPS as hardware enabling is a nop for s390 and PPC.
> Arm looks safe (based on my mostly clueless reading of the code), but I
> have no idea if this will cause problem for MIPS, which is doing all kinds
> of things in hardware_enable() that I don't pretend to fully understand.
>
>  virt/kvm/kvm_main.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index cf88233b819a..58fa19bcfc90 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -766,7 +766,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
> struct kvm_memslots *slots = kvm_alloc_memslots();
>
> if (!slots)
> -   goto out_err_no_arch_destroy_vm;
> +   goto out_err_no_disable;
> /* Generations must be different for each address space. */
> slots->generation = i;
> rcu_assign_pointer(kvm->memslots[i], slots);
> @@ -776,19 +776,19 @@ static struct kvm *kvm_create_vm(unsigned long type)
> rcu_assign_pointer(kvm->buses[i],
> kzalloc(sizeof(struct kvm_io_bus), 
> GFP_KERNEL_ACCOUNT));
> if (!kvm->buses[i])
> -   goto out_err_no_arch_destroy_vm;
> +   goto out_err_no_disable;
> }
>
> kvm->max_halt_poll_ns = halt_poll_ns;
>
> -   r = kvm_arch_init_vm(kvm, type);
> -   if (r)
> -   goto out_err_no_arch_destroy_vm;
> -
> r = hardware_enable_all();
> if (r)
> goto out_err_no_disable;
>
> +   r = kvm_arch_init_vm(kvm, type);
> +   if (r)
> +   goto out_err_no_arch_destroy_vm;
> +
>  #ifdef CONFIG_HAVE_KVM_IRQFD
> INIT_HLIST_HEAD(&kvm->irq_ack_notifier_list);
>  #endif
> @@ -815,10 +815,10 @@ static struct kvm *kvm_create_vm(unsigned long type)
> mmu_notifier_unregister(&kvm->mmu_notifier, current->mm);
>  #endif
>  out_err_no_mmu_notifier:
> -   hardware_disable_all();
> -out_err_no_disable:
> kvm_arch_destroy_vm(kvm);
>  out_err_no_arch_destroy_vm:
> +   hardware_disable_all();
> +out_err_no_disable:
> WARN_ON_ONCE(!refcount_dec_and_test(&kvm->users_count));
> for (i = 0; i < KVM_NR_BUSES; i++)
> kfree(kvm_get_bus(kvm, i));
> --
> 2.28.0
>


Re: [PATCH] ARM: dts: document pinctrl-single,pins when #pinctrl-cells = 2

2020-09-23 Thread Trent Piepho
On Wed, Sep 23, 2020 at 11:06 PM Tony Lindgren  wrote:
>
> * Trent Piepho  [200924 05:49]:
> > On Wed, Sep 23, 2020 at 10:43 PM Tony Lindgren  wrote:
> > >
> > > * Trent Piepho  [200924 01:34]:
> > > > On Tue, Sep 22, 2020 at 11:57 PM Tony Lindgren  wrote:
> > > > >
> > > > > Also FYI, folks have also complained for a long time that the 
> > > > > pinctrl-single
> > > > > binding mixes mux and conf values while they should be handled 
> > > > > separately.
> > > > >
> > > >
> > > > Instead of combining two fields when the dts is generated they are now
> > > > combined when the pinctrl-single driver reads the dts.  Other than
> > > > this detail, the result is the same.  The board dts source is the
> > > > same.  The value programmed into the pinctrl register is the same.
> > > > There is no mechanism currently that can alter that value in any way.
> > > >
> > > > What does combining them later allow that is not possible now?
> > >
> > > It now allows further driver changes to manage conf and mux separately :)
> >
> > The pinctrl-single driver?  How will that work with boards that are
> > not am335x and don't use conf and mux fields in the same manner as
> > am335x?
>
> For those cases we still have #pinctrl-cells = <1>.

If pinctrl-single is going to be am335x specific, then shouldn't it
be a different compatible string?

Are the driver changes something that cannot be done with the
pinconf-single properties?  They all include a mask.


Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference

2020-09-23 Thread Rafael Aquini
On Thu, Sep 24, 2020 at 11:51:17AM +0800, Huang, Ying wrote:
> Rafael Aquini  writes:
> > The bug here is quite simple: split_swap_cluster() misses checking for
> > lock_cluster() returning NULL before committing to change 
> > cluster_info->flags.
> 
> I don't think so.  We shouldn't run into this situation firstly.  So the
> "fix" hides the real bug instead of fixing it.  Just like we call
> VM_BUG_ON_PAGE(!PageLocked(head), head) in split_huge_page_to_list()
> instead of returning if !PageLocked(head) silently.
>

Not the same thing, obviously, as you are going for an apples-to-carrots
comparison, but since you mentioned:

split_huge_page_to_list() asserts (in debug builds) *page is locked, 
and later checks if *head bears the SwapCache flag. 
deferred_split_scan(), OTOH, doesn't hand down the compound head locked, 
but the 2nd page in the group instead. 
This doesn't necessarily mean it's a problem, though, but it might help
in hitting the issue. 

 
> > The fundamental problem has nothing to do with allocating, or not allocating
> > a swap cluster, but it has to do with the fact that the THP deferred split 
> > scan
> > can transiently race with swapcache insertion, and the fact that when you 
> > run
> > your swap area on rotational storage cluster_info is _always_ NULL.
> > split_swap_cluster() needs to check for lock_cluster() returning NULL 
> > because
> > that's one possible case, and it clearly fails to do so.
> 
> If there's a race, we should fix the race.  But the code path for
> swapcache insertion is,
> 
> add_to_swap()
>   get_swap_page() /* Return if fails to allocate */
>   add_to_swap_cache()
> SetPageSwapCache()
> 
> While the code path to split THP is,
> 
> split_huge_page_to_list()
>   if PageSwapCache()
> split_swap_cluster()
> 
> Both code paths are protected by the page lock.  So there should be some
> other reasons to trigger the bug.

As mentioned above, no they seem to not be protected (at least, not the
same page, depending on the case). While add_to_swap() will assure a 
page_lock on the compound head, split_huge_page_to_list() does not.


> And again, for HDD, a THP shouldn't have PageSwapCache() set at the
> first place.  If so, the bug is that the flag is set and we should fix
> the setting.
> 

I fail to follow your claim here. Where is the guarantee, in the code, that 
you'll never have a compound head in the swapcache? 

> > Run a workload that cause multiple THP COW, and add a memory hogger to 
> > create
> > memory pressure so you'll force the reclaimers to kick the registered
> > shrinkers. The trigger is not heavy swapping, and that's probably why
> > most swap test cases don't hit it. The window is tight, but you will get the
> > NULL pointer dereference.
> 
> Do you have a script to reproduce the bug?
> 

Nope, a convoluted set of internal regression tests we have usually
triggers it. In the wild, customers running HANNA are seeing it,
occasionally.

> > Regardless you find further bugs, or not, this patch is needed to correct a
> > blunt coding mistake.
> 
> As above.  I don't agree with that.
> 

It's OK to disagree; split_swap_cluster() still misses the cluster_info NULL
check, though.
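The check being argued about amounts to roughly the sketch below (reconstructed
from the description in this thread, not copied from the actual patch):

int split_swap_cluster(swp_entry_t entry)
{
	struct swap_info_struct *si;
	struct swap_cluster_info *ci;
	unsigned long offset = swp_offset(entry);

	si = _swap_info_get(entry);
	if (!si)
		return -EBUSY;
	ci = lock_cluster(si, offset);
	if (!ci)		/* cluster_info is NULL, e.g. rotational swap */
		return 0;
	cluster_clear_huge(ci);
	unlock_cluster(ci);
	return 0;
}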



[PATCH V2] doc: zh_CN: add translation for btrfs

2020-09-23 Thread Wang Qing
Translate Documentation/filesystems/btrfs.rst into Chinese.

Signed-off-by: Wang Qing 
---
 .../translations/zh_CN/filesystems/btrfs.rst   | 37 ++
 1 file changed, 37 insertions(+)

diff --git a/Documentation/translations/zh_CN/filesystems/btrfs.rst 
b/Documentation/translations/zh_CN/filesystems/btrfs.rst
index 000..8b8cca2
--- /dev/null
+++ b/Documentation/translations/zh_CN/filesystems/btrfs.rst
@@ -0,0 +1,37 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: :ref:`Documentation/filesystems/ext3.rst `
+
+translated by 王擎 Wang Qing
+
+=
+BTRFS
+=
+
+Btrfs是一个写时复制更新的文件系统,它注重容错、修复和易于管理。
+Btrfs由多家公司联合开发,并获得GPL许可,免费开放给所有人。
+
+Btrfs的主要功能包括:
+
+*扩展大小的文件存储(文件最大支持2^64)
+*填充方式使小文件更节省空间
+*索引目录的方式更节省空间
+*动态的索引节点分配方式
+*可写快照的特性
+*支持子卷(独立的内部根文件系统)
+*对象级别的镜像克隆
+*基于数据和元数据的校验和(支持多种算法)
+*支持压缩
+*內建多种磁盘阵列算法,支持多种设备
+*支持离线的文件系统检查
+*高效的增量备份和文件系统镜像
+*在线文件系统碎片整理
+
+更多有关信息,请参阅Wiki
+
+  https://btrfs.wiki.kernel.org
+
+维护信息包含管理任务、常见问题、用例、挂载选项、变更日志、
+特性、手册、源码仓、联系人等。
-- 
2.7.4



Re: [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance

2020-09-23 Thread Wanpeng Li
Any comments? Paolo! :)
On Wed, 9 Sep 2020 at 11:04, Wanpeng Li  wrote:
>
> Any comments? guys!
> On Tue, 1 Sep 2020 at 19:52,  wrote:
> >
> > From: Yulei Zhang 
> >
> > > Currently in KVM memory virtualization we rely on mmu_lock to
> > > synchronize memory mapping updates, which makes vCPUs work
> > > in serialized mode and slows down execution; especially after
> > > migration, doing substantial memory mapping will cause a visible
> > > performance drop, and it can get worse if the guest has more vCPUs
> > > and memory.
> >
> > The idea we present in this patch set is to mitigate the issue
> > with pre-constructed memory mapping table. We will fast pin the
> > guest memory to build up a global memory mapping table according
> > to the guest memslots changes and apply it to cr3, so that after
> > guest starts up all the vCPUs would be able to update the memory
> > simultaneously without page fault exception, thus the performance
> > improvement is expected.
> >
> > We use memory dirty pattern workload to test the initial patch
> > set and get positive result even with huge page enabled. For example,
> > we create guest with 32 vCPUs and 64G memories, and let the vcpus
> > dirty the entire memory region concurrently, as the initial patch
> > eliminate the overhead of mmu_lock, in 2M/1G huge page mode we would
> > get the job done in about 50% faster.
> >
> > We only validate this feature on Intel x86 platform. And as Ben
> > pointed out in RFC V1, so far we disable the SMM for resource
> > consideration, drop the mmu notification as in this case the
> > memory is pinned.
> >
> > V1->V2:
> > * Rebase the code to kernel version 5.9.0-rc1.
> >
> > Yulei Zhang (9):
> >   Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT
> > support
> >   Introduce page table population function for direct build EPT feature
> >   Introduce page table remove function for direct build EPT feature
> >   Add release function for direct build ept when guest VM exit
> >   Modify the page fault path to meet the direct build EPT requirement
> >   Apply the direct build EPT according to the memory slots change
> >   Add migration support when using direct build EPT
> >   Introduce kvm module parameter global_tdp to turn on the direct build
> > EPT mode
> >   Handle certain mmu exposed functions properly while turn on direct
> > build EPT mode
> >
> >  arch/mips/kvm/mips.c|  13 +
> >  arch/powerpc/kvm/powerpc.c  |  13 +
> >  arch/s390/kvm/kvm-s390.c|  13 +
> >  arch/x86/include/asm/kvm_host.h |  13 +-
> >  arch/x86/kvm/mmu/mmu.c  | 533 ++--
> >  arch/x86/kvm/svm/svm.c  |   2 +-
> >  arch/x86/kvm/vmx/vmx.c  |   7 +-
> >  arch/x86/kvm/x86.c  |  55 ++--
> >  include/linux/kvm_host.h|   7 +-
> >  virt/kvm/kvm_main.c |  43 ++-
> >  10 files changed, 639 insertions(+), 60 deletions(-)
> >
> > --
> > 2.17.1
> >


[PATCH] net: usb: ax88179_178a: add Toshiba usb 3.0 adapter

2020-09-23 Thread Wilken Gottwalt
Reposted and added netdev as suggested by Jakub Kicinski.

---
Adds the driver_info and USB IDs of the AX88179-based Toshiba USB 3.0
ethernet adapter.

Signed-off-by: Wilken Gottwalt 
---
 drivers/net/usb/ax88179_178a.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c
index ac7bc436da33..ed078e5a3629 100644
--- a/drivers/net/usb/ax88179_178a.c
+++ b/drivers/net/usb/ax88179_178a.c
@@ -1829,6 +1829,19 @@ static const struct driver_info belkin_info = {
.tx_fixup = ax88179_tx_fixup,
 };
 
+static const struct driver_info toshiba_info = {
+   .description = "Toshiba USB Ethernet Adapter",
+   .bind   = ax88179_bind,
+   .unbind = ax88179_unbind,
+   .status = ax88179_status,
+   .link_reset = ax88179_link_reset,
+   .reset  = ax88179_reset,
+   .stop = ax88179_stop,
+   .flags  = FLAG_ETHER | FLAG_FRAMING_AX,
+   .rx_fixup = ax88179_rx_fixup,
+   .tx_fixup = ax88179_tx_fixup,
+};
+
 static const struct usb_device_id products[] = {
 {
/* ASIX AX88179 10/100/1000 */
@@ -1862,6 +1875,10 @@ static const struct usb_device_id products[] = {
/* Belkin B2B128 USB 3.0 Hub + Gigabit Ethernet Adapter */
USB_DEVICE(0x050d, 0x0128),
.driver_info = (unsigned long)&belkin_info,
+}, {
+   /* Toshiba USB 3.0 GBit Ethernet Adapter */
+   USB_DEVICE(0x0930, 0x0a13),
+   .driver_info = (unsigned long)&toshiba_info,
 },
{ },
 };
-- 
2.28.0



Re: [PATCH] xen-blkback: add a parameter for disabling of persistent grants

2020-09-23 Thread SeongJae Park
On Wed, 23 Sep 2020 16:09:30 -0400 Konrad Rzeszutek Wilk 
 wrote:

> On Tue, Sep 22, 2020 at 09:01:25AM +0200, SeongJae Park wrote:
> > From: SeongJae Park 
> > 
> > The persistent grants feature provides high scalability.  On some small
> > systems, however, it could incur data copy overhead[1] and thus needs
> > to be disabled.  But there is no option to disable it.  For that
> > reason, this commit adds a module parameter for disabling the
> > feature.
> 
> Would it be better suited to have it per guest?

The latest version of this patchset[1] supports blkfront side disablement.
Could that partially solve your concern?

[1] https://lore.kernel.org/xen-devel/20200923061841.20531-1-sjp...@amazon.com/


Thanks,
SeongJae Park
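For context, the knob being discussed is a plain module parameter; disabling a
feature that way generally looks like the sketch below (the parameter name is an
assumption, check the actual patch for the real one):

#include <linux/module.h>
#include <linux/moduleparam.h>

/* hypothetical name; set to false (e.g. feature_persistent=0) to disable */
static bool feature_persistent = true;
module_param(feature_persistent, bool, 0644);
MODULE_PARM_DESC(feature_persistent,
		 "Enables the persistent grants feature");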


[PATCH V2] doc: zh_CN: add translation for tmpfs.rst

2020-09-23 Thread Wang Qing
Translate Documentation/filesystems/tmpfs.rst into Chinese.

Signed-off-by: Wang Qing 
---
 .../translations/zh_CN/filesystems/index.rst   |   3 +-
 .../translations/zh_CN/filesystems/tmpfs.rst   | 146 +
 2 files changed, 148 insertions(+), 1 deletion(-)

diff --git a/Documentation/translations/zh_CN/filesystems/index.rst 
b/Documentation/translations/zh_CN/filesystems/index.rst
index 186501d..c45b550
--- a/Documentation/translations/zh_CN/filesystems/index.rst
+++ b/Documentation/translations/zh_CN/filesystems/index.rst
@@ -21,8 +21,9 @@ Linux Kernel中的文件系统
 文件系统实现文档。
 
 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 3
 
virtiofs
debugfs
+   tmpfs
 
diff --git a/Documentation/translations/zh_CN/filesystems/tmpfs.rst 
b/Documentation/translations/zh_CN/filesystems/tmpfs.rst
index 000..700d870
--- /dev/null
+++ b/Documentation/translations/zh_CN/filesystems/tmpfs.rst
@@ -0,0 +1,146 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: :ref:`Documentation/filesystems/tmpfs.rst `
+
+translated by 王擎 Wang Qing
+
+=
+Tmpfs
+=
+
+Tmpfs是一个将所有文件都保存在虚拟内存中的文件系统。
+
+tmpfs中的所有内容都是临时的,也就是说没有任何文件会在硬盘上创建。
+如果卸载tmpfs实例,所有保存在其中的文件都会丢失。
+
+tmpfs将所有文件保存在内核缓存中,随着文件内容增长或缩小可以将不需要的
+页面swap出去。它具有最大限制,可以通过“mount -o remount ...”调整。
+
+和ramfs(创建tmpfs的模板)相比,tmpfs包含交换和限制检查。和tmpfs相似的另
+一个东西是RAM磁盘(/dev/ram*),可以在物理RAM中模拟固定大小的硬盘,并在
+此之上创建一个普通的文件系统。Ramdisks无法swap,因此无法调整它们的大小。
+
+由于tmpfs完全保存于页面缓存和swap中,因此所有tmpfs页面将在/proc/meminfo
+中显示为“Shmem”,而在free(1)中显示为“Shared”。请注意,这些计数还包括
+共享内存(shmem,请参阅ipcs(1))。获得计数的最可靠方法是使用df(1)和du(1)。
+
+tmpfs具有以下用途:
+
+1) 内核总有一个无法看到的内部挂载,用于共享匿名映射和SYSV共享内存。
+
+   挂载不依赖于CONFIG_TMPFS。如果CONFIG_TMPFS未设置,tmpfs对用户不可见。
+   但是内部机制始终存在。
+
+2) glibc 2.2及更高版本期望将tmpfs挂载在/dev/shm上以用于POSIX共享内存
+   (shm_open,shm_unlink)。添加内容到/etc/fstab应注意如下:
+
+   tmpfs   /dev/shmtmpfs   defaults0 0
+
+   使用时需要记住创建挂载tmpfs的目录。
+   
+   SYSV共享内存无需挂载,内部已默认支持。(在2.3内核版本中,必须挂载
+   tmpfs的前身(shm fs)才能使用SYSV共享内存)
+
+3) 很多人(包括我)都觉的在/tmp和/var/tmp上挂载非常方便,并具有较大的
+   swap分区。目前循环挂载tmpfs可以正常工作,所以大多数发布都应当可以
+   使用mkinitrd通过/tmp访问/tmp。
+
+4) 也许还有更多我不知道的地方:-)
+
+
+tmpfs有三个用于调整大小的挂载选项:
+
+=  
+size   tmpfs实例分配的字节数限制。默认值是不swap时物理RAM的一半。
+   如果tmpfs实例过大,机器将死锁,因为OOM处理将无法释放该内存。
+nr_blocks  与size相同,但以PAGE_SIZE为单位。
+nr_inodes  tmpfs实例的最大inode个数。默认值是物理内存页数的一半,或者
+   (有高端内存的机器)低端内存RAM的页数,二者以较低者为准。
+=  
+
+这些参数接受后缀k,m或g表示千,兆和千兆字节,可以在remount时更改。
+size参数也接受后缀%用来限制tmpfs实例占用物理RAM的百分比:
+未指定size或nr_blocks时,默认值为size=50%
+
+如果nr_blocks=0(或size=0),block个数将不受限制;如果nr_inodes=0,
+inode个数将不受限制。这样挂载通常是不明智的,因为它允许任何具有写权限的
+用户通过访问tmpfs耗尽机器上的所有内存;但同时这样做也会增强在多个CPU的
+场景下的访问。
+
+tmpfs具有为所有文件设置NUMA内存分配策略挂载选项(如果启用了CONFIG_NUMA),
+可以通过“mount -o remount ...”调整
+
+ ==
+mpol=default 采用进程分配策略
+ (请参阅 set_mempolicy(2))
+mpol=prefer:Node 倾向从给定的节点分配
+mpol=bind:NodeList   只允许从指定的链表分配
+mpol=interleave  倾向于依次从每个节点分配
+mpol=interleave:NodeList 依次从每个节点分配
+mpol=local  prefers 从本地节点分配内存
+ ==
+
+NodeList格式是以逗号分隔的十进制数字表示大小和范围,最大和最小范围是用-
+分隔符的十进制数来表示。例如,mpol=bind0-3,5,7,9-15
+
+带有有效NodeList的内存策略将按指定格式保存,在创建文件时使用。当任务在该
+文件系统上创建文件时,会使用到挂载时的内存策略NodeList选项,如果设置的话,
+由调用任务的cpuset[请参见Documentation/admin-guide/cgroup-v1/cpusets.rst]
+以及下面列出的可选标志约束。如果NodeLists为设置为空集,则文件的内存策略将
+恢复为“默认”策略。
+
+NUMA内存分配策略有可选标志,可以用于模式结合。在挂载tmpfs时指定这些可选
+标志可以在NodeList之前生效。
+Documentation/admin-guide/mm/numa_memory_policy.rst列出所有可用的内存
+分配策略模式标志及其对内存策略。
+
+::
+
+   =static 相当于 MPOL_F_STATIC_NODES
+   =relative   相当于 MPOL_F_RELATIVE_NODES
+
+例如,mpol=bind=staticNodeList相当于MPOL_BIND|MPOL_F_STATIC_NODES的分配策略
+
+请注意,如果内核不支持NUMA,那么使用mpol选项挂载tmpfs将会失败;nodelist指定不
+在线的节点也会失败。如果您的系统依赖于此,但内核会运行不带NUMA功能(也许是安全
+revocery内核),或者具有较少的节点在线,建议从自动模式中省略mpol选项挂载选项。
+可以在以后通过“mount -o remount,mpol=Policy:NodeList MountPoint”添加到挂载点。
+
+要指定初始根目录,可以使用如下挂载选项:
+
+   ==
+模式 权限用八进制数字表示
+uid应用ID
+gid组ID
+   ==
+
+这些选项对remount没有任何影响。您可以通过chmod(1),chown(1)和chgrp(1)的更改
+已经挂载的参数。
+
+tmpfs具有选择32位还是64位inode的挂载选项:
+
+===   
+inode64   Use 64-bit inode numbers
+inode32   Use 32-bit inode numbers
+===   
+
+在32位内核上,默认是inode32,挂载时指定inode64会被拒绝。
+在64位内核上,默认配置是CONFIG_TMPFS_INODE64。inode64避免了单个设备上可能有多个
+具有相同inode编号的文件;比如32位应用程序使用glibc如果长期访问tmpfs,一旦达到33
+位inode编号,就有EOVERFLOW失败的危险,无法打开大于2GiB的文件,并返回EINVAL。
+
+所以'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs'将在
+/mytmpfs上挂载tmpfs实例,分配只能由root用户访问的10GB RAM/SWAP,可以有10240个
+inode的实例。
+
+
+:作者:
+   Christoph Rohland , 1.12.01
+:更新:
+   Hugh Dickins, 4 June 2007
+:更新:
+   KOSAKI Motohiro, 16 Mar 2010
+:更

Re: [PATCH 1/7] perf bench: Add build-id injection benchmark

2020-09-23 Thread Namhyung Kim
Hi Ian,

On Thu, Sep 24, 2020 at 7:13 AM Ian Rogers  wrote:
>
> On Wed, Sep 23, 2020 at 1:05 AM Namhyung Kim  wrote:
> >
> > Sometimes I can see perf record piped with perf inject take a long time
> > processing build-ids.  So add an inject-build-id benchmark to the
> > internals benchmark suite to measure its overhead regularly.
> >
> > It runs perf inject command internally and feeds the given number of
> > synthesized events (MMAP2 + SAMPLE basically).
> >
> >   Usage: perf bench internals inject-build-id 
> >
> > -i, --iterations   Number of iterations used to compute average 
> > (default: 100)
> > -m, --nr-mmaps Number of mmap events for each iteration 
> > (default: 100)
> > -n, --nr-samples   Number of sample events per mmap event (default: 
> > 100)
> > -v, --verbose be more verbose (show iteration count, DSO name, 
> > etc)
> >
> > By default, it measures average processing time of 100 MMAP2 events
> > and 1 SAMPLE events.  Below is a result on my laptop.
> >
> >   $ perf bench internals inject-build-id
> >   # Running 'internals/inject-build-id' benchmark:
> > Average build-id injection took: 22.997 msec (+- 0.067 msec)
> > Average time per event: 2.255 usec (+- 0.007 usec)
>
> This is great! Some suggestions below.

Thanks!

>
> > Signed-off-by: Namhyung Kim 
> > ---
[SNIP]
> > +
> > +static const char *const bench_usage[] = {
> > +   "perf bench internals inject-build-id ",
> > +   NULL
> > +};
> > +
>
> Perhaps a comment:
> /* Helper for collect_dso that adds the given file as a dso to
> dso_list if it contains a buildid. Stops after 4 such dsos.*/

Will add.. please see below.

>
> > +static int add_dso(const char *fpath, const struct stat *sb __maybe_unused,
> > +  int typeflag, struct FTW *ftwbuf __maybe_unused)
> > +{
> > +   struct bench_dso *dso;
> > +   unsigned char build_id[BUILD_ID_SIZE];
> > +
> > +   if (typeflag == FTW_D || typeflag == FTW_SL) {
> > +   return 0;
> > +   }
> > +
> > +   if (filename__read_build_id(fpath, build_id, BUILD_ID_SIZE) < 0)
> > +   return 0;
> > +
> > +   dso = malloc(sizeof(*dso));
> > +   if (dso == NULL)
> > +   return -1;
> > +
> > +   dso->name = realpath(fpath, NULL);
> > +   if (dso->name == NULL) {
> > +   free(dso);
> > +   return -1;
> > +   }
> > +
> > +   dso->ino = nr_dsos++;
> > +   list_add(&dso->list, &dso_list);
> > +   pr_debug2("  Adding DSO: %s\n", fpath);
> > +
> > +   /* stop if we collected 4x DSOs than needed */
> > +   if ((unsigned)nr_dsos > 4 * nr_mmaps)
> > +   return 1;
> > +
> > +   return 0;
> > +}
> > +
> > +static void collect_dso(void)
> > +{
> > +   if (nftw("/usr/lib/", add_dso, 10, FTW_PHYS) < 0)
> > +   return;
> > +
> > +   pr_debug("  Collected %d DSOs\n", nr_dsos);
>
> Should this fail if the count isn't 4?

The add_dso would stop if it collected enough DSOs.
I chose it as 4 x nr_mmaps (default: 100).

It's gonna pick a DSO in the list randomly during benchmark
and I want to reduce the chance it selects the same one in the
same iteration. So instead of having nr_mmaps DSOs, it keeps
4 times more DSOs than needed.

>
> > +}
> > +
> > +static void release_dso(void)
> > +{
> > +   struct bench_dso *dso;
> > +
> > +   while (!list_empty(&dso_list)) {
> > +   dso = list_first_entry(&dso_list, struct bench_dso, list);
> > +   list_del(&dso->list);
> > +   free(dso->name);
> > +   free(dso);
> > +   }
> > +}
> > +
>
> Perhaps a comment and move next to synthesize_mmap.
> /* Fake address used by mmap events. */

OK, will do.  (and it's used by sample events too)

>
> > +static u64 dso_map_addr(struct bench_dso *dso)
> > +{
> > +   return 0x40ULL + dso->ino * 8192ULL;
> > +}
[SNIP]

> > +static int setup_injection(struct bench_data *data)
> > +{
> > +   int ready_pipe[2];
> > +   int dev_null_fd;
> > +   char buf;
> > +
> > +   if (pipe(ready_pipe) < 0)
> > +   return -1;
> > +
> > +   if (pipe(data->input_pipe) < 0)
> > +   return -1;
> > +
> > +   if (pipe(data->output_pipe) < 0)
> > +   return -1;
> > +
> > +   data->pid = fork();
> > +   if (data->pid < 0)
> > +   return -1;
> > +
> > +   if (data->pid == 0) {
> > +   const char **inject_argv;
> > +
> > +   close(data->input_pipe[1]);
> > +   close(data->output_pipe[0]);
> > +   close(ready_pipe[0]);
> > +
> > +   dup2(data->input_pipe[0], STDIN_FILENO);
> > +   close(data->input_pipe[0]);
> > +   dup2(data->output_pipe[1], STDOUT_FILENO);
> > +   close(data->output_pipe[1]);
> > +
> > +   dev_null_fd = open("/dev/null", O_WRONLY);
> > +   if (dev_null_fd < 0)
> > +   exi

RE: [PATCH v6 5/8] clk: clock-wizard: Add support for fractional support

2020-09-23 Thread Shubhrajyoti Datta
Hi ,
Thanks for the review.

> -Original Message-
> From: Stephen Boyd 
> Sent: Tuesday, September 22, 2020 2:48 AM
> To: Shubhrajyoti Datta ; linux-...@vger.kernel.org
> Cc: devicet...@vger.kernel.org; linux-kernel@vger.kernel.org;
> de...@driverdev.osuosl.org; robh...@kernel.org;
> gre...@linuxfoundation.org; mturque...@baylibre.com; Shubhrajyoti
> Datta 
> Subject: Re: [PATCH v6 5/8] clk: clock-wizard: Add support for fractional
> support
> 
> Quoting Shubhrajyoti Datta (2020-08-28 06:39:53)
> > Currently the set-rate granularity is limited to integral divisors.
> > Add support for fractional divisors.
> > Only the first output (output0) is fractional in the hardware.
> >
> > Signed-off-by: Shubhrajyoti Datta 
> 
> Getting closer.
> 
> > diff --git a/drivers/clk/clk-xlnx-clock-wizard.c
> > b/drivers/clk/clk-xlnx-clock-wizard.c
> > index 8dfcec8..1af59a4 100644
> > --- a/drivers/clk/clk-xlnx-clock-wizard.c
> > +++ b/drivers/clk/clk-xlnx-clock-wizard.c
> > @@ -185,6 +191,134 @@ static const struct clk_ops
> clk_wzrd_clk_divider_ops = {
> > .recalc_rate = clk_wzrd_recalc_rate,  };
> >
> > +static unsigned long clk_wzrd_recalc_ratef(struct clk_hw *hw,
> > +  unsigned long parent_rate)
> > +{
> > +   unsigned int val;
> > +   u32 div, frac;
> > +   struct clk_wzrd_divider *divider = to_clk_wzrd_divider(hw);
> > +   void __iomem *div_addr = divider->base + divider->offset;
> > +
> > +   val = readl(div_addr);
> > +   div = val & div_mask(divider->width);
> > +   frac = (val >> WZRD_CLKOUT_FRAC_SHIFT) &
> > + WZRD_CLKOUT_FRAC_MASK;
> > +
> > +   return ((parent_rate * 1000) / ((div * 1000) + frac));
> 
> Please remove extra parenthesis. And is this mult_frac()?
> 
Will fix
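For reference, the mult_frac() form would be roughly this (sketch only,
assuming the value ranges keep the intermediate products safe):

        return mult_frac(parent_rate, 1000, (div * 1000) + frac);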
> > +}
> > +
> > +static int clk_wzrd_dynamic_reconfig_f(struct clk_hw *hw, unsigned long
> rate,
> > +  unsigned long parent_rate) {
> > +   int err;
> > +   u32 value, pre;
> > +   unsigned long rate_div, f, clockout0_div;
> > +   struct clk_wzrd_divider *divider = to_clk_wzrd_divider(hw);
> > +   void __iomem *div_addr = divider->base + divider->offset;
> > +
> > +   rate_div = ((parent_rate * 1000) / rate);
> > +   clockout0_div = rate_div / 1000;
> > +
> > +   pre = DIV_ROUND_CLOSEST((parent_rate * 1000), rate);
> > +   f = (u32)(pre - (clockout0_div * 1000));
> > +   f = f & WZRD_CLKOUT_FRAC_MASK;
> > +
> > +   value = ((f << WZRD_CLKOUT_DIVIDE_WIDTH) | (clockout0_div &
> > +   WZRD_CLKOUT_DIVIDE_MASK));
> 
> Please split this to multiple lines.
Will fix
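Something like this untested sketch of the split:

        value = clockout0_div & WZRD_CLKOUT_DIVIDE_MASK;
        value |= f << WZRD_CLKOUT_DIVIDE_WIDTH;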
> 
> > +
> > +   /* Set divisor and clear phase offset */
> > +   writel(value, div_addr);
> > +   writel(0x0, div_addr + WZRD_DR_DIV_TO_PHASE_OFFSET);
> > +
> > +   /* Check status register */
> > +   err= readl_poll_timeout(divider->base +
> WZRD_DR_STATUS_REG_OFFSET, value,
> > +   value & WZRD_DR_LOCK_BIT_MASK,
> > +   WZRD_USEC_POLL, WZRD_TIMEOUT_POLL);
> > +   if (err)
> > +   return err;
> > +
> > +   /* Initiate reconfiguration */
> > +   writel(WZRD_DR_BEGIN_DYNA_RECONF,
> > +  divider->base + WZRD_DR_INIT_REG_OFFSET);
> > +
> > +   /* Check status register */
> > +   err= readl_poll_timeout(divider->base +
> WZRD_DR_STATUS_REG_OFFSET, value,
> > +   value & WZRD_DR_LOCK_BIT_MASK,
> > +   WZRD_USEC_POLL, WZRD_TIMEOUT_POLL);
> > +
> > +   return err;
> 
> Just return readl_poll_timeout() please.
Will fix
> 
> > +}
> > +
> > +static long clk_wzrd_round_rate_f(struct clk_hw *hw, unsigned long
> rate,
> > + unsigned long *prate) {
> > +   return rate;
> 
> Can every rate be supported? This function is supposed to tell the clk
> framework what rate will be achieved if we call clk_set_rate() with 'rate'
> passed to this function. Almost always returning 'rate' is not the case.
> 

We can support rates up to 3 decimal places; to prevent truncation here we are
returning rate.
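A rough sketch of a round_rate that instead reports the quantized rate the
fractional divider can really produce (illustrative only, using div64_u64()
from linux/math64.h; not the actual driver change):

static long clk_wzrd_round_rate_f_sketch(struct clk_hw *hw, unsigned long rate,
                                         unsigned long *prate)
{
        u64 div_milli;

        if (!rate)
                return -EINVAL;

        /* divider scaled by 1000, i.e. 3 decimal places of resolution */
        div_milli = div64_u64((u64)*prate * 1000, rate);
        if (!div_milli)
                return -EINVAL;

        return div64_u64((u64)*prate * 1000, div_milli);
}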
> >
> > +
> > +static const struct clk_ops clk_wzrd_clk_divider_ops_f = {
> > +   .round_rate = clk_wzrd_round_rate_f,
> > +   .set_rate = clk_wzrd_dynamic_reconfig_f,
> > +   .recalc_rate = clk_wzrd_recalc_ratef, };
> > +
> > +static struct clk *clk_wzrd_register_divf(struct device *dev,
> > + const char *name,
> > + const char *parent_name,
> > + unsigned long flags,
> > + void __iomem *base, u16 offset,
> > + u8 shift, u8 width,
> > + u8 clk_divider_flags,
> > + const struct clk_div_table *table,
> > + 

Re: [PATCH v4 2/2] leds: mt6360: Add LED driver for MT6360

2020-09-23 Thread Gene Chen
Jacek Anaszewski  於 2020年9月24日 週四 上午5:49寫道:

>
> Hi Gene,
>
> Thank you for the update. I have some more comments below.
>
> On 9/23/20 2:50 PM, Gene Chen wrote:
> > From: Gene Chen 
> >
> > Add MT6360 LED driver include 2-channel Flash LED with torch/strobe mode,
> > and 4-channel RGB LED support Register/Flash/Breath Mode
> >
> > Signed-off-by: Gene Chen 
> > ---
> >   drivers/leds/Kconfig   |  11 +
> >   drivers/leds/Makefile  |   1 +
> >   drivers/leds/leds-mt6360.c | 705 
> > +
> >   3 files changed, 717 insertions(+)
> >   create mode 100644 drivers/leds/leds-mt6360.c
> >
> > diff --git a/drivers/leds/Kconfig b/drivers/leds/Kconfig
> > index 1c181df..5561b08 100644
> > --- a/drivers/leds/Kconfig
> > +++ b/drivers/leds/Kconfig
> > @@ -271,6 +271,17 @@ config LEDS_MT6323
> > This option enables support for on-chip LED drivers found on
> > Mediatek MT6323 PMIC.
> >
> > +config LEDS_MT6360
> > + tristate "LED Support for Mediatek MT6360 PMIC"
> > + depends on LEDS_CLASS_FLASH && OF
> > + depends on V4L2_FLASH_LED_CLASS || !V4L2_FLASH_LED_CLASS
> > + depends on MFD_MT6360
> > + help
> > +   This option enables support for dual Flash LED drivers found on
> > +   Mediatek MT6360 PMIC.
> > +   Independent current sources supply for each flash LED support torch
> > +   and strobe mode.
> > +
> >   config LEDS_S3C24XX
> >   tristate "LED Support for Samsung S3C24XX GPIO LEDs"
> >   depends on LEDS_CLASS
> > diff --git a/drivers/leds/Makefile b/drivers/leds/Makefile
> > index c2c7d7a..5596427 100644
> > --- a/drivers/leds/Makefile
> > +++ b/drivers/leds/Makefile
> > @@ -66,6 +66,7 @@ obj-$(CONFIG_LEDS_MIKROTIK_RB532)   += leds-rb532.o
> >   obj-$(CONFIG_LEDS_MLXCPLD)  += leds-mlxcpld.o
> >   obj-$(CONFIG_LEDS_MLXREG)   += leds-mlxreg.o
> >   obj-$(CONFIG_LEDS_MT6323)   += leds-mt6323.o
> > +obj-$(CONFIG_LEDS_MT6360)+= leds-mt6360.o
> >   obj-$(CONFIG_LEDS_NET48XX)  += leds-net48xx.o
> >   obj-$(CONFIG_LEDS_NETXBIG)  += leds-netxbig.o
> >   obj-$(CONFIG_LEDS_NIC78BX)  += leds-nic78bx.o
> > diff --git a/drivers/leds/leds-mt6360.c b/drivers/leds/leds-mt6360.c
> > new file mode 100644
> > index 000..1c3486e
> > --- /dev/null
> > +++ b/drivers/leds/leds-mt6360.c
> > @@ -0,0 +1,705 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +enum {
> > + MT6360_LED_ISNK1 = 0,
> > + MT6360_LED_ISNK2,
> > + MT6360_LED_ISNK3,
> > + MT6360_LED_ISNK4,
>
> One question about these ISINKs - how are they exploited in your device?
> Are these LEDs used to indicate camera activity or it is one RGB LED
> for status? And what functionality has the remaining amber one (sticking
> to the naming from your DT bindings)?
>
> Can you share how the documenation for this device describes the purpose
> of these sinks, if it does it at all?
>
> I got probably mislead by your naming in the driver and got fixed on
> their function as camera activity indicators, for which V4L2 has
> support. If that is not the case, then you'd better switch to using
> multicolor framework for all four "indicator" LEDs.
>

It's one RGB LED for status, not for camera.

The MT6360 integrates a three-channel RGB LED driver, designed to
provide a variety of lighting effects for mobile device applications.
The RGB LED driver includes a smart LED string controller, and it can
drive 3 channels of LEDs with a sink current of up to 24mA. The
default setting of RGB_ISINK1 is auto mode for TA charging indicator,
and RGB_ISINK1 also supports software mode. It provides three
operation modes for the RGB LEDs: flash mode, breath mode, and
register mode. The device can increase or decrease the brightness of
the RGB LEDs upon command via the I2C interface. RGB_ISINK4 provides
a higher sink current of up to 150mA, which we can use for moonlight mode.

Do you mean we should drop the V4L2 device registration for the isinks,
and only register them as LED class devices?

> > + MT6360_LED_FLASH1,
> > + MT6360_LED_FLASH2,
> > + MT6360_MAX_LEDS
> > +};
> > +
> > +#define MT6360_REG_RGBEN 0x380
> > +#define MT6360_REG_ISNK(_led_no) (0x381 + (_led_no))
> > +#define MT6360_ISNK_ENMASK(_led_no)  BIT(7 - (_led_no))
> > +#define MT6360_ISNK_MASK GENMASK(4, 0)
> > +#define MT6360_CHRINDSEL_MASKBIT(3)
> > +
> > +#define MT6360_REG_FLEDEN0x37E
> > +#define MT6360_REG_STRBTO0x373
> > +#define MT6360_REG_FLEDBASE(_id) (0x372 + 4 * (_id - 
> > MT6360_LED_FLASH1))
> > +#define MT6360_REG_FLEDISTRB(_id)(MT6360_REG_FLEDBASE(_id) + 2)
> > +#define MT6360_REG_FLEDITOR(_id) (MT6360_REG_FLEDBASE(_id) + 3)
> > +#define MT6360_REG_CHGSTAT2  0x3E1
> > +#define MT6360_REG_FLEDSTAT1 0x3E9
> > +

Re: [PATCH v2 5/5] clk: qcom: add video clock controller driver for SM8250

2020-09-23 Thread Stephen Boyd
Quoting Jonathan Marek (2020-09-23 17:54:59)
> On 9/23/20 7:30 PM, Stephen Boyd wrote:
> > Quoting Jonathan Marek (2020-09-23 09:07:16)
> >> On 9/22/20 2:46 PM, Stephen Boyd wrote:
> >>> Quoting Jonathan Marek (2020-09-03 20:09:54)
> >>>
>  +   .ops = &clk_branch2_ops,
>  +   },
>  +   },
>  +};
>  +
>  +static struct clk_branch video_cc_mvs0_clk = {
>  +   .halt_reg = 0xd34,
>  +   .halt_check = BRANCH_HALT_SKIP, /* TODO: hw gated ? */
> >>>
> >>> Is this resolved?
> >>>
> >>
> >> Downstream has this clock as BRANCH_HALT_VOTED, but with the upstream
> >> venus driver (with patches to enable sm8250), that results in a
> >> "video_cc_mvs0_clk status stuck at 'off" error. AFAIK venus
> >> enables/disables this clock on its own (venus still works without
> >> touching this clock), but I didn't want to remove this in case it might
> >> be needed. I removed these clocks in the v3 I just sent.
> >>
> > 
> > Hmm. Does downstream use these clks? There have been some clk stuck
> > problems with venus recently that were attributed to improperly enabling
> > clks before enabling interconnects and power domains. Maybe it's the
> > same problem.
> > 
> 
> Yes, downstream uses these clks.
> 
> The "stuck" problem still happens if GSDCS/interconnects are always on, 
> and like I mentioned, venus works even with these clocks completely 
> removed.
> 
> I think venus controls these clocks (and downstream just happens to try 
> enabling it at a point where venus has already enabled it?). I'm not too 
> sure about this, it might have something to do with the GDSC having the 
> HW_CTRL flag too..

Ok. Maybe Taniya has an idea.


Re: [PATCH printk 3/5] printk: use buffer pool for sprint buffers

2020-09-23 Thread Sergey Senozhatsky
On (20/09/22 17:44), John Ogness wrote:
> +/*
> + * The sprint buffers are used with interrupts disabled, so each CPU
> + * only requires 2 buffers: for non-NMI and NMI contexts. Recursive
> + * printk() calls are handled by the safe buffers.
> + */
> +#define SPRINT_CTX_DEPTH 2
> +
> +/* Static sprint buffers for early boot (only 1 CPU). */
> +static DECLARE_BITMAP(sprint_static_textbuf_map, SPRINT_CTX_DEPTH);
> +static char sprint_static_textbuf[SPRINT_CTX_DEPTH * LOG_LINE_MAX];
> +
> +/* Dynamically allocated sprint buffers. */
> +static unsigned int sprint_dynamic_textbuf_count;
> +static unsigned long *sprint_dynamic_textbuf_map;
> +static char *sprint_dynamic_textbuf;

Just a question:

Can dynamic_textbuf be a PER_CPU array of five textbuf[1024] buffers
(for normal printk, nmi, hard irq, soft irq and one extra buffer for
recursive printk calls)?

So then we'd

vprintk(...)
{
preempt_disable();
buf = this_cpu_ptr(... preempt_count_to_ctx());
...
preempt_enable();
}

preempt_disable()/preempt_enable() is already in printk().
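A slightly more concrete sketch of the idea (illustrative only, not a
patch; the recursion case would use the fifth buffer and is omitted here):

#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/hardirq.h>

enum sprint_ctx { SPRINT_CTX_TASK, SPRINT_CTX_SOFTIRQ, SPRINT_CTX_HARDIRQ,
                  SPRINT_CTX_NMI, SPRINT_CTX_RECURSE, SPRINT_CTX_MAX };

struct sprint_ctx_bufs {
        char buf[SPRINT_CTX_MAX][1024];
};

static DEFINE_PER_CPU(struct sprint_ctx_bufs, sprint_ctx_bufs);

/* caller must have preemption disabled */
static char *sprint_ctx_buffer(void)
{
        enum sprint_ctx ctx;

        if (in_nmi())
                ctx = SPRINT_CTX_NMI;
        else if (in_irq())
                ctx = SPRINT_CTX_HARDIRQ;
        else if (in_serving_softirq())
                ctx = SPRINT_CTX_SOFTIRQ;
        else
                ctx = SPRINT_CTX_TASK;

        return this_cpu_ptr(&sprint_ctx_bufs)->buf[ctx];
}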

-ss


linux-next: build warning after merge of the pwm tree

2020-09-23 Thread Stephen Rothwell
Hi all,

After merging the pwm tree, today's linux-next build (x86_64 allmodconfig)
produced this warning:

WARNING: modpost: missing MODULE_LICENSE() in drivers/pwm/pwm-intel-lgm.o

Introduced by commit

  9fba318f0f7f ("Add PWM fan controller driver for LGM SoC")
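The usual fix for this class of warning is to add the module metadata
macros near the end of the driver; a sketch (the description and license
strings below are assumed, not taken from the actual pwm-intel-lgm.c):

MODULE_DESCRIPTION("Intel LGM SoC PWM fan controller driver");
MODULE_LICENSE("GPL v2");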

-- 
Cheers,
Stephen Rothwell


pgpQkHA4BV7zI.pgp
Description: OpenPGP digital signature


Re: [PATCH v3 7/7] clk: qcom: Add display clock controller driver for SM8250

2020-09-23 Thread Stephen Boyd
Quoting Jonathan Marek (2020-09-23 09:10:04)
> On 9/22/20 3:00 PM, Stephen Boyd wrote:
> > Quoting Jonathan Marek (2020-09-11 08:34:07)
> >> diff --git a/drivers/clk/qcom/dispcc-sm8250.c 
> >> b/drivers/clk/qcom/dispcc-sm8250.c
> >> new file mode 100644
> >> index ..7c0f384a3a42
> >> --- /dev/null
> >> +++ b/drivers/clk/qcom/dispcc-sm8250.c
> >> @@ -0,0 +1,1100 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +/*
> >> + * Copyright (c) 2018-2020, The Linux Foundation. All rights reserved.
> >> + */
> >> +
> > [...]
> >> +
> >> +static const struct clk_parent_data disp_cc_parent_data_6[] = {
> >> +   { .fw_name = "bi_tcxo" },
> >> +   { .fw_name = "dsi0_phy_pll_out_dsiclk" },
> >> +   { .fw_name = "dsi1_phy_pll_out_dsiclk" },
> > 
> > Can we remove clk postfix on these clk names?
> > 
> 
> This is consistent with the names used in both sdm845 and sc7180 
> drivers. If this should change then those should be changed too?

If DT isn't using it already then it sounds OK to change the other
SoCs. Otherwise fix it just for this one.


Re: [PATCH v3 6/7] clk: qcom: Add display clock controller driver for SM8150

2020-09-23 Thread Stephen Boyd
Quoting Jonathan Marek (2020-09-23 09:24:04)
> On 9/22/20 3:04 PM, Stephen Boyd wrote:
> > Quoting Jonathan Marek (2020-09-11 08:34:06)
> >> Add support for the display clock controller found on SM8150
> >> based devices. This would allow display drivers to probe and
> >> control their clocks.
> >>
> >> Signed-off-by: Jonathan Marek 
> >> ---
> >>   drivers/clk/qcom/Kconfig |9 +
> >>   drivers/clk/qcom/Makefile|1 +
> >>   drivers/clk/qcom/dispcc-sm8150.c | 1152 ++
> >>   3 files changed, 1162 insertions(+)
> >>   create mode 100644 drivers/clk/qcom/dispcc-sm8150.c
> > 
> > If the bindings are the same for these two drivers I wonder if there is
> > anything different between the two. Maybe the two drivers can be one
> > driver?
> > 
> 
> Possibly, the biggest difference seems to be the plls (trion vs lucid, 
> different config), which could be resolved in the probe() function. If 
> you think combining the drivers is the right thing to do then I can do that.

If that's the main difference then it sounds OK to merge the two.


Re: [PATCH v1 1/6] dt_bindings: mfd: Add ROHM BD9576MUF and BD9573MUF PMICs

2020-09-23 Thread Vaittinen, Matti

On Wed, 2020-09-23 at 08:27 -0600, Rob Herring wrote:
> On Sat, Sep 19, 2020 at 5:46 AM Vaittinen, Matti
>  wrote:
> > Thanks Rob for taking a look at this!
> > 
> > On Fri, 2020-09-18 at 11:28 -0600, Rob Herring wrote:
> > > On Thu, Sep 17, 2020 at 11:01:52AM +0300, Matti Vaittinen wrote:
> > > > Add bindings for ROHM BD9576MUF and BD9573MUF PMICs. These
> > > > PMICs are primarily intended to be used to power the R-Car
> > > > series
> > > > processors. They provide 6 power outputs, safety features and a
> > > > watchdog with two functional modes.
> > > > 
> > > > Signed-off-by: Matti Vaittinen <
> > > > matti.vaitti...@fi.rohmeurope.com>
> > > > ---
> > > >  .../bindings/mfd/rohm,bd9576-pmic.yaml| 129
> > > > ++
> > > >  1 file changed, 129 insertions(+)
> > > >  create mode 100644
> > > > Documentation/devicetree/bindings/mfd/rohm,bd9576-pmic.yaml
> > > > 
> > > > diff --git a/Documentation/devicetree/bindings/mfd/rohm,bd9576-
> > > > pmic.yaml b/Documentation/devicetree/bindings/mfd/rohm,bd9576-
> > > > pmic.yaml
> > > > new file mode 100644
> > > > index ..f17d4d621585
> > > > --- /dev/null
> > > > +++ b/Documentation/devicetree/bindings/mfd/rohm,bd9576-
> > > > pmic.yaml
> > > > @@ -0,0 +1,129 @@
> > > > +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
> > > > +%YAML 1.2
> > > > +---
> > > > +$id: http://devicetree.org/schemas/mfd/rohm,bd9576-pmic.yaml#
> > > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > > +
> > > > +title: ROHM BD9576MUF and BD9573MUF Power Management
> > > > Integrated
> > > > Circuit bindings
> > > > +
> > > > +maintainers:
> > > > +  - Matti Vaittinen 
> > > > +
> > > > +description: |
> > > > +  BD9576MUF and BD9573MUF are power management ICs primarily
> > > > intended for
> > > > +  powering the R-Car series processors.
> > > > +  The IC provides 6 power outputs with configurable sequencing
> > > > and
> > > > safety
> > > > +  monitoring. A watchdog logic with slow ping/windowed modes
> > > > is
> > > > also included.
> > > > +
> > > > +properties:
> > > > +  compatible:
> > > > +enum:
> > > > +  - rohm,bd9576
> > > > +  - rohm,bd9573
> > > > +
> > > > +  reg:
> > > > +description:
> > > > +  I2C slave address.
> > > > +maxItems: 1
> > > > +
> > > > +  interrupts:
> > > > +maxItems: 1
> > > > +
> > > > +  rohm,vout1-en-low:
> > > > +description:
> > > > +  BD9576 and BD9573 VOUT1 regulator enable state can be
> > > > individually
> > > > +  controlled by a GPIO. This is dictated by state of
> > > > vout1-en
> > > > pin during
> > > > +  the PMIC startup. If vout1-en is LOW during PMIC startup
> > > > then the VOUT1
> > > > +  enable sate is controlled via this pin. Set this
> > > > property if
> > > > vout1-en
> > > > +  is wired to be down at PMIC start-up.
> > > > +type: boolean
> > > > +
> > > > +  rohm,vout1-en-gpios:
> > > > +description:
> > > > +  GPIO specifier to specify the GPIO connected to vout1-en 
> > > > for
> > > > vout1 ON/OFF
> > > > +  state control.
> > > > +maxItems: 1
> > > > +
> > > > +  rohm,ddr-sel-low:
> > > > +description:
> > > > +  The BD9576 and BD9573 output voltage for DDR can be
> > > > selected
> > > > by setting
> > > > +  the ddr-sel pin low or high. Set this property if ddr-
> > > > sel is
> > > > grounded.
> > > > +type: boolean
> > > > +
> > > > +  rohm,watchdog-enable-gpios:
> > > > +description: The GPIO line used to enable the watchdog.
> > > > +maxItems: 1
> > > > +
> > > > +  rohm,watchdog-ping-gpios:
> > > > +description: The GPIO line used to ping the watchdog.
> > > > +maxItems: 1
> > > > +
> > > > +  hw_margin_ms:
> > > 
> > > Needs a vendor prefix.
> > > 
> > > s/_/-/
> > > 
> > > > +minimum: 4
> > > > +maximum: 4416
> > > > +description: Watchog timeout in milliseconds
> > > 
> > > Maybe the words in the description should be in the property name
> > > as
> > > I don't see how 'h/w margin' relates to 'watchdog timeout'.
> > 
> > The hw_margin_ms is an existing property. As I wrote to Guenter:
> > "hw_margin_ms" is an existing binding for specifying the maximum
> > TMO in
> > HW (if I understood it correctly). (It is used at least by the
> > generig
> > GPIO watchdog) I thought it's better to not invent a new vendor
> > specific binding when we have a generic one.
> > 
> > https://elixir.bootlin.com/linux/v5.9-rc2/source/Documentation/devicetree/bindings/watchdog/gpio-wdt.txt
> 
> That one is odd and I haven't found an actual user of it. It would
> make more sense as a collection of properties devices could use
> rather
> than a virtual device.
> 
> I think I'd do something like 'watchdog-ping-time-msec' that can be
> either ' ' or ''.

Your suggestion looks good to me. If we introduce such a property then it
would make sense to add handling for it in the GPIO watchdog too.

What I do wonder is how "hw_margin_ms" is unused? I see it is a
required property for GPIO

[PATCH v2] perf annotate mips: Add perf arch instructions annotate handlers

2020-09-23 Thread Peng Fan
From: Dengcheng Zhu 

Support the MIPS architecture using the ins_ops association
method. With this patch, perf-annotate can work well on MIPS.

Testing it with a perf.data file collected on a mips machine:
$./perf annotate -i perf.data

 :   Disassembly of section .text:
 :
 :   000be6a0 :
 :   get_next_seq():
0.00 :   be6a0:   lw  v0,0(a0)
0.00 :   be6a4:   daddiu  sp,sp,-128
0.00 :   be6a8:   ld  a7,72(a0)
 0.00 :   be6ac:   gssq    s5,s4,80(sp)
 0.00 :   be6b0:   gssq    s1,s0,48(sp)
 0.00 :   be6b4:   gssq    s8,gp,112(sp)
 0.00 :   be6b8:   gssq    s7,s6,96(sp)
 0.00 :   be6bc:   gssq    s3,s2,64(sp)
 0.00 :   be6c0:   sd  a3,0(sp)
 0.00 :   be6c4:   move    s0,a0
0.00 :   be6c8:   sd  v0,32(sp)
0.00 :   be6cc:   sd  a5,8(sp)
0.00 :   be6d0:   sd  zero,8(a0)
0.00 :   be6d4:   sd  a6,16(sp)
0.00 :   be6d8:   ld  s2,48(a0)
8.53 :   be6dc:   ld  s1,40(a0)
9.42 :   be6e0:   ld  v1,32(a0)
0.00 :   be6e4:   nop
0.00 :   be6e8:   ld  s4,24(a0)
0.00 :   be6ec:   ld  s5,16(a0)
0.00 :   be6f0:   sd  a7,40(sp)
   10.11 :   be6f4:   ld  s6,64(a0)

...

The original patch link: 
https://lore.kernel.org/patchwork/patch/1180480/

Signed-off-by: Dengcheng Zhu 
Signed-off-by: Peng Fan 

[fanp...@loongson.cn: Add missing "bgtzl", "bltzl",
"bgezl", "blezl", "beql" and "bnel" for pre-R6processors]
---
 tools/perf/arch/mips/Build   |  2 +-
 tools/perf/arch/mips/annotate/instructions.c | 46 
 tools/perf/util/annotate.c   |  8 +
 3 files changed, 55 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/arch/mips/annotate/instructions.c

diff --git a/tools/perf/arch/mips/Build b/tools/perf/arch/mips/Build
index 1bb8bf6..e4e5f33 100644
--- a/tools/perf/arch/mips/Build
+++ b/tools/perf/arch/mips/Build
@@ -1 +1 @@
-# empty
+perf-y += util/
diff --git a/tools/perf/arch/mips/annotate/instructions.c 
b/tools/perf/arch/mips/annotate/instructions.c
new file mode 100644
index 000..340993f
--- /dev/null
+++ b/tools/perf/arch/mips/annotate/instructions.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0
+
+static
+struct ins_ops *mips__associate_ins_ops(struct arch *arch, const char *name)
+{
+   struct ins_ops *ops = NULL;
+
+   if (!strncmp(name, "bal", 3) ||
+   !strncmp(name, "bgezal", 6) ||
+   !strncmp(name, "bltzal", 6) ||
+   !strncmp(name, "bgtzal", 6) ||
+   !strncmp(name, "blezal", 6) ||
+   !strncmp(name, "beqzal", 6) ||
+   !strncmp(name, "bnezal", 6) ||
+   !strncmp(name, "bgtzl", 5) ||
+   !strncmp(name, "bltzl", 5) ||
+   !strncmp(name, "bgezl", 5) ||
+   !strncmp(name, "blezl", 5) ||
+   !strncmp(name, "jialc", 5) ||
+   !strncmp(name, "beql", 4) ||
+   !strncmp(name, "bnel", 4) ||
+   !strncmp(name, "jal", 3))
+   ops = &call_ops;
+   else if (!strncmp(name, "jr", 2))
+   ops = &ret_ops;
+   else if (name[0] == 'j' || name[0] == 'b')
+   ops = &jump_ops;
+   else
+   return NULL;
+
+   arch__associate_ins_ops(arch, name, ops);
+
+   return ops;
+}
+
+static
+int mips__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
+{
+   if (!arch->initialized) {
+   arch->associate_instruction_ops = mips__associate_ins_ops;
+   arch->initialized = true;
+   arch->objdump.comment_char = '#';
+   }
+
+   return 0;
+}
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 0a1fcf7..80a4a3d 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -152,6 +152,7 @@ static int arch__associate_ins_ops(struct arch* arch, const 
char *name, struct i
 #include "arch/arm/annotate/instructions.c"
 #include "arch/arm64/annotate/instructions.c"
 #include "arch/csky/annotate/instructions.c"
+#include "arch/mips/annotate/instructions.c"
 #include "arch/x86/annotate/instructions.c"
 #include "arch/powerpc/annotate/instructions.c"
 #include "arch/s390/annotate/instructions.c"
@@ -175,6 +176,13 @@ static struct arch architectures[] = {
.init = csky__annotate_init,
},
{
+   .name = "mips",
+   .init = mips__annotate_init,
+   .objdump = {
+   .comment_char = '#',
+   },
+   },
+   {
.name = "x86",
.init = x86__annotate_init,
.instructions = x86__instructions,
-- 
2.1.0



Re: [PATCH] drm/rockchip: skip probed failed device

2020-09-23 Thread Jian-Hong Pan
Heiko Stübner  於 2020年9月23日 週三 下午7:16寫道:
>
> Am Mittwoch, 23. September 2020, 13:05:26 CEST schrieb Robin Murphy:
> > On 2020-09-23 07:59, Jian-Hong Pan wrote:
> > > The cdn-dp sub driver probes the device failed on PINEBOOK Pro.
> > >
> > > kernel: cdn-dp fec0.dp: [drm:cdn_dp_probe [rockchipdrm]] *ERROR* 
> > > missing extcon or phy
> > > kernel: cdn-dp: probe of fec0.dp failed with error -22
> >
> > Wouldn't it make more sense to simply not enable the DisplayPort node in
> > the upstream DT, until the type-C phy work has been done to make it
> > usable at all?
>
> Or alternatively just disable the cdn-dp Rockchip driver in the kernel config,
> which results in it also not getting probed.

This may be the simplest way.
However, considering that generic distro kernels have a policy of enabling
all drivers, disabling the DisplayPort node in the upstream DT until
the type-C phy work has been done may be the better solution for now.
I can prepare a patch for this.

Jian-Hong Pan

> > AIUI the "official" Manjaro kernel is carrying a bunch of
> > hacks to make type-C work via extcon, but they know that isn't an
> > upstreamable solution.
> >
> > Robin.
> >
> > > Then, the device halts all of the DRM related device jobs. For example,
> > > the operations: vop_component_ops, vop_component_ops and
> > > rockchip_dp_component_ops cannot be bound to corresponding devices. So,
> > > Xorg cannot find the correct DRM device.
> > >
> > > This patch skips the probing failed devices to fix this issue.
> > >
> > > Link: 
> > > http://lists.infradead.org/pipermail/linux-rockchip/2020-September/022352.html
> > > Signed-off-by: Jian-Hong Pan 
> > > ---
> > >   drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 6 ++
> > >   1 file changed, 6 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c 
> > > b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
> > > index 0f3eb392fe39..de13588602b4 100644
> > > --- a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
> > > +++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
> > > @@ -331,6 +331,12 @@ static struct component_match 
> > > *rockchip_drm_match_add(struct device *dev)
> > >
> > > if (!d)
> > > break;
> > > +   if (!d->driver) {
> > > +   DRM_DEV_ERROR(d,
> > > + "%s did not probe successfully",
> > > + drv->driver.name);
> > > +   continue;
> > > +   }
> > >
> > > device_link_add(dev, d, DL_FLAG_STATELESS);
> > > component_match_add(dev, &match, compare_dev, d);


Re: [PATCH] ARM: dts: document pinctrl-single,pins when #pinctrl-cells = 2

2020-09-23 Thread Tony Lindgren
* Trent Piepho  [200924 05:49]:
> On Wed, Sep 23, 2020 at 10:43 PM Tony Lindgren  wrote:
> >
> > * Trent Piepho  [200924 01:34]:
> > > On Tue, Sep 22, 2020 at 11:57 PM Tony Lindgren  wrote:
> > > >
> > > > Also FYI, folks have also complained for a long time that the 
> > > > pinctrl-single
> > > > binding mixes mux and conf values while they should be handled 
> > > > separately.
> > > >
> > >
> > > Instead of combining two fields when the dts is generated they are now
> > > combined when the pinctrl-single driver reads the dts.  Other than
> > > this detail, the result is the same.  The board dts source is the
> > > same.  The value programmed into the pinctrl register is the same.
> > > There is no mechanism currently that can alter that value in any way.
> > >
> > > What does combining them later allow that is not possible now?
> >
> > It now allows further driver changes to manage conf and mux separately :)
> 
> The pinctrl-single driver?  How will that work with boards that are
> not am335x and don't use conf and mux fields in the same manner as
> am335x?

For those cases we still have #pinctrl-cells = <1>.

Regards,

Tony


[PATCH v2] fs: Kill DCACHE_DONTCACHE dentry even if DCACHE_REFERENCED is set

2020-09-23 Thread Hao Li
If DCACHE_REFERENCED is set, fast_dput() will return true, and then
retain_dentry() have no chance to check DCACHE_DONTCACHE. As a result,
the dentry won't be killed and the corresponding inode can't be evicted.
In the following example, the DAX policy can't take effects unless we
do a drop_caches manually.

  # DCACHE_LRU_LIST will be set
  echo abcdefg > test.txt

  # DCACHE_REFERENCED will be set and DCACHE_DONTCACHE can't do anything
  xfs_io -c 'chattr +x' test.txt

  # Drop caches to make DAX changing take effects
  echo 2 > /proc/sys/vm/drop_caches

What this patch does is preventing fast_dput() from returning true if
DCACHE_DONTCACHE is set. Then retain_dentry() will detect the
DCACHE_DONTCACHE and will return false. As a result, the dentry will be
killed and the inode will be evicted. In this way, if we change per-file
DAX policy, it will take effect automatically after this file is closed
by all processes.

I also add some comments to make the code more clear.

Signed-off-by: Hao Li 
---
v1 is split into two standalone patch as discussed in [1], and the first
patch has been reviewed in [2]. This is the second patch.

[1]: 
https://lore.kernel.org/linux-fsdevel/20200831003407.ge12...@dread.disaster.area/
[2]: 
https://lore.kernel.org/linux-fsdevel/20200906214002.gi12...@dread.disaster.area/

 fs/dcache.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index ea0485861d93..97e81a844a96 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -793,10 +793,17 @@ static inline bool fast_dput(struct dentry *dentry)
 * a reference to the dentry and change that, but
 * our work is done - we can leave the dentry
 * around with a zero refcount.
+*
+* Nevertheless, there are two cases that we should kill
+* the dentry anyway.
+* 1. free disconnected dentries as soon as their refcount
+*reached zero.
+* 2. free dentries if they should not be cached.
 */
smp_rmb();
d_flags = READ_ONCE(dentry->d_flags);
-   d_flags &= DCACHE_REFERENCED | DCACHE_LRU_LIST | DCACHE_DISCONNECTED;
+   d_flags &= DCACHE_REFERENCED | DCACHE_LRU_LIST |
+   DCACHE_DISCONNECTED | DCACHE_DONTCACHE;
 
/* Nothing to do? Dropping the reference was all we needed? */
if (d_flags == (DCACHE_REFERENCED | DCACHE_LRU_LIST) && 
!d_unhashed(dentry))
-- 
2.28.0





Re: [PATCH] Revert "net: linkwatch: add check for netdevice being present to linkwatch_do_dev"

2020-09-23 Thread Saeed Mahameed
On Wed, 2020-09-23 at 17:23 -0700, David Miller wrote:
> From: David Miller 
> Date: Wed, 23 Sep 2020 17:21:25 -0700 (PDT)
> 
> > If an async code path tests 'present', gets true, and then the RTNL
> > holding synchronous code path puts the device into D3hot
> immediately
> > afterwards, the async code path will still continue and access the
> > chips registers and fault.
> 
> Wait, is the sequence:
> 
> ->ndo_stop()
> mark device not present and put into D3hot
> triggers linkwatch event
>   ...
>  ->ndo_get_stats64()
> 
> ???
> 

I assume it is, since normally device drivers do carrier_off() on
ndo_stop()

1) One problematic sequence would be 
(for drivers doing D3hot on ndo_stop())

__dev_close_many()
   ->ndo_stop()
  netif_device_detach() //Mark !present;
  ... D3hot
  carrier_off()->linkwatch_event()
... // !present && IFF_UP 
  
2) Another problematic scenario which I see repeated in many
drivers:

shutdown/suspend()
rtnl_lock()
netif_device_detach()//Mark !present;
stop()->carrier_off()->linkwatch_event()
// at this point device is still IFF_UP and !present
// due to the early detach above..  
rtnl_unlock();
   
For scenario 1) we can fix by marking IFF_UP at the beginning, but for
2), I think we need to fix the drivers to detach only after stop :(
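A sketch of that driver-side ordering for 2) (the foo_* names are
illustrative, not a specific driver): detach only after the stop path,
so the linkwatch event triggered by carrier_off() still sees the device
as present.

#include <linux/netdevice.h>
#include <linux/rtnetlink.h>

static void foo_stop(struct net_device *ndev);  /* the driver's stop path */

static int foo_suspend(struct device *dev)
{
        struct net_device *ndev = dev_get_drvdata(dev);

        rtnl_lock();
        if (netif_running(ndev))
                foo_stop(ndev);         /* carrier_off() -> linkwatch event */
        netif_device_detach(ndev);      /* mark !present only afterwards */
        rtnl_unlock();

        return 0;
}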
   
> Then yeah we might have to clear IFF_UP at the beginning of taking
> a netdev down.




Re: [PATCH] ARM: dts: document pinctrl-single,pins when #pinctrl-cells = 2

2020-09-23 Thread Trent Piepho
On Wed, Sep 23, 2020 at 10:43 PM Tony Lindgren  wrote:
>
> * Trent Piepho  [200924 01:34]:
> > On Tue, Sep 22, 2020 at 11:57 PM Tony Lindgren  wrote:
> > >
> > > Also FYI, folks have also complained for a long time that the 
> > > pinctrl-single
> > > binding mixes mux and conf values while they should be handled separately.
> > >
> >
> > Instead of combining two fields when the dts is generated they are now
> > combined when the pinctrl-single driver reads the dts.  Other than
> > this detail, the result is the same.  The board dts source is the
> > same.  The value programmed into the pinctrl register is the same.
> > There is no mechanism currently that can alter that value in any way.
> >
> > What does combining them later allow that is not possible now?
>
> It now allows further driver changes to manage conf and mux separately :)

The pinctrl-single driver?  How will that work with boards that are
not am335x and don't use conf and mux fields in the same manner as
am335x?


Re: [Linux-stm32] [PATCH 3/3] ARM: dts: stm32: update stm32mp151 for remote proc synchronisation support

2020-09-23 Thread Ahmad Fatoum
Hello Arnaud,

On 8/27/20 9:21 AM, Arnaud Pouliquen wrote:
> Two backup registers are used to store the Cortex-M4 state and the resource
> table address.
> Declare the tamp node and add associated properties in m4_rproc node
> to allow Linux to attach to a firmware loaded by the first boot stages.
> 
> Associated driver implementation is available in commit 9276536f455b3
> ("remoteproc: stm32: Parse syscon that will manage M4 synchronisation").
> 
> Signed-off-by: Arnaud Pouliquen 
> ---
>  arch/arm/boot/dts/stm32mp151.dtsi | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/stm32mp151.dtsi 
> b/arch/arm/boot/dts/stm32mp151.dtsi
> index bfe29023fbd5..842ecffae73a 100644
> --- a/arch/arm/boot/dts/stm32mp151.dtsi
> +++ b/arch/arm/boot/dts/stm32mp151.dtsi
> @@ -1541,6 +1541,11 @@
>   status = "disabled";
>   };
>  
> + tamp: tamp@5c00a000 {
> + compatible = "st,stm32-tamp", "syscon";
> + reg = <0x5c00a000 0x400>;
> + };
> +

Just saw this now. I have a pending patch adding this node as well:
https://lore.kernel.org/patchwork/patch/1306971/

For my use case, I need a "simple-mfd" compatible to allow child
nodes to be probed.

Could you CC me when you send out your v2, so I can rebase?
(Or if you don't mind, just add the "simple-mfd" into the compatible
list yourself :-)

Cheers
Ahmad

>   /*
>* Break node order to solve dependency probe issue between
>* pinctrl and exti.
> @@ -1717,6 +1722,8 @@
>   st,syscfg-holdboot = <&rcc 0x10C 0x1>;
>   st,syscfg-tz = <&rcc 0x000 0x1>;
>   st,syscfg-pdds = <&pwr_mcu 0x0 0x1>;
> + st,syscfg-rsc-tbl = <&tamp 0x144 0x>;
> + st,syscfg-m4-state = <&tamp 0x148 0x>;
>   status = "disabled";
>   };
>   };
> 

-- 
Pengutronix e.K.   | |
Steuerwalder Str. 21   | http://www.pengutronix.de/  |
31137 Hildesheim, Germany  | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |


Re: [PATCH] ARM: dts: document pinctrl-single,pins when #pinctrl-cells = 2

2020-09-23 Thread Tony Lindgren
* Trent Piepho  [200924 01:34]:
> On Tue, Sep 22, 2020 at 11:57 PM Tony Lindgren  wrote:
> >
> > Also FYI, folks have also complained for a long time that the pinctrl-single
> > binding mixes mux and conf values while they should be handled separately.
> >
> 
> Instead of combining two fields when the dts is generated they are now
> combined when the pinctrl-single driver reads the dts.  Other than
> this detail, the result is the same.  The board dts source is the
> same.  The value programmed into the pinctrl register is the same.
> There is no mechanism currently that can alter that value in any way.
> 
> What does combining them later allow that is not possible now?

It now allows further driver changes to manage conf and mux separately :)

Regards,

Tony


Re: [PATCH v2 0/6] clk: axi-clk-gen: misc updates to the driver

2020-09-23 Thread Alexandru Ardelean
On Thu, Sep 24, 2020 at 7:53 AM Moritz Fischer  wrote:
>
> Hi Stephen,
>
> On Wed, Sep 23, 2020 at 04:58:33PM -0700, Stephen Boyd wrote:
> > Quoting Alexandru Ardelean (2020-09-22 23:22:33)
> > > On Tue, Sep 22, 2020 at 10:42 PM Stephen Boyd  wrote:
> > > >
> > > > Quoting Moritz Fischer (2020-09-14 19:41:38)
> > > > > On Mon, Sep 14, 2020 at 11:11:05AM +0300, Alexandru Ardelean wrote:
> > > > > > On Mon, Aug 10, 2020 at 4:41 PM Alexandru Ardelean
> > > > > >  wrote:
> > > > > > >
> > > > > > > These patches synchronize the driver with the current state in the
> > > > > > > Analog Devices Linux tree:
> > > > > > >   https://github.com/analogdevicesinc/linux/
> > > > > > >
> > > > > > > They have been in the tree for about 2-3, so they did receive some
> > > > > > > testing.
> > > > > >
> > > > > > Ping on this series.
> > > > > > Do I need to do a re-send?
> > > >
> > > > I got this patch series twice. Not sure why.
> > >
> > > My fault here.
> > > Some Ctrl + R usage and not being attentive with the arguments.
> > > I think I added "*.patch" twice on the send-mail command.
> > > I did something similar [by accident] for some DMA patches.
> > > Apologies.
> > >
> > > I can do a re-send for this, if it helps.
> >
> > Sure. Please resend it.
> >
> > >
> > > >
> > > > >
> > > > > I've applied the FPGA one, the other ones should go through the clock
> > > > > tree I think?
> > > >
> > > > Doesn't patch 6 rely on the FPGA patch? How can that driver build
> > > > without the header file?
> > >
> > > Yes it does depend on the FPGA patch.
> > > We can drop patch 6 for now, pending a merge to Linus' tree and then
> > > wait for the trickle-down.
> > > I don't mind waiting for these patches.
> > > I have plenty of backlog that I want to run through, and cleanup and
> > > then upstream.
> > > So, there is no hurry.
> >
> > Can you send me a signed tag with that patch? I can base this patch
> > series on top of that. Or I can just apply it to clk tree and if nobody
> > changes it in the meantime merge should work out in linux-next and
> > linus' tree upstream.
>
> Long story short I messed up my pull-request to Greg and had to back out
> the patch anyways. In retrospect I think the patch should have gone
> through your tree anyways, so here's our chance to get it right.
>
> Feel free to take it with the rest of the changes through your tree.
>
> Note: When I applied the patch I fixed up the whitespace that checkpatch
> complained about so you might want to do that (or ask Alexandru to
> resend the patch).
>

I'll fixup the checkpatch stuff, re-send as a V3, and add your Acked-by.
Thanks & apologies for the mess-up on my part.

> Acked-by: Moritz Fischer 
>
> Sorry for the confusion and let me know if you still prefer a signed
> tag.
>
> - Moritz


Re: [PATCH printk 3/5] printk: use buffer pool for sprint buffers

2020-09-23 Thread Sergey Senozhatsky
On (20/09/23 17:11), Petr Mladek wrote:
>
> AFAIK, there is one catch. We need to use va_copy() around
> the 1st call because va_format can be proceed only once.
>

Current printk() should be good enough for reporting, say, "Kernel
stack overflow" errors. Is the extra pressure that va_copy() adds something
that we need to consider?
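For reference, the extra copy would look roughly like this (sketch only;
buffer names and the retry condition are placeholders, not the actual
patch variables):

        va_list args2;
        int len;

        va_copy(args2, args);
        len = vsnprintf(textbuf, size, fmt, args);      /* 1st pass eats args */
        if (len >= size)                                /* hypothetical 2nd pass */
                len = vsnprintf(bigbuf, bigsize, fmt, args2);
        va_end(args2);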

-ss


[PATCH] Input: trackpoint - enable Synaptics trackpoints

2020-09-23 Thread Vincent Huang
Add Synaptics IDs in trackpoint_start_protocol() to mark them as valid.

Signed-off-by: Vincent Huang 
---
 drivers/input/mouse/trackpoint.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/input/mouse/trackpoint.c b/drivers/input/mouse/trackpoint.c
index 854d5e758724..ef2fa0905208 100644
--- a/drivers/input/mouse/trackpoint.c
+++ b/drivers/input/mouse/trackpoint.c
@@ -282,6 +282,8 @@ static int trackpoint_start_protocol(struct psmouse 
*psmouse,
case TP_VARIANT_ALPS:
case TP_VARIANT_ELAN:
case TP_VARIANT_NXP:
+   case TP_VARIANT_JYT_SYNAPTICS:
+   case TP_VARIANT_SYNAPTICS:
if (variant_id)
*variant_id = param[0];
if (firmware_id)
-- 
2.25.1



Re: [PATCH] KVM: Enable hardware before doing arch VM initialization

2020-09-23 Thread Christian Borntraeger



On 23.09.20 20:57, Sean Christopherson wrote:
> Swap the order of hardware_enable_all() and kvm_arch_init_vm() to
> accommodate Intel's Trust Domain Extension (TDX), which needs VMX to be
> fully enabled during VM init in order to make SEAMCALLs.
> 
> This also provides consistent ordering between kvm_create_vm() and
> kvm_destroy_vm() with respect to calling kvm_arch_destroy_vm() and
> hardware_disable_all().
> 
> Cc: Marc Zyngier 
> Cc: James Morse 
> Cc: Julien Thierry 
> Cc: Suzuki K Poulose 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: Huacai Chen 
> Cc: Aleksandar Markovic 
> Cc: linux-m...@vger.kernel.org
> Cc: Paul Mackerras 
> Cc: kvm-...@vger.kernel.org
> Cc: Christian Borntraeger 
> Cc: Janosch Frank 
> Cc: David Hildenbrand 
> Cc: Cornelia Huck 
> Cc: Claudio Imbrenda 
> Cc: Vitaly Kuznetsov 
> Cc: Wanpeng Li 
> Cc: Jim Mattson 
> Cc: Joerg Roedel 
> Signed-off-by: Sean Christopherson 
> ---
> 
> Obviously not required until the TDX series comes along, but IMO KVM
> should be consistent with respect to enabling and disabling virt support
> in hardware.
> 
> Tested only on Intel hardware.  Unless I missed something, this only
> affects x86, Arm and MIPS as hardware enabling is a nop for s390 and PPC.

Yes, looks fine from an s390 perspective.

Reviewed-by: Christian Borntraeger 



Re: [PATCH] doc: zh_CN: add translatation for btrfs

2020-09-23 Thread Alex Shi
Hi Qing,

It looks like all the patches from the vivo folks have a 'charset=y'
problem and garbled text even after a successful 'git am'. I have to
repeat the same reminder again and again...

Could you double-check your patches before sending them out, and make
sure the docs look good on the webpage, like
https://www.kernel.org/doc/html/v5.9-rc3/translations/zh_CN/filesystems/debugfs.html

Thanks
Alex

在 2020/9/22 下午8:03, Wang Qing 写道:
> Translate Documentation/filesystems/btrfs.rst into Chinese.
> 
> Signed-off-by: Wang Qing 
> ---


[PATCH v3] mm: cma: indefinitely retry allocations in cma_alloc

2020-09-23 Thread Chris Goldsworthy
V1: Introduces a retry loop that attempts a CMA allocation a finite
number of times before giving up:
 
https://lkml.org/lkml/2020/8/5/1097
https://lkml.org/lkml/2020/8/11/893

V2: Introduces an indefinite retry for CMA allocations.  David Hildenbrand
raised a page pinning example which precludes doing this infite-retrying
for all CMA users:

https://lkml.org/lkml/2020/9/17/984

V3: Re-introduce a GFP mask argument for cma_alloc(), that can take in
__GFP_NOFAIL as an argument to indicate that a CMA allocation should be
retried indefinitely. This lets callers of cma_alloc() decide if they want
to perform indefinite retries. Also introduces a config option for
controlling the duration of the sleep between retries.

Chris Goldsworthy (1):
  mm: cma: indefinitely retry allocations in cma_alloc

 arch/powerpc/kvm/book3s_hv_builtin.c   |  2 +-
 drivers/dma-buf/heaps/cma_heap.c   |  2 +-
 drivers/s390/char/vmcp.c   |  2 +-
 drivers/staging/android/ion/ion_cma_heap.c |  2 +-
 include/linux/cma.h|  2 +-
 kernel/dma/contiguous.c|  4 ++--
 mm/Kconfig | 11 ++
 mm/cma.c   | 35 +-
 mm/cma_debug.c |  2 +-
 mm/hugetlb.c   |  4 ++--
 10 files changed, 50 insertions(+), 16 deletions(-)

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v3] mm: cma: indefinitely retry allocations in cma_alloc

2020-09-23 Thread Chris Goldsworthy
CMA allocations will fail if 'pinned' pages are in a CMA area, since we
cannot migrate pinned pages. The _refcount of a struct page being greater
than _mapcount for that page can cause pinning for anonymous pages.  This
is because try_to_unmap(), which (1) is called in the CMA allocation path,
and (2) decrements both _refcount and _mapcount for a page, will stop
unmapping a page from VMAs once the _mapcount for a page reaches 0.  This
implies that after try_to_unmap() has finished successfully for a page
where _refcount > _mapcount, _refcount will be greater than 0.  Later
in the CMA allocation path in migrate_page_move_mapping(), we will have one
more reference count than intended for anonymous pages, meaning the
allocation will fail for that page.

If a process ends up causing _refcount > _mapcount for a page (by either
incrementing _recount or decrementing _mapcount), such that the process is
context switched out after modifying one refcount but before modifying the
other, the page will be temporarily pinned.

One example of where _refcount can be greater than _mapcount is inside of
zap_pte_range(), which is called for all the entries of a PMD when a
process is exiting, to unmap the process's memory.  Inside of
zap_pte_range(), after unmapping a page with page_remove_rmap(), we have
_refcount > _mapcount.  _refcount can only be decremented after a TLB
flush is performed for the page - this doesn't occur until enough pages
have been batched together for flushing.  The flush can either occur inside
of zap_pte_range() (during the same invocation or a later one), or if there
aren't enough pages collected by the time we unmap all of the pages in a
process, the flush will occur in tlb_finish_mmu() in exit_mmap().  After
the flush has occurred, tlb_batch_pages_flush() will decrement the
references on the flushed pages.

Another such example like the above is inside of copy_one_pte(), which is
called during a fork. For PTEs for which pte_present(pte) == true,
copy_one_pte() will increment the _refcount field followed by the
_mapcount field of a page.

So, inside of cma_alloc(), add the option of letting users pass in
__GFP_NOFAIL to indicate that we should retry CMA allocations indefinitely,
in the event that alloc_contig_range() returns -EBUSY after having scanned
a whole CMA-region bitmap.

Signed-off-by: Chris Goldsworthy 
Co-developed-by: Vinayak Menon 
Signed-off-by: Vinayak Menon 
---
 arch/powerpc/kvm/book3s_hv_builtin.c   |  2 +-
 drivers/dma-buf/heaps/cma_heap.c   |  2 +-
 drivers/s390/char/vmcp.c   |  2 +-
 drivers/staging/android/ion/ion_cma_heap.c |  2 +-
 include/linux/cma.h|  2 +-
 kernel/dma/contiguous.c|  4 ++--
 mm/Kconfig | 11 ++
 mm/cma.c   | 35 +-
 mm/cma_debug.c |  2 +-
 mm/hugetlb.c   |  4 ++--
 10 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 073617c..21c3f6a 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -74,7 +74,7 @@ struct page *kvm_alloc_hpt_cma(unsigned long nr_pages)
VM_BUG_ON(order_base_2(nr_pages) < KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
 
return cma_alloc(kvm_cma, nr_pages, order_base_2(HPT_ALIGN_PAGES),
-false);
+0);
 }
 EXPORT_SYMBOL_GPL(kvm_alloc_hpt_cma);
 
diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
index 626cf7f..7657359 100644
--- a/drivers/dma-buf/heaps/cma_heap.c
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -66,7 +66,7 @@ static int cma_heap_allocate(struct dma_heap *heap,
helper_buffer->heap = heap;
helper_buffer->size = len;
 
-   cma_pages = cma_alloc(cma_heap->cma, nr_pages, align, false);
+   cma_pages = cma_alloc(cma_heap->cma, nr_pages, align, 0);
if (!cma_pages)
goto free_buf;
 
diff --git a/drivers/s390/char/vmcp.c b/drivers/s390/char/vmcp.c
index 9e06628..11c4e3b 100644
--- a/drivers/s390/char/vmcp.c
+++ b/drivers/s390/char/vmcp.c
@@ -70,7 +70,7 @@ static void vmcp_response_alloc(struct vmcp_session *session)
 * anymore the system won't work anyway.
 */
if (order > 2)
-   page = cma_alloc(vmcp_cma, nr_pages, 0, false);
+   page = cma_alloc(vmcp_cma, nr_pages, 0, 0);
if (page) {
session->response = (char *)page_to_phys(page);
session->cma_alloc = 1;
diff --git a/drivers/staging/android/ion/ion_cma_heap.c 
b/drivers/staging/android/ion/ion_cma_heap.c
index bf65e67..128d3a5 100644
--- a/drivers/staging/android/ion/ion_cma_heap.c
+++ b/drivers/staging/android/ion/ion_cma_heap.c
@@ -39,7 +39,7 @@ static int ion_cma_allocate(struct ion_heap *heap, struct 
ion_buf

[PATCH] rpadlpar_io:Add MODULE_DESCRIPTION entries to kernel modules

2020-09-23 Thread Mamatha Inamdar
This patch adds a brief MODULE_DESCRIPTION to rpadlpar_io kernel modules
(descriptions taken from Kconfig file)

Signed-off-by: Mamatha Inamdar 
---
 drivers/pci/hotplug/rpadlpar_core.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/hotplug/rpadlpar_core.c 
b/drivers/pci/hotplug/rpadlpar_core.c
index f979b70..bac65ed 100644
--- a/drivers/pci/hotplug/rpadlpar_core.c
+++ b/drivers/pci/hotplug/rpadlpar_core.c
@@ -478,3 +478,4 @@ static void __exit rpadlpar_io_exit(void)
 module_init(rpadlpar_io_init);
 module_exit(rpadlpar_io_exit);
 MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("RPA Dynamic Logical Partitioning driver for I/O slots");



Re: [PATCH v2] mm: cma: indefinitely retry allocations in cma_alloc

2020-09-23 Thread Chris Goldsworthy

On 2020-09-17 10:54, Chris Goldsworthy wrote:

On 2020-09-15 00:53, David Hildenbrand wrote:

On 14.09.20 20:33, Chris Goldsworthy wrote:

On 2020-09-14 02:31, David Hildenbrand wrote:

On 11.09.20 21:17, Chris Goldsworthy wrote:


So, inside of cma_alloc(), instead of giving up when alloc_contig_range()
returns -EBUSY after having scanned a whole CMA-region bitmap, perform
retries indefinitely, with sleeps, to give the system an opportunity to
unpin any pinned pages.

Signed-off-by: Chris Goldsworthy 
Co-developed-by: Vinayak Menon 
Signed-off-by: Vinayak Menon 
---
 mm/cma.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/mm/cma.c b/mm/cma.c
index 7f415d7..90bb505 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -442,8 +443,28 @@ struct page *cma_alloc(struct cma *cma, size_t
count, unsigned int align,
bitmap_maxno, start, bitmap_count, mask,
offset);
if (bitmap_no >= bitmap_maxno) {
-   mutex_unlock(&cma->lock);
-   break;
+   if (ret == -EBUSY) {
+   mutex_unlock(&cma->lock);
+
+   /*
+* Page may be momentarily pinned by some other
+* process which has been scheduled out, e.g.
+* in exit path, during unmap call, or process
+* fork and so cannot be freed there. Sleep
+* for 100ms and retry the allocation.
+*/
+   start = 0;
+   ret = -ENOMEM;
+   msleep(100);
+   continue;
+   } else {
+   /*
+* ret == -ENOMEM - all bits in cma->bitmap are
+* set, so we break accordingly.
+*/
+   mutex_unlock(&cma->lock);
+   break;
+   }
}
bitmap_set(cma->bitmap, bitmap_no, bitmap_count);
/*



What about long-term pinnings? IIRC, that can happen easily e.g., with
vfio (and I remember there is a way via vmsplice).

Not convinced trying forever is a sane approach in the general case ...


V1:
[1] https://lkml.org/lkml/2020/8/5/1097
[2] https://lkml.org/lkml/2020/8/6/1040
[3] https://lkml.org/lkml/2020/8/11/893
[4] https://lkml.org/lkml/2020/8/21/1490
[5] https://lkml.org/lkml/2020/9/11/1072

We're fine with doing indefinite retries, on the grounds that if there
is some long-term pinning that occurs when alloc_contig_range returns
-EBUSY, it should be debugged and fixed.  Would it be possible to
make this infinite-retrying something that could be enabled or disabled
by a defconfig option?


Two thoughts:

This means I strongly prefer something like [3] if feasible.


_Resending so that this ends up on LKML_

I can give [3] some further thought then.  Also, I realized [3] will not
completely solve the problem, it just reduces the window in which
_refcount > _mapcount (as mentioned in earlier threads, we encountered
the pinning when a task in copy_one_pte() or in the exit_mmap() path
gets context switched out).  If we were to try a sleeping-lock based
solution, do you think it would be permissible to add another lock to
struct page?


I have not been able to think of a clean way of introducing calls to 
preempt_disable() in exit_mmap(), which is the more problematic case.  
We would need to track state across multiple invocations of 
zap_pte_range() (which is called for each entry in a PMD when a 
process's memory is being unmapped), and would also need to extend this 
to tlb_finish_mmu(), which is called after all the process's memory has 
been unmapped: 
https://elixir.bootlin.com/linux/v5.8.10/source/mm/mmap.c#L3164.  As a 
follow-up to this patch, I'm submitting a patch that re-introduces the 
GFP mask for cma_alloc, which will perform indefinite retries if 
__GFP_NOFAIL is passed to the function.
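A sketch of how a caller that must not fail would then look (the exact
signature is whatever that follow-up patch defines; shown here against the
cma_heap example from the diff above):

        cma_pages = cma_alloc(cma_heap->cma, nr_pages, align, __GFP_NOFAIL);
        if (!cma_pages)         /* still possible, e.g. if the area is exhausted */
                goto free_buf;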


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed

2020-09-23 Thread Alexey Kardashevskiy



On 23/09/2020 17:06, Cédric Le Goater wrote:
> On 9/23/20 2:33 AM, Qian Cai wrote:
>> On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote:
>>> When a passthrough IO adapter is removed from a pseries machine using
>>> hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
>>> guest OS to clear all page table entries related to the adapter. If
>>> some are still present, the RTAS call which isolates the PCI slot
>>> returns error 9001 "valid outstanding translations" and the removal of
>>> the IO adapter fails. This is because when the PHBs are scanned, Linux
>>> maps automatically the INTx interrupts in the Linux interrupt number
>>> space but these are never removed.
>>>
>>> To solve this problem, we introduce a PPC platform specific
>>> pcibios_remove_bus() routine which clears all interrupt mappings when
>>> the bus is removed. This also clears the associated page table entries
>>> of the ESB pages when using XIVE.
>>>
>>> For this purpose, we record the logical interrupt numbers of the
>>> mapped interrupt under the PHB structure and let pcibios_remove_bus()
>>> do the clean up.
>>>
>>> Since some PCI adapters, like GPUs, use the "interrupt-map" property
>>> to describe interrupt mappings other than the legacy INTx interrupts,
>>> we can not restrict the size of the mapping array to PCI_NUM_INTX. The
>>> number of interrupt mappings is computed from the "interrupt-map"
>>> property and the mapping array is allocated accordingly.
>>>
>>> Cc: "Oliver O'Halloran" 
>>> Cc: Alexey Kardashevskiy 
>>> Signed-off-by: Cédric Le Goater 
>>
>> Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed 
>> to
>> this patch.
>>
>> .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
> 
> OK. The patch is missing a NULL assignement after kfree() and that
> might be the issue. 
> 
> I did try PHB removal under PowerNV, so I would like to understand 
> how we managed to remove twice the PCI bus and possibly reproduce. 
> Any chance we could grab what the syscall fuzzer (syzkaller) did ? 



My guess would be it is doing this in parallel to provoke races.



-- 
Alexey


Re: [PATCH v3 2/3] misc: bcm-vk: add Broadcom VK driver

2020-09-23 Thread Greg Kroah-Hartman
On Wed, Sep 23, 2020 at 09:43:55PM -0700, Scott Branden wrote:
> >> +struct bcm_vk_tty {
> >> +  struct tty_port port;
> >> +  uint32_t to_offset; /* bar offset to use */
> >> +  uint32_t to_size;   /* to VK buffer size */
> >> +  uint32_t wr;/* write offset shadow */
> >> +  uint32_t from_offset;   /* bar offset to use */
> >> +  uint32_t from_size; /* from VK buffer size */
> >> +  uint32_t rd;/* read offset shadow */
> > nit, these "unit32_t" stuff really doesn't matter in the kernel, 'u32'
> > is a better choice overall.  Same for u8 and others, for this whole
> > driver.
> Other than personal preference, I don't understand how 'u32' is better.
> uint32_t follows the ANSI stdint.h.  It allows for portable code without
> the need to define custom u32 types.

The ANSI namespace does not work in the kernel, which is why we have our
own types that pre-date those, and work properly everywhere in the
kernel.

> stdint types are used in many drivers in the linux kernel already.
> We would prefer to keep our code as portable as possible and use
> stdint types in the driver.

You aren't porting this code to other operating systems easily, please
use the kernel types :)

And yes, these types are used in other parts, but when you have 25
million lines of code, some crud does slip in at times...

> >> +  pid_t pid;
> >> +  bool irq_enabled;
> >> +  bool is_opened; /* tracks tty open/close */
> > Why do you need to track this?  Doesn't the tty core handle this for
> > you?
> I have tried using tty_port_kopened() and it doesn't seem to work.
> Will need to debug some more unless you have another suggested function to 
> use.

You didn't answer _why_ you need to track this.  A tty driver shouldn't
care about this type of thing.

> >> +  struct workqueue_struct *tty_wq_thread;
> >> +  struct work_struct tty_wq_work;
> >> +
> >> +  /* Reference-counting to handle file operations */
> >> +  struct kref kref;
> > And a kref?
> >
> > What is controlling the lifetime rules of your structure?
> >
> > Why a kref?
> >
> > Why the tty ports?
> >
> > Why the misc device?
> >
> > This feels really crazy to me...
> Comments mostly from Desmond here:
> 
> Yes, we have created a PCIe centric driver that combines with both a misc 
> devices on top (for the read/write/ioctrl), and also ttys.
> The device sits on PCIe but we are using the misc device for accessing it.
> tty is just another on top.  I don't think this is that uncommon to have a 
> hybrid driver.

Ugh, yes, it is uncommon because those are two different things.  Why do
you need/want a misc driver to control a tty device?  Why do you need a
tty device?  What really is this beast?

We got rid of the old "control path" device nodes for tty devices a long
time ago, this feels like a return to that old model, is that why you
are doing this?

But again, I really don't understand what this driver is trying to
control/manage, so it's hard to review it without that knowledge.

> Since we have a hybrid of PCIe + misc + tty, it means that we could 
> simultaneously have opening dev/node to read/write (multiple) + tty o going.

That's almost always a bad idea.

> Since the struct is embedded inside the primary PCIe structure, we need a way 
> to know when all the references are done, and then at that point we could 
> free the primary structure.
> That is the reason for the kref.  On PCIe device removal, we signal the user 
> space process to stop first, but the data structure cannot be freed until 
> the ref goes to 0.

Again, you cannot have multiple reference-count objects controlling a
single object.  That way is madness; it is buggy and will never work
properly.

You can have different objects with different lifespans, which, if you
really really want to do this, is the correct way.  Otherwise, stick
with one object and one reference count please.
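
As a rough sketch of that single-refcount pattern (the names here are made up
for illustration, not taken from the driver), the usual shape is one kref
embedded in the object whose lifetime it controls, with every user taking and
dropping a reference:

struct bcm_vk_dev {			/* hypothetical container */
	struct kref kref;
	/* ... PCIe, misc and tty state ... */
};

static void bcm_vk_dev_release(struct kref *kref)
{
	struct bcm_vk_dev *vk = container_of(kref, struct bcm_vk_dev, kref);

	kfree(vk);	/* freed only when the last reference is dropped */
}

	/* probe */
	kref_init(&vk->kref);

	/* every open / new user */
	kref_get(&vk->kref);

	/* every close / teardown path, including PCIe remove */
	kref_put(&vk->kref, bcm_vk_dev_release);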

thanks,

greg k-h


Re: [PATCH -next] crypto: qat - remove unnecessary mutex_init()

2020-09-23 Thread Giovanni Cabiddu
On Wed, Sep 16, 2020 at 07:21:21AM +0100, Qinglang Miao wrote:
> The mutex adf_ctl_lock is initialized statically, so it is
> unnecessary to initialize it again with mutex_init().
> 
> Signed-off-by: Qinglang Miao 

Acked-by: Giovanni Cabiddu 

> ---
>  drivers/crypto/qat/qat_common/adf_ctl_drv.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/crypto/qat/qat_common/adf_ctl_drv.c 
> b/drivers/crypto/qat/qat_common/adf_ctl_drv.c
> index 71d0c44aa..eb9b3be9d 100644
> --- a/drivers/crypto/qat/qat_common/adf_ctl_drv.c
> +++ b/drivers/crypto/qat/qat_common/adf_ctl_drv.c
> @@ -416,8 +416,6 @@ static long adf_ctl_ioctl(struct file *fp, unsigned int 
> cmd, unsigned long arg)
>  
>  static int __init adf_register_ctl_device_driver(void)
>  {
> - mutex_init(&adf_ctl_lock);
> -
>   if (adf_chr_drv_create())
>   goto err_chr_dev;
>  
> -- 
> 2.23.0
> 
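
For context, a mutex defined with DEFINE_MUTEX() is fully initialized at
definition time, so a runtime mutex_init() on it is redundant; a minimal
sketch of the pattern (assuming the lock is declared this way in
adf_ctl_drv.c, and with a made-up caller for illustration):

#include <linux/mutex.h>

/* statically defined and initialized; no mutex_init() needed */
static DEFINE_MUTEX(adf_ctl_lock);

static void adf_ctl_example_user(void)
{
	mutex_lock(&adf_ctl_lock);
	/* ... critical section ... */
	mutex_unlock(&adf_ctl_lock);
}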


Re: [PATCH -next] crypto: qat - convert to use DEFINE_SEQ_ATTRIBUTE macro

2020-09-23 Thread Giovanni Cabiddu
On Wed, Sep 16, 2020 at 03:50:17AM +0100, Liu Shixin wrote:
> Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.
> 
> Signed-off-by: Liu Shixin 

Acked-by: Giovanni Cabiddu 

> ---
>  drivers/crypto/qat/qat_common/adf_cfg.c   | 19 +
>  .../qat/qat_common/adf_transport_debug.c  | 42 ++-
>  2 files changed, 5 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/crypto/qat/qat_common/adf_cfg.c 
> b/drivers/crypto/qat/qat_common/adf_cfg.c
> index ac462796cefc..22ae32838113 100644
> --- a/drivers/crypto/qat/qat_common/adf_cfg.c
> +++ b/drivers/crypto/qat/qat_common/adf_cfg.c
> @@ -52,24 +52,7 @@ static const struct seq_operations qat_dev_cfg_sops = {
>   .show = qat_dev_cfg_show
>  };
>  
> -static int qat_dev_cfg_open(struct inode *inode, struct file *file)
> -{
> - int ret = seq_open(file, &qat_dev_cfg_sops);
> -
> - if (!ret) {
> - struct seq_file *seq_f = file->private_data;
> -
> - seq_f->private = inode->i_private;
> - }
> - return ret;
> -}
> -
> -static const struct file_operations qat_dev_cfg_fops = {
> - .open = qat_dev_cfg_open,
> - .read = seq_read,
> - .llseek = seq_lseek,
> - .release = seq_release
> -};
> +DEFINE_SEQ_ATTRIBUTE(qat_dev_cfg);
>  
>  /**
>   * adf_cfg_dev_add() - Create an acceleration device configuration table.
> diff --git a/drivers/crypto/qat/qat_common/adf_transport_debug.c 
> b/drivers/crypto/qat/qat_common/adf_transport_debug.c
> index 2a2eccbf56ec..dac25ba47260 100644
> --- a/drivers/crypto/qat/qat_common/adf_transport_debug.c
> +++ b/drivers/crypto/qat/qat_common/adf_transport_debug.c
> @@ -77,31 +77,14 @@ static void adf_ring_stop(struct seq_file *sfile, void *v)
>   mutex_unlock(&ring_read_lock);
>  }
>  
> -static const struct seq_operations adf_ring_sops = {
> +static const struct seq_operations adf_ring_debug_sops = {
>   .start = adf_ring_start,
>   .next = adf_ring_next,
>   .stop = adf_ring_stop,
>   .show = adf_ring_show
>  };
>  
> -static int adf_ring_open(struct inode *inode, struct file *file)
> -{
> - int ret = seq_open(file, &adf_ring_sops);
> -
> - if (!ret) {
> - struct seq_file *seq_f = file->private_data;
> -
> - seq_f->private = inode->i_private;
> - }
> - return ret;
> -}
> -
> -static const struct file_operations adf_ring_debug_fops = {
> - .open = adf_ring_open,
> - .read = seq_read,
> - .llseek = seq_lseek,
> - .release = seq_release
> -};
> +DEFINE_SEQ_ATTRIBUTE(adf_ring_debug);
>  
>  int adf_ring_debugfs_add(struct adf_etr_ring_data *ring, const char *name)
>  {
> @@ -188,31 +171,14 @@ static void adf_bank_stop(struct seq_file *sfile, void 
> *v)
>   mutex_unlock(&bank_read_lock);
>  }
>  
> -static const struct seq_operations adf_bank_sops = {
> +static const struct seq_operations adf_bank_debug_sops = {
>   .start = adf_bank_start,
>   .next = adf_bank_next,
>   .stop = adf_bank_stop,
>   .show = adf_bank_show
>  };
>  
> -static int adf_bank_open(struct inode *inode, struct file *file)
> -{
> - int ret = seq_open(file, &adf_bank_sops);
> -
> - if (!ret) {
> - struct seq_file *seq_f = file->private_data;
> -
> - seq_f->private = inode->i_private;
> - }
> - return ret;
> -}
> -
> -static const struct file_operations adf_bank_debug_fops = {
> - .open = adf_bank_open,
> - .read = seq_read,
> - .llseek = seq_lseek,
> - .release = seq_release
> -};
> +DEFINE_SEQ_ATTRIBUTE(adf_bank_debug);
>  
>  int adf_bank_debugfs_add(struct adf_etr_bank_data *bank)
>  {
> -- 
> 2.25.1
> 
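
For reference, DEFINE_SEQ_ATTRIBUTE(name) in include/linux/seq_file.h derives
all identifiers from its argument (name##_sops, name##_open, name##_fops),
which is why the seq_operations tables are renamed above; it expands to
roughly the open helper and file_operations that the patch deletes by hand:

/* approximate expansion of DEFINE_SEQ_ATTRIBUTE(adf_ring_debug) */
static int adf_ring_debug_open(struct inode *inode, struct file *file)
{
	int ret = seq_open(file, &adf_ring_debug_sops);

	if (!ret && inode->i_private) {
		struct seq_file *seq_f = file->private_data;

		seq_f->private = inode->i_private;
	}
	return ret;
}

static const struct file_operations adf_ring_debug_fops = {
	.owner = THIS_MODULE,
	.open = adf_ring_debug_open,
	.read = seq_read,
	.llseek = seq_lseek,
	.release = seq_release,
};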

