[tip: core/rcu] rcu: Enable rcu_normal_after_boot unconditionally for RT

2021-02-15 Thread tip-bot2 for Julia Cartwright
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: 36221e109eb20ac111bc3bf3e8d5639aa457c7e0
Gitweb:
https://git.kernel.org/tip/36221e109eb20ac111bc3bf3e8d5639aa457c7e0
Author: Julia Cartwright
AuthorDate: Tue, 15 Dec 2020 15:16:47 +01:00
Committer: Paul E. McKenney 
CommitterDate: Mon, 04 Jan 2021 13:43:51 -08:00

rcu: Enable rcu_normal_after_boot unconditionally for RT

Expedited RCU grace periods send IPIs to all non-idle CPUs, and thus can
disrupt time-critical code in real-time applications.  However, there
is a portion of boot-time processing (presumably before any real-time
applications have started) where expedited RCU grace periods are the only
option.  And so it is that experience with the -rt patchset indicates that
PREEMPT_RT systems should always set the rcupdate.rcu_normal_after_boot
kernel boot parameter.

This commit therefore makes the post-boot application environment safe
for real-time applications by making PREEMPT_RT systems disable the
rcupdate.rcu_normal_after_boot kernel boot parameter and act as
if this parameter had been set.  This means that post-boot calls to
synchronize_rcu_expedited() will be treated as if they were instead
calls to synchronize_rcu(), thus preventing the IPIs, and thus avoiding
disrupting real-time applications.
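
As a rough sketch of the resulting behavior (simplified and paraphrased
from kernel/rcu/update.c and the expedited grace-period code, not the
literal upstream sources):

    /* PREEMPT_RT builds default the parameter on and hide it. */
    static int rcu_normal_after_boot = IS_ENABLED(CONFIG_PREEMPT_RT);

    void rcu_end_inkernel_boot(void)
    {
            /* Boot is done: from now on, prefer normal grace periods. */
            if (rcu_normal_after_boot)
                    WRITE_ONCE(rcu_normal, 1);
    }

    void synchronize_rcu_expedited(void)
    {
            /* rcu_normal set => take the IPI-free, normal-GP path. */
            if (rcu_gp_is_normal()) {
                    wait_rcu_gp(call_rcu);  /* wait for a normal GP */
                    return;
            }
            /* ... otherwise run the real expedited machinery ... */
    }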

Suggested-by: Luiz Capitulino 
Acked-by: Paul E. McKenney 
Signed-off-by: Julia Cartwright 
Signed-off-by: Sebastian Andrzej Siewior 
[ paulmck: Update kernel-parameters.txt accordingly. ]
Signed-off-by: Paul E. McKenney 
---
 Documentation/admin-guide/kernel-parameters.txt | 7 +++
 kernel/rcu/update.c | 4 +++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 521255f..e0008d9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4474,6 +4474,13 @@
only normal grace-period primitives.  No effect
on CONFIG_TINY_RCU kernels.
 
+   But note that CONFIG_PREEMPT_RT=y kernels enables
+   this kernel boot parameter, forcibly setting
+   it to the value one, that is, converting any
+   post-boot attempt at an expedited RCU grace
+   period to instead use normal non-expedited
+   grace-period processing.
+
rcupdate.rcu_task_ipi_delay= [KNL]
Set time in jiffies during which RCU tasks will
avoid sending IPIs, starting with the beginning
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 39334d2..b95ae86 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -56,8 +56,10 @@
 #ifndef CONFIG_TINY_RCU
 module_param(rcu_expedited, int, 0);
 module_param(rcu_normal, int, 0);
-static int rcu_normal_after_boot;
+static int rcu_normal_after_boot = IS_ENABLED(CONFIG_PREEMPT_RT);
+#ifndef CONFIG_PREEMPT_RT
 module_param(rcu_normal_after_boot, int, 0);
+#endif
 #endif /* #ifndef CONFIG_TINY_RCU */
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC


[tip: locking/core] squashfs: Make use of local lock in multi_cpu decompressor

2020-06-01 Thread tip-bot2 for Julia Cartwright
The following commit has been merged into the locking/core branch of tip:

Commit-ID: fd56200a16c72c7c3ec3e54e06160dfaa5b8dee8
Gitweb:
https://git.kernel.org/tip/fd56200a16c72c7c3ec3e54e06160dfaa5b8dee8
Author: Julia Cartwright
AuthorDate: Wed, 27 May 2020 22:11:16 +02:00
Committer: Ingo Molnar 
CommitterDate: Thu, 28 May 2020 10:31:10 +02:00

squashfs: Make use of local lock in multi_cpu decompressor

The squashfs multi CPU decompressor makes use of get_cpu_ptr() to
acquire a pointer to per-CPU data. get_cpu_ptr() implicitly disables
preemption which serializes the access to the per-CPU data.

But decompression can take quite some time depending on the size. The
observed preempt disabled times in real world scenarios went up to 8ms,
causing massive wakeup latencies. This happens on all CPUs as the
decompression is fully parallelized.

Replace the implicit preemption control with an explicit local lock.
This allows RT kernels to substitute it with a real per CPU lock, which
serializes the access but keeps the code section preemptible. On non RT
kernels this maps to preempt_disable() as before, i.e. no functional
change.
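
For reference, the general local_lock pattern looks roughly like the
sketch below (hypothetical names, not the squashfs code itself).  On
non-RT kernels local_lock() compiles down to preempt_disable(); on
PREEMPT_RT it becomes a per-CPU sleeping lock, so the section stays
preemptible:

    #include <linux/local_lock.h>
    #include <linux/percpu.h>

    struct demo_pcpu {
            local_lock_t lock;
            int counter;
    };
    static DEFINE_PER_CPU(struct demo_pcpu, demo_pcpu) = {
            .lock = INIT_LOCAL_LOCK(lock),
    };

    static void demo_update(void)
    {
            local_lock(&demo_pcpu.lock);    /* serialize per-CPU access */
            this_cpu_inc(demo_pcpu.counter);
            local_unlock(&demo_pcpu.lock);
    }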

[ bigeasy: Use local_lock(), patch description]

Reported-by: Alexander Stein 
Signed-off-by: Julia Cartwright 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Ingo Molnar 
Tested-by: Alexander Stein 
Acked-by: Peter Zijlstra 
Link: https://lore.kernel.org/r/20200527201119.1692513-5-bige...@linutronix.de
---
 fs/squashfs/decompressor_multi_percpu.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/squashfs/decompressor_multi_percpu.c 
b/fs/squashfs/decompressor_multi_percpu.c
index 2a2a2d1..e206ebf 100644
--- a/fs/squashfs/decompressor_multi_percpu.c
+++ b/fs/squashfs/decompressor_multi_percpu.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include <linux/local_lock.h>
 
 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
@@ -20,7 +21,8 @@
  */
 
 struct squashfs_stream {
-   void *stream;
+   void *stream;
+   local_lock_t lock;
 };
 
 void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
@@ -41,6 +43,7 @@ void *squashfs_decompressor_create(struct squashfs_sb_info 
*msblk,
err = PTR_ERR(stream->stream);
goto out;
}
+   local_lock_init(&stream->lock);
}
 
kfree(comp_opts);
@@ -75,12 +78,16 @@ void squashfs_decompressor_destroy(struct squashfs_sb_info 
*msblk)
 int squashfs_decompress(struct squashfs_sb_info *msblk, struct buffer_head 
**bh,
int b, int offset, int length, struct squashfs_page_actor *output)
 {
-   struct squashfs_stream __percpu *percpu =
-   (struct squashfs_stream __percpu *) msblk->stream;
-   struct squashfs_stream *stream = get_cpu_ptr(percpu);
-   int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
-   offset, length, output);
-   put_cpu_ptr(stream);
+   struct squashfs_stream *stream;
+   int res;
+
+   local_lock(&msblk->stream->lock);
+   stream = this_cpu_ptr(msblk->stream);
+
+   res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
+   offset, length, output);
+
+   local_unlock(&msblk->stream->lock);
 
if (res < 0)
ERROR("%s decompression failed, data probably corrupt\n",


Re: [patch 10/12] hrtimer: Determine hard/soft expiry mode for hrtimer sleepers on RT

2019-07-26 Thread Julia Cartwright
On Fri, Jul 26, 2019 at 08:30:58PM +0200, Thomas Gleixner wrote:
> From: Sebastian Andrzej Siewior 
> 
> On PREEMPT_RT enabled kernels hrtimers which are not explicitely marked for
> hard interrupt expiry mode are moved into soft interrupt context either for
> latency reasons or because the hrtimer callback takes regular spinlocks or
> invokes other functions which are not suitable for hard interrupt context
> on PREEMPT_RT.
> 
> The hrtimer_sleeper callback is RT compatible in hard interrupt context,
> but there is a latency concern: Untrusted userspace can spawn many threads
> which arm timers for the same expiry time on the same CPU. On expiry that
> causes a latency spike due to the wakeup of a gazillion threads.
> 
> OTOH, priviledged real-time user space applications rely on the low latency
> of hard interrupt wakeups. These syscall related wakeups are all based on
> hrtimer sleepers.
> 
> If the current task is in a real-time scheduling class, mark the mode for
> hard interrupt expiry.
> 
> [ tglx: Split out of a larger combo patch. Added changelog ]
> 
> Signed-off-by: Sebastian Andrzej Siewior 
> Signed-off-by: Thomas Gleixner 
> ---
>  kernel/time/hrtimer.c |   24 
>  1 file changed, 24 insertions(+)
> 
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -1662,6 +1662,30 @@ static enum hrtimer_restart hrtimer_wake
>  static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
>  clockid_t clock_id, enum hrtimer_mode mode)
>  {
> + /*
> +  * On PREEMPT_RT enabled kernels hrtimers which are not explicitely
> +  * marked for hard interrupt expiry mode are moved into soft
> +  * interrupt context either for latency reasons or because the
> +  * hrtimer callback takes regular spinlocks or invokes other
> +  * functions which are not suitable for hard interrupt context on
> +  * PREEMPT_RT.
> +  *
> +  * The hrtimer_sleeper callback is RT compatible in hard interrupt
> +  * context, but there is a latency concern: Untrusted userspace can
> +  * spawn many threads which arm timers for the same expiry time on
> +  * the same CPU. That causes a latency spike due to the wakeup of
> +  * a gazillion threads.
> +  *
> +  * OTOH, priviledged real-time user space applications rely on the
> +  * low latency of hard interrupt wakeups. If the current task is in
> +  * a real-time scheduling class, mark the mode for hard interrupt
> +  * expiry.
> +  */
> + if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> + if (task_is_realtime(current) && !(mode & HRTIMER_MODE_SOFT))
> + mode |= HRTIMER_MODE_HARD;

Because this ends up sampling the tasks' scheduling parameters only at
the time of enqueue, it doesn't take into consideration whether or not
the task may be holding a PI lock and might later be boosted if that lock
is contended by an RT thread.

Am I correct in assuming there is an induced inversion here in this
case, because the deferred wakeup mechanism isn't part of the PI chain?

If so, is this just to be an accepted limitation at this point?  Is the
intent to argue this away as bad RT application design? :)

   Julia


Re: [patch V2 1/1] Kconfig: Introduce CONFIG_PREEMPT_RT

2019-07-18 Thread Julia Cartwright
On Wed, Jul 17, 2019 at 10:01:49PM +0200, Thomas Gleixner wrote:
> Add a new entry to the preemption menu which enables the real-time support
> for the kernel. The choice is only enabled when an architecture supports
> it.
> 
> It selects PREEMPT as the RT features depend on it. To achieve that the
> existing PREEMPT choice is renamed to PREEMPT_LL which select PREEMPT as
> well.
> 
> No functional change.
> 
> Signed-off-by: Thomas Gleixner 
> Acked-by: Paul E. McKenney 
> Acked-by: Steven Rostedt (VMware) 
> Acked-by: Clark Williams 
> Acked-by: Daniel Bristot de Oliveira 
> Acked-by: Frederic Weisbecker 
> Acked-by: Ingo Molnar 
> Acked-by: Peter Zijlstra (Intel) 
> Acked-by: Marc Zyngier 
> Acked-by: Daniel Wagner 
> ---

I'm excited to see where this goes.

Acked-by: Julia Cartwright 

  Julia


[tip:sched/core] kthread: Convert worker lock to raw spinlock

2019-02-28 Thread tip-bot for Julia Cartwright
Commit-ID:  fe99a4f4d6022ec92f9b52a5528cb9b77513e7d1
Gitweb: https://git.kernel.org/tip/fe99a4f4d6022ec92f9b52a5528cb9b77513e7d1
Author: Julia Cartwright 
AuthorDate: Tue, 12 Feb 2019 17:25:53 +0100
Committer:  Thomas Gleixner 
CommitDate: Thu, 28 Feb 2019 11:18:38 +0100

kthread: Convert worker lock to raw spinlock

In order to enable the queuing of kthread work items from hardirq context
even when PREEMPT_RT_FULL is enabled, convert the worker spin_lock to a
raw_spin_lock.

This is only acceptable to do because the work performed under the lock is
well-bounded and minimal.
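
The case this enables can be sketched as follows (illustrative only,
hypothetical worker/work/handler names); on PREEMPT_RT a spinlock_t is a
sleeping lock and must not be taken from a non-threaded hardirq handler,
while the raw_spinlock_t inside the worker keeps queueing legal there:

    #include <linux/interrupt.h>
    #include <linux/kthread.h>

    static struct kthread_worker *demo_worker;  /* hypothetical */
    static struct kthread_work demo_work;       /* hypothetical */

    static irqreturn_t demo_hardirq(int irq, void *data)
    {
            /*
             * With the worker lock converted to raw_spinlock_t this is
             * fine even from hardirq context on PREEMPT_RT: queueing
             * only does short, bounded list manipulation under the lock.
             */
            kthread_queue_work(demo_worker, &demo_work);
            return IRQ_HANDLED;
    }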

Reported-by: Steffen Trumtrar 
Reported-by: Tim Sander 
Signed-off-by: Julia Cartwright 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Thomas Gleixner 
Tested-by: Steffen Trumtrar 
Reviewed-by: Petr Mladek 
Cc: Guenter Roeck 
Link: https://lkml.kernel.org/r/20190212162554.19779-1-bige...@linutronix.de

---
 include/linux/kthread.h |  4 ++--
 kernel/kthread.c| 42 +-
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index c1961761311d..6b8c064f0cbc 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -85,7 +85,7 @@ enum {
 
 struct kthread_worker {
unsigned intflags;
-   spinlock_t  lock;
+   raw_spinlock_t  lock;
struct list_headwork_list;
struct list_headdelayed_work_list;
struct task_struct  *task;
@@ -106,7 +106,7 @@ struct kthread_delayed_work {
 };
 
 #define KTHREAD_WORKER_INIT(worker){   \
-   .lock = __SPIN_LOCK_UNLOCKED((worker).lock),\
+   .lock = __RAW_SPIN_LOCK_UNLOCKED((worker).lock),\
.work_list = LIST_HEAD_INIT((worker).work_list),\
.delayed_work_list = LIST_HEAD_INIT((worker).delayed_work_list),\
}
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 087d18d771b5..5641b55783a6 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -599,7 +599,7 @@ void __kthread_init_worker(struct kthread_worker *worker,
struct lock_class_key *key)
 {
memset(worker, 0, sizeof(struct kthread_worker));
-   spin_lock_init(&worker->lock);
+   raw_spin_lock_init(&worker->lock);
lockdep_set_class_and_name(&worker->lock, key, name);
INIT_LIST_HEAD(&worker->work_list);
INIT_LIST_HEAD(&worker->delayed_work_list);
@@ -641,21 +641,21 @@ repeat:
 
if (kthread_should_stop()) {
__set_current_state(TASK_RUNNING);
-   spin_lock_irq(&worker->lock);
+   raw_spin_lock_irq(&worker->lock);
worker->task = NULL;
-   spin_unlock_irq(&worker->lock);
+   raw_spin_unlock_irq(&worker->lock);
return 0;
}
 
work = NULL;
-   spin_lock_irq(&worker->lock);
+   raw_spin_lock_irq(&worker->lock);
if (!list_empty(&worker->work_list)) {
work = list_first_entry(&worker->work_list,
struct kthread_work, node);
list_del_init(&work->node);
}
worker->current_work = work;
-   spin_unlock_irq(&worker->lock);
+   raw_spin_unlock_irq(&worker->lock);
 
if (work) {
__set_current_state(TASK_RUNNING);
@@ -812,12 +812,12 @@ bool kthread_queue_work(struct kthread_worker *worker,
bool ret = false;
unsigned long flags;
 
-   spin_lock_irqsave(&worker->lock, flags);
+   raw_spin_lock_irqsave(&worker->lock, flags);
if (!queuing_blocked(worker, work)) {
kthread_insert_work(worker, work, &worker->work_list);
ret = true;
}
-   spin_unlock_irqrestore(&worker->lock, flags);
+   raw_spin_unlock_irqrestore(&worker->lock, flags);
return ret;
 }
 EXPORT_SYMBOL_GPL(kthread_queue_work);
@@ -843,7 +843,7 @@ void kthread_delayed_work_timer_fn(struct timer_list *t)
if (WARN_ON_ONCE(!worker))
return;
 
-   spin_lock(&worker->lock);
+   raw_spin_lock(&worker->lock);
/* Work must not be used with >1 worker, see kthread_queue_work(). */
WARN_ON_ONCE(work->worker != worker);
 
@@ -852,7 +852,7 @@ void kthread_delayed_work_timer_fn(struct timer_list *t)
list_del_init(&work->node);
kthread_insert_work(worker, work, &worker->work_list);

-   spin_unlock(&worker->lock);
+   raw_spin_unlock(&worker->lock);
 }
 EXPORT_SYMBOL(kthread_delayed_work_timer_fn);
 
@@ -908,14 +908,14 @@ bool kthread_queue_delayed_work(struct kthread_worker 
*worker,
unsigned long flags;
bool ret = false;
 
-   spin_lock_irqsave(&worker->lock, flags);
+   raw_spin_lock_irqsave(&worker->lock, flags);
 
if (!queuing_blocked(worker, work)) {
__kthread_queue_delayed_work(worker, dwork, delay);
ret = true;
}
 
-   spin_unlock_irqrestore(>loc

Re: [PATCH] iommu/dmar: fix buffer overflow during PCI bus notification

2019-02-20 Thread Julia Cartwright
On Wed, Feb 20, 2019 at 10:46:31AM -0600, Julia Cartwright wrote:
> Commit 57384592c433 ("iommu/vt-d: Store bus information in RMRR PCI
> device path") changed the type of the path data, however, the change in
> path type was not reflected in size calculations.  Update to use the
> correct type and prevent a buffer overflow.
> 
> This bug manifests in systems with deep PCI hierarchies, and can lead to
> an overflow of the static allocated buffer (dmar_pci_notify_info_buf),
> or can lead to overflow of slab-allocated data.
> 
>BUG: KASAN: global-out-of-bounds in dmar_alloc_pci_notify_info+0x1d5/0x2e0
>Write of size 1 at addr 90445d80 by task swapper/0/1
>CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW   
> 4.14.87-rt49-02406-gd0a0e96 #1
>Call Trace:
> ? dump_stack+0x46/0x59
> ? print_address_description+0x1df/0x290
[..]
>The buggy address belongs to the variable:
> dmar_pci_notify_info_buf+0x40/0x60
> 
> Fixes: 57384592c433 ("iommu/vt-d: Store bus information in RMRR PCI device 
> path")
> Signed-off-by: Julia Cartwright 
> ---
>  drivers/iommu/dmar.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index 6b7df25e1488..9c49300e9fb7 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -145,7 +145,7 @@ dmar_alloc_pci_notify_info(struct pci_dev *dev, unsigned 
> long event)
>   for (tmp = dev; tmp; tmp = tmp->bus->self)
>   level++;
>  
> - size = sizeof(*info) + level * sizeof(struct acpi_dmar_pci_path);
> + size = sizeof(*info) + level * sizeof(info->path[0]);

This is probably a candidate for struct_size() instead, if that's what
is preferred.
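
For reference, that conversion would look roughly like this (assuming
struct_size() from <linux/overflow.h>, which computes the size of *info
plus 'level' trailing path elements with overflow checking):

    size = struct_size(info, path, level);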

   Julia


[PATCH] iommu/dmar: fix buffer overflow during PCI bus notification

2019-02-20 Thread Julia Cartwright
Commit 57384592c433 ("iommu/vt-d: Store bus information in RMRR PCI
device path") changed the type of the path data, however, the change in
path type was not reflected in size calculations.  Update to use the
correct type and prevent a buffer overflow.

This bug manifests in systems with deep PCI hierarchies, and can lead to
an overflow of the statically allocated buffer (dmar_pci_notify_info_buf),
or can lead to overflow of slab-allocated data.

   BUG: KASAN: global-out-of-bounds in dmar_alloc_pci_notify_info+0x1d5/0x2e0
   Write of size 1 at addr 90445d80 by task swapper/0/1
   CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW   
4.14.87-rt49-02406-gd0a0e96 #1
   Call Trace:
? dump_stack+0x46/0x59
? print_address_description+0x1df/0x290
? dmar_alloc_pci_notify_info+0x1d5/0x2e0
? kasan_report+0x256/0x340
? dmar_alloc_pci_notify_info+0x1d5/0x2e0
? e820__memblock_setup+0xb0/0xb0
? dmar_dev_scope_init+0x424/0x48f
? __down_write_common+0x1ec/0x230
? dmar_dev_scope_init+0x48f/0x48f
? dmar_free_unused_resources+0x109/0x109
? cpumask_next+0x16/0x20
? __kmem_cache_create+0x392/0x430
? kmem_cache_create+0x135/0x2f0
? e820__memblock_setup+0xb0/0xb0
? intel_iommu_init+0x170/0x1848
? _raw_spin_unlock_irqrestore+0x32/0x60
? migrate_enable+0x27a/0x5b0
? sched_setattr+0x20/0x20
? migrate_disable+0x1fc/0x380
? task_rq_lock+0x170/0x170
? try_to_run_init_process+0x40/0x40
? locks_remove_file+0x85/0x2f0
? dev_prepare_static_identity_mapping+0x78/0x78
? rt_spin_unlock+0x39/0x50
? lockref_put_or_lock+0x2a/0x40
? dput+0x128/0x2f0
? __rcu_read_unlock+0x66/0x80
? __fput+0x250/0x300
? __rcu_read_lock+0x1b/0x30
? mntput_no_expire+0x38/0x290
? e820__memblock_setup+0xb0/0xb0
? pci_iommu_init+0x25/0x63
? pci_iommu_init+0x25/0x63
? do_one_initcall+0x7e/0x1c0
? initcall_blacklisted+0x120/0x120
? kernel_init_freeable+0x27b/0x307
? rest_init+0xd0/0xd0
? kernel_init+0xf/0x120
? rest_init+0xd0/0xd0
? ret_from_fork+0x1f/0x40
   The buggy address belongs to the variable:
dmar_pci_notify_info_buf+0x40/0x60

Fixes: 57384592c433 ("iommu/vt-d: Store bus information in RMRR PCI device 
path")
Signed-off-by: Julia Cartwright 
---
 drivers/iommu/dmar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 6b7df25e1488..9c49300e9fb7 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -145,7 +145,7 @@ dmar_alloc_pci_notify_info(struct pci_dev *dev, unsigned 
long event)
for (tmp = dev; tmp; tmp = tmp->bus->self)
level++;
 
-   size = sizeof(*info) + level * sizeof(struct acpi_dmar_pci_path);
+   size = sizeof(*info) + level * sizeof(info->path[0]);
if (size <= sizeof(dmar_pci_notify_info_buf)) {
info = (struct dmar_pci_notify_info *)dmar_pci_notify_info_buf;
} else {
-- 
2.20.1



Re: [RFC PATCH] arm64/fpsimd: Don't disable softirq when touching FPSIMD/SVE state

2019-02-12 Thread Julia Cartwright
Hello Julien-

On Fri, Feb 08, 2019 at 04:55:13PM +, Julien Grall wrote:
> When the kernel is compiled with CONFIG_KERNEL_MODE_NEON, some part of
> the kernel may be able to use FPSIMD/SVE. This is for instance the case
> for crypto code.
> 
> Any use of FPSIMD/SVE in the kernel are clearly marked by using the
> function kernel_neon_{begin, end}. Furthermore, this can only be used
> when may_use_simd() returns true.
> 
> The current implementation of may_use_simd() allows softirq to use
> FPSIMD/SVE unless it is currently in used (i.e kernel_neon_busy is true).
> When in used, softirqs usually fallback to a software method.
> 
> At the moment, as a softirq may use FPSIMD/SVE, softirqs are disabled
> when touching the FPSIMD/SVE context. This has the drawback to disable
> all softirqs even if they are not using FPSIMD/SVE.
> 
> As a softirq should not rely on been able to use simd at a given time,
> there are limited reason to keep softirq disabled when touching the
> FPSIMD/SVE context. Instead, we can only disable preemption and tell
> the NEON unit is currently in use.
> 
> This patch introduces two new helpers kernel_neon_{disable, enable} to
> mark the area using FPSIMD/SVE context and use them in replacement of
> local_bh_{disable, enable}. The functions kernel_neon_{begin, end} are
> also re-implemented to use the new helpers.
> 
> Signed-off-by: Julien Grall 
> 
> ---
> 
> I have been exploring this solution as an alternative approach to the RT
> patch "arm64: fpsimd: use preemp_disable in addition to local_bh_disable()".
> 
> So far, the patch has only been lightly tested.
> 
> For RT-linux, it might be possible to use migrate_{enable, disable}. I
> am quite new with RT and have some trouble to understand the semantics
> of migrate_{enable, disable}. So far, I am still unsure if it is possible
> to run another userspace task on the same CPU while getting preempted
> when the migration is disabled.

Yes, a thread in a migrate_disable()-protected critical section can be
preempted, and another thread on the same CPU may enter the critical
section.

If it's necessary to prevent this from happening, then you need to also
make use of a per-CPU mutex.  On RT, this would do the right thing
w.r.t. priority inheritance.
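
A rough sketch of that per-CPU mutex idea on an RT kernel (hypothetical
names, not existing kernel code; each mutex is assumed to be
mutex_init()'d once at boot):

    static DEFINE_PER_CPU(struct mutex, demo_cpu_lock);

    static void demo_touch_fpsimd_state(void)
    {
            struct mutex *lock;

            migrate_disable();                /* stay on this CPU */
            lock = this_cpu_ptr(&demo_cpu_lock);
            mutex_lock(lock);                 /* keep other tasks off this
                                                 CPU's state, with PI if
                                                 contended by an RT task */
            /* ... touch this CPU's FPSIMD/SVE context ... */
            mutex_unlock(lock);
            migrate_enable();
    }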

   Julia


Re: Re: [PATCH v3 1/3] KVM: arm/arm64: vgic: Make vgic_irq->irq_lock a raw_spinlock

2019-02-01 Thread Julia Cartwright
On Fri, Feb 01, 2019 at 03:30:58PM +, Julien Grall wrote:
> Hi Julien,
> 
> On 07/01/2019 15:06, Julien Thierry wrote:
> > vgic_irq->irq_lock must always be taken with interrupts disabled as
> > it is used in interrupt context.
> 
> I am a bit confused with the reason here. The code mention that ap_list_lock
> could be taken from the timer interrupt handler interrupt. I assume it
> speaks about the handler kvm_arch_timer_handler. Looking at the
> configuration of the interrupt, the flag IRQF_NO_THREAD is not set, so the
> interrupt should be threaded when CONFIG_PREEMPT_FULL is set. If my
> understanding is correct, this means the interrupt thread would sleep if it
> takes the spinlock.
> 
> Did I miss anything? Do you have an exact path where the vGIC is actually
> called from an interrupt context?

The part you're missing is that percpu interrupts are not force
threaded:

static int irq_setup_forced_threading(struct irqaction *new)
{
if (!force_irqthreads)
return 0;
if (new->flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT))
return 0;

/* ...*/
}

   Julia


[ANNOUNCE] 4.9.146-rt125

2018-12-20 Thread Julia Cartwright
Hello RT Folks!

I'm pleased to announce the 4.9.146-rt125 stable release.

Apologies for an update to the 4.9-rt tree being way overdue.

This release is just an update to the new stable 4.9.146 version
and no RT specific changes have been made.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.9-rt
  Head SHA1: 5b5e350d8c82f5dcbad136aeede6b73d41cd1e8a

Or to build 4.9.146-rt125 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.9.146.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.9/patch-4.9.146-rt125.patch.xz

Enjoy, and happy holidays,
Julia



[PATCH 0/2] Fix watchdogd wakeup deferral on RT

2018-09-28 Thread Julia Cartwright
The following two patches solve an issue reported by Steffen Trumtrar
and Tim Sander to the linux-rt-users mailing list.  Namely, the wakeup
of the watchdogd thread being starved by an RT task due to the hrtimer
deferral through ktimersoftd (on PREEMPT_RT_FULL).

The first patch adjusts the kthread_worker locking to make use of a raw
spinlock, making it suitable for work item queueing from hardirq context
on PREEMPT_RT.  This patch can be applied directly to mainline without
any functional change.

The second patch adjusts the hrtimer used by the watchdog core to be a
_HARD timer (and thus not deferred through ktimersoftd w/ PREEMPT_RT).
This patch depends on hrtimer patches carried in the RT patch, and so
should therefore land there.

Cc: Guenter Roeck 
Cc: Steffen Trumtrar 
Cc: Tim Sander 

Julia Cartwright (2):
  kthread: convert worker lock to raw spinlock
  watchdog, rt: prevent deferral of watchdogd wakeup

 drivers/watchdog/watchdog_dev.c |  2 +-
 include/linux/kthread.h |  2 +-
 kernel/kthread.c| 42 -
 3 files changed, 23 insertions(+), 23 deletions(-)

-- 
2.18.0



[PATCH 1/2] kthread: convert worker lock to raw spinlock

2018-09-28 Thread Julia Cartwright
In order to enable the queuing of kthread work items from hardirq
context even when PREEMPT_RT_FULL is enabled, convert the worker
spin_lock to a raw_spin_lock.

This is only acceptable to do because the work performed under the lock
is well-bounded and minimal.

Cc: Sebastian Andrzej Siewior 
Cc: Guenter Roeck 
Reported-and-tested-by: Steffen Trumtrar 
Reported-by: Tim Sander 
Signed-off-by: Julia Cartwright 
---
 include/linux/kthread.h |  2 +-
 kernel/kthread.c| 42 -
 2 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index c1961761311d..ad292898f7f2 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -85,7 +85,7 @@ enum {
 
 struct kthread_worker {
unsigned intflags;
-   spinlock_t  lock;
+   raw_spinlock_t  lock;
struct list_headwork_list;
struct list_headdelayed_work_list;
struct task_struct  *task;
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 486dedbd9af5..c1d9ee6671c6 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -597,7 +597,7 @@ void __kthread_init_worker(struct kthread_worker *worker,
struct lock_class_key *key)
 {
memset(worker, 0, sizeof(struct kthread_worker));
-   spin_lock_init(&worker->lock);
+   raw_spin_lock_init(&worker->lock);
lockdep_set_class_and_name(&worker->lock, key, name);
INIT_LIST_HEAD(&worker->work_list);
INIT_LIST_HEAD(&worker->delayed_work_list);
@@ -639,21 +639,21 @@ int kthread_worker_fn(void *worker_ptr)
 
if (kthread_should_stop()) {
__set_current_state(TASK_RUNNING);
-   spin_lock_irq(&worker->lock);
+   raw_spin_lock_irq(&worker->lock);
worker->task = NULL;
-   spin_unlock_irq(&worker->lock);
+   raw_spin_unlock_irq(&worker->lock);
return 0;
}
 
work = NULL;
-   spin_lock_irq(&worker->lock);
+   raw_spin_lock_irq(&worker->lock);
if (!list_empty(&worker->work_list)) {
work = list_first_entry(&worker->work_list,
struct kthread_work, node);
list_del_init(&work->node);
}
worker->current_work = work;
-   spin_unlock_irq(&worker->lock);
+   raw_spin_unlock_irq(&worker->lock);
 
if (work) {
__set_current_state(TASK_RUNNING);
@@ -810,12 +810,12 @@ bool kthread_queue_work(struct kthread_worker *worker,
bool ret = false;
unsigned long flags;
 
-   spin_lock_irqsave(&worker->lock, flags);
+   raw_spin_lock_irqsave(&worker->lock, flags);
if (!queuing_blocked(worker, work)) {
kthread_insert_work(worker, work, &worker->work_list);
ret = true;
}
-   spin_unlock_irqrestore(&worker->lock, flags);
+   raw_spin_unlock_irqrestore(&worker->lock, flags);
return ret;
 }
 EXPORT_SYMBOL_GPL(kthread_queue_work);
@@ -841,7 +841,7 @@ void kthread_delayed_work_timer_fn(struct timer_list *t)
if (WARN_ON_ONCE(!worker))
return;
 
-   spin_lock(&worker->lock);
+   raw_spin_lock(&worker->lock);
/* Work must not be used with >1 worker, see kthread_queue_work(). */
WARN_ON_ONCE(work->worker != worker);
 
@@ -850,7 +850,7 @@ void kthread_delayed_work_timer_fn(struct timer_list *t)
list_del_init(&work->node);
kthread_insert_work(worker, work, &worker->work_list);

-   spin_unlock(&worker->lock);
+   raw_spin_unlock(&worker->lock);
 }
 EXPORT_SYMBOL(kthread_delayed_work_timer_fn);
 
@@ -906,14 +906,14 @@ bool kthread_queue_delayed_work(struct kthread_worker 
*worker,
unsigned long flags;
bool ret = false;
 
-   spin_lock_irqsave(&worker->lock, flags);
+   raw_spin_lock_irqsave(&worker->lock, flags);
 
if (!queuing_blocked(worker, work)) {
__kthread_queue_delayed_work(worker, dwork, delay);
ret = true;
}
 
-   spin_unlock_irqrestore(&worker->lock, flags);
+   raw_spin_unlock_irqrestore(&worker->lock, flags);
return ret;
 }
 EXPORT_SYMBOL_GPL(kthread_queue_delayed_work);
@@ -949,7 +949,7 @@ void kthread_flush_work(struct kthread_work *work)
if (!worker)
return;
 
-   spin_lock_irq(&worker->lock);
+   raw_spin_lock_irq(&worker->lock);
/* Work must not be used with >1 worker, see kthread_queue_work(). */
WARN_ON_ONCE(work->worker != worker);
 
@@ -961,7 +961,7 @@ void kthread_flush_work(struct kthread_work *work)
else
noop = true;
 
-   spin_unlock_irq(&worker->lock);
+   raw_spin_unlock_irq(&worker->lock);
 
if (!noop)
wait_for_completion(&fwork.done);
@@ -994,9 +994,9 @@ static bool __kthread_cancel_work(struct kthread_work 
*work, bool is_dwork,
 * any queuing is blocked by setting the canceling counter.
 */
work-&g


[PATCH RT 2/2] watchdog, rt: prevent deferral of watchdogd wakeup

2018-09-28 Thread Julia Cartwright
When PREEMPT_RT_FULL is enabled, all hrtimer expiry functions are
deferred for execution into the context of ktimersoftd unless otherwise
annotated.

Deferring the expiry of the hrtimer used by the watchdog core, however,
is a waste, as the callback does nothing but queue a kthread work item
and wakeup watchdogd.

It's worse than that, too: the deferral through ktimersoftd also means
that for correct behavior a user must adjust the scheduling parameters
of both watchdogd _and_ ktimersoftd, which is unnecessary and has other
side effects (like causing unrelated expiry functions to execute at
potentially elevated priority).

Instead, mark the hrtimer used by the watchdog core as being _HARD to
allow its execution directly from hardirq context.  The work done in
this expiry function is well-bounded and minimal.

A user still must adjust the scheduling parameters of the watchdogd
to be correct w.r.t. their application needs.

Cc: Guenter Roeck 
Reported-and-tested-by: Steffen Trumtrar 
Reported-by: Tim Sander 
Signed-off-by: Julia Cartwright 
---
 drivers/watchdog/watchdog_dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index ffbdc4642ea5..9c2b3e5cebdc 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -945,7 +945,7 @@ static int watchdog_cdev_register(struct watchdog_device 
*wdd, dev_t devno)
return -ENODEV;
 
kthread_init_work(&wd_data->work, watchdog_ping_work);
-   hrtimer_init(&wd_data->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+   hrtimer_init(&wd_data->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
wd_data->timer.function = watchdog_timer_expired;
 
if (wdd->id == 0) {
-- 
2.18.0



Re: [ANNOUNCE] Submit a topic for the RT Microconference at Linux Plumbers

2018-09-07 Thread Julia Cartwright
Hello all-

On Tue, Sep 04, 2018 at 01:25:29PM -0400, Steven Rostedt wrote:
> Hi RT folks!
>
> The call for proposals (CfP) is now open for the RT Microconference at
> Linux Plumbers in Vancouver, Canada. The topics we are looking at this
> year are:
>
>  - How will PREEMPT_RT be maintained (is it a subsystem?)
>
>  - Who will maintain it?
>
>  - How do we teach the rest of the kernel developers how not to break 
> PREEMPT_RT?
>
>  - How do we catch when it breaks?
>
>  - What tools can we add into tools/ that other kernel developers can use to 
> test and learn about PREEMPT_RT?
>
>  - What tests can we add into tools/testing/selftests?
>
>  - What kernel boot selftests can be added?
>
>  - How can we modify lockdep to catch problems that can break PREEMPT_RT?
>
>  - Discuss various types of failures that can happen with PREEMPT_RT that 
> normally would not happen in the vanilla kernel.
>
>  - How do we handle stable backports?
>
>  - Interaction with Safety Critical domains?
>
>  - Interrupt threads are RT and are not protected by the RT Throttling.
>How can we prevent interrupt thread starvation from a rogue RT task?
>
>  - How to determine if a kernel is booted with PREEMPT_RT enabled?
>
>  - librtpi: "solving" long-standing glibc pthread condvar internal lock 
> inversion with a new implementation; motivations, technical details, etc.
>
>  - SCHED_DEADLINE - Lots of research has been made to various issues like CPU 
> affinity, Bandwidth inheritance, cgroup support; but nothing has been done to 
> the kernel. Let's make a commitment and push these ideas forward into 
> mainline.
>
>  - How can we modify lockdep to catch problems that can break PREEMPT_RT?
>
>  - Discuss various types of failures that can happen with PREEMPT_RT that 
> normally would not happen in the vanilla kernel.

There was some confusion as to whether or not a topic being listed above
automatically meant that it was to be included in the RT Microconference
agenda.

The answer is: NO.  If you would like to see a specific topic discussed,
either from this list or otherwise, you must submit it via the portal
listed below.  Only topics submitted to the portal will be considered
for the RT Microconference agenda.

Note also that submitting a topic will also mean that you will be
responsible for introducing the necessary context to the group to
facilitate further discussion.

> If you want to lead one of the above sessions, or if you have a new
> session you want to submit, please do so here:
>
>   https://linuxplumbersconf.org/event/2/abstracts/
>
> Set up an account, and then click "Submit new proposal".
>
> Fill out the "Title", "Content", select the "Authors" as the one leading
> the session (yourself), and then for "Track" select "RT MC Topics CfP".
> Select "Yes" to agree with the anti-harassment policy, and then hit
> Submit.

Thanks, and see you there!

   Julia


[PATCH RT 11/22] mm/slub: close possible memory-leak in kmem_cache_alloc_bulk()

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
Under certain circumstances we could leak elements which were moved to
the local "to_free" list. The damage is limited since I can't find
any users here.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit 5022166d3b225bf5e343efb3ea01b3c5a41d69ba)
Signed-off-by: Julia Cartwright 
---
 mm/slub.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/slub.c b/mm/slub.c
index 67eb368b9314..738b2bccbd5f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3209,6 +3209,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t 
flags, size_t size,
return i;
 error:
local_irq_enable();
+   free_delayed(&to_free);
slab_post_alloc_hook(s, flags, i, p);
__kmem_cache_free_bulk(s, i, p);
return 0;
-- 
2.18.0



[ANNOUNCE] 4.9.115-rt93

2018-08-06 Thread Julia Cartwright
Hello RT Folks!

I'm pleased to announce the 4.9.115-rt93 stable release.

This release is just an update to the new stable 4.9.115 version
and no RT specific changes have been made.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.9-rt
  Head SHA1: 236d20fd941a444577579ba398ba9907f3b33277

Or to build 4.9.115-rt93 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.9.115.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.9/patch-4.9.115-rt93.patch.xz

Enjoy!
Julia



[PATCH RT 10/22] arm*: disable NEON in kernel mode

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
NEON in kernel mode is used by the crypto algorithms and raid6 code.
While the raid6 code looks okay, the crypto algorithms do not: NEON
is enabled on first invocation and may allocate/free/map memory before
the NEON mode is disabled again.
This needs to be changed until it can be enabled.
On ARM NEON in kernel mode can be simply disabled. on ARM64 it needs to
stay on due to possible EFI callbacks so here I disable each algorithm.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit b3a776555e0d465df138d254d6dc3ac1b718ac6d)
Signed-off-by: Julia Cartwright 
---
 arch/arm/Kconfig  |  2 +-
 arch/arm64/crypto/Kconfig | 14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 5715844e83e3..8c40f7d73251 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2158,7 +2158,7 @@ config NEON
 
 config KERNEL_MODE_NEON
bool "Support for NEON in kernel mode"
-   depends on NEON && AEABI
+   depends on NEON && AEABI && !PREEMPT_RT_BASE
help
  Say Y to include support for NEON in kernel mode.
 
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 2cf32e9887e1..cd71b3432720 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -10,41 +10,41 @@ if ARM64_CRYPTO
 
 config CRYPTO_SHA1_ARM64_CE
tristate "SHA-1 digest algorithm (ARMv8 Crypto Extensions)"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_HASH
 
 config CRYPTO_SHA2_ARM64_CE
tristate "SHA-224/SHA-256 digest algorithm (ARMv8 Crypto Extensions)"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_HASH
 
 config CRYPTO_GHASH_ARM64_CE
tristate "GHASH (for GCM chaining mode) using ARMv8 Crypto Extensions"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_HASH
 
 config CRYPTO_AES_ARM64_CE
tristate "AES core cipher using ARMv8 Crypto Extensions"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_ALGAPI
 
 config CRYPTO_AES_ARM64_CE_CCM
tristate "AES in CCM mode using ARMv8 Crypto Extensions"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_ALGAPI
select CRYPTO_AES_ARM64_CE
select CRYPTO_AEAD
 
 config CRYPTO_AES_ARM64_CE_BLK
tristate "AES in ECB/CBC/CTR/XTS modes using ARMv8 Crypto Extensions"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_BLKCIPHER
select CRYPTO_AES_ARM64_CE
select CRYPTO_ABLK_HELPER
 
 config CRYPTO_AES_ARM64_NEON_BLK
tristate "AES in ECB/CBC/CTR/XTS modes using NEON instructions"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_BLKCIPHER
select CRYPTO_AES
select CRYPTO_ABLK_HELPER
-- 
2.18.0



[PATCH RT 17/22] alarmtimer: Prevent live lock in alarm_cancel()

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
If alarm_try_to_cancel() requires a retry, then depending on the
priority setting the retry loop might prevent timer callback completion
on RT. Prevent that by waiting for completion on RT, no change for a
non RT kernel.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit 51e376c469bf05f32cb1ceb9e39d31bb92f1f6c8)
Signed-off-by: Julia Cartwright 
---
 kernel/time/alarmtimer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index d67ef56ca9bc..61b20a656863 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -407,7 +407,7 @@ int alarm_cancel(struct alarm *alarm)
int ret = alarm_try_to_cancel(alarm);
if (ret >= 0)
return ret;
-   cpu_relax();
+   hrtimer_wait_for_timer(&alarm->timer);
}
 }
 EXPORT_SYMBOL_GPL(alarm_cancel);
-- 
2.18.0



[PATCH RT 18/22] posix-timers: move the rcu head out of the union

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
On RT the timer can be preempted while running and therefore we wait
with timer_wait_for_callback() for the timer to complete (instead of
busy looping). The RCU-readlock is held to ensure that this posix timer
is not removed while we wait on it.
If the timer is removed then it invokes call_rcu() with a pointer that
is shared with the hrtimer because it is part of the same union.
In order to avoid any possible side effects I am moving the rcu pointer
out of the union.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit b8401365af110949f12c7cf1fa86b4c0ea069bbd)
Signed-off-by: Julia Cartwright 
---
 include/linux/posix-timers.h | 2 +-
 kernel/time/posix-timers.c   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 62d44c176071..cbd3f9334543 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -92,8 +92,8 @@ struct k_itimer {
struct alarm alarmtimer;
ktime_t interval;
} alarm;
-   struct rcu_head rcu;
} it;
+   struct rcu_head rcu;
 };
 
 struct k_clock {
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 6084618436fd..45d8033caec4 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -568,7 +568,7 @@ static struct k_itimer * alloc_posix_timer(void)
 
 static void k_itimer_rcu_free(struct rcu_head *head)
 {
-   struct k_itimer *tmr = container_of(head, struct k_itimer, it.rcu);
+   struct k_itimer *tmr = container_of(head, struct k_itimer, rcu);
 
kmem_cache_free(posix_timers_cache, tmr);
 }
@@ -585,7 +585,7 @@ static void release_posix_timer(struct k_itimer *tmr, int 
it_id_set)
}
put_pid(tmr->it_pid);
sigqueue_free(tmr->sigq);
-   call_rcu(&tmr->it.rcu, k_itimer_rcu_free);
+   call_rcu(&tmr->rcu, k_itimer_rcu_free);
 }
 
 static struct k_clock *clockid_to_kclock(const clockid_t id)
-- 
2.18.0



[PATCH RT 10/22] arm*: disable NEON in kernel mode

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
NEON in kernel mode is used by the crypto algorithms and raid6 code.
While the raid6 code looks okay, the crypto algorithms do not: NEON
is enabled on first invocation and may allocate/free/map memory before
the NEON mode is disabled again.
This needs to be changed until it can be enabled.
On ARM NEON in kernel mode can be simply disabled. on ARM64 it needs to
stay on due to possible EFI callbacks so here I disable each algorithm.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit b3a776555e0d465df138d254d6dc3ac1b718ac6d)
Signed-off-by: Julia Cartwright 
---
 arch/arm/Kconfig  |  2 +-
 arch/arm64/crypto/Kconfig | 14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 5715844e83e3..8c40f7d73251 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2158,7 +2158,7 @@ config NEON
 
 config KERNEL_MODE_NEON
bool "Support for NEON in kernel mode"
-   depends on NEON && AEABI
+   depends on NEON && AEABI && !PREEMPT_RT_BASE
help
  Say Y to include support for NEON in kernel mode.
 
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 2cf32e9887e1..cd71b3432720 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -10,41 +10,41 @@ if ARM64_CRYPTO
 
 config CRYPTO_SHA1_ARM64_CE
tristate "SHA-1 digest algorithm (ARMv8 Crypto Extensions)"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_HASH
 
 config CRYPTO_SHA2_ARM64_CE
tristate "SHA-224/SHA-256 digest algorithm (ARMv8 Crypto Extensions)"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_HASH
 
 config CRYPTO_GHASH_ARM64_CE
tristate "GHASH (for GCM chaining mode) using ARMv8 Crypto Extensions"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_HASH
 
 config CRYPTO_AES_ARM64_CE
tristate "AES core cipher using ARMv8 Crypto Extensions"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_ALGAPI
 
 config CRYPTO_AES_ARM64_CE_CCM
tristate "AES in CCM mode using ARMv8 Crypto Extensions"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_ALGAPI
select CRYPTO_AES_ARM64_CE
select CRYPTO_AEAD
 
 config CRYPTO_AES_ARM64_CE_BLK
tristate "AES in ECB/CBC/CTR/XTS modes using ARMv8 Crypto Extensions"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_BLKCIPHER
select CRYPTO_AES_ARM64_CE
select CRYPTO_ABLK_HELPER
 
 config CRYPTO_AES_ARM64_NEON_BLK
tristate "AES in ECB/CBC/CTR/XTS modes using NEON instructions"
-   depends on ARM64 && KERNEL_MODE_NEON
+   depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE
select CRYPTO_BLKCIPHER
select CRYPTO_AES
select CRYPTO_ABLK_HELPER
-- 
2.18.0



[PATCH RT 17/22] alarmtimer: Prevent live lock in alarm_cancel()

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
If alarm_try_to_cancel() requires a retry, then depending on the
priority setting the retry loop might prevent timer callback completion
on RT. Prevent that by waiting for the callback to complete on RT; there
is no change for a non-RT kernel.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit 51e376c469bf05f32cb1ceb9e39d31bb92f1f6c8)
Signed-off-by: Julia Cartwright 
---
 kernel/time/alarmtimer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index d67ef56ca9bc..61b20a656863 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -407,7 +407,7 @@ int alarm_cancel(struct alarm *alarm)
int ret = alarm_try_to_cancel(alarm);
if (ret >= 0)
return ret;
-   cpu_relax();
+   hrtimer_wait_for_timer(&alarm->timer);
}
 }
 EXPORT_SYMBOL_GPL(alarm_cancel);
-- 
2.18.0



[PATCH RT 18/22] posix-timers: move the rcu head out of the union

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
On RT the timer can be preempted while running and therefore we wait
with timer_wait_for_callback() for the timer to complete (instead of
busy looping). The RCU-readlock is held to ensure that this posix timer
is not removed while we wait on it.
If the timer is removed then it invokes call_rcu() with a pointer that
is shared with the hrtimer because it is part of the same union.
In order to avoid any possible side effects I am moving the rcu pointer
out of the union.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit b8401365af110949f12c7cf1fa86b4c0ea069bbd)
Signed-off-by: Julia Cartwright 
---
 include/linux/posix-timers.h | 2 +-
 kernel/time/posix-timers.c   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 62d44c176071..cbd3f9334543 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -92,8 +92,8 @@ struct k_itimer {
struct alarm alarmtimer;
ktime_t interval;
} alarm;
-   struct rcu_head rcu;
} it;
+   struct rcu_head rcu;
 };
 
 struct k_clock {
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 6084618436fd..45d8033caec4 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -568,7 +568,7 @@ static struct k_itimer * alloc_posix_timer(void)
 
 static void k_itimer_rcu_free(struct rcu_head *head)
 {
-   struct k_itimer *tmr = container_of(head, struct k_itimer, it.rcu);
+   struct k_itimer *tmr = container_of(head, struct k_itimer, rcu);
 
kmem_cache_free(posix_timers_cache, tmr);
 }
@@ -585,7 +585,7 @@ static void release_posix_timer(struct k_itimer *tmr, int it_id_set)
}
put_pid(tmr->it_pid);
sigqueue_free(tmr->sigq);
-   call_rcu(&tmr->it.rcu, k_itimer_rcu_free);
+   call_rcu(&tmr->rcu, k_itimer_rcu_free);
 }
 
 static struct k_clock *clockid_to_kclock(const clockid_t id)
-- 
2.18.0



[PATCH RT 19/22] locallock: provide {get,put}_locked_ptr() variants

2018-08-06 Thread Julia Cartwright
4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
Provide a set of locallocked accessors for pointers to per-CPU data;
this is useful for dynamically-allocated per-CPU regions, for example.

These are symmetric with the {get,put}_cpu_ptr() per-CPU accessor
variants.
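
As a rough usage sketch (not part of the patch; the lock, type and data
names below are made up for illustration), code with a dynamically
allocated per-CPU region would pair the new accessors just like it pairs
get_cpu_ptr()/put_cpu_ptr() today:

    static DEFINE_LOCAL_IRQ_LOCK(example_lock);           /* hypothetical */
    static struct example_stats __percpu *example_stats;  /* alloc_percpu() */

    static void example_account(unsigned int bytes)
    {
            struct example_stats *s;

            /* local_lock(example_lock) + this_cpu_ptr(example_stats) */
            s = get_locked_ptr(example_lock, example_stats);
            s->bytes += bytes;
            put_locked_ptr(example_lock, example_stats);   /* local_unlock() */
    }

On a non-RT kernel the same code falls back to plain
get_cpu_ptr()/put_cpu_ptr(), as the !PREEMPT_RT_BASE definitions below show.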

Signed-off-by: Julia Cartwright 
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit 3d45cf23db4f76cd356ebb0aa4cdaa7d92d1a64e)
Signed-off-by: Julia Cartwright 
---
 include/linux/locallock.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/locallock.h b/include/linux/locallock.h
index 280f884a05a3..0c3ff5b23f6a 100644
--- a/include/linux/locallock.h
+++ b/include/linux/locallock.h
@@ -238,6 +238,14 @@ static inline int __local_unlock_irqrestore(struct local_irq_lock *lv,
 
 #define put_locked_var(lvar, var)  local_unlock(lvar);
 
+#define get_locked_ptr(lvar, var)  \
+   ({  \
+   local_lock(lvar);   \
+   this_cpu_ptr(var);  \
+   })
+
+#define put_locked_ptr(lvar, var)  local_unlock(lvar);
+
 #define local_lock_cpu(lvar)   \
({  \
local_lock(lvar);   \
@@ -278,6 +286,8 @@ static inline void local_irq_lock_init(int lvar) { }
 
 #define get_locked_var(lvar, var)  get_cpu_var(var)
 #define put_locked_var(lvar, var)  put_cpu_var(var)
+#define get_locked_ptr(lvar, var)  get_cpu_ptr(var)
+#define put_locked_ptr(lvar, var)  put_cpu_ptr(var)
 
 #define local_lock_cpu(lvar)   get_cpu()
 #define local_unlock_cpu(lvar) put_cpu()
-- 
2.18.0



[PATCH RT 06/22] rcu: Do not include rtmutex_common.h unconditionally

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
[ upstream commit b88697810d7c1d102a529990f9071b0f14cfe6df ]

This commit adjusts include files and provides definitions in preparation
for suppressing lockdep false-positive ->boost_mtx complaints.  Without
this preparation, architectures not supporting rt_mutex will get build
failures.

Reported-by: kbuild test robot 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Paul E. McKenney 
Signed-off-by: Julia Cartwright 
---
 kernel/rcu/tree_plugin.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index a1a7bafcce15..3d18d08e8382 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -37,6 +37,7 @@
  * This probably needs to be excluded from -rt builds.
  */
 #define rt_mutex_owner(a) ({ WARN_ON_ONCE(1); NULL; })
+#define rt_mutex_futex_unlock(x) WARN_ON_ONCE(1)
 
 #endif /* #else #ifdef CONFIG_RCU_BOOST */
 
@@ -834,8 +835,6 @@ static void rcu_cpu_kthread_setup(unsigned int cpu)
 
 #ifdef CONFIG_RCU_BOOST
 
-#include "../locking/rtmutex_common.h"
-
 #ifdef CONFIG_RCU_TRACE
 
 static void rcu_initiate_boost_trace(struct rcu_node *rnp)
-- 
2.18.0



[PATCH RT 12/22] locking: add types.h

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
During the stable update, the arm architecture no longer compiled due to
the missing definitions of u16/u32.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit 1289b06974d64f244a26455fab699c6a1332f4bc)
Signed-off-by: Julia Cartwright 
---
 include/linux/spinlock_types_raw.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/spinlock_types_raw.h b/include/linux/spinlock_types_raw.h
index edffc4d53fc9..03235b475b77 100644
--- a/include/linux/spinlock_types_raw.h
+++ b/include/linux/spinlock_types_raw.h
@@ -1,6 +1,8 @@
 #ifndef __LINUX_SPINLOCK_TYPES_RAW_H
 #define __LINUX_SPINLOCK_TYPES_RAW_H
 
+#include <linux/types.h>
+
 #if defined(CONFIG_SMP)
 # include <asm/spinlock_types.h>
 #else
-- 
2.18.0



[PATCH RT 09/22] crypto: limit more FPU-enabled sections

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
Those crypto drivers use SSE/AVX/… for their crypto work and in order to
do so in kernel they need to enable the "FPU" in kernel mode which
disables preemption.
There are two problems with the way they are used:
- the while loop which processes X bytes may create latency spikes and
  should be avoided or limited.
- the cipher-walk-next part may allocate/free memory and may use
  kmap_atomic().

The whole kernel_fpu_begin()/end() processing probably isn't that cheap.
It most likely makes sense to process as much data as possible in one
go. The new *_fpu_sched_rt() schedules only if an RT task is pending.

We should probably measure the performance of those ciphers in pure SW
mode and with these optimisations to see whether it makes sense to keep
them for RT.

This kernel_fpu_resched() makes the code more preemptible which might hurt
performance.
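
To make the intended pattern concrete, here is a rough sketch of how such
a bulk-processing loop ends up looking (the function, CHUNK_BYTES and
simd_process_chunk() are placeholders, not taken from any particular
driver touched by this patch):

    static void example_bulk_op(u8 *dst, const u8 *src, unsigned int nbytes)
    {
            kernel_fpu_begin();
            while (nbytes >= CHUNK_BYTES) {
                    /*
                     * kernel_fpu_resched() (added below) ends the FPU
                     * section, reschedules and starts a new section only
                     * when a reschedule is actually pending.
                     */
                    kernel_fpu_resched();
                    simd_process_chunk(dst, src);     /* hypothetical helper */
                    src += CHUNK_BYTES;
                    dst += CHUNK_BYTES;
                    nbytes -= CHUNK_BYTES;
            }
            kernel_fpu_end();
    }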

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit 0dcc4c1693ef37e166da420ef7c68c7047c996f1)
Signed-off-by: Julia Cartwright 
---
 arch/x86/crypto/camellia_aesni_avx2_glue.c | 20 
 arch/x86/crypto/camellia_aesni_avx_glue.c  | 19 +++
 arch/x86/crypto/cast6_avx_glue.c   | 24 +++
 arch/x86/crypto/chacha20_glue.c|  9 
 arch/x86/crypto/serpent_avx2_glue.c| 19 +++
 arch/x86/crypto/serpent_avx_glue.c | 23 ++
 arch/x86/crypto/serpent_sse2_glue.c| 23 ++
 arch/x86/crypto/twofish_avx_glue.c | 27 --
 arch/x86/include/asm/fpu/api.h |  1 +
 arch/x86/kernel/fpu/core.c | 12 ++
 10 files changed, 158 insertions(+), 19 deletions(-)

diff --git a/arch/x86/crypto/camellia_aesni_avx2_glue.c b/arch/x86/crypto/camellia_aesni_avx2_glue.c
index 60907c139c4e..0902db7d326a 100644
--- a/arch/x86/crypto/camellia_aesni_avx2_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx2_glue.c
@@ -206,6 +206,20 @@ struct crypt_priv {
bool fpu_enabled;
 };
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+static void camellia_fpu_end_rt(struct crypt_priv *ctx)
+{
+   bool fpu_enabled = ctx->fpu_enabled;
+
+   if (!fpu_enabled)
+   return;
+   camellia_fpu_end(fpu_enabled);
+   ctx->fpu_enabled = false;
+}
+#else
+static void camellia_fpu_end_rt(struct crypt_priv *ctx) { }
+#endif
+
 static void encrypt_callback(void *priv, u8 *srcdst, unsigned int nbytes)
 {
const unsigned int bsize = CAMELLIA_BLOCK_SIZE;
@@ -221,16 +235,19 @@ static void encrypt_callback(void *priv, u8 *srcdst, unsigned int nbytes)
}
 
if (nbytes >= CAMELLIA_AESNI_PARALLEL_BLOCKS * bsize) {
+   kernel_fpu_resched();
camellia_ecb_enc_16way(ctx->ctx, srcdst, srcdst);
srcdst += bsize * CAMELLIA_AESNI_PARALLEL_BLOCKS;
nbytes -= bsize * CAMELLIA_AESNI_PARALLEL_BLOCKS;
}
 
while (nbytes >= CAMELLIA_PARALLEL_BLOCKS * bsize) {
+   kernel_fpu_resched();
camellia_enc_blk_2way(ctx->ctx, srcdst, srcdst);
srcdst += bsize * CAMELLIA_PARALLEL_BLOCKS;
nbytes -= bsize * CAMELLIA_PARALLEL_BLOCKS;
}
+   camellia_fpu_end_rt(ctx);
 
for (i = 0; i < nbytes / bsize; i++, srcdst += bsize)
camellia_enc_blk(ctx->ctx, srcdst, srcdst);
@@ -251,16 +268,19 @@ static void decrypt_callback(void *priv, u8 *srcdst, unsigned int nbytes)
}
 
if (nbytes >= CAMELLIA_AESNI_PARALLEL_BLOCKS * bsize) {
+   kernel_fpu_resched();
camellia_ecb_dec_16way(ctx->ctx, srcdst, srcdst);
srcdst += bsize * CAMELLIA_AESNI_PARALLEL_BLOCKS;
nbytes -= bsize * CAMELLIA_AESNI_PARALLEL_BLOCKS;
}
 
while (nbytes >= CAMELLIA_PARALLEL_BLOCKS * bsize) {
+   kernel_fpu_resched();
camellia_dec_blk_2way(ctx->ctx, srcdst, srcdst);
srcdst += bsize * CAMELLIA_PARALLEL_BLOCKS;
nbytes -= bsize * CAMELLIA_PARALLEL_BLOCKS;
}
+   camellia_fpu_end_rt(ctx);
 
for (i = 0; i < nbytes / bsize; i++, srcdst += bsize)
camellia_dec_blk(ctx->ctx, srcdst, srcdst);
diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c
index d96429da88eb..3b8e91841039 100644
--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
@@ -210,6 +210,21 @@ struct crypt_priv {
bool fpu_enabled;
 };
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+static void camellia_fpu_end_rt(struct crypt_priv *ctx)
+{
+   bool fpu_enabled = ctx->fpu_enabled;
+
+   if (!fpu_enabled)
+   return;

[PATCH RT 07/22] rcu: Suppress lockdep false-positive ->boost_mtx complaints

2018-08-06 Thread Julia Cartwright
From: "Paul E. McKenney" 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
[ Upstream commit 02a7c234e54052101164368ff981bd72f7acdd65 ]

RCU priority boosting uses rt_mutex_init_proxy_locked() to initialize an
rt_mutex structure in locked state held by some other task.  When that
other task releases it, lockdep complains (quite accurately, but a bit
uselessly) that the other task never acquired it.  This complaint can
suppress other, more helpful, lockdep complaints, and in any case it is
a false positive.

This commit therefore switches from rt_mutex_unlock() to
rt_mutex_futex_unlock(), thereby avoiding the lockdep annotations.
Of course, if lockdep ever learns about rt_mutex_init_proxy_locked(),
additional adjustments will be required.

Suggested-by: Peter Zijlstra 
Signed-off-by: Paul E. McKenney 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Julia Cartwright 
---
 kernel/rcu/tree_plugin.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 3d18d08e8382..510de72ad8a3 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -486,7 +486,7 @@ void rcu_read_unlock_special(struct task_struct *t)
 
/* Unboost if we were boosted. */
if (IS_ENABLED(CONFIG_RCU_BOOST) && drop_boost_mutex)
-   rt_mutex_unlock(&rnp->boost_mtx);
+   rt_mutex_futex_unlock(&rnp->boost_mtx);
 
/*
 * If this was the last task on the expedited lists,
-- 
2.18.0



[PATCH RT 04/22] futex: Fix OWNER_DEAD fixup

2018-08-06 Thread Julia Cartwright
From: Peter Zijlstra 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
[ Upstream commit a97cb0e7b3f4c6297fd857055ae8e895f402f501 ]

Both Geert and DaveJ reported that the recent futex commit:

  c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")

introduced a problem with setting OWNER_DEAD: the bit was OR-ed into an
uninitialized variable, and the compiler then optimized the store away
entirely as a dead store.

Move the setting of the bit to where it is more useful.
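
For clarity, the shape of the bug (paraphrased, not the literal futex.c
code) was:

    u32 newtid;                                  /* not initialized yet */

    if (!pi_state->owner)
            newtid |= FUTEX_OWNER_DIED;          /* OR into garbage: dead store */
    ...
    newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;   /* overwrites it */

Moving the OWNER_DIED check after the assignment, as the hunk below does,
means the bit is no longer lost.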

Reported-by: Geert Uytterhoeven 
Reported-by: Dave Jones 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
Link: 
http://lkml.kernel.org/r/20180122103947.gd2...@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Julia Cartwright 
---
 kernel/futex.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index cdd68ba6e3a6..57038131ad3f 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2301,9 +2301,6 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
 raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
 
oldowner = pi_state->owner;
-   /* Owner died? */
-   if (!pi_state->owner)
-   newtid |= FUTEX_OWNER_DIED;
 
/*
 * We are here because either:
@@ -2364,6 +2361,9 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
}
 
newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
+   /* Owner died? */
+   if (!pi_state->owner)
+   newtid |= FUTEX_OWNER_DIED;
 
 if (get_futex_value_locked(&uval, uaddr))
goto handle_fault;
-- 
2.18.0



[PATCH RT 16/22] block: blk-mq: move blk_queue_usage_counter_release() into process context

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
| BUG: sleeping function called from invalid context at 
kernel/locking/rtmutex.c:914
| in_atomic(): 1, irqs_disabled(): 0, pid: 255, name: kworker/u257:6
| 5 locks held by kworker/u257:6/255:
|  #0:  ("events_unbound"){.+.+.+}, at: [] 
process_one_work+0x171/0x5e0
|  #1:  ((>work)){+.+.+.}, at: [] 
process_one_work+0x171/0x5e0
|  #2:  (>scan_mutex){+.+.+.}, at: [] 
__scsi_add_device+0xa3/0x130 [scsi_mod]
|  #3:  (>tag_list_lock){+.+...}, at: [] 
blk_mq_init_queue+0x96a/0xa50
|  #4:  (rcu_read_lock_sched){..}, at: [] 
percpu_ref_kill_and_confirm+0x1d/0x120
| Preemption disabled at:[] 
blk_mq_freeze_queue_start+0x56/0x70
|
| CPU: 2 PID: 255 Comm: kworker/u257:6 Not tainted 3.18.7-rt0+ #1
| Workqueue: events_unbound async_run_entry_fn
|  0003 8800bc29f998 815b3a12 
|   8800bc29f9b8 8109aa16 8800bc29fa28
|  8800bc5d1bc8 8800bc29f9e8 815b8dd4 8800
| Call Trace:
|  [] dump_stack+0x4f/0x7c
|  [] __might_sleep+0x116/0x190
|  [] rt_spin_lock+0x24/0x60
|  [] __wake_up+0x29/0x60
|  [] blk_mq_usage_counter_release+0x1e/0x20
|  [] percpu_ref_kill_and_confirm+0x106/0x120
|  [] blk_mq_freeze_queue_start+0x56/0x70
|  [] blk_mq_update_tag_set_depth+0x40/0xd0
|  [] blk_mq_init_queue+0x98c/0xa50
|  [] scsi_mq_alloc_queue+0x20/0x60 [scsi_mod]
|  [] scsi_alloc_sdev+0x2f5/0x370 [scsi_mod]
|  [] scsi_probe_and_add_lun+0x9e4/0xdd0 [scsi_mod]
|  [] __scsi_add_device+0x126/0x130 [scsi_mod]
|  [] ata_scsi_scan_host+0xaf/0x200 [libata]
|  [] async_port_probe+0x46/0x60 [libata]
|  [] async_run_entry_fn+0x3b/0xf0
|  [] process_one_work+0x201/0x5e0

percpu_ref_kill_and_confirm() invokes blk_mq_usage_counter_release() in
an rcu-sched region. A swait-based wake queue can't be used due to
wake_up_all() usage and disabled interrupts in !RT configs (as reported
by Corey Minyard).
The wq_has_sleeper() check has been suggested by Peter Zijlstra.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit 2d701058d614554cce412a787f00568b9fdffade)
Signed-off-by: Julia Cartwright 
---
 block/blk-core.c   | 14 +-
 include/linux/blkdev.h |  2 ++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 87d3e0a503e5..346d5bba3948 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -675,12 +675,21 @@ void blk_queue_exit(struct request_queue *q)
percpu_ref_put(>q_usage_counter);
 }
 
+static void blk_queue_usage_counter_release_swork(struct swork_event *sev)
+{
+   struct request_queue *q =
+   container_of(sev, struct request_queue, mq_pcpu_wake);
+
+   wake_up_all(&q->mq_freeze_wq);
+}
+
 static void blk_queue_usage_counter_release(struct percpu_ref *ref)
 {
struct request_queue *q =
container_of(ref, struct request_queue, q_usage_counter);
 
-   wake_up_all(&q->mq_freeze_wq);
+   if (wq_has_sleeper(&q->mq_freeze_wq))
+           swork_queue(&q->mq_pcpu_wake);
 }
 
 static void blk_rq_timed_out_timer(unsigned long data)
@@ -751,6 +760,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 __set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags);
 
 init_waitqueue_head(&q->mq_freeze_wq);
+   INIT_SWORK(&q->mq_pcpu_wake, blk_queue_usage_counter_release_swork);
 
/*
 * Init percpu_ref in atomic mode so that it's faster to shutdown.
@@ -3556,6 +3566,8 @@ int __init blk_dev_init(void)
if (!kblockd_workqueue)
panic("Failed to create kblockd\n");
 
+   BUG_ON(swork_get());
+
request_cachep = kmem_cache_create("blkdev_requests",
sizeof(struct request), 0, SLAB_PANIC, NULL);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index fdb449fe3ff7..ab039211ab9f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include <linux/swork.h>
 
 struct module;
 struct scsi_ioctl_command;
@@ -469,6 +470,7 @@ struct request_queue {
 #endif
struct rcu_head rcu_head;
wait_queue_head_t   mq_freeze_wq;
+   struct swork_event  mq_pcpu_wake;
struct percpu_ref   q_usage_counter;
struct list_headall_q_node;
 
-- 
2.18.0



[PATCH RT 02/22] futex: Fix more put_pi_state() vs. exit_pi_state_list() races

2018-08-06 Thread Julia Cartwright
From: Peter Zijlstra 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
[ Upstream commit 51d00899f7e6ded15c89cb4e2cb11a35283bac81 ]

Dmitry (through syzbot) reported being able to trigger the WARN in
get_pi_state() and a use-after-free on:

raw_spin_lock_irq(_state->pi_mutex.wait_lock);

Both are due to this race:

  exit_pi_state_list()                          put_pi_state()

  lock(&curr->pi_lock)
  while() {
        pi_state = list_first_entry(head);
        hb = hash_futex(&pi_state->key);
        unlock(&curr->pi_lock);

                                                dec_and_test(&pi_state->refcount);

        lock(&hb->lock)
        lock(&pi_state->pi_mutex.wait_lock)     // uaf if pi_state free'd
        lock(&curr->pi_lock);


        unlock(&curr->pi_lock);
        get_pi_state();                         // WARN; refcount==0

The problem is that we take the reference count too late and don't allow
it to be 0. Fix it by using inc_not_zero() and simply retrying the loop
when we fail to get a refcount; in that case put_pi_state() should
remove the entry from the list.
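
The fix follows a common pattern for taking a reference while walking a
list under a lock; a hedged, generic sketch (struct obj, owner_lock and
owner_list are illustrative names, not the futex.c code) looks like this:

    static void example_walk(void)
    {
            struct obj *obj;

            spin_lock(&owner_lock);
            while (!list_empty(&owner_list)) {
                    obj = list_first_entry(&owner_list, struct obj, node);

                    /* The object may already be on its way out (refcount == 0). */
                    if (!atomic_inc_not_zero(&obj->refcount)) {
                            /* Drop the lock so the releaser can unlink the entry. */
                            spin_unlock(&owner_lock);
                            cpu_relax();
                            spin_lock(&owner_lock);
                            continue;
                    }
                    spin_unlock(&owner_lock);

                    /* ... work on obj, then drop the reference ... */

                    spin_lock(&owner_lock);
            }
            spin_unlock(&owner_lock);
    }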

Reported-by: Dmitry Vyukov 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Thomas Gleixner 
Cc: Gratian Crisan 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: dvh...@infradead.org
Cc: syzbot 

Cc: syzkaller-b...@googlegroups.com
Cc: 
Fixes: c74aef2d06a9 ("futex: Fix pi_state->owner serialization")
Link: 
http://lkml.kernel.org/r/20171031101853.xpfh72y643kdf...@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Julia Cartwright 
---
 kernel/futex.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 47e42faad6c5..270148be5647 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -899,11 +899,27 @@ void exit_pi_state_list(struct task_struct *curr)
 */
 raw_spin_lock_irq(&curr->pi_lock);
while (!list_empty(head)) {
-
next = head->next;
pi_state = list_entry(next, struct futex_pi_state, list);
key = pi_state->key;
 hb = hash_futex(&key);
+
+   /*
+* We can race against put_pi_state() removing itself from the
+* list (a waiter going away). put_pi_state() will first
+* decrement the reference count and then modify the list, so
+* its possible to see the list entry but fail this reference
+* acquire.
+*
+* In that case; drop the locks to let put_pi_state() make
+* progress and retry the loop.
+*/
+   if (!atomic_inc_not_zero(&pi_state->refcount)) {
+           raw_spin_unlock_irq(&curr->pi_lock);
+           cpu_relax();
+           raw_spin_lock_irq(&curr->pi_lock);
+           continue;
+   }
 raw_spin_unlock_irq(&curr->pi_lock);
 
 spin_lock(&hb->lock);
@@ -914,10 +930,12 @@ void exit_pi_state_list(struct task_struct *curr)
 * task still owns the PI-state:
 */
if (head->next != next) {
+   /* retain curr->pi_lock for the loop invariant */
 raw_spin_unlock(&curr->pi_lock);
 raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
 spin_unlock(&hb->lock);
 raw_spin_lock_irq(&curr->pi_lock);
+   put_pi_state(pi_state);
continue;
}
 
@@ -925,9 +943,8 @@ void exit_pi_state_list(struct task_struct *curr)
 WARN_ON(list_empty(&pi_state->list));
 list_del_init(&pi_state->list);
 pi_state->owner = NULL;
-   raw_spin_unlock(&curr->pi_lock);
 
-   get_pi_state(pi_state);
+   raw_spin_unlock(&curr->pi_lock);
 raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
 spin_unlock(&hb->lock);
 
-- 
2.18.0



[PATCH RT 21/22] seqlock: provide the same ordering semantics as mainline

2018-08-06 Thread Julia Cartwright
4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
The mainline implementation of read_seqbegin() orders prior loads w.r.t.
the read-side critical section.  Fixup the RT writer-boosting
implementation to provide the same guarantee.

Also, while we're here, update the usage of ACCESS_ONCE() to use
READ_ONCE().
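
The guarantee matters for the usual reader pattern, sketched below with
made-up data (example_seqlock and example are illustrative): the data
loads inside the section must not be reordered before the sequence read,
otherwise a torn snapshot could still pass the read_seqretry() check.

    static DEFINE_SEQLOCK(example_seqlock);
    static struct { u64 a, b; } example;

    unsigned int seq;
    u64 a, b;

    do {
            seq = read_seqbegin(&example_seqlock);
            a = example.a;      /* must stay ordered after the */
            b = example.b;      /* sequence read above         */
    } while (read_seqretry(&example_seqlock, seq));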

Fixes: e69f15cf77c23 ("seqlock: Prevent rt starvation")
Cc: stable...@vger.kernel.org
Signed-off-by: Julia Cartwright 
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit afa4c06b89a3c0fb7784ff900ccd707bef519cb7)
Signed-off-by: Julia Cartwright 
---
 include/linux/seqlock.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 3d7223ffdd3b..de04d6d0face 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -461,6 +461,7 @@ static inline unsigned read_seqbegin(seqlock_t *sl)
 spin_unlock_wait(&sl->lock);
goto repeat;
}
+   smp_rmb();
return ret;
 }
 #endif
-- 
2.18.0



[PATCH RT 01/22] futex: Fix pi_state->owner serialization

2018-08-06 Thread Julia Cartwright
From: Peter Zijlstra 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
[ Upstream commit c74aef2d06a9f59cece89093eecc552933cba72a ]

There was a reported suspicion about a race between exit_pi_state_list()
and put_pi_state(). The same report mentioned the comment with
put_pi_state() said it should be called with hb->lock held, and it no
longer is in all places.

As it turns out, the pi_state->owner serialization is indeed broken. As per
the new rules:

  734009e96d19 ("futex: Change locking rules")

pi_state->owner should be serialized by pi_state->pi_mutex.wait_lock.
For the sites setting pi_state->owner we already hold wait_lock (where
required) but exit_pi_state_list() and put_pi_state() were not and
raced on clearing it.

Fixes: 734009e96d19 ("futex: Change locking rules")
Reported-by: Gratian Crisan 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Thomas Gleixner 
Cc: dvh...@infradead.org
Cc: sta...@vger.kernel.org
Link: 
https://lkml.kernel.org/r/20170922154806.jd3ffltfk24m4...@hirez.programming.kicks-ass.net
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Julia Cartwright 
---
 kernel/futex.c | 34 ++
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 8ab0ddd4cf8f..47e42faad6c5 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -819,8 +819,6 @@ static void get_pi_state(struct futex_pi_state *pi_state)
 /*
  * Drops a reference to the pi_state object and frees or caches it
  * when the last reference is gone.
- *
- * Must be called with the hb lock held.
  */
 static void put_pi_state(struct futex_pi_state *pi_state)
 {
@@ -835,16 +833,22 @@ static void put_pi_state(struct futex_pi_state *pi_state)
 * and has cleaned up the pi_state already
 */
if (pi_state->owner) {
-   raw_spin_lock_irq(&pi_state->owner->pi_lock);
-   list_del_init(&pi_state->list);
-   raw_spin_unlock_irq(&pi_state->owner->pi_lock);
+   struct task_struct *owner;
 
-   rt_mutex_proxy_unlock(&pi_state->pi_mutex, pi_state->owner);
+   raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
+   owner = pi_state->owner;
+   if (owner) {
+           raw_spin_lock(&owner->pi_lock);
+           list_del_init(&pi_state->list);
+           raw_spin_unlock(&owner->pi_lock);
+   }
+   rt_mutex_proxy_unlock(&pi_state->pi_mutex, owner);
+   raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
}
 
-   if (current->pi_state_cache)
+   if (current->pi_state_cache) {
kfree(pi_state);
-   else {
+   } else {
/*
 * pi_state->list is already empty.
 * clear pi_state->owner.
@@ -903,14 +907,15 @@ void exit_pi_state_list(struct task_struct *curr)
 raw_spin_unlock_irq(&curr->pi_lock);
 
 spin_lock(&hb->lock);
-
-   raw_spin_lock_irq(&curr->pi_lock);
+   raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
+   raw_spin_lock(&curr->pi_lock);
/*
 * We dropped the pi-lock, so re-check whether this
 * task still owns the PI-state:
 */
if (head->next != next) {
-   raw_spin_unlock_irq(&curr->pi_lock);
+   raw_spin_unlock(&curr->pi_lock);
+   raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
 spin_unlock(&hb->lock);
 raw_spin_lock_irq(&curr->pi_lock);
continue;
@@ -920,9 +925,10 @@ void exit_pi_state_list(struct task_struct *curr)
 WARN_ON(list_empty(&pi_state->list));
 list_del_init(&pi_state->list);
 pi_state->owner = NULL;
-   raw_spin_unlock_irq(&curr->pi_lock);
+   raw_spin_unlock(&curr->pi_lock);
 
 get_pi_state(pi_state);
+   raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
 spin_unlock(&hb->lock);
 
 rt_mutex_futex_unlock(&pi_state->pi_mutex);
@@ -1204,6 +1210,10 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key,
 
 WARN_ON(!list_empty(&pi_state->list));
 list_add(&pi_state->list, &p->pi_state_list);
+   /*
+* Assignment without holding pi_state->pi_mutex.wait_lock is safe
+* because there is no concurrency as the object is not published yet.
+*/
pi_state->owner = p;
 raw_spin_unlock_irq(&p->pi_lock);
 
-- 
2.18.0



[PATCH RT 14/22] Revert "rt,ntp: Move call to schedule_delayed_work() to helper thread"

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
I've been looking at this in v3.10-RT where it got in. The patch
description says

|The ntp code for notify_cmos_timer() is called from a hard interrupt
|context.

I see only one caller of ntp_notify_cmos_timer() and that is
do_adjtimex() after "raw_spin_unlock_irqrestore()".
I see a few callers of do_adjtimex() which is SYS_adjtimex() (+compat)
and posix_clock_realtime_adj() which in turn is called by
SYS_clock_adjtime().

Reverting the patch.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit 932c5783d4434250a1019f49ae81b80731dfd4cd)
Signed-off-by: Julia Cartwright 
---
 kernel/time/ntp.c | 26 --
 1 file changed, 26 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 05b7391bf9bd..6df8927c58a5 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -17,7 +17,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "ntp_internal.h"
 #include "timekeeping_internal.h"
@@ -569,35 +568,10 @@ static void sync_cmos_clock(struct work_struct *work)
   &sync_cmos_work, timespec64_to_jiffies(&next));
 }
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-
-static void run_clock_set_delay(struct swork_event *event)
-{
-   queue_delayed_work(system_power_efficient_wq, &sync_cmos_work, 0);
-}
-
-static struct swork_event ntp_cmos_swork;
-
-void ntp_notify_cmos_timer(void)
-{
-   swork_queue(&ntp_cmos_swork);
-}
-
-static __init int create_cmos_delay_thread(void)
-{
-   WARN_ON(swork_get());
-   INIT_SWORK(&ntp_cmos_swork, run_clock_set_delay);
-   return 0;
-}
-early_initcall(create_cmos_delay_thread);
-
-#else
-
 void ntp_notify_cmos_timer(void)
 {
 queue_delayed_work(system_power_efficient_wq, &sync_cmos_work, 0);
 }
-#endif /* CONFIG_PREEMPT_RT_FULL */
 
 #else
 void ntp_notify_cmos_timer(void) { }
-- 
2.18.0



[PATCH RT 13/22] net: use task_struct instead of CPU number as the queue owner on -RT

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
In commit ("net: move xmit_recursion to per-task variable on -RT") the
recursion level was changed to be per-task since we can get preempted in
BH on -RT. The lock owner should consequently be recorded as the task
that holds the lock and not the CPU. Otherwise we trigger the "Dead loop
on virtual device" warning on SMP systems.

Cc: stable...@vger.kernel.org
Reported-by: Kurt Kanzenbach 
Tested-by: Kurt Kanzenbach 
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit d3a66ffd1c4f0253076069b10a8223e7b6e80e38)
Signed-off-by: Julia Cartwright 
---
 include/linux/netdevice.h | 54 ++-
 net/core/dev.c|  6 -
 2 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 85fc72b8a92b..6f1a3f286c4b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -594,7 +594,11 @@ struct netdev_queue {
  * write-mostly part
  */
spinlock_t  _xmit_lock cacheline_aligned_in_smp;
+#ifdef CONFIG_PREEMPT_RT_FULL
+   struct task_struct  *xmit_lock_owner;
+#else
int xmit_lock_owner;
+#endif
/*
 * Time (in jiffies) of last Tx
 */
@@ -3610,41 +3614,79 @@ static inline u32 netif_msg_init(int debug_value, int 
default_msg_enable_bits)
return (1 << debug_value) - 1;
 }
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+static inline void netdev_queue_set_owner(struct netdev_queue *txq, int cpu)
+{
+   txq->xmit_lock_owner = current;
+}
+
+static inline void netdev_queue_clear_owner(struct netdev_queue *txq)
+{
+   txq->xmit_lock_owner = NULL;
+}
+
+static inline bool netdev_queue_has_owner(struct netdev_queue *txq)
+{
+   if (txq->xmit_lock_owner != NULL)
+   return true;
+   return false;
+}
+
+#else
+
+static inline void netdev_queue_set_owner(struct netdev_queue *txq, int cpu)
+{
+   txq->xmit_lock_owner = cpu;
+}
+
+static inline void netdev_queue_clear_owner(struct netdev_queue *txq)
+{
+   txq->xmit_lock_owner = -1;
+}
+
+static inline bool netdev_queue_has_owner(struct netdev_queue *txq)
+{
+   if (txq->xmit_lock_owner != -1)
+   return true;
+   return false;
+}
+#endif
+
 static inline void __netif_tx_lock(struct netdev_queue *txq, int cpu)
 {
    spin_lock(&txq->_xmit_lock);
-   txq->xmit_lock_owner = cpu;
+   netdev_queue_set_owner(txq, cpu);
 }
 
 static inline void __netif_tx_lock_bh(struct netdev_queue *txq)
 {
    spin_lock_bh(&txq->_xmit_lock);
-   txq->xmit_lock_owner = smp_processor_id();
+   netdev_queue_set_owner(txq, smp_processor_id());
 }
 
 static inline bool __netif_tx_trylock(struct netdev_queue *txq)
 {
    bool ok = spin_trylock(&txq->_xmit_lock);
if (likely(ok))
-   txq->xmit_lock_owner = smp_processor_id();
+   netdev_queue_set_owner(txq, smp_processor_id());
return ok;
 }
 
 static inline void __netif_tx_unlock(struct netdev_queue *txq)
 {
-   txq->xmit_lock_owner = -1;
+   netdev_queue_clear_owner(txq);
    spin_unlock(&txq->_xmit_lock);
 }
 
 static inline void __netif_tx_unlock_bh(struct netdev_queue *txq)
 {
-   txq->xmit_lock_owner = -1;
+   netdev_queue_clear_owner(txq);
    spin_unlock_bh(&txq->_xmit_lock);
 }
 
 static inline void txq_trans_update(struct netdev_queue *txq)
 {
-   if (txq->xmit_lock_owner != -1)
+   if (netdev_queue_has_owner(txq))
txq->trans_start = jiffies;
 }
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 93995575d23a..e7dc4700e463 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3449,7 +3449,11 @@ static int __dev_queue_xmit(struct sk_buff *skb, void 
*accel_priv)
if (dev->flags & IFF_UP) {
int cpu = smp_processor_id(); /* ok because BHs are off */
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+   if (txq->xmit_lock_owner != current) {
+#else
if (txq->xmit_lock_owner != cpu) {
+#endif
if (unlikely(xmit_rec_read() > XMIT_RECURSION_LIMIT))
goto recursion_alert;
 
@@ -7168,7 +7172,7 @@ static void netdev_init_one_queue(struct net_device *dev,
/* Initialize queue lock */
    spin_lock_init(&queue->_xmit_lock);
    netdev_set_xmit_lockdep_class(&queue->_xmit_lock, dev->type);
-   queue->xmit_lock_owner = -1;
+   netdev_queue_clear_owner(queue);
netdev_queue_numa_node_write(queue, NUMA_NO_NODE);
queue->dev = dev;
 #ifdef CONFIG_BQL
-- 
2.18.0



[PATCH RT 05/22] rtmutex: Make rt_mutex_futex_unlock() safe for irq-off callsites

2018-08-06 Thread Julia Cartwright
From: Boqun Feng 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
[ Upstream commit 6b0ef92fee2a3189eba6d6b827b247cb4f6da7e9 ]

When running rcutorture with TREE03 config, CONFIG_PROVE_LOCKING=y, and
kernel cmdline argument "rcutorture.gp_exp=1", lockdep reports a
HARDIRQ-safe->HARDIRQ-unsafe deadlock:

 === WARNING: inconsistent lock state
 4.16.0-rc4+ #1 Not tainted
 
 inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
 takes:
 __schedule+0xbe/0xaf0
 {IN-HARDIRQ-W} state was registered at:
   _raw_spin_lock+0x2a/0x40
   scheduler_tick+0x47/0xf0
...
 other info that might help us debug this:
  Possible unsafe locking scenario:
CPU0

   lock(&rq->lock);
   <Interrupt>
     lock(&rq->lock);
  *** DEADLOCK ***
 1 lock held by rcu_torture_rea/724:
 rcu_torture_read_lock+0x0/0x70
 stack backtrace:
 CPU: 2 PID: 724 Comm: rcu_torture_rea Not tainted 4.16.0-rc4+ #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.11.0-20171110_100015-anatol 04/01/2014
 Call Trace:
  lock_acquire+0x90/0x200
  ? __schedule+0xbe/0xaf0
  _raw_spin_lock+0x2a/0x40
  ? __schedule+0xbe/0xaf0
  __schedule+0xbe/0xaf0
  preempt_schedule_irq+0x2f/0x60
  retint_kernel+0x1b/0x2d
 RIP: 0010:rcu_read_unlock_special+0x0/0x680
  ? rcu_torture_read_unlock+0x60/0x60
  __rcu_read_unlock+0x64/0x70
  rcu_torture_read_unlock+0x17/0x60
  rcu_torture_reader+0x275/0x450
  ? rcutorture_booster_init+0x110/0x110
  ? rcu_torture_stall+0x230/0x230
  ? kthread+0x10e/0x130
  kthread+0x10e/0x130
  ? kthread_create_worker_on_cpu+0x70/0x70
  ? call_usermodehelper_exec_async+0x11a/0x150
  ret_from_fork+0x3a/0x50

This happens with the following event sequence:

preempt_schedule_irq();
  local_irq_enable();
  __schedule():
local_irq_disable(); // irq off
...
rcu_note_context_switch():
  rcu_note_preempt_context_switch():
rcu_read_unlock_special():
  local_irq_save(flags);
  ...
  raw_spin_unlock_irqrestore(...,flags); // irq remains off
  rt_mutex_futex_unlock():
raw_spin_lock_irq();
...
raw_spin_unlock_irq(); // accidentally set irq on


rq_lock():
  raw_spin_lock(); // acquiring rq->lock with irq on

which means rq->lock becomes a HARDIRQ-unsafe lock, which can cause
deadlocks in scheduler code.

This problem was introduced by commit 02a7c234e540 ("rcu: Suppress
lockdep false-positive ->boost_mtx complaints"). That introduced a user
of rt_mutex_futex_unlock() called with irqs off.

To fix this, replace the *lock_irq() in rt_mutex_futex_unlock() with
*lock_irq{save,restore}() to make it safe to call rt_mutex_futex_unlock()
with irq off.

Fixes: 02a7c234e540 ("rcu: Suppress lockdep false-positive ->boost_mtx 
complaints")
Signed-off-by: Boqun Feng 
Signed-off-by: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Lai Jiangshan 
Cc: Steven Rostedt 
Cc: Josh Triplett 
Cc: Mathieu Desnoyers 
Cc: "Paul E . McKenney" 
Link: https://lkml.kernel.org/r/20180309065630.8283-1-boqun.f...@gmail.com
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Julia Cartwright 
---
 kernel/locking/rtmutex.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 57361d631749..5e15f5c73637 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -2213,11 +2213,12 @@ void __sched rt_mutex_futex_unlock(struct rt_mutex 
*lock)
 {
WAKE_Q(wake_q);
WAKE_Q(wake_sleeper_q);
+   unsigned long flags;
bool postunlock;
 
-   raw_spin_lock_irq(&lock->wait_lock);
+   raw_spin_lock_irqsave(&lock->wait_lock, flags);
    postunlock = __rt_mutex_futex_unlock(lock, &wake_q, &wake_sleeper_q);
-   raw_spin_unlock_irq(&lock->wait_lock);
+   raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
 
    if (postunlock)
        rt_mutex_postunlock(&wake_q, &wake_sleeper_q);
-- 
2.18.0



[PATCH RT 20/22] squashfs: make use of local lock in multi_cpu decompressor

2018-08-06 Thread Julia Cartwright
4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
Currently, the squashfs multi_cpu decompressor makes use of
get_cpu_ptr()/put_cpu_ptr(), which unconditionally disable preemption
during decompression.

Because the workload is distributed across CPUs, all CPUs can observe a
very high wakeup latency, which has been seen to be as much as 8000us.

Convert this decompressor to make use of a local lock, which will allow
execution of the decompressor with preemption-enabled, but also ensure
concurrent accesses to the percpu compressor data on the local CPU will
be serialized.

Cc: stable...@vger.kernel.org
Reported-by: Alexander Stein 
Tested-by: Alexander Stein 
Signed-off-by: Julia Cartwright 
Signed-off-by: Sebastian Andrzej Siewior 
(cherry picked from commit c160736542d7b3d67da32848d2f028b8e35730e5)
Signed-off-by: Julia Cartwright 
---
 fs/squashfs/decompressor_multi_percpu.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/squashfs/decompressor_multi_percpu.c 
b/fs/squashfs/decompressor_multi_percpu.c
index 23a9c28ad8ea..6a73c4fa88e7 100644
--- a/fs/squashfs/decompressor_multi_percpu.c
+++ b/fs/squashfs/decompressor_multi_percpu.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include <linux/locallock.h>
 
 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
@@ -25,6 +26,8 @@ struct squashfs_stream {
void*stream;
 };
 
+static DEFINE_LOCAL_IRQ_LOCK(stream_lock);
+
 void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
void *comp_opts)
 {
@@ -79,10 +82,15 @@ int squashfs_decompress(struct squashfs_sb_info *msblk, 
struct buffer_head **bh,
 {
struct squashfs_stream __percpu *percpu =
(struct squashfs_stream __percpu *) msblk->stream;
-   struct squashfs_stream *stream = get_cpu_ptr(percpu);
-   int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
-   offset, length, output);
-   put_cpu_ptr(stream);
+   struct squashfs_stream *stream;
+   int res;
+
+   stream = get_locked_ptr(stream_lock, percpu);
+
+   res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
+   offset, length, output);
+
+   put_locked_ptr(stream_lock, stream);
 
if (res < 0)
ERROR("%s decompression failed, data probably corrupt\n",
-- 
2.18.0



[PATCH RT 08/22] sched, tracing: Fix trace_sched_pi_setprio() for deboosting

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
[ Upstream commit 4ff648decf4712d39f184fc2df3163f43975575a ]

Since the following commit:

  b91473ff6e97 ("sched,tracing: Update trace_sched_pi_setprio()")

the sched_pi_setprio trace point shows the "newprio" during a deboost:

  |futex sched_pi_setprio: comm=futex_requeue_p pid=2234 oldprio=98 newprio=98
  |futex sched_switch: prev_comm=futex_requeue_p prev_pid=2234 prev_prio=120

This patch open codes __rt_effective_prio() in the tracepoint as the
'newprio' to get the old behaviour back / the correct priority:

  |futex sched_pi_setprio: comm=futex_requeue_p pid=2220 oldprio=98 newprio=120
  |futex sched_switch: prev_comm=futex_requeue_p prev_pid=2220 prev_prio=120

Peter suggested to open code the new priority so people using tracehook
could get the deadline data out.

Reported-by: Mansky Christian 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
Fixes: b91473ff6e97 ("sched,tracing: Update trace_sched_pi_setprio()")
Link: http://lkml.kernel.org/r/20180524132647.gg6ziuogczdmj...@linutronix.de
Signed-off-by: Ingo Molnar 
Signed-off-by: Julia Cartwright 
---
 include/trace/events/sched.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 516ae88cddf4..742682079acf 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -429,7 +429,9 @@ TRACE_EVENT(sched_pi_setprio,
memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
__entry->pid= tsk->pid;
__entry->oldprio= tsk->prio;
-   __entry->newprio= pi_task ? pi_task->prio : tsk->prio;
+   __entry->newprio= pi_task ?
+   min(tsk->normal_prio, pi_task->prio) :
+   tsk->normal_prio;
/* XXX SCHED_DEADLINE bits missing */
),
 
-- 
2.18.0



[PATCH RT 03/22] futex: Avoid violating the 10th rule of futex

2018-08-06 Thread Julia Cartwright
From: Peter Zijlstra 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
[ Upstream commit c1e2f0eaf015fb7076d51a339011f2383e6dd389 ]

Julia reported futex state corruption in the following scenario:

   waiter  waker
stealer (prio > waiter)

   futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
 timeout=[N ms])
  futex_wait_requeue_pi()
 futex_wait_queue_me()
freezable_schedule()

   futex(LOCK_PI, uaddr2)
   futex(CMP_REQUEUE_PI, uaddr,
 uaddr2, 1, 0)
  /* requeues waiter to uaddr2 */
   futex(UNLOCK_PI, uaddr2)
 wake_futex_pi()

cmp_futex_value_locked(uaddr2, waiter)
wake_up_q()
   
   task>

   futex(LOCK_PI, uaddr2)

  __rt_mutex_start_proxy_lock()

 try_to_take_rt_mutex() /* steals lock */

rt_mutex_set_owner(lock, stealer)

  
 
 rt_mutex_wait_proxy_lock()
__rt_mutex_slowlock()
   try_to_take_rt_mutex() /* fails, lock held by stealer */
   if (timeout && !timeout->task)
  return -ETIMEDOUT;
fixup_owner()
   /* lock wasn't acquired, so,
  fixup_pi_state_owner skipped */

   return -ETIMEDOUT;

   /* At this point, we've returned -ETIMEDOUT to userspace, but the
* futex word shows waiter to be the owner, and the pi_mutex has
* stealer as the owner */

   futex_lock(LOCK_PI, uaddr2)
 -> bails with EDEADLK, futex word says we're owner.

And suggested that what commit:

  73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")

removes from fixup_owner() looks to be just what is needed. And indeed
it is -- I completely missed that requeue_pi could also result in this
case. So we need to restore that, except that subsequent patches, like
commit:

  16ffa12d7425 ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock")

changed all the locking rules. Even without that, the sequence:

-   if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
-   locked = 1;
-   goto out;
-   }

-   raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
-   owner = rt_mutex_owner(&q->pi_state->pi_mutex);
-   if (!owner)
-   owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
-   raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
-   ret = fixup_pi_state_owner(uaddr, q, owner);

already suggests there were races; otherwise we'd never have to look
at next_owner.

So instead of doing 3 consecutive wait_lock sections with who knows
what races, we do it all in a single section. Additionally, the usage
of pi_state->owner in fixup_owner() was only safe because only the
rt_mutex owner would modify it, which this additional case wrecks.

Luckily the values can only change away and not to the value we're
testing, this means we can do a speculative test and double check once
we have the wait_lock.
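
Schematically, the pattern is the usual check/re-check under the lock
(illustrative pseudo-code only, not the actual hunk below; fix_up_owner()
is a made-up placeholder):

	if (READ_ONCE(pi_state->owner) == current) {	/* speculative test */
		raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
		if (pi_state->owner == current)		/* double check */
			fix_up_owner(pi_state);
		raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
	}

A lockless "nothing to do" result is stable because the owner can only
change away from, never back to, the tested value.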

Fixes: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
Reported-by: Julia Cartwright 
Reported-by: Gratian Crisan 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Thomas Gleixner 
Tested-by: Julia Cartwright 
Tested-by: Gratian Crisan 
Cc: Darren Hart 
Cc: sta...@vger.kernel.org
Link: 
https://lkml.kernel.org/r/20171208124939.7livp7no2ov65...@hirez.programming.kicks-ass.net
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Julia Cartwright 
---
 kernel/futex.c  | 83 ++---
 kernel/locking/rtmutex.c| 26 ---
 kernel/locking/rtmutex_common.h |  1 +
 3 files changed, 87 insertions(+), 23 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 270148be5647..cdd68ba6e3a6 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2287,21 +2287,17 @@ static void unqueue_me_pi(struct futex_q *q)
spin_unlock(q->lock_ptr);
 }
 
-/*
- * Fixup the pi_state owner with the

[PATCH RT 00/22] Linux 4.9.115-rt94-rc1

2018-08-06 Thread Julia Cartwright
Hello RT folks!

This patchset brings back many RT-specific fixes that have gone into
subsequent 4.14-rt and 4.16-rt releases.

One of my x86 boxes very intermittently triggers a WARN_ON() on bootup
in migrate_enable(), which I'm still trying to triage.  If you can more
reliably reproduce this, please let me know.

This release candidate will not be pushed to the git tree.

To build 4.9.115-rt94-rc1 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.9.115.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.9/patch-4.9.115-rt94-rc1.patch.xz

If all goes well with testing, this rc will be promoted to an official
release on 8/16/2018.

Please go forth and test!

Thanks,
   Julia

---
Boqun Feng (1):
  rtmutex: Make rt_mutex_futex_unlock() safe for irq-off callsites

Julia Cartwright (4):
  locallock: provide {get,put}_locked_ptr() variants
  squashfs: make use of local lock in multi_cpu decompressor
  seqlock: provide the same ordering semantics as mainline
  Linux 4.9.115-rt94-rc1

Paul E. McKenney (1):
  rcu: Suppress lockdep false-positive ->boost_mtx complaints

Peter Zijlstra (4):
  futex: Fix pi_state->owner serialization
  futex: Fix more put_pi_state() vs. exit_pi_state_list() races
  futex: Avoid violating the 10th rule of futex
  futex: Fix OWNER_DEAD fixup

Sebastian Andrzej Siewior (12):
  rcu: Do not include rtmutex_common.h unconditionally
  sched, tracing: Fix trace_sched_pi_setprio() for deboosting
  crypto: limit more FPU-enabled sections
  arm*: disable NEON in kernel mode
  mm/slub: close possible memory-leak in kmem_cache_alloc_bulk()
  locking: add types.h
  net: use task_struct instead of CPU number as the queue owner on -RT
  Revert "rt,ntp: Move call to schedule_delayed_work() to helper thread"
  Revert "block: blk-mq: Use swait"
  block: blk-mq: move blk_queue_usage_counter_release() into process
context
  alarmtimer: Prevent live lock in alarm_cancel()
  posix-timers: move the rcu head out of the union

 arch/arm/Kconfig   |   2 +-
 arch/arm64/crypto/Kconfig  |  14 +-
 arch/x86/crypto/camellia_aesni_avx2_glue.c |  20 +++
 arch/x86/crypto/camellia_aesni_avx_glue.c  |  19 +++
 arch/x86/crypto/cast6_avx_glue.c   |  24 +++-
 arch/x86/crypto/chacha20_glue.c|   9 +-
 arch/x86/crypto/serpent_avx2_glue.c|  19 +++
 arch/x86/crypto/serpent_avx_glue.c |  23 +++-
 arch/x86/crypto/serpent_sse2_glue.c|  23 +++-
 arch/x86/crypto/twofish_avx_glue.c |  27 +++-
 arch/x86/include/asm/fpu/api.h |   1 +
 arch/x86/kernel/fpu/core.c |  12 ++
 block/blk-core.c   |  22 +++-
 block/blk-mq.c |   6 +-
 fs/squashfs/decompressor_multi_percpu.c|  16 ++-
 include/linux/blkdev.h |   4 +-
 include/linux/locallock.h  |  10 ++
 include/linux/netdevice.h  |  54 +++-
 include/linux/posix-timers.h   |   2 +-
 include/linux/seqlock.h|   1 +
 include/linux/spinlock_types_raw.h |   2 +
 include/trace/events/sched.h   |   4 +-
 kernel/futex.c | 144 -
 kernel/locking/rtmutex.c   |  31 +++--
 kernel/locking/rtmutex_common.h|   1 +
 kernel/rcu/tree_plugin.h   |   5 +-
 kernel/time/alarmtimer.c   |   2 +-
 kernel/time/ntp.c  |  26 
 kernel/time/posix-timers.c |   4 +-
 localversion-rt|   2 +-
 mm/slub.c  |   1 +
 net/core/dev.c |   6 +-
 32 files changed, 412 insertions(+), 124 deletions(-)

-- 
2.18.0



[PATCH RT 15/22] Revert "block: blk-mq: Use swait"

2018-08-06 Thread Julia Cartwright
From: Sebastian Andrzej Siewior 

4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
This reverts commit "block: blk-mq: Use swait". The issue remains but
will be fixed differently.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
(cherry-picked from ca3fd6cf836739fd59eac2f7a9b0261365e818bb)
Signed-off-by: Julia Cartwright 
---
 block/blk-core.c   | 10 +-
 block/blk-mq.c |  6 +++---
 include/linux/blkdev.h |  2 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index e4ac43392875..87d3e0a503e5 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -662,9 +662,9 @@ int blk_queue_enter(struct request_queue *q, bool nowait)
if (nowait)
return -EBUSY;
 
-   swait_event(q->mq_freeze_wq,
-   !atomic_read(&q->mq_freeze_depth) ||
-   blk_queue_dying(q));
+   wait_event(q->mq_freeze_wq,
+  !atomic_read(&q->mq_freeze_depth) ||
+  blk_queue_dying(q));
if (blk_queue_dying(q))
return -ENODEV;
}
@@ -680,7 +680,7 @@ static void blk_queue_usage_counter_release(struct 
percpu_ref *ref)
struct request_queue *q =
container_of(ref, struct request_queue, q_usage_counter);
 
-   swake_up_all(&q->mq_freeze_wq);
+   wake_up_all(&q->mq_freeze_wq);
 }
 
 static void blk_rq_timed_out_timer(unsigned long data)
@@ -750,7 +750,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
q->bypass_depth = 1;
    __set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags);
 
-   init_swait_queue_head(&q->mq_freeze_wq);
+   init_waitqueue_head(&q->mq_freeze_wq);
 
/*
 * Init percpu_ref in atomic mode so that it's faster to shutdown.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e0a804ab5420..3a49552974ec 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -72,7 +72,7 @@ EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_start);
 
 static void blk_mq_freeze_queue_wait(struct request_queue *q)
 {
-   swait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
+   wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
 }
 
 /*
@@ -110,7 +110,7 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
WARN_ON_ONCE(freeze_depth < 0);
if (!freeze_depth) {
    percpu_ref_reinit(&q->q_usage_counter);
-   swake_up_all(&q->mq_freeze_wq);
+   wake_up_all(&q->mq_freeze_wq);
}
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
@@ -129,7 +129,7 @@ void blk_mq_wake_waiters(struct request_queue *q)
 * dying, we need to ensure that processes currently waiting on
 * the queue are notified as well.
 */
-   swake_up_all(&q->mq_freeze_wq);
+   wake_up_all(&q->mq_freeze_wq);
 }
 
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 34fd1ed9845e..fdb449fe3ff7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -468,7 +468,7 @@ struct request_queue {
struct throtl_data *td;
 #endif
struct rcu_head rcu_head;
-   struct swait_queue_head mq_freeze_wq;
+   wait_queue_head_t   mq_freeze_wq;
struct percpu_ref   q_usage_counter;
struct list_headall_q_node;
 
-- 
2.18.0



[PATCH RT 22/22] Linux 4.9.115-rt94-rc1

2018-08-06 Thread Julia Cartwright
4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---
---
 localversion-rt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/localversion-rt b/localversion-rt
index e98a1fe050bd..dcc2fd2ca155 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt93
+-rt94-rc1
-- 
2.18.0



Re: simple-framebuffer enquire

2018-06-26 Thread Julia Cartwright
On Tue, Jun 26, 2018 at 08:44:58PM +0200, Hans de Goede wrote:
> Hi,
> 
> On 26-06-18 18:35, Michael Nazzareno Trimarchi wrote:
[..]
> > cat memblock/reserved
> > 0: 0x80004000..0x80007fff
> > 1: 0x8010..0x81e030b3
> > 2: 0x8300..0x83007fff
> > 3: 0x8400..0x85ff
> > 4: 0x86fa2000..0x87021fff
> > 
> > +   reserved-memory {
> > +   #address-cells = <1>;
> > +   #size-cells = <1>;
> > +   ranges;
> > +
> > +   display_reserved: framebuffer@86fa2000 {
> > +   reg = <0x86fa2000 0x8>;
> > +   };
> > +
> > +   };
[..]
> > Still have the same on ioremap.
> 
> Hmm, I guess the kernel does map the entire region its get
> passed and simply makes sure to not touch the reserved mem,
> where as with the changes to the passed in mem-region the
> sunxi u-boot code does the memory does not get mapped by
> the kernel at all ?

If the intent is to reserve memory _and_ prevent it from being included
in the kernel's linear map, then it is also necessary to include the
'no-map' property for this reserved-mem node.

From Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt:

   no-map (optional) - empty property
   - Indicates the operating system must not create a virtual mapping
 of the region as part of its standard mapping of system memory,
 nor permit speculative access to it under any circumstances other
 than under the control of the device driver using the region.
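
Concretely, something like this (illustrative only; reusing the node from
your snippet above, with only the 'no-map' property added):

	reserved-memory {
		#address-cells = <1>;
		#size-cells = <1>;
		ranges;

		display_reserved: framebuffer@86fa2000 {
			reg = <0x86fa2000 0x8>;	/* reg value as quoted above */
			no-map;
		};
	};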

Julia


[ANNOUNCE] 4.9.98-rt76

2018-05-10 Thread Julia Cartwright
Hello RT Folks!

I'm pleased to announce the 4.9.98-rt76 stable release.

This release is just an update to the new stable 4.9.98 version
and no RT specific changes have been made.

Expect a 4.9.98-rt77-rc1 with backports from rt-devel soon.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.9-rt
  Head SHA1: bc64e52891b8b9e56b19cbb800a1b8c415df13e8

Or to build 4.9.98-rt76 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.9.98.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.9/patch-4.9.98-rt76.patch.xz

Enjoy!
Julia



Re: [LINUX PATCH v8 2/2] memory: pl353: Add driver for arm pl353 static memory controller

2018-05-07 Thread Julia Cartwright
On Mon, May 07, 2018 at 10:12:28AM +, Naga Sureshkumar Relli wrote:
> Hi Julia,
>
> Thanks for reviewing the patch and Sorry for my late reply. This patch
> went to junk folder, hence I didn't catch this patch.
>
> From: Julia Cartwright [mailto:ju...@ni.com]
[..]
> > 
> > It would be easier to follow if you constructed your two patchsets with git 
> > format-patch --
> > thread.
> >
>
> I am using the same but with out --thread.
>
> > Or, merge them into a single patchset, especially considering the 
> > dependency between patches.
>
> But both are different patches, one for Device tree documentation and other 
> for driver update.

Yes, I'm not proposing you merge _patches_ but _patchsets_.  You have
two patchsets, one for the SMC driver, and another for the NAND.  Given
that they depend on one another, it's helpful for reviewers if you send
them all together, with a cover letter which describes the entire
patchset, its dependencies, the revision changelog, etc.

Something like:

   [PATCH v9 0/4] rawnand: memory: add support for PL353 static memory 
controller + NAND
  [PATCH v9 1/4] Devicetree: Add pl353 smc controller devicetree binding 
information
  [PATCH v9 2/4] memory: pl353: Add driver for arm pl353 static memory 
controller
  [PATCH v9 3/4] Documentation: nand: pl353: Add documentation for 
controller and driver
  [PATCH v9 4/4] mtd: rawnand: pl353: Add basic driver for arm pl353 smc 
nand interface

Anyway, just with --thread enabled would be an improvement.
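
For example, something like this would do (illustrative; adjust the
commit count and version number to your series):

  git format-patch -v9 --cover-letter --thread=shallow -4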

[..]
> > > --- a/drivers/memory/Kconfig
> > > +++ b/drivers/memory/Kconfig
> > > @@ -152,6 +152,13 @@ config DA8XX_DDRCTL
> > > This driver is for the DDR2/mDDR Memory Controller present on
> > > Texas Instruments da8xx SoCs. It's used to tweak various memory
> > > controller configuration options.
> > > +config PL35X_SMC
> > > + bool "ARM PL35X Static Memory Controller(SMC) driver"
> >
> > Is there any reason why this can't be tristate?
>
> There is a Nand driver which uses this driver. i.e The NAND driver
> Depends on this driver.

That's true, but it's irrelevant to the question I asked.  It is perfectly
valid for both the SMC and NAND drivers to be tristate, so why are you not
allowing this configuration?
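
I.e., something like (sketch only):

	config PL35X_SMC
		tristate "ARM PL35X Static Memory Controller(SMC) driver"

with the NAND driver's Kconfig entry made tristate as well, keeping its
dependency on PL35X_SMC.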

   Julia


[PATCH RT 1/2] locallock: provide {get,put}_locked_ptr() variants

2018-05-07 Thread Julia Cartwright
Provide a set of locallocked accessors for pointers to per-CPU data;
this is useful for dynamically-allocated per-CPU regions, for example.

These are symmetric with the {get,put}_cpu_ptr() per-CPU accessor
variants.
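
As a usage sketch (illustrative only; the per-CPU variable and the work
done under the lock are made up, not part of this patch):

	struct frob_state {
		int count;
	};

	static DEFINE_LOCAL_IRQ_LOCK(frob_lock);
	static DEFINE_PER_CPU(struct frob_state, frob_state);

	static void frob_this_cpu(void)
	{
		struct frob_state *state;

		/* serializes against other frob_lock users on this CPU */
		state = get_locked_ptr(frob_lock, &frob_state);
		state->count++;
		put_locked_ptr(frob_lock, state);
	}

On PREEMPT_RT_FULL this keeps preemption enabled while still serializing
access to the per-CPU data; on !RT it falls back to
get_cpu_ptr()/put_cpu_ptr().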

Signed-off-by: Julia Cartwright <ju...@ni.com>
---
 include/linux/locallock.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/locallock.h b/include/linux/locallock.h
index d658c2552601..921eab83cd34 100644
--- a/include/linux/locallock.h
+++ b/include/linux/locallock.h
@@ -222,6 +222,14 @@ static inline int __local_unlock_irqrestore(struct 
local_irq_lock *lv,
 
 #define put_locked_var(lvar, var)  local_unlock(lvar);
 
+#define get_locked_ptr(lvar, var)  \
+   ({  \
+   local_lock(lvar);   \
+   this_cpu_ptr(var);  \
+   })
+
+#define put_locked_ptr(lvar, var)  local_unlock(lvar);
+
 #define local_lock_cpu(lvar)   \
({  \
local_lock(lvar);   \
@@ -262,6 +270,8 @@ static inline void local_irq_lock_init(int lvar) { }
 
 #define get_locked_var(lvar, var)  get_cpu_var(var)
 #define put_locked_var(lvar, var)  put_cpu_var(var)
+#define get_locked_ptr(lvar, var)  get_cpu_ptr(var)
+#define put_locked_ptr(lvar, var)  put_cpu_ptr(var)
 
 #define local_lock_cpu(lvar)   get_cpu()
 #define local_unlock_cpu(lvar) put_cpu()
-- 
2.17.0



[PATCH RT 2/2] squashfs: make use of local lock in multi_cpu decompressor

2018-05-07 Thread Julia Cartwright
Currently, the squashfs multi_cpu decompressor makes use of
get_cpu_ptr()/put_cpu_ptr(), which unconditionally disable preemption
during decompression.

Because the workload is distributed across CPUs, all CPUs can observe a
very high wakeup latency, which has been seen to be as much as 8000us.

Convert this decompressor to make use of a local lock, which will allow
execution of the decompressor with preemption-enabled, but also ensure
concurrent accesses to the percpu compressor data on the local CPU will
be serialized.

Reported-by: Alexander Stein <alexander.st...@systec-electronic.com>
Tested-by: Alexander Stein <alexander.st...@systec-electronic.com>
Signed-off-by: Julia Cartwright <ju...@ni.com>
---
 fs/squashfs/decompressor_multi_percpu.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/squashfs/decompressor_multi_percpu.c 
b/fs/squashfs/decompressor_multi_percpu.c
index 23a9c28ad8ea..6a73c4fa88e7 100644
--- a/fs/squashfs/decompressor_multi_percpu.c
+++ b/fs/squashfs/decompressor_multi_percpu.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include <linux/locallock.h>
 
 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
@@ -25,6 +26,8 @@ struct squashfs_stream {
void*stream;
 };
 
+static DEFINE_LOCAL_IRQ_LOCK(stream_lock);
+
 void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
void *comp_opts)
 {
@@ -79,10 +82,15 @@ int squashfs_decompress(struct squashfs_sb_info *msblk, 
struct buffer_head **bh,
 {
struct squashfs_stream __percpu *percpu =
(struct squashfs_stream __percpu *) msblk->stream;
-   struct squashfs_stream *stream = get_cpu_ptr(percpu);
-   int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
-   offset, length, output);
-   put_cpu_ptr(stream);
+   struct squashfs_stream *stream;
+   int res;
+
+   stream = get_locked_ptr(stream_lock, percpu);
+
+   res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
+   offset, length, output);
+
+   put_locked_ptr(stream_lock, stream);
 
if (res < 0)
ERROR("%s decompression failed, data probably corrupt\n",
-- 
2.17.0



[PATCH RT] seqlock: provide the same ordering semantics as mainline

2018-04-26 Thread Julia Cartwright
The mainline implementation of read_seqbegin() orders prior loads w.r.t.
the read-side critical section.  Fixup the RT writer-boosting
implementation to provide the same guarantee.

Also, while we're here, update the usage of ACCESS_ONCE() to use
READ_ONCE().
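
For context, the guarantee matters for the usual read-side loop, roughly
(sketch, placeholder names):

	unsigned seq;

	do {
		seq = read_seqbegin(&sl);
		/* loads of the protected data must not be speculated
		 * ahead of the sequence load above */
		snapshot = protected_data;
	} while (read_seqretry(&sl, seq));

Without the smp_rmb(), the RT implementation would allow those data loads
to be reordered before the sequence read, defeating the retry check.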

Fixes: e69f15cf77c23 ("seqlock: Prevent rt starvation")
Cc: stable...@vger.kernel.org
Signed-off-by: Julia Cartwright <ju...@ni.com>
---
Found during code inspection of the RT seqlock implementation.

   Julia

 include/linux/seqlock.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index a59751276b94..597ce5a9e013 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -453,7 +453,7 @@ static inline unsigned read_seqbegin(seqlock_t *sl)
unsigned ret;
 
 repeat:
-   ret = ACCESS_ONCE(sl->seqcount.sequence);
+   ret = READ_ONCE(sl->seqcount.sequence);
if (unlikely(ret & 1)) {
/*
 * Take the lock and let the writer proceed (i.e. evtl
@@ -462,6 +462,7 @@ static inline unsigned read_seqbegin(seqlock_t *sl)
spin_unlock_wait(&sl->lock);
goto repeat;
}
+   smp_rmb();
return ret;
 }
 #endif
-- 
2.16.1



Re: [LINUX PATCH v8 2/2] memory: pl353: Add driver for arm pl353 static memory controller

2018-04-03 Thread Julia Cartwright
Hello-

On Wed, Mar 14, 2018 at 04:14:34PM +0530, nagasureshkumarre...@gmail.com wrote:
> From: Naga Sureshkumar Relli 

I'm pleased to see this patchset revived and resubmitted!

It would be easier to follow if you constructed your two patchsets with
git format-patch --thread.

Or, merge them into a single patchset, especially considering the
dependency between patches.

Some code review comments below:

> Add driver for arm pl353 static memory controller. This controller is
> used in xilinx zynq soc for interfacing the nand and nor/sram memory
> devices.
> 
> Signed-off-by: Naga Sureshkumar Relli 
> ---
> Changes in v8:
> - None
> Changes in v7:
> - Corrected the kconfig to use tristate selection
> - Corrected the GPL licence ident 
> - Added boundary checks for nand timing parameters 
> Changes in v6:
> - Fixed checkpatch.pl reported warnings
> Changes in v5:
> - Added pl353_smc_get_clkrate function, made pl353_smc_set_cycles as public
>   API
> - Removed nand timing parameter initialization and moved it to nand driver  
> Changes in v4:
> - Modified driver to support multiple instances
> - Used sleep instaed of busywait for delay
> Changes in v3:
> - None
> Changes in v2:
> - Since now the timing parameters are in nano seconds, added logic to convert
>   them to the cycles
> ---
>  drivers/memory/Kconfig  |   7 +
>  drivers/memory/Makefile |   1 +
>  drivers/memory/pl353-smc.c  | 548 
> 
>  include/linux/platform_data/pl353-smc.h |  34 ++
>  4 files changed, 590 insertions(+)
>  create mode 100644 drivers/memory/pl353-smc.c
>  create mode 100644 include/linux/platform_data/pl353-smc.h
> 
> diff --git a/drivers/memory/Kconfig b/drivers/memory/Kconfig
> index 19a0e83..d70d6db 100644
> --- a/drivers/memory/Kconfig
> +++ b/drivers/memory/Kconfig
> @@ -152,6 +152,13 @@ config DA8XX_DDRCTL
> This driver is for the DDR2/mDDR Memory Controller present on
> Texas Instruments da8xx SoCs. It's used to tweak various memory
> controller configuration options.
> +config PL35X_SMC
> + bool "ARM PL35X Static Memory Controller(SMC) driver"

Is there any reason why this can't be tristate?

[..]
> +++ b/drivers/memory/pl353-smc.c
[..]
> +/**
> + * struct pl353_smc_data - Private smc driver structure
> + * @devclk:  Pointer to the peripheral clock
> + * @aperclk: Pointer to the APER clock
> + */
> +struct pl353_smc_data {
> + struct clk  *memclk;
> + struct clk  *aclk;
> +};
> +
> +/* SMC virtual register base */
> +static void __iomem *pl353_smc_base;

While it's true that the Zynq chips only have a single instance of this
IP, there is no real reason why an SoC can't instantiate more than one.
I'm a bit uncomfortable with the fact that this has been designed out.
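
One way out would be to keep the base in the per-device structure rather
than in a file-scope static, e.g. (sketch):

	struct pl353_smc_data {
		struct clk	*memclk;
		struct clk	*aclk;
		void __iomem	*base;	/* per-instance register base */
	};

and have the exported helpers take the pl353_smc_data (or its struct
device) as an argument instead of relying on pl353_smc_base.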

> +
> +/**
> + * pl353_smc_set_buswidth - Set memory buswidth
> + * @bw:  Memory buswidth (8 | 16)
> + * Return: 0 on success or negative errno.
> + */
> +int pl353_smc_set_buswidth(unsigned int bw)
> +{
> +
> + if (bw != PL353_SMC_MEM_WIDTH_8  && bw != PL353_SMC_MEM_WIDTH_16)
> + return -EINVAL;
> +
> + writel(bw, pl353_smc_base + PL353_SMC_SET_OPMODE_OFFS);

There should be no reason not to use the _relaxed() accessor variants.
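
That is (sketch):

	writel_relaxed(bw, pl353_smc_base + PL353_SMC_SET_OPMODE_OFFS);
	writel_relaxed(PL353_SMC_DC_UPT_NAND_REGS,
		       pl353_smc_base + PL353_SMC_DIRECT_CMD_OFFS);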

> + writel(PL353_SMC_DC_UPT_NAND_REGS, pl353_smc_base +
> +PL353_SMC_DIRECT_CMD_OFFS);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(pl353_smc_set_buswidth);
> +
> +/**
> + * pl353_smc_set_cycles - Set memory timing parameters
> + * @t0:  t_rcread cycle time
> + * @t1:  t_wcwrite cycle time
> + * @t2:  t_rea/t_ceoeoutput enable assertion delay
> + * @t3:  t_wpwrite enable deassertion delay
> + * @t4:  t_clr/t_pc  page cycle time
> + * @t5:  t_ar/t_ta   ID read time/turnaround time
> + * @t6:  t_rrbusy to RE timing
> + *
> + * Sets NAND chip specific timing parameters.
> + */
> +static void pl353_smc_set_cycles(u32 t0, u32 t1, u32 t2, u32 t3, u32
> +   t4, u32 t5, u32 t6)
> +{
> + t0 &= PL353_SMC_SET_CYCLES_T0_MASK;
> + t1 = (t1 & PL353_SMC_SET_CYCLES_T1_MASK) <<
> + PL353_SMC_SET_CYCLES_T1_SHIFT;
> + t2 = (t2 & PL353_SMC_SET_CYCLES_T2_MASK) <<
> + PL353_SMC_SET_CYCLES_T2_SHIFT;
> + t3 = (t3 & PL353_SMC_SET_CYCLES_T3_MASK) <<
> + PL353_SMC_SET_CYCLES_T3_SHIFT;
> + t4 = (t4 & PL353_SMC_SET_CYCLES_T4_MASK) <<
> + PL353_SMC_SET_CYCLES_T4_SHIFT;
> + t5 = (t5 & PL353_SMC_SET_CYCLES_T5_MASK) <<
> + PL353_SMC_SET_CYCLES_T5_SHIFT;
> + t6 = (t6 & PL353_SMC_SET_CYCLES_T6_MASK) <<
> + PL353_SMC_SET_CYCLES_T6_SHIFT;
> +
> + t0 |= t1 | t2 | t3 | t4 | t5 | t6;
> +
> + writel(t0, pl353_smc_base + PL353_SMC_SET_CYCLES_OFFS);
> + writel(PL353_SMC_DC_UPT_NAND_REGS, pl353_smc_base +
> +   

Re: [PATCH RT] Defer migrate_enable migration while task state != TASK_RUNNING

2018-03-23 Thread Julia Cartwright
On Fri, Mar 23, 2018 at 01:21:31PM -0400, joe.ko...@concurrent-rt.com wrote:
> On Fri, Mar 23, 2018 at 11:59:21AM -0500, Julia Cartwright wrote:
> > On Fri, Mar 23, 2018 at 11:09:59AM -0400, joe.ko...@concurrent-rt.com wrote:
> > > I see the below kernel splat in 4.9-rt when I run a test program that
> > > continually changes the affinity of some set of running pids:
> > >
> > >do not call blocking ops when !TASK_RUNNING; state=2 set at ...
> > >   ...
> > >   stop_one_cpu+0x60/0x80
> > >   migrate_enable+0x21f/0x3e0
> > >   rt_spin_unlock+0x2f/0x40
> > >   prepare_to_wait+0x5c/0x80
> > >   ...
> >
> > This is clearly a problem.
> >
> > > The reason is that spin_unlock, write_unlock, and read_unlock call
> > > migrate_enable, and since 4.4-rt, migrate_enable will sleep if it 
> > > discovers
> > > that a migration is in order.  But sleeping in the unlock services is not
> > > expected by most kernel developers,
> >
> > I don't buy this, see below:
> >
> > > and where that counts most is in code sequences like the following:
> > >
> > >   set_current_state(TASK_UNINTERRUPIBLE);
> > >   spin_unlock();
> > >   schedule();
> >
> > The analog in mainline is CONFIG_PREEMPT and the implicit
> > preempt_enable() in spin_unlock().  In this configuration, a kernel
> > developer should _absolutely_ expect their task to be suspended (and
> > potentially migrated), _regardless of the task state_ if there is a
> > preemption event on the CPU on which this task is executing.
> >
> > Similarly, on RT, there is nothing _conceptually_ wrong on RT with
> > migrating on migrate_enable(), regardless of task state, if there is a
> > pending migration event.
>
> My understanding is, in standard Linux and in rt, setting
> task state to anything other than TASK_RUNNING in of itself
> blocks preemption.

I'm assuming you're referring to the window of time between a task
setting its state to !TASK_RUNNING and schedule()?  The task remains
preemptible in this window.
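
For reference, the window in the splat is inside prepare_to_wait(), which
(paraphrased) does:

	spin_lock_irqsave(&q->lock, flags);
	if (list_empty(&wait->task_list))
		__add_wait_queue(q, wait);
	set_current_state(state);
	spin_unlock_irqrestore(&q->lock, flags);	/* -> migrate_enable() on RT */

so the migrate_enable() in the unlock path runs with the task already in
TASK_UNINTERRUPTIBLE, which is what trips the blocking-ops check.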

> A preemption is not really needed here as it is expected that there is
> a schedule() written in that will shortly be executed.
>
> And if a 'involuntary schedule' (ie, preemption) were allowed to occur
> between the task state set and the schedule(), that would change the
> task state back to TASK_RUNNING;

This isn't the case.  A preempted task preserves its state.

   Julia


Re: [PATCH RT] Defer migrate_enable migration while task state != TASK_RUNNING

2018-03-23 Thread Julia Cartwright
Hey Joe-

Thanks for the writeup.

On Fri, Mar 23, 2018 at 11:09:59AM -0400, joe.ko...@concurrent-rt.com wrote:
> I see the below kernel splat in 4.9-rt when I run a test program that
> continually changes the affinity of some set of running pids:
> 
>do not call blocking ops when !TASK_RUNNING; state=2 set at ...
>   ...
>   stop_one_cpu+0x60/0x80
>   migrate_enable+0x21f/0x3e0
>   rt_spin_unlock+0x2f/0x40
>   prepare_to_wait+0x5c/0x80
>   ...

This is clearly a problem.

> The reason is that spin_unlock, write_unlock, and read_unlock call
> migrate_enable, and since 4.4-rt, migrate_enable will sleep if it discovers
> that a migration is in order.  But sleeping in the unlock services is not
> expected by most kernel developers,

I don't buy this, see below:

> and where that counts most is in code sequences like the following:
>
>   set_current_state(TASK_UNINTERRUPIBLE);
>   spin_unlock();
>   schedule();

The analog in mainline is CONFIG_PREEMPT and the implicit
preempt_enable() in spin_unlock().  In this configuration, a kernel
developer should _absolutely_ expect their task to be suspended (and
potentially migrated), _regardless of the task state_ if there is a
preemption event on the CPU on which this task is executing.

Similarly, on RT, there is nothing _conceptually_ wrong on RT with
migrating on migrate_enable(), regardless of task state, if there is a
pending migration event.
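
Spelled out, the sequence above behaves the same way under mainline
CONFIG_PREEMPT (sketch):

	set_current_state(TASK_UNINTERRUPTIBLE);
	spin_unlock(&lock);	/* preempt_enable(): may be preempted right here */
	schedule();		/* the voluntary sleep happens only here */

A preemption in that window leaves the task on the runqueue with its state
intact, so nothing is lost; the same should hold for a migration.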

It's clear, however, that the mechanism used here is broken ...

   Julia


Re: [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core

2018-03-06 Thread Julia Cartwright
On Wed, Feb 21, 2018 at 08:15:59AM -0800, Jae Hyun Yoo wrote:
> This commit adds driver implementation for PECI bus into linux
> driver framework.
> 
> Signed-off-by: Jae Hyun Yoo 
> ---
[..]
> +static int peci_locked_xfer(struct peci_adapter *adapter,
> + struct peci_xfer_msg *msg,
> + bool do_retry,
> + bool has_aw_fcs)

_locked generally means that this function is invoked with some critical
lock held, what lock does the caller need to acquire before invoking
this function?

> +{
> + ktime_t start, end;
> + s64 elapsed_ms;
> + int rc = 0;
> +
> + if (!adapter->xfer) {

Is this really an optional feature of an adapter?  If this is not
optional, then this check should be in place when the adapter is
registered, not here.  (And it should WARN_ON(), because it's a driver
developer error).

> + dev_dbg(&adapter->dev, "PECI level transfers not supported\n");
> + return -ENODEV;
> + }
> +
> + if (in_atomic() || irqs_disabled()) {

As Andrew mentioned, this is broken.

You don't even need a might_sleep().  The locking functions you use here
will already include a might_sleep() w/ CONFIG_DEBUG_ATOMIC_SLEEP.

> + rt_mutex_trylock(&adapter->bus_lock);
> + if (!rc)
> + return -EAGAIN; /* PECI activity is ongoing */
> + } else {
> + rt_mutex_lock(&adapter->bus_lock);
> + }
> +
> + if (do_retry)
> + start = ktime_get();
> +
> + do {
> + rc = adapter->xfer(adapter, msg);
> +
> + if (!do_retry)
> + break;
> +
> + /* Per the PECI spec, need to retry commands that return 0x8x */
> + if (!(!rc && ((msg->rx_buf[0] & DEV_PECI_CC_RETRY_ERR_MASK) ==
> +   DEV_PECI_CC_TIMEOUT)))
> + break;

This is pretty difficult to parse.  Can you split it into two different
conditions?
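
One way to split it (equivalent, if I'm reading the original right):

	if (rc)
		break;
	if ((msg->rx_buf[0] & DEV_PECI_CC_RETRY_ERR_MASK) !=
	    DEV_PECI_CC_TIMEOUT)
		break;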

> +
> + /* Set the retry bit to indicate a retry attempt */
> + msg->tx_buf[1] |= DEV_PECI_RETRY_BIT;

Are you sure this bit is to be set in the _second_ byte of tx_buf?

> +
> + /* Recalculate the AW FCS if it has one */
> + if (has_aw_fcs)
> + msg->tx_buf[msg->tx_len - 1] = 0x80 ^
> + peci_aw_fcs((u8 *)msg,
> + 2 + msg->tx_len);
> +
> + /* Retry for at least 250ms before returning an error */
> + end = ktime_get();
> + elapsed_ms = ktime_to_ms(ktime_sub(end, start));
> + if (elapsed_ms >= DEV_PECI_RETRY_TIME_MS) {
> + dev_dbg(&adapter->dev, "Timeout retrying xfer!\n");
> + break;
> + }
> + } while (true);
> +
> + rt_mutex_unlock(&adapter->bus_lock);
> +
> + return rc;
> +}
> +
> +static int peci_xfer(struct peci_adapter *adapter, struct peci_xfer_msg *msg)
> +{
> + return peci_locked_xfer(adapter, msg, false, false);
> +}
> +
> +static int peci_xfer_with_retries(struct peci_adapter *adapter,
> +   struct peci_xfer_msg *msg,
> +   bool has_aw_fcs)
> +{
> + return peci_locked_xfer(adapter, msg, true, has_aw_fcs);
> +}
> +
> +static int peci_scan_cmd_mask(struct peci_adapter *adapter)
> +{
> + struct peci_xfer_msg msg;
> + u32 dib;
> + int rc = 0;
> +
> + /* Update command mask just once */
> + if (adapter->cmd_mask & BIT(PECI_CMD_PING))
> + return 0;
> +
> + msg.addr  = PECI_BASE_ADDR;
> + msg.tx_len= GET_DIB_WR_LEN;
> + msg.rx_len= GET_DIB_RD_LEN;
> + msg.tx_buf[0] = GET_DIB_PECI_CMD;
> +
> + rc = peci_xfer(adapter, &msg);
> + if (rc < 0) {
> + dev_dbg(&adapter->dev, "PECI xfer error, rc : %d\n", rc);
> + return rc;
> + }
> +
> + dib = msg.rx_buf[0] | (msg.rx_buf[1] << 8) |
> +   (msg.rx_buf[2] << 16) | (msg.rx_buf[3] << 24);
> +
> + /* Check special case for Get DIB command */
> + if (dib == 0x00) {
> + dev_dbg(&adapter->dev, "DIB read as 0x00\n");
> + return -1;
> + }
> +
> + if (!rc) {

You should change this to:

if (rc) {
dev_dbg(&adapter->dev, "Error reading DIB, rc : %d\n", rc);
return rc;
}

And then leave the happy path below unindented.

> + /**
> +  * setting up the supporting commands based on minor rev#
> +  * see PECI Spec Table 3-1
> +  */
> + dib = (dib >> 8) & 0xF;
> +
> + if (dib >= 0x1) {
> + adapter->cmd_mask |= BIT(PECI_CMD_RD_PKG_CFG);
> + adapter->cmd_mask |= BIT(PECI_CMD_WR_PKG_CFG);
> + }
> +
> + if (dib >= 0x2)
> + adapter->cmd_mask |= BIT(PECI_CMD_RD_IA_MSR);
> +
> + if (dib >= 0x3) {
> +

Re: C tricks for efficient stack zeroing

2018-03-06 Thread Julia Cartwright
On Fri, Mar 02, 2018 at 08:50:17PM +0100, Jason A. Donenfeld wrote:
[..]
> What would be really nice would be to somehow keep track of the
> maximum stack depth, and just before the function returns, clear from
> the maximum depth to its stack base, all in one single call. This
> would not only make the code faster and less brittle, but it would
> also clean up some algorithms quite a bit.
> 
> Ideally this would take the form of a gcc attribute on the function,
> but I was unable to find anything of that nature. I started looking
> for little C tricks for this, and came up dry too. I realize I could
> probably just take the current stack address and zero out until _the
> very end_ but that seems to overshoot and would probably be bad for
> performance. The best I've been able to do come up with are some
> x86-specific macros, but that approach seems a bit underwhelming.
> Other approaches include adding a new attribute via the gcc plugin
> system, which could make this kind of thing more complete [cc'ing
> pipacs in case he's thought about that before].

Can objtool support a static stack usage analysis?

I'm wondering if it's possible to place these sensitive functions in a
special linker section, like .text.stackzero.<name>; objtool could
collect static call data (as it already does) and stack usage, spitting
out a symbol definition stackzero_<name>_max_depth, which you could then
use to bound your zeroing.
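
Roughly what I have in mind (all names hypothetical):

	#define __stackzero(name) \
		__attribute__((__section__(".text.stackzero." #name)))

	void __stackzero(crypto) sensitive_fn(void)
	{
		/* ... works on secret material held on the stack ... */
	}

	/* symbol objtool would emit after walking the call graph: */
	extern const unsigned long stackzero_crypto_max_depth;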

Obviously this is a static analysis, with the limitations therein.

   Julia


[ANNOUNCE] 4.9.84-rt62

2018-02-27 Thread Julia Cartwright
Hello RT Folks!

I'm pleased to announce the 4.9.84-rt62 stable release.

This release is just an update to the new stable 4.9.84 version
and no RT specific changes have been made.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.9-rt
  Head SHA1: be42cd02846a611af533103a3f4b6a7d8c592f49

Or to build 4.9.84-rt62 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz

  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.9.84.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/4.9/patch-4.9.84-rt62.patch.xz

Enjoy!
Julia



Re: [PATCH 1/2] kernel/sofirq: consolidate common code in __tasklet_schedule() + _hi_

2018-02-15 Thread Julia Cartwright
On Thu, Feb 15, 2018 at 03:07:07PM -0500, Steven Rostedt wrote:
> On Thu, 15 Feb 2018 18:20:41 +0100
> Sebastian Andrzej Siewior  wrote:
> 
> > -void __tasklet_schedule(struct tasklet_struct *t)
> > +static void __tasklet_schedule_common(struct tasklet_struct *t,
> > + struct tasklet_head *head,
> > + unsigned int softirq_nr)
> >  {
> > unsigned long flags;
> >  
> > local_irq_save(flags);
> 
> If you look at the original patch, it did not move local_irq_save()
> into the common function.
> 
> > t->next = NULL;
> > -   *__this_cpu_read(tasklet_vec.tail) = t;
> > -   __this_cpu_write(tasklet_vec.tail, &(t->next));
> > -   raise_softirq_irqoff(TASKLET_SOFTIRQ);
> > +   *head->tail = t;
> > +   head->tail = &(t->next);
> > +   raise_softirq_irqoff(softirq_nr);
> > local_irq_restore(flags);
> >  }
> > +
> > +void __tasklet_schedule(struct tasklet_struct *t)
> > +{
> > +   __tasklet_schedule_common(t, this_cpu_ptr(&tasklet_vec),
> 
> What can happen is, we reference (tasklet_vec) on one CPU, get
> preempted (running in ksoftirqd), scheduled on another CPU, then when
> inside the common code, we are executing on a different CPU than the
> tasklet is for. The rasise_softirq() is happening on the wrong CPU.
> 
> The local_irq_save() can't be moved to the common function. It must be
> done by each individual function.

Well, it can be, but the percpu access needs to go with it; so an
alternative solution would be the below.

I'm also wondering whether the t->next = NULL assignment could be lifted
out from irqs-disabled region.
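
That is, roughly (only if it's actually safe to touch t->next before
taking the lock):

	t->next = NULL;

	local_irq_save(flags);
	head = this_cpu_ptr(headp);
	*head->tail = t;
	head->tail = &(t->next);
	raise_softirq_irqoff(softirq_nr);
	local_irq_restore(flags);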

   Julia

diff --git a/kernel/softirq.c b/kernel/softirq.c
index fa7ed89a9fcf..177de3640c78 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -461,12 +461,14 @@ static DEFINE_PER_CPU(struct tasklet_head, tasklet_vec);
 static DEFINE_PER_CPU(struct tasklet_head, tasklet_hi_vec);
 
 static void __tasklet_schedule_common(struct tasklet_struct *t,
- struct tasklet_head *head,
+ struct tasklet_head __percpu *headp,
  unsigned int softirq_nr)
 {
+   struct tasklet_head *head;
unsigned long flags;
 
local_irq_save(flags);
+   head = this_cpu_ptr(headp);
t->next = NULL;
*head->tail = t;
head->tail = &(t->next);
@@ -476,14 +478,14 @@ static void __tasklet_schedule_common(struct 
tasklet_struct *t,
 
 void __tasklet_schedule(struct tasklet_struct *t)
 {
-   __tasklet_schedule_common(t, this_cpu_ptr(&tasklet_vec),
+   __tasklet_schedule_common(t, &tasklet_vec,
  TASKLET_SOFTIRQ);
 }
 EXPORT_SYMBOL(__tasklet_schedule);
 
 void __tasklet_hi_schedule(struct tasklet_struct *t)
 {
-   __tasklet_schedule_common(t, this_cpu_ptr(&tasklet_hi_vec),
+   __tasklet_schedule_common(t, &tasklet_hi_vec,
  HI_SOFTIRQ);
 }
 EXPORT_SYMBOL(__tasklet_hi_schedule);


[PATCH 4.9.y] ubifs: Massage assert in ubifs_xattr_set() wrt. init_xattrs

2018-02-13 Thread Julia Cartwright
From: Xiaolei Li <xiaolei...@mediatek.com>

This is a conceptual cherry-pick of commit
d8db5b1ca9d4c57e49893d0f78e6d5ce81450cc8 upstream.

The inode is not locked in init_xattrs when creating a new inode.

Without this patch, there will occurs assert when booting or creating
a new file, if the kernel config CONFIG_SECURITY_SMACK is enabled.

Log likes:

UBIFS assert failed in ubifs_xattr_set at 298 (pid 1156)
CPU: 1 PID: 1156 Comm: ldconfig Tainted: G S 4.12.0-rc1-207440-g1e70b02 #2
Hardware name: MediaTek MT2712 evaluation board (DT)
Call trace:
[] dump_backtrace+0x0/0x238
[] show_stack+0x14/0x20
[] dump_stack+0x9c/0xc0
[] ubifs_xattr_set+0x374/0x5e0
[] init_xattrs+0x5c/0xb8
[] security_inode_init_security+0x110/0x190
[] ubifs_init_security+0x30/0x68
[] ubifs_mkdir+0x100/0x200
[] vfs_mkdir+0x11c/0x1b8
[] SyS_mkdirat+0x74/0xd0
[] __sys_trace_return+0x0/0x4

Signed-off-by: Xiaolei Li <xiaolei...@mediatek.com>
Signed-off-by: Richard Weinberger <rich...@nod.at>
Cc: sta...@vger.kernel.org
(julia: massaged to apply to 4.9.y, which doesn't contain fscrypto support)
Signed-off-by: Julia Cartwright <ju...@ni.com>
---
Hey all-

We reproduced the issue fixed upstream by Xiaolei Li's commit in 4.9.y,
with the very similar backtrace:

   UBIFS assert failed in __ubifs_setxattr at 282 (pid 1362)
   CPU: 1 PID: 1362 Comm: sed Not tainted 4.9.47-rt37 #1
   Backtrace:
  (dump_backtrace) from (show_stack+0x20/0x24)
  (show_stack) from (dump_stack+0x80/0xa0)
  (dump_stack) from (__ubifs_setxattr+0x84/0x634)
  (__ubifs_setxattr) from (init_xattrs+0x70/0xac)
  (init_xattrs) from (security_inode_init_security+0x100/0x144)
  (security_inode_init_security) from (ubifs_init_security+0x38/0x6c)
  (ubifs_init_security) from (ubifs_create+0xe8/0x1fc)
  (ubifs_create) from (path_openat+0x97c/0x1090)
  (path_openat) from (do_filp_open+0x48/0x94)
  (do_filp_open) from (do_sys_open+0x134/0x1d8)
  (do_sys_open) from (SyS_open+0x30/0x34)
  (SyS_open) from (ret_fast_syscall+0x0/0x3c)

Please consider applying this to 4.9.y at the very least; it may apply
further back as well.

Thanks!
   Julia

 fs/ubifs/xattr.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/ubifs/xattr.c b/fs/ubifs/xattr.c
index d9f9615bfd71..3979d767a0cb 100644
--- a/fs/ubifs/xattr.c
+++ b/fs/ubifs/xattr.c
@@ -270,7 +270,8 @@ static struct inode *iget_xattr(struct ubifs_info *c, ino_t 
inum)
 }
 
 static int __ubifs_setxattr(struct inode *host, const char *name,
-   const void *value, size_t size, int flags)
+   const void *value, size_t size, int flags,
+   bool check_lock)
 {
struct inode *inode;
struct ubifs_info *c = host->i_sb->s_fs_info;
@@ -279,7 +280,8 @@ static int __ubifs_setxattr(struct inode *host, const char 
*name,
union ubifs_key key;
int err;
 
-   ubifs_assert(inode_is_locked(host));
+   if (check_lock)
+   ubifs_assert(inode_is_locked(host));
 
if (size > UBIFS_MAX_INO_DATA)
return -ERANGE;
@@ -548,7 +550,8 @@ static int init_xattrs(struct inode *inode, const struct 
xattr *xattr_array,
}
strcpy(name, XATTR_SECURITY_PREFIX);
strcpy(name + XATTR_SECURITY_PREFIX_LEN, xattr->name);
-   err = __ubifs_setxattr(inode, name, xattr->value, 
xattr->value_len, 0);
+   err = __ubifs_setxattr(inode, name, xattr->value,
+  xattr->value_len, 0, false);
kfree(name);
if (err < 0)
break;
@@ -594,7 +597,8 @@ static int ubifs_xattr_set(const struct xattr_handler 
*handler,
name = xattr_full_name(handler, name);
 
if (value)
-   return __ubifs_setxattr(inode, name, value, size, flags);
+   return __ubifs_setxattr(inode, name, value, size, flags,
+   true);
else
return __ubifs_removexattr(inode, name);
 }
-- 
2.16.1



Re: [patch v18 1/4] drivers: jtag: Add JTAG core driver

2018-01-29 Thread Julia Cartwright
On Mon, Jan 29, 2018 at 04:31:42PM +0200, Oleksandr Shamray wrote:
> Initial patch for JTAG driver
> JTAG class driver provide infrastructure to support hardware/software
> JTAG platform drivers. It provide user layer API interface for flashing
> and debugging external devices which equipped with JTAG interface
> using standard transactions.
> 
> Driver exposes set of IOCTL to user space for:
> - XFER:
> - SIR (Scan Instruction Register, IEEE 1149.1 Data Register scan);
> - SDR (Scan Data Register, IEEE 1149.1 Instruction Register scan);
> - RUNTEST (Forces the IEEE 1149.1 bus to a run state for a specified
>   number of clocks).
> - SIOCFREQ/GIOCFREQ for setting and reading JTAG frequency.
> 
> Driver core provides set of internal APIs for allocation and
> registration:
> - jtag_register;
> - jtag_unregister;
> - jtag_alloc;
> - jtag_free;
> 
> Platform driver on registration with jtag-core creates the next
> entry in dev folder:
> /dev/jtagX
> 
> Signed-off-by: Oleksandr Shamray 
> Signed-off-by: Jiri Pirko 
> Acked-by: Philippe Ombredanne 
[..]
> diff --git a/drivers/jtag/jtag.c b/drivers/jtag/jtag.c
[..]
> +struct jtag *jtag_alloc(size_t priv_size, const struct jtag_ops *ops)
> +{
> + struct jtag *jtag;
> +
> + jtag = kzalloc(sizeof(*jtag) + priv_size, GFP_KERNEL);
> + if (!jtag)
> + return NULL;
> +
> + if (!ops)
> + return NULL;
> +
> + if (!ops->idle || !ops->mode_set || !ops->status_get || !ops->xfer)
> + return NULL;

Did you think through this?

You leak 'jtag' here and above.

Perform all the ops checks prior to the allocation.
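
Something like this (sketch, untested):

	struct jtag *jtag_alloc(size_t priv_size, const struct jtag_ops *ops)
	{
		struct jtag *jtag;

		if (!ops || !ops->idle || !ops->mode_set ||
		    !ops->status_get || !ops->xfer)
			return NULL;

		jtag = kzalloc(sizeof(*jtag) + priv_size, GFP_KERNEL);
		if (!jtag)
			return NULL;

		/* ... rest of the initialization as before ... */
		return jtag;
	}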

   Julia


Re: [patch v17 1/4] drivers: jtag: Add JTAG core driver

2018-01-17 Thread Julia Cartwright
Hello Oleksandr-

On Tue, Jan 16, 2018 at 09:18:56AM +0200, Oleksandr Shamray wrote:
[..]
> v16->v17
> Comments pointed by Julia Cartwright <jul...@eso.teric.us>

More review feedback below:

[..]
> +++ b/drivers/jtag/jtag.c
[..]
> +static long jtag_ioctl(struct file *file, unsigned int cmd, unsigned long 
> arg)
> +{
> + struct jtag *jtag = file->private_data;
> + struct jtag_run_test_idle idle;
> + struct jtag_xfer xfer;
> + u8 *xfer_data;
> + u32 data_size;
> + u32 value;
> + int err;
> +
> + if (!arg)
> + return -EINVAL;
> +
> + switch (cmd) {
> + case JTAG_GIOCFREQ:
> + if (!jtag->ops->freq_get)
> + err = -EOPNOTSUPP;

Did you mean:

return -EOPNOTSUPP;

?

> +
> + err = jtag->ops->freq_get(jtag, );

Otherwise you're check was worthless, you'll call NULL here.

Also, w.r.t. the set of ops which are required to be implemented: this
isn't the right place to do the check.

Instead, do it in jtag_alloc():

struct jtag *jtag_alloc(size_t priv_size, const struct jtag_ops *ops)
{
struct jtag *jtag;

if (!ops->freq_get || !ops->xfer || ...) /* fixup condition */
return NULL;

jtag = kzalloc(sizeof(*jtag) + priv_size, GFP_KERNEL);
if (!jtag)
return NULL;

jtag->ops = ops;
return jtag;
}
EXPORT_SYMBOL_GPL(jtag_alloc);

[..]
> + case JTAG_IOCXFER:
[..]
> + data_size = DIV_ROUND_UP(xfer.length, BITS_PER_BYTE);
> + xfer_data = memdup_user(u64_to_user_ptr(xfer.tdio), data_size);
> +
> + if (!xfer_data)

memdup_user() doesn't return NULL on error.  You need to check for
IS_ERR(xfer_data).
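
One common way to write it (untested sketch; whether to propagate the
PTR_ERR() value or keep returning -EFAULT is a separate choice):

        xfer_data = memdup_user(u64_to_user_ptr(xfer.tdio), data_size);
        if (IS_ERR(xfer_data))
                return PTR_ERR(xfer_data);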

> + return -EFAULT;
> +
> + err = jtag->ops->xfer(jtag, &xfer, xfer_data);
> + if (err) {
> + kfree(xfer_data);
> + return -EFAULT;
> + }
> +
> + err = copy_to_user(u64_to_user_ptr(xfer.tdio),
> +(void *)(xfer_data), data_size);
> +
> + if (err) {
> + kfree(xfer_data);
> + return -EFAULT;
> + }
> +
> + kfree(xfer_data);

Move the kfree() above the if (err).
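
That is, something along these lines, so the buffer is freed exactly
once on both paths (sketch only):

        err = copy_to_user(u64_to_user_ptr(xfer.tdio), xfer_data, data_size);
        kfree(xfer_data);
        if (err)
                return -EFAULT;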

> + if (copy_to_user((void *)arg, &xfer, sizeof(struct jtag_xfer)))
> + return -EFAULT;
> + break;
> +
> + case JTAG_GIOCSTATUS:
> + if (!jtag->ops->status_get)
> + return -EOPNOTSUPP;
> +
> + err = jtag->ops->status_get(jtag, &value);
> + if (err)
> + break;
> +
> + err = put_user(value, (__u32 *)arg);
> + if (err)
> + err = -EFAULT;

put_user() returns -EFAULT on failure, so this shouldn't be necessary.
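
That is, the follow-up assignment can simply be dropped (sketch):

        err = put_user(value, (__u32 *)arg);
        /* err is already 0 or -EFAULT here, no further mapping needed */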

[..]
> --- /dev/null
> +++ b/include/uapi/linux/jtag.h
[..]
> +/**
> + * struct jtag_xfer - jtag xfer:
> + *
> + * @type: transfer type
> + * @direction: xfer direction
> + * @length: xfer bits len
> + * @tdio : xfer data array
> + * @endir: xfer end state
> + *
> + * Structure represents interface to JTAG device for jtag sdr xfer
> + * execution.
> + */
> +struct jtag_xfer {
> + __u8type;
> + __u8direction;
> + __u8endstate;

Just to be as unambiguous as possible, considering this is ABI, I would
suggest explicitly putting a padding byte here.

> + __u32   length;
> + __u64   tdio;
> +};
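
For example (sketch only; the pad field's name is illustrative):

        struct jtag_xfer {
                __u8    type;
                __u8    direction;
                __u8    endstate;
                __u8    padding;
                __u32   length;
                __u64   tdio;
        };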

Thanks,
   Julia


Re: [ANNOUNCE] 4.9.76-rt61

2018-01-16 Thread Julia Cartwright
On Tue, Jan 16, 2018 at 11:11:32PM +, Bernhard Landauer wrote:
> Ah. Working now after adding in validpgpkeys array in the PKGBUILD.

This was the first v4.9-rt release since Steven handed over his
maintainership, so that would explain why it would be signed by a
different key (although my key has been signed by Steven's).

Thanks,
   Julia

