[tip: core/rcu] rcu: Enable rcu_normal_after_boot unconditionally for RT
The following commit has been merged into the core/rcu branch of tip:

Commit-ID:     36221e109eb20ac111bc3bf3e8d5639aa457c7e0
Gitweb:        https://git.kernel.org/tip/36221e109eb20ac111bc3bf3e8d5639aa457c7e0
Author:        Julia Cartwright
AuthorDate:    Tue, 15 Dec 2020 15:16:47 +01:00
Committer:     Paul E. McKenney
CommitterDate: Mon, 04 Jan 2021 13:43:51 -08:00

rcu: Enable rcu_normal_after_boot unconditionally for RT

Expedited RCU grace periods send IPIs to all non-idle CPUs, and thus can
disrupt time-critical code in real-time applications.  However, there is
a portion of boot-time processing (presumably before any real-time
applications have started) where expedited RCU grace periods are the only
option.

And so it is that experience with the -rt patchset indicates that
PREEMPT_RT systems should always set the rcupdate.rcu_normal_after_boot
kernel boot parameter.  This commit therefore makes the post-boot
application environment safe for real-time applications by making
PREEMPT_RT systems disable the rcupdate.rcu_normal_after_boot kernel boot
parameter and act as if this parameter had been set.  This means that
post-boot calls to synchronize_rcu_expedited() will be treated as if they
were instead calls to synchronize_rcu(), thus preventing the IPIs, and
thus avoiding disrupting real-time applications.

Suggested-by: Luiz Capitulino
Acked-by: Paul E. McKenney
Signed-off-by: Julia Cartwright
Signed-off-by: Sebastian Andrzej Siewior
[ paulmck: Update kernel-parameters.txt accordingly. ]
Signed-off-by: Paul E. McKenney
---
 Documentation/admin-guide/kernel-parameters.txt | 7 +++++++
 kernel/rcu/update.c                             | 4 +++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 521255f..e0008d9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4474,6 +4474,13 @@
 			only normal grace-period primitives.
 			No effect on CONFIG_TINY_RCU kernels.
 
+			But note that CONFIG_PREEMPT_RT=y kernels enable
+			this kernel boot parameter, forcibly setting
+			it to the value one, that is, converting any
+			post-boot attempt at an expedited RCU grace
+			period to instead use normal non-expedited
+			grace-period processing.
+
 	rcupdate.rcu_task_ipi_delay= [KNL]
 			Set time in jiffies during which RCU tasks will
 			avoid sending IPIs, starting with the beginning

diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 39334d2..b95ae86 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -56,8 +56,10 @@
 #ifndef CONFIG_TINY_RCU
 module_param(rcu_expedited, int, 0);
 module_param(rcu_normal, int, 0);
-static int rcu_normal_after_boot;
+static int rcu_normal_after_boot = IS_ENABLED(CONFIG_PREEMPT_RT);
+#ifndef CONFIG_PREEMPT_RT
 module_param(rcu_normal_after_boot, int, 0);
+#endif
 #endif /* #ifndef CONFIG_TINY_RCU */
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
[tip: locking/core] squashfs: Make use of local lock in multi_cpu decompressor
The following commit has been merged into the locking/core branch of tip:

Commit-ID:     fd56200a16c72c7c3ec3e54e06160dfaa5b8dee8
Gitweb:        https://git.kernel.org/tip/fd56200a16c72c7c3ec3e54e06160dfaa5b8dee8
Author:        Julia Cartwright
AuthorDate:    Wed, 27 May 2020 22:11:16 +02:00
Committer:     Ingo Molnar
CommitterDate: Thu, 28 May 2020 10:31:10 +02:00

squashfs: Make use of local lock in multi_cpu decompressor

The squashfs multi CPU decompressor makes use of get_cpu_ptr() to
acquire a pointer to per-CPU data. get_cpu_ptr() implicitly disables
preemption which serializes the access to the per-CPU data.

But decompression can take quite some time depending on the size. The
observed preempt disabled times in real world scenarios went up to 8ms,
causing massive wakeup latencies. This happens on all CPUs as the
decompression is fully parallelized.

Replace the implicit preemption control with an explicit local lock.
This allows RT kernels to substitute it with a real per CPU lock, which
serializes the access but keeps the code section preemptible. On non RT
kernels this maps to preempt_disable() as before, i.e. no functional
change.

[ bigeasy: Use local_lock(), patch description ]

Reported-by: Alexander Stein
Signed-off-by: Julia Cartwright
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Ingo Molnar
Tested-by: Alexander Stein
Acked-by: Peter Zijlstra
Link: https://lore.kernel.org/r/20200527201119.1692513-5-bige...@linutronix.de
---
 fs/squashfs/decompressor_multi_percpu.c | 21 +++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/squashfs/decompressor_multi_percpu.c b/fs/squashfs/decompressor_multi_percpu.c
index 2a2a2d1..e206ebf 100644
--- a/fs/squashfs/decompressor_multi_percpu.c
+++ b/fs/squashfs/decompressor_multi_percpu.c
@@ -8,6 +8,7 @@
 #include <linux/types.h>
 #include <linux/slab.h>
 #include <linux/buffer_head.h>
+#include <linux/local_lock.h>
 
 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
@@ -20,7 +21,8 @@
  */
 
 struct squashfs_stream {
-	void		*stream;
+	void		*stream;
+	local_lock_t	lock;
 };
 
 void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
@@ -41,6 +43,7 @@ void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
 			err = PTR_ERR(stream->stream);
 			goto out;
 		}
+		local_lock_init(&stream->lock);
 	}
 
 	kfree(comp_opts);
@@ -75,12 +78,16 @@ void squashfs_decompressor_destroy(struct squashfs_sb_info *msblk)
 int squashfs_decompress(struct squashfs_sb_info *msblk, struct buffer_head **bh,
 	int b, int offset, int length, struct squashfs_page_actor *output)
 {
-	struct squashfs_stream __percpu *percpu =
-		(struct squashfs_stream __percpu *) msblk->stream;
-	struct squashfs_stream *stream = get_cpu_ptr(percpu);
-	int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
-		offset, length, output);
-	put_cpu_ptr(stream);
+	struct squashfs_stream *stream;
+	int res;
+
+	local_lock(&msblk->stream->lock);
+	stream = this_cpu_ptr(msblk->stream);
+
+	res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
+		offset, length, output);
+
+	local_unlock(&msblk->stream->lock);
 
 	if (res < 0)
 		ERROR("%s decompression failed, data probably corrupt\n",
Re: [patch 10/12] hrtimer: Determine hard/soft expiry mode for hrtimer sleepers on RT
On Fri, Jul 26, 2019 at 08:30:58PM +0200, Thomas Gleixner wrote:
> From: Sebastian Andrzej Siewior
> 
> On PREEMPT_RT enabled kernels hrtimers which are not explicitely marked for
> hard interrupt expiry mode are moved into soft interrupt context either for
> latency reasons or because the hrtimer callback takes regular spinlocks or
> invokes other functions which are not suitable for hard interrupt context
> on PREEMPT_RT.
> 
> The hrtimer_sleeper callback is RT compatible in hard interrupt context,
> but there is a latency concern: Untrusted userspace can spawn many threads
> which arm timers for the same expiry time on the same CPU. On expiry that
> causes a latency spike due to the wakeup of a gazillion threads.
> 
> OTOH, priviledged real-time user space applications rely on the low latency
> of hard interrupt wakeups. These syscall related wakeups are all based on
> hrtimer sleepers.
> 
> If the current task is in a real-time scheduling class, mark the mode for
> hard interrupt expiry.
> 
> [ tglx: Split out of a larger combo patch. Added changelog ]
> 
> Signed-off-by: Sebastian Andrzej Siewior
> Signed-off-by: Thomas Gleixner
> ---
>  kernel/time/hrtimer.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -1662,6 +1662,30 @@ static enum hrtimer_restart hrtimer_wake
>  static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
>  				   clockid_t clock_id, enum hrtimer_mode mode)
>  {
> +	/*
> +	 * On PREEMPT_RT enabled kernels hrtimers which are not explicitely
> +	 * marked for hard interrupt expiry mode are moved into soft
> +	 * interrupt context either for latency reasons or because the
> +	 * hrtimer callback takes regular spinlocks or invokes other
> +	 * functions which are not suitable for hard interrupt context on
> +	 * PREEMPT_RT.
> +	 *
> +	 * The hrtimer_sleeper callback is RT compatible in hard interrupt
> +	 * context, but there is a latency concern: Untrusted userspace can
> +	 * spawn many threads which arm timers for the same expiry time on
> +	 * the same CPU. That causes a latency spike due to the wakeup of
> +	 * a gazillion threads.
> +	 *
> +	 * OTOH, priviledged real-time user space applications rely on the
> +	 * low latency of hard interrupt wakeups. If the current task is in
> +	 * a real-time scheduling class, mark the mode for hard interrupt
> +	 * expiry.
> +	 */
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> +		if (task_is_realtime(current) && !(mode & HRTIMER_MODE_SOFT))
> +			mode |= HRTIMER_MODE_HARD;

Because this ends up sampling the task's scheduling parameters only at the
time of enqueue, it doesn't take into consideration whether or not the task
may be holding a PI lock and later be boosted if contended by an RT thread.

Am I correct in assuming there is an induced inversion here in this case,
because the deferred wakeup mechanism isn't part of the PI chain?

If so, is this just to be an accepted limitation at this point?  Is the
intent to argue this away as bad RT application design? :)

   Julia
Re: [patch V2 1/1] Kconfig: Introduce CONFIG_PREEMPT_RT
On Wed, Jul 17, 2019 at 10:01:49PM +0200, Thomas Gleixner wrote:
> Add a new entry to the preemption menu which enables the real-time support
> for the kernel. The choice is only enabled when an architecture supports
> it.
> 
> It selects PREEMPT as the RT features depend on it. To achieve that the
> existing PREEMPT choice is renamed to PREEMPT_LL which select PREEMPT as
> well.
> 
> No functional change.
> 
> Signed-off-by: Thomas Gleixner
> Acked-by: Paul E. McKenney
> Acked-by: Steven Rostedt (VMware)
> Acked-by: Clark Williams
> Acked-by: Daniel Bristot de Oliveira
> Acked-by: Frederic Weisbecker
> Acked-by: Ingo Molnar
> Acked-by: Peter Zijlstra (Intel)
> Acked-by: Marc Zyngier
> Acked-by: Daniel Wagner
> ---

I'm excited to see where this goes.

Acked-by: Julia Cartwright

   Julia
[tip:sched/core] kthread: Convert worker lock to raw spinlock
Commit-ID:  fe99a4f4d6022ec92f9b52a5528cb9b77513e7d1
Gitweb:     https://git.kernel.org/tip/fe99a4f4d6022ec92f9b52a5528cb9b77513e7d1
Author:     Julia Cartwright
AuthorDate: Tue, 12 Feb 2019 17:25:53 +0100
Committer:  Thomas Gleixner
CommitDate: Thu, 28 Feb 2019 11:18:38 +0100

kthread: Convert worker lock to raw spinlock

In order to enable the queuing of kthread work items from hardirq context
even when PREEMPT_RT_FULL is enabled, convert the worker spin_lock to a
raw_spin_lock.

This is only acceptable to do because the work performed under the lock is
well-bounded and minimal.

Reported-by: Steffen Trumtrar
Reported-by: Tim Sander
Signed-off-by: Julia Cartwright
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Tested-by: Steffen Trumtrar
Reviewed-by: Petr Mladek
Cc: Guenter Roeck
Link: https://lkml.kernel.org/r/20190212162554.19779-1-bige...@linutronix.de
---
 include/linux/kthread.h |  4 ++--
 kernel/kthread.c        | 42 +++++++++++++++++++++---------------------
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index c1961761311d..6b8c064f0cbc 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -85,7 +85,7 @@ enum {
 
 struct kthread_worker {
 	unsigned int		flags;
-	spinlock_t		lock;
+	raw_spinlock_t		lock;
 	struct list_head	work_list;
 	struct list_head	delayed_work_list;
 	struct task_struct	*task;
@@ -106,7 +106,7 @@ struct kthread_delayed_work {
 };
 
 #define KTHREAD_WORKER_INIT(worker)	{				\
-	.lock = __SPIN_LOCK_UNLOCKED((worker).lock),			\
+	.lock = __RAW_SPIN_LOCK_UNLOCKED((worker).lock),		\
 	.work_list = LIST_HEAD_INIT((worker).work_list),		\
 	.delayed_work_list = LIST_HEAD_INIT((worker).delayed_work_list),\
 	}
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 087d18d771b5..5641b55783a6 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -599,7 +599,7 @@ void __kthread_init_worker(struct kthread_worker *worker,
 				struct lock_class_key *key)
 {
 	memset(worker, 0, sizeof(struct kthread_worker));
-	spin_lock_init(&worker->lock);
+	raw_spin_lock_init(&worker->lock);
 	lockdep_set_class_and_name(&worker->lock, key, name);
 	INIT_LIST_HEAD(&worker->work_list);
 	INIT_LIST_HEAD(&worker->delayed_work_list);
@@ -641,21 +641,21 @@ repeat:
 
 	if (kthread_should_stop()) {
 		__set_current_state(TASK_RUNNING);
-		spin_lock_irq(&worker->lock);
+		raw_spin_lock_irq(&worker->lock);
 		worker->task = NULL;
-		spin_unlock_irq(&worker->lock);
+		raw_spin_unlock_irq(&worker->lock);
 		return 0;
 	}
 
 	work = NULL;
-	spin_lock_irq(&worker->lock);
+	raw_spin_lock_irq(&worker->lock);
 	if (!list_empty(&worker->work_list)) {
 		work = list_first_entry(&worker->work_list,
 					struct kthread_work, node);
 		list_del_init(&work->node);
 	}
 	worker->current_work = work;
-	spin_unlock_irq(&worker->lock);
+	raw_spin_unlock_irq(&worker->lock);
 
 	if (work) {
 		__set_current_state(TASK_RUNNING);
@@ -812,12 +812,12 @@ bool kthread_queue_work(struct kthread_worker *worker,
 	bool ret = false;
 	unsigned long flags;
 
-	spin_lock_irqsave(&worker->lock, flags);
+	raw_spin_lock_irqsave(&worker->lock, flags);
 	if (!queuing_blocked(worker, work)) {
 		kthread_insert_work(worker, work, &worker->work_list);
 		ret = true;
 	}
-	spin_unlock_irqrestore(&worker->lock, flags);
+	raw_spin_unlock_irqrestore(&worker->lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(kthread_queue_work);
@@ -843,7 +843,7 @@ void kthread_delayed_work_timer_fn(struct timer_list *t)
 	if (WARN_ON_ONCE(!worker))
 		return;
 
-	spin_lock(&worker->lock);
+	raw_spin_lock(&worker->lock);
 	/* Work must not be used with >1 worker, see kthread_queue_work(). */
 	WARN_ON_ONCE(work->worker != worker);
 
@@ -852,7 +852,7 @@ void kthread_delayed_work_timer_fn(struct timer_list *t)
 	list_del_init(&work->node);
 	kthread_insert_work(worker, work, &worker->work_list);
 
-	spin_unlock(&worker->lock);
+	raw_spin_unlock(&worker->lock);
 }
 EXPORT_SYMBOL(kthread_delayed_work_timer_fn);
 
@@ -908,14 +908,14 @@ bool kthread_queue_delayed_work(struct kthread_worker *worker,
 	unsigned long flags;
 	bool ret = false;
 
-	spin_lock_irqsave(&worker->lock, flags);
+	raw_spin_lock_irqsave(&worker->lock, flags);
 
 	if (!queuing_blocked(worker, work)) {
 		__kthread_queue_delayed_work(worker, dwork, delay);
 		ret = true;
 	}
 
-	spin_unlock_irqrestore(&worker->lock, flags);
Re: [PATCH] iommu/dmar: fix buffer overflow during PCI bus notification
On Wed, Feb 20, 2019 at 10:46:31AM -0600, Julia Cartwright wrote:
> Commit 57384592c433 ("iommu/vt-d: Store bus information in RMRR PCI
> device path") changed the type of the path data, however, the change in
> path type was not reflected in size calculations. Update to use the
> correct type and prevent a buffer overflow.
> 
> This bug manifests in systems with deep PCI hierarchies, and can lead to
> an overflow of the static allocated buffer (dmar_pci_notify_info_buf),
> or can lead to overflow of slab-allocated data.
> 
>    BUG: KASAN: global-out-of-bounds in dmar_alloc_pci_notify_info+0x1d5/0x2e0
>    Write of size 1 at addr 90445d80 by task swapper/0/1
>    CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W 4.14.87-rt49-02406-gd0a0e96 #1
>    Call Trace:
>     ? dump_stack+0x46/0x59
>     ? print_address_description+0x1df/0x290
>    [..]
>    The buggy address belongs to the variable:
>     dmar_pci_notify_info_buf+0x40/0x60
> 
> Fixes: 57384592c433 ("iommu/vt-d: Store bus information in RMRR PCI device path")
> Signed-off-by: Julia Cartwright
> ---
>  drivers/iommu/dmar.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index 6b7df25e1488..9c49300e9fb7 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -145,7 +145,7 @@ dmar_alloc_pci_notify_info(struct pci_dev *dev, unsigned long event)
>  	for (tmp = dev; tmp; tmp = tmp->bus->self)
>  		level++;
>  
> -	size = sizeof(*info) + level * sizeof(struct acpi_dmar_pci_path);
> +	size = sizeof(*info) + level * sizeof(info->path[0]);

This is probably a candidate for struct_size() instead, if that's what is
preferred.

   Julia
[PATCH] iommu/dmar: fix buffer overflow during PCI bus notification
Commit 57384592c433 ("iommu/vt-d: Store bus information in RMRR PCI
device path") changed the type of the path data, however, the change in
path type was not reflected in size calculations. Update to use the
correct type and prevent a buffer overflow.

This bug manifests in systems with deep PCI hierarchies, and can lead to
an overflow of the static allocated buffer (dmar_pci_notify_info_buf),
or can lead to overflow of slab-allocated data.

   BUG: KASAN: global-out-of-bounds in dmar_alloc_pci_notify_info+0x1d5/0x2e0
   Write of size 1 at addr 90445d80 by task swapper/0/1
   CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W 4.14.87-rt49-02406-gd0a0e96 #1
   Call Trace:
    ? dump_stack+0x46/0x59
    ? print_address_description+0x1df/0x290
    ? dmar_alloc_pci_notify_info+0x1d5/0x2e0
    ? kasan_report+0x256/0x340
    ? dmar_alloc_pci_notify_info+0x1d5/0x2e0
    ? e820__memblock_setup+0xb0/0xb0
    ? dmar_dev_scope_init+0x424/0x48f
    ? __down_write_common+0x1ec/0x230
    ? dmar_dev_scope_init+0x48f/0x48f
    ? dmar_free_unused_resources+0x109/0x109
    ? cpumask_next+0x16/0x20
    ? __kmem_cache_create+0x392/0x430
    ? kmem_cache_create+0x135/0x2f0
    ? e820__memblock_setup+0xb0/0xb0
    ? intel_iommu_init+0x170/0x1848
    ? _raw_spin_unlock_irqrestore+0x32/0x60
    ? migrate_enable+0x27a/0x5b0
    ? sched_setattr+0x20/0x20
    ? migrate_disable+0x1fc/0x380
    ? task_rq_lock+0x170/0x170
    ? try_to_run_init_process+0x40/0x40
    ? locks_remove_file+0x85/0x2f0
    ? dev_prepare_static_identity_mapping+0x78/0x78
    ? rt_spin_unlock+0x39/0x50
    ? lockref_put_or_lock+0x2a/0x40
    ? dput+0x128/0x2f0
    ? __rcu_read_unlock+0x66/0x80
    ? __fput+0x250/0x300
    ? __rcu_read_lock+0x1b/0x30
    ? mntput_no_expire+0x38/0x290
    ? e820__memblock_setup+0xb0/0xb0
    ? pci_iommu_init+0x25/0x63
    ? pci_iommu_init+0x25/0x63
    ? do_one_initcall+0x7e/0x1c0
    ? initcall_blacklisted+0x120/0x120
    ? kernel_init_freeable+0x27b/0x307
    ? rest_init+0xd0/0xd0
    ? kernel_init+0xf/0x120
    ? rest_init+0xd0/0xd0
    ? ret_from_fork+0x1f/0x40

   The buggy address belongs to the variable:
    dmar_pci_notify_info_buf+0x40/0x60

Fixes: 57384592c433 ("iommu/vt-d: Store bus information in RMRR PCI device path")
Signed-off-by: Julia Cartwright
---
 drivers/iommu/dmar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 6b7df25e1488..9c49300e9fb7 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -145,7 +145,7 @@ dmar_alloc_pci_notify_info(struct pci_dev *dev, unsigned long event)
 	for (tmp = dev; tmp; tmp = tmp->bus->self)
 		level++;
 
-	size = sizeof(*info) + level * sizeof(struct acpi_dmar_pci_path);
+	size = sizeof(*info) + level * sizeof(info->path[0]);
 	if (size <= sizeof(dmar_pci_notify_info_buf)) {
 		info = (struct dmar_pci_notify_info *)dmar_pci_notify_info_buf;
 	} else {
-- 
2.20.1
Re: [RFC PATCH] arm64/fpsimd: Don't disable softirq when touching FPSIMD/SVE state
Hello Julien-

On Fri, Feb 08, 2019 at 04:55:13PM +0000, Julien Grall wrote:
> When the kernel is compiled with CONFIG_KERNEL_MODE_NEON, some part of
> the kernel may be able to use FPSIMD/SVE. This is for instance the case
> for crypto code.
> 
> Any use of FPSIMD/SVE in the kernel are clearly marked by using the
> function kernel_neon_{begin, end}. Furthermore, this can only be used
> when may_use_simd() returns true.
> 
> The current implementation of may_use_simd() allows softirq to use
> FPSIMD/SVE unless it is currently in used (i.e kernel_neon_busy is true).
> When in used, softirqs usually fallback to a software method.
> 
> At the moment, as a softirq may use FPSIMD/SVE, softirqs are disabled
> when touching the FPSIMD/SVE context. This has the drawback to disable
> all softirqs even if they are not using FPSIMD/SVE.
> 
> As a softirq should not rely on been able to use simd at a given time,
> there are limited reason to keep softirq disabled when touching the
> FPSIMD/SVE context. Instead, we can only disable preemption and tell
> the NEON unit is currently in use.
> 
> This patch introduces two new helpers kernel_neon_{disable, enable} to
> mark the area using FPSIMD/SVE context and use them in replacement of
> local_bh_{disable, enable}. The functions kernel_neon_{begin, end} are
> also re-implemented to use the new helpers.
> 
> Signed-off-by: Julien Grall
> 
> ---
> 
> I have been exploring this solution as an alternative approach to the RT
> patch "arm64: fpsimd: use preemp_disable in addition to local_bh_disable()".
> 
> So far, the patch has only been lightly tested.
> 
> For RT-linux, it might be possible to use migrate_{enable, disable}. I
> am quite new with RT and have some trouble to understand the semantics
> of migrate_{enable, disable}. So far, I am still unsure if it is possible
> to run another userspace task on the same CPU while getting preempted
> when the migration is disabled.

Yes, a thread in a migrate_disable()-protected critical section can be
preempted, and another thread on the same CPU may enter the critical
section.

If it's necessary to prevent this from happening, then you need to also
make use of a per-CPU mutex.  On RT, this would do the right thing
w.r.t. priority inheritance.

   Julia
Re: Re: [PATCH v3 1/3] KVM: arm/arm64: vgic: Make vgic_irq->irq_lock a raw_spinlock
On Fri, Feb 01, 2019 at 03:30:58PM +0000, Julien Grall wrote:
> Hi Julien,
> 
> On 07/01/2019 15:06, Julien Thierry wrote:
> > vgic_irq->irq_lock must always be taken with interrupts disabled as
> > it is used in interrupt context.
> 
> I am a bit confused with the reason here. The code mention that ap_list_lock
> could be taken from the timer interrupt handler interrupt. I assume it
> speaks about the handler kvm_arch_timer_handler. Looking at the
> configuration of the interrupt, the flag IRQF_NO_THREAD is not set, so the
> interrupt should be threaded when CONFIG_PREEMPT_FULL is set. If my
> understanding is correct, this means the interrupt thread would sleep if it
> takes the spinlock.
> 
> Did I miss anything? Do you have an exact path where the vGIC is actually
> called from an interrupt context?

The part you're missing is that percpu interrupts are not force threaded:

	static int irq_setup_forced_threading(struct irqaction *new)
	{
		if (!force_irqthreads)
			return 0;
		if (new->flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT))
			return 0;
		/* ...*/
	}

   Julia
[ANNOUNCE] 4.9.146-rt125
Hello RT Folks!

I'm pleased to announce the 4.9.146-rt125 stable release.

Apologies for an update to the 4.9-rt tree being way overdue.  This
release is just an update to the new stable 4.9.146 version and no RT
specific changes have been made.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.9-rt
  Head SHA1: 5b5e350d8c82f5dcbad136aeede6b73d41cd1e8a

Or to build 4.9.146-rt125 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz
  http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.9.146.xz
  http://www.kernel.org/pub/linux/kernel/projects/rt/4.9/patch-4.9.146-rt125.patch.xz

Enjoy, and happy holidays,
Julia
[PATCH 0/2] Fix watchdogd wakeup deferral on RT
The following two patches solve an issue reported by Steffen Trumtrar and
Tim Sander to the linux-rt-users mailing list.  Namely, the wakeup of the
watchdogd thread being starved by an RT task due to the hrtimer deferral
through ktimersoftd (on PREEMPT_RT_FULL).

The first patch adjusts the kthread_worker locking to make use of a raw
spinlock, making it suitable for work item queueing from hardirq context
on PREEMPT_RT.  This patch can be applied directly to mainline without
any functional change.

The second patch adjusts the hrtimer used by the watchdog core to be a
_HARD timer (and thus not deferred through ktimersoftd w/ PREEMPT_RT).
This patch depends on hrtimer patches carried in the RT patch, and so
should therefore land there.

Cc: Guenter Roeck
Cc: Steffen Trumtrar
Cc: Tim Sander

Julia Cartwright (2):
  kthread: convert worker lock to raw spinlock
  watchdog, rt: prevent deferral of watchdogd wakeup

 drivers/watchdog/watchdog_dev.c |  2 +-
 include/linux/kthread.h         |  2 +-
 kernel/kthread.c                | 42 ++++++++++++++++++++---------------------
 3 files changed, 23 insertions(+), 23 deletions(-)

-- 
2.18.0
[PATCH 1/2] kthread: convert worker lock to raw spinlock
In order to enable the queuing of kthread work items from hardirq context
even when PREEMPT_RT_FULL is enabled, convert the worker spin_lock to a
raw_spin_lock.

This is only acceptable to do because the work performed under the lock is
well-bounded and minimal.

Cc: Sebastian Andrzej Siewior
Cc: Guenter Roeck
Reported-and-tested-by: Steffen Trumtrar
Reported-by: Tim Sander
Signed-off-by: Julia Cartwright
---
 include/linux/kthread.h |  2 +-
 kernel/kthread.c        | 42 +++++++++++++++++++++---------------------
 2 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index c1961761311d..ad292898f7f2 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -85,7 +85,7 @@ enum {
 
 struct kthread_worker {
 	unsigned int		flags;
-	spinlock_t		lock;
+	raw_spinlock_t		lock;
 	struct list_head	work_list;
 	struct list_head	delayed_work_list;
 	struct task_struct	*task;
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 486dedbd9af5..c1d9ee6671c6 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -597,7 +597,7 @@ void __kthread_init_worker(struct kthread_worker *worker,
 				struct lock_class_key *key)
 {
 	memset(worker, 0, sizeof(struct kthread_worker));
-	spin_lock_init(&worker->lock);
+	raw_spin_lock_init(&worker->lock);
 	lockdep_set_class_and_name(&worker->lock, key, name);
 	INIT_LIST_HEAD(&worker->work_list);
 	INIT_LIST_HEAD(&worker->delayed_work_list);
@@ -639,21 +639,21 @@ int kthread_worker_fn(void *worker_ptr)
 
 	if (kthread_should_stop()) {
 		__set_current_state(TASK_RUNNING);
-		spin_lock_irq(&worker->lock);
+		raw_spin_lock_irq(&worker->lock);
 		worker->task = NULL;
-		spin_unlock_irq(&worker->lock);
+		raw_spin_unlock_irq(&worker->lock);
 		return 0;
 	}
 
 	work = NULL;
-	spin_lock_irq(&worker->lock);
+	raw_spin_lock_irq(&worker->lock);
 	if (!list_empty(&worker->work_list)) {
 		work = list_first_entry(&worker->work_list,
 					struct kthread_work, node);
 		list_del_init(&work->node);
 	}
 	worker->current_work = work;
-	spin_unlock_irq(&worker->lock);
+	raw_spin_unlock_irq(&worker->lock);
 
 	if (work) {
 		__set_current_state(TASK_RUNNING);
@@ -810,12 +810,12 @@ bool kthread_queue_work(struct kthread_worker *worker,
 	bool ret = false;
 	unsigned long flags;
 
-	spin_lock_irqsave(&worker->lock, flags);
+	raw_spin_lock_irqsave(&worker->lock, flags);
 	if (!queuing_blocked(worker, work)) {
 		kthread_insert_work(worker, work, &worker->work_list);
 		ret = true;
 	}
-	spin_unlock_irqrestore(&worker->lock, flags);
+	raw_spin_unlock_irqrestore(&worker->lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(kthread_queue_work);
@@ -841,7 +841,7 @@ void kthread_delayed_work_timer_fn(struct timer_list *t)
 	if (WARN_ON_ONCE(!worker))
 		return;
 
-	spin_lock(&worker->lock);
+	raw_spin_lock(&worker->lock);
 	/* Work must not be used with >1 worker, see kthread_queue_work(). */
 	WARN_ON_ONCE(work->worker != worker);
 
@@ -850,7 +850,7 @@ void kthread_delayed_work_timer_fn(struct timer_list *t)
 	list_del_init(&work->node);
 	kthread_insert_work(worker, work, &worker->work_list);
 
-	spin_unlock(&worker->lock);
+	raw_spin_unlock(&worker->lock);
 }
 EXPORT_SYMBOL(kthread_delayed_work_timer_fn);
 
@@ -906,14 +906,14 @@ bool kthread_queue_delayed_work(struct kthread_worker *worker,
 	unsigned long flags;
 	bool ret = false;
 
-	spin_lock_irqsave(&worker->lock, flags);
+	raw_spin_lock_irqsave(&worker->lock, flags);
 
 	if (!queuing_blocked(worker, work)) {
 		__kthread_queue_delayed_work(worker, dwork, delay);
 		ret = true;
 	}
 
-	spin_unlock_irqrestore(&worker->lock, flags);
+	raw_spin_unlock_irqrestore(&worker->lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(kthread_queue_delayed_work);
@@ -949,7 +949,7 @@ void kthread_flush_work(struct kthread_work *work)
 	if (!worker)
 		return;
 
-	spin_lock_irq(&worker->lock);
+	raw_spin_lock_irq(&worker->lock);
 	/* Work must not be used with >1 worker, see kthread_queue_work(). */
 	WARN_ON_ONCE(work->worker != worker);
 
@@ -961,7 +961,7 @@ void kthread_flush_work(struct kthread_work *work)
 	else
 		noop = true;
 
-	spin_unlock_irq(&worker->lock);
+	raw_spin_unlock_irq(&worker->lock);
 
 	if (!noop)
 		wait_for_completion(&fwork.done);
@@ -994,9 +994,9 @@ static bool __kthread_cancel_work(struct kthread_work *work, bool is_dwork,
 	 * any queuing is blocked by setting the canceling counter.
 	 */
 	work->canceling++;
[PATCH RT 2/2] watchdog, rt: prevent deferral of watchdogd wakeup
When PREEMPT_RT_FULL is enabled, all hrtimer expiry functions are deferred for execution into the context of ktimersoftd unless otherwise annotated. Deferring the expiry of the hrtimer used by the watchdog core, however, is a waste, as the callback does nothing but queue a kthread work item and wake up watchdogd. It's worse than that, too: the deferral through ktimersoftd also means that for correct behavior a user must adjust the scheduling parameters of both watchdogd _and_ ktimersoftd, which is unnecessary and has other side effects (like causing unrelated expiry functions to execute at potentially elevated priority). Instead, mark the hrtimer used by the watchdog core as being _HARD to allow its execution directly from hardirq context. The work done in this expiry function is well-bounded and minimal. A user still must adjust the scheduling parameters of watchdogd to be correct w.r.t. their application needs. Cc: Guenter Roeck Reported-and-tested-by: Steffen Trumtrar Reported-by: Tim Sander Signed-off-by: Julia Cartwright --- drivers/watchdog/watchdog_dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c index ffbdc4642ea5..9c2b3e5cebdc 100644 --- a/drivers/watchdog/watchdog_dev.c +++ b/drivers/watchdog/watchdog_dev.c @@ -945,7 +945,7 @@ static int watchdog_cdev_register(struct watchdog_device *wdd, dev_t devno) return -ENODEV; kthread_init_work(&wd_data->work, watchdog_ping_work); - hrtimer_init(&wd_data->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); + hrtimer_init(&wd_data->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD); wd_data->timer.function = watchdog_timer_expired; if (wdd->id == 0) { -- 2.18.0
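The fix itself is one flag at init time. As a hedged, kernel-style sketch of the idiom (not buildable standalone; `my_timer_fn` and the interval are hypothetical), a driver that wants its expiry to run from hardirq context even on RT would write:

```c
/* Illustrative only: an hrtimer marked _HARD at init time expires in
 * hardirq context even on PREEMPT_RT_FULL, so its callback must stay
 * short and non-sleeping -- e.g. queue kthread work and return. */
static struct hrtimer my_timer;

static enum hrtimer_restart my_timer_fn(struct hrtimer *t)
{
        /* bounded work only, e.g. kthread_queue_work() + wakeup */
        return HRTIMER_NORESTART;
}

static void my_timer_setup(void)
{
        hrtimer_init(&my_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
        my_timer.function = my_timer_fn;
        hrtimer_start(&my_timer, ms_to_ktime(100), HRTIMER_MODE_REL_HARD);
}
```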
Re: [ANNOUNCE] Submit a topic for the RT Microconference at Linux Plumbers
Hello all- On Tue, Sep 04, 2018 at 01:25:29PM -0400, Steven Rostedt wrote: > Hi RT folks! > > The call for proposals (CfP) is now open for the RT Microconference at > Linux Plumbers in Vancouver, Canada. The topics we are looking at this > year are: > > - How will PREEMPT_RT be maintained (is it a subsystem?) > > - Who will maintain it? > > - How do we teach the rest of the kernel developers how not to break > PREEMPT_RT? > > - How do we catch when it breaks? > > - What tools can we add into tools/ that other kernel developers can use to > test and learn about PREEMPT_RT? > > - What tests can we add into tools/testing/selftests? > > - What kernel boot selftests can be added? > > - How can we modify lockdep to catch problems that can break PREEMPT_RT? > > - Discuss various types of failures that can happen with PREEMPT_RT that > normally would not happen in the vanilla kernel. > > - How do we handle stable backports? > > - Interaction with Safety Critical domains? > > - Interrupt threads are RT and are not protected by the RT Throttling. >How can we prevent interrupt thread starvation from a rogue RT task? > > - How to determine if a kernel is booted with PREEMPT_RT enabled? > > - librtpi: "solving" long-standing glibc pthread condvar internal lock > inversion with a new implementation; motivations, technical details, etc. > > - SCHED_DEADLINE - Lots of research has been made to various issues like CPU > affinity, Bandwidth inheritance, cgroup support; but nothing has been done to > the kernel. Let's make a commitment and push these ideas forward into > mainline. > > - How can we modify lockdep to catch problems that can break PREEMPT_RT? > > - Discuss various types of failures that can happen with PREEMPT_RT that > normally would not happen in the vanilla kernel. There was some confusion as to whether or not a topic being listed above automatically meant that it was to be included in the RT Microconference agenda. The answer is: NO. 
If you would like to see a specific topic discussed, either from this list or otherwise, you must submit it via the portal listed below. Only topics submitted to the portal will be considered for the RT Microconference agenda. Note also that submitting a topic means you will be responsible for introducing the necessary context to the group to facilitate further discussion. > If you want to lead one of the above sessions, or if you have a new > session you want to submit, please do so here: > > https://linuxplumbersconf.org/event/2/abstracts/ > > Set up an account, and then click "Submit new proposal". > > Fill out the "Title", "Content", select the "Authors" as the one leading > the session (yourself), and then for "Track" select "RT MC Topics CfP". > Select "Yes" to agree with the anti-harassment policy, and then hit > Submit. Thanks, and see you there! Julia
[PATCH RT 11/22] mm/slub: close possible memory-leak in kmem_cache_alloc_bulk()
From: Sebastian Andrzej Siewior 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- Under certain circumstances we could leak elements which were moved to the local "to_free" list. The damage is limited since I can't find any users here. Cc: stable...@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior (cherry picked from commit 5022166d3b225bf5e343efb3ea01b3c5a41d69ba) Signed-off-by: Julia Cartwright --- mm/slub.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/slub.c b/mm/slub.c index 67eb368b9314..738b2bccbd5f 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3209,6 +3209,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, return i; error: local_irq_enable(); + free_delayed(&to_free); slab_post_alloc_hook(s, flags, i, p); __kmem_cache_free_bulk(s, i, p); return 0; -- 2.18.0
[ANNOUNCE] 4.9.115-rt93
Hello RT Folks! I'm pleased to announce the 4.9.115-rt93 stable release. This release is just an update to the new stable 4.9.115 version and no RT specific changes have been made. You can get this release via the git tree at: git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git branch: v4.9-rt Head SHA1: 236d20fd941a444577579ba398ba9907f3b33277 Or to build 4.9.115-rt93 directly, the following patches should be applied: http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.9.115.xz http://www.kernel.org/pub/linux/kernel/projects/rt/4.9/patch-4.9.115-rt93.patch.xz Enjoy! Julia
[PATCH RT 10/22] arm*: disable NEON in kernel mode
From: Sebastian Andrzej Siewior 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- NEON in kernel mode is used by the crypto algorithms and raid6 code. While the raid6 code looks okay, the crypto algorithms do not: NEON is enabled on first invocation and may allocate/free/map memory before the NEON mode is disabled again. This needs to be changed until it can be enabled. On ARM, NEON in kernel mode can simply be disabled. On ARM64 it needs to stay on due to possible EFI callbacks, so here I disable each algorithm. Cc: stable...@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior (cherry picked from commit b3a776555e0d465df138d254d6dc3ac1b718ac6d) Signed-off-by: Julia Cartwright --- arch/arm/Kconfig | 2 +- arch/arm64/crypto/Kconfig | 14 +++--- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 5715844e83e3..8c40f7d73251 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -2158,7 +2158,7 @@ config NEON config KERNEL_MODE_NEON bool "Support for NEON in kernel mode" - depends on NEON && AEABI + depends on NEON && AEABI && !PREEMPT_RT_BASE help Say Y to include support for NEON in kernel mode. 
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig index 2cf32e9887e1..cd71b3432720 100644 --- a/arch/arm64/crypto/Kconfig +++ b/arch/arm64/crypto/Kconfig @@ -10,41 +10,41 @@ if ARM64_CRYPTO config CRYPTO_SHA1_ARM64_CE tristate "SHA-1 digest algorithm (ARMv8 Crypto Extensions)" - depends on ARM64 && KERNEL_MODE_NEON + depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE select CRYPTO_HASH config CRYPTO_SHA2_ARM64_CE tristate "SHA-224/SHA-256 digest algorithm (ARMv8 Crypto Extensions)" - depends on ARM64 && KERNEL_MODE_NEON + depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE select CRYPTO_HASH config CRYPTO_GHASH_ARM64_CE tristate "GHASH (for GCM chaining mode) using ARMv8 Crypto Extensions" - depends on ARM64 && KERNEL_MODE_NEON + depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE select CRYPTO_HASH config CRYPTO_AES_ARM64_CE tristate "AES core cipher using ARMv8 Crypto Extensions" - depends on ARM64 && KERNEL_MODE_NEON + depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE select CRYPTO_ALGAPI config CRYPTO_AES_ARM64_CE_CCM tristate "AES in CCM mode using ARMv8 Crypto Extensions" - depends on ARM64 && KERNEL_MODE_NEON + depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE select CRYPTO_ALGAPI select CRYPTO_AES_ARM64_CE select CRYPTO_AEAD config CRYPTO_AES_ARM64_CE_BLK tristate "AES in ECB/CBC/CTR/XTS modes using ARMv8 Crypto Extensions" - depends on ARM64 && KERNEL_MODE_NEON + depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE select CRYPTO_BLKCIPHER select CRYPTO_AES_ARM64_CE select CRYPTO_ABLK_HELPER config CRYPTO_AES_ARM64_NEON_BLK tristate "AES in ECB/CBC/CTR/XTS modes using NEON instructions" - depends on ARM64 && KERNEL_MODE_NEON + depends on ARM64 && KERNEL_MODE_NEON && !PREEMPT_RT_BASE select CRYPTO_BLKCIPHER select CRYPTO_AES select CRYPTO_ABLK_HELPER -- 2.18.0
[PATCH RT 17/22] alarmtimer: Prevent live lock in alarm_cancel()
From: Sebastian Andrzej Siewior 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- If alarm_try_to_cancel() requires a retry, then depending on the priority setting the retry loop might prevent timer callback completion on RT. Prevent that by waiting for completion on RT, no change for a non RT kernel. Cc: stable...@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior (cherry picked from commit 51e376c469bf05f32cb1ceb9e39d31bb92f1f6c8) Signed-off-by: Julia Cartwright --- kernel/time/alarmtimer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c index d67ef56ca9bc..61b20a656863 100644 --- a/kernel/time/alarmtimer.c +++ b/kernel/time/alarmtimer.c @@ -407,7 +407,7 @@ int alarm_cancel(struct alarm *alarm) int ret = alarm_try_to_cancel(alarm); if (ret >= 0) return ret; - cpu_relax(); + hrtimer_wait_for_timer(&alarm->timer); } } EXPORT_SYMBOL_GPL(alarm_cancel); -- 2.18.0
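Assembled from the flattened diff above, the function after the patch has this shape (reconstructed for readability; not a buildable unit on its own):

```c
/* alarm_cancel() on RT: a high-priority canceller can no longer
 * livelock against a preempted timer callback, because it sleeps in
 * hrtimer_wait_for_timer() until the callback finishes, then retries,
 * instead of busy-spinning with cpu_relax(). */
int alarm_cancel(struct alarm *alarm)
{
        for (;;) {
                int ret = alarm_try_to_cancel(alarm);

                if (ret >= 0)
                        return ret;
                hrtimer_wait_for_timer(&alarm->timer);
        }
}
```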
[PATCH RT 18/22] posix-timers: move the rcu head out of the union
From: Sebastian Andrzej Siewior 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- On RT the timer can be preempted while running and therefore we wait with timer_wait_for_callback() for the timer to complete (instead of busy looping). The RCU-readlock is held to ensure that this posix timer is not removed while we wait on it. If the timer is removed then it invokes call_rcu() with a pointer that is shared with the hrtimer because it is part of the same union. In order to avoid any possible side effects I am moving the rcu pointer out of the union. Cc: stable...@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior (cherry picked from commit b8401365af110949f12c7cf1fa86b4c0ea069bbd) Signed-off-by: Julia Cartwright --- include/linux/posix-timers.h | 2 +- kernel/time/posix-timers.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h index 62d44c176071..cbd3f9334543 100644 --- a/include/linux/posix-timers.h +++ b/include/linux/posix-timers.h @@ -92,8 +92,8 @@ struct k_itimer { struct alarm alarmtimer; ktime_t interval; } alarm; - struct rcu_head rcu; } it; + struct rcu_head rcu; }; struct k_clock { diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c index 6084618436fd..45d8033caec4 100644 --- a/kernel/time/posix-timers.c +++ b/kernel/time/posix-timers.c @@ -568,7 +568,7 @@ static struct k_itimer * alloc_posix_timer(void) static void k_itimer_rcu_free(struct rcu_head *head) { - struct k_itimer *tmr = container_of(head, struct k_itimer, it.rcu); + struct k_itimer *tmr = container_of(head, struct k_itimer, rcu); kmem_cache_free(posix_timers_cache, tmr); } @@ -585,7 +585,7 @@ static void release_posix_timer(struct k_itimer *tmr, int it_id_set) } put_pid(tmr->it_pid); sigqueue_free(tmr->sigq); - call_rcu(&tmr->it.rcu, k_itimer_rcu_free); + call_rcu(&tmr->rcu, k_itimer_rcu_free); } static struct k_clock 
*clockid_to_kclock(const clockid_t id) -- 2.18.0
[PATCH RT 19/22] locallock: provide {get,put}_locked_ptr() variants
4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- Provide a set of locallocked accessors for pointers to per-CPU data; this is useful for dynamically-allocated per-CPU regions, for example. These are symmetric with the {get,put}_cpu_ptr() per-CPU accessor variants. Signed-off-by: Julia Cartwright Signed-off-by: Sebastian Andrzej Siewior (cherry picked from commit 3d45cf23db4f76cd356ebb0aa4cdaa7d92d1a64e) Signed-off-by: Julia Cartwright --- include/linux/locallock.h | 10 ++ 1 file changed, 10 insertions(+) diff --git a/include/linux/locallock.h b/include/linux/locallock.h index 280f884a05a3..0c3ff5b23f6a 100644 --- a/include/linux/locallock.h +++ b/include/linux/locallock.h @@ -238,6 +238,14 @@ static inline int __local_unlock_irqrestore(struct local_irq_lock *lv, #define put_locked_var(lvar, var) local_unlock(lvar); +#define get_locked_ptr(lvar, var) \ + ({ \ + local_lock(lvar); \ + this_cpu_ptr(var); \ + }) + +#define put_locked_ptr(lvar, var) local_unlock(lvar); + #define local_lock_cpu(lvar) \ ({ \ local_lock(lvar); \ @@ -278,6 +286,8 @@ static inline void local_irq_lock_init(int lvar) { } #define get_locked_var(lvar, var) get_cpu_var(var) #define put_locked_var(lvar, var) put_cpu_var(var) +#define get_locked_ptr(lvar, var) get_cpu_ptr(var) +#define put_locked_ptr(lvar, var) put_cpu_ptr(var) #define local_lock_cpu(lvar) get_cpu() #define local_unlock_cpu(lvar) put_cpu() -- 2.18.0
[PATCH RT 06/22] rcu: Do not include rtmutex_common.h unconditionally
From: Sebastian Andrzej Siewior 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- [ upstream commit b88697810d7c1d102a529990f9071b0f14cfe6df ] This commit adjusts include files and provides definitions in preparation for suppressing lockdep false-positive ->boost_mtx complaints. Without this preparation, architectures not supporting rt_mutex will get build failures. Reported-by: kbuild test robot Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Paul E. McKenney Signed-off-by: Julia Cartwright --- kernel/rcu/tree_plugin.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index a1a7bafcce15..3d18d08e8382 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -37,6 +37,7 @@ * This probably needs to be excluded from -rt builds. */ #define rt_mutex_owner(a) ({ WARN_ON_ONCE(1); NULL; }) +#define rt_mutex_futex_unlock(x) WARN_ON_ONCE(1) #endif /* #else #ifdef CONFIG_RCU_BOOST */ @@ -834,8 +835,6 @@ static void rcu_cpu_kthread_setup(unsigned int cpu) #ifdef CONFIG_RCU_BOOST -#include "../locking/rtmutex_common.h" - #ifdef CONFIG_RCU_TRACE static void rcu_initiate_boost_trace(struct rcu_node *rnp) -- 2.18.0
[PATCH RT 12/22] locking: add types.h
From: Sebastian Andrzej Siewior 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- During the stable update the arm architecture did not compile anymore due to missing definition of u16/32. Cc: stable...@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior (cherry picked from commit 1289b06974d64f244a26455fab699c6a1332f4bc) Signed-off-by: Julia Cartwright --- include/linux/spinlock_types_raw.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/spinlock_types_raw.h b/include/linux/spinlock_types_raw.h index edffc4d53fc9..03235b475b77 100644 --- a/include/linux/spinlock_types_raw.h +++ b/include/linux/spinlock_types_raw.h @@ -1,6 +1,8 @@ #ifndef __LINUX_SPINLOCK_TYPES_RAW_H #define __LINUX_SPINLOCK_TYPES_RAW_H +#include <linux/types.h> + #if defined(CONFIG_SMP) # include <asm/spinlock_types.h> #else -- 2.18.0
[PATCH RT 06/22] rcu: Do not include rtmutex_common.h unconditionally
From: Sebastian Andrzej Siewior 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- [ upstream commit b88697810d7c1d102a529990f9071b0f14cfe6df ] This commit adjusts include files and provides definitions in preparation for suppressing lockdep false-positive ->boost_mtx complaints. Without this preparation, architectures not supporting rt_mutex will get build failures. Reported-by: kbuild test robot Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Paul E. McKenney Signed-off-by: Julia Cartwright --- kernel/rcu/tree_plugin.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index a1a7bafcce15..3d18d08e8382 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -37,6 +37,7 @@ * This probably needs to be excluded from -rt builds. */ #define rt_mutex_owner(a) ({ WARN_ON_ONCE(1); NULL; }) +#define rt_mutex_futex_unlock(x) WARN_ON_ONCE(1) #endif /* #else #ifdef CONFIG_RCU_BOOST */ @@ -834,8 +835,6 @@ static void rcu_cpu_kthread_setup(unsigned int cpu) #ifdef CONFIG_RCU_BOOST -#include "../locking/rtmutex_common.h" - #ifdef CONFIG_RCU_TRACE static void rcu_initiate_boost_trace(struct rcu_node *rnp) -- 2.18.0
[PATCH RT 12/22] locking: add types.h
From: Sebastian Andrzej Siewior 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- During the stable update the arm architecture did not compile anymore due to a missing definition of u16/u32. Cc: stable...@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior (cherry picked from commit 1289b06974d64f244a26455fab699c6a1332f4bc) Signed-off-by: Julia Cartwright --- include/linux/spinlock_types_raw.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/spinlock_types_raw.h b/include/linux/spinlock_types_raw.h index edffc4d53fc9..03235b475b77 100644 --- a/include/linux/spinlock_types_raw.h +++ b/include/linux/spinlock_types_raw.h @@ -1,6 +1,8 @@ #ifndef __LINUX_SPINLOCK_TYPES_RAW_H #define __LINUX_SPINLOCK_TYPES_RAW_H +#include <linux/types.h> + #if defined(CONFIG_SMP) # include <asm/spinlock_types.h> #else -- 2.18.0
[PATCH RT 09/22] crypto: limit more FPU-enabled sections
From: Sebastian Andrzej Siewior 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- Those crypto drivers use SSE/AVX/… for their crypto work and in order to do so in the kernel they need to enable the "FPU" in kernel mode, which disables preemption. There are two problems with the way they are used: - the while loop which processes X bytes may create latency spikes and should be avoided or limited. - the cipher-walk-next part may allocate/free memory and may use kmap_atomic(). The whole kernel_fpu_begin()/end() processing probably isn't that cheap. It most likely makes sense to process as much of those as possible in one go. The new *_fpu_sched_rt() schedules only if an RT task is pending. Probably we should measure the performance of those ciphers in pure SW mode and with these optimisations to see if it makes sense to keep them for RT. This kernel_fpu_resched() makes the code more preemptible, which might hurt performance. 
Cc: stable...@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior (cherry picked from commit 0dcc4c1693ef37e166da420ef7c68c7047c996f1) Signed-off-by: Julia Cartwright --- arch/x86/crypto/camellia_aesni_avx2_glue.c | 20 arch/x86/crypto/camellia_aesni_avx_glue.c | 19 +++ arch/x86/crypto/cast6_avx_glue.c | 24 +++ arch/x86/crypto/chacha20_glue.c| 9 arch/x86/crypto/serpent_avx2_glue.c| 19 +++ arch/x86/crypto/serpent_avx_glue.c | 23 ++ arch/x86/crypto/serpent_sse2_glue.c| 23 ++ arch/x86/crypto/twofish_avx_glue.c | 27 -- arch/x86/include/asm/fpu/api.h | 1 + arch/x86/kernel/fpu/core.c | 12 ++ 10 files changed, 158 insertions(+), 19 deletions(-) diff --git a/arch/x86/crypto/camellia_aesni_avx2_glue.c b/arch/x86/crypto/camellia_aesni_avx2_glue.c index 60907c139c4e..0902db7d326a 100644 --- a/arch/x86/crypto/camellia_aesni_avx2_glue.c +++ b/arch/x86/crypto/camellia_aesni_avx2_glue.c @@ -206,6 +206,20 @@ struct crypt_priv { bool fpu_enabled; }; +#ifdef CONFIG_PREEMPT_RT_FULL +static void camellia_fpu_end_rt(struct crypt_priv *ctx) +{ + bool fpu_enabled = ctx->fpu_enabled; + + if (!fpu_enabled) + return; + camellia_fpu_end(fpu_enabled); + ctx->fpu_enabled = false; +} +#else +static void camellia_fpu_end_rt(struct crypt_priv *ctx) { } +#endif + static void encrypt_callback(void *priv, u8 *srcdst, unsigned int nbytes) { const unsigned int bsize = CAMELLIA_BLOCK_SIZE; @@ -221,16 +235,19 @@ static void encrypt_callback(void *priv, u8 *srcdst, unsigned int nbytes) } if (nbytes >= CAMELLIA_AESNI_PARALLEL_BLOCKS * bsize) { + kernel_fpu_resched(); camellia_ecb_enc_16way(ctx->ctx, srcdst, srcdst); srcdst += bsize * CAMELLIA_AESNI_PARALLEL_BLOCKS; nbytes -= bsize * CAMELLIA_AESNI_PARALLEL_BLOCKS; } while (nbytes >= CAMELLIA_PARALLEL_BLOCKS * bsize) { + kernel_fpu_resched(); camellia_enc_blk_2way(ctx->ctx, srcdst, srcdst); srcdst += bsize * CAMELLIA_PARALLEL_BLOCKS; nbytes -= bsize * CAMELLIA_PARALLEL_BLOCKS; } + camellia_fpu_end_rt(ctx); for (i = 0; i < nbytes / bsize; i++, srcdst += 
bsize) camellia_enc_blk(ctx->ctx, srcdst, srcdst); @@ -251,16 +268,19 @@ static void decrypt_callback(void *priv, u8 *srcdst, unsigned int nbytes) } if (nbytes >= CAMELLIA_AESNI_PARALLEL_BLOCKS * bsize) { + kernel_fpu_resched(); camellia_ecb_dec_16way(ctx->ctx, srcdst, srcdst); srcdst += bsize * CAMELLIA_AESNI_PARALLEL_BLOCKS; nbytes -= bsize * CAMELLIA_AESNI_PARALLEL_BLOCKS; } while (nbytes >= CAMELLIA_PARALLEL_BLOCKS * bsize) { + kernel_fpu_resched(); camellia_dec_blk_2way(ctx->ctx, srcdst, srcdst); srcdst += bsize * CAMELLIA_PARALLEL_BLOCKS; nbytes -= bsize * CAMELLIA_PARALLEL_BLOCKS; } + camellia_fpu_end_rt(ctx); for (i = 0; i < nbytes / bsize; i++, srcdst += bsize) camellia_dec_blk(ctx->ctx, srcdst, srcdst); diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c index d96429da88eb..3b8e91841039 100644 --- a/arch/x86/crypto/camellia_aesni_avx_glue.c +++ b/arch/x86/crypto/camellia_aesni_avx_glue.c @@ -210,6 +210,21 @@ struct crypt_priv { bool fpu_enabled; }; +#ifdef CONFIG_PREEMPT_RT_FULL +static void camellia_fpu_end_rt(struct crypt_priv *ctx) +{ + bool fpu_enabled = ctx->fpu_enabled; + + if (!fpu_enabled) + return;
[PATCH RT 07/22] rcu: Suppress lockdep false-positive ->boost_mtx complaints
From: "Paul E. McKenney" 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- [ Upstream commit 02a7c234e54052101164368ff981bd72f7acdd65 ] RCU priority boosting uses rt_mutex_init_proxy_locked() to initialize an rt_mutex structure in locked state held by some other task. When that other task releases it, lockdep complains (quite accurately, but a bit uselessly) that the other task never acquired it. This complaint can suppress other, more helpful, lockdep complaints, and in any case it is a false positive. This commit therefore switches from rt_mutex_unlock() to rt_mutex_futex_unlock(), thereby avoiding the lockdep annotations. Of course, if lockdep ever learns about rt_mutex_init_proxy_locked(), additional adjustments will be required. Suggested-by: Peter Zijlstra Signed-off-by: Paul E. McKenney Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Julia Cartwright --- kernel/rcu/tree_plugin.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 3d18d08e8382..510de72ad8a3 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -486,7 +486,7 @@ void rcu_read_unlock_special(struct task_struct *t) /* Unboost if we were boosted. */ if (IS_ENABLED(CONFIG_RCU_BOOST) && drop_boost_mutex) - rt_mutex_unlock(&rnp->boost_mtx); + rt_mutex_futex_unlock(&rnp->boost_mtx); /* * If this was the last task on the expedited lists, -- 2.18.0
[PATCH RT 04/22] futex: Fix OWNER_DEAD fixup
From: Peter Zijlstra 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- [ Upstream commit a97cb0e7b3f4c6297fd857055ae8e895f402f501 ] Both Geert and DaveJ reported that the recent futex commit: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex") introduced a problem with setting OWNER_DEAD. We set the bit on an uninitialized variable and then entirely optimize it away as a dead-store. Move the setting of the bit to where it is more useful. Reported-by: Geert Uytterhoeven Reported-by: Dave Jones Signed-off-by: Peter Zijlstra (Intel) Cc: Andrew Morton Cc: Linus Torvalds Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Thomas Gleixner Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex") Link: http://lkml.kernel.org/r/20180122103947.gd2...@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Julia Cartwright --- kernel/futex.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index cdd68ba6e3a6..57038131ad3f 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -2301,9 +2301,6 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); oldowner = pi_state->owner; - /* Owner died? */ - if (!pi_state->owner) - newtid |= FUTEX_OWNER_DIED; /* * We are here because either: @@ -2364,6 +2361,9 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, } newtid = task_pid_vnr(newowner) | FUTEX_WAITERS; + /* Owner died? */ + if (!pi_state->owner) + newtid |= FUTEX_OWNER_DIED; if (get_futex_value_locked(&uval, uaddr)) goto handle_fault; -- 2.18.0
[PATCH RT 16/22] block: blk-mq: move blk_queue_usage_counter_release() into process context
From: Sebastian Andrzej Siewior 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- | BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:914 | in_atomic(): 1, irqs_disabled(): 0, pid: 255, name: kworker/u257:6 | 5 locks held by kworker/u257:6/255: | #0: ("events_unbound"){.+.+.+}, at: [] process_one_work+0x171/0x5e0 | #1: ((>work)){+.+.+.}, at: [] process_one_work+0x171/0x5e0 | #2: (>scan_mutex){+.+.+.}, at: [] __scsi_add_device+0xa3/0x130 [scsi_mod] | #3: (>tag_list_lock){+.+...}, at: [] blk_mq_init_queue+0x96a/0xa50 | #4: (rcu_read_lock_sched){..}, at: [] percpu_ref_kill_and_confirm+0x1d/0x120 | Preemption disabled at:[] blk_mq_freeze_queue_start+0x56/0x70 | | CPU: 2 PID: 255 Comm: kworker/u257:6 Not tainted 3.18.7-rt0+ #1 | Workqueue: events_unbound async_run_entry_fn | 0003 8800bc29f998 815b3a12 | 8800bc29f9b8 8109aa16 8800bc29fa28 | 8800bc5d1bc8 8800bc29f9e8 815b8dd4 8800 | Call Trace: | [] dump_stack+0x4f/0x7c | [] __might_sleep+0x116/0x190 | [] rt_spin_lock+0x24/0x60 | [] __wake_up+0x29/0x60 | [] blk_mq_usage_counter_release+0x1e/0x20 | [] percpu_ref_kill_and_confirm+0x106/0x120 | [] blk_mq_freeze_queue_start+0x56/0x70 | [] blk_mq_update_tag_set_depth+0x40/0xd0 | [] blk_mq_init_queue+0x98c/0xa50 | [] scsi_mq_alloc_queue+0x20/0x60 [scsi_mod] | [] scsi_alloc_sdev+0x2f5/0x370 [scsi_mod] | [] scsi_probe_and_add_lun+0x9e4/0xdd0 [scsi_mod] | [] __scsi_add_device+0x126/0x130 [scsi_mod] | [] ata_scsi_scan_host+0xaf/0x200 [libata] | [] async_port_probe+0x46/0x60 [libata] | [] async_run_entry_fn+0x3b/0xf0 | [] process_one_work+0x201/0x5e0 percpu_ref_kill_and_confirm() invokes blk_mq_usage_counter_release() in a rcu-sched region. swait based wake queue can't be used due to wake_up_all() usage and disabled interrupts in !RT configs (as reported by Corey Minyard). The wq_has_sleeper() check has been suggested by Peter Zijlstra. 
Cc: stable...@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior (cherry picked from commit 2d701058d614554cce412a787f00568b9fdffade) Signed-off-by: Julia Cartwright --- block/blk-core.c | 14 +- include/linux/blkdev.h | 2 ++ 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/block/blk-core.c b/block/blk-core.c index 87d3e0a503e5..346d5bba3948 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -675,12 +675,21 @@ void blk_queue_exit(struct request_queue *q) percpu_ref_put(&q->q_usage_counter); } +static void blk_queue_usage_counter_release_swork(struct swork_event *sev) +{ + struct request_queue *q = + container_of(sev, struct request_queue, mq_pcpu_wake); + + wake_up_all(&q->mq_freeze_wq); +} + static void blk_queue_usage_counter_release(struct percpu_ref *ref) { struct request_queue *q = container_of(ref, struct request_queue, q_usage_counter); - wake_up_all(&q->mq_freeze_wq); + if (wq_has_sleeper(&q->mq_freeze_wq)) + swork_queue(&q->mq_pcpu_wake); } static void blk_rq_timed_out_timer(unsigned long data) @@ -751,6 +760,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) __set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags); init_waitqueue_head(&q->mq_freeze_wq); + INIT_SWORK(&q->mq_pcpu_wake, blk_queue_usage_counter_release_swork); /* * Init percpu_ref in atomic mode so that it's faster to shutdown. 
@@ -3556,6 +3566,8 @@ int __init blk_dev_init(void) if (!kblockd_workqueue) panic("Failed to create kblockd\n"); + BUG_ON(swork_get()); + request_cachep = kmem_cache_create("blkdev_requests", sizeof(struct request), 0, SLAB_PANIC, NULL); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index fdb449fe3ff7..ab039211ab9f 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -24,6 +24,7 @@ #include #include #include +#include <linux/swork.h> struct module; struct scsi_ioctl_command; @@ -469,6 +470,7 @@ struct request_queue { #endif struct rcu_head rcu_head; wait_queue_head_t mq_freeze_wq; + struct swork_event mq_pcpu_wake; struct percpu_ref q_usage_counter; struct list_head all_q_node; -- 2.18.0
[PATCH RT 02/22] futex: Fix more put_pi_state() vs. exit_pi_state_list() races
From: Peter Zijlstra 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- [ Upstream commit 51d00899f7e6ded15c89cb4e2cb11a35283bac81 ] Dmitry (through syzbot) reported being able to trigger the WARN in get_pi_state() and a use-after-free on: raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); Both are due to this race: exit_pi_state_list() put_pi_state() lock(&curr->pi_lock) while() { pi_state = list_first_entry(head); hb = hash_futex(&pi_state->key); unlock(&curr->pi_lock); dec_and_test(&pi_state->refcount); lock(&hb->lock) lock(&pi_state->pi_mutex.wait_lock) // uaf if pi_state free'd lock(&curr->pi_lock); unlock(&curr->pi_lock); get_pi_state(); // WARN; refcount==0 The problem is we take the reference count too late, and don't allow it being 0. Fix it by using inc_not_zero() and simply retrying the loop when we fail to get a refcount. In that case put_pi_state() should remove the entry from the list. Reported-by: Dmitry Vyukov Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Thomas Gleixner Cc: Gratian Crisan Cc: Linus Torvalds Cc: Peter Zijlstra Cc: dvh...@infradead.org Cc: syzbot Cc: syzkaller-b...@googlegroups.com Cc: Fixes: c74aef2d06a9 ("futex: Fix pi_state->owner serialization") Link: http://lkml.kernel.org/r/20171031101853.xpfh72y643kdf...@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Julia Cartwright --- kernel/futex.c | 23 --- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index 47e42faad6c5..270148be5647 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -899,11 +899,27 @@ void exit_pi_state_list(struct task_struct *curr) */ raw_spin_lock_irq(&curr->pi_lock); while (!list_empty(head)) { - next = head->next; pi_state = list_entry(next, struct futex_pi_state, list); key = pi_state->key; hb = hash_futex(&key); + + /* +* We can race against put_pi_state() removing itself from the +* list (a waiter going away). put_pi_state() will first +* decrement the reference count and then modify the list, so +* it's possible to see the list entry but fail this reference +* acquire. +* +* In that case; drop the locks to let put_pi_state() make +* progress and retry the loop. +*/ + if (!atomic_inc_not_zero(&pi_state->refcount)) { + raw_spin_unlock_irq(&curr->pi_lock); + cpu_relax(); + raw_spin_lock_irq(&curr->pi_lock); + continue; + } raw_spin_unlock_irq(&curr->pi_lock); spin_lock(&hb->lock); @@ -914,10 +930,12 @@ void exit_pi_state_list(struct task_struct *curr) * task still owns the PI-state: */ if (head->next != next) { + /* retain curr->pi_lock for the loop invariant */ raw_spin_unlock(&curr->pi_lock); raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); spin_unlock(&hb->lock); raw_spin_lock_irq(&curr->pi_lock); + put_pi_state(pi_state); continue; } @@ -925,9 +943,8 @@ void exit_pi_state_list(struct task_struct *curr) WARN_ON(list_empty(&pi_state->list)); list_del_init(&pi_state->list); pi_state->owner = NULL; - raw_spin_unlock(&curr->pi_lock); - get_pi_state(pi_state); + raw_spin_unlock(&curr->pi_lock); raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); spin_unlock(&hb->lock); -- 2.18.0
[PATCH RT 21/22] seqlock: provide the same ordering semantics as mainline
4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- The mainline implementation of read_seqbegin() orders prior loads w.r.t. the read-side critical section. Fix up the RT writer-boosting implementation to provide the same guarantee. Also, while we're here, update the usage of ACCESS_ONCE() to use READ_ONCE(). Fixes: e69f15cf77c23 ("seqlock: Prevent rt starvation") Cc: stable...@vger.kernel.org Signed-off-by: Julia Cartwright Signed-off-by: Sebastian Andrzej Siewior (cherry picked from commit afa4c06b89a3c0fb7784ff900ccd707bef519cb7) Signed-off-by: Julia Cartwright --- include/linux/seqlock.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h index 3d7223ffdd3b..de04d6d0face 100644 --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -461,6 +461,7 @@ static inline unsigned read_seqbegin(seqlock_t *sl) spin_unlock_wait(&sl->lock); goto repeat; } + smp_rmb(); return ret; } #endif -- 2.18.0
[PATCH RT 01/22] futex: Fix pi_state->owner serialization
From: Peter Zijlstra 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- [ Upstream commit c74aef2d06a9f59cece89093eecc552933cba72a ] There was a reported suspicion about a race between exit_pi_state_list() and put_pi_state(). The same report mentioned the comment with put_pi_state() said it should be called with hb->lock held, and it no longer is in all places. As it turns out, the pi_state->owner serialization is indeed broken. As per the new rules: 734009e96d19 ("futex: Change locking rules") pi_state->owner should be serialized by pi_state->pi_mutex.wait_lock. For the sites setting pi_state->owner we already hold wait_lock (where required) but exit_pi_state_list() and put_pi_state() were not and raced on clearing it. Fixes: 734009e96d19 ("futex: Change locking rules") Reported-by: Gratian Crisan Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: dvh...@infradead.org Cc: sta...@vger.kernel.org Link: https://lkml.kernel.org/r/20170922154806.jd3ffltfk24m4...@hirez.programming.kicks-ass.net Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Julia Cartwright --- kernel/futex.c | 34 ++ 1 file changed, 22 insertions(+), 12 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index 8ab0ddd4cf8f..47e42faad6c5 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -819,8 +819,6 @@ static void get_pi_state(struct futex_pi_state *pi_state) /* * Drops a reference to the pi_state object and frees or caches it * when the last reference is gone. - * - * Must be called with the hb lock held. 
*/ static void put_pi_state(struct futex_pi_state *pi_state) { @@ -835,16 +833,22 @@ static void put_pi_state(struct futex_pi_state *pi_state) * and has cleaned up the pi_state already */ if (pi_state->owner) { - raw_spin_lock_irq(&pi_state->owner->pi_lock); - list_del_init(&pi_state->list); - raw_spin_unlock_irq(&pi_state->owner->pi_lock); + struct task_struct *owner; - rt_mutex_proxy_unlock(&pi_state->pi_mutex, pi_state->owner); + raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); + owner = pi_state->owner; + if (owner) { + raw_spin_lock(&owner->pi_lock); + list_del_init(&pi_state->list); + raw_spin_unlock(&owner->pi_lock); + } + rt_mutex_proxy_unlock(&pi_state->pi_mutex, owner); + raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); } - if (current->pi_state_cache) + if (current->pi_state_cache) { kfree(pi_state); - else { + } else { /* * pi_state->list is already empty. * clear pi_state->owner. @@ -903,14 +907,15 @@ void exit_pi_state_list(struct task_struct *curr) raw_spin_unlock_irq(&curr->pi_lock); spin_lock(&hb->lock); - - raw_spin_lock_irq(&curr->pi_lock); + raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); + raw_spin_lock(&curr->pi_lock); /* * We dropped the pi-lock, so re-check whether this * task still owns the PI-state: */ if (head->next != next) { - raw_spin_unlock_irq(&curr->pi_lock); + raw_spin_unlock(&curr->pi_lock); + raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); spin_unlock(&hb->lock); raw_spin_lock_irq(&curr->pi_lock); continue; } @@ -920,9 +925,10 @@ void exit_pi_state_list(struct task_struct *curr) WARN_ON(list_empty(&pi_state->list)); list_del_init(&pi_state->list); pi_state->owner = NULL; - raw_spin_unlock_irq(&curr->pi_lock); + raw_spin_unlock(&curr->pi_lock); get_pi_state(pi_state); + raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); spin_unlock(&hb->lock); rt_mutex_futex_unlock(&pi_state->pi_mutex); @@ -1204,6 +1210,10 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key, WARN_ON(!list_empty(&pi_state->list)); list_add(&pi_state->list, &p->pi_state_list); + /* +* Assignment without holding pi_state->pi_mutex.wait_lock is safe +* because there is no concurrency as the object is not published yet. +*/ pi_state->owner = p; raw_spin_unlock_irq(&p->pi_lock); -- 2.18.0
[PATCH RT 14/22] Revert "rt,ntp: Move call to schedule_delayed_work() to helper thread"
From: Sebastian Andrzej Siewior 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- I've been looking at this in v3.10-RT where it got in. The patch description says |The ntp code for notify_cmos_timer() is called from a hard interrupt |context. I see only one caller of ntp_notify_cmos_timer() and that is do_adjtimex() after "raw_spin_unlock_irqrestore()". I see a few callers of do_adjtimex() which is SYS_adjtimex() (+compat) and posix_clock_realtime_adj() which in turn is called by SYS_clock_adjtime(). Reverting the patch. Cc: stable...@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior (cherry picked from commit 932c5783d4434250a1019f49ae81b80731dfd4cd) Signed-off-by: Julia Cartwright --- kernel/time/ntp.c | 26 -- 1 file changed, 26 deletions(-) diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 05b7391bf9bd..6df8927c58a5 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -17,7 +17,6 @@ #include #include #include -#include <linux/swork.h> #include "ntp_internal.h" #include "timekeeping_internal.h" @@ -569,35 +568,10 @@ static void sync_cmos_clock(struct work_struct *work) &sync_cmos_work, timespec64_to_jiffies(&next)); } -#ifdef CONFIG_PREEMPT_RT_FULL - -static void run_clock_set_delay(struct swork_event *event) -{ - queue_delayed_work(system_power_efficient_wq, &sync_cmos_work, 0); -} - -static struct swork_event ntp_cmos_swork; - -void ntp_notify_cmos_timer(void) -{ - swork_queue(&ntp_cmos_swork); -} - -static __init int create_cmos_delay_thread(void) -{ - WARN_ON(swork_get()); - INIT_SWORK(&ntp_cmos_swork, run_clock_set_delay); - return 0; -} -early_initcall(create_cmos_delay_thread); - -#else - void ntp_notify_cmos_timer(void) { queue_delayed_work(system_power_efficient_wq, &sync_cmos_work, 0); } -#endif /* CONFIG_PREEMPT_RT_FULL */ #else void ntp_notify_cmos_timer(void) { } -- 2.18.0
[PATCH RT 04/22] futex: Fix OWNER_DEAD fixup
From: Peter Zijlstra 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- [ Upstream commit a97cb0e7b3f4c6297fd857055ae8e895f402f501 ] Both Geert and DaveJ reported that the recent futex commit: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex") introduced a problem with setting OWNER_DEAD. We set the bit on an uninitialized variable and then entirely optimize it away as a dead-store. Move the setting of the bit to where it is more useful. Reported-by: Geert Uytterhoeven Reported-by: Dave Jones Signed-off-by: Peter Zijlstra (Intel) Cc: Andrew Morton Cc: Linus Torvalds Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Thomas Gleixner Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex") Link: http://lkml.kernel.org/r/20180122103947.gd2...@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Julia Cartwright --- kernel/futex.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index cdd68ba6e3a6..57038131ad3f 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -2301,9 +2301,6 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, raw_spin_lock_irq(_state->pi_mutex.wait_lock); oldowner = pi_state->owner; - /* Owner died? */ - if (!pi_state->owner) - newtid |= FUTEX_OWNER_DIED; /* * We are here because either: @@ -2364,6 +2361,9 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, } newtid = task_pid_vnr(newowner) | FUTEX_WAITERS; + /* Owner died? */ + if (!pi_state->owner) + newtid |= FUTEX_OWNER_DIED; if (get_futex_value_locked(, uaddr)) goto handle_fault; -- 2.18.0
[PATCH RT 16/22] block: blk-mq: move blk_queue_usage_counter_release() into process context
From: Sebastian Andrzej Siewior 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- | BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:914 | in_atomic(): 1, irqs_disabled(): 0, pid: 255, name: kworker/u257:6 | 5 locks held by kworker/u257:6/255: | #0: ("events_unbound"){.+.+.+}, at: [] process_one_work+0x171/0x5e0 | #1: ((>work)){+.+.+.}, at: [] process_one_work+0x171/0x5e0 | #2: (>scan_mutex){+.+.+.}, at: [] __scsi_add_device+0xa3/0x130 [scsi_mod] | #3: (>tag_list_lock){+.+...}, at: [] blk_mq_init_queue+0x96a/0xa50 | #4: (rcu_read_lock_sched){..}, at: [] percpu_ref_kill_and_confirm+0x1d/0x120 | Preemption disabled at:[] blk_mq_freeze_queue_start+0x56/0x70 | | CPU: 2 PID: 255 Comm: kworker/u257:6 Not tainted 3.18.7-rt0+ #1 | Workqueue: events_unbound async_run_entry_fn | 0003 8800bc29f998 815b3a12 | 8800bc29f9b8 8109aa16 8800bc29fa28 | 8800bc5d1bc8 8800bc29f9e8 815b8dd4 8800 | Call Trace: | [] dump_stack+0x4f/0x7c | [] __might_sleep+0x116/0x190 | [] rt_spin_lock+0x24/0x60 | [] __wake_up+0x29/0x60 | [] blk_mq_usage_counter_release+0x1e/0x20 | [] percpu_ref_kill_and_confirm+0x106/0x120 | [] blk_mq_freeze_queue_start+0x56/0x70 | [] blk_mq_update_tag_set_depth+0x40/0xd0 | [] blk_mq_init_queue+0x98c/0xa50 | [] scsi_mq_alloc_queue+0x20/0x60 [scsi_mod] | [] scsi_alloc_sdev+0x2f5/0x370 [scsi_mod] | [] scsi_probe_and_add_lun+0x9e4/0xdd0 [scsi_mod] | [] __scsi_add_device+0x126/0x130 [scsi_mod] | [] ata_scsi_scan_host+0xaf/0x200 [libata] | [] async_port_probe+0x46/0x60 [libata] | [] async_run_entry_fn+0x3b/0xf0 | [] process_one_work+0x201/0x5e0 percpu_ref_kill_and_confirm() invokes blk_mq_usage_counter_release() in a rcu-sched region. swait based wake queue can't be used due to wake_up_all() usage and disabled interrupts in !RT configs (as reported by Corey Minyard). The wq_has_sleeper() check has been suggested by Peter Zijlstra. 
Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior
(cherry picked from commit 2d701058d614554cce412a787f00568b9fdffade)
Signed-off-by: Julia Cartwright
---
 block/blk-core.c       | 14 +++++++++++++-
 include/linux/blkdev.h |  2 ++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 87d3e0a503e5..346d5bba3948 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -675,12 +675,21 @@ void blk_queue_exit(struct request_queue *q)
 	percpu_ref_put(&q->q_usage_counter);
 }
 
+static void blk_queue_usage_counter_release_swork(struct swork_event *sev)
+{
+	struct request_queue *q =
+		container_of(sev, struct request_queue, mq_pcpu_wake);
+
+	wake_up_all(&q->mq_freeze_wq);
+}
+
 static void blk_queue_usage_counter_release(struct percpu_ref *ref)
 {
 	struct request_queue *q =
 		container_of(ref, struct request_queue, q_usage_counter);
 
-	wake_up_all(&q->mq_freeze_wq);
+	if (wq_has_sleeper(&q->mq_freeze_wq))
+		swork_queue(&q->mq_pcpu_wake);
 }
 
 static void blk_rq_timed_out_timer(unsigned long data)
@@ -751,6 +760,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	__set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags);
 
 	init_waitqueue_head(&q->mq_freeze_wq);
+	INIT_SWORK(&q->mq_pcpu_wake, blk_queue_usage_counter_release_swork);
 
 	/*
 	 * Init percpu_ref in atomic mode so that it's faster to shutdown.
@@ -3556,6 +3566,8 @@ int __init blk_dev_init(void)
 	if (!kblockd_workqueue)
 		panic("Failed to create kblockd\n");
 
+	BUG_ON(swork_get());
+
 	request_cachep = kmem_cache_create("blkdev_requests",
 			sizeof(struct request), 0, SLAB_PANIC, NULL);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index fdb449fe3ff7..ab039211ab9f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -24,6 +24,7 @@
 #include
 #include
 #include
+#include <linux/swork.h>
 
 struct module;
 struct scsi_ioctl_command;
@@ -469,6 +470,7 @@ struct request_queue {
 #endif
 	struct rcu_head		rcu_head;
 	wait_queue_head_t	mq_freeze_wq;
+	struct swork_event	mq_pcpu_wake;
 	struct percpu_ref	q_usage_counter;
 	struct list_head	all_q_node;
--
2.18.0
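The shape of the fix above — test for sleepers while still in atomic context, but defer the wakeup itself into process context via a queued work item — can be sketched in plain userspace C. All names here (`swork`, `swork_queue`, `run_pending_swork`) are illustrative stand-ins, not the kernel API; a one-slot queue stands in for the real work infrastructure:

```c
#include <stdbool.h>
#include <stddef.h>

/* A queued deferred-work item: function plus argument. */
struct swork {
	void (*func)(void *arg);
	void *arg;
};

static struct swork *pending;	/* one-slot stand-in for a work queue */

/* Callable from "atomic" context: only records the work, never sleeps. */
static void swork_queue(struct swork *ev)
{
	pending = ev;
}

/* What the process-context worker thread would do: run queued work. */
static int run_pending_swork(void)
{
	if (!pending)
		return 0;
	pending->func(pending->arg);
	pending = NULL;
	return 1;
}

/* The deferred wakeup itself, standing in for wake_up_all(). */
static void wake_all(void *arg)
{
	*(int *)arg = 1;
}
```

The point of the split is that the release callback runs with preemption disabled, where taking the waitqueue's sleeping lock (on RT) would be illegal; queuing is cheap and safe, and the expensive wakeup happens later in a schedulable context.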
[PATCH RT 02/22] futex: Fix more put_pi_state() vs. exit_pi_state_list() races
From: Peter Zijlstra

4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---

[ Upstream commit 51d00899f7e6ded15c89cb4e2cb11a35283bac81 ]

Dmitry (through syzbot) reported being able to trigger the WARN in get_pi_state() and a use-after-free on:

	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);

Both are due to this race:

  exit_pi_state_list()				put_pi_state()

  lock(&curr->pi_lock)
  while() {
	pi_state = list_first_entry(head);
	hb = hash_futex(&pi_state->key);
	unlock(&curr->pi_lock);

						dec_and_test(&pi_state->refcount);

	lock(&hb->lock)
	lock(&pi_state->pi_mutex.wait_lock)	// uaf if pi_state free'd
	lock(&curr->pi_lock);

	....

	unlock(&curr->pi_lock)
	get_pi_state();				// WARN; refcount==0

The problem is we take the reference count too late, and don't allow it being 0. Fix it by using inc_not_zero() and simply retrying the loop when we fail to get a refcount. In that case put_pi_state() should remove the entry from the list.

Reported-by: Dmitry Vyukov
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Thomas Gleixner
Cc: Gratian Crisan
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: dvh...@infradead.org
Cc: syzbot
Cc: syzkaller-b...@googlegroups.com
Cc:
Fixes: c74aef2d06a9 ("futex: Fix pi_state->owner serialization")
Link: http://lkml.kernel.org/r/20171031101853.xpfh72y643kdf...@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Julia Cartwright
---
 kernel/futex.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 47e42faad6c5..270148be5647 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -899,11 +899,27 @@ void exit_pi_state_list(struct task_struct *curr)
 	 */
 	raw_spin_lock_irq(&curr->pi_lock);
 	while (!list_empty(head)) {
-
 		next = head->next;
 		pi_state = list_entry(next, struct futex_pi_state, list);
 		key = pi_state->key;
 		hb = hash_futex(&key);
+
+		/*
+		 * We can race against put_pi_state() removing itself from the
+		 * list (a waiter going away). put_pi_state() will first
+		 * decrement the reference count and then modify the list, so
+		 * its possible to see the list entry but fail this reference
+		 * acquire.
+		 *
+		 * In that case; drop the locks to let put_pi_state() make
+		 * progress and retry the loop.
+		 */
+		if (!atomic_inc_not_zero(&pi_state->refcount)) {
+			raw_spin_unlock_irq(&curr->pi_lock);
+			cpu_relax();
+			raw_spin_lock_irq(&curr->pi_lock);
+			continue;
+		}
 		raw_spin_unlock_irq(&curr->pi_lock);
 
 		spin_lock(&hb->lock);
@@ -914,10 +930,12 @@ void exit_pi_state_list(struct task_struct *curr)
 		 * task still owns the PI-state:
 		 */
 		if (head->next != next) {
+			/* retain curr->pi_lock for the loop invariant */
 			raw_spin_unlock(&curr->pi_lock);
 			raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
 			spin_unlock(&hb->lock);
 			raw_spin_lock_irq(&curr->pi_lock);
+			put_pi_state(pi_state);
 			continue;
 		}
 
@@ -925,9 +943,8 @@ void exit_pi_state_list(struct task_struct *curr)
 		WARN_ON(list_empty(&pi_state->list));
 		list_del_init(&pi_state->list);
 		pi_state->owner = NULL;
-		raw_spin_unlock(&curr->pi_lock);
 
-		get_pi_state(pi_state);
+		raw_spin_unlock(&curr->pi_lock);
 		raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
 		spin_unlock(&hb->lock);
--
2.18.0
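The conditional reference acquisition this fix relies on can be sketched with C11 atomics. This is a userspace illustration of the semantics of `atomic_inc_not_zero()`, not the kernel's implementation; a caller that sees `false` must back off and retry its outer loop, exactly as the patched `exit_pi_state_list()` does:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the object still has at least one, using a
 * compare-and-swap retry loop.  Returns false when the refcount has
 * already dropped to zero, i.e. the object is being freed and must not
 * be touched. */
static bool ref_inc_not_zero(atomic_int *refcount)
{
	int old = atomic_load(refcount);

	do {
		if (old == 0)
			return false;	/* object already on its way out */
		/* on failure, 'old' is reloaded with the current value */
	} while (!atomic_compare_exchange_weak(refcount, &old, old + 1));

	return true;
}
```

The CAS loop closes the race window in the commit message: between observing a nonzero count and incrementing it, a concurrent `dec_and_test()` cannot slip through unnoticed, because the exchange only succeeds against the exact value that was checked.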
[PATCH RT 21/22] seqlock: provide the same ordering semantics as mainline
4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---

The mainline implementation of read_seqbegin() orders prior loads w.r.t. the read-side critical section. Fix up the RT writer-boosting implementation to provide the same guarantee.

Also, while we're here, update the usage of ACCESS_ONCE() to use READ_ONCE().

Fixes: e69f15cf77c23 ("seqlock: Prevent rt starvation")
Cc: stable...@vger.kernel.org
Signed-off-by: Julia Cartwright
Signed-off-by: Sebastian Andrzej Siewior
(cherry picked from commit afa4c06b89a3c0fb7784ff900ccd707bef519cb7)
Signed-off-by: Julia Cartwright
---
 include/linux/seqlock.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 3d7223ffdd3b..de04d6d0face 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -461,6 +461,7 @@ static inline unsigned read_seqbegin(seqlock_t *sl)
 		spin_unlock_wait(&sl->lock);
 		goto repeat;
 	}
+	smp_rmb();
 	return ret;
 }
 #endif
--
2.18.0
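A minimal userspace sketch of the ordering this patch restores, using a C11 acquire load where the kernel uses smp_rmb(). The names and the bare `atomic_uint` sequence counter are simplifications of mine, not the kernel's `seqlock_t`; the key property is that reads inside the critical section cannot be hoisted before the sequence check:

```c
#include <stdatomic.h>

/* Begin a read-side section: spin while a writer holds the sequence odd,
 * then return the (even) sequence observed.  The acquire ordering plays
 * the role of the smp_rmb() added by the patch: it fences the sequence
 * load against the data loads that follow. */
static unsigned read_seqbegin(atomic_uint *seq)
{
	unsigned ret;

	do {
		ret = atomic_load_explicit(seq, memory_order_acquire);
	} while (ret & 1);	/* odd: writer in progress, retry */

	return ret;
}

/* End a read-side section: nonzero means a writer intervened and the
 * reads must be retried.  Pairs with the writer's release store. */
static int read_seqretry(atomic_uint *seq, unsigned start)
{
	return atomic_load_explicit(seq, memory_order_acquire) != start;
}
```

Without the barrier, a reordered data load could observe a half-updated value even though the begin/retry sequence numbers match, which is precisely the bug window in the unpatched RT variant.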
[PATCH RT 01/22] futex: Fix pi_state->owner serialization
From: Peter Zijlstra

4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---

[ Upstream commit c74aef2d06a9f59cece89093eecc552933cba72a ]

There was a reported suspicion about a race between exit_pi_state_list() and put_pi_state(). The same report mentioned the comment with put_pi_state() said it should be called with hb->lock held, and it no longer is in all places.

As it turns out, the pi_state->owner serialization is indeed broken. As per the new rules:

  734009e96d19 ("futex: Change locking rules")

pi_state->owner should be serialized by pi_state->pi_mutex.wait_lock. For the sites setting pi_state->owner we already hold wait_lock (where required) but exit_pi_state_list() and put_pi_state() were not and raced on clearing it.

Fixes: 734009e96d19 ("futex: Change locking rules")
Reported-by: Gratian Crisan
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Cc: dvh...@infradead.org
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20170922154806.jd3ffltfk24m4...@hirez.programming.kicks-ass.net
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Julia Cartwright
---
 kernel/futex.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 8ab0ddd4cf8f..47e42faad6c5 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -819,8 +819,6 @@ static void get_pi_state(struct futex_pi_state *pi_state)
 /*
  * Drops a reference to the pi_state object and frees or caches it
  * when the last reference is gone.
- *
- * Must be called with the hb lock held.
  */
 static void put_pi_state(struct futex_pi_state *pi_state)
 {
@@ -835,16 +833,22 @@ static void put_pi_state(struct futex_pi_state *pi_state)
 	 * and has cleaned up the pi_state already
 	 */
 	if (pi_state->owner) {
-		raw_spin_lock_irq(&pi_state->owner->pi_lock);
-		list_del_init(&pi_state->list);
-		raw_spin_unlock_irq(&pi_state->owner->pi_lock);
+		struct task_struct *owner;
 
-		rt_mutex_proxy_unlock(&pi_state->pi_mutex, pi_state->owner);
+		raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
+		owner = pi_state->owner;
+		if (owner) {
+			raw_spin_lock(&owner->pi_lock);
+			list_del_init(&pi_state->list);
+			raw_spin_unlock(&owner->pi_lock);
+		}
+		rt_mutex_proxy_unlock(&pi_state->pi_mutex, owner);
+		raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
 	}
 
-	if (current->pi_state_cache)
+	if (current->pi_state_cache) {
 		kfree(pi_state);
-	else {
+	} else {
 		/*
 		 * pi_state->list is already empty.
 		 * clear pi_state->owner.
@@ -903,14 +907,15 @@ void exit_pi_state_list(struct task_struct *curr)
 		raw_spin_unlock_irq(&curr->pi_lock);
 
 		spin_lock(&hb->lock);
-
-		raw_spin_lock_irq(&curr->pi_lock);
+		raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
+		raw_spin_lock(&curr->pi_lock);
 		/*
 		 * We dropped the pi-lock, so re-check whether this
 		 * task still owns the PI-state:
 		 */
 		if (head->next != next) {
-			raw_spin_unlock_irq(&curr->pi_lock);
+			raw_spin_unlock(&curr->pi_lock);
+			raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
 			spin_unlock(&hb->lock);
 			raw_spin_lock_irq(&curr->pi_lock);
 			continue;
@@ -920,9 +925,10 @@ void exit_pi_state_list(struct task_struct *curr)
 		WARN_ON(list_empty(&pi_state->list));
 		list_del_init(&pi_state->list);
 		pi_state->owner = NULL;
-		raw_spin_unlock_irq(&curr->pi_lock);
+		raw_spin_unlock(&curr->pi_lock);
 
 		get_pi_state(pi_state);
+		raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
 		spin_unlock(&hb->lock);
 
 		rt_mutex_futex_unlock(&pi_state->pi_mutex);
@@ -1204,6 +1210,10 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key,
 	WARN_ON(!list_empty(&pi_state->list));
 	list_add(&pi_state->list, &p->pi_state_list);
+	/*
+	 * Assignment without holding pi_state->pi_mutex.wait_lock is safe
+	 * because there is no concurrency as the object is not published yet.
+	 */
 	pi_state->owner = p;
 	raw_spin_unlock_irq(&p->pi_lock);
--
2.18.0
[PATCH RT 14/22] Revert "rt,ntp: Move call to schedule_delayed_work() to helper thread"
From: Sebastian Andrzej Siewior

4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---

I've been looking at this in v3.10-RT, where it got in. The patch description says:

|The ntp code for notify_cmos_timer() is called from a hard interrupt
|context.

I see only one caller of ntp_notify_cmos_timer(), and that is do_adjtimex() after "raw_spin_unlock_irqrestore()". I see a few callers of do_adjtimex(), which are SYS_adjtimex() (+compat) and posix_clock_realtime_adj(), which in turn is called by SYS_clock_adjtime().

Reverting the patch.

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior
(cherry picked from commit 932c5783d4434250a1019f49ae81b80731dfd4cd)
Signed-off-by: Julia Cartwright
---
 kernel/time/ntp.c | 26 --------------------------
 1 file changed, 26 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 05b7391bf9bd..6df8927c58a5 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -17,7 +17,6 @@
 #include
 #include
 #include
-#include <linux/swork.h>
 
 #include "ntp_internal.h"
 #include "timekeeping_internal.h"
@@ -569,35 +568,10 @@ static void sync_cmos_clock(struct work_struct *work)
 			   &sync_cmos_work, timespec64_to_jiffies(&next));
 }
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-
-static void run_clock_set_delay(struct swork_event *event)
-{
-	queue_delayed_work(system_power_efficient_wq, &sync_cmos_work, 0);
-}
-
-static struct swork_event ntp_cmos_swork;
-
-void ntp_notify_cmos_timer(void)
-{
-	swork_queue(&ntp_cmos_swork);
-}
-
-static __init int create_cmos_delay_thread(void)
-{
-	WARN_ON(swork_get());
-	INIT_SWORK(&ntp_cmos_swork, run_clock_set_delay);
-	return 0;
-}
-early_initcall(create_cmos_delay_thread);
-
-#else
-
 void ntp_notify_cmos_timer(void)
 {
 	queue_delayed_work(system_power_efficient_wq, &sync_cmos_work, 0);
 }
-#endif /* CONFIG_PREEMPT_RT_FULL */
 
 #else
 void ntp_notify_cmos_timer(void) { }
--
2.18.0
[PATCH RT 13/22] net: use task_struct instead of CPU number as the queue owner on -RT
From: Sebastian Andrzej Siewior

4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---

In commit ("net: move xmit_recursion to per-task variable on -RT") the recursion level was changed to be per-task, since we can get preempted in BH on -RT. The lock owner should consequently be recorded as the task that holds the lock, and not the CPU. Otherwise we trigger the "Dead loop on virtual device" warning on SMP systems.

Cc: stable...@vger.kernel.org
Reported-by: Kurt Kanzenbach
Tested-by: Kurt Kanzenbach
Signed-off-by: Sebastian Andrzej Siewior
(cherry picked from commit d3a66ffd1c4f0253076069b10a8223e7b6e80e38)
Signed-off-by: Julia Cartwright
---
 include/linux/netdevice.h | 54 +++++++++++++++++++++++++++++++++++++++-------
 net/core/dev.c            |  6 +++++-
 2 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 85fc72b8a92b..6f1a3f286c4b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -594,7 +594,11 @@ struct netdev_queue {
 * write-mostly part
 */
 	spinlock_t		_xmit_lock ____cacheline_aligned_in_smp;
+#ifdef CONFIG_PREEMPT_RT_FULL
+	struct task_struct	*xmit_lock_owner;
+#else
 	int			xmit_lock_owner;
+#endif
 	/*
 	 * Time (in jiffies) of last Tx
 	 */
@@ -3610,41 +3614,79 @@ static inline u32 netif_msg_init(int debug_value, int default_msg_enable_bits)
 	return (1 << debug_value) - 1;
 }
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+static inline void netdev_queue_set_owner(struct netdev_queue *txq, int cpu)
+{
+	txq->xmit_lock_owner = current;
+}
+
+static inline void netdev_queue_clear_owner(struct netdev_queue *txq)
+{
+	txq->xmit_lock_owner = NULL;
+}
+
+static inline bool netdev_queue_has_owner(struct netdev_queue *txq)
+{
+	if (txq->xmit_lock_owner != NULL)
+		return true;
+	return false;
+}
+
+#else
+
+static inline void netdev_queue_set_owner(struct netdev_queue *txq, int cpu)
+{
+	txq->xmit_lock_owner = cpu;
+}
+
+static inline void netdev_queue_clear_owner(struct netdev_queue *txq)
+{
+	txq->xmit_lock_owner = -1;
+}
+
+static inline bool netdev_queue_has_owner(struct netdev_queue *txq)
+{
+	if (txq->xmit_lock_owner != -1)
+		return true;
+	return false;
+}
+#endif
+
 static inline void __netif_tx_lock(struct netdev_queue *txq, int cpu)
 {
 	spin_lock(&txq->_xmit_lock);
-	txq->xmit_lock_owner = cpu;
+	netdev_queue_set_owner(txq, cpu);
 }
 
 static inline void __netif_tx_lock_bh(struct netdev_queue *txq)
 {
 	spin_lock_bh(&txq->_xmit_lock);
-	txq->xmit_lock_owner = smp_processor_id();
+	netdev_queue_set_owner(txq, smp_processor_id());
 }
 
 static inline bool __netif_tx_trylock(struct netdev_queue *txq)
 {
 	bool ok = spin_trylock(&txq->_xmit_lock);
 	if (likely(ok))
-		txq->xmit_lock_owner = smp_processor_id();
+		netdev_queue_set_owner(txq, smp_processor_id());
 	return ok;
 }
 
 static inline void __netif_tx_unlock(struct netdev_queue *txq)
 {
-	txq->xmit_lock_owner = -1;
+	netdev_queue_clear_owner(txq);
 	spin_unlock(&txq->_xmit_lock);
 }
 
 static inline void __netif_tx_unlock_bh(struct netdev_queue *txq)
 {
-	txq->xmit_lock_owner = -1;
+	netdev_queue_clear_owner(txq);
 	spin_unlock_bh(&txq->_xmit_lock);
 }
 
 static inline void txq_trans_update(struct netdev_queue *txq)
 {
-	if (txq->xmit_lock_owner != -1)
+	if (netdev_queue_has_owner(txq))
 		txq->trans_start = jiffies;
 }
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 93995575d23a..e7dc4700e463 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3449,7 +3449,11 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
 	if (dev->flags & IFF_UP) {
 		int cpu = smp_processor_id(); /* ok because BHs are off */
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+		if (txq->xmit_lock_owner != current) {
+#else
 		if (txq->xmit_lock_owner != cpu) {
+#endif
 
 			if (unlikely(xmit_rec_read() > XMIT_RECURSION_LIMIT))
 				goto recursion_alert;
@@ -7168,7 +7172,7 @@ static void netdev_init_one_queue(struct net_device *dev,
 	/* Initialize queue lock */
 	spin_lock_init(&queue->_xmit_lock);
 	netdev_set_xmit_lockdep_class(&queue->_xmit_lock, dev->type);
-	queue->xmit_lock_owner = -1;
+	netdev_queue_clear_owner(queue);
 	netdev_queue_numa_node_write(queue, NUMA_NO_NODE);
 	queue->dev = dev;
 #ifdef CONFIG_BQL
--
2.18.0
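The owner-tracking split above can be sketched as plain C. The `txq` struct and the `PREEMPT_RT_FULL` macro here are hypothetical stand-ins (the kernel versions operate on struct netdev_queue and struct task_struct); the point is that on RT the "owner" identity must survive a migration between CPUs, so a task pointer is the right key, while elsewhere a CPU number suffices:

```c
#include <stdbool.h>
#include <stddef.h>

#ifdef PREEMPT_RT_FULL
/* RT: a preemptible lock holder can migrate, so track the task. */
struct txq { void *xmit_lock_owner; };	/* struct task_struct * in the kernel */

static void txq_set_owner(struct txq *q, void *task) { q->xmit_lock_owner = task; }
static void txq_clear_owner(struct txq *q)           { q->xmit_lock_owner = NULL; }
static bool txq_has_owner(const struct txq *q)       { return q->xmit_lock_owner != NULL; }
#else
/* !RT: preemption is off inside the section, so the CPU id is stable. */
struct txq { int xmit_lock_owner; };	/* owning CPU, -1 when unlocked */

static void txq_set_owner(struct txq *q, int cpu) { q->xmit_lock_owner = cpu; }
static void txq_clear_owner(struct txq *q)        { q->xmit_lock_owner = -1; }
static bool txq_has_owner(const struct txq *q)    { return q->xmit_lock_owner != -1; }
#endif
```

Hiding the representation behind set/clear/has helpers, as the patch does, keeps every call site identical across the two configurations; only the recursion check in `__dev_queue_xmit()` still needs an explicit `#ifdef`.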
[PATCH RT 05/22] rtmutex: Make rt_mutex_futex_unlock() safe for irq-off callsites
From: Boqun Feng

4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---

[ Upstream commit 6b0ef92fee2a3189eba6d6b827b247cb4f6da7e9 ]

When running rcutorture with the TREE03 config, CONFIG_PROVE_LOCKING=y, and kernel cmdline argument "rcutorture.gp_exp=1", lockdep reports a HARDIRQ-safe -> HARDIRQ-unsafe deadlock:

 ===
 WARNING: inconsistent lock state
 4.16.0-rc4+ #1 Not tainted
 inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
 takes:
  (&rq->lock), at: __schedule+0xbe/0xaf0
 {IN-HARDIRQ-W} state was registered at:
   _raw_spin_lock+0x2a/0x40
   scheduler_tick+0x47/0xf0
 ...
 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&rq->lock);
   <Interrupt>
     lock(&rq->lock);

  *** DEADLOCK ***

 1 lock held by rcu_torture_rea/724:
  rcu_torture_read_lock+0x0/0x70

 stack backtrace:
 CPU: 2 PID: 724 Comm: rcu_torture_rea Not tainted 4.16.0-rc4+ #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
 Call Trace:
  lock_acquire+0x90/0x200
  ? __schedule+0xbe/0xaf0
  _raw_spin_lock+0x2a/0x40
  ? __schedule+0xbe/0xaf0
  __schedule+0xbe/0xaf0
  preempt_schedule_irq+0x2f/0x60
  retint_kernel+0x1b/0x2d
 RIP: 0010:rcu_read_unlock_special+0x0/0x680
  ? rcu_torture_read_unlock+0x60/0x60
  __rcu_read_unlock+0x64/0x70
  rcu_torture_read_unlock+0x17/0x60
  rcu_torture_reader+0x275/0x450
  ? rcutorture_booster_init+0x110/0x110
  ? rcu_torture_stall+0x230/0x230
  ? kthread+0x10e/0x130
  kthread+0x10e/0x130
  ? kthread_create_worker_on_cpu+0x70/0x70
  ? call_usermodehelper_exec_async+0x11a/0x150
  ret_from_fork+0x3a/0x50

This happens with the following event sequence:

	preempt_schedule_irq();
	  local_irq_enable();
	  __schedule():
	    local_irq_disable(); // irq off
	    ...
	    rcu_note_context_switch():
	      rcu_note_preempt_context_switch():
	        rcu_read_unlock_special():
	          local_irq_save(flags);
	          ...
	          raw_spin_unlock_irqrestore(...,flags); // irq remains off
	          rt_mutex_futex_unlock():
	            raw_spin_lock_irq();
	            ...
	            raw_spin_unlock_irq(); // accidentally set irq on
	    rq_lock():
	      raw_spin_lock(); // acquiring rq->lock with irq on

which means rq->lock becomes a HARDIRQ-unsafe lock, which can cause deadlocks in scheduler code.

This problem was introduced by commit 02a7c234e540 ("rcu: Suppress lockdep false-positive ->boost_mtx complaints"). That brought the user of rt_mutex_futex_unlock() with irq off.

To fix this, replace the *lock_irq() in rt_mutex_futex_unlock() with *lock_irq{save,restore}() to make it safe to call rt_mutex_futex_unlock() with irq off.

Fixes: 02a7c234e540 ("rcu: Suppress lockdep false-positive ->boost_mtx complaints")
Signed-off-by: Boqun Feng
Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Lai Jiangshan
Cc: Steven Rostedt
Cc: Josh Triplett
Cc: Mathieu Desnoyers
Cc: "Paul E . McKenney"
Link: https://lkml.kernel.org/r/20180309065630.8283-1-boqun.f...@gmail.com
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Julia Cartwright
---
 kernel/locking/rtmutex.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 57361d631749..5e15f5c73637 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -2213,11 +2213,12 @@ void __sched rt_mutex_futex_unlock(struct rt_mutex *lock)
 {
 	WAKE_Q(wake_q);
 	WAKE_Q(wake_sleeper_q);
+	unsigned long flags;
 	bool postunlock;
 
-	raw_spin_lock_irq(&lock->wait_lock);
+	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 	postunlock = __rt_mutex_futex_unlock(lock, &wake_q, &wake_sleeper_q);
-	raw_spin_unlock_irq(&lock->wait_lock);
+	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
 
 	if (postunlock)
 		rt_mutex_postunlock(&wake_q, &wake_sleeper_q);
--
2.18.0
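Why the _irqsave/_irqrestore variants matter here can be shown with a toy model of the interrupt-enable flag. This is purely illustrative (the real flag is per-CPU hardware state, not a global bool): the plain `_irq` pair unconditionally re-enables interrupts on unlock, while the save/restore pair hands back exactly the state the caller entered with:

```c
#include <stdbool.h>

static bool irqs_enabled = true;	/* simulated interrupt-enable flag */

/* plain variants: blindly disable on lock, blindly enable on unlock */
static void lock_irq(void)   { irqs_enabled = false; }
static void unlock_irq(void) { irqs_enabled = true; }

/* save/restore variants: remember and reinstate the caller's state */
static void lock_irqsave(bool *flags)     { *flags = irqs_enabled; irqs_enabled = false; }
static void unlock_irqrestore(bool flags) { irqs_enabled = flags; }

/* Run a lock/unlock cycle starting with interrupts OFF (as in the
 * rcu_read_unlock_special() path above) and report the final state. */
static bool irqs_on_after_unlock(bool use_irqsave)
{
	bool flags;

	irqs_enabled = false;		/* caller runs with irqs off */
	if (use_irqsave) {
		lock_irqsave(&flags);
		unlock_irqrestore(flags);
	} else {
		lock_irq();
		unlock_irq();
	}
	return irqs_enabled;
}
```

The first assertion below is exactly the bug in the commit message: called with interrupts already off, the plain unlock "accidentally" turns them back on, and the subsequent `rq->lock` acquisition happens with interrupts enabled.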
[PATCH RT 20/22] squashfs: make use of local lock in multi_cpu decompressor
4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know.

--- 8< --- 8< --- 8< ---

Currently, the squashfs multi_cpu decompressor makes use of get_cpu_ptr()/put_cpu_ptr(), which unconditionally disable preemption during decompression. Because the workload is distributed across CPUs, all CPUs can observe a very high wakeup latency, which has been seen to be as much as 8000us.

Convert this decompressor to make use of a local lock, which will allow execution of the decompressor with preemption enabled, but also ensure that concurrent accesses to the percpu compressor data on the local CPU will be serialized.

Cc: stable...@vger.kernel.org
Reported-by: Alexander Stein
Tested-by: Alexander Stein
Signed-off-by: Julia Cartwright
Signed-off-by: Sebastian Andrzej Siewior
(cherry picked from commit c160736542d7b3d67da32848d2f028b8e35730e5)
Signed-off-by: Julia Cartwright
---
 fs/squashfs/decompressor_multi_percpu.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/squashfs/decompressor_multi_percpu.c b/fs/squashfs/decompressor_multi_percpu.c
index 23a9c28ad8ea..6a73c4fa88e7 100644
--- a/fs/squashfs/decompressor_multi_percpu.c
+++ b/fs/squashfs/decompressor_multi_percpu.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include <linux/locallock.h>
 
 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
@@ -25,6 +26,8 @@ struct squashfs_stream {
 	void		*stream;
 };
 
+static DEFINE_LOCAL_IRQ_LOCK(stream_lock);
+
 void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
 						void *comp_opts)
 {
@@ -79,10 +82,15 @@ int squashfs_decompress(struct squashfs_sb_info *msblk, struct buffer_head **bh,
 {
 	struct squashfs_stream __percpu *percpu =
 			(struct squashfs_stream __percpu *) msblk->stream;
-	struct squashfs_stream *stream = get_cpu_ptr(percpu);
-	int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
-		offset, length, output);
-	put_cpu_ptr(stream);
+	struct squashfs_stream *stream;
+	int res;
+
+	stream = get_locked_ptr(stream_lock, percpu);
+
+	res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
+					      offset, length, output);
+
+	put_locked_ptr(stream_lock, stream);
 
 	if (res < 0)
 		ERROR("%s decompression failed, data probably corrupt\n",
--
2.18.0
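The idea of the conversion — swap preemption-disable for a sleeping lock around the per-CPU data — can be sketched with a pthread mutex. Everything here is a userspace stand-in: `get_locked_stream`/`put_locked_stream` are hypothetical names mirroring the shape of get_locked_ptr()/put_locked_ptr(), and a single stream plus one mutex stand in for the per-CPU instances:

```c
#include <pthread.h>

struct stream { int busy; };	/* stand-in for the percpu decompressor state */

static pthread_mutex_t stream_lock = PTHREAD_MUTEX_INITIALIZER;
static struct stream the_stream;

/* Acquire exclusive access to the stream.  Unlike preempt_disable(),
 * the holder remains preemptible: a higher-priority task can still run,
 * and waiters sleep instead of spinning. */
static struct stream *get_locked_stream(void)
{
	pthread_mutex_lock(&stream_lock);
	the_stream.busy = 1;
	return &the_stream;
}

static void put_locked_stream(struct stream *s)
{
	s->busy = 0;
	pthread_mutex_unlock(&stream_lock);
}
```

The trade-off is exactly the one the commit message describes: the lock still serializes concurrent use of the (per-CPU) stream, but a long-running decompression no longer pins the CPU against preemption, so it stops inflating wakeup latency for unrelated real-time tasks.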
[PATCH RT 08/22] sched, tracing: Fix trace_sched_pi_setprio() for deboosting
From: Sebastian Andrzej Siewior 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- [ Upstream commit 4ff648decf4712d39f184fc2df3163f43975575a ] Since the following commit: b91473ff6e97 ("sched,tracing: Update trace_sched_pi_setprio()") the sched_pi_setprio trace point shows the "newprio" during a deboost: |futex sched_pi_setprio: comm=futex_requeue_p pid=2234 oldprio=98 newprio=98 |futex sched_switch: prev_comm=futex_requeue_p prev_pid=2234 prev_prio=120 This patch open codes __rt_effective_prio() in the tracepoint as the 'newprio' to get the old behaviour back / the correct priority: |futex sched_pi_setprio: comm=futex_requeue_p pid=2220 oldprio=98 newprio=120 |futex sched_switch: prev_comm=futex_requeue_p prev_pid=2220 prev_prio=120 Peter suggested to open code the new priority so people using tracehook could get the deadline data out. Reported-by: Mansky Christian Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Steven Rostedt Cc: Thomas Gleixner Fixes: b91473ff6e97 ("sched,tracing: Update trace_sched_pi_setprio()") Link: http://lkml.kernel.org/r/20180524132647.gg6ziuogczdmj...@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Julia Cartwright --- include/trace/events/sched.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index 516ae88cddf4..742682079acf 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -429,7 +429,9 @@ TRACE_EVENT(sched_pi_setprio, memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN); __entry->pid = tsk->pid; __entry->oldprio = tsk->prio; - __entry->newprio = pi_task ? pi_task->prio : tsk->prio; + __entry->newprio = pi_task ? + min(tsk->normal_prio, pi_task->prio) : + tsk->normal_prio; /* XXX SCHED_DEADLINE bits missing */ ), -- 2.18.0
[PATCH RT 03/22] futex: Avoid violating the 10th rule of futex
From: Peter Zijlstra 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- [ Upstream commit c1e2f0eaf015fb7076d51a339011f2383e6dd389 ] Julia reported futex state corruption in the following scenario: waiter waker stealer (prio > waiter) futex(WAIT_REQUEUE_PI, uaddr, uaddr2, timeout=[N ms]) futex_wait_requeue_pi() futex_wait_queue_me() freezable_schedule() futex(LOCK_PI, uaddr2) futex(CMP_REQUEUE_PI, uaddr, uaddr2, 1, 0) /* requeues waiter to uaddr2 */ futex(UNLOCK_PI, uaddr2) wake_futex_pi() cmp_futex_value_locked(uaddr2, waiter) wake_up_q() <woken by waker> <hrtimer_wakeup() fires, clears sleeper->task> futex(LOCK_PI, uaddr2) __rt_mutex_start_proxy_lock() try_to_take_rt_mutex() /* steals lock */ rt_mutex_set_owner(lock, stealer) rt_mutex_wait_proxy_lock() __rt_mutex_slowlock() try_to_take_rt_mutex() /* fails, lock held by stealer */ if (timeout && !timeout->task) return -ETIMEDOUT; fixup_owner() /* lock wasn't acquired, so, fixup_pi_state_owner skipped */ return -ETIMEDOUT; /* At this point, we've returned -ETIMEDOUT to userspace, but the * futex word shows waiter to be the owner, and the pi_mutex has * stealer as the owner */ futex_lock(LOCK_PI, uaddr2) -> bails with EDEADLK, futex word says we're owner. And suggested that what commit: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state") removes from fixup_owner() looks to be just what is needed. And indeed it is -- I completely missed that requeue_pi could also result in this case. So we need to restore that, except that subsequent patches, like commit: 16ffa12d7425 ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock") changed all the locking rules. 
Even without that, the sequence: - if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) { - locked = 1; - goto out; - } - raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock); - owner = rt_mutex_owner(&q->pi_state->pi_mutex); - if (!owner) - owner = rt_mutex_next_owner(&q->pi_state->pi_mutex); - raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock); - ret = fixup_pi_state_owner(uaddr, q, owner); already suggests there were races; otherwise we'd never have to look at next_owner. So instead of doing 3 consecutive wait_lock sections with who knows what races, we do it all in a single section. Additionally, the usage of pi_state->owner in fixup_owner() was only safe because only the rt_mutex owner would modify it, which this additional case wrecks. Luckily the values can only change away and not to the value we're testing; this means we can do a speculative test and double check once we have the wait_lock. Fixes: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state") Reported-by: Julia Cartwright Reported-by: Gratian Crisan Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Tested-by: Julia Cartwright Tested-by: Gratian Crisan Cc: Darren Hart Cc: sta...@vger.kernel.org Link: https://lkml.kernel.org/r/20171208124939.7livp7no2ov65...@hirez.programming.kicks-ass.net Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Julia Cartwright --- kernel/futex.c | 83 ++--- kernel/locking/rtmutex.c | 26 --- kernel/locking/rtmutex_common.h | 1 + 3 files changed, 87 insertions(+), 23 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index 270148be5647..cdd68ba6e3a6 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -2287,21 +2287,17 @@ static void unqueue_me_pi(struct futex_q *q) spin_unlock(q->lock_ptr); } -/* - * Fixup the pi_state owner with the
[PATCH RT 00/22] Linux 4.9.115-rt94-rc1
Hello RT folks! This patchset brings back many RT-specific fixes that have gone into subsequent 4.14-rt and 4.16-rt releases. One of my x86 boxes very intermittently triggers a WARN_ON() on bootup in migrate_enable(), which I'm still trying to triage. If you can more reliably reproduce this, please let me know. This release candidate will not be pushed to the git tree. To build 4.9.115-rt94-rc1 directly, the following patches should be applied: http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.9.115.xz http://www.kernel.org/pub/linux/kernel/projects/rt/4.9/patch-4.9.115-rt94-rc1.patch.xz If all goes well with testing, this rc will be promoted to an official release on 8/16/2018. Please go forth and test! Thanks, Julia --- Boqun Feng (1): rtmutex: Make rt_mutex_futex_unlock() safe for irq-off callsites Julia Cartwright (4): locallock: provide {get,put}_locked_ptr() variants squashfs: make use of local lock in multi_cpu decompressor seqlock: provide the same ordering semantics as mainline Linux 4.9.115-rt94-rc1 Paul E. McKenney (1): rcu: Suppress lockdep false-positive ->boost_mtx complaints Peter Zijlstra (4): futex: Fix pi_state->owner serialization futex: Fix more put_pi_state() vs. 
exit_pi_state_list() races futex: Avoid violating the 10th rule of futex futex: Fix OWNER_DEAD fixup Sebastian Andrzej Siewior (12): rcu: Do not include rtmutex_common.h unconditionally sched, tracing: Fix trace_sched_pi_setprio() for deboosting crypto: limit more FPU-enabled sections arm*: disable NEON in kernel mode mm/slub: close possible memory-leak in kmem_cache_alloc_bulk() locking: add types.h net: use task_struct instead of CPU number as the queue owner on -RT Revert "rt,ntp: Move call to schedule_delayed_work() to helper thread" Revert "block: blk-mq: Use swait" block: blk-mq: move blk_queue_usage_counter_release() into process context alarmtimer: Prevent live lock in alarm_cancel() posix-timers: move the rcu head out of the union arch/arm/Kconfig | 2 +- arch/arm64/crypto/Kconfig | 14 +- arch/x86/crypto/camellia_aesni_avx2_glue.c | 20 +++ arch/x86/crypto/camellia_aesni_avx_glue.c | 19 +++ arch/x86/crypto/cast6_avx_glue.c | 24 +++- arch/x86/crypto/chacha20_glue.c| 9 +- arch/x86/crypto/serpent_avx2_glue.c| 19 +++ arch/x86/crypto/serpent_avx_glue.c | 23 +++- arch/x86/crypto/serpent_sse2_glue.c| 23 +++- arch/x86/crypto/twofish_avx_glue.c | 27 +++- arch/x86/include/asm/fpu/api.h | 1 + arch/x86/kernel/fpu/core.c | 12 ++ block/blk-core.c | 22 +++- block/blk-mq.c | 6 +- fs/squashfs/decompressor_multi_percpu.c| 16 ++- include/linux/blkdev.h | 4 +- include/linux/locallock.h | 10 ++ include/linux/netdevice.h | 54 +++- include/linux/posix-timers.h | 2 +- include/linux/seqlock.h| 1 + include/linux/spinlock_types_raw.h | 2 + include/trace/events/sched.h | 4 +- kernel/futex.c | 144 - kernel/locking/rtmutex.c | 31 +++-- kernel/locking/rtmutex_common.h| 1 + kernel/rcu/tree_plugin.h | 5 +- kernel/time/alarmtimer.c | 2 +- kernel/time/ntp.c | 26 kernel/time/posix-timers.c | 4 +- localversion-rt| 2 +- mm/slub.c | 1 + net/core/dev.c | 6 +- 32 files changed, 412 insertions(+), 124 deletions(-) -- 2.18.0
[PATCH RT 15/22] Revert "block: blk-mq: Use swait"
From: Sebastian Andrzej Siewior 4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- This reverts commit "block: blk-mq: Use swait". The issue remains but will be fixed differently. Cc: stable...@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior (cherry-picked from ca3fd6cf836739fd59eac2f7a9b0261365e818bb) Signed-off-by: Julia Cartwright --- block/blk-core.c | 10 +- block/blk-mq.c | 6 +++--- include/linux/blkdev.h | 2 +- 3 files changed, 9 insertions(+), 9 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index e4ac43392875..87d3e0a503e5 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -662,9 +662,9 @@ int blk_queue_enter(struct request_queue *q, bool nowait) if (nowait) return -EBUSY; - swait_event(q->mq_freeze_wq, - !atomic_read(&q->mq_freeze_depth) || - blk_queue_dying(q)); + wait_event(q->mq_freeze_wq, + !atomic_read(&q->mq_freeze_depth) || + blk_queue_dying(q)); if (blk_queue_dying(q)) return -ENODEV; } @@ -680,7 +680,7 @@ static void blk_queue_usage_counter_release(struct percpu_ref *ref) struct request_queue *q = container_of(ref, struct request_queue, q_usage_counter); - swake_up_all(&q->mq_freeze_wq); + wake_up_all(&q->mq_freeze_wq); } static void blk_rq_timed_out_timer(unsigned long data) @@ -750,7 +750,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) q->bypass_depth = 1; __set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags); - init_swait_queue_head(&q->mq_freeze_wq); + init_waitqueue_head(&q->mq_freeze_wq); /* * Init percpu_ref in atomic mode so that it's faster to shutdown. 
diff --git a/block/blk-mq.c b/block/blk-mq.c index e0a804ab5420..3a49552974ec 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -72,7 +72,7 @@ EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_start); static void blk_mq_freeze_queue_wait(struct request_queue *q) { - swait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter)); + wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter)); } /* @@ -110,7 +110,7 @@ void blk_mq_unfreeze_queue(struct request_queue *q) WARN_ON_ONCE(freeze_depth < 0); if (!freeze_depth) { percpu_ref_reinit(&q->q_usage_counter); - swake_up_all(&q->mq_freeze_wq); + wake_up_all(&q->mq_freeze_wq); } } EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue); @@ -129,7 +129,7 @@ void blk_mq_wake_waiters(struct request_queue *q) * dying, we need to ensure that processes currently waiting on * the queue are notified as well. */ - swake_up_all(&q->mq_freeze_wq); + wake_up_all(&q->mq_freeze_wq); } bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 34fd1ed9845e..fdb449fe3ff7 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -468,7 +468,7 @@ struct request_queue { struct throtl_data *td; #endif struct rcu_head rcu_head; - struct swait_queue_head mq_freeze_wq; + wait_queue_head_t mq_freeze_wq; struct percpu_ref q_usage_counter; struct list_head all_q_node; -- 2.18.0
[PATCH RT 22/22] Linux 4.9.115-rt94-rc1
4.9.115-rt94-rc1 stable review patch. If you have any objection to the inclusion of this patch, let me know. --- 8< --- 8< --- 8< --- --- localversion-rt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/localversion-rt b/localversion-rt index e98a1fe050bd..dcc2fd2ca155 100644 --- a/localversion-rt +++ b/localversion-rt @@ -1 +1 @@ --rt93 +-rt94-rc1 -- 2.18.0
Re: simple-framebuffer enquire
On Tue, Jun 26, 2018 at 08:44:58PM +0200, Hans de Goede wrote: > Hi, > > On 26-06-18 18:35, Michael Nazzareno Trimarchi wrote: [..] > > cat memblock/reserved > > 0: 0x80004000..0x80007fff > > 1: 0x8010..0x81e030b3 > > 2: 0x8300..0x83007fff > > 3: 0x8400..0x85ff > > 4: 0x86fa2000..0x87021fff > > > > + reserved-memory { > > + #address-cells = <1>; > > + #size-cells = <1>; > > + ranges; > > + > > + display_reserved: framebuffer@86fa2000 { > > + reg = <0x86fa2000 0x8>; > > + }; > > + > > + }; [..] > > Still have the same on ioremap. > > Hmm, I guess the kernel does map the entire region its get > passed and simply makes sure to not touch the reserved mem, > where as with the changes to the passed in mem-region the > sunxi u-boot code does the memory does not get mapped by > the kernel at all ? If the intent is to reserve memory _and_ prevent it from being included in the kernel's linear map, then it is also necessary to include the 'no-map' property for this reserved-mem node. From Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt: no-map (optional) - empty property - Indicates the operating system must not create a virtual mapping of the region as part of its standard mapping of system memory, nor permit speculative access to it under any circumstances other than under the control of the device driver using the region. Julia
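For completeness, a sketch of how the quoted node would look with the property added (the size cell here is illustrative, since the real value is truncated in the quoted mail):

```dts
reserved-memory {
	#address-cells = <1>;
	#size-cells = <1>;
	ranges;

	display_reserved: framebuffer@86fa2000 {
		/* size is illustrative; use the real framebuffer size */
		reg = <0x86fa2000 0x80000>;
		/* keep the region out of the kernel's linear map */
		no-map;
	};
};
```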
[ANNOUNCE] 4.9.98-rt76
Hello RT Folks! I'm pleased to announce the 4.9.98-rt76 stable release. This release is just an update to the new stable 4.9.98 version and no RT specific changes have been made. Expect a 4.9.98-rt77-rc1 with backports from rt-devel soon. You can get this release via the git tree at: git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git branch: v4.9-rt Head SHA1: bc64e52891b8b9e56b19cbb800a1b8c415df13e8 Or to build 4.9.98-rt76 directly, the following patches should be applied: http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.9.98.xz http://www.kernel.org/pub/linux/kernel/projects/rt/4.9/patch-4.9.98-rt76.patch.xz Enjoy! Julia
Re: [LINUX PATCH v8 2/2] memory: pl353: Add driver for arm pl353 static memory controller
On Mon, May 07, 2018 at 10:12:28AM +, Naga Sureshkumar Relli wrote: > Hi Julia, > > Thanks for reviewing the patch and Sorry for my late reply. This patch > went to junk folder, hence I didn't catch this patch. > > From: Julia Cartwright [mailto:ju...@ni.com] [..] > > > > It would be easier to follow if you constructed your two patchsets with git > > format-patch -- > > thread. > > > > I am using the same but with out --thread. > > > Or, merge them into a single patchset, especially considering the > > dependency between patches. > > But both are different patches, one for Device tree documentation and other > for driver update. Yes, I'm not proposing you merge _patches_ but _patchsets_. You have two patchsets, one for the SMC driver, and another for the NAND. Given that they depend on one another, it's helpful for reviewers if you send them all together, with a cover letter which describes the entire patchset, changelog, its dependencies, revision changelog, etc. Something like: [PATCH v9 0/4] rawnand: memory: add support for PL353 static memory controller + NAND [PATCH v9 1/4] Devicetree: Add pl353 smc controller devicetree binding information [PATCH v9 2/4] memory: pl353: Add driver for arm pl353 static memory controller [PATCH v9 3/4] Documentation: nand: pl353: Add documentation for controller and driver [PATCH v9 4/4] mtd: rawnand: pl353: Add basic driver for arm pl353 smc nand interface Anyway, even just enabling --thread would be an improvement. [..] > > > --- a/drivers/memory/Kconfig > > > +++ b/drivers/memory/Kconfig > > > @@ -152,6 +152,13 @@ config DA8XX_DDRCTL > > > This driver is for the DDR2/mDDR Memory Controller present on > > > Texas Instruments da8xx SoCs. It's used to tweak various memory > > > controller configuration options. > > > +config PL35X_SMC > > > + bool "ARM PL35X Static Memory Controller(SMC) driver" > > > > Is there any reason why this can't be tristate? > > There is a Nand driver which uses this driver. 
i.e. the NAND driver > depends on this driver. That's true, but it's irrelevant to the question I asked. It is perfectly valid for both the SMC and NAND drivers to be tristate; why are you not allowing this configuration? Julia
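Concretely, the suggestion is to let both symbols be built as modules; a sketch of what the Kconfig entries could look like (the NAND symbol name and dependency lines are illustrative, taken from the patchset titles rather than from the actual patches):

```kconfig
config PL35X_SMC
	tristate "ARM PL35X Static Memory Controller(SMC) driver"
	help
	  Driver for the ARM PL353 static memory controller.
	  Say M here to build it as a module.

# hypothetical dependent symbol, shown only to illustrate that a
# tristate driver can depend on another tristate driver
config MTD_NAND_PL35X
	tristate "ARM PL353 SMC NAND interface support"
	depends on PL35X_SMC
```

Kconfig handles the dependency automatically: if PL35X_SMC=m, the NAND symbol is restricted to m or n, so the bool restriction is unnecessary.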
Re: [LINUX PATCH v8 2/2] memory: pl353: Add driver for arm pl353 static memory controller
On Mon, May 07, 2018 at 10:12:28AM +, Naga Sureshkumar Relli wrote: > Hi Julia, > > Thanks for reviewing the patch and Sorry for my late reply. This patch > went to junk folder, hence I didn't catch this patch. > > From: Julia Cartwright [mailto:ju...@ni.com] [..] > > > > It would be easier to follow if you constructed your two patchsets with git > > format-patch -- > > thread. > > > > I am using the same but with out --thread. > > > Or, merge them into a single patchset, especially considering the > > dependency between patches. > > But both are different patches, one for Device tree documentation and other > for driver update. Yes, I'm not proposing you merge _patches_ but _patchsets_. You have two patchsets, one for the SMC driver, and another for the NAND. Given that they depend on one another, it's helpful for reviewers if you sent them all together, with a cover letter which describes the entire patchset, changelog, it's dependencies, revision changelog, etc. Something like: [PATCH v9 0/4] rawnand: memory: add support for PL353 static memory controller + NAND [PATCH v9 1/4] Devicetree: Add pl353 smc controller devicetree binding information [PATCH v9 2/4] memory: pl353: Add driver for arm pl353 static memory controller [PATCH v9 3/4] Documentation: nand: pl353: Add documentation for controller and driver [PATCH v9 4/4] mtd: rawnand: pl353: Add basic driver for arm pl353 smc nand interface Anyway, just with --thread enabled would be an improvement. [..] > > > --- a/drivers/memory/Kconfig > > > +++ b/drivers/memory/Kconfig > > > @@ -152,6 +152,13 @@ config DA8XX_DDRCTL > > > This driver is for the DDR2/mDDR Memory Controller present on > > > Texas Instruments da8xx SoCs. It's used to tweak various memory > > > controller configuration options. > > > +config PL35X_SMC > > > + bool "ARM PL35X Static Memory Controller(SMC) driver" > > > > Is there any reason why this can't be tristate? > > There is a Nand driver which uses this driver. 
i.e The NAND driver > Depends on this driver. That's true, but it's irrelevant to the question I asked. It is perfectly valid for both the SMC and NAND drivers to be tristate; why are you not allowing this configuration? Julia
[PATCH RT 1/2] locallock: provide {get,put}_locked_ptr() variants
Provide a set of locallocked accessors for pointers to per-CPU data; this is useful for dynamically-allocated per-CPU regions, for example. These are symmetric with the {get,put}_cpu_ptr() per-CPU accessor variants. Signed-off-by: Julia Cartwright <ju...@ni.com> --- include/linux/locallock.h | 10 ++ 1 file changed, 10 insertions(+) diff --git a/include/linux/locallock.h b/include/linux/locallock.h index d658c2552601..921eab83cd34 100644 --- a/include/linux/locallock.h +++ b/include/linux/locallock.h @@ -222,6 +222,14 @@ static inline int __local_unlock_irqrestore(struct local_irq_lock *lv, #define put_locked_var(lvar, var) local_unlock(lvar); +#define get_locked_ptr(lvar, var) \ + ({ \ + local_lock(lvar); \ + this_cpu_ptr(var); \ + }) + +#define put_locked_ptr(lvar, var) local_unlock(lvar); + #define local_lock_cpu(lvar) \ ({ \ local_lock(lvar); \ @@ -262,6 +270,8 @@ static inline void local_irq_lock_init(int lvar) { } #define get_locked_var(lvar, var) get_cpu_var(var) #define put_locked_var(lvar, var) put_cpu_var(var) +#define get_locked_ptr(lvar, var) get_cpu_ptr(var) +#define put_locked_ptr(lvar, var) put_cpu_ptr(var) #define local_lock_cpu(lvar) get_cpu() #define local_unlock_cpu(lvar) put_cpu() -- 2.17.0
[PATCH RT 2/2] squashfs: make use of local lock in multi_cpu decompressor
Currently, the squashfs multi_cpu decompressor makes use of get_cpu_ptr()/put_cpu_ptr(), which unconditionally disable preemption during decompression. Because the workload is distributed across CPUs, all CPUs can observe a very high wakeup latency, which has been seen to be as much as 8000us. Convert this decompressor to make use of a local lock, which will allow execution of the decompressor with preemption-enabled, but also ensure concurrent accesses to the percpu compressor data on the local CPU will be serialized. Reported-by: Alexander Stein <alexander.st...@systec-electronic.com> Tested-by: Alexander Stein <alexander.st...@systec-electronic.com> Signed-off-by: Julia Cartwright <ju...@ni.com> --- fs/squashfs/decompressor_multi_percpu.c | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/fs/squashfs/decompressor_multi_percpu.c b/fs/squashfs/decompressor_multi_percpu.c index 23a9c28ad8ea..6a73c4fa88e7 100644 --- a/fs/squashfs/decompressor_multi_percpu.c +++ b/fs/squashfs/decompressor_multi_percpu.c @@ -10,6 +10,7 @@ #include #include #include +#include #include "squashfs_fs.h" #include "squashfs_fs_sb.h" @@ -25,6 +26,8 @@ struct squashfs_stream { void*stream; }; +static DEFINE_LOCAL_IRQ_LOCK(stream_lock); + void *squashfs_decompressor_create(struct squashfs_sb_info *msblk, void *comp_opts) { @@ -79,10 +82,15 @@ int squashfs_decompress(struct squashfs_sb_info *msblk, struct buffer_head **bh, { struct squashfs_stream __percpu *percpu = (struct squashfs_stream __percpu *) msblk->stream; - struct squashfs_stream *stream = get_cpu_ptr(percpu); - int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b, - offset, length, output); - put_cpu_ptr(stream); + struct squashfs_stream *stream; + int res; + + stream = get_locked_ptr(stream_lock, percpu); + + res = msblk->decompressor->decompress(msblk, stream->stream, bh, b, + offset, length, output); + + put_locked_ptr(stream_lock, stream); if (res < 0) ERROR("%s decompression failed, data 
probably corrupt\n", -- 2.17.0
[PATCH RT] seqlock: provide the same ordering semantics as mainline
The mainline implementation of read_seqbegin() orders prior loads w.r.t. the read-side critical section. Fixup the RT writer-boosting implementation to provide the same guarantee. Also, while we're here, update the usage of ACCESS_ONCE() to use READ_ONCE(). Fixes: e69f15cf77c23 ("seqlock: Prevent rt starvation") Cc: stable...@vger.kernel.org Signed-off-by: Julia Cartwright <ju...@ni.com> --- Found during code inspection of the RT seqlock implementation. Julia include/linux/seqlock.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h index a59751276b94..597ce5a9e013 100644 --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -453,7 +453,7 @@ static inline unsigned read_seqbegin(seqlock_t *sl) unsigned ret; repeat: - ret = ACCESS_ONCE(sl->seqcount.sequence); + ret = READ_ONCE(sl->seqcount.sequence); if (unlikely(ret & 1)) { /* * Take the lock and let the writer proceed (i.e. evtl @@ -462,6 +462,7 @@ static inline unsigned read_seqbegin(seqlock_t *sl) spin_unlock_wait(&sl->lock); goto repeat; } + smp_rmb(); return ret; } #endif -- 2.16.1
Re: [LINUX PATCH v8 2/2] memory: pl353: Add driver for arm pl353 static memory controller
Hello- On Wed, Mar 14, 2018 at 04:14:34PM +0530, nagasureshkumarre...@gmail.com wrote: > From: Naga Sureshkumar Relli I'm pleased to see this patchset revived and resubmitted! It would be easier to follow if you constructed your two patchsets with git format-patch --thread. Or, merge them into a single patchset, especially considering the dependency between patches. Some code review comments below: > Add driver for arm pl353 static memory controller. This controller is > used in xilinx zynq soc for interfacing the nand and nor/sram memory > devices. > > Signed-off-by: Naga Sureshkumar Relli > --- > Changes in v8: > - None > Changes in v7: > - Corrected the kconfig to use tristate selection > - Corrected the GPL licence ident > - Added boundary checks for nand timing parameters > Changes in v6: > - Fixed checkpatch.pl reported warnings > Changes in v5: > - Added pl353_smc_get_clkrate function, made pl353_smc_set_cycles as public > API > - Removed nand timing parameter initialization and moved it to nand driver > Changes in v4: > - Modified driver to support multiple instances > - Used sleep instaed of busywait for delay > Changes in v3: > - None > Changes in v2: > - Since now the timing parameters are in nano seconds, added logic to convert > them to the cycles > --- > drivers/memory/Kconfig | 7 + > drivers/memory/Makefile | 1 + > drivers/memory/pl353-smc.c | 548 > > include/linux/platform_data/pl353-smc.h | 34 ++ > 4 files changed, 590 insertions(+) > create mode 100644 drivers/memory/pl353-smc.c > create mode 100644 include/linux/platform_data/pl353-smc.h > > diff --git a/drivers/memory/Kconfig b/drivers/memory/Kconfig > index 19a0e83..d70d6db 100644 > --- a/drivers/memory/Kconfig > +++ b/drivers/memory/Kconfig > @@ -152,6 +152,13 @@ config DA8XX_DDRCTL > This driver is for the DDR2/mDDR Memory Controller present on > Texas Instruments da8xx SoCs. It's used to tweak various memory > controller configuration options. 
> +config PL35X_SMC > + bool "ARM PL35X Static Memory Controller(SMC) driver" Is there any reason why this can't be tristate? [..] > +++ b/drivers/memory/pl353-smc.c [..] > +/** > + * struct pl353_smc_data - Private smc driver structure > + * @devclk: Pointer to the peripheral clock > + * @aperclk: Pointer to the APER clock > + */ > +struct pl353_smc_data { > + struct clk *memclk; > + struct clk *aclk; > +}; > + > +/* SMC virtual register base */ > +static void __iomem *pl353_smc_base; While it's true that the Zynq chips only have a single instance of this IP, there is no real reason why an SoC can't instantiate more than one. I'm a bit uncomfortable with the fact that this has been designed out. > + > +/** > + * pl353_smc_set_buswidth - Set memory buswidth > + * @bw: Memory buswidth (8 | 16) > + * Return: 0 on success or negative errno. > + */ > +int pl353_smc_set_buswidth(unsigned int bw) > +{ > + > + if (bw != PL353_SMC_MEM_WIDTH_8 && bw != PL353_SMC_MEM_WIDTH_16) > + return -EINVAL; > + > + writel(bw, pl353_smc_base + PL353_SMC_SET_OPMODE_OFFS); There should be no reason not to use the _relaxed() accessor variants. > + writel(PL353_SMC_DC_UPT_NAND_REGS, pl353_smc_base + > +PL353_SMC_DIRECT_CMD_OFFS); > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(pl353_smc_set_buswidth); > + > +/** > + * pl353_smc_set_cycles - Set memory timing parameters > + * @t0: t_rcread cycle time > + * @t1: t_wcwrite cycle time > + * @t2: t_rea/t_ceoeoutput enable assertion delay > + * @t3: t_wpwrite enable deassertion delay > + * @t4: t_clr/t_pc page cycle time > + * @t5: t_ar/t_ta ID read time/turnaround time > + * @t6: t_rrbusy to RE timing > + * > + * Sets NAND chip specific timing parameters. 
> + */ > +static void pl353_smc_set_cycles(u32 t0, u32 t1, u32 t2, u32 t3, u32 > + t4, u32 t5, u32 t6) > +{ > + t0 &= PL353_SMC_SET_CYCLES_T0_MASK; > + t1 = (t1 & PL353_SMC_SET_CYCLES_T1_MASK) << > + PL353_SMC_SET_CYCLES_T1_SHIFT; > + t2 = (t2 & PL353_SMC_SET_CYCLES_T2_MASK) << > + PL353_SMC_SET_CYCLES_T2_SHIFT; > + t3 = (t3 & PL353_SMC_SET_CYCLES_T3_MASK) << > + PL353_SMC_SET_CYCLES_T3_SHIFT; > + t4 = (t4 & PL353_SMC_SET_CYCLES_T4_MASK) << > + PL353_SMC_SET_CYCLES_T4_SHIFT; > + t5 = (t5 & PL353_SMC_SET_CYCLES_T5_MASK) << > + PL353_SMC_SET_CYCLES_T5_SHIFT; > + t6 = (t6 & PL353_SMC_SET_CYCLES_T6_MASK) << > + PL353_SMC_SET_CYCLES_T6_SHIFT; > + > + t0 |= t1 | t2 | t3 | t4 | t5 | t6; > + > + writel(t0, pl353_smc_base + PL353_SMC_SET_CYCLES_OFFS); > + writel(PL353_SMC_DC_UPT_NAND_REGS, pl353_smc_base + > +
Re: [PATCH RT] Defer migrate_enable migration while task state != TASK_RUNNING
On Fri, Mar 23, 2018 at 01:21:31PM -0400, joe.ko...@concurrent-rt.com wrote: > On Fri, Mar 23, 2018 at 11:59:21AM -0500, Julia Cartwright wrote: > > On Fri, Mar 23, 2018 at 11:09:59AM -0400, joe.ko...@concurrent-rt.com wrote: > > > I see the below kernel splat in 4.9-rt when I run a test program that > > > continually changes the affinity of some set of running pids: > > > > > >do not call blocking ops when !TASK_RUNNING; state=2 set at ... > > > ... > > > stop_one_cpu+0x60/0x80 > > > migrate_enable+0x21f/0x3e0 > > > rt_spin_unlock+0x2f/0x40 > > > prepare_to_wait+0x5c/0x80 > > > ... > > > > This is clearly a problem. > > > > > The reason is that spin_unlock, write_unlock, and read_unlock call > > > migrate_enable, and since 4.4-rt, migrate_enable will sleep if it > > > discovers > > > that a migration is in order. But sleeping in the unlock services is not > > > expected by most kernel developers, > > > > I don't buy this, see below: > > > > > and where that counts most is in code sequences like the following: > > > > > > set_current_state(TASK_UNINTERRUPIBLE); > > > spin_unlock(); > > > schedule(); > > > > The analog in mainline is CONFIG_PREEMPT and the implicit > > preempt_enable() in spin_unlock(). In this configuration, a kernel > > developer should _absolutely_ expect their task to be suspended (and > > potentially migrated), _regardless of the task state_ if there is a > > preemption event on the CPU on which this task is executing. > > > > Similarly, on RT, there is nothing _conceptually_ wrong on RT with > > migrating on migrate_enable(), regardless of task state, if there is a > > pending migration event. > > My understanding is, in standard Linux and in rt, setting > task state to anything other than TASK_RUNNING in of itself > blocks preemption. I'm assuming you're referring to the window of time between a task setting its state to !TASK_RUNNING and schedule()? The task remains preemptible in this window. 
> A preemption is not really needed here as it is expected that there is > a schedule() written in that will shortly be executed. > > And if a 'involuntary schedule' (ie, preemption) were allowed to occur > between the task state set and the schedule(), that would change the > task state back to TASK_RUNNING; This isn't the case. A preempted task preserves its state. Julia
Re: [PATCH RT] Defer migrate_enable migration while task state != TASK_RUNNING
Hey Joe- Thanks for the writeup. On Fri, Mar 23, 2018 at 11:09:59AM -0400, joe.ko...@concurrent-rt.com wrote: > I see the below kernel splat in 4.9-rt when I run a test program that > continually changes the affinity of some set of running pids: > >do not call blocking ops when !TASK_RUNNING; state=2 set at ... > ... > stop_one_cpu+0x60/0x80 > migrate_enable+0x21f/0x3e0 > rt_spin_unlock+0x2f/0x40 > prepare_to_wait+0x5c/0x80 > ... This is clearly a problem. > The reason is that spin_unlock, write_unlock, and read_unlock call > migrate_enable, and since 4.4-rt, migrate_enable will sleep if it discovers > that a migration is in order. But sleeping in the unlock services is not > expected by most kernel developers, I don't buy this, see below: > and where that counts most is in code sequences like the following: > > set_current_state(TASK_UNINTERRUPIBLE); > spin_unlock(); > schedule(); The analog in mainline is CONFIG_PREEMPT and the implicit preempt_enable() in spin_unlock(). In this configuration, a kernel developer should _absolutely_ expect their task to be suspended (and potentially migrated), _regardless of the task state_ if there is a preemption event on the CPU on which this task is executing. Similarly, on RT, there is nothing _conceptually_ wrong on RT with migrating on migrate_enable(), regardless of task state, if there is a pending migration event. It's clear, however, that the mechanism used here is broken ... Julia
Re: [PATCH v2 1/8] [PATCH 1/8] drivers/peci: Add support for PECI bus driver core
On Wed, Feb 21, 2018 at 08:15:59AM -0800, Jae Hyun Yoo wrote: > This commit adds driver implementation for PECI bus into linux > driver framework. > > Signed-off-by: Jae Hyun Yoo > --- [..] > +static int peci_locked_xfer(struct peci_adapter *adapter, > + struct peci_xfer_msg *msg, > + bool do_retry, > + bool has_aw_fcs) _locked generally means that this function is invoked with some critical lock held, what lock does the caller need to acquire before invoking this function? > +{ > + ktime_t start, end; > + s64 elapsed_ms; > + int rc = 0; > + > + if (!adapter->xfer) { Is this really an optional feature of an adapter? If this is not optional, then this check should be in place when the adapter is registered, not here. (And it should WARN_ON(), because it's a driver developer error). > + dev_dbg(&adapter->dev, "PECI level transfers not supported\n"); > + return -ENODEV; > + } > + > + if (in_atomic() || irqs_disabled()) { As Andrew mentioned, this is broken. You don't even need a might_sleep(). The locking functions you use here will already include a might_sleep() w/ CONFIG_DEBUG_ATOMIC_SLEEP. > + rt_mutex_trylock(&adapter->bus_lock); > + if (!rc) > + return -EAGAIN; /* PECI activity is ongoing */ > + } else { > + rt_mutex_lock(&adapter->bus_lock); > + } > + > + if (do_retry) > + start = ktime_get(); > + > + do { > + rc = adapter->xfer(adapter, msg); > + > + if (!do_retry) > + break; > + > + /* Per the PECI spec, need to retry commands that return 0x8x */ > + if (!(!rc && ((msg->rx_buf[0] & DEV_PECI_CC_RETRY_ERR_MASK) == > + DEV_PECI_CC_TIMEOUT))) > + break; This is pretty difficult to parse. Can you split it into two different conditions? > + > + /* Set the retry bit to indicate a retry attempt */ > + msg->tx_buf[1] |= DEV_PECI_RETRY_BIT; Are you sure this bit is to be set in the _second_ byte of tx_buf? 
> + > + /* Recalculate the AW FCS if it has one */ > + if (has_aw_fcs) > + msg->tx_buf[msg->tx_len - 1] = 0x80 ^ > + peci_aw_fcs((u8 *)msg, > + 2 + msg->tx_len); > + > + /* Retry for at least 250ms before returning an error */ > + end = ktime_get(); > + elapsed_ms = ktime_to_ms(ktime_sub(end, start)); > + if (elapsed_ms >= DEV_PECI_RETRY_TIME_MS) { > + dev_dbg(&adapter->dev, "Timeout retrying xfer!\n"); > + break; > + } > + } while (true); > + > + rt_mutex_unlock(&adapter->bus_lock); > + > + return rc; > +} > + > +static int peci_xfer(struct peci_adapter *adapter, struct peci_xfer_msg *msg) > +{ > + return peci_locked_xfer(adapter, msg, false, false); > +} > + > +static int peci_xfer_with_retries(struct peci_adapter *adapter, > + struct peci_xfer_msg *msg, > + bool has_aw_fcs) > +{ > + return peci_locked_xfer(adapter, msg, true, has_aw_fcs); > +} > + > +static int peci_scan_cmd_mask(struct peci_adapter *adapter) > +{ > + struct peci_xfer_msg msg; > + u32 dib; > + int rc = 0; > + > + /* Update command mask just once */ > + if (adapter->cmd_mask & BIT(PECI_CMD_PING)) > + return 0; > + > + msg.addr = PECI_BASE_ADDR; > + msg.tx_len = GET_DIB_WR_LEN; > + msg.rx_len = GET_DIB_RD_LEN; > + msg.tx_buf[0] = GET_DIB_PECI_CMD; > + > + rc = peci_xfer(adapter, &msg); > + if (rc < 0) { > + dev_dbg(&adapter->dev, "PECI xfer error, rc : %d\n", rc); > + return rc; > + } > + > + dib = msg.rx_buf[0] | (msg.rx_buf[1] << 8) | > + (msg.rx_buf[2] << 16) | (msg.rx_buf[3] << 24); > + > + /* Check special case for Get DIB command */ > + if (dib == 0x00) { > + dev_dbg(&adapter->dev, "DIB read as 0x00\n"); > + return -1; > + } > + > + if (!rc) { You should change this to: if (rc) { dev_dbg(&adapter->dev, "Error reading DIB, rc : %d\n", rc); return rc; } And then leave the happy path below unindented. 
> + /** > + * setting up the supporting commands based on minor rev# > + * see PECI Spec Table 3-1 > + */ > + dib = (dib >> 8) & 0xF; > + > + if (dib >= 0x1) { > + adapter->cmd_mask |= BIT(PECI_CMD_RD_PKG_CFG); > + adapter->cmd_mask |= BIT(PECI_CMD_WR_PKG_CFG); > + } > + > + if (dib >= 0x2) > + adapter->cmd_mask |= BIT(PECI_CMD_RD_IA_MSR); > + > + if (dib >= 0x3) { > +
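The early-return shape suggested above, sketched as standalone C (the command bits and function signature here are illustrative stand-ins, not the driver's real definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative command bits; the real driver uses BIT(PECI_CMD_*). */
#define CMD_RD_PKG_CFG (1u << 0)
#define CMD_WR_PKG_CFG (1u << 1)
#define CMD_RD_IA_MSR  (1u << 2)

/*
 * Returning early on error keeps the happy path unindented.  The
 * revision-to-command mapping follows the quoted code (minor rev#
 * per PECI Spec Table 3-1).
 */
static int scan_cmd_mask(int rc, uint32_t dib, uint32_t *cmd_mask)
{
	if (rc)
		return rc;	/* error path handled first; no nesting below */

	dib = (dib >> 8) & 0xf;	/* minor revision number */

	if (dib >= 0x1) {
		*cmd_mask |= CMD_RD_PKG_CFG;
		*cmd_mask |= CMD_WR_PKG_CFG;
	}
	if (dib >= 0x2)
		*cmd_mask |= CMD_RD_IA_MSR;

	return 0;
}
```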
Re: C tricks for efficient stack zeroing
On Fri, Mar 02, 2018 at 08:50:17PM +0100, Jason A. Donenfeld wrote: [..] > What would be really nice would be to somehow keep track of the > maximum stack depth, and just before the function returns, clear from > the maximum depth to its stack base, all in one single call. This > would not only make the code faster and less brittle, but it would > also clean up some algorithms quite a bit. > > Ideally this would take the form of a gcc attribute on the function, > but I was unable to find anything of that nature. I started looking > for little C tricks for this, and came up dry too. I realize I could > probably just take the current stack address and zero out until _the > very end_ but that seems to overshoot and would probably be bad for > performance. The best I've been able to come up with are some > x86-specific macros, but that approach seems a bit underwhelming. > Other approaches include adding a new attribute via the gcc plugin > system, which could make this kind of thing more complete [cc'ing > pipacs in case he's thought about that before]. Can objtool support a static stack usage analysis? I'm wondering if it's possible to place these sensitive functions in a special linker section, like .text.stackzero.<fn>; objtool could collect static call data (as it already does) and stack usage, spitting out a symbol definition stackzero_<fn>_max_depth, which you could then use to bound your zeroing. Obviously this is a static analysis, with the limitations therein. Julia
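A toy model of the bounded-zeroing idea (everything here is hypothetical: the depth constant stands in for an objtool-derived per-function symbol, which does not exist today):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pretend objtool computed this bound for the function of interest. */
#define STACKZERO_MAX_DEPTH 64

/*
 * Zero only up to the statically-known maximum depth rather than
 * "from here to the very end" of the stack.  Kernel code would use
 * memzero_explicit() so the stores cannot be optimized away.
 */
static void zero_sensitive(uint8_t *region, size_t region_size)
{
	size_t n = STACKZERO_MAX_DEPTH;

	if (n > region_size)	/* never run past the region */
		n = region_size;
	memset(region, 0, n);
}
```

The point of the sketch is the bound: a static analysis gives a pessimistic but safe `n`, avoiding both overshoot and brittle per-algorithm bookkeeping.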
[ANNOUNCE] 4.9.84-rt62
Hello RT Folks! I'm pleased to announce the 4.9.84-rt62 stable release. This release is just an update to the new stable 4.9.84 version and no RT specific changes have been made. You can get this release via the git tree at: git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git branch: v4.9-rt Head SHA1: be42cd02846a611af533103a3f4b6a7d8c592f49 Or to build 4.9.84-rt62 directly, the following patches should be applied: http://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz http://www.kernel.org/pub/linux/kernel/v4.x/patch-4.9.84.xz http://www.kernel.org/pub/linux/kernel/projects/rt/4.9/patch-4.9.84-rt62.patch.xz Enjoy! Julia
Re: [PATCH 1/2] kernel/sofirq: consolidate common code in __tasklet_schedule() + _hi_
On Thu, Feb 15, 2018 at 03:07:07PM -0500, Steven Rostedt wrote: > On Thu, 15 Feb 2018 18:20:41 +0100 > Sebastian Andrzej Siewior wrote: > > > -void __tasklet_schedule(struct tasklet_struct *t) > > +static void __tasklet_schedule_common(struct tasklet_struct *t, > > + struct tasklet_head *head, > > + unsigned int softirq_nr) > > { > > unsigned long flags; > > > > local_irq_save(flags); > > If you look at the original patch, it did not move local_irq_save() > into the common function. > > > t->next = NULL; > > - *__this_cpu_read(tasklet_vec.tail) = t; > > - __this_cpu_write(tasklet_vec.tail, &(t->next)); > > - raise_softirq_irqoff(TASKLET_SOFTIRQ); > > + *head->tail = t; > > + head->tail = &(t->next); > > + raise_softirq_irqoff(softirq_nr); > > local_irq_restore(flags); > > } > > + > > +void __tasklet_schedule(struct tasklet_struct *t) > > +{ > > + __tasklet_schedule_common(t, this_cpu_ptr(&tasklet_vec), > > What can happen is, we reference (tasklet_vec) on one CPU, get > preempted (running in ksoftirqd), scheduled on another CPU, then when > inside the common code, we are executing on a different CPU than the > tasklet is for. The raise_softirq() is happening on the wrong CPU. > > The local_irq_save() can't be moved to the common function. It must be > done by each individual function. Well, it can be, but the percpu access needs to go with it; so an alternative solution would be the below. I'm also wondering whether the t->next = NULL assignment could be lifted out of the irqs-disabled region. 
Julia diff --git a/kernel/softirq.c b/kernel/softirq.c index fa7ed89a9fcf..177de3640c78 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -461,12 +461,14 @@ static DEFINE_PER_CPU(struct tasklet_head, tasklet_vec); static DEFINE_PER_CPU(struct tasklet_head, tasklet_hi_vec); static void __tasklet_schedule_common(struct tasklet_struct *t, - struct tasklet_head *head, + struct tasklet_head __percpu *headp, unsigned int softirq_nr) { + struct tasklet_head *head; unsigned long flags; local_irq_save(flags); + head = this_cpu_ptr(headp); t->next = NULL; *head->tail = t; head->tail = &(t->next); @@ -476,14 +478,14 @@ static void __tasklet_schedule_common(struct tasklet_struct *t, void __tasklet_schedule(struct tasklet_struct *t) { - __tasklet_schedule_common(t, this_cpu_ptr(&tasklet_vec), + __tasklet_schedule_common(t, &tasklet_vec, TASKLET_SOFTIRQ); } EXPORT_SYMBOL(__tasklet_schedule); void __tasklet_hi_schedule(struct tasklet_struct *t) { - __tasklet_schedule_common(t, this_cpu_ptr(&tasklet_hi_vec), + __tasklet_schedule_common(t, &tasklet_hi_vec, HI_SOFTIRQ); } EXPORT_SYMBOL(__tasklet_hi_schedule);
[PATCH 4.9.y] ubifs: Massage assert in ubifs_xattr_set() wrt. init_xattrs
From: Xiaolei Li <xiaolei...@mediatek.com> This is a conceptual cherry-pick of commit d8db5b1ca9d4c57e49893d0f78e6d5ce81450cc8 upstream. The inode is not locked in init_xattrs when creating a new inode. Without this patch, there will occurs assert when booting or creating a new file, if the kernel config CONFIG_SECURITY_SMACK is enabled. Log likes: UBIFS assert failed in ubifs_xattr_set at 298 (pid 1156) CPU: 1 PID: 1156 Comm: ldconfig Tainted: G S 4.12.0-rc1-207440-g1e70b02 #2 Hardware name: MediaTek MT2712 evaluation board (DT) Call trace: [] dump_backtrace+0x0/0x238 [] show_stack+0x14/0x20 [] dump_stack+0x9c/0xc0 [] ubifs_xattr_set+0x374/0x5e0 [] init_xattrs+0x5c/0xb8 [] security_inode_init_security+0x110/0x190 [] ubifs_init_security+0x30/0x68 [] ubifs_mkdir+0x100/0x200 [] vfs_mkdir+0x11c/0x1b8 [] SyS_mkdirat+0x74/0xd0 [] __sys_trace_return+0x0/0x4 Signed-off-by: Xiaolei Li <xiaolei...@mediatek.com> Signed-off-by: Richard Weinberger <rich...@nod.at> Cc: sta...@vger.kernel.org (julia: massaged to apply to 4.9.y, which doesn't contain fscrypto support) Signed-off-by: Julia Cartwright <ju...@ni.com> --- Hey all- We reproduced the issue fixed upstream by Xiaolei Li's commit in 4.9.y, with the very similar backtrace: UBIFS assert failed in __ubifs_setxattr at 282 (pid 1362) CPU: 1 PID: 1362 Comm: sed Not tainted 4.9.47-rt37 #1 Backtrace: (dump_backtrace) from (show_stack+0x20/0x24) (show_stack) from (dump_stack+0x80/0xa0) (dump_stack) from (__ubifs_setxattr+0x84/0x634) (__ubifs_setxattr) from (init_xattrs+0x70/0xac) (init_xattrs) from (security_inode_init_security+0x100/0x144) (security_inode_init_security) from (ubifs_init_security+0x38/0x6c) (ubifs_init_security) from (ubifs_create+0xe8/0x1fc) (ubifs_create) from (path_openat+0x97c/0x1090) (path_openat) from (do_filp_open+0x48/0x94) (do_filp_open) from (do_sys_open+0x134/0x1d8) (do_sys_open) from (SyS_open+0x30/0x34) (SyS_open) from (ret_fast_syscall+0x0/0x3c) Please consider applying this to 4.9.y at the very 
least, it may apply further back as well. Thanks! Julia fs/ubifs/xattr.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/fs/ubifs/xattr.c b/fs/ubifs/xattr.c index d9f9615bfd71..3979d767a0cb 100644 --- a/fs/ubifs/xattr.c +++ b/fs/ubifs/xattr.c @@ -270,7 +270,8 @@ static struct inode *iget_xattr(struct ubifs_info *c, ino_t inum) } static int __ubifs_setxattr(struct inode *host, const char *name, - const void *value, size_t size, int flags) + const void *value, size_t size, int flags, + bool check_lock) { struct inode *inode; struct ubifs_info *c = host->i_sb->s_fs_info; @@ -279,7 +280,8 @@ static int __ubifs_setxattr(struct inode *host, const char *name, union ubifs_key key; int err; - ubifs_assert(inode_is_locked(host)); + if (check_lock) + ubifs_assert(inode_is_locked(host)); if (size > UBIFS_MAX_INO_DATA) return -ERANGE; @@ -548,7 +550,8 @@ static int init_xattrs(struct inode *inode, const struct xattr *xattr_array, } strcpy(name, XATTR_SECURITY_PREFIX); strcpy(name + XATTR_SECURITY_PREFIX_LEN, xattr->name); - err = __ubifs_setxattr(inode, name, xattr->value, xattr->value_len, 0); + err = __ubifs_setxattr(inode, name, xattr->value, + xattr->value_len, 0, false); kfree(name); if (err < 0) break; @@ -594,7 +597,8 @@ static int ubifs_xattr_set(const struct xattr_handler *handler, name = xattr_full_name(handler, name); if (value) - return __ubifs_setxattr(inode, name, value, size, flags); + return __ubifs_setxattr(inode, name, value, size, flags, + true); else return __ubifs_removexattr(inode, name); } -- 2.16.1
Re: [patch v18 1/4] drivers: jtag: Add JTAG core driver
On Mon, Jan 29, 2018 at 04:31:42PM +0200, Oleksandr Shamray wrote: > Initial patch for JTAG driver > JTAG class driver provide infrastructure to support hardware/software > JTAG platform drivers. It provide user layer API interface for flashing > and debugging external devices which equipped with JTAG interface > using standard transactions. > > Driver exposes set of IOCTL to user space for: > - XFER: > - SIR (Scan Instruction Register, IEEE 1149.1 Data Register scan); > - SDR (Scan Data Register, IEEE 1149.1 Instruction Register scan); > - RUNTEST (Forces the IEEE 1149.1 bus to a run state for a specified > number of clocks). > - SIOCFREQ/GIOCFREQ for setting and reading JTAG frequency. > > Driver core provides set of internal APIs for allocation and > registration: > - jtag_register; > - jtag_unregister; > - jtag_alloc; > - jtag_free; > > Platform driver on registration with jtag-core creates the next > entry in dev folder: > /dev/jtagX > > Signed-off-by: Oleksandr Shamray> Signed-off-by: Jiri Pirko > Acked-by: Philippe Ombredanne [..] > diff --git a/drivers/jtag/jtag.c b/drivers/jtag/jtag.c [..] > +struct jtag *jtag_alloc(size_t priv_size, const struct jtag_ops *ops) > +{ > + struct jtag *jtag; > + > + jtag = kzalloc(sizeof(*jtag) + priv_size, GFP_KERNEL); > + if (!jtag) > + return NULL; > + > + if (!ops) > + return NULL; > + > + if (!ops->idle || !ops->mode_set || !ops->status_get || !ops->xfer) > + return NULL; Did you think through this? You leak 'jtag' here and above. Perform all the ops checks prior to the allocation. Julia
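The reordering suggested above, sketched as standalone C with stand-in types (calloc() standing in for kzalloc()): validating `ops` before allocating means no error path can leak the allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-ins for the jtag structures under review. */
struct jtag_ops {
	int (*idle)(void);
	int (*mode_set)(void);
	int (*status_get)(void);
	int (*xfer)(void);
};

struct jtag {
	const struct jtag_ops *ops;
	char priv[];	/* priv_size bytes of driver-private data */
};

/*
 * All ops checks come before the allocation, so every NULL return
 * happens while nothing has been allocated yet.
 */
struct jtag *jtag_alloc(size_t priv_size, const struct jtag_ops *ops)
{
	struct jtag *jtag;

	if (!ops)
		return NULL;
	if (!ops->idle || !ops->mode_set || !ops->status_get || !ops->xfer)
		return NULL;

	jtag = calloc(1, sizeof(*jtag) + priv_size);	/* kzalloc() in-kernel */
	if (!jtag)
		return NULL;

	jtag->ops = ops;
	return jtag;
}
```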
Re: [patch v17 1/4] drivers: jtag: Add JTAG core driver
Hello Oleksandr- On Tue, Jan 16, 2018 at 09:18:56AM +0200, Oleksandr Shamray wrote: [..] > v16->v17 > Comments pointed by Julia Cartwright <jul...@eso.teric.us> More review feedback below: [..] > +++ b/drivers/jtag/jtag.c [..] > +static long jtag_ioctl(struct file *file, unsigned int cmd, unsigned long > arg) > +{ > + struct jtag *jtag = file->private_data; > + struct jtag_run_test_idle idle; > + struct jtag_xfer xfer; > + u8 *xfer_data; > + u32 data_size; > + u32 value; > + int err; > + > + if (!arg) > + return -EINVAL; > + > + switch (cmd) { > + case JTAG_GIOCFREQ: > + if (!jtag->ops->freq_get) > + err = -EOPNOTSUPP; Did you mean: return -EOPNOTSUPP; ? > + > + err = jtag->ops->freq_get(jtag, &value); Otherwise your check was worthless, you'll call NULL here. Also, w.r.t. the set of ops which are required to be implemented: this isn't the right place to do the check. Instead, do it in jtag_alloc(): struct jtag *jtag_alloc(size_t priv_size, const struct jtag_ops *ops) { struct jtag *jtag; if (!ops->freq_get || !ops->xfer || ...) /* fixup condition */ return NULL; jtag = kzalloc(sizeof(*jtag) + priv_size, GFP_KERNEL); if (!jtag) return NULL; jtag->ops = ops; return jtag; } EXPORT_SYMBOL_GPL(jtag_alloc); [..] > + case JTAG_IOCXFER: [..] > + data_size = DIV_ROUND_UP(xfer.length, BITS_PER_BYTE); > + xfer_data = memdup_user(u64_to_user_ptr(xfer.tdio), data_size); > + > + if (!xfer_data) memdup_user() doesn't return NULL on error. You need to check for IS_ERR(xfer_data). > + return -EFAULT; > + > + err = jtag->ops->xfer(jtag, &xfer, xfer_data); > + if (err) { > + kfree(xfer_data); > + return -EFAULT; > + } > + > + err = copy_to_user(u64_to_user_ptr(xfer.tdio), > +(void *)(xfer_data), data_size); > + > + if (err) { > + kfree(xfer_data); > + return -EFAULT; > + } > + > + kfree(xfer_data); Move the kfree() above the if (err). 
> + if (copy_to_user((void *)arg, &xfer, sizeof(struct jtag_xfer))) > + return -EFAULT; > + break; > + > + case JTAG_GIOCSTATUS: > + if (!jtag->ops->status_get) > + return -EOPNOTSUPP; > + > + err = jtag->ops->status_get(jtag, &value); > + if (err) > + break; > + > + err = put_user(value, (__u32 *)arg); > + if (err) > + err = -EFAULT; put_user() returns -EFAULT on failure, so this shouldn't be necessary. [..] > --- /dev/null > +++ b/include/uapi/linux/jtag.h [..] > +/** > + * struct jtag_xfer - jtag xfer: > + * > + * @type: transfer type > + * @direction: xfer direction > + * @length: xfer bits len > + * @tdio : xfer data array > + * @endir: xfer end state > + * > + * Structure represents interface to JTAG device for jtag sdr xfer > + * execution. > + */ > +struct jtag_xfer { > + __u8 type; > + __u8 direction; > + __u8 endstate; Just to be as unambiguous as possible, considering this is ABI, I would suggest explicitly putting a padding byte here. > + __u32 length; > + __u64 tdio; > +}; Thanks, Julia
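The explicit-padding suggestion, sketched with userspace typedefs standing in for the kernel's integer types; the point is that the 4-byte alignment of `length` is spelled out in the ABI struct rather than left to the compiler:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's __u8/__u32/__u64. */
typedef uint8_t  __u8;
typedef uint32_t __u32;
typedef uint64_t __u64;

/*
 * With the explicit padding byte, the layout is unambiguous:
 * 'length' sits at offset 4 and 'tdio' at offset 8 by construction,
 * not by implicit compiler padding.
 */
struct jtag_xfer {
	__u8	type;
	__u8	direction;
	__u8	endstate;
	__u8	padding;	/* explicit; was implicit compiler padding */
	__u32	length;
	__u64	tdio;
};
```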
Re: [ANNOUNCE] 4.9.76-rt61
On Tue, Jan 16, 2018 at 11:11:32PM +, Bernhard Landauer wrote: > Ah. Working now after adding in validpgpkeys array in the PKGBUILD. This was the first v4.9-rt release since Steven handed over his maintainership, so that would explain why it would be signed by a different key (although, my key has been signed by Steven's). Thanks, Julia