Re: [PATCH 3/5] perf: Add pmu get/put
On Thu, Oct 31, 2024 at 12:07:42AM -0500, Lucas De Marchi wrote:
> On Wed, Oct 23, 2024 at 12:07:58AM -0500, Lucas De Marchi wrote:
> > On Tue, Oct 22, 2024 at 11:52:10PM +0200, Peter Zijlstra wrote:
> > > On Fri, Oct 18, 2024 at 02:46:31PM -0500, Lucas De Marchi wrote:
> > > 
> > > > I will give this a try with i915 and/or xe.
> > > 
> > > Less horrible version here:
> > > 
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/pmu-unregister
> > > 
> > > I've just pushed it out to the robots, but it builds, passes perf test
> > > and your dummy_pmu testcase (for me, on my one testbox and .config
> > > etc..)
> > 
> > It passed for me as well with both dummy_pmu and with i915. I have some
> > changes to igt (i915/xe testsuite) that should bring some more coverage.
> > I minimized the pending test changes I had and posted to trigger CI:
> > 
> > https://patchwork.freedesktop.org/series/140355/
> 
> Our CI also didn't trigger that pmu issue and the test could run
> successfully.

Excellent.

> I did get a report from kernel test robot though:
> 
> https://lore.kernel.org/all/202410281436.8246527d-...@intel.com/

Yeah, I think I fixed that one, but the robot gifted me another one that
I still need to find time to address. I'm hoping this week. I'll repost
properly once I fix it.

Thanks!
Re: [PATCH 3/5] perf: Add pmu get/put
On Fri, Oct 18, 2024 at 02:46:31PM -0500, Lucas De Marchi wrote:

> I will give this a try with i915 and/or xe.

Less horrible version here:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/pmu-unregister

I've just pushed it out to the robots, but it builds, passes perf test
and your dummy_pmu testcase (for me, on my one testbox and .config
etc..)

I'll let it sit with the robots for a few days and if they don't
complain I'll post it.

Thanks!
Re: [PATCH 3/5] perf: Add pmu get/put
On Mon, Oct 14, 2024 at 09:25:19PM +0200, Peter Zijlstra wrote:

> Let me ponder that a little bit.

So I did the thing on top of the get/put thing that would allow you to
get rid of the ->closed thing, and before I was finished I already hated
all of it :-(

The thing is, if you're going to the effort of adding get/put and
putting the responsibility on the implementation, then the ->closed
thing is only a little extra ask.

So... I wondered, how hard it would be for perf_pmu_unregister() to do
what you want.

Time passed, hacks were done... and while what I have is still riddled
with holes (more work is definitely required), it does pass your
dummy_pmu test script.

I'll poke at this a little longer. Afaict it should be possible to make
this good enough for what you want. Just needs more holes filled.

---
 include/linux/perf_event.h |  13 ++-
 kernel/events/core.c       | 260 ++---
 2 files changed, 228 insertions(+), 45 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index fb908843f209..20995d33d27e 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -318,6 +318,9 @@ struct perf_output_handle;
 struct pmu {
 	struct list_head		entry;
 
+	spinlock_t			events_lock;
+	struct list_head		events;
+
 	struct module			*module;
 	struct device			*dev;
 	struct device			*parent;
@@ -612,9 +615,10 @@ struct perf_addr_filter_range {
  * enum perf_event_state - the states of an event:
  */
 enum perf_event_state {
-	PERF_EVENT_STATE_DEAD		= -4,
-	PERF_EVENT_STATE_EXIT		= -3,
-	PERF_EVENT_STATE_ERROR		= -2,
+	PERF_EVENT_STATE_DEAD		= -5,
+	PERF_EVENT_STATE_REVOKED	= -4,	/* pmu gone, must not touch */
+	PERF_EVENT_STATE_EXIT		= -3,	/* task died, still inherit */
+	PERF_EVENT_STATE_ERROR		= -2,	/* scheduling error, can enable */
 	PERF_EVENT_STATE_OFF		= -1,
 	PERF_EVENT_STATE_INACTIVE	=  0,
 	PERF_EVENT_STATE_ACTIVE		=  1,
@@ -853,6 +857,7 @@ struct perf_event {
 	void *security;
 #endif
 	struct list_head		sb_list;
+	struct list_head		pmu_list;
 
 	/*
 	 * Certain events gets forwarded to another pmu internally by over-
@@ -1103,7 +1108,7 @@ extern void perf_aux_output_flag(struct perf_output_handle *handle, u64 flags);
 extern void perf_event_itrace_started(struct perf_event *event);
 
 extern int perf_pmu_register(struct pmu *pmu, const char *name, int type);
-extern void perf_pmu_unregister(struct pmu *pmu);
+extern int perf_pmu_unregister(struct pmu *pmu);
 
 extern void __perf_event_task_sched_in(struct task_struct *prev,
 				       struct task_struct *task);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index cdd09769e6c5..e66827367a97 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2406,7 +2406,9 @@ ctx_time_update_event(struct perf_event_context *ctx, struct perf_event *event)
 
 #define DETACH_GROUP	0x01UL
 #define DETACH_CHILD	0x02UL
-#define DETACH_DEAD	0x04UL
+#define DETACH_EXIT	0x04UL
+#define DETACH_REVOKE	0x08UL
+#define DETACH_DEAD	0x10UL
 
 /*
  * Cross CPU call to remove a performance event
@@ -2421,6 +2423,7 @@ __perf_remove_from_context(struct perf_event *event,
 			   void *info)
 {
 	struct perf_event_pmu_context *pmu_ctx = event->pmu_ctx;
+	enum perf_event_state state = PERF_EVENT_STATE_OFF;
 	unsigned long flags = (unsigned long)info;
 
 	ctx_time_update(cpuctx, ctx);
@@ -2429,16 +2432,22 @@ __perf_remove_from_context(struct perf_event *event,
 	 * Ensure event_sched_out() switches to OFF, at the very least
 	 * this avoids raising perf_pending_task() at this time.
 	 */
-	if (flags & DETACH_DEAD)
+	if (flags & DETACH_EXIT)
+		state = PERF_EVENT_STATE_EXIT;
+	if (flags & DETACH_REVOKE)
+		state = PERF_EVENT_STATE_REVOKED;
+	if (flags & DETACH_DEAD) {
 		event->pending_disable = 1;
+		state = PERF_EVENT_STATE_DEAD;
+	}
 	event_sched_out(event, ctx);
 	if (flags & DETACH_GROUP)
 		perf_group_detach(event);
 	if (flags & DETACH_CHILD)
 		perf_child_detach(event);
 	list_del_event(event, ctx);
-	if (flags & DETACH_DEAD)
-		event->state = PERF_EVENT_STATE_DEAD;
+
+	event->state = state;
 
 	if (!pmu_ctx->nr_events) {
 		pmu_ctx->rotate_necessary = 0;
@@ -4508,7 +4517,8 @@ static void perf_event_enable_on_exec(struct perf_event_context *ctx)
 
 static void perf_remove_from_owner(struct perf_event *ev
Re: [PATCH 1/5] perf: Add dummy pmu module
On Tue, Oct 08, 2024 at 01:34:57PM -0500, Lucas De Marchi wrote:

> +static int parse_device(const char __user *ubuf, size_t size, u32 *instance)
> +{
> +	char buf[16];
> +	ssize_t len;
> +
> +	if (size > sizeof(buf) - 1)
> +		return -E2BIG;
> +
> +	len = strncpy_from_user(buf, ubuf, sizeof(buf));
> +	if (len < 0 || len >= sizeof(buf) - 1)
> +		return -E2BIG;
> +
> +	if (kstrtou32(buf, 0, instance))
> +		return -EINVAL;
> +
> +	return size;
> +}

I had to change this to:

+static int parse_device(const char __user *ubuf, size_t size, u32 *instance)
+{
+	int ret = kstrtouint_from_user(ubuf, size, 0, instance);
+	if (ret) {
+		printk("suckage: %d\n", ret);
+		return ret;
+	}
+	return size;
+}

because otherwise it didn't want to work for me; I kept getting garbage
at the end and things failing. Specifically, it looked like the string
presented by userspace was not '\0' terminated, ubuf was pointing to
"1...garbage..." and size was 1.
Re: [PATCH 3/5] perf: Add pmu get/put
On Mon, Oct 14, 2024 at 01:20:34PM -0500, Lucas De Marchi wrote:
> On Mon, Oct 14, 2024 at 07:32:46PM +0200, Peter Zijlstra wrote:
> > I'm confused.. probably because I still don't have any clue about
> > drivers and the above isn't really telling me much either.
> > 
> > I don't see how you get rid of the try_module_get() we do per event;
> > without that you can't unload the module.
> 
> I don't get rid of the try_module_get(). They serve different purposes.
> Having a reference to the module prevents the _module_ going away (and
> hence the function pointers we call into from perf). It doesn't prevent
> the module unbinding from the HW. A module may have N instances if it's
> bound to N devices.
> 
> This can be done today to unbind the HW (integrated graphics) from the
> i915 module:
> 
> 	# echo 0000:00:02.0 > /sys/bus/pci/drivers/i915/unbind
> 
> The ref taken by these new get()/put() are related to preventing the
> data going away - the driver can use that to take a ref on something
> that will survive the unbind.

OK, for some reason I thought to remember that you wanted to be able to
unload the module too.

> > And I don't see how you think it is safe to free a pmu while there are
> > still events around.
> 
> so, we don't actually free it - the pmu is unregistered but the
> `struct pmu` and (possibly) its container are still around after
> unregister. When the get/put are used, the driver can keep the data
> around, which is then free'd when the last reference is put.

Ah, okay. So the implementation knows to nop out all device interaction
when it gets unbound, but the events and pmu data stick around until
they're naturally killed off?

Ah, reading the below patches that is indeed what i915 does.
pmu->closed makes this so.

The dummy thing you posted in this thread does perf_event_disable() on
all previously created events, and this is not sound. Userspace can do
PERF_EVENT_IOC_ENABLE on them and then things will go side-ways fast.
And I was afraid i915 was doing the same.

> - Subject: [PATCH 3/8] drm/i915/pmu: Fix crash due to use-after-free

So reading that Changelog, you would like a replacement for pmu->closed
as well.

I suppose, one way to go about doing this is to have
perf_pmu_unregister() replace a bunch of methods. Notably you have
pmu->closed in:

 - event_init()
 - read()
 - start()
 - stop()
 - add()

Having perf_pmu_unregister() overwrite those function pointers with
something akin to your pmu->closed would go a long way, right?

It would require using READ_ONCE() for calling the methods, which would
make things a little ugly :/

But I also think we want to force all the events into STATE_ERROR, and
I'm not immediately sure how best to go about doing that. Adding better
return value handling to ->add() is trivial enough, and perhaps calling
perf_pmu_resched() is sufficient to cycle everything.

Let me ponder that a little bit.
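To make that concrete, a rough and completely untested sketch of the
method-stubbing idea (the stub names here are invented):

static int perf_pmu_nop_event_init(struct perf_event *event)
{
	return -ENODEV;		/* no new events on an unregistered pmu */
}

static int perf_pmu_nop_add(struct perf_event *event, int flags)
{
	return -ENODEV;		/* refuse scheduling; push the event to ERROR */
}

static void perf_pmu_nop_startstop(struct perf_event *event, int flags)
{
	/* nothing left to start or stop */
}

static void perf_pmu_nop_read(struct perf_event *event)
{
}

/* in perf_pmu_unregister(): */
WRITE_ONCE(pmu->event_init, perf_pmu_nop_event_init);
WRITE_ONCE(pmu->add,        perf_pmu_nop_add);
WRITE_ONCE(pmu->start,      perf_pmu_nop_startstop);
WRITE_ONCE(pmu->stop,       perf_pmu_nop_startstop);
WRITE_ONCE(pmu->read,       perf_pmu_nop_read);

/* ...with the call sites then doing, e.g.: */
READ_ONCE(event->pmu->read)(event);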
Re: [PATCH 3/5] perf: Add pmu get/put
On Tue, Oct 08, 2024 at 01:34:59PM -0500, Lucas De Marchi wrote:
> If a pmu is unregistered while there's an active event, perf will still
> access the pmu via event->pmu, even after the event is destroyed. This
> makes it difficult for drivers like i915 that can be unbound from the
> HW.
> 
> 	BUG: KASAN: use-after-free in exclusive_event_destroy+0xd8/0xf0
> 	Read of size 4 at addr 88816e2bb63c by task perf/7748
> 
> i915 tries to cope with it by installing an event->destroy, but that is
> not sufficient: if the pmu is released by the driver, it will still
> crash since event->pmu is still used.
> 
> Moreover, even with that use-after-free fixed by adjusting the order in
> _free_event() or delaying the free by the driver, the kernel still
> oopses when closing the event fd related to an unregistered pmu: the
> percpu variables allocated on perf_pmu_register() would already be
> released. One such crash is:
> 
> 	BUG: KASAN: user-memory-access in _raw_spin_lock_irqsave+0x88/0x100
> 	Write of size 4 at addr by task perf/727
> 
> 	CPU: 0 UID: 0 PID: 727 Comm: perf Not tainted 6.12.0-rc1-DEMARCHI-dxnf+ #9
> 	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS unknown 2/2/2022
> 	Call Trace:
> 	 <TASK>
> 	 dump_stack_lvl+0x5f/0x90
> 	 print_report+0x4d3/0x50a
> 	 ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> 	 ? kasan_addr_to_slab+0xd/0xb0
> 	 kasan_report+0xe2/0x170
> 	 ? _raw_spin_lock_irqsave+0x88/0x100
> 	 ? _raw_spin_lock_irqsave+0x88/0x100
> 	 kasan_check_range+0x125/0x230
> 	 __kasan_check_write+0x14/0x30
> 	 _raw_spin_lock_irqsave+0x88/0x100
> 	 ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> 	 _atomic_dec_and_raw_lock_irqsave+0x89/0x110
> 	 ? __kasan_check_write+0x14/0x30
> 	 put_pmu_ctx+0x98/0x330
> 
> The fix here is to provide a set of get/put hooks that drivers can
> implement to piggy back the perf's pmu lifecycle to the driver's
> instance lifecycle. With this, perf_pmu_unregister() can be called by
> the driver, which is then responsible for freeing the resources.

I'm confused.. probably because I still don't have any clue about
drivers and the above isn't really telling me much either.

I don't see how you get rid of the try_module_get() we do per event;
without that you can't unload the module.

And I don't see how you think it is safe to free a pmu while there are
still events around.

Nor do I really see what these new get/put methods do. I see you call
->put() where we do module_put(), and ->get() near try_module_get(),
but how is that helping?
Re: [PATCH v2] locking/ww_mutex: Adjust to lockdep nest_lock requirements
On Wed, Oct 09, 2024 at 11:20:31AM +0200, Thomas Hellström wrote:
> When using mutex_acquire_nest() with a nest_lock, lockdep refcounts the
> number of acquired lockdep_maps of mutexes of the same class, and also
> keeps a pointer to the first acquired lockdep_map of a class. That pointer
> is then used for various comparison-, printing- and checking purposes,
> but there is no mechanism to actively ensure that lockdep_map stays in
> memory. Instead, a warning is printed if the lockdep_map is freed and
> there are still held locks of the same lock class, even if the lockdep_map
> itself has been released.
> 
> In the context of WW/WD transactions that means that if a user unlocks
> and frees a ww_mutex from within an ongoing ww transaction, and that
> mutex happens to be the first ww_mutex grabbed in the transaction,
> such a warning is printed and there might be a risk of a UAF.
> 
> Note that this is only a problem when lockdep is enabled and affects only
> dereferences of struct lockdep_map.
> 
> Adjust to this by adding a fake lockdep_map to the acquired context and
> make sure it is the first acquired lockdep map of the associated
> ww_mutex class. Then hold it for the duration of the WW/WD transaction.
> 
> This has the side effect that when trying to lock a ww mutex *without* a
> ww_acquire_context while such a context has been acquired, we'd see a
> lockdep splat. The test-ww_mutex.c selftest attempts to do that, so
> modify that particular test to not acquire a ww_acquire_context if it
> is not going to be used.
> 
> v2:
> - Lower the number of locks in the test-ww_mutex
>   stress(STRESS_ALL) test to accommodate the dummy lock
>   introduced in this patch without overflowing lockdep held lock
>   references.

Thanks, I rebased tip/locking/core, which should now have this patch.
Re: [PATCH RESEND] locking/ww_mutex: Adjust to lockdep nest_lock requirements
On Wed, Oct 02, 2024 at 02:56:11PM +0200, Thomas Hellström wrote:
> When using mutex_acquire_nest() with a nest_lock, lockdep refcounts the
> number of acquired lockdep_maps of mutexes of the same class, and also
> keeps a pointer to the first acquired lockdep_map of a class. That pointer
> is then used for various comparison-, printing- and checking purposes,
> but there is no mechanism to actively ensure that lockdep_map stays in
> memory. Instead, a warning is printed if the lockdep_map is freed and
> there are still held locks of the same lock class, even if the lockdep_map
> itself has been released.
> 
> In the context of WW/WD transactions that means that if a user unlocks
> and frees a ww_mutex from within an ongoing ww transaction, and that
> mutex happens to be the first ww_mutex grabbed in the transaction,
> such a warning is printed and there might be a risk of a UAF.

I'm assuming you actually hit this?

Anyway, the workaround seems sane enough, thanks!
Re: [PATCH 6/7] drm/i915/pmu: Lazy unregister
On Tue, Jul 23, 2024 at 10:30:08AM -0500, Lucas De Marchi wrote:
> On Tue, Jul 23, 2024 at 09:03:25AM GMT, Tvrtko Ursulin wrote:
> > 
> > On 22/07/2024 22:06, Lucas De Marchi wrote:
> > > Instead of calling perf_pmu_unregister() when unbinding, defer that to
> > > the destruction of i915 object. Since perf itself holds a reference in
> > > the event, this only happens when all events are gone, which guarantees
> > > i915 is not unregistering the pmu with live events.
> > > 
> > > Previously, running the following sequence would crash the system after
> > > ~2 tries:
> > > 
> > > 	1) bind device to i915
> > > 	2) wait events to show up on sysfs
> > > 	3) start perf stat -I 1000 -e i915/rcs0-busy/
> > > 	4) unbind driver
> > > 	5) kill perf
> > > 
> > > Most of the time this crashes in perf_pmu_disable() while accessing the
> > > percpu pmu_disable_count. This happens because perf_pmu_unregister()
> > > destroys it with free_percpu(pmu->pmu_disable_count).
> > > 
> > > With a lazy unbind, the pmu is only unregistered after (5) as opposed to
> > > after (4). The downside is that if a new bind operation is attempted for
> > > the same device/driver without killing the perf process, i915 will fail
> > > to register the pmu (but still load successfully). This seems better
> > > than completely crashing the system.
> > 
> > So effectively allows unbind to succeed without fully unbinding the
> > driver from the device? That sounds like a significant drawback and if
> > so, I wonder if a more complicated solution wouldn't be better after
> > all. Or is there precedent for allowing userspace keeping their paws on
> > unbound devices in this way?
> 
> keeping the resources alive but "unplugged" while the hardware
> disappeared is a common thing to do... it's the whole point of the
> drmm-managed resources, for example. If you bind the driver and then
> unbind it while userspace is holding a ref, next time you try to bind it
> will come up with a different card number. A similar thing that could be
> done is to adjust the name of the event - currently we add the mangled
> pci slot.
> 
> That said, I agree a better approach would be to allow
> perf_pmu_unregister() to do its job even when there are open events. On
> top of that (or as a way to help achieve that), make perf core replace
> the callbacks with stubs when the pmu is unregistered - that would even
> kill the need for i915's checks on pmu->closed (and fix the lack thereof
> in other drivers).
> 
> It can be a can of worms though and may be pushed back by perf core
> maintainers, so it'd be good to have their feedback.

I don't think I understand the problem. I also don't understand drivers
much -- so that might be the problem.

So the problem appears to be that the device just disappears without
warning? How can a GPU go away like that?

Since you have a notion of this device, can't you do this stubbing you
talk about? That is, if your internal device reference becomes NULL,
let the PMU methods preserve the state, like no-ops. And then when the
last event goes away, tear down the whole thing.

Again, I'm not sure I'm following.
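In driver terms that no-op'ing looks roughly like the below (a sketch
only; the type and field names are invented):

static void xxx_pmu_event_start(struct perf_event *event, int flags)
{
	struct xxx_pmu *pmu = container_of(event->pmu, struct xxx_pmu, base);

	/* Device unbound: keep the event's state, touch no hardware. */
	if (READ_ONCE(pmu->unbound))
		return;

	xxx_pmu_enable(event);
	event->hw.state = 0;
}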
Re: [PATCH 5/5] fs: Convert struct file::f_count to refcount_long_t
On Fri, May 03, 2024 at 11:37:25AM +0200, Christian Brauner wrote:
> On Thu, May 02, 2024 at 05:41:23PM -0700, Kees Cook wrote:
> > On Fri, May 03, 2024 at 01:14:45AM +0100, Al Viro wrote:
> > > On Thu, May 02, 2024 at 05:10:18PM -0700, Kees Cook wrote:
> > > 
> > > > But anyway, there needs to be a general "oops I hit 0"-aware form of
> > > > get_file(), and it seems like it should just be get_file() itself...
> > > 
> > > ... which brings back the question of what's the sane damage mitigation
> > > for that. Adding arseloads of never-exercised failure exits is generally
> > > a bad idea - it's asking for bitrot and making the thing harder to review
> > > in future.
> > 
> > Linus seems to prefer best-effort error recovery to sprinkling BUG()s
> > around. But if that's really the solution, then how about get_file()
> > switching to use inc_not_zero and BUG on 0?
> 
> Making get_file() return an error is not an option. For all current
> callers that's pointless churn for a condition that's not supposed to
> happen at all.
> 
> Additionally, iirc *_inc_not_zero() variants are implemented with
> try_cmpxchg() which scales poorly under contention for a condition
> that's not supposed to happen.

	unsigned long old = atomic_long_fetch_inc_relaxed(&f->f_count);
	WARN_ON(!old);

Or somesuch might be an option?
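Spelled out, that would make get_file() something like this (a sketch
only, assuming f_count stays an atomic_long_t):

static inline struct file *get_file(struct file *f)
{
	unsigned long old = atomic_long_fetch_inc_relaxed(&f->f_count);

	/* old == 0 means we raced with the final fput(). */
	WARN_ON_ONCE(!old);
	return f;
}

The unconditional fetch_inc avoids the try_cmpxchg() retry loop that the
*_inc_not_zero() variants need, so it doesn't degrade under contention.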
Re: [PATCH v3 0/4] mm: convert to vma_is_initial_heap/stack()
On Fri, Jul 28, 2023 at 01:00:39PM +0800, Kefeng Wang wrote:
> Kefeng Wang (4):
>   mm: factor out VMA stack and heap checks
>   drm/amdkfd: use vma_is_initial_stack() and vma_is_initial_heap()
>   selinux: use vma_is_initial_stack() and vma_is_initial_heap()
>   perf/core: use vma_is_initial_stack() and vma_is_initial_heap()
> 
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c |  5 +
>  fs/proc/task_mmu.c                   | 24 
>  fs/proc/task_nommu.c                 | 15 +
>  include/linux/mm.h                   | 25 +
>  kernel/events/core.c                 | 33 ++--
>  security/selinux/hooks.c             |  7 ++
>  6 files changed, 44 insertions(+), 65 deletions(-)

Acked-by: Peter Zijlstra (Intel)
Re: [PATCH v2 03/47] mm: shrinker: add infrastructure for dynamically allocating shrinker
On Mon, Jul 24, 2023 at 05:43:10PM +0800, Qi Zheng wrote:

> +void shrinker_unregister(struct shrinker *shrinker)
> +{
> +	struct dentry *debugfs_entry;
> +	int debugfs_id;
> +
> +	if (!shrinker || !(shrinker->flags & SHRINKER_REGISTERED))
> +		return;
> +
> +	down_write(&shrinker_rwsem);
> +	list_del(&shrinker->list);
> +	shrinker->flags &= ~SHRINKER_REGISTERED;
> +	if (shrinker->flags & SHRINKER_MEMCG_AWARE)
> +		unregister_memcg_shrinker(shrinker);
> +	debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);
> +	up_write(&shrinker_rwsem);
> +
> +	shrinker_debugfs_remove(debugfs_entry, debugfs_id);

Should there not be an rcu_barrier() right about here?

> +
> +	kfree(shrinker->nr_deferred);
> +	shrinker->nr_deferred = NULL;
> +
> +	kfree(shrinker);
> +}
> +EXPORT_SYMBOL(shrinker_unregister);
Re: [PATCH v2 4/4] perf/core: use vma_is_initial_stack() and vma_is_initial_heap()
On Wed, Jul 19, 2023 at 03:51:14PM +0800, Kefeng Wang wrote:
> Use the helpers to simplify code, also kill unneeded goto cpy_name.

Grrr.. why am I only getting 4/4? I'm going to write a bot that auto
NAKs all partial series :/
Re: Consider switching to WQ_UNBOUND messages (was: Re: [PATCH v2 6/7] workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism)
On Wed, Jul 12, 2023 at 11:04:16AM +0200, Geert Uytterhoeven wrote:
> Hoi Peter,
> 
> On Wed, Jul 12, 2023 at 10:05 AM Peter Zijlstra wrote:
> > On Tue, Jul 11, 2023 at 11:39:17AM -1000, Tejun Heo wrote:
> > > I wonder whether the right thing to do here is somehow scaling the
> > > threshold according to the relative processing power. It's difficult
> > > to come up with a threshold which works well across the latest &
> > > fastest and really tiny CPUs. I'll think about it some more but if
> > > you have some ideas, please feel free to suggest.
> > 
> > We could scale by BogoMIPS I suppose, it's a bogus measurement, as per
> > the name, but it does have some relation to how fast the machine is.
> 
> That's gonna fail miserably on e.g. ARM and RISC-V, where BogoMIPS
> depends on some timer frequency.
> 
>   R-Car M2-W with 1.5 GHz Cortex-A15: 40.00 BogoMIPS
>   R-Car V4H with 1.8 GHz Cortex-A76:  33.33 BogoMIPS
> 
> while the really slow 48 MHz VexRiscV gets 128 BogoMIPS.

Hehe, OK, really bogus then. Let's file this idea in the bit-bucket
then.
Re: Consider switching to WQ_UNBOUND messages (was: Re: [PATCH v2 6/7] workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism)
On Tue, Jul 11, 2023 at 11:39:17AM -1000, Tejun Heo wrote:
> I wonder whether the right thing to do here is somehow scaling the threshold
> according to the relative processing power. It's difficult to come up with a
> threshold which works well across the latest & fastest and really tiny CPUs.
> I'll think about it some more but if you have some ideas, please feel free
> to suggest.

We could scale by BogoMIPS I suppose, it's a bogus measurement, as per
the name, but it does have some relation to how fast the machine is.
Re: [Intel-gfx] [PATCH v5 0/7] Introduce __xchg, non-atomic xchg
On Thu, Feb 23, 2023 at 10:24:19PM +0100, Andrzej Hajda wrote:
> On 22.02.2023 18:04, Peter Zijlstra wrote:
> > On Wed, Jan 18, 2023 at 04:35:22PM +0100, Andrzej Hajda wrote:
> > 
> > > Andrzej Hajda (7):
> > >    arch: rename all internal names __xchg to __arch_xchg
> > >    linux/include: add non-atomic version of xchg
> > >    arch/*/uprobes: simplify arch_uretprobe_hijack_return_addr
> > >    llist: simplify __llist_del_all
> > >    io_uring: use __xchg if possible
> > >    qed: use __xchg if possible
> > >    drm/i915/gt: use __xchg instead of internal helper
> > 
> > Nothing crazy in here I suppose, I somewhat wonder why you went through
> > the trouble, but meh.
> 
> If you are asking why I have proposed this patchset, then the answer is
> simple: 1st I've tried to find a way to move an internal i915 helper to
> the core (see patch 7). Then I was looking for possible other users of
> this helper. And apparently there are many of them; patches 3-7 show
> some.
> 
> > You want me to take this through the locking tree (for the next cycle,
> > not this one) where I normally take atomic things or does someone else
> > want this?
> 
> If you could take it I will be happy.

OK, I'll go queue it in tip/locking/core after -rc1. Thanks!
Re: [PATCH v5 0/7] Introduce __xchg, non-atomic xchg
On Wed, Jan 18, 2023 at 04:35:22PM +0100, Andrzej Hajda wrote:

> Andrzej Hajda (7):
>   arch: rename all internal names __xchg to __arch_xchg
>   linux/include: add non-atomic version of xchg
>   arch/*/uprobes: simplify arch_uretprobe_hijack_return_addr
>   llist: simplify __llist_del_all
>   io_uring: use __xchg if possible
>   qed: use __xchg if possible
>   drm/i915/gt: use __xchg instead of internal helper

Nothing crazy in here I suppose, I somewhat wonder why you went through
the trouble, but meh.

You want me to take this through the locking tree (for the next cycle,
not this one) where I normally take atomic things or does someone else
want this?
Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions
On Wed, Jan 25, 2023 at 12:38:46AM -0800, Suren Baghdasaryan wrote:
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 2d6d790d9bed..6c7c70bf50dd 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -491,7 +491,13 @@ struct vm_area_struct {
>  	 * See vmf_insert_mixed_prot() for discussion.
>  	 */
>  	pgprot_t vm_page_prot;
> -	unsigned long vm_flags;		/* Flags, see mm.h. */
> +
> +	/*
> +	 * Flags, see mm.h.
> +	 * WARNING! Do not modify directly.
> +	 * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> +	 */
> +	unsigned long vm_flags;

We have __private and ACCESS_PRIVATE() to help with enforcing this.
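That is, roughly (a sketch, not the actual patch):

	unsigned long __private vm_flags;

with the modifier functions doing something like:

static inline void set_vm_flags(struct vm_area_struct *vma,
				unsigned long flags)
{
	ACCESS_PRIVATE(vma, vm_flags) |= flags;
}

so that sparse flags any direct vma->vm_flags write that bypasses the
helpers.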
Re: [PATCH v2 20/20] jump_label: RFC - tolerate toggled state
On Fri, Jan 13, 2023 at 12:30:16PM -0700, Jim Cromie wrote:
> __jump_label_patch currently will "crash the box" if it finds a
> jump_entry not as expected. ISTM this is overly harsh; it doesn't
> distinguish between "alternate/opposite" state, and truly
> "insane/corrupted".
> 
> The "opposite" (but well-formed) state is a milder mis-initialization
> problem, and some less severe mitigation seems practical. ATM this
> just warns about it; a range/enum of outcomes: warn, crash, silence,
> ok, fixup-continue, etc, are possible on a case-by-case basis.
> 
> I've managed to create this mis-initialization condition with
> test_dynamic_debug.ko & _submod.ko. These replicate DRM's regression
> on DRM_USE_DYNAMIC_DEBUG=y; drm.debug callsites in drivers/helpers
> (dependent modules) are not enabled along with those in drm.ko itself.
> 
> I've hit this case a few times, but haven't been able to isolate the
> when and why.
> 
> warn-only is something of a punt, and I'm still left with remaining
> bugs which are likely related; I'm able to toggle the p-flag on
> callsites in the submod, but their enablement still doesn't yield
> logging activity.

Right; having been in this state is bad since it will generate
inconsistent code-flow. Full on panic *might* not be warranted (as it
does for corrupted text) but it is still a fairly bad situation -- so
I'm not convinced we want to warn and carry on.

It would be really good to figure out why the site was skipped over and
got out of skew.

Given it's all module stuff, the 'obvious' case would be something like
a race between adding the new sites and flipping it, but I'm not seeing
how -- things are rather crudely serialized by jump_label_mutex.

The only other option I can come up with is that somehow the update
condition in jump_label_add_module() is somehow wrong.
Re: [PATCH mm-unstable v1 20/20] mm: rename FOLL_FORCE to FOLL_PTRACE
On Wed, Nov 16, 2022 at 10:16:34AM -0800, Linus Torvalds wrote:
> Following the history of it is a bit of a mess, because there's a
> number of renamings and re-organizations, but it seems to go back to
> 2007 and commit b6a2fea39318 ("mm: variable length argument support").

I went back and read parts of the discussions with Ollie, and the
.force=1 thing just magically appeared one day when we were sending
work-in-progress patches back and forth without mention of where it
came from :-/ And I certainly can't remember now..

Looking at it now, I have the same reaction as both you and Kees had;
it seems entirely superfluous. So I'm all for trying to remove it.
Re: [PATCH 15/15] seqlock: drop seqcount_ww_mutex_t
On Thu, Apr 07, 2022 at 10:59:46AM +0200, Christian König wrote:
> Daniel pointed out that this series removes the last user of
> seqcount_ww_mutex_t, so let's drop this.
> 
> Signed-off-by: Christian König

Acked-by: Peter Zijlstra (Intel)
Re: [PATCH] locking/rwsem: drop redundant semicolon of down_write_nest_lock
On Fri, Jan 14, 2022 at 09:40:54AM +0100, Christian König wrote:
> On 14.01.22 at 09:37, Guchun Chen wrote:
> > Otherwise, braces are needed when using it.
> > 
> > Signed-off-by: Guchun Chen
> 
> Acked-by: Christian König
> 
> Peter, any objections that we push this upstream through the drm
> subsystem?

Nah, take it.

Acked-by: Peter Zijlstra (Intel)
Re: [PATCH 0/2] Nuke PAGE_KERNEL_IO
On Thu, Oct 21, 2021 at 11:15:09AM -0700, Lucas De Marchi wrote:
> Last user of PAGE_KERNEL_IO is the i915 driver. While removing it from
> there as we seek to bring the driver to other architectures, Daniel
> suggested that we could finish the cleanup and remove it altogether,
> through the tip tree. So here I'm sending both commits needed for that.
> 
> Lucas De Marchi (2):
>   drm/i915/gem: stop using PAGE_KERNEL_IO
>   x86/mm: nuke PAGE_KERNEL_IO
> 
>  arch/x86/include/asm/fixmap.h             | 2 +-
>  arch/x86/include/asm/pgtable_types.h      | 7 ---
>  arch/x86/mm/ioremap.c                     | 2 +-
>  arch/x86/xen/setup.c                      | 2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c | 4 ++--
>  include/asm-generic/fixmap.h              | 2 +-
>  6 files changed, 6 insertions(+), 13 deletions(-)

Acked-by: Peter Zijlstra (Intel)
Re: printk deadlock due to double lock attempt on current CPU's runqueue
On Wed, Nov 10, 2021 at 11:50:38AM +0100, Petr Mladek wrote:
> On Tue 2021-11-09 12:06:48, Sultan Alsawaf wrote:
> > Hi,
> > 
> > I encountered a printk deadlock on 5.13 which appears to still affect
> > the latest kernel. The deadlock occurs due to printk being used while
> > having the current CPU's runqueue locked, and the underlying
> > framebuffer console attempting to lock the same runqueue when printk
> > tries to flush the log buffer.
> > 
> > I'm not sure what the *correct* solution is here (don't use printk
> > while having a runqueue locked? don't use schedule_work() from the
> > fbcon path? tell printk to use one of its lock-less backends?), so
> > I've cc'd all the relevant folks.
> 
> At the moment, printk_deferred() could be used here. It defers the
> console handling via irq_work().

I think I've rejected that patch at least twice now :-)

John's printk stuff will really land real soon now, right.
Re: printk deadlock due to double lock attempt on current CPU's runqueue
On Tue, Nov 09, 2021 at 12:06:48PM -0800, Sultan Alsawaf wrote:
> Hi,
> 
> I encountered a printk deadlock on 5.13 which appears to still affect
> the latest kernel. The deadlock occurs due to printk being used while
> having the current CPU's runqueue locked, and the underlying
> framebuffer console attempting to lock the same runqueue when printk
> tries to flush the log buffer.

Yes, that's a known 'feature' of some consoles. printk() is in the
process of being reworked to not call con->write() from the printk()
calling context, which would go a long way towards fixing this.

> #27 [c95b8e28] enqueue_task_fair at 8129774a <-- SCHED_WARN_ON(rq->tmp_alone_branch != &rq->leaf_cfs_rq_list);
> #28 [c95b8ec0] activate_task at 8125625d
> #29 [c95b8ef0] ttwu_do_activate at 81257943
> #30 [c95b8f28] sched_ttwu_pending at 8125c71f <-- locks this CPU's runqueue
> #31 [c95b8fa0] flush_smp_call_function_queue at 813c6833
> #32 [c95b8fd8] generic_smp_call_function_single_interrupt at 813c7f58
> #33 [c95b8fe0] __sysvec_call_function_single at 810f1456
> #34 [c95b8ff0] sysvec_call_function_single at 831ec1bc
> --- ---
> #35 [c919fda8] sysvec_call_function_single at 831ec1bc
> RIP: 831ed06e  RSP: ed10438a6a49  RFLAGS: 0001
> RAX: 888100d832c0  RBX:  RCX: 19233fd7
> RDX:  RSI: 888100d832c0  RDI: ed10438a6a49
> RBP: 831ec166  R8: dc00  R9:
> R10: 83400e22  R11:  R12: 831ed83e
> R13:  R14: c919fde8  R15: 814d4d9d
> ORIG_RAX: 88821c53524b  CS: 0001  SS: ef073a2
> WARNING: possibly bogus exception frame
> --->8---
> 
> The catalyst is that CONFIG_SCHED_DEBUG is enabled and the
> tmp_alone_branch assertion fails (Peter, is this bad?).

Yes, that's not good. IIRC Vincent and Michal were looking at that code
recently.

> I'm not sure what the *correct* solution is here (don't use printk
> while having a runqueue locked? don't use schedule_work() from the
> fbcon path? tell printk to use one of its lock-less backends?), so
> I've cc'd all the relevant folks.

I'm a firm believer in early_printk serial consoles.
Re: [Intel-gfx] [PATCH 2/4] mm: add a io_mapping_map_user helper
On Wed, Oct 20, 2021 at 08:40:05AM -0700, Lucas De Marchi wrote:
> On Fri, Mar 26, 2021 at 06:55:03AM +0100, Christoph Hellwig wrote:
> > Add a helper that calls remap_pfn_range for a struct io_mapping,
> > relying on the pgprot pre-validation done when creating the mapping
> > instead of doing it at runtime.
> > 
> > Signed-off-by: Christoph Hellwig
> > ---
> >  include/linux/io-mapping.h |  3 +++
> >  mm/Kconfig                 |  3 +++
> >  mm/Makefile                |  1 +
> >  mm/io-mapping.c            | 29 +
> >  4 files changed, 36 insertions(+)
> >  create mode 100644 mm/io-mapping.c
> > 
> > diff --git a/include/linux/io-mapping.h b/include/linux/io-mapping.h
> > index c093e81310a9b3..e9743cfd858527 100644
> > --- a/include/linux/io-mapping.h
> > +++ b/include/linux/io-mapping.h
> > @@ -220,3 +220,6 @@ io_mapping_free(struct io_mapping *iomap)
> >  }
> >  
> >  #endif /* _LINUX_IO_MAPPING_H */
> > +
> > +int io_mapping_map_user(struct io_mapping *iomap, struct vm_area_struct *vma,
> > +		unsigned long addr, unsigned long pfn, unsigned long size);

> I'm not sure what exactly brought me to check this, but while debugging
> I noticed this outside the header guard. But then after some more checks
> I saw nothing actually selects CONFIG_IO_MAPPING because the commit
> using it was reverted in commit 0e4fe0c9f2f9 ("Revert "i915: use
> io_mapping_map_user"")
> 
> Is this something we want to re-attempt moving to mm/?

Yes, it would be very good to unexport apply_to_page_range(); it's a
terrible interface to expose.
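For reference, the hunk above adds the declaration after the final
#endif; if the move to mm/ is re-attempted, the declaration wants to
live inside the guard, roughly:

#ifndef _LINUX_IO_MAPPING_H
#define _LINUX_IO_MAPPING_H

/* ... */

int io_mapping_map_user(struct io_mapping *iomap, struct vm_area_struct *vma,
		unsigned long addr, unsigned long pfn, unsigned long size);

#endif /* _LINUX_IO_MAPPING_H */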
Re: [PATCH 8/8] drm/i915: Don't disable interrupts and pretend a lock as been acquired in __timeline_mark_lock().
On Tue, Oct 05, 2021 at 05:00:46PM +0200, Sebastian Andrzej Siewior wrote:
> This is a revert of commits
>    d67739268cf0e ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe")
>    6c69a45445af9 ("drm/i915/gt: Mark context->active_count as protected by timeline->mutex")
> 
> The existing code leads to a different behaviour depending on whether
> lockdep is enabled or not. Any following lock that is acquired without
> disabling interrupts (but needs to) will not be noticed by lockdep.
> 
> This is not just a lockdep annotation: there is an actual mutex_t that
> is properly used as a lock, but in the case of __timeline_mark_lock()
> lockdep is only told that it is acquired while no lock has actually
> been acquired.
> 
> It appears that its purpose is just to satisfy the
> lockdep_assert_held() check in intel_context_mark_active(). The other
> problem with disabling interrupts is that on PREEMPT_RT interrupts are
> also disabled, which leads to problems for instance later during memory
> allocation.
> 
> Add an argument to intel_context_mark_active() which is true if the lock
> must have been acquired, false if other magic is involved and the lock
> is not needed. Use the `false' argument only from within
> switch_to_kernel_context() and remove __timeline_mark_lock().
> 
> Cc: Peter Zijlstra
> Signed-off-by: Sebastian Andrzej Siewior

Eeew, nice find.

> -static inline void intel_context_mark_active(struct intel_context *ce)
> +static inline void intel_context_mark_active(struct intel_context *ce,
> +					     bool timeline_mutex_needed)
>  {
> -	lockdep_assert_held(&ce->timeline->mutex);
> +	if (timeline_mutex_needed)
> +		lockdep_assert_held(&ce->timeline->mutex);
>  	++ce->active_count;
>  }

Chris, might it be possible to write that something like:

	lockdep_assert(lockdep_is_held(&ce->timeline->mutex) ||
		       engine_is_parked(ce));

instead?

Otherwise,

Acked-by: Peter Zijlstra (Intel)
Re: [RFC 1/6] sched: Add nice value change notifier
On Mon, Oct 04, 2021 at 09:12:37AM +0100, Tvrtko Ursulin wrote:
> On 01/10/2021 16:48, Peter Zijlstra wrote:
> > Hmm? That's for normalize_rt_tasks() only, right? Just don't have it
> > call the notifier in that special case (that's a magic sysrq thing
> > anyway).
> 
> You mean my talk about tasklist_lock? No, it is also the syscall part I
> am interested in as well. Call chain looks like this:

Urgh, I always miss that because it lives outside of sched.. :/

> sys_setpriority()
> {
>   ...
>   rcu_read_lock();
>   read_lock(&tasklist_lock);
>   ...
>   set_one_prio()
>     set_user_nice()
>     {
>       ...
>       task_rq_lock();
>         -> my notifier from this RFC [1]
>       task_rq_unlock();
>         -> I can move the notifier here for _some_ improvement [2]
>     }
>   ...
>   read_unlock(&tasklist_lock);
>   rcu_read_unlock();
> }
> 
> So this RFC had the notifier call chain at [1], which I understood was
> the thing you initially pointed out was horrible, being under a
> scheduler lock.
> 
> I can trivially move it to [2] but that still leaves it under the
> tasklist lock. I don't have a good feel for how much better that would
> be. If not good enough then I will look for a smarter solution with
> less opportunity for global impact.

So tasklist_lock is pretty terrible and effectively unbound already
(just create more tasks etc..) so adding a notifier call there
shouldn't really make it much worse.
Re: [RFC 1/6] sched: Add nice value change notifier
On Fri, Oct 01, 2021 at 11:32:16AM +0100, Tvrtko Ursulin wrote:
> 
> On 01/10/2021 10:04, Tvrtko Ursulin wrote:
> > 
> > Hi Peter,
> > 
> > On 30/09/2021 19:33, Peter Zijlstra wrote:
> > > On Thu, Sep 30, 2021 at 06:15:47PM +0100, Tvrtko Ursulin wrote:
> > > >   void set_user_nice(struct task_struct *p, long nice)
> > > >   {
> > > >   	bool queued, running;
> > > > -	int old_prio;
> > > > +	int old_prio, ret;
> > > >   	struct rq_flags rf;
> > > >   	struct rq *rq;
> > > > @@ -6913,6 +6945,9 @@ void set_user_nice(struct task_struct *p, long nice)
> > > >   	 */
> > > >   	p->sched_class->prio_changed(rq, p, old_prio);
> > > > +	ret = atomic_notifier_call_chain(&user_nice_notifier_list, nice, p);
> > > > +	WARN_ON_ONCE(ret != NOTIFY_DONE);
> > > > +
> > > >   out_unlock:
> > > >   	task_rq_unlock(rq, p, &rf);
> > > >   }
> > > 
> > > No, we're not going to call out to exported, and potentially
> > > unbounded, functions under scheduler locks.
> > 
> > Agreed, that's another good point why it is even more hairy, as I have
> > generally alluded in the cover letter.
> > 
> > Do you have any immediate thoughts on possible alternatives?
> > 
> > Like for instance if I did a queue_work from set_user_nice and then
> > ran a notifier chain async from a worker? I haven't looked at yet what
> > repercussion would that have in terms of having to cancel the pending
> > workers when tasks exit. I can try and prototype that and see how it
> > would look.
> 
> Hm or I simply move calling the notifier chain to after task_rq_unlock?
> That would leave it running under the tasklist lock so probably still
> quite bad.

Hmm? That's for normalize_rt_tasks() only, right? Just don't have it
call the notifier in that special case (that's a magic sysrq thing
anyway).
Re: [RFC 0/6] CPU + GPU synchronised priority scheduling
On Thu, Sep 30, 2021 at 06:15:46PM +0100, Tvrtko Ursulin wrote:
> (Note I did not copy everyone on all patches but just the cover letter
> for context and the rest should be available from the mailing list.)

In general, if you can't be arsed to send it to me, I can't be arsed to
dig it out. I've got plenty other email I can read without having to go
looking for more.
Re: [RFC 1/6] sched: Add nice value change notifier
On Thu, Sep 30, 2021 at 06:15:47PM +0100, Tvrtko Ursulin wrote:
>  void set_user_nice(struct task_struct *p, long nice)
>  {
>  	bool queued, running;
> -	int old_prio;
> +	int old_prio, ret;
>  	struct rq_flags rf;
>  	struct rq *rq;
> 
> @@ -6913,6 +6945,9 @@ void set_user_nice(struct task_struct *p, long nice)
>  	 */
>  	p->sched_class->prio_changed(rq, p, old_prio);
> 
> +	ret = atomic_notifier_call_chain(&user_nice_notifier_list, nice, p);
> +	WARN_ON_ONCE(ret != NOTIFY_DONE);
> +
>  out_unlock:
>  	task_rq_unlock(rq, p, &rf);
>  }

No, we're not going to call out to exported, and potentially unbounded,
functions under scheduler locks.
Re: [PATCH v2] kernel/locking: Add context to ww_mutex_trylock.
On Thu, Sep 16, 2021 at 03:28:11PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 16, 2021 at 03:00:39PM +0200, Maarten Lankhorst wrote:
> > > For merge logistics, can we pls have a stable branch? I expect that
> > > the i915 patches will be ready for 5.16.
> > > 
> > > Or send it in for -rc2 so that the interface change doesn't cause
> > > needless conflicts, whatever you think is best.
> > 
> > Yeah, some central branch drm could pull from, would make upstreaming
> > patches that depend on it easier. :)
> 
> I think I'll make tip/locking/wwmutex and include that in
> tip/locking/core, let me have a poke.

This is now so. Enjoy!
Re: [PATCH v2] kernel/locking: Add context to ww_mutex_trylock.
On Thu, Sep 16, 2021 at 03:00:39PM +0200, Maarten Lankhorst wrote:
> > For merge logistics, can we pls have a stable branch? I expect that the
> > i915 patches will be ready for 5.16.
> > 
> > Or send it in for -rc2 so that the interface change doesn't cause
> > needless conflicts, whatever you think is best.
> 
> Yeah, some central branch drm could pull from, would make upstreaming
> patches that depend on it easier. :)

I think I'll make tip/locking/wwmutex and include that in
tip/locking/core, let me have a poke.
Re: [PATCH v2] kernel/locking: Add context to ww_mutex_trylock.
On Mon, Sep 13, 2021 at 10:42:36AM +0200, Maarten Lankhorst wrote:

> > +/**
> > + * ww_mutex_trylock - tries to acquire the w/w mutex with optional acquire context
> > + * @ww: mutex to lock
> > + * @ww_ctx: optional w/w acquire context
> > + *
> > + * Trylocks a mutex with the optional acquire context; no deadlock detection is
> > + * possible. Returns 1 if the mutex has been acquired successfully, 0 otherwise.
> > + *
> > + * Unlike ww_mutex_lock, no deadlock handling is performed. However, if a @ctx is
> > + * specified, -EALREADY handling may happen in calls to ww_mutex_trylock.
> > + *
> > + * A mutex acquired with this function must be released with ww_mutex_unlock.
> > + */
> > +int ww_mutex_trylock(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
> > +{
> > +	if (!ww_ctx)
> > +		return mutex_trylock(&ww->base);
> > +
> > +	MUTEX_WARN_ON(ww->base.magic != &ww->base);
> > +
> > +	if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
> > +		return -EALREADY;
> 
> I'm not 100% sure this is a good idea, because it would make the
> trylock weird. For i915 I checked manually, because I didn't want to
> change the function signature. This is probably the other extreme.
> 
> "if (ww_mutex_trylock())" would look correct, but actually be wrong
> and lead to double unlock without adjustments. Maybe we could make a
> ww_mutex_trylock_ctx_err, which would return -EALREADY or -EBUSY on
> failure, and 0 on success? We could keep ww_mutex_trylock without
> ctx, probably just #define as (!ww_mutex_trylock_ctx_err(lock, NULL))

Urgh, yeah. Also, I suppose that if we already own it, we'll just fail
the trylock anyway. Let me take this out.

> > +	/*
> > +	 * Reset the wounded flag after a kill. No other process can
> > +	 * race and wound us here, since they can't have a valid owner
> > +	 * pointer if we don't have any locks held.
> > +	 */
> > +	if (ww_ctx->acquired == 0)
> > +		ww_ctx->wounded = 0;
> 
> Yeah I guess this needs fixing too. Not completely sure since trylock
> wouldn't do the whole ww dance, but since it's our first lock,
> probably best to do so regardless so other users don't trip over it.

This is actually critical, because if this trylock is the first lock
acquisition for the context, there won't be any other opportunity to
reset this value.

> > +
> > +	if (__mutex_trylock(&ww->base)) {
> > +		ww_mutex_set_context_fastpath(ww, ww_ctx);
> > +		mutex_acquire_nest(&ww->base.dep_map, 0, 1, &ww_ctx->dep_map, _RET_IP_);
> > +		return 1;
> > +	}
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL(ww_mutex_trylock);

Updated version below...

---
Subject: kernel/locking: Add context to ww_mutex_trylock()
From: Maarten Lankhorst
Date: Thu, 9 Sep 2021 11:32:18 +0200

From: Maarten Lankhorst

i915 will soon gain an eviction path that trylocks a whole lot of locks
for eviction, getting dmesg failures like below:

  BUG: MAX_LOCK_DEPTH too low!
  turning off the locking correctness validator.
  depth: 48 max: 48!
  48 locks held by i915_selftest/5776:
   #0: 888101a79240 (&dev->mutex){}-{3:3}, at: __driver_attach+0x88/0x160
   #1: c99778c0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_vma_pin.constprop.63+0x39/0x1b0 [i915]
   #2: 88800cf74de8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.63+0x5f/0x1b0 [i915]
   #3: 88810c7f9e38 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x1c4/0x9d0 [i915]
   #4: 88810bad5768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
   #5: 88810bad60e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
  ...
   #46: 88811964d768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
   #47: 88811964e0e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
  INFO: lockdep is turned off.

Fixing eviction to nest into ww_class_acquire is a high priority, but
it requires a rework of the entire driver, which can only be done one
step at a time.

As an intermediate solution, add an acquire context to
ww_mutex_trylock, which allows us to do proper nesting annotations on
the trylocks, making the above lockdep splat disappear.

This is also useful in regulator_lock_nested, which may avoid dropping
regulator_nesting_mutex in the uncontended path, so use it there.

TTM may be another user for this, where we could lock a buffer in a
Re: [PATCH v2] kernel/locking: Add context to ww_mutex_trylock.
On Fri, Sep 10, 2021 at 05:02:54PM +0200, Peter Zijlstra wrote:
> That doesn't look right, how's this for you?

Full patch for the robots here:

  https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=locking/core&id=826e7b8826f0af185bb93249600533c33fd69a95
Re: [PATCH v2] kernel/locking: Add context to ww_mutex_trylock.
On Thu, Sep 09, 2021 at 11:32:18AM +0200, Maarten Lankhorst wrote:
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index d456579d0952..791c28005eef 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -736,6 +736,44 @@ __ww_mutex_lock(struct mutex *lock, unsigned int state, unsigned int subclass,
>  	return __mutex_lock_common(lock, state, subclass, NULL, ip, ww_ctx, true);
>  }
> 
> +/**
> + * ww_mutex_trylock - tries to acquire the w/w mutex with optional acquire context
> + * @lock: mutex to lock
> + * @ctx: optional w/w acquire context
> + *
> + * Trylocks a mutex with the optional acquire context; no deadlock detection is
> + * possible. Returns 1 if the mutex has been acquired successfully, 0 otherwise.
> + *
> + * Unlike ww_mutex_lock, no deadlock handling is performed. However, if a @ctx is
> + * specified, -EALREADY and -EDEADLK handling may happen in calls to ww_mutex_lock.
> + *
> + * A mutex acquired with this function must be released with ww_mutex_unlock.
> + */
> +int __sched
> +ww_mutex_trylock(struct ww_mutex *ww, struct ww_acquire_ctx *ctx)
> +{
> +	bool locked;
> +
> +	if (!ctx)
> +		return mutex_trylock(&ww->base);
> +
> +#ifdef CONFIG_DEBUG_MUTEXES
> +	DEBUG_LOCKS_WARN_ON(ww->base.magic != &ww->base);
> +#endif
> +
> +	preempt_disable();
> +	locked = __mutex_trylock(&ww->base);
> +
> +	if (locked) {
> +		ww_mutex_set_context_fastpath(ww, ctx);
> +		mutex_acquire_nest(&ww->base.dep_map, 0, 1, &ctx->dep_map, _RET_IP_);
> +	}
> +	preempt_enable();
> +
> +	return locked;
> +}
> +EXPORT_SYMBOL(ww_mutex_trylock);
> +
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  void __sched
>  mutex_lock_nested(struct mutex *lock, unsigned int subclass)
> diff --git a/kernel/locking/ww_rt_mutex.c b/kernel/locking/ww_rt_mutex.c
> index 3f1fff7d2780..c4cb863edb4c 100644
> --- a/kernel/locking/ww_rt_mutex.c
> +++ b/kernel/locking/ww_rt_mutex.c
> @@ -50,6 +50,18 @@ __ww_rt_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ww_ctx,
>  	return ret;
>  }
> 
> +int __sched
> +ww_mutex_trylock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
> +{
> +	int locked = rt_mutex_trylock(&lock->base);
> +
> +	if (locked && ctx)
> +		ww_mutex_set_context_fastpath(lock, ctx);
> +
> +	return locked;
> +}
> +EXPORT_SYMBOL(ww_mutex_trylock);
> +
>  int __sched
>  ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
>  {

That doesn't look right, how's this for you?

---
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -94,6 +94,9 @@ static inline unsigned long __owner_flag
 	return owner & MUTEX_FLAGS;
 }
 
+/*
+ * Returns: __mutex_owner(lock) on failure or NULL on success.
+ */
 static inline struct task_struct *__mutex_trylock_common(struct mutex *lock, bool handoff)
 {
 	unsigned long owner, curr = (unsigned long)current;
@@ -736,6 +739,47 @@ __ww_mutex_lock(struct mutex *lock, unsi
 	return __mutex_lock_common(lock, state, subclass, NULL, ip, ww_ctx, true);
 }
 
+/**
+ * ww_mutex_trylock - tries to acquire the w/w mutex with optional acquire context
+ * @ww: mutex to lock
+ * @ww_ctx: optional w/w acquire context
+ *
+ * Trylocks a mutex with the optional acquire context; no deadlock detection is
+ * possible. Returns 1 if the mutex has been acquired successfully, 0 otherwise.
+ *
+ * Unlike ww_mutex_lock, no deadlock handling is performed. However, if a @ctx is
+ * specified, -EALREADY handling may happen in calls to ww_mutex_trylock.
+ *
+ * A mutex acquired with this function must be released with ww_mutex_unlock.
+ */
+int ww_mutex_trylock(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
+{
+	if (!ww_ctx)
+		return mutex_trylock(&ww->base);
+
+	MUTEX_WARN_ON(ww->base.magic != &ww->base);
+
+	if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
+		return -EALREADY;
+
+	/*
+	 * Reset the wounded flag after a kill. No other process can
+	 * race and wound us here, since they can't have a valid owner
+	 * pointer if we don't have any locks held.
+	 */
+	if (ww_ctx->acquired == 0)
+		ww_ctx->wounded = 0;
+
+	if (__mutex_trylock(&ww->base)) {
+		ww_mutex_set_context_fastpath(ww, ww_ctx);
+		mutex_acquire_nest(&ww->base.dep_map, 0, 1, &ww_ctx->dep_map, _RET_IP_);
+		return 1;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(ww_mutex_trylock);
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 void __sched
 mutex_lock_nested(struct mutex *lock, unsigned int subclass)
--- a/kernel/locking/ww_rt_mutex.c
+++ b/kernel/locking/ww_rt_mutex.c
@@ -9,6 +9,34 @@
 #define WW_RT
 #include "rtmutex.c"
 
+int ww_mutex_trylock(struct ww_mutex *lock, struct ww_acquire_ctx *ww_ctx)
+{
+	struct rt_mutex *rtm = &lock->base;
+
+	if (!ww_ctx)
+		return rt_mutex_trylock(rtm);
+
+	if (unlikely(ww_
Re: [PATCH] kernel/locking: Add context to ww_mutex_trylock.
On Thu, Sep 09, 2021 at 07:38:06AM +0200, Maarten Lankhorst wrote:
> > You'll need a similar hunk in ww_rt_mutex.c
> 
> What tree has that file?

Linus' tree should have it. Per commit:

  f8635d509d80 ("locking/ww_mutex: Implement rtmutex based ww_mutex API functions")
Re: [PATCH] kernel/locking: Add context to ww_mutex_trylock.
On Tue, Sep 07, 2021 at 03:20:44PM +0200, Maarten Lankhorst wrote:
> i915 will soon gain an eviction path that trylocks a whole lot of locks
> for eviction, getting dmesg failures like below:
> 
>   BUG: MAX_LOCK_DEPTH too low!
>   turning off the locking correctness validator.
>   depth: 48 max: 48!
>   48 locks held by i915_selftest/5776:
>    #0: 888101a79240 (&dev->mutex){}-{3:3}, at: __driver_attach+0x88/0x160
>    #1: c99778c0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_vma_pin.constprop.63+0x39/0x1b0 [i915]
>    #2: 88800cf74de8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.63+0x5f/0x1b0 [i915]
>    #3: 88810c7f9e38 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x1c4/0x9d0 [i915]
>    #4: 88810bad5768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
>    #5: 88810bad60e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
>   ...
>    #46: 88811964d768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
>    #47: 88811964e0e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
>   INFO: lockdep is turned off.

> As an intermediate solution, add an acquire context to ww_mutex_trylock,
> which allows us to do proper nesting annotations on the trylocks, making
> the above lockdep splat disappear.

Fair enough I suppose.

> +/**
> + * ww_mutex_trylock - tries to acquire the w/w mutex with optional acquire context
> + * @lock: mutex to lock
> + * @ctx: optional w/w acquire context
> + *
> + * Trylocks a mutex with the optional acquire context; no deadlock detection is
> + * possible. Returns 1 if the mutex has been acquired successfully, 0 otherwise.
> + *
> + * Unlike ww_mutex_lock, no deadlock handling is performed. However, if a @ctx is
> + * specified, -EALREADY and -EDEADLK handling may happen in calls to ww_mutex_lock.
> + *
> + * A mutex acquired with this function must be released with ww_mutex_unlock.
> + */
> +int __sched
> +ww_mutex_trylock(struct ww_mutex *ww, struct ww_acquire_ctx *ctx)
> +{
> +	bool locked;
> +
> +	if (!ctx)
> +		return mutex_trylock(&ww->base);
> +
> +#ifdef CONFIG_DEBUG_MUTEXES
> +	DEBUG_LOCKS_WARN_ON(ww->base.magic != &ww->base);
> +#endif
> +
> +	preempt_disable();
> +	locked = __mutex_trylock(&ww->base);
> +
> +	if (locked) {
> +		ww_mutex_set_context_fastpath(ww, ctx);
> +		mutex_acquire_nest(&ww->base.dep_map, 0, 1, &ctx->dep_map, _RET_IP_);
> +	}
> +	preempt_enable();
> +
> +	return locked;
> +}
> +EXPORT_SYMBOL(ww_mutex_trylock);

You'll need a similar hunk in ww_rt_mutex.c
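For illustration, the kind of usage the new context argument enables
(a sketch with invented names, not the actual i915 eviction code):

	struct ww_acquire_ctx ctx;
	struct obj *victim, *tmp;

	ww_acquire_init(&ctx, &reservation_ww_class);

	/* Each trylock nests under @ctx, avoiding the MAX_LOCK_DEPTH splat. */
	list_for_each_entry_safe(victim, tmp, &lru, link) {
		if (!ww_mutex_trylock(&victim->lock, &ctx))
			continue;	/* contended; skip this one */

		evict(victim);
		ww_mutex_unlock(&victim->lock);
	}

	ww_acquire_fini(&ctx);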
Re: [PATCH v2 0/2] locking/lockdep, drm: apply new lockdep assert in drm_auth.c
On Mon, Aug 02, 2021 at 10:26:16AM +0200, Daniel Vetter wrote:
> On Sat, Jul 31, 2021 at 04:24:56PM +0800, Desmond Cheong Zhi Xi wrote:
> > Hi,
> > 
> > Following a discussion on the patch ("drm: use the lookup lock in
> > drm_is_current_master") [1], Peter Zijlstra proposed new lockdep_assert
> > helpers to make it convenient to compose lockdep checks together.
> > 
> > This series includes the patch that introduces the new lockdep helpers,
> > then utilizes these helpers in drm_is_current_master_locked in the
> > following patch.
> > 
> > v1 -> v2:
> > Patch 2:
> > - Updated the kerneldoc on the lock design of drm_file.master to explain
> >   the use of lockdep_assert(). As suggested by Boqun Feng.
> > 
> > Link: https://lore.kernel.org/lkml/20210722092929.244629-2-desmondcheon...@gmail.com/ [1]
> 
> Can you pls also cc: this to intel-gfx so the local CI there can pick it
> up and verify? Just to check we got it all.

Acked-by: Peter Zijlstra (Intel)

Feel free to take it through the drm tree.
Re: [PATCH 1/3] drm: use the lookup lock in drm_is_current_master
On Thu, Jul 29, 2021 at 10:32:13PM +0800, Desmond Cheong Zhi Xi wrote:
> Sounds good, will do. Thanks for the patch, Peter.
> 
> Just going to make a small edit:
> s/LOCK_STAT_NOT_HELD/LOCK_STATE_NOT_HELD/

Bah, I knew I should've compile tested it :-), Thanks!
Re: [PATCH 1/3] drm: use the lookup lock in drm_is_current_master
On Thu, Jul 22, 2021 at 12:38:10PM +0200, Daniel Vetter wrote: > On Thu, Jul 22, 2021 at 05:29:27PM +0800, Desmond Cheong Zhi Xi wrote: > > Inside drm_is_current_master, using the outer drm_device.master_mutex > > to protect reads of drm_file.master makes the function prone to creating > > lock hierarchy inversions. Instead, we can use the > > drm_file.master_lookup_lock that sits at the bottom of the lock > > hierarchy. > > > > Reported-by: Daniel Vetter > > Signed-off-by: Desmond Cheong Zhi Xi > > --- > > drivers/gpu/drm/drm_auth.c | 9 + > > 1 file changed, 5 insertions(+), 4 deletions(-) > >
> > diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
> > index f00354bec3fb..9c24b8cc8e36 100644
> > --- a/drivers/gpu/drm/drm_auth.c
> > +++ b/drivers/gpu/drm/drm_auth.c
> > @@ -63,8 +63,9 @@
> >
> >  static bool drm_is_current_master_locked(struct drm_file *fpriv)
> >  {
> > -	lockdep_assert_held_once(&fpriv->minor->dev->master_mutex);
> > -
> > +	/* Either drm_device.master_mutex or drm_file.master_lookup_lock
> > +	 * should be held here.
> > +	 */
>
> Disappointing that lockdep can't check or conditions for us, a
> lockdep_assert_held_either would be really neat in some cases.
>
> Adding lockdep folks, maybe they have ideas.

#ifdef CONFIG_LOCKDEP
	WARN_ON_ONCE(debug_locks &&
		     !(lockdep_is_held(&drm_device.master_mutex) ||
		       lockdep_is_held(&drm_file.master_lookup_lock)));
#endif

doesn't exactly roll off the tongue, but should do as you want I suppose.

Would something like:

	#define lockdep_assert(cond)	WARN_ON_ONCE(debug_locks && !(cond))

Such that we can write:

	lockdep_assert(lockdep_is_held(&drm_device.master_mutex) ||
		       lockdep_is_held(&drm_file.master_lookup_lock));

make it better?

---
Subject: locking/lockdep: Provide lockdep_assert{,_once}() helpers

Extract lockdep_assert{,_once}() helpers to more easily write composite assertions like, for example:

	lockdep_assert(lockdep_is_held(&drm_device.master_mutex) ||
		       lockdep_is_held(&drm_file.master_lookup_lock));

Signed-off-by: Peter Zijlstra (Intel)
---
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 5cf387813754..0da67341c1fb 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -306,31 +306,29 @@ extern void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie);
 
 #define lockdep_depth(tsk)	(debug_locks ? (tsk)->lockdep_depth : 0)
 
-#define lockdep_assert_held(l)	do {				\
-		WARN_ON(debug_locks &&				\
-			lockdep_is_held(l) == LOCK_STATE_NOT_HELD);	\
-	} while (0)
+#define lockdep_assert(cond)		\
+	do { WARN_ON(debug_locks && !(cond)); } while (0)
 
-#define lockdep_assert_not_held(l)	do {			\
-		WARN_ON(debug_locks &&				\
-			lockdep_is_held(l) == LOCK_STATE_HELD);	\
-	} while (0)
+#define lockdep_assert_once(cond)	\
+	do { WARN_ON_ONCE(debug_locks && !(cond)); } while (0)
 
-#define lockdep_assert_held_write(l)	do {			\
-		WARN_ON(debug_locks && !lockdep_is_held_type(l, 0));	\
-	} while (0)
+#define lockdep_assert_held(l)		\
+	lockdep_assert(lockdep_is_held(l) != LOCK_STAT_NOT_HELD)
 
-#define lockdep_assert_held_read(l)	do {			\
-		WARN_ON(debug_locks && !lockdep_is_held_type(l, 1));	\
-	} while (0)
+#define lockdep_assert_not_held(l)	\
+	lockdep_assert(lockdep_is_held(l) != LOCK_STATE_HELD)
 
-#define lockdep_assert_held_once(l)	do {				\
-		WARN_ON_ONCE(debug_locks && !lockdep_is_held(l));	\
-	} while (0)
+#define lockdep_assert_held_write(l)	\
+	lockdep_assert(lockdep_is_held_type(l, 0))
 
-#define lockdep_assert_none_held_once()	do {				\
-		WARN_ON_ONCE(debug_locks && current->lockdep_depth);	\
-	} while (0)
+#define lockdep_assert_held_read(l)	\
+	lockdep_assert(lockdep_is_held_type(l, 1))
+
+#define lockdep_assert_held_once(l)	\
+	lockdep_assert_once(lockdep_is_held(l) != LOCK_STAT_NOT_HELD)
+
+#define lockdep_assert_none_held_once()	\
+	lockdep_assert_once(!current->lockdep_depth)
 
 #define lockdep_recursing(tsk)	((tsk)->lockdep_recursion)
 
@@ -407,6 +405,9 @@ extern int lock_is_held(const void *);
 extern int lockdep_is_held(const void *);
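For reference, the drm side then reduces to something like this; a sketch of how drm_is_current_master_locked() would use the new helper per the discussion above (the committed patch may differ in detail):

	static bool drm_is_current_master_locked(struct drm_file *fpriv)
	{
		/* Either drm_device.master_mutex or drm_file.master_lookup_lock
		 * should be held here.
		 */
		lockdep_assert_once(lockdep_is_held(&fpriv->minor->dev->master_mutex) ||
				    lockdep_is_held(&fpriv->master_lookup_lock));

		return fpriv->is_master &&
			drm_lease_owner(fpriv->master) == fpriv->minor->dev->master;
	}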
Re: [PATCH 1/2] locking/rwsem: Add down_write_interruptible
On Mon, Mar 08, 2021 at 03:54:55PM -0500, Zack Rusin wrote: > Add an interruptible version of down_write. It's the other > side of the already implemented down_read_interruptible. > It allows drivers which used custom locking code to > support interruptible rw semaphores to switch over > to rwsem. > > Cc: Peter Zijlstra > Cc: Ingo Molnar > Cc: Will Deacon > Cc: linux-ker...@vger.kernel.org > Cc: dri-devel@lists.freedesktop.org No SoB! Assuming you actually wrote it and simply forgot to add it, the patch does look ok, so: Acked-by: Peter Zijlstra (Intel)
Re: [PATCH] mutex: nuke mutex_trylock_recursive
On Tue, Feb 16, 2021 at 01:38:49PM +0100, Christian König wrote: > > > Am 16.02.21 um 11:13 schrieb Peter Zijlstra: > > On Tue, Feb 16, 2021 at 10:29:00AM +0100, Daniel Vetter wrote: > > > On Tue, Feb 16, 2021 at 09:21:46AM +0100, Christian König wrote: > > > > The last user went away in the 5.11 cycle. > > > > > > > > Signed-off-by: Christian König > > > Nice. > > > > > > Reviewed-by: Daniel Vetter > > > > > > I think would be good to still stuff this into 5.12 before someone > > > resurrects this zombie. > > Already done: > > > > > > https://lkml.kernel.org/r/161296556531.23325.10473355259841906876.tip-bot2@tip-bot2 > > One less bad concept to worry about. > > But your patch is missing to remove the checkpatch.pl check for this :) The next patch does that, look at the "Thread overview:" at the bottom of the thread.
Re: [PATCH] mutex: nuke mutex_trylock_recursive
On Tue, Feb 16, 2021 at 10:29:00AM +0100, Daniel Vetter wrote: > On Tue, Feb 16, 2021 at 09:21:46AM +0100, Christian König wrote: > > The last user went away in the 5.11 cycle. > > > > Signed-off-by: Christian König > > Nice. > > Reviewed-by: Daniel Vetter > > I think would be good to still stuff this into 5.12 before someone > resurrects this zombie. Already done: https://lkml.kernel.org/r/161296556531.23325.10473355259841906876.tip-bot2@tip-bot2
Re: [PATCH] drm/syncobj: make lockdep complain on WAIT_FOR_SUBMIT v3
On Tue, Jan 19, 2021 at 02:05:09PM +0100, Daniel Vetter wrote: > > diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h > > index b9e9adec73e8..6eb117c0d0f3 100644 > > --- a/include/linux/lockdep.h > > +++ b/include/linux/lockdep.h > > @@ -310,6 +310,10 @@ extern void lock_unpin_lock(struct lockdep_map *lock, > > struct pin_cookie); > > WARN_ON_ONCE(debug_locks && !lockdep_is_held(l)); \ > > } while (0) > > > > +#define lockdep_assert_none_held_once()do { > > \ > > + WARN_ON_ONCE(debug_locks && current->lockdep_depth);\ > > + } while (0) > > + > > #define lockdep_recursing(tsk) ((tsk)->lockdep_recursion) > > > > #define lockdep_pin_lock(l)lock_pin_lock(&(l)->dep_map) > > @@ -387,6 +391,7 @@ extern int lockdep_is_held(const void *); > > #define lockdep_assert_held_write(l) do { (void)(l); } while (0) > > #define lockdep_assert_held_read(l)do { (void)(l); } while (0) > > #define lockdep_assert_held_once(l)do { (void)(l); } while (0) > > +#define lockdep_assert_none_held_once() do { } while (0) > > > > #define lockdep_recursing(tsk) (0) > > ofc needs ack from Peter, but drm parts look all good to me. Acked-by: Peter Zijlstra (Intel)
Re: [PATCH] drm/syncobj: make lockdep complain on WAIT_FOR_SUBMIT v2
On Tue, Jan 19, 2021 at 10:46:53AM +0100, Christian König wrote: > But I'm going to double check if drm_syncobj_fence_add_wait() isn't used > elsewhere as well. drm_syncobj_array_wait_timeout() Which is why I asked.. :-)
Re: [PATCH] drm/syncobj: make lockdep complain on WAIT_FOR_SUBMIT v2
On Mon, Jan 18, 2021 at 07:03:34PM +0100, Christian König wrote: > diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c > index 6e74e6745eca..f51458615158 100644 > --- a/drivers/gpu/drm/drm_syncobj.c > +++ b/drivers/gpu/drm/drm_syncobj.c > @@ -387,6 +387,13 @@ int drm_syncobj_find_fence(struct drm_file *file_private, > if (!syncobj) > return -ENOENT; > > + /* Waiting for userspace with locks help is illegal cause that can > + * trivial deadlock with page faults for example. Make lockdep complain > + * about it early on. > + */ Egads, the cursed comment style is spreading :/ > + if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) > + lockdep_assert_none_held_once(); > + Should this not be part of drm_syncobj_fence_add_wait() instead? Also, do you want to sprinkle might_sleep() around ? > *fence = drm_syncobj_fence_get(syncobj); > drm_syncobj_put(syncobj); > > diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
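A sketch of what moving it into the common helper might look like (body heavily simplified; only the placement follows the review comment, this is not a committed diff):

	static void drm_syncobj_fence_add_wait(struct drm_syncobj *syncobj,
					       struct syncobj_wait_entry *wait)
	{
		might_sleep();			/* callers may block on userspace... */
		lockdep_assert_none_held_once();	/* ...so no locks may be held */

		spin_lock(&syncobj->lock);
		/* simplified: the real function first snapshots the current fence */
		list_add_tail(&wait->node, &syncobj->cb_list);
		spin_unlock(&syncobj->lock);
	}

Since both the wait ioctl path and drm_syncobj_array_wait_timeout() funnel through this helper, the assert would cover all the wait-for-submit cases at once.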
Re: [PATCH 3/3] locking/selftests: Add testcases for fs_reclaim
On Fri, Nov 20, 2020 at 10:54:44AM +0100, Daniel Vetter wrote: > Since I butchered this I figured better to make sure we have testcases > for this now. Since we only have a locking context for __GFP_FS that's > the only thing we're testing right now. > > Cc: linux-fsde...@vger.kernel.org > Cc: Dave Chinner > Cc: Qian Cai > Cc: linux-...@vger.kernel.org > Cc: Thomas Hellström (Intel) > Cc: Andrew Morton > Cc: Jason Gunthorpe > Cc: linux...@kvack.org > Cc: linux-r...@vger.kernel.org > Cc: Maarten Lankhorst > Cc: Christian König > Cc: "Matthew Wilcox (Oracle)" > Signed-off-by: Daniel Vetter > Cc: Peter Zijlstra > Cc: Ingo Molnar > Cc: Will Deacon > Cc: linux-ker...@vger.kernel.org > --- > lib/locking-selftest.c | 47 ++ > 1 file changed, 47 insertions(+) I have a few changes pending for this file, I don't think the conflicts will be bad, but.. In any case: Acked-by: Peter Zijlstra (Intel)
Re: [PATCH v2 0/3] drm: commit_work scheduling
On Tue, Oct 06, 2020 at 11:08:59AM +0200, Daniel Vetter wrote: > vblank work needs to preempt commit work. > > Right now we don't have any driver requiring this, but if we e.g. roll out > the gamma table update for i915, then this _has_ to happen in the vblank > period. > > Whereas the commit work can happen in there, but it can also be delayed a > bit (until the vblank worker has finished) we will not miss any additional > deadline due to that. > > So that's why we have 2 levels. I'm not even sure you can model that with > SCHED_DEADLINE, since essentially we need a few usec of cpu time every > vblank (16ms normally), but those few usec _must_ be scheduled within a > very specific time slot or we're toast. And that vblank period is only > 1-2ms usually. Depends a bit on what the hardware gets us. If for example we're provided an interrupt on vblank start, then that could wake a DEADLINE job with (given your numbers above):

	.sched_period   = 16ms,
	.sched_deadline = 1-2ms,
	.sched_runtime  = 1-2ms,

The effective utilization of that task would be: 1-2/16. > deadline has the upshot that it compose much better than SCHED_FIFO: > Everyone just drops their deadline requirements onto the scheduler, > scheduler makes sure it's all obeyed (or rejects your request). > > The trouble is we'd need to know how long a commit takes, worst case, on a > given platform. And for that you need to measure stuff, and we kinda can't > spend a few minutes at boot-up going through the combinatorial maze of > atomic commits to make sure we have it all. > > So I think in practice letting userspace set the right rt priority/mode is > the only way to go here :-/ Or you can have it adjust its expected runtime as the system runs (always keeping a little extra room over what you measure to make sure). Given you have period > deadline, you can simply start with runtime = deadline and adjust downwards during use (carefully).
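For reference, those parameters map onto the sched_setattr() syscall roughly like so; a minimal userspace sketch (glibc has no wrapper, so the struct and syscall stub are declared by hand, as in the kernel's sched-deadline documentation):

	#define _GNU_SOURCE
	#include <unistd.h>
	#include <linux/types.h>
	#include <linux/sched.h>	/* SCHED_DEADLINE */
	#include <sys/syscall.h>

	struct sched_attr {
		__u32 size;
		__u32 sched_policy;
		__u64 sched_flags;
		__s32 sched_nice;
		__u32 sched_priority;
		__u64 sched_runtime;	/* all times in nanoseconds */
		__u64 sched_deadline;
		__u64 sched_period;
	};

	int main(void)
	{
		struct sched_attr attr = {
			.size		= sizeof(attr),
			.sched_policy	= SCHED_DEADLINE,
			.sched_runtime	=  2 * 1000 * 1000,	/* up to 2ms of CPU... */
			.sched_deadline	=  2 * 1000 * 1000,	/* ...within 2ms of the wakeup... */
			.sched_period	= 16 * 1000 * 1000,	/* ...once per 16ms vblank period */
		};

		/* pid 0 means the calling task */
		return syscall(SYS_sched_setattr, 0, &attr, 0);
	}

Note runtime <= deadline <= period must hold or the kernel rejects the request, which is exactly the admission control being described above.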
Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()
On Tue, Sep 29, 2020 at 04:27:51PM +0200, Petr Mladek wrote: > Upstreaming the console handling will be the next big step. I am sure > that there will be long discussion about it. But there might be > few things that would help removing printk_deferred(). > > 1. Messages will be printed on consoles by dedicated kthreads. It will >be safe context. No deadlocks. This, that's what I remember. With sane consoles having a ->write_atomic() callback which is called in-line. Current -RT has a single kthread that's flushing the consoles. > 2. The registration and unregistration of consoles should not longer >be handled by console_lock (semaphore). It should be possible to >call most consoles without a sleeping lock. It should remove all >these deadlocks between printk() and scheduler(). I'm confused, who cares about registration? That only happens once at boot, right? >There might be problems with some consoles. For example, tty would >most likely still need a sleeping lock because it is using the console >semaphore also internally. But per 1) above, that's done from a kthread, so nobody cares. > 3. We will try harder to get the messages out immediately during >panic(). As long as you guarantee to first write everything to consoles with ->write_atomic() before attempting to flush others that should be fine. > It would take some time until the above reaches upstream. But it seems > to be the right way to go. Yep. > Finally, the deadlock happens "only" when someone is waiting on > console_lock() in parallel. Otherwise, the waitqueue for the semaphore > is empty and scheduler is not called. There's more deadlocks. In fact the one described earlier in this thread isn't the one you mention. > It means that there is quite a big change to see the WARN(). It might > be even bigger than with printk_deferred() because WARN() in scheduler > means that the scheduler is big troubles. Nobody guarantees that > the deferred messages will get handled later. I only care about ->write_atomic() :-) Anything else is a best-effort-loss anyway.
Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()
On Mon, Sep 28, 2020 at 06:04:23PM +0800, Chengming Zhou wrote: > Well, you are lucky. So it's a problem in our printk implementation. Not lucky; I just kicked it in the groin really hard: git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/experimental > The deadlock path is: > > printk > vprintk_emit > console_unlock > vt_console_print > hide_cursor > bit_cursor > soft_cursor > queue_work_on > __queue_work > try_to_wake_up > _raw_spin_lock > native_queued_spin_lock_slowpath > > Looks like it's introduced by this commit: > > eaa434defaca1781fb2932c685289b610aeb8b4b > > "drm/fb-helper: Add fb_deferred_io support" Oh gawd, yeah, all the !serial consoles are utter batshit. Please look at John's last printk rewrite, IIRC it farms all that off to a kernel thread instead of doing it from the printk() caller's context. I'm not sure where he hides his latest patches, but I'm sure he'll be more than happy to tell you.
Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
On Thu, Sep 24, 2020 at 09:51:38AM -0400, Steven Rostedt wrote: > > It turns out, that getting selected for pull-balance is exactly that > > condition, and clearly a migrate_disable() task cannot be pulled, but we > > can use that signal to try and pull away the running task that's in the > > way. > > Unless of course that running task is in a migrate disable section > itself ;-) See my ramblings here: https://lkml.kernel.org/r/20200924082717.ga1362...@hirez.programming.kicks-ass.net My plan was to have the migrate_enable() of the running task trigger the migration in that case.
Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
On Thu, Sep 24, 2020 at 08:32:41AM -0400, Steven Rostedt wrote: > Anyway, instead of blocking. What about having a counter of number of > migrate disabled tasks per cpu, and when taking a migrate_disable(), and > there's > already another task with migrate_disabled() set, and the current task has > an affinity greater than 1, it tries to migrate to another CPU? That doesn't solve the problem. On wakeup we should already prefer an idle CPU over one running a (RT) task, but you can always wake more tasks than there's CPUs around and you'll _have_ to stack at some point. The trick is how to unstack them correctly. We need to detect when a migrate_disable() task _should_ start running again, and migrate away whoever is in the way at that point. It turns out, that getting selected for pull-balance is exactly that condition, and clearly a migrate_disable() task cannot be pulled, but we can use that signal to try and pull away the running task that's in the way.
Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc
On Thu, Apr 09, 2020 at 09:08:26AM -0700, Minchan Kim wrote: > On Wed, Apr 08, 2020 at 01:59:08PM +0200, Christoph Hellwig wrote: > > This allows to unexport map_vm_area and unmap_kernel_range, which are > > rather deep internal and should not be available to modules. > > Even though I don't know how many usecase we have using zsmalloc as > module(I heard only once by dumb reason), it could affect existing > users. Thus, please include concrete explanation in the patch to > justify when the complain occurs. The justification is 'we can unexport functions that have no sane reason of being exported in the first place'. The Changelog pretty much says that.
Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc
On Wed, Apr 08, 2020 at 08:01:00AM -0700, Randy Dunlap wrote: > Hi, > > On 4/8/20 4:59 AM, Christoph Hellwig wrote: > > diff --git a/mm/Kconfig b/mm/Kconfig > > index 36949a9425b8..614cc786b519 100644 > > --- a/mm/Kconfig > > +++ b/mm/Kconfig > > @@ -702,7 +702,7 @@ config ZSMALLOC > > > > config ZSMALLOC_PGTABLE_MAPPING > > bool "Use page table mapping to access object in zsmalloc" > > - depends on ZSMALLOC > > + depends on ZSMALLOC=y > > It's a bool so this shouldn't matter... not needed. My mm/Kconfig has:

	config ZSMALLOC
		tristate "Memory allocator for compressed pages"
		depends on MMU

which I think means it can be modular, no?
Re: decruft the vmalloc API
On Wed, Apr 08, 2020 at 01:58:58PM +0200, Christoph Hellwig wrote: > Hi all, > > Peter noticed that with some dumb luck you can toast the kernel address > space with exported vmalloc symbols. > > I used this as an opportunity to decruft the vmalloc.c API and make it > much more systematic. This also removes any chance to create vmalloc > mappings outside the designated areas or using executable permissions > from modules. Besides that it removes more than 300 lines of code. > Looks great, thanks for doing this! Acked-by: Peter Zijlstra (Intel)
Re: [PATCH 17/28] mm: remove the prot argument from vm_map_ram
On Wed, Apr 08, 2020 at 01:59:15PM +0200, Christoph Hellwig wrote: > This is always GFP_KERNEL - for long term mappings with other properties > vmap should be used. PAGE_KERNEL != GFP_KERNEL :-) > - return vm_map_ram(mock->pages, mock->npages, 0, PAGE_KERNEL); > + return vm_map_ram(mock->pages, mock->npages, 0);
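The argument being removed is a pgprot_t, not a gfp mask; the prototypes before and after, as I read the series:

	/* before */
	void *vm_map_ram(struct page **pages, unsigned int count, int node,
			 pgprot_t prot);
	/* after */
	void *vm_map_ram(struct page **pages, unsigned int count, int node);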
Re: rcu_barrier() no longer allowed within mmap_sem?
On Mon, Mar 30, 2020 at 03:00:35PM +0200, Daniel Vetter wrote: > Hi all, for all = rcu, cpuhotplug and perf maintainers > > We've hit an interesting new lockdep splat in our drm/i915 CI: > > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17096/shard-tglb7/igt@kms_frontbuffer_track...@fbcpsr-rgb101010-draw-mmap-gtt.html#dmesg-warnings861 > > Summarizing away the driver parts we have > > < gpu locks which are held within mm->mmap_sem in various gpu fault handlers > > > -> #4 (&mm->mmap_sem#2){}: > <4> [604.892615] __might_fault+0x63/0x90 > <4> [604.892617] _copy_to_user+0x1e/0x80 > <4> [604.892619] perf_read+0x200/0x2b0 > <4> [604.892621] vfs_read+0x96/0x160 > <4> [604.892622] ksys_read+0x9f/0xe0 > <4> [604.892623] do_syscall_64+0x4f/0x220 > <4> [604.892624] entry_SYSCALL_64_after_hwframe+0x49/0xbe > <4> [604.892625] > -> #3 (&cpuctx_mutex){+.+.}: > <4> [604.892626] __mutex_lock+0x9a/0x9c0 > <4> [604.892627] perf_event_init_cpu+0xa4/0x140 > <4> [604.892629] perf_event_init+0x19d/0x1cd > <4> [604.892630] start_kernel+0x362/0x4e4 > <4> [604.892631] secondary_startup_64+0xa4/0xb0 > <4> [604.892631] > -> #2 (pmus_lock){+.+.}: > <4> [604.892633] __mutex_lock+0x9a/0x9c0 > <4> [604.892633] perf_event_init_cpu+0x6b/0x140 > <4> [604.892635] cpuhp_invoke_callback+0x9b/0x9d0 > <4> [604.892636] _cpu_up+0xa2/0x140 > <4> [604.892637] do_cpu_up+0x61/0xa0 > <4> [604.892639] smp_init+0x57/0x96 > <4> [604.892639] kernel_init_freeable+0x87/0x1dc > <4> [604.892640] kernel_init+0x5/0x100 > <4> [604.892642] ret_from_fork+0x24/0x50 > <4> [604.892642] > -> #1 (cpu_hotplug_lock.rw_sem){}: > <4> [604.892643] cpus_read_lock+0x34/0xd0 > <4> [604.892644] rcu_barrier+0xaa/0x190 > <4> [604.892645] kernel_init+0x21/0x100 > <4> [604.892647] ret_from_fork+0x24/0x50 > <4> [604.892647] > -> #0 (rcu_state.barrier_mutex){+.+.}: > The last backtrace boils down to i915 driver code which holds the same > locks we are holding within mm->mmap_sem, and then ends up calling > rcu_barrier(). From what I can see i915 is just the messenger here, > any driver with this pattern of a lock held within mmap_sem which also > has a path of calling rcu_barrier while holding that lock should be > hitting this splat. > > Two questions: > - This suggests that calling rcu_barrier() isn't ok anymore while > holding mmap_sem, or anything that has a dependency upon mmap_sem. I > guess that's not the idea, please confirm. > - Assuming this depedency is indeed not intended, where should the > loop be broken? It goes through perf, cpuhotplug and rcu subsystems, > and I don't have a clue about any of those. I wonder what is new here; the 1-4 chain there has been true for a long time, see also the comment at perf_event_ctx_lock_nested(). That said; it _might_ be possible to break 3->4, that is, all the copy_{to,from}_user() usage in perf can be lifted out from under the various locks by re-arranging code, but I have a nagging feeling there was more to it than that. Of course, while I did document the locking rules, I seem to have forgotten to comment on exactly why these rules are as they are.. oh well.
Re: linux-next: build failure after merge of the tip tree
On Thu, Nov 21, 2019 at 02:54:03PM +1100, Stephen Rothwell wrote: > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h > index f940526c5889..63e734a125fb 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h > @@ -473,7 +473,7 @@ TRACE_EVENT(amdgpu_ib_pipe_sync, > TP_PROTO(struct amdgpu_job *sched_job, struct dma_fence *fence), > TP_ARGS(sched_job, fence), > TP_STRUCT__entry( > - __string(ring, sched_job->base.sched->name); > + __string(ring, sched_job->base.sched->name) >__field(uint64_t, id) >__field(struct dma_fence *, fence) >__field(uint64_t, ctx) Correct, ';' there is invalid and now results in very verbose compile errors :-)
Re: [PATCH -next] treewide: remove unused argument in lock_release()
On Tue, Oct 08, 2019 at 06:33:51PM +0200, Daniel Vetter wrote: > On Thu, Sep 19, 2019 at 12:09:40PM -0400, Qian Cai wrote: > > Since the commit b4adfe8e05f1 ("locking/lockdep: Remove unused argument > > in __lock_release"), @nested is no longer used in lock_release(), so > > remove it from all lock_release() calls and friends. > > > > Signed-off-by: Qian Cai > > Ack on the concept and for the drm parts (and feel free to keep the ack if > you inevitably have to respin this later on). Might result in some > conflicts, but welp we need to keep Linus busy :-) > > Acked-by: Daniel Vetter Thanks Daniel!
Re: [PATCH -next] treewide: remove unused argument in lock_release()
On Thu, Sep 19, 2019 at 12:09:40PM -0400, Qian Cai wrote: > Since the commit b4adfe8e05f1 ("locking/lockdep: Remove unused argument > in __lock_release"), @nested is no longer used in lock_release(), so > remove it from all lock_release() calls and friends. Right; I never did this cleanup for not wanting the churn, but as long as it applies I'll take it. > Signed-off-by: Qian Cai > --- > drivers/gpu/drm/drm_connector.c | 2 +- > drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 6 +++--- > drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- > drivers/gpu/drm/i915/i915_request.c | 2 +- > drivers/tty/tty_ldsem.c | 8 > fs/dcache.c | 2 +- > fs/jbd2/transaction.c | 4 ++-- > fs/kernfs/dir.c | 4 ++-- > fs/ocfs2/dlmglue.c| 2 +- > include/linux/jbd2.h | 2 +- > include/linux/lockdep.h | 21 ++--- > include/linux/percpu-rwsem.h | 4 ++-- > include/linux/rcupdate.h | 2 +- > include/linux/rwlock_api_smp.h| 16 > include/linux/seqlock.h | 4 ++-- > include/linux/spinlock_api_smp.h | 8 > include/linux/ww_mutex.h | 2 +- > include/net/sock.h| 2 +- > kernel/bpf/stackmap.c | 2 +- > kernel/cpu.c | 2 +- > kernel/locking/lockdep.c | 3 +-- > kernel/locking/mutex.c| 4 ++-- > kernel/locking/rtmutex.c | 6 +++--- > kernel/locking/rwsem.c| 10 +- > kernel/printk/printk.c| 10 +- > kernel/sched/core.c | 2 +- > lib/locking-selftest.c| 24 > mm/memcontrol.c | 2 +- > net/core/sock.c | 2 +- > tools/lib/lockdep/include/liblockdep/common.h | 3 +-- > tools/lib/lockdep/include/liblockdep/mutex.h | 2 +- > tools/lib/lockdep/include/liblockdep/rwlock.h | 2 +- > tools/lib/lockdep/preload.c | 16 > 33 files changed, 90 insertions(+), 93 deletions(-) Thanks!
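For reference, the API change is just the middle argument going away; @nested had already been ignored since b4adfe8e05f1:

	/* before */
	extern void lock_release(struct lockdep_map *lock, int nested,
				 unsigned long ip);
	/* after */
	extern void lock_release(struct lockdep_map *lock, unsigned long ip);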
Re: [PATCH 3/4] kernel.h: Add non_block_start/end()
On Fri, Aug 23, 2019 at 03:42:47PM +0200, Daniel Vetter wrote: > I'm assuming the lockdep one will land, so not going to resend that. I was assuming you'd take the might_lock_nested() along with the i915 user through the i915/drm tree. If you want me to take some or all of that, lemme know.
Re: [PATCH 3/4] kernel.h: Add non_block_start/end()
On Fri, Aug 23, 2019 at 09:12:34AM -0300, Jason Gunthorpe wrote: > I still haven't heard a satisfactory answer why a whole new scheme is > needed and a simple: > >if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP)) > preempt_disable() > > isn't sufficient to catch the problematic cases during debugging?? > IMHO the fact preempt is changed by the above when debugging is not > material here. I think that information should be included in the > commit message at least. That has a much larger impact and actually changes behaviour, while the relatively simple patch Daniel proposed only adds a warning but doesn't affect behaviour.
Re: [PATCH 3/4] kernel.h: Add non_block_start/end()
On Tue, Aug 20, 2019 at 10:24:40PM +0200, Daniel Vetter wrote: > On Tue, Aug 20, 2019 at 10:19:01AM +0200, Daniel Vetter wrote: > > In some special cases we must not block, but there's not a > > spinlock, preempt-off, irqs-off or similar critical section already > > that arms the might_sleep() debug checks. Add a non_block_start/end() > > pair to annotate these. > > > > This will be used in the oom paths of mmu-notifiers, where blocking is > > not allowed to make sure there's forward progress. Quoting Michal: > > > > "The notifier is called from quite a restricted context - oom_reaper - > > which shouldn't depend on any locks or sleepable conditionals. The code > > should be swift as well but we mostly do care about it to make a forward > > progress. Checking for sleepable context is the best thing we could come > > up with that would describe these demands at least partially." > > > > Peter also asked whether we want to catch spinlocks on top, but Michal > > said those are less of a problem because spinlocks can't have an > > indirect dependency upon the page allocator and hence close the loop > > with the oom reaper. > > > > Suggested by Michal Hocko. > > > > v2: > > - Improve commit message (Michal) > > - Also check in schedule, not just might_sleep (Peter) > > > > v3: It works better when I actually squash in the fixup I had lying > > around :-/ > > > > v4: Pick the suggestion from Andrew Morton to give non_block_start/end > > some good kerneldoc comments. I added that other blocking calls like > > wait_event pose similar issues, since that's the other example we > > discussed. > > > > Cc: Jason Gunthorpe > > Cc: Peter Zijlstra > > Cc: Ingo Molnar > > Cc: Andrew Morton > > Cc: Michal Hocko > > Cc: David Rientjes > > Cc: "Christian König" > > Cc: Daniel Vetter > > Cc: "Jérôme Glisse" > > Cc: linux...@kvack.org > > Cc: Masahiro Yamada > > Cc: Wei Wang > > Cc: Andy Shevchenko > > Cc: Thomas Gleixner > > Cc: Jann Horn > > Cc: Feng Tang > > Cc: Kees Cook > > Cc: Randy Dunlap > > Cc: linux-ker...@vger.kernel.org > > Acked-by: Christian König (v1) > > Signed-off-by: Daniel Vetter > > Hi Peter, > > Iirc you've been involved at least somewhat in discussing this. -mm folks > are a bit undecided whether these new non_block semantics are a good idea. > Michal Hocko still is in support, but Andrew Morton and Jason Gunthorpe > are less enthusiastic. Jason said he's ok with merging the hmm side of > this if scheduler folks ack. If not, then I'll respin with the > preempt_disable/enable instead like in v1. > > So ack/nack for this from the scheduler side? Right, I had memories of seeing this before, and I just found a fairly long discussion on this elsewhere in the vacation inbox (*groan*). Yeah, this is something I can live with, Acked-by: Peter Zijlstra (Intel)
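For context, the annotation machinery itself is tiny; a sketch assuming the per-task counter approach from the patch under discussion (consulted by might_sleep() and the scheduler; this is not the complete diff):

	#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
	/* assumes task_struct grew an 'int non_block_count' member */
	# define non_block_start()	(current->non_block_count++)
	# define non_block_end()	WARN_ON(current->non_block_count-- == 0)
	#else
	# define non_block_start()	do { } while (0)
	# define non_block_end()	do { } while (0)
	#endif

The mmu-notifier/oom-reaper side then brackets the notifier invocation with non_block_start()/non_block_end(), and any might_sleep() or schedule() in between trips the warning without otherwise changing behaviour.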
Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites
On Thu, Aug 01, 2019 at 07:16:19PM -0700, john.hubb...@gmail.com wrote: > This is part a tree-wide conversion, as described in commit fc1d8e7cca2d > ("mm: introduce put_user_page*(), placeholder versions"). That commit > has an extensive description of the problem and the planned steps to > solve it, but the highlites are: That is one horridly mangled Changelog there :-/ It looks like it's partially duplicated. Anyway; no objections to any of that, but I just wanted to mention that there are other problems with long term pinning that haven't been mentioned, notably they inhibit compaction. A long time ago I proposed an interface to mark pages as pinned, such that we could run compaction before we actually did the pinning.
Re: [PATCH 1/2] dma-buf: add reservation_context for deadlock handling
On Tue, Jun 25, 2019 at 03:55:06PM +0200, Christian König wrote: > diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c > index 4d32e2c67862..9e53e42b053a 100644 > --- a/drivers/dma-buf/reservation.c > +++ b/drivers/dma-buf/reservation.c > @@ -55,6 +55,88 @@ EXPORT_SYMBOL(reservation_seqcount_class); > const char reservation_seqcount_string[] = "reservation_seqcount"; > EXPORT_SYMBOL(reservation_seqcount_string); > > +/** > + * reservation_context_init - initialize a reservation context > + * @ctx: the context to initialize > + * > + * Start using this reservation context to lock reservation objects for > update. > + */ > +void reservation_context_init(struct reservation_context *ctx) > +{ > + ww_acquire_init(&ctx->ctx, &reservation_ww_class); > + init_llist_head(&ctx->locked); ctx->locked = NULL; > + ctx->contended = NULL; > +} > +EXPORT_SYMBOL(reservation_context_init); > + > +/** > + * reservation_context_unlock_all - unlock all reservation objects > + * @ctx: the context which holds the reservation objects > + * > + * Unlocks all reservation objects locked with this context. > + */ > +void reservation_context_unlock_all(struct reservation_context *ctx) > +{ > + struct reservation_object *obj, *next; > + > + if (ctx->contended) > + ww_mutex_unlock(&ctx->contended->lock); > + ctx->contended = NULL; > + > + llist_for_each_entry_safe(obj, next, ctx->locked.first, locked) > + ww_mutex_unlock(&obj->lock); for (obj = ctx->locked; obj && (next = obj->locked, true); obj = next) > + init_llist_head(&ctx->locked); ctx->locked = NULL; > +} > +EXPORT_SYMBOL(reservation_context_unlock_all); > + > +/** > + * reservation_context_lock - lock a reservation object with deadlock > handling > + * @ctx: the context which should be used to lock the object > + * @obj: the object which needs to be locked > + * @interruptible: if we should wait interruptible or not > + * > + * Use @ctx to lock the reservation object. If a deadlock is detected we > backoff > + * by releasing all locked objects and use the slow path to lock the > reservation > + * object. After successfully locking in the slow path -EDEADLK is returned > to > + * signal that all other locks must be re-taken as well. > + */ > +int reservation_context_lock(struct reservation_context *ctx, > + struct reservation_object *obj, > + bool interruptible) > +{ > + int ret = 0; > + > + if (unlikely(ctx->contended == obj)) > + ctx->contended = NULL; > + else if (interruptible) > + ret = ww_mutex_lock_interruptible(&obj->lock, &ctx->ctx); > + else > + ret = ww_mutex_lock(&obj->lock, &ctx->ctx); > + > + if (likely(!ret)) { > + /* don't use llist_add here, we have separate locking */ > + obj->locked.next = ctx->locked.first; > + ctx->locked.first = &obj->locked; Since you're not actually using llist, could you then maybe open-code the whole thing and not rely on its implementation? That is, it hardly seems worth the trouble, as shown the bits you do use are hardly longer than just writing it out. Also, I was confused by the use of llist, as I wondered where the concurrency came from. 
> + return 0; > + } > + if (unlikely(ret != -EDEADLK)) > + return ret; > + > + reservation_context_unlock_all(ctx); > + > + if (interruptible) { > + ret = ww_mutex_lock_slow_interruptible(&obj->lock, &ctx->ctx); > + if (unlikely(ret)) > + return ret; > + } else { > + ww_mutex_lock_slow(&obj->lock, &ctx->ctx); > + } > + > + ctx->contended = obj; > + return -EDEADLK; > +} > +EXPORT_SYMBOL(reservation_context_lock); > diff --git a/include/linux/reservation.h b/include/linux/reservation.h > index ee750765cc94..a8a52e5d3e80 100644 > --- a/include/linux/reservation.h > +++ b/include/linux/reservation.h > +struct reservation_context { > + struct ww_acquire_ctx ctx; > + struct llist_head locked; struct reservation_object *locked; > + struct reservation_object *contended; > +}; > @@ -71,6 +108,7 @@ struct reservation_object_list { > */ > struct reservation_object { > struct ww_mutex lock; > + struct llist_node locked; struct reservation_object *locked; > seqcount_t seq; > > struct dma_fence __rcu *fence_excl; > -- > 2.17.1 >
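Putting the review comments together, the open-coded variant would look something like this; a sketch of the suggestion only, assuming each reservation_object grows a 'struct reservation_object *locked' link in place of the llist_node:

	struct reservation_context {
		struct ww_acquire_ctx		ctx;
		struct reservation_object	*locked;	/* singly linked via obj->locked */
		struct reservation_object	*contended;
	};

	static void reservation_context_unlock_all(struct reservation_context *ctx)
	{
		struct reservation_object *obj, *next;

		if (ctx->contended)
			ww_mutex_unlock(&ctx->contended->lock);
		ctx->contended = NULL;

		/* save the link before unlocking, then walk the chain */
		for (obj = ctx->locked; obj && (next = obj->locked, true); obj = next)
			ww_mutex_unlock(&obj->lock);
		ctx->locked = NULL;
	}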
Re: [RFC][PATCH] wake_up_var() memory ordering
On Tue, Jun 25, 2019 at 02:12:22PM +0200, Andreas Gruenbacher wrote: > > Only if we do as David suggested and make clean_and_wake_up_bit() > > provide the RELEASE barrier. > > (It's clear_and_wake_up_bit, not clean_and_wake_up_bit.) Yes, typing hard. > > That is, currently clear_and_wake_up_bit() is > > > > clear_bit() > > smp_mb__after_atomic(); > > wake_up_bit(); > > Now I'm confused because clear_and_wake_up_bit() in mainline does use > clear_bit_unlock(), so it's the exact opposite of what you just said. Argh; clearly I couldn't read. And then yes, you're right.
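For reference, the helper as it reads in mainline at this point (include/linux/wait_bit.h); the comments here are mine:

	static inline void clear_and_wake_up_bit(int bit, void *word)
	{
		clear_bit_unlock(bit, word);	/* RELEASE semantics */
		smp_mb__after_atomic();		/* pairs with waitqueue_active() */
		wake_up_bit(word, bit);
	}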
Re: [RFC][PATCH] wake_up_var() memory ordering
On Tue, Jun 25, 2019 at 11:19:35AM +0200, Andreas Gruenbacher wrote: > > diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c > > index cf4c767005b1..29ea5da7 100644 > > --- a/fs/gfs2/glops.c > > +++ b/fs/gfs2/glops.c > > @@ -227,6 +227,7 @@ static void gfs2_clear_glop_pending(struct gfs2_inode *ip) > > return; > > > > clear_bit_unlock(GIF_GLOP_PENDING, &ip->i_flags); > > + smp_mb__after_atomic(); > > wake_up_bit(&ip->i_flags, GIF_GLOP_PENDING); > This should become clear_and_wake_up_bit as well, right? There are > several more instances of the same pattern. Only if we do as David suggested and make clean_and_wake_up_bit() provide the RELEASE barrier. That is, currently clear_and_wake_up_bit() is:

	clear_bit()
	smp_mb__after_atomic();
	wake_up_bit();

But the above is:

	clear_bit_unlock();
	smp_mb__after_atomic();
	wake_up_bit()

the difference is that _unlock() uses RELEASE semantics, where clear_bit() does not. The difference is illustrated with something like:

	cond = true;
	clear_bit()
	smp_mb__after_atomic();
	wake_up_bit();

In this case, a remote CPU can first observe the clear_bit() and then the 'cond = true' store. When we use clear_bit_unlock() this is not possible, because the RELEASE barrier ensures that everything before, stays before. > > }
Re: [RFC][PATCH] wake_up_var() memory ordering
(sorry for cross-posting to moderated lists btw, I've since acquired a patch to get_maintainers.pl that will exclude them in the future) On Tue, Jun 25, 2019 at 08:51:01AM +0100, David Howells wrote: > Peter Zijlstra wrote: > > > I tried using wake_up_var() today and accidentally noticed that it > > didn't imply an smp_mb() and specifically requires it through > > wake_up_bit() / waitqueue_active(). > > Thinking about it again, I'm not sure why you need to add the barrier when > wake_up() (which this is a wrapper around) is required to impose a barrier at > the front if there's anything to wake up (ie. the wait queue isn't empty). > > If this is insufficient, does it make sense just to have wake_up*() functions > do an unconditional release or full barrier right at the front, rather than it > being conditional on something being woken up? The culprit is __wake_up_bit()'s usage of waitqueue_active(); it is this latter (see its comment) that requires the smp_mb(). wake_up_bit() and wake_up_var() are wrappers around __wake_up_bit(). Without this barrier it is possible for the waitqueue_active() load to be hoisted over the cond=true store and the remote end can miss the store and we can miss its enqueue and we'll all miss a wakeup and get stuck. Adding an smp_mb() (or use wq_has_sleeper()) in __wake_up_bit() would be nice, but I fear some people will complain about overhead, esp. since about half the sites don't need the barrier due to being behind test_and_clear_bit() and the other half using smp_mb__after_atomic() after some clear_bit*() variant. There's a few sites that seem to open-code wait_var_event()/wake_up_var() and those actually need the full smp_mb(), but then maybe they should be converted to var instead of bit anyway. > > @@ -619,9 +614,7 @@ static int dvb_usb_fe_sleep(struct dvb_frontend *fe) > > err: > > if (!adap->suspend_resume_active) { > > adap->active_fe = -1; > I'm wondering if there's a missing barrier here. Should the clear_bit() on > the next line be clear_bit_unlock() or clear_bit_release()? That looks reasonable, but I'd like to hear from the DVB folks on that. > > - clear_bit(ADAP_SLEEP, &adap->state_bits); > > - smp_mb__after_atomic(); > > - wake_up_bit(&adap->state_bits, ADAP_SLEEP); > > + clear_and_wake_up_bit(ADAP_SLEEP, &adap->state_bits); > > } > > > > dev_dbg(&d->udev->dev, "%s: ret=%d\n", __func__, ret); > > diff --git a/fs/afs/fs_probe.c b/fs/afs/fs_probe.c > > index cfe62b154f68..377ee07d5f76 100644 > > --- a/fs/afs/fs_probe.c > > +++ b/fs/afs/fs_probe.c > > @@ -18,6 +18,7 @@ static bool afs_fs_probe_done(struct afs_server *server) > > > > wake_up_var(&server->probe_outstanding); > > clear_bit_unlock(AFS_SERVER_FL_PROBING, &server->flags); > > + smp_mb__after_atomic(); > > wake_up_bit(&server->flags, AFS_SERVER_FL_PROBING); > > return true; > > } > Looking at this and the dvb one, does it make sense to stick the release > semantics of clear_bit_unlock() into clean_and_wake_up_bit()? I was thinking of adding another helper, maybe unlock_and_wake_up_bit() that included that extra barrier, but maybe making it unconditional isn't the worst idea. > Also, should clear_bit_unlock() be renamed to clear_bit_release() (and > similarly test_and_set_bit_lock() -> test_and_set_bit_acquire()) if we seem to > be trying to standardise on that terminology. That definitely makes sense to me, there's only 157 clear_bit_unlock() and 76 test_and_set_bit_lock() users (note the asymmetry of that).
[RFC][PATCH] wake_up_var() memory ordering
Hi all, I tried using wake_up_var() today and accidentally noticed that it didn't imply an smp_mb() and specifically requires it through wake_up_bit() / waitqueue_active(). Now, wake_up_bit() doesn't imply the barrier because it is assumed to be used with the atomic bitops API which either implies (test_and_clear) or only needs smp_mb__after_atomic(), which is 'much' cheaper than an unconditional smp_mb(). Still, while auditing all that, I found a whole bunch of things that could be improved. There were missing barriers, superfluous barriers and a whole bunch of sites that could use clear_and_wake_up_bit(). So this fixes all wake_up_bit() usage without actually changing semantics of it (which are unfortunate but understandable). This does however change the semantics of wake_up_var(); even though wake_up_var() is most often used with atomics and then the additional smp_mb() is most often superfluous :/ There isn't really a good option here, comments (other than I need to split this up)? --- drivers/bluetooth/btmtksdio.c | 5 + drivers/bluetooth/btmtkuart.c | 5 + drivers/bluetooth/hci_mrvl.c| 8 ++-- drivers/gpu/drm/i915/i915_reset.c | 6 ++ drivers/md/dm-bufio.c | 10 ++ drivers/media/usb/dvb-usb-v2/dvb_usb_core.c | 15 --- fs/afs/fs_probe.c | 1 + fs/afs/server.c | 1 + fs/afs/vl_probe.c | 1 + fs/afs/volume.c | 1 + fs/aio.c| 4 +--- fs/block_dev.c | 1 + fs/btrfs/extent_io.c| 4 +--- fs/cachefiles/namei.c | 1 + fs/cifs/connect.c | 3 +-- fs/cifs/misc.c | 15 +-- fs/fscache/cookie.c | 2 ++ fs/fscache/object.c | 2 ++ fs/fscache/page.c | 3 +++ fs/gfs2/glock.c | 8 ++-- fs/gfs2/glops.c | 1 + fs/gfs2/lock_dlm.c | 8 ++-- fs/gfs2/recovery.c | 4 +--- fs/gfs2/super.c | 1 + fs/gfs2/sys.c | 4 +--- fs/nfs/nfs4state.c | 4 +--- fs/nfs/pnfs_nfs.c | 4 +--- fs/nfsd/nfs4recover.c | 4 +--- fs/orangefs/file.c | 2 +- kernel/sched/wait_bit.c | 1 + net/bluetooth/hci_event.c | 5 + net/rds/ib_recv.c | 1 + security/keys/gc.c | 5 ++--- 33 files changed, 50 insertions(+), 90 deletions(-) diff --git a/drivers/bluetooth/btmtksdio.c b/drivers/bluetooth/btmtksdio.c index 813338288453..27523cfeac9a 100644 --- a/drivers/bluetooth/btmtksdio.c +++ b/drivers/bluetooth/btmtksdio.c @@ -356,11 +356,8 @@ static int btmtksdio_recv_event(struct hci_dev *hdev, struct sk_buff *skb) if (hdr->evt == HCI_EV_VENDOR) { if (test_and_clear_bit(BTMTKSDIO_TX_WAIT_VND_EVT, - &bdev->tx_state)) { - /* Barrier to sync with other CPUs */ - smp_mb__after_atomic(); + &bdev->tx_state)) wake_up_bit(&bdev->tx_state, BTMTKSDIO_TX_WAIT_VND_EVT); - } } return 0; diff --git a/drivers/bluetooth/btmtkuart.c b/drivers/bluetooth/btmtkuart.c index f5dbeec8e274..7fe324df3799 100644 --- a/drivers/bluetooth/btmtkuart.c +++ b/drivers/bluetooth/btmtkuart.c @@ -340,11 +340,8 @@ static int btmtkuart_recv_event(struct hci_dev *hdev, struct sk_buff *skb) if (hdr->evt == HCI_EV_VENDOR) { if (test_and_clear_bit(BTMTKUART_TX_WAIT_VND_EVT, - &bdev->tx_state)) { - /* Barrier to sync with other CPUs */ - smp_mb__after_atomic(); + &bdev->tx_state)) wake_up_bit(&bdev->tx_state, BTMTKUART_TX_WAIT_VND_EVT); - } } return 0; diff --git a/drivers/bluetooth/hci_mrvl.c b/drivers/bluetooth/hci_mrvl.c index 50212ac629e3..f03294d39d08 100644 --- a/drivers/bluetooth/hci_mrvl.c +++ b/drivers/bluetooth/hci_mrvl.c @@ -157,9 +157,7 @@ static int mrvl_recv_fw_req(struct hci_dev *hdev, struct sk_buff *skb) mrvl->tx_len = le16_to_cpu(pkt->lhs); - clear_bit(STATE_FW_REQ_PENDING, &mrvl->flags); - smp_mb__after_atomic(); - wake_up_bit(&mrvl->flags, STATE_FW_REQ_PENDING); + 
clear_and_wake_up_bit(STATE_FW_REQ_PENDING, &mrvl->flags); done: kfree_skb(skb); @@ -192,9 +190,7 @@ static int mrvl_recv_chi
Re: [PATCH v1 12/22] docs: driver-api: add .rst files from the main dir
On Wed, Jun 19, 2019 at 01:45:51PM +0200, Peter Zijlstra wrote: > On Wed, Jun 19, 2019 at 01:43:56PM +0200, Peter Zijlstra wrote: > > On Tue, Jun 18, 2019 at 05:53:17PM -0300, Mauro Carvalho Chehab wrote: > > > > > .../{ => driver-api}/atomic_bitops.rst| 2 - > > > > That's a .txt file, big fat NAK for making it an rst. > > Also, how many bloody times do I have to keep telling this? It is > starting to get _REALLY_ annoying. Also, cross-posting to moderated lists is rude.
Re: [PATCH v1 12/22] docs: driver-api: add .rst files from the main dir
On Wed, Jun 19, 2019 at 01:43:56PM +0200, Peter Zijlstra wrote: > On Tue, Jun 18, 2019 at 05:53:17PM -0300, Mauro Carvalho Chehab wrote: > > > .../{ => driver-api}/atomic_bitops.rst| 2 - > > That's a .txt file, big fat NAK for making it an rst. Also, how many bloody times do I have to keep telling this? It is starting to get _REALLY_ annoying.
Re: [PATCH v1 12/22] docs: driver-api: add .rst files from the main dir
On Tue, Jun 18, 2019 at 05:53:17PM -0300, Mauro Carvalho Chehab wrote: > .../{ => driver-api}/atomic_bitops.rst| 2 - That's a .txt file, big fat NAK for making it an rst. > .../{ => driver-api}/futex-requeue-pi.rst | 2 - > .../{ => driver-api}/gcc-plugins.rst | 2 - > Documentation/{ => driver-api}/kprobes.rst| 2 - > .../{ => driver-api}/percpu-rw-semaphore.rst | 2 - More NAK for rst conversion > Documentation/{ => driver-api}/pi-futex.rst | 2 - > .../{ => driver-api}/preempt-locking.rst | 2 - > Documentation/{ => driver-api}/rbtree.rst | 2 - > .../{ => driver-api}/robust-futex-ABI.rst | 2 - > .../{ => driver-api}/robust-futexes.rst | 2 - > .../{ => driver-api}/speculation.rst | 8 +-- > .../{ => driver-api}/static-keys.rst | 2 - > .../{ => driver-api}/this_cpu_ops.rst | 2 - > Documentation/locking/rt-mutex.rst| 2 +- NAK. None of the above have anything to do with driver-api.
Re: [PATCH v1 12/22] docs: driver-api: add .rst files from the main dir
On Wed, Jun 19, 2019 at 07:22:18AM -0300, Mauro Carvalho Chehab wrote: > Hi Daniel, > > Em Wed, 19 Jun 2019 11:05:57 +0200 > Daniel Vetter escreveu: > > > On Tue, Jun 18, 2019 at 10:55 PM Mauro Carvalho Chehab > > wrote: > > > diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst > > > index fa30dfcfc3c8..b0f948d8733b 100644 > > > --- a/Documentation/gpu/drm-mm.rst > > > +++ b/Documentation/gpu/drm-mm.rst > > > @@ -320,7 +320,7 @@ struct :c:type:`struct file_operations ` get_unmapped_area > > > field with a pointer on :c:func:`drm_gem_cma_get_unmapped_area`. > > > > > > More detailed information about get_unmapped_area can be found in > > > -Documentation/nommu-mmap.rst > > > +Documentation/driver-api/nommu-mmap.rst > > > > Random drive-by comment: Could we convert these into hyperlinks within > > sphinx somehow, without making them less useful as raw file references > > (with vim I can just type 'gf' and it works, emacs probably the same). > > -Daniel > > Short answer: I don't know how vim/emacs would recognize Sphinx tags. No, the other way around, Sphinx can recognize local files and treat them special. That way we keep the text readable. Same with that :c:func:'foo' crap, that needs to die, and Sphinx needs to be taught about foo().
Re: [PATCH 3/6] drm/gem: use new ww_mutex_(un)lock_for_each macros
On Fri, Jun 14, 2019 at 03:06:10PM +0200, Christian König wrote: > Am 14.06.19 um 14:59 schrieb Peter Zijlstra: > > +#define ww_mutex_lock_for_each(loop, pos, contended, ret, intr, ctx) \ > > + for (contended = ERR_PTR(-ENOENT); ({ \ > > + __label__ relock, next; \ > > + ret = -ENOENT; \ > > + if (contended == ERR_PTR(-ENOENT)) \ > > + contended = NULL; \ > > + else if (contended == ERR_PTR(-EDEADLK))\ > > + contended = (pos); \ > > + else\ > > + goto next; \ > > + loop { \ > > + if (contended == (pos)) { \ > > + contended = NULL; \ > > + continue; \ > > + } \ > > +relock: > > \ > > + ret = !(intr) ? ww_mutex_lock(pos, ctx) : \ > > + ww_mutex_lock_interruptible(pos, ctx); \ > > + if (ret == -EDEADLK) { \ > > + ww_mutex_unlock_for_each(loop, pos, \ > > +contended);\ > > + contended = ERR_PTR(-EDEADLK); \ > > + goto relock;\ > > > > while relock here continues where it left of and doesn't restart @loop. > > Or am I reading this monstrosity the wrong way? > > contended = ERR_PTR(-EDEADLK) makes sure that the whole loop is restarted > after retrying to acquire the lock. > > See the "if" above "loop". ARGH, the loop inside the loop... Yeah, maybe, brain hurts.
Re: [PATCH 3/6] drm/gem: use new ww_mutex_(un)lock_for_each macros
On Fri, Jun 14, 2019 at 02:41:22PM +0200, Christian König wrote: > Use the provided macros instead of implementing deadlock handling on our own. > > Signed-off-by: Christian König > --- > drivers/gpu/drm/drm_gem.c | 49 ++- > 1 file changed, 12 insertions(+), 37 deletions(-) > > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c > index 50de138c89e0..6e4623d3bee2 100644 > --- a/drivers/gpu/drm/drm_gem.c > +++ b/drivers/gpu/drm/drm_gem.c > @@ -1307,51 +1307,26 @@ int > drm_gem_lock_reservations(struct drm_gem_object **objs, int count, > struct ww_acquire_ctx *acquire_ctx) > { > - int contended = -1; > + struct ww_mutex *contended; > int i, ret; > > ww_acquire_init(acquire_ctx, &reservation_ww_class); > > -retry: > - if (contended != -1) { > - struct drm_gem_object *obj = objs[contended]; > - > - ret = ww_mutex_lock_slow_interruptible(&obj->resv->lock, > -acquire_ctx); > - if (ret) { > - ww_acquire_done(acquire_ctx); > - return ret; > - } > - } > - > - for (i = 0; i < count; i++) { > - if (i == contended) > - continue; > - > - ret = ww_mutex_lock_interruptible(&objs[i]->resv->lock, > - acquire_ctx); > - if (ret) { > - int j; > - > - for (j = 0; j < i; j++) > - ww_mutex_unlock(&objs[j]->resv->lock); > - > - if (contended != -1 && contended >= i) > - ww_mutex_unlock(&objs[contended]->resv->lock); > - > - if (ret == -EDEADLK) { > - contended = i; > - goto retry; > - } > - > - ww_acquire_done(acquire_ctx); > - return ret; > - } > - } I note all the sites you use this on are simple idx iterators; so how about something like so:

	void ww_mutex_unlock_all(int count, void *data,
				 struct ww_mutex *(*func)(int, void *))
	{
		struct ww_mutex *lock;
		int i;

		for (i = 0; i < count; i++) {
			lock = func(i, data);
			ww_mutex_unlock(lock);
		}
	}

	int ww_mutex_lock_all(int count, struct ww_acquire_ctx *acquire_ctx,
			      bool intr, void *data,
			      struct ww_mutex *(*func)(int, void *))
	{
		int i, ret, contended = -1;
		struct ww_mutex *lock;

	retry:
		if (contended != -1) {
			lock = func(contended, data);
			if (intr)
				ret = ww_mutex_lock_slow_interruptible(lock, acquire_ctx);
			else
				ret = (ww_mutex_lock_slow(lock, acquire_ctx), 0);

			if (ret) {
				ww_acquire_done(acquire_ctx);
				return ret;
			}
		}

		for (i = 0; i < count; i++) {
			if (i == contended)
				continue;

			lock = func(i, data);
			if (intr)
				ret = ww_mutex_lock_interruptible(lock, acquire_ctx);
			else
				ret = ww_mutex_lock(lock, acquire_ctx);

			if (ret) {
				ww_mutex_unlock_all(i, data, func);
				if (contended > i) {
					lock = func(contended, data);
					ww_mutex_unlock(lock);
				}

				if (ret == -EDEADLK) {
					contended = i;
					goto retry;
				}

				ww_acquire_done(acquire_ctx);
				return ret;
			}
		}

		ww_acquire_done(acquire_ctx);
		return 0;
	}

> + ww_mutex_lock_for_each(for (i = 0; i < count; i++), > +&objs[i]->resv->lock, contended, ret, true, > +acquire_ctx) > + if (ret) > + goto error;

which then becomes:

	struct ww_mutex *gem_ww_mutex_func(int i, void *data)
	{
		struct drm_gem_object **objs = data;
		return &objs[i]->resv->lock;
	}

	ret = ww_mutex_lock_all(count, acquire_ctx, true, objs, gem_ww_mutex_func);

> ww_acquire_done(acquire_ctx); > > return 0; > + > +error: > + ww_mutex_unlock_for_each(for (i = 0; i < count; i++), > + &objs[i]->resv->lock, contended); > + ww_acquire_done(acquire_ctx); > + return ret; > } > EXPORT_SYMBOL(drm_gem_lock_reservations); > > -- > 2.17.1 >
Re: [PATCH 3/6] drm/gem: use new ww_mutex_(un)lock_for_each macros
On Fri, Jun 14, 2019 at 02:41:22PM +0200, Christian König wrote: > Use the provided macros instead of implementing deadlock handling on our own. > > Signed-off-by: Christian König > --- > drivers/gpu/drm/drm_gem.c | 49 ++- > 1 file changed, 12 insertions(+), 37 deletions(-) > > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c > index 50de138c89e0..6e4623d3bee2 100644 > --- a/drivers/gpu/drm/drm_gem.c > +++ b/drivers/gpu/drm/drm_gem.c > @@ -1307,51 +1307,26 @@ int > drm_gem_lock_reservations(struct drm_gem_object **objs, int count, > struct ww_acquire_ctx *acquire_ctx) > { > - int contended = -1; > + struct ww_mutex *contended; > int i, ret; > > ww_acquire_init(acquire_ctx, &reservation_ww_class); > > -retry: > - if (contended != -1) { > - struct drm_gem_object *obj = objs[contended]; > - > - ret = ww_mutex_lock_slow_interruptible(&obj->resv->lock, > -acquire_ctx); > - if (ret) { > - ww_acquire_done(acquire_ctx); > - return ret; > - } > - } > - > - for (i = 0; i < count; i++) { > - if (i == contended) > - continue; > - > - ret = ww_mutex_lock_interruptible(&objs[i]->resv->lock, > - acquire_ctx); > - if (ret) { > - int j; > - > - for (j = 0; j < i; j++) > - ww_mutex_unlock(&objs[j]->resv->lock); > - > - if (contended != -1 && contended >= i) > - ww_mutex_unlock(&objs[contended]->resv->lock); > - > - if (ret == -EDEADLK) { > - contended = i; > - goto retry; retry here, starts the whole locking loop. > - } > - > - ww_acquire_done(acquire_ctx); > - return ret; > - } > - } +#define ww_mutex_unlock_for_each(loop, pos, contended) \ + if (!IS_ERR(contended)) { \ + if (contended) \ + ww_mutex_unlock(contended); \ + contended = (pos); \ + loop { \ + if (contended == (pos)) \ + break; \ + ww_mutex_unlock(pos); \ + } \ + } + +#define ww_mutex_lock_for_each(loop, pos, contended, ret, intr, ctx) \ + for (contended = ERR_PTR(-ENOENT); ({ \ + __label__ relock, next; \ + ret = -ENOENT; \ + if (contended == ERR_PTR(-ENOENT)) \ + contended = NULL; \ + else if (contended == ERR_PTR(-EDEADLK))\ + contended = (pos); \ + else\ + goto next; \ + loop { \ + if (contended == (pos)) { \ + contended = NULL; \ + continue; \ + } \ +relock: \ + ret = !(intr) ? ww_mutex_lock(pos, ctx) : \ + ww_mutex_lock_interruptible(pos, ctx); \ + if (ret == -EDEADLK) { \ + ww_mutex_unlock_for_each(loop, pos, \ +contended);\ + contended = ERR_PTR(-EDEADLK); \ + goto relock;\ while relock here continues where it left of and doesn't restart @loop. Or am I reading this monstrosity the wrong way? + } \ + break; \ +next: \ + continue; \ + } \ + }), ret != -ENOENT;) + > + ww_mutex_lock_for_each(for (i = 0; i
Re: [PATCH 1/6] locking: add ww_mutex_(un)lock_for_each helpers
On Fri, Jun 14, 2019 at 02:41:20PM +0200, Christian König wrote:
> The ww_mutex implementation allows for detection of deadlocks when multiple
> threads try to acquire the same set of locks in different order.
>
> The problem is that handling those deadlocks was the burden of the user of
> the ww_mutex implementation and at least some users didn't get that right
> on the first try.
>
> I'm not a big fan of macros, but it's still better than implementing the
> same logic at least half a dozen times. So this patch adds two macros to
> lock and unlock multiple ww_mutex instances with the necessary deadlock
> handling.
>
> Signed-off-by: Christian König
> ---
>  include/linux/ww_mutex.h | 75 
>  1 file changed, 75 insertions(+)
>
> diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
> index 3af7c0e03be5..a0d893b5b114 100644
> --- a/include/linux/ww_mutex.h
> +++ b/include/linux/ww_mutex.h
> @@ -369,4 +369,79 @@ static inline bool ww_mutex_is_locked(struct ww_mutex *lock)
>  	return mutex_is_locked(&lock->base);
>  }
>
> +/**
> + * ww_mutex_unlock_for_each - cleanup after error or contention
> + * @loop: for loop code fragment iterating over all the locks
> + * @pos: code fragment returning the currently handled lock
> + * @contended: the last contended ww_mutex or NULL or ERR_PTR
> + *
> + * Helper to make cleanup after error or lock contention easier.
> + * First unlock the last contended lock and then all other locked ones.
> + */
> +#define ww_mutex_unlock_for_each(loop, pos, contended)	\
> +	if (!IS_ERR(contended)) {			\
> +		if (contended)				\
> +			ww_mutex_unlock(contended);	\
> +		contended = (pos);			\
> +		loop {					\
> +			if (contended == (pos))		\
> +				break;			\
> +			ww_mutex_unlock(pos);		\
> +		}					\
> +	}
> +
> +/**
> + * ww_mutex_lock_for_each - implement ww_mutex deadlock handling
> + * @loop: for loop code fragment iterating over all the locks
> + * @pos: code fragment returning the currently handled lock
> + * @contended: ww_mutex pointer variable for state handling
> + * @ret: int variable to store the return value of ww_mutex_lock()
> + * @intr: If true ww_mutex_lock_interruptible() is used
> + * @ctx: ww_acquire_ctx pointer for the locking
> + *
> + * This macro implements the necessary logic to lock multiple ww_mutex
> + * instances. Lock contention with backoff and re-locking is handled by the
> + * macro so that the loop body only needs to handle other errors and
> + * successfully acquired locks.
> + *
> + * With the @loop and @pos code fragments it is possible to apply this logic
> + * to all kinds of containers (array, list, tree, etc...) holding multiple
> + * ww_mutex instances.
> + *
> + * @contended is used to hold the current state we are in. -ENOENT is used to
> + * signal that we are just starting the handling. -EDEADLK means that the
> + * current position is contended and we need to restart the loop after
> + * locking it. NULL means that there is no contention to be handled. Any
> + * other value is the last contended ww_mutex.
> + */
> +#define ww_mutex_lock_for_each(loop, pos, contended, ret, intr, ctx)	\
> +	for (contended = ERR_PTR(-ENOENT); ({				\
> +		__label__ relock, next;					\
> +		ret = -ENOENT;						\
> +		if (contended == ERR_PTR(-ENOENT))			\
> +			contended = NULL;				\
> +		else if (contended == ERR_PTR(-EDEADLK))		\
> +			contended = (pos);				\
> +		else							\
> +			goto next;					\
> +		loop {							\
> +			if (contended == (pos)) {			\
> +				contended = NULL;			\
> +				continue;				\
> +			}						\
> +relock:									\
> +			ret = !(intr) ? ww_mutex_lock(pos, ctx) :	\
> +				ww_mutex_lock_interruptible(pos, ctx);	\
> +			if (ret == -EDEADLK) {				\
> +				ww_mutex_unlock_for_each(loop, pos,	\
> +							 contended);	\
> +				contended = ERR_PTR(-EDEADLK);		\
> +				goto relock;				\
> +			}						\
> +			break;						\
> +next:									\
> +			continue;					\
> +		}							\
> +	}), ret != -ENOENT;)
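To see how these macros are meant to be driven, here is a minimal usage sketch over a plain array of ww_mutexes, modelled on the drm_gem conversion in patch 2 of the series; lock_all() and sample_ww_class are hypothetical names, only the macro signatures come from the patch above:

	/* Sketch only: function, array and ww_class names are made up. */
	static DEFINE_WW_CLASS(sample_ww_class);

	static int lock_all(struct ww_mutex *locks, int count,
			    struct ww_acquire_ctx *ctx)
	{
		struct ww_mutex *contended;
		int i, ret;

		ww_acquire_init(ctx, &sample_ww_class);

		/* @loop iterates the array, @pos names the current lock. */
		ww_mutex_lock_for_each(for (i = 0; i < count; i++),
				       &locks[i], contended, ret, true, ctx)
			if (ret)
				goto error;

		ww_acquire_done(ctx);
		return 0;

	error:
		/* Unlocks everything locked so far, @contended first. */
		ww_mutex_unlock_for_each(for (i = 0; i < count; i++),
					 &locks[i], contended);
		ww_acquire_done(ctx);
		return ret;
	}

The deadlock backoff (unlock everything, take the contended mutex first, retry) happens entirely inside the lock macro; the loop body only ever sees the final error, which is the point of the helpers.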
Re: [PATCH v3 08/18] objtool: add kunit_try_catch_throw to the noreturn list
On Tue, May 14, 2019 at 01:12:23AM -0700, Brendan Higgins wrote: > On Tue, May 14, 2019 at 08:56:43AM +0200, Peter Zijlstra wrote: > > On Mon, May 13, 2019 at 10:42:42PM -0700, Brendan Higgins wrote: > > > This fixes the following warning seen on GCC 7.3: > > > kunit/test-test.o: warning: objtool: kunit_test_unsuccessful_try() > > > falls through to next function kunit_test_catch() > > > > > > > What is that file and function; no kernel tree near me seems to have > > that. > > Oh, sorry about that. The function is added in the following patch, > "[PATCH v3 09/18] kunit: test: add support for test abort"[1]. > > My apologies if this patch is supposed to come after it in sequence, but > I assumed it should come before otherwise objtool would complain about > the symbol when it is introduced. Or send me all patches such that I have context, or have a sane Changelog that gives me context. Just don't give me one patch with a crappy changelog.
Re: [PATCH v3 08/18] objtool: add kunit_try_catch_throw to the noreturn list
On Mon, May 13, 2019 at 10:42:42PM -0700, Brendan Higgins wrote:
> This fixes the following warning seen on GCC 7.3:
> kunit/test-test.o: warning: objtool: kunit_test_unsuccessful_try() falls
> through to next function kunit_test_catch()

What is that file and function; no kernel tree near me seems to have that.
Re: [PATCH] RFC: console: hack up console_lock more v3
On Thu, May 09, 2019 at 03:06:09PM +0200, Daniel Vetter wrote: > On Thu, May 9, 2019 at 2:31 PM Peter Zijlstra wrote: > > On Thu, May 09, 2019 at 02:09:03PM +0200, Daniel Vetter wrote: > > > Fix this by creating a printk_safe_up() which calls wake_up_process > > > outside of the spinlock. This isn't correct in full generality, but > > > good enough for console_lock: > > > > > > - console_lock doesn't use interruptible or killable or timeout down() > > > calls, hence an up() is the only thing that can wake up a process. > > > > Wrong :/ Any task can be woken at any random time. We must, at all > > times, assume spurious wakeups will happen. > > Out of curiosity, where do these come from? I know about the races > where you need to recheck on the waiter side to avoid getting stuck, > but didn't know about this. Are these earlier (possibly spurious) > wakeups that got held up and delayed for a while, then hit the task > much later when it's already continued doing something else?

Yes, this. So they all more or less have the form:

	CPU0				CPU1

	enqueue_waiter()
					done = true;
					if (waiters)
	for (;;) {
		if (done)
			break;

		...
	}
	dequeue_waiter()

	do something else again
					wake_up_task

The wake_q thing made the above much more common, but we've had it forever.

> Or even > more random, and even if I never put a task on a wait list or anything > else, ever, it can get woken spuriously?

I had patches that did that on purpose, but no.

> > Something like the below might work. > Yeah that looks like the proper fix. I guess semaphores are uncritical > enough that we can roll this out for everyone. Thanks for the hint.

It's actually an optimization that we never did because semaphores are so uncritical :-)

The thing is, by delaying the wakeup until after we've released the spinlock, the waiter will not contend on the spinlock the moment it wakes.
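The waiter-side discipline that copes with such stale wakeups is worth spelling out; a generic sketch, where the done flag stands in for whatever condition is actually being waited on:

	/*
	 * Re-check the condition after every wakeup: schedule() may
	 * return because of a stale wakeup aimed at a previous wait.
	 */
	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (READ_ONCE(done))
			break;
		schedule();
	}
	__set_current_state(TASK_RUNNING);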
Re: [PATCH] RFC: console: hack up console_lock more v3
On Thu, May 09, 2019 at 02:09:03PM +0200, Daniel Vetter wrote:
> Fix this by creating a printk_safe_up() which calls wake_up_process
> outside of the spinlock. This isn't correct in full generality, but
> good enough for console_lock:
>
> - console_lock doesn't use interruptible or killable or timeout down()
>   calls, hence an up() is the only thing that can wake up a process.

Wrong :/ Any task can be woken at any random time. We must, at all times, assume spurious wakeups will happen.

> +void printk_safe_up(struct semaphore *sem)
> +{
> +	unsigned long flags;
> +	struct semaphore_waiter *waiter = NULL;
> +
> +	raw_spin_lock_irqsave(&sem->lock, flags);
> +	if (likely(list_empty(&sem->wait_list))) {
> +		sem->count++;
> +	} else {
> +		waiter = list_first_entry(&sem->wait_list,
> +					  struct semaphore_waiter, list);
> +		list_del(&waiter->list);
> +		waiter->up = true;
> +	}
> +	raw_spin_unlock_irqrestore(&sem->lock, flags);
> +
> +	if (waiter) /* protected by being sole wake source */
> +		wake_up_process(waiter->task);
> +}
> +EXPORT_SYMBOL(printk_safe_up);

Since it's only used from printk, that EXPORT really isn't needed.

Something like the below might work.

---
 kernel/locking/semaphore.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index 561acdd39960..ac0a67e95aac 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -38,7 +38,6 @@ static noinline void __down(struct semaphore *sem);
 static noinline int __down_interruptible(struct semaphore *sem);
 static noinline int __down_killable(struct semaphore *sem);
 static noinline int __down_timeout(struct semaphore *sem, long timeout);
-static noinline void __up(struct semaphore *sem);
 
 /**
  * down - acquire the semaphore
@@ -178,14 +177,24 @@ EXPORT_SYMBOL(down_timeout);
  */
 void up(struct semaphore *sem)
 {
+	struct semaphore_waiter *waiter;
+	DEFINE_WAKE_Q(wake_q);
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&sem->lock, flags);
-	if (likely(list_empty(&sem->wait_list)))
+	if (likely(list_empty(&sem->wait_list))) {
 		sem->count++;
-	else
-		__up(sem);
+		goto unlock;
+	}
+
+	waiter = list_first_entry(&sem->wait_list, struct semaphore_waiter, list);
+	list_del(&waiter->list);
+	waiter->up = true;
+	wake_q_add(&wake_q, waiter->task);
+unlock:
 	raw_spin_unlock_irqrestore(&sem->lock, flags);
+
+	wake_up_q(&wake_q);
 }
 EXPORT_SYMBOL(up);
 
@@ -252,12 +261,3 @@ static noinline int __sched __down_timeout(struct semaphore *sem, long timeout)
 {
 	return __down_common(sem, TASK_UNINTERRUPTIBLE, timeout);
 }
-
-static noinline void __sched __up(struct semaphore *sem)
-{
-	struct semaphore_waiter *waiter = list_first_entry(&sem->wait_list,
-						struct semaphore_waiter, list);
-	list_del(&waiter->list);
-	waiter->up = true;
-	wake_up_process(waiter->task);
-}
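For readers unfamiliar with wake_q, the pattern being applied above is small enough to show in isolation; a sketch with the lock and list handling elided and some_lock as a stand-in name:

	DEFINE_WAKE_Q(wake_q);
	unsigned long flags;

	raw_spin_lock_irqsave(&some_lock, flags);
	/* pick the waiter(s) off the wait list, mark them done ... */
	wake_q_add(&wake_q, waiter->task);
	raw_spin_unlock_irqrestore(&some_lock, flags);

	/* the actual wake_up_process() calls happen here, lock dropped */
	wake_up_q(&wake_q);

Issuing the wakeups after the unlock means the woken task cannot immediately bounce off the spinlock its waker still holds, which is exactly the optimization described above.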
Re: [patch V3 18/29] lockdep: Remove save argument from check_prev_add()
On Thu, Apr 25, 2019 at 11:45:11AM +0200, Thomas Gleixner wrote:
> There is only one caller which hands in save_trace as function pointer.
>
> Signed-off-by: Thomas Gleixner

Acked-by: Peter Zijlstra (Intel)
Re: [patch V2 19/29] lockdep: Simplify stack trace handling
On Thu, Apr 18, 2019 at 10:41:38AM +0200, Thomas Gleixner wrote:
> Replace the indirection through struct stack_trace by using the storage
> array based interfaces and storing the information in a small lockdep
> specific data structure.

Acked-by: Peter Zijlstra (Intel)
Re: [patch V2 18/29] lockdep: Move stack trace logic into check_prev_add()
On Thu, Apr 18, 2019 at 10:41:37AM +0200, Thomas Gleixner wrote: > There is only one caller of check_prev_add() which hands in a zeroed struct > stack_trace and a function pointer to save_stack(). Inside check_prev_add() > the stack_trace struct is checked for being empty, which is always > true. Based on that one code path stores a stack trace which is unused. The > comment there does not make sense either. It's all leftovers from > historical lockdep code (cross release). I was more or less expecting a revert of: ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace") And then I read the comment that went with the "static struct stack_trace trace" that got removed (in the above commit) and realized that your patch will consume more stack entries. The problem is when the held lock stack in check_prevs_add() has multiple trylock entries on top, in that case we call check_prev_add() multiple times, and this patch will then save the exact same stack-trace multiple times, consuming static resources. Possibly we should copy what stackdepot does (but we cannot use it directly because stackdepot uses locks; but possibly we can share bits), but that is a patch for another day I think. So while convoluted, perhaps we should retain this code for now.
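To picture the failure mode, consider a held-lock stack with trylock entries on top; the locks here are hypothetical and only illustrate the walk described above:

	static DEFINE_MUTEX(A);
	static DEFINE_MUTEX(B);
	static DEFINE_MUTEX(C);
	static DEFINE_MUTEX(D);

	mutex_lock(&A);			/* held stack: A */
	if (mutex_trylock(&B)) {	/* held stack: A, B (trylock) */
		if (mutex_trylock(&C)) {/* held stack: A, B, C (trylocks on top) */
			/*
			 * Taking D now makes check_prevs_add() call
			 * check_prev_add() for D vs C, D vs B and finally
			 * D vs A (trylock entries don't stop the walk), so
			 * the same acquisition trace for D would be saved
			 * three times.
			 */
			mutex_lock(&D);
		}
	}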
Re: [PATCH v2 56/79] docs: Documentation/*.txt: rename all ReST files to *.rst
On Tue, Apr 23, 2019 at 11:38:16PM +0200, Borislav Petkov wrote: > If that is all the changes it would need, then I guess that's ok. Btw, > those rst-conversion patches don't really show what got changed. Dunno > if git can even show that properly. I diffed the two files by hand to > see what got changed, see end of mail.

That is not a happy diff; that table has gotten waay worse to read due to all that extra table crap.

> ---
> --- mm.old	2019-04-23 23:18:55.954335784 +0200
> +++ mm.new	2019-04-23 23:18:48.122335821 +0200
> @@ -18,51 +18,68 @@ Notes:
>  notation than "16 EB", which few will recognize at first sight as 16
>  exabytes.
>  It also shows it nicely how incredibly large 64-bit address space is.
>
> -
> -Start addr| Offset | End addr | Size | VM area description
> -
> - || | |
> - |0 | 7fff | 128 TB | user-space virtual memory, different per mm
> -__||__|_|___
> - || | |
> - 8000 | +128TB | 7fff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
> - || | | virtual memory addresses up to the -128 TB
> - || | | starting offset of kernel mappings.
> -__||__|_|___
> -|
> -| Kernel-space virtual memory, shared between all processes:
> -|___
> - || | |
> - 8000 | -128TB | 87ff |8 TB | ... guard hole, also reserved for hypervisor
> - 8800 | -120TB | 887f | 0.5 TB | LDT remap for PTI
> - 8880 | -119.5 TB | c87f | 64 TB | direct mapping of all physical memory (page_offset_base)
> - c880 | -55.5 TB | c8ff | 0.5 TB | ... unused hole
> - c900 | -55TB | e8ff | 32 TB | vmalloc/ioremap space (vmalloc_base)
> - e900 | -23TB | e9ff |1 TB | ... unused hole
> - ea00 | -22TB | eaff |1 TB | virtual memory map (vmemmap_base)
> - eb00 | -21TB | ebff |1 TB | ... unused hole
> - ec00 | -20TB | fbff | 16 TB | KASAN shadow memory
> -__||__|_|
> -|
> -| Identical layout to the 56-bit one from here on:
> -|
> - || | |
> - fc00 | -4TB | fdff |2 TB | ... unused hole
> - || | | vaddr_end for KASLR
> - fe00 | -2TB | fe7f | 0.5 TB | cpu_entry_area mapping
> - fe80 | -1.5 TB | feff | 0.5 TB | ... unused hole
> - ff00 | -1TB | ff7f | 0.5 TB | %esp fixup stacks
> - ff80 | -512GB | ffee | 444 GB | ... unused hole
> - ffef | -68GB | fffe | 64 GB | EFI region mapping space
> - | -4GB | 7fff |2 GB | ... unused hole
> - 8000 | -2GB | 9fff | 512 MB | kernel text mapping, mapped to physical address 0
> - 8000 |-2048MB | | |
> - a000 |-1536MB | feff | 1520 MB | module mapping space
> - ff00 | -16MB | | |
> -FIXADDR_START | ~-11MB | ff5f | ~0.5 MB | kernel-internal fixmap range, variable size and offset
> - ff60 | -10MB | ff600fff |4 kB | legacy vsyscall ABI
> - ffe0 | -2MB | |2 MB | ... unused hole
> -__||__
Re: [PATCH v2 56/79] docs: Documentation/*.txt: rename all ReST files to *.rst
On Tue, Apr 23, 2019 at 11:53:49AM -0600, Jonathan Corbet wrote: > > Look at crap like this: > > > > "The memory allocations via :c:func:`kmalloc`, :c:func:`vmalloc`, > > :c:func:`kmem_cache_alloc` and" > > > > That should've been written like: > > > > "The memory allocations via kmalloc(), vmalloc(), kmem_cache_alloc() > > and" > > Yeah, I get it. That markup generates cross-references, which can be > seriously useful for readers - we want that. The funny thing is: that sentence continues (on a new line) like: "friends are traced and the pointers, together with additional" So while it then has cross-references to a few functions, all 'friends' are left dangling. So what's the point of the cross-references? Also, 'make ctags' and follow tag (ctrl-] for fellow vim users) will get you to the function, no magic markup required. > But I do wonder if we > couldn't do it automatically with just a little bit of scripting work. > It's not too hard to recognize this_is_a_function(), after all. I'll look > into that, it would definitely help to remove some gunk from the source > docs. That would be good; less markup is more.
Re: [PATCH v2 56/79] docs: Documentation/*.txt: rename all ReST files to *.rst
On Tue, Apr 23, 2019 at 08:55:19AM -0400, Mike Snitzer wrote: > On Tue, Apr 23 2019 at 4:31am -0400, > Peter Zijlstra wrote: > > > On Mon, Apr 22, 2019 at 10:27:45AM -0300, Mauro Carvalho Chehab wrote: > > > > > .../{atomic_bitops.txt => atomic_bitops.rst} | 2 + > > > > What's happened to atomic_t.txt? Also NAK, I still occasionally touch > > these files. > > Seems Mauro's point is in the future we need to touch these .rst files > in terms of ReST compatible changes. > > I'm dreading DM documentation changes in the future.. despite Mauro and > Jon Corbet informing me that ReST is simple, etc. Well, it _can_ be simple, I've seen examples of rst that were not far from generated HTML contents. And I must give Jon credit for not accepting that atrocious crap. But yes, I have 0 motivation to learn or abide by rst. It simply doesn't give me anything in return. There is no upside, only worse text files :/
Re: [PATCH v2 56/79] docs: Documentation/*.txt: rename all ReST files to *.rst
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
Re: [PATCH v2 56/79] docs: Documentation/*.txt: rename all ReST files to *.rst
On Mon, Apr 22, 2019 at 10:27:45AM -0300, Mauro Carvalho Chehab wrote:
> .../{atomic_bitops.txt => atomic_bitops.rst} | 2 +

What's happened to atomic_t.txt? Also NAK, I still occasionally touch these files.
Re: [PATCH v2 56/79] docs: Documentation/*.txt: rename all ReST files to *.rst
On Tue, Apr 23, 2019 at 10:30:53AM -0600, Jonathan Corbet wrote: > On Tue, 23 Apr 2019 15:01:32 +0200 > Peter Zijlstra wrote: > > > But yes, I have 0 motivation to learn or abide by rst. It simply doesn't > > give me anything in return. There is no upside, only worse text files :/ > > So I believe it gives even you one thing in return: documentation that is > more accessible for both readers and authors. I know I'm an odd duck; but no. They're _less_ accessible for me, as both a reader and author. They look 'funny' when read as a text file (the only way it makes sense to read them; I spend 99% of my time on a computer looking at monospace text interfaces; mutt, vim and console, in that approximate order). When writing, I now have to be bothered about this format crap over just trying to write a coherent document. Look at crap like this: "The memory allocations via :c:func:`kmalloc`, :c:func:`vmalloc`, :c:func:`kmem_cache_alloc` and" That should've been written like: "The memory allocations via kmalloc(), vmalloc(), kmem_cache_alloc() and" Heck, that paragraph isn't even properly flowed. Then there's the endless stuck ':' key, and the mysterious "''" because \" isn't a character, oh wait. Bah..
Re: [patch V2 28/29] stacktrace: Provide common infrastructure
On Fri, Apr 19, 2019 at 10:32:30AM +0200, Thomas Gleixner wrote: > On Fri, 19 Apr 2019, Peter Zijlstra wrote: > > On Thu, Apr 18, 2019 at 10:41:47AM +0200, Thomas Gleixner wrote: > > > > > +typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr, > > > + bool reliable); > > > > > +void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie, > > > + struct task_struct *task, struct pt_regs *regs); > > > +int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void > > > *cookie, > > > + struct task_struct *task); > > > > This bugs me a little; ideally the _reliable() thing would not exist. > > > > Thomas said that the existing __save_stack_trace_reliable() is different > > enough for the unification to be non-trivial, but maybe Josh can help > > out? > > > > From what I can see the biggest significant differences are: > > > > - it looks at the regs sets on the stack and for FP bails early > > - bails for kthreads and idle (after it does all the hard work!?!) > > > > The first (FP checking for exceptions) should probably be reflected in > > consume_fn(.reliable) anyway -- although that would mean a lot of extra > > '?' entries where there are none today. > > > > And the second (KTHREAD/IDLE) is something that the generic code can > > easily do before calling into the arch unwinder. > > And looking at the powerpc version of it, that has even more interesting > extra checks in that function. Right, but not fundamentally different from determining @reliable I think. Anyway, it would be good if someone knowledgeable could have a look at this.
Re: [patch V2 28/29] stacktrace: Provide common infrastructure
On Thu, Apr 18, 2019 at 10:41:47AM +0200, Thomas Gleixner wrote:
> +typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr,
> +					bool reliable);
> +void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> +		     struct task_struct *task, struct pt_regs *regs);
> +int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void *cookie,
> +			     struct task_struct *task);

This bugs me a little; ideally the _reliable() thing would not exist.

Thomas said that the existing __save_stack_trace_reliable() is different enough for the unification to be non-trivial, but maybe Josh can help out?

From what I can see the biggest significant differences are:

 - it looks at the regs sets on the stack and for FP bails early
 - bails for kthreads and idle (after it does all the hard work!?!)

The first (FP checking for exceptions) should probably be reflected in consume_fn(.reliable) anyway -- although that would mean a lot of extra '?' entries where there are none today.

And the second (KTHREAD/IDLE) is something that the generic code can easily do before calling into the arch unwinder.

Hmm?
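To make the proposed interface concrete, a consumer would look roughly like this: a sketch written against the signatures quoted above, with the storage struct and names invented for illustration:

	struct trace_store {
		unsigned long *entries;
		unsigned int nr, max;
	};

	/* Called once per frame; returning false stops the walk early. */
	static bool store_entry(void *cookie, unsigned long addr, bool reliable)
	{
		struct trace_store *ts = cookie;

		if (ts->nr >= ts->max)
			return false;
		ts->entries[ts->nr++] = addr;
		return true;
	}

	/* walk the current task: arch_stack_walk(store_entry, &ts, current, NULL); */

The per-entry @reliable flag is what would let the FP exception case be expressed in the consumer instead of requiring a separate _reliable() walker.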
Re: [patch V2 28/29] stacktrace: Provide common infrastructure
On Thu, Apr 18, 2019 at 05:42:55PM +0200, Thomas Gleixner wrote: > On Thu, 18 Apr 2019, Josh Poimboeuf wrote: > > Another idea I had (but never got a chance to work on) was to extend the > > x86 unwind interface to all arches. So instead of the callbacks, each > > arch would implement something like this API: > I surely thought about that, but after staring at all incarnations of > arch/*/stacktrace.c I just gave up. > > Aside of that quite some archs already have callback based unwinders > because they use them for more than stacktracing and just have a single > implementation of that loop. > > I'm fine either way. We can start with x86 and then let archs convert over > their stuff, but I wouldn't hold my breath that this will be completed in > the foreseeable future. I suggested the same to Thomas early on, and I even spent the time to convert some $random arch to the iterator interface, and while it is indeed entirely feasible, it is _far_ more work. The callback thing OTOH is flexible enough to do what we want to do now, and allows converting most archs to it without too much pain (as Thomas said, many archs are already in this form and only need minor API adjustments), which gets us in a far better place than we are now. And we can always go to iterators later on. But I think getting the generic unwinder improved across all archs is a really important first step here.
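For reference, the x86-style iterator interface being alluded to looks roughly like this; a sketch modelled on arch/x86/include/asm/unwind.h, where exact signatures vary by tree and consume() is a placeholder for whatever the caller does per entry:

	struct unwind_state state;
	unsigned long addr;

	for (unwind_start(&state, task, regs, NULL);
	     !unwind_done(&state);
	     unwind_next_frame(&state)) {
		addr = unwind_get_return_address(&state);
		if (!addr)
			break;
		consume(addr);
	}

Every architecture would have to grow such a state machine for the iterator approach to work everywhere, which is why starting from the callback form is the smaller step.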
Re: [PATCH 2/4] kernel.h: Add non_block_start/end()
On Mon, Dec 10, 2018 at 04:01:59PM +0100, Michal Hocko wrote: > On Mon 10-12-18 15:47:11, Peter Zijlstra wrote: > > On Mon, Dec 10, 2018 at 03:13:37PM +0100, Michal Hocko wrote: > > > I do not see any scheduler guys Cced and it would be really great to get > > > their opinion here. > > > > > > On Mon 10-12-18 11:36:39, Daniel Vetter wrote: > > > > In some special cases we must not block, but there's not a > > > > spinlock, preempt-off, irqs-off or similar critical section already > > > > that arms the might_sleep() debug checks. Add a non_block_start/end() > > > > pair to annotate these. > > > > > > > > This will be used in the oom paths of mmu-notifiers, where blocking is > > > > not allowed to make sure there's forward progress. > > > > > > Considering the only alternative would be to abuse > > > preempt_{disable,enable}, and that really has a different semantic, I > > > think this makes some sense. The context is preemptible but we do not > > > want the notifier to sleep on any locks, WQ etc. > > > > I'm confused... what is this supposed to do? > > > > And what does 'block' mean here? Without preempt_disable/IRQ-off we're > > subject to regular preemption and execution can stall for arbitrary > > amounts of time. > > The notifier is called from quite a restricted context - oom_reaper - > which shouldn't depend on any locks or sleepable conditionals.

You want to exclude spinlocks too? We could maybe frob something with lockdep if you need that?

> The code > should be swift as well but we mostly do care about it to make forward > progress. Checking for sleepable context is the best thing we could come > up with that would describe these demands at least partially.

OK, no real objections to the thing. Just so long we're all on the same page as to what it does and doesn't do ;-)

I suppose you could extend the check to include schedule_debug() as well, maybe something like:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f66920173370..b1aaa278f1af 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3278,13 +3278,18 @@ static noinline void __schedule_bug(struct task_struct *prev)
 /*
  * Various schedule()-time debugging checks and statistics:
  */
-static inline void schedule_debug(struct task_struct *prev)
+static inline void schedule_debug(struct task_struct *prev, bool preempt)
 {
 #ifdef CONFIG_SCHED_STACK_END_CHECK
 	if (task_stack_end_corrupted(prev))
 		panic("corrupted stack end detected inside scheduler\n");
 #endif
 
+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+	if (!preempt && prev->state && prev->non_block_count)
+		// splat
+#endif
+
 	if (unlikely(in_atomic_preempt_off())) {
 		__schedule_bug(prev);
 		preempt_count_set(PREEMPT_DISABLED);
@@ -3391,7 +3396,7 @@ static void __sched notrace __schedule(bool preempt)
 	rq = cpu_rq(cpu);
 	prev = rq->curr;
 
-	schedule_debug(prev);
+	schedule_debug(prev, preempt);
 
 	if (sched_feat(HRTICK))
 		hrtick_clear(rq);
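For context, the annotation itself can be as small as a per-task counter that might_sleep() and the schedule_debug() hunk above consult; a sketch of the idea, which may differ from the final form of Daniel's series:

	/*
	 * Sketch: mark a section in which voluntarily sleeping is a bug,
	 * without disabling preemption. might_sleep() would warn whenever
	 * current->non_block_count is non-zero.
	 */
	#define non_block_start()	(current->non_block_count++)
	#define non_block_end()		WARN_ON(current->non_block_count-- == 0)

Preemption stays enabled inside such a section; only voluntary blocking (mutexes, waitqueues, sleeping allocations) trips the debug check, which matches the distinction drawn in the discussion above.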
Re: [PATCH 2/4] kernel.h: Add non_block_start/end()
On Mon, Dec 10, 2018 at 05:20:10PM +0100, Michal Hocko wrote: > > OK, no real objections to the thing. Just so long we're all on the same > > page as to what it does and doesn't do ;-) > > I am not really sure whether there are other potential users besides > this one and whether the check as such is justified.

It's a debug option...

> > I suppose you could extend the check to include schedule_debug() as > > well, maybe something like: > > Do you mean to make the check cheaper?

Nah, so the patch only touched might_sleep(), the below touches schedule(). If there were a path that hits schedule() without going through a might_sleep() (rare in practice I think, but entirely possible) then you won't get a splat without something like the below on top.

> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index f66920173370..b1aaa278f1af 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3278,13 +3278,18 @@ static noinline void __schedule_bug(struct task_struct *prev)
> >  /*
> >   * Various schedule()-time debugging checks and statistics:
> >   */
> > -static inline void schedule_debug(struct task_struct *prev)
> > +static inline void schedule_debug(struct task_struct *prev, bool preempt)
> >  {
> >  #ifdef CONFIG_SCHED_STACK_END_CHECK
> >  	if (task_stack_end_corrupted(prev))
> >  		panic("corrupted stack end detected inside scheduler\n");
> >  #endif
> >
> > +#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> > +	if (!preempt && prev->state && prev->non_block_count)
> > +		// splat
> > +#endif
> > +
> >  	if (unlikely(in_atomic_preempt_off())) {
> >  		__schedule_bug(prev);
> >  		preempt_count_set(PREEMPT_DISABLED);
> > @@ -3391,7 +3396,7 @@ static void __sched notrace __schedule(bool preempt)
> >  	rq = cpu_rq(cpu);
> >  	prev = rq->curr;
> >
> > -	schedule_debug(prev);
> > +	schedule_debug(prev, preempt);
> >
> >  	if (sched_feat(HRTICK))
> >  		hrtick_clear(rq);
> --
> Michal Hocko
> SUSE Labs
Re: [PATCH 2/4] kernel.h: Add non_block_start/end()
On Mon, Dec 10, 2018 at 03:13:37PM +0100, Michal Hocko wrote: > I do not see any scheduler guys Cced and it would be really great to get > their opinion here. > > On Mon 10-12-18 11:36:39, Daniel Vetter wrote: > > In some special cases we must not block, but there's not a > > spinlock, preempt-off, irqs-off or similar critical section already > > that arms the might_sleep() debug checks. Add a non_block_start/end() > > pair to annotate these. > > > > This will be used in the oom paths of mmu-notifiers, where blocking is > > not allowed to make sure there's forward progress. > > Considering the only alternative would be to abuse > preempt_{disable,enable}, and that really has a different semantic, I > think this makes some sense. The context is preemptible but we do not > want the notifier to sleep on any locks, WQ etc.

I'm confused... what is this supposed to do?

And what does 'block' mean here? Without preempt_disable/IRQ-off we're subject to regular preemption and execution can stall for arbitrary amounts of time.

The Changelog doesn't yield any clues.
Re: [PATCH] locking: Fix mutex debug call and ww_mutex doc
On Mon, Sep 03, 2018 at 04:07:08PM +0200, Thomas Hellstrom wrote: > Commit 08295b3b5bee ("Implement an algorithm choice for Wound-Wait > mutexes") introduced a reference in the documentation to a function > that was removed in an earlier commit. > > It also forgot to remove a call to debug_mutex_add_waiter() which is now > unconditionally called by __mutex_add_waiter(). > > Fix those issues. > > Fixes: 08295b3b5bee ("Implement an algorithm choice for Wound-Wait mutexes") > Signed-off-by: Thomas Hellstrom

Thanks!
Re: [PATCH 11/12] sched: use for_each_if in topology.h
On Mon, Jul 09, 2018 at 05:52:04PM +0200, Daniel Vetter wrote:
> for_each_something(foo)
> 	if (foo->bla)
> 		call_bla(foo);
> 	else
> 		call_default(foo);

Note that the kernel coding style 'discourages' this style and would like you to write:

	for_each_something(foo) {
		if (foo->bla)
			call_bla(foo);
		else
			call_default(foo);
	}

Which avoids the entire problem. See CodingStyle-3:

  Also, use braces when a loop contains more than a single simple statement:

  .. code-block:: c

	while (condition) {
		if (test)
			do_something();
	}
Re: [PATCH 11/12] sched: use for_each_if in topology.h
On Mon, Jul 09, 2018 at 05:52:04PM +0200, Daniel Vetter wrote:
> for_each_something(foo)
> 	if (foo->bla)
> 		call_bla(foo);
> 	else
> 		call_default(foo);
>
> Totally contrived, but this complains. Liberally sprinkling {} also shuts
> up the compiler, but it's a bit confusing given that a plain for {;;} is
> totally fine. And it's confusing since at first glance the compiler
> complaining about nested if and ambiguous else doesn't make sense since
> clearly there's only 1 if there.

Ah, so the pattern the compiler tries to warn about is:

	if (foo)
		if (bar)
			/* stmts1 */
		else
			/* stmts2 */

Because it might not be immediately obvious with which if the else goes. Which is fair enough I suppose.

OK, ACK.
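The for_each_if() helper the series is built around resolves exactly this: it gives the hidden if its own else, so a later else cannot bind to it. A sketch of the definition as it exists in the drm headers (quoted from memory, placement may differ), together with a hypothetical iterator built on top of it:

	#define for_each_if(condition) if (!(condition)) {} else

	/* Hypothetical iterator; foo_list and the node/enabled members are made up: */
	#define for_each_enabled(foo) \
		list_for_each_entry(foo, &foo_list, node) \
			for_each_if ((foo)->enabled)

With the empty {} branch occupying the hidden if, a trailing else in the caller's code unambiguously pairs with the caller's own if, and the compiler warning goes away.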