Re: [Intel-gfx] [PATCH 24/29] mm: vmscan: make global slab shrink lockless

2023-07-06 Thread Paul E. McKenney
On Fri, Jun 23, 2023 at 04:29:39PM +1000, Dave Chinner wrote:
> On Thu, Jun 22, 2023 at 05:12:02PM +0200, Vlastimil Babka wrote:
> > On 6/22/23 10:53, Qi Zheng wrote:
> > > @@ -1067,33 +1068,27 @@ static unsigned long shrink_slab(gfp_t gfp_mask, 
> > > int nid,
> > >   if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
> > >   return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
> > >  
> > > - if (!down_read_trylock(&shrinker_rwsem))
> > > - goto out;
> > > -
> > > - list_for_each_entry(shrinker, &shrinker_list, list) {
> > > + rcu_read_lock();
> > > + list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
> > >   struct shrink_control sc = {
> > >   .gfp_mask = gfp_mask,
> > >   .nid = nid,
> > >   .memcg = memcg,
> > >   };
> > >  
> > > + if (!shrinker_try_get(shrinker))
> > > + continue;
> > > + rcu_read_unlock();
> > 
> > I don't think you can do this unlock?

Sorry to be slow to respond here, this one fell through the cracks.
And thank you to Qi for reminding me!

If you do this unlock, you had jolly well better nail down the current
element (the one referenced by shrinker), for example, by acquiring an
explicit reference count on the object.  And presumably this is exactly
what shrinker_try_get() is doing.  And a look at your 24/29 confirms this,
at least assuming that shrinker->refcount is set to zero before the call
to synchronize_rcu() in free_module() *and* that synchronize_rcu() doesn't
start until *after* shrinker_put() calls complete().  Plus, as always,
the object must be removed from the list before the synchronize_rcu()
starts.  (On these parts of the puzzle, I defer to those more familiar
with this code path.  And I strongly suggest carefully commenting this
type of action-at-a-distance design pattern.)

Why is this important?  Because otherwise that object might be freed
before you get to the call to rcu_read_lock() at the end of this loop.
And if that happens, list_for_each_entry_rcu() will be walking the
freelist, which is quite bad for the health and well-being of your kernel.

There are a few other ways to make this sort of thing work:

1.  Defer the shrinker_put() to the beginning of the loop.
You would need a flag initially set to zero, and then set to
one just before (or just after) the rcu_read_lock() above.
You would also need another shrinker_old pointer to track the
old pointer.  Then at the top of the loop, if the flag is set,
invoke shrinker_put() on shrinker_old.  This ensures that the
previous shrinker structure stays around long enough to allow
the loop to find the next shrinker structure in the list.

This approach is attractive when the removal code path
can invoke shrinker_put() after the grace period ends.

2.  Make shrinker_put() invoke call_rcu() when ->refcount reaches
zero, and have the callback function free the object.  This of
course requires adding an rcu_head structure to the shrinker
structure, which might or might not be a reasonable course of
action.  If adding that rcu_head is reasonable, this simplifies
the logic quite a bit.

3.  For the shrinker-structure-removal code path, remove the shrinker
structure, then remove the initial count from ->refcount,
and then keep doing grace periods until ->refcount is zero,
then do one more.  Of course, if the result of removing the
initial count was zero, then only a single additional grace
period is required.

This would need to be carefully commented, as it is a bit
unconventional.

There are probably many other ways, but just to give an idea of a few
other ways to do this.

> > > +
> > >   ret = do_shrink_slab(&sc, shrinker, priority);
> > >   if (ret == SHRINK_EMPTY)
> > >   ret = 0;
> > >   freed += ret;
> > > - /*
> > > -  * Bail out if someone want to register a new shrinker to
> > > -  * prevent the registration from being stalled for long periods
> > > -  * by parallel ongoing shrinking.
> > > -  */
> > > - if (rwsem_is_contended(&shrinker_rwsem)) {
> > > - freed = freed ? : 1;
> > > - break;
> > > - }
> > > - }
> > >  
> > > - up_read(&shrinker_rwsem);
> > > -out:
> > > + rcu_read_lock();
> > 
> > That new rcu_read_lock() won't help AFAIK, the whole
> > list_for_each_entry_rcu() needs to be under the single rcu_read_lock() to be
> > safe.
> 
> Yeah, that's the pattern we've been taught and the one we can look
> at and immediately say "this is safe".
> 
> This is a different pattern, as has been explained bi Qi, and I
> think it *might* be safe.
> 
> *However.*
> 
> Right now I don't have time to go through a novel RCU list iteration
> pattern it one step at to determine 

Re: [Intel-gfx] [BUG?] INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-.... } 3 jiffies s: 309 root: 0x1/.

2023-04-21 Thread Paul E. McKenney
On Thu, Apr 13, 2023 at 08:30:02AM +0100, Rui Salvaterra wrote:
> Hi again, everyone.
> 
> So, while preparing to file the bug report with the requested
> information, I got a trace completely unrelated to DRM (on a swapon
> call, it seems).
> 
> [4.868340] rcu: INFO: rcu_sched detected expedited stalls on
> CPUs/tasks: { 4- } 3 jiffies s: 265 root: 0x10/.
> [4.868349] rcu: blocking rcu_node structures (internal RCU debug):
> [4.868351] Sending NMI from CPU 3 to CPUs 4:
> [4.868355] NMI backtrace for cpu 4
> [4.868357] CPU: 4 PID: 462 Comm: swapon Not tainted 6.3.0-rc6-debug+ #57
> [4.868359] Hardware name: Apple Inc.
> Macmini6,2/Mac-F65AE981FFA204ED, BIOS 429.0.0.0.0 03/18/2022
> [4.868360] RIP: 0010:zram_submit_bio+0x57c/0x940
> [4.868365] Code: 04 4c 01 f0 48 8d 48 08 f0 48 0f ba 68 08 0d 0f
> 82 80 00 00 00 4c 89 ef e8 01 eb ff ff 49 8b 45 00 4a 8d 44 30 09 f0
> 80 20 df  48 ff 45 00 48 81 eb 00 10 00 00 41 83 c4 01 48 81 fb ff
> 0f 00
> [4.868366] RSP: 0018:8881057dbcd8 EFLAGS: 0246
> [4.868368] RAX: c90001c186d9 RBX: 3e893000 RCX: 
> c90001c186d8
> [4.868369] RDX: c90001c186d0 RSI:  RDI: 
> 88810083b400
> [4.868369] RBP: 88810083b470 R08: 00027e40 R09: 
> 00025850
> [4.868370] R10: 0014b212 R11: 88810ba03180 R12: 
> 000c176d
> [4.868371] R13: 88810083b400 R14: 00c176d0 R15: 
> 
> [4.868372] FS:  7fbd8f8ce800() GS:88826610()
> knlGS:
> [4.868373] CS:  0010 DS:  ES:  CR0: 80050033
> [4.868374] CR2: 563005371000 CR3: 00010355c003 CR4: 
> 001706e0
> [4.868375] Call Trace:
> [4.868377]  
> [4.868378]  ? block_read_full_folio+0x23e/0x2e0
> [4.868383]  ? kmem_cache_alloc+0x1b/0x110
> [4.868385]  ? mempool_alloc+0x37/0x140
> [4.868388]  ? pcpu_block_update_hint_alloc+0xce/0x2f0
> [4.868390]  __submit_bio+0x41/0xd0
> [4.868394]  submit_bio_noacct_nocheck+0xc4/0x2b0
> [4.868396]  blk_next_bio+0x55/0x70
> [4.868398]  __blkdev_issue_discard+0xc8/0x180
> [4.868401]  blkdev_issue_discard+0x3c/0x80
> [4.868403]  __x64_sys_swapon+0xb71/0x1120
> [4.868407]  do_syscall_64+0x2b/0x50
> [4.868410]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [4.868414] RIP: 0033:0x7fbd8f712d5b
> [4.868416] Code: 73 01 c3 48 8b 0d bd 30 0e 00 f7 d8 64 89 01 48
> 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 a7 00 00
> 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8d 30 0e 00 f7 d8 64 89
> 01 48
> [4.868417] RSP: 002b:7ffcaf9a3448 EFLAGS: 0246 ORIG_RAX:
> 00a7
> [4.868418] RAX: ffda RBX: 00018064 RCX: 
> 7fbd8f712d5b
> [4.868419] RDX: 00018064 RSI: 00018064 RDI: 
> 56300535fb10
> [4.868420] RBP: 7ffcaf9a3530 R08: 00014b213000 R09: 
> 7fbd8f7f70f0
> [4.868420] R10: 1000 R11: 0246 R12: 
> 56300535fb10
> [4.868421] R13: 0064 R14: 7ffcaf9a3530 R15: 
> 
> [4.868423]  
> 
> Could it be that RCU is reporting expedited stalls too eagerly? And,
> if so, why only on this machine?

My guess would be that you have CONFIG_RCU_EXP_CPU_STALL_TIMEOUT set to
some small non-zero number, for example, you might have set up a recent
Android .config or some such.  The default of zero would give you about
21 seconds rather than the three jiffies that you are seeing.

Could you please check your .config?

Thanx, Paul


Re: [Intel-gfx] [BUG?] INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-.... } 3 jiffies s: 309 root: 0x1/.

2023-04-21 Thread Paul E. McKenney
On Thu, Apr 13, 2023 at 04:32:32PM +0100, Rui Salvaterra wrote:
> Hi, Paul,
> 
> On Thu, 13 Apr 2023 at 15:43, Paul E. McKenney  wrote:
> >
> > My guess would be that you have CONFIG_RCU_EXP_CPU_STALL_TIMEOUT set to
> > some small non-zero number, for example, you might have set up a recent
> > Android .config or some such.  The default of zero would give you about
> > 21 seconds rather than the three jiffies that you are seeing.
> >
> > Could you please check your .config?
> 
> Well, this is embarrassing. I can't fathom why/how, but I had it set
> to 20, on this machine. That is, 20 millisseconds. I guess its a
> miracle I haven't seen *more* expedited RCU traces. Sorry for the
> noise, everyone.

Been there, done that!

And actually, it is kind of reassuring that the Linux kernel avoids
tens-of-milliseocnds latency blows in the common case on your system.

Thanx, Paul


Re: [Intel-gfx] [PATCH 04/65] mm: Extract might_alloc() debug check

2020-10-23 Thread Paul E. McKenney
On Fri, Oct 23, 2020 at 02:21:15PM +0200, Daniel Vetter wrote:
> Extracted from slab.h, which seems to have the most complete version
> including the correct might_sleep() check. Roll it out to slob.c.
> 
> Motivated by a discussion with Paul about possibly changing call_rcu
> behaviour to allocate memory, but only roughly every 500th call.
> 
> There are a lot fewer places in the kernel that care about whether
> allocating memory is allowed or not (due to deadlocks with reclaim
> code) than places that care whether sleeping is allowed. But debugging
> these also tends to be a lot harder, so nice descriptive checks could
> come in handy. I might have some use eventually for annotations in
> drivers/gpu.
> 
> Note that unlike fs_reclaim_acquire/release gfpflags_allow_blocking
> does not consult the PF_MEMALLOC flags. But there is no flag
> equivalent for GFP_NOWAIT, hence this check can't go wrong due to
> memalloc_no*_save/restore contexts.
> 
> Cc: Paul E. McKenney 
> Cc: Christoph Lameter 
> Cc: Pekka Enberg 
> Cc: David Rientjes 
> Cc: Joonsoo Kim 
> Cc: Andrew Morton 
> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: Vlastimil Babka 
> Cc: Mathieu Desnoyers 
> Cc: Sebastian Andrzej Siewior 
> Cc: Michel Lespinasse 
> Cc: Daniel Vetter 
> Cc: Waiman Long 
> Cc: Thomas Gleixner 
> Cc: Randy Dunlap 
> Cc: linux...@kvack.org
> Cc: linux-fsde...@vger.kernel.org
> Cc: Dave Chinner 
> Cc: Qian Cai 
> Cc: linux-...@vger.kernel.org
> Signed-off-by: Daniel Vetter 

Nice!!!

Acked-by: Paul E. McKenney 

> ---
>  include/linux/sched/mm.h | 16 
>  mm/slab.h|  5 +
>  mm/slob.c|  6 ++
>  3 files changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index f889e332912f..2b0037abac0b 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -205,6 +205,22 @@ static inline void fs_reclaim_acquire(gfp_t gfp_mask) { }
>  static inline void fs_reclaim_release(gfp_t gfp_mask) { }
>  #endif
>  
> +/**
> + * might_alloc - Marks possible allocation sites
> + * @gfp_mask: gfp_t flags that would be use to allocate
> + *
> + * Similar to might_sleep() and other annotations this can be used in 
> functions
> + * that might allocate, but often dont. Compiles to nothing without
> + * CONFIG_LOCKDEP. Includes a conditional might_sleep() if @gfp allows 
> blocking.
> + */
> +static inline void might_alloc(gfp_t gfp_mask)
> +{
> + fs_reclaim_acquire(gfp_mask);
> + fs_reclaim_release(gfp_mask);
> +
> + might_sleep_if(gfpflags_allow_blocking(gfp_mask));
> +}
> +
>  /**
>   * memalloc_noio_save - Marks implicit GFP_NOIO allocation scope.
>   *
> diff --git a/mm/slab.h b/mm/slab.h
> index 6cc323f1313a..fedd789b2270 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -492,10 +492,7 @@ static inline struct kmem_cache 
> *slab_pre_alloc_hook(struct kmem_cache *s,
>  {
>   flags &= gfp_allowed_mask;
>  
> - fs_reclaim_acquire(flags);
> - fs_reclaim_release(flags);
> -
> - might_sleep_if(gfpflags_allow_blocking(flags));
> + might_alloc(flags);
>  
>   if (should_failslab(s, flags))
>   return NULL;
> diff --git a/mm/slob.c b/mm/slob.c
> index 7cc9805c8091..8d4bfa46247f 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -474,8 +474,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, 
> unsigned long caller)
>  
>   gfp &= gfp_allowed_mask;
>  
> - fs_reclaim_acquire(gfp);
> - fs_reclaim_release(gfp);
> + might_alloc(gfp);
>  
>   if (size < PAGE_SIZE - minalign) {
>   int align = minalign;
> @@ -597,8 +596,7 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t 
> flags, int node)
>  
>   flags &= gfp_allowed_mask;
>  
> - fs_reclaim_acquire(flags);
> - fs_reclaim_release(flags);
> + might_alloc(flags);
>  
>   if (c->size < PAGE_SIZE) {
>   b = slob_alloc(c->size, flags, c->align, node, 0);
> -- 
> 2.28.0
> 
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH tip/core/rcu 11/15] drm/i915: Cleanup PREEMPT_COUNT leftovers

2020-10-01 Thread Paul E. McKenney
On Thu, Oct 01, 2020 at 10:25:06AM +0200, Thomas Gleixner wrote:
> On Thu, Oct 01 2020 at 10:17, Joonas Lahtinen wrote:
> > Quoting paul...@kernel.org (2020-09-29 02:30:58)
> >> CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
> >> removed. Cleanup the leftovers before doing so.
> >
> > Change looks fine:
> >
> > Reviewed-by: Joonas Lahtinen 

Applied, thank you!

> > Are you looking for us to merge or merge through another tree?
> >
> > If us, did the base patch always enabling PREEMPT_COUNT go into 5.9 or is
> > it heading to 5.10? We can queue this earliest for 5.11 as drm-next closed
> > for 5.10 at week of -rc5.
> 
> If at all it goes through rcu/tip because it depends on the earlier patches.

I was figuring on sending a pull request later today, Pacific Time.

Thanx, Paul
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [patch 00/13] preempt: Make preempt count unconditional

2020-09-18 Thread Paul E. McKenney
On Thu, Sep 17, 2020 at 09:52:30AM +0200, Daniel Vetter wrote:
> On Thu, Sep 17, 2020 at 12:39 AM Paul E. McKenney  wrote:
> >
> > On Wed, Sep 16, 2020 at 11:43:02PM +0200, Daniel Vetter wrote:
> > > On Wed, Sep 16, 2020 at 10:58 PM Paul E. McKenney  
> > > wrote:
> > > >
> > > > On Wed, Sep 16, 2020 at 10:29:06PM +0200, Daniel Vetter wrote:
> > > > > On Wed, Sep 16, 2020 at 5:29 PM Paul E. McKenney  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Sep 16, 2020 at 09:37:17AM +0200, Daniel Vetter wrote:
> > > > > > > On Tue, Sep 15, 2020 at 7:35 PM Linus Torvalds
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, Sep 15, 2020 at 1:39 AM Thomas Gleixner 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > OTOH, having a working 'preemptible()' or maybe better named
> > > > > > > > > 'can_schedule()' check makes tons of sense to make decisions 
> > > > > > > > > about
> > > > > > > > > allocation modes or other things.
> > > > > > > >
> > > > > > > > No. I think that those kinds of decisions about actual behavior 
> > > > > > > > are
> > > > > > > > always simply fundamentally wrong.
> > > > > > > >
> > > > > > > > Note that this is very different from having warnings about 
> > > > > > > > invalid
> > > > > > > > use. THAT is correct. It may not warn in all configurations, 
> > > > > > > > but that
> > > > > > > > doesn't matter: what matters is that it warns in common enough
> > > > > > > > configurations that developers will catch it.
> > > > > > > >
> > > > > > > > So having a warning in "might_sleep()" that doesn't always 
> > > > > > > > trigger,
> > > > > > > > because you have a limited configuration that can't even detect 
> > > > > > > > the
> > > > > > > > situation, that's fine and dandy and intentional.
> > > > > > > >
> > > > > > > > But having code like
> > > > > > > >
> > > > > > > >if (can_schedule())
> > > > > > > >.. do something different ..
> > > > > > > >
> > > > > > > > is fundamentally complete and utter garbage.
> > > > > > > >
> > > > > > > > It's one thing if you test for "am I in hardware interrupt 
> > > > > > > > context".
> > > > > > > > Those tests aren't great either, but at least they make sense.
> > > > > > > >
> > > > > > > > But a driver - or some library routine - making a difference 
> > > > > > > > based on
> > > > > > > > some nebulous "can I schedule" is fundamentally and basically 
> > > > > > > > WRONG.
> > > > > > > >
> > > > > > > > If some code changes behavior, it needs to be explicit to the 
> > > > > > > > *caller*
> > > > > > > > of that code.
> > > > > > > >
> > > > > > > > So this is why GFP_ATOMIC is fine, but "if (!can_schedule())
> > > > > > > > do_something_atomic()" is pure shite.
> > > > > > > >
> > > > > > > > And I am not IN THE LEAST interested in trying to help people 
> > > > > > > > doing
> > > > > > > > pure shite. We need to fix them. Like the crypto code is getting
> > > > > > > > fixed.
> > > > > > >
> > > > > > > Just figured I'll throw my +1 in from reading too many (gpu) 
> > > > > > > drivers.
> > > > > > > Code that tries to cleverly adjust its behaviour depending upon 
> > > > > > > the
> > > > > > > context it's running in is harder to understand and blows up in 
> > > > > > > more
> > > > > > >

Re: [Intel-gfx] [patch 00/13] preempt: Make preempt count unconditional

2020-09-18 Thread Paul E. McKenney
On Wed, Sep 16, 2020 at 11:43:02PM +0200, Daniel Vetter wrote:
> On Wed, Sep 16, 2020 at 10:58 PM Paul E. McKenney  wrote:
> >
> > On Wed, Sep 16, 2020 at 10:29:06PM +0200, Daniel Vetter wrote:
> > > On Wed, Sep 16, 2020 at 5:29 PM Paul E. McKenney  
> > > wrote:
> > > >
> > > > On Wed, Sep 16, 2020 at 09:37:17AM +0200, Daniel Vetter wrote:
> > > > > On Tue, Sep 15, 2020 at 7:35 PM Linus Torvalds
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Sep 15, 2020 at 1:39 AM Thomas Gleixner 
> > > > > >  wrote:
> > > > > > >
> > > > > > > OTOH, having a working 'preemptible()' or maybe better named
> > > > > > > 'can_schedule()' check makes tons of sense to make decisions about
> > > > > > > allocation modes or other things.
> > > > > >
> > > > > > No. I think that those kinds of decisions about actual behavior are
> > > > > > always simply fundamentally wrong.
> > > > > >
> > > > > > Note that this is very different from having warnings about invalid
> > > > > > use. THAT is correct. It may not warn in all configurations, but 
> > > > > > that
> > > > > > doesn't matter: what matters is that it warns in common enough
> > > > > > configurations that developers will catch it.
> > > > > >
> > > > > > So having a warning in "might_sleep()" that doesn't always trigger,
> > > > > > because you have a limited configuration that can't even detect the
> > > > > > situation, that's fine and dandy and intentional.
> > > > > >
> > > > > > But having code like
> > > > > >
> > > > > >if (can_schedule())
> > > > > >.. do something different ..
> > > > > >
> > > > > > is fundamentally complete and utter garbage.
> > > > > >
> > > > > > It's one thing if you test for "am I in hardware interrupt context".
> > > > > > Those tests aren't great either, but at least they make sense.
> > > > > >
> > > > > > But a driver - or some library routine - making a difference based 
> > > > > > on
> > > > > > some nebulous "can I schedule" is fundamentally and basically WRONG.
> > > > > >
> > > > > > If some code changes behavior, it needs to be explicit to the 
> > > > > > *caller*
> > > > > > of that code.
> > > > > >
> > > > > > So this is why GFP_ATOMIC is fine, but "if (!can_schedule())
> > > > > > do_something_atomic()" is pure shite.
> > > > > >
> > > > > > And I am not IN THE LEAST interested in trying to help people doing
> > > > > > pure shite. We need to fix them. Like the crypto code is getting
> > > > > > fixed.
> > > > >
> > > > > Just figured I'll throw my +1 in from reading too many (gpu) drivers.
> > > > > Code that tries to cleverly adjust its behaviour depending upon the
> > > > > context it's running in is harder to understand and blows up in more
> > > > > interesting ways. We still have drm_can_sleep() and it's mostly just
> > > > > used for debug code, and I've largely ended up just deleting
> > > > > everything that used it because when you're driver is blowing up the
> > > > > last thing you want is to realize your debug code and output can't be
> > > > > relied upon. Or worse, that the only Oops you have is the one in the
> > > > > debug code, because the real one scrolled away - the original idea
> > > > > behind drm_can_sleep was to make all the modeset code work
> > > > > automagically both in normal ioctl/kworker context and in the panic
> > > > > handlers or kgdb callbacks. Wishful thinking at best.
> > > > >
> > > > > Also at least for me that extends to everything, e.g. I much prefer
> > > > > explicit spin_lock and spin_lock_irq vs magic spin_lock_irqsave for
> > > > > locks shared with interrupt handlers, since the former two gives me
> > > > > clear information from which contexts such function can be called.

Re: [Intel-gfx] [patch 00/13] preempt: Make preempt count unconditional

2020-09-16 Thread Paul E. McKenney
On Wed, Sep 16, 2020 at 10:29:06PM +0200, Daniel Vetter wrote:
> On Wed, Sep 16, 2020 at 5:29 PM Paul E. McKenney  wrote:
> >
> > On Wed, Sep 16, 2020 at 09:37:17AM +0200, Daniel Vetter wrote:
> > > On Tue, Sep 15, 2020 at 7:35 PM Linus Torvalds
> > >  wrote:
> > > >
> > > > On Tue, Sep 15, 2020 at 1:39 AM Thomas Gleixner  
> > > > wrote:
> > > > >
> > > > > OTOH, having a working 'preemptible()' or maybe better named
> > > > > 'can_schedule()' check makes tons of sense to make decisions about
> > > > > allocation modes or other things.
> > > >
> > > > No. I think that those kinds of decisions about actual behavior are
> > > > always simply fundamentally wrong.
> > > >
> > > > Note that this is very different from having warnings about invalid
> > > > use. THAT is correct. It may not warn in all configurations, but that
> > > > doesn't matter: what matters is that it warns in common enough
> > > > configurations that developers will catch it.
> > > >
> > > > So having a warning in "might_sleep()" that doesn't always trigger,
> > > > because you have a limited configuration that can't even detect the
> > > > situation, that's fine and dandy and intentional.
> > > >
> > > > But having code like
> > > >
> > > >if (can_schedule())
> > > >.. do something different ..
> > > >
> > > > is fundamentally complete and utter garbage.
> > > >
> > > > It's one thing if you test for "am I in hardware interrupt context".
> > > > Those tests aren't great either, but at least they make sense.
> > > >
> > > > But a driver - or some library routine - making a difference based on
> > > > some nebulous "can I schedule" is fundamentally and basically WRONG.
> > > >
> > > > If some code changes behavior, it needs to be explicit to the *caller*
> > > > of that code.
> > > >
> > > > So this is why GFP_ATOMIC is fine, but "if (!can_schedule())
> > > > do_something_atomic()" is pure shite.
> > > >
> > > > And I am not IN THE LEAST interested in trying to help people doing
> > > > pure shite. We need to fix them. Like the crypto code is getting
> > > > fixed.
> > >
> > > Just figured I'll throw my +1 in from reading too many (gpu) drivers.
> > > Code that tries to cleverly adjust its behaviour depending upon the
> > > context it's running in is harder to understand and blows up in more
> > > interesting ways. We still have drm_can_sleep() and it's mostly just
> > > used for debug code, and I've largely ended up just deleting
> > > everything that used it because when you're driver is blowing up the
> > > last thing you want is to realize your debug code and output can't be
> > > relied upon. Or worse, that the only Oops you have is the one in the
> > > debug code, because the real one scrolled away - the original idea
> > > behind drm_can_sleep was to make all the modeset code work
> > > automagically both in normal ioctl/kworker context and in the panic
> > > handlers or kgdb callbacks. Wishful thinking at best.
> > >
> > > Also at least for me that extends to everything, e.g. I much prefer
> > > explicit spin_lock and spin_lock_irq vs magic spin_lock_irqsave for
> > > locks shared with interrupt handlers, since the former two gives me
> > > clear information from which contexts such function can be called.
> > > Other end is the memalloc_no*_save/restore functions, where I recently
> > > made a real big fool of myself because I didn't realize how much that
> > > impacts everything that's run within - suddenly "GFP_KERNEL for small
> > > stuff never fails" is wrong everywhere.
> > >
> > > It's all great for debugging and sanity checks (and we run with all
> > > that stuff enabled in our CI), but really semantic changes depending
> > > upon magic context checks freak my out :-)
> >
> > All fair, but some of us need to write code that must handle being
> > invoked from a wide variety of contexts.  Now perhaps you like the idea of
> > call_rcu() for schedulable contexts, call_rcu_nosched() when preemption
> > is disabled, call_rcu_irqs_are_disabled() when interrupts are disa

Re: [Intel-gfx] [patch 00/13] preempt: Make preempt count unconditional

2020-09-16 Thread Paul E. McKenney
On Wed, Sep 16, 2020 at 11:32:00AM -0700, Linus Torvalds wrote:
> On Wed, Sep 16, 2020 at 8:29 AM Paul E. McKenney  wrote:
> >
> > All fair, but some of us need to write code that must handle being
> > invoked from a wide variety of contexts.
> 
> Note that I think that core functionality is different from random drivers.
> 
> Of course core code can (and will) look at things like
> 
> if (in_interrupt())
> .. schedule work asynchronously ..
> 
> because core code ends up being called from odd places, and code like
> that is expected to have understanding of the rules it plays with.
> 
> But something like RCU is a very different beast from some "walk the
> scatter-gather list" code.
> 
> RCU does its work in the background, and works with lots of different
> things. And it's so core and used everywhere that it knows about these
> things. I mean, we literally have special code explicitly to let RCU
> know "we entered kernel context now".
> 
> But something like a driver list walking thing should not be doing
> different things behind peoples back depending on whether they hold
> spinlocks or not. It should either just work regardless, or there
> should be a flag (or special interface) for the "you're being called
> in a crtitical region".
> 
> Because dynamically changing behavior really is very confusing.

Whew!  I feel much better now.  ;-)

Thanx, Paul
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [patch 00/13] preempt: Make preempt count unconditional

2020-09-16 Thread Paul E. McKenney
On Wed, Sep 16, 2020 at 08:23:52PM +0100, Matthew Wilcox wrote:
> On Mon, Sep 14, 2020 at 11:55:24PM +0200, Thomas Gleixner wrote:
> > But just look at any check which uses preemptible(), especially those
> > which check !preemptible():
> 
> hmm.
> 
> +++ b/include/linux/preempt.h
> @@ -180,7 +180,9 @@ do { \
>  
>  #define preempt_enable_no_resched() sched_preempt_enable_no_resched()
>  
> +#ifndef MODULE
>  #define preemptible()  (preempt_count() == 0 && !irqs_disabled())
> +#endif
>  
>  #ifdef CONFIG_PREEMPTION
>  #define preempt_enable() \
> 
> 
> $ git grep -w preemptible drivers
> (slightly trimmed by hand to remove, eg, comments)
> drivers/firmware/arm_sdei.c:WARN_ON_ONCE(preemptible());
> drivers/firmware/arm_sdei.c:WARN_ON_ONCE(preemptible());
> drivers/firmware/arm_sdei.c:WARN_ON_ONCE(preemptible());
> drivers/firmware/arm_sdei.c:WARN_ON_ONCE(preemptible());
> drivers/firmware/arm_sdei.c:WARN_ON(preemptible());
> drivers/firmware/efi/efi-pstore.c:preemptible(), 
> record->size, record->psi->buf);
> drivers/irqchip/irq-gic-v4.c:   WARN_ON(preemptible());
> drivers/irqchip/irq-gic-v4.c:   WARN_ON(preemptible());
> drivers/scsi/hisi_sas/hisi_sas_main.c:  if (!preemptible())
> drivers/xen/time.c: BUG_ON(preemptible());
> 
> That only looks like two drivers that need more than WARNectomies.

I could easily imagine someone thinking that these did something in
CONFIG_PREEMPT_NONE=y kernels.  In fact, I could easily imagine myself
making that mistake.  :-/

> Although maybe rcu_read_load_sched_held() or rcu_read_lock_any_held()
> might get called from a module ...

But yes, from the rcutorture module for certain and also from any other
RCU-using module that includes the usual RCU debug checks.

Thanx, Paul
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [patch 00/13] preempt: Make preempt count unconditional

2020-09-16 Thread Paul E. McKenney
On Wed, Sep 16, 2020 at 09:37:17AM +0200, Daniel Vetter wrote:
> On Tue, Sep 15, 2020 at 7:35 PM Linus Torvalds
>  wrote:
> >
> > On Tue, Sep 15, 2020 at 1:39 AM Thomas Gleixner  wrote:
> > >
> > > OTOH, having a working 'preemptible()' or maybe better named
> > > 'can_schedule()' check makes tons of sense to make decisions about
> > > allocation modes or other things.
> >
> > No. I think that those kinds of decisions about actual behavior are
> > always simply fundamentally wrong.
> >
> > Note that this is very different from having warnings about invalid
> > use. THAT is correct. It may not warn in all configurations, but that
> > doesn't matter: what matters is that it warns in common enough
> > configurations that developers will catch it.
> >
> > So having a warning in "might_sleep()" that doesn't always trigger,
> > because you have a limited configuration that can't even detect the
> > situation, that's fine and dandy and intentional.
> >
> > But having code like
> >
> >if (can_schedule())
> >.. do something different ..
> >
> > is fundamentally complete and utter garbage.
> >
> > It's one thing if you test for "am I in hardware interrupt context".
> > Those tests aren't great either, but at least they make sense.
> >
> > But a driver - or some library routine - making a difference based on
> > some nebulous "can I schedule" is fundamentally and basically WRONG.
> >
> > If some code changes behavior, it needs to be explicit to the *caller*
> > of that code.
> >
> > So this is why GFP_ATOMIC is fine, but "if (!can_schedule())
> > do_something_atomic()" is pure shite.
> >
> > And I am not IN THE LEAST interested in trying to help people doing
> > pure shite. We need to fix them. Like the crypto code is getting
> > fixed.
> 
> Just figured I'll throw my +1 in from reading too many (gpu) drivers.
> Code that tries to cleverly adjust its behaviour depending upon the
> context it's running in is harder to understand and blows up in more
> interesting ways. We still have drm_can_sleep() and it's mostly just
> used for debug code, and I've largely ended up just deleting
> everything that used it because when you're driver is blowing up the
> last thing you want is to realize your debug code and output can't be
> relied upon. Or worse, that the only Oops you have is the one in the
> debug code, because the real one scrolled away - the original idea
> behind drm_can_sleep was to make all the modeset code work
> automagically both in normal ioctl/kworker context and in the panic
> handlers or kgdb callbacks. Wishful thinking at best.
> 
> Also at least for me that extends to everything, e.g. I much prefer
> explicit spin_lock and spin_lock_irq vs magic spin_lock_irqsave for
> locks shared with interrupt handlers, since the former two gives me
> clear information from which contexts such function can be called.
> Other end is the memalloc_no*_save/restore functions, where I recently
> made a real big fool of myself because I didn't realize how much that
> impacts everything that's run within - suddenly "GFP_KERNEL for small
> stuff never fails" is wrong everywhere.
> 
> It's all great for debugging and sanity checks (and we run with all
> that stuff enabled in our CI), but really semantic changes depending
> upon magic context checks freak my out :-)

All fair, but some of us need to write code that must handle being
invoked from a wide variety of contexts.  Now perhaps you like the idea of
call_rcu() for schedulable contexts, call_rcu_nosched() when preemption
is disabled, call_rcu_irqs_are_disabled() when interrupts are disabled,
call_rcu_raw_atomic() from contexts where (for example) raw spinlocks
are held, and so on.  However, from what I can see, most people instead
consistently prefer that the RCU API instead be consolidated.

Some in-flight cache-efficiency work for kvfree_rcu() and call_rcu()
needs to be able to allocate memory occasionally.  It can do that when
invoked from some contexts, but not when invoked from others.  Right now,
in !PREEMPT kernels, it cannot tell, and must either do things to the
memory allocators that some of the MM hate or must unnecessarily invoke
workqueues.  Thomas's patches would allow the code to just allocate in
the common case when these primitives are invoked from contexts where
allocation is permitted.

If we want to restrict access to the can_schedule() or whatever primitive,
fine and good.  We can add a check to checkpatch.pl, for example.  Maybe
we can go back to the old brlock approach of requiring certain people's
review for each addition to the kernel.

But there really are use cases that it would greatly help.

Thanx, Paul
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [patch 00/13] preempt: Make preempt count unconditional

2020-09-15 Thread Paul E. McKenney
On Mon, Sep 14, 2020 at 01:59:15PM -0700, Linus Torvalds wrote:
> On Mon, Sep 14, 2020 at 1:45 PM Thomas Gleixner  wrote:
> >
> > Recently merged code does:
> >
> >  gfp = preemptible() ? GFP_KERNEL : GFP_ATOMIC;
> >
> > Looks obviously correct, except for the fact that preemptible() is
> > unconditionally false for CONFIF_PREEMPT_COUNT=n, i.e. all allocations in
> > that code use GFP_ATOMIC on such kernels.
> 
> I don't think this is a good reason to entirely get rid of the no-preempt 
> thing.
> 
> The above is just garbage. It's bogus. You can't do it.
> 
> Blaming the no-preempt code for this bug is extremely unfair, imho.
> 
> And the no-preempt code does help make for much better code generation
> for simple spinlocks.
> 
> Where is that horribly buggy recent code? It's not in that exact
> format, certainly, since 'grep' doesn't find it.

It would be convenient for that "gfp =" code to work, as this would
allow better cache locality while invoking RCU callbacks, and would
further provide better robustness to callback floods.  The full story
is quite long, but here are alternatives have not yet been proven to be
abject failures:

1.  Use workqueues to do the allocations in a clean context.
While waiting for the allocations, the callbacks are queued
in the old cache-busting manner.  This functions correctly,
but in the meantime (which on busy systems can be some time)
the cache locality and robustness are lost.

2.  Provide the ability to allocate memory in raw atomic context.
This is extremely effective, especially when used in combination
with #1 above, but as you might suspect, the MM guys don't like
it much.

In contrast, with Thomas's patch series, call_rcu() and kvfree_rcu()
could just look at preemptible() to see whether or not it was safe to
allocate memory, even in !PREEMPT kernels -- and in the common case,
it almost always would be safe.  It is quite possible that this approach
would work in isolation, or failing that, that adding #1 above would do
the trick.

I understand that this is all very hand-wavy, and I do apologize for that.
If you really want the full sad story with performance numbers and the
works, let me know!

Thanx, Paul
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] rcu_barrier() no longer allowed within mmap_sem?

2020-03-30 Thread Paul E. McKenney
On Mon, Mar 30, 2020 at 03:00:35PM +0200, Daniel Vetter wrote:
> Hi all, for all = rcu, cpuhotplug and perf maintainers
> 
> We've hit an interesting new lockdep splat in our drm/i915 CI:
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17096/shard-tglb7/igt@kms_frontbuffer_track...@fbcpsr-rgb101010-draw-mmap-gtt.html#dmesg-warnings861
> 
> Summarizing away the driver parts we have
> 
> < gpu locks which are held within mm->mmap_sem in various gpu fault handlers >
> 
> -> #4 (&mm->mmap_sem#2){}:
> <4> [604.892615] __might_fault+0x63/0x90
> <4> [604.892617] _copy_to_user+0x1e/0x80
> <4> [604.892619] perf_read+0x200/0x2b0
> <4> [604.892621] vfs_read+0x96/0x160
> <4> [604.892622] ksys_read+0x9f/0xe0
> <4> [604.892623] do_syscall_64+0x4f/0x220
> <4> [604.892624] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> <4> [604.892625]
> -> #3 (&cpuctx_mutex){+.+.}:
> <4> [604.892626] __mutex_lock+0x9a/0x9c0
> <4> [604.892627] perf_event_init_cpu+0xa4/0x140
> <4> [604.892629] perf_event_init+0x19d/0x1cd
> <4> [604.892630] start_kernel+0x362/0x4e4
> <4> [604.892631] secondary_startup_64+0xa4/0xb0
> <4> [604.892631]
> -> #2 (pmus_lock){+.+.}:
> <4> [604.892633] __mutex_lock+0x9a/0x9c0
> <4> [604.892633] perf_event_init_cpu+0x6b/0x140
> <4> [604.892635] cpuhp_invoke_callback+0x9b/0x9d0
> <4> [604.892636] _cpu_up+0xa2/0x140
> <4> [604.892637] do_cpu_up+0x61/0xa0
> <4> [604.892639] smp_init+0x57/0x96
> <4> [604.892639] kernel_init_freeable+0x87/0x1dc
> <4> [604.892640] kernel_init+0x5/0x100
> <4> [604.892642] ret_from_fork+0x24/0x50
> <4> [604.892642]
> -> #1 (cpu_hotplug_lock.rw_sem){}:
> <4> [604.892643] cpus_read_lock+0x34/0xd0
> <4> [604.892644] rcu_barrier+0xaa/0x190
> <4> [604.892645] kernel_init+0x21/0x100
> <4> [604.892647] ret_from_fork+0x24/0x50
> <4> [604.892647]
> -> #0 (rcu_state.barrier_mutex){+.+.}:
> <4> [604.892649] __lock_acquire+0x1328/0x15d0
> <4> [604.892650] lock_acquire+0xa7/0x1c0
> <4> [604.892651] __mutex_lock+0x9a/0x9c0
> <4> [604.892652] rcu_barrier+0x23/0x190
> <4> [604.892680] i915_gem_object_unbind+0x29d/0x3f0 [i915]
> <4> [604.892707] i915_gem_object_pin_to_display_plane+0x141/0x270 [i915]
> <4> [604.892737] intel_pin_and_fence_fb_obj+0xec/0x1f0 [i915]
> <4> [604.892767] intel_plane_pin_fb+0x3f/0xd0 [i915]
> <4> [604.892797] intel_prepare_plane_fb+0x13b/0x5c0 [i915]
> <4> [604.892798] drm_atomic_helper_prepare_planes+0x85/0x110
> <4> [604.892827] intel_atomic_commit+0xda/0x390 [i915]
> <4> [604.892828] drm_atomic_helper_set_config+0x57/0xa0
> <4> [604.892830] drm_mode_setcrtc+0x1c4/0x720
> <4> [604.892830] drm_ioctl_kernel+0xb0/0xf0
> <4> [604.892831] drm_ioctl+0x2e1/0x390
> <4> [604.892833] ksys_ioctl+0x7b/0x90
> <4> [604.892835] __x64_sys_ioctl+0x11/0x20
> <4> [604.892835] do_syscall_64+0x4f/0x220
> <4> [604.892836] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> The last backtrace boils down to i915 driver code which holds the same
> locks we are holding within mm->mmap_sem, and then ends up calling
> rcu_barrier(). From what I can see i915 is just the messenger here,
> any driver with this pattern of a lock held within mmap_sem which also
> has a path of calling rcu_barrier while holding that lock should be
> hitting this splat.
> 
> Two questions:
> - This suggests that calling rcu_barrier() isn't ok anymore while
> holding mmap_sem, or anything that has a dependency upon mmap_sem. I
> guess that's not the idea, please confirm.
> - Assuming this depedency is indeed not intended, where should the
> loop be broken? It goes through perf, cpuhotplug and rcu subsystems,
> and I don't have a clue about any of those.

Indeed, rcu_barrier() excludes CPU hotplug in order to eliminate a number
of interesting races.

Am I interpreting the above trace correctly in thinking that the various
calls to cpus_read_lock() are with mmap_sem held?  If so, can the calls
to rcu_barrier() be moved out from under the regions of code protected
by cpus_read_lock()?  Invoking rcu_barrier() with cpus_read_lock() held
is an immediate self-deadlock.

Or is rcu_barrier() somehow indirectly sometimes acquiring mmap_sem
or pmus_lock?  (Not seeing it myself, but...)

Thanx, Paul
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] list: Prevent compiler reloads inside 'safe' list iteration

2020-03-10 Thread Paul E. McKenney
On Tue, Mar 10, 2020 at 12:23:34PM +, David Laight wrote:
> From: Chris Wilson
> > Sent: 10 March 2020 11:50
> > 
> > Quoting David Laight (2020-03-10 11:36:41)
> > > From: Chris Wilson
> > > > Sent: 10 March 2020 09:21
> > > > Instruct the compiler to read the next element in the list iteration
> > > > once, and that it is not allowed to reload the value from the stale
> > > > element later. This is important as during the course of the safe
> > > > iteration, the stale element may be poisoned (unbeknownst to the
> > > > compiler).
> > >
> > > Eh?
> > > I thought any function call will stop the compiler being allowed
> > > to reload the value.
> > > The 'safe' loop iterators are only 'safe' against called
> > > code removing the current item from the list.
> > >
> > > > This helps prevent kcsan warnings over 'unsafe' conduct in releasing the
> > > > list elements during list_for_each_entry_safe() and friends.
> > >
> > > Sounds like kcsan is buggy 

Adding Marco on CC for his thoughts.

> > The warning kcsan gave made sense (a strange case where the emptying the
> > list from inside the safe iterator would allow that list to be taken
> > under a global mutex and have one extra request added to it. The
> > list_for_each_entry_safe() should be ok in this scenario, so long as the
> > next element is read before this element is dropped, and the compiler is
> > instructed not to reload the element.
> 
> Normally the loop iteration code has to hold the mutex.
> I guess it can be released inside the loop provided no other
> code can ever delete entries.
> 
> > kcsan is a little more insistent on having that annotation :)
> > 
> > In this instance I would say it was a false positive from kcsan, but I
> > can see why it would complain and suspect that given a sufficiently
> > aggressive compiler, we may be caught out by a late reload of the next
> > element.
> 
> If you have:
>   for (; p; p = next) {
>   next = p->next;
>   external_function_call(void);
>   }
> the compiler must assume that the function call
> can change 'p->next' and read it before the call.

That "must assume" is a statement of current compiler technology.
Given the progress over the past forty years, I would not expect this
restriction to hold forever.  Yes, we can and probably will get the
compiler implementers to give us command-line flags to suppress global
analysis.  But given the progress in compilers that I have seen over
the past 4+ decades, I would expect that the day will come when we won't
want to be using those command-line flags.

But if you want to ignore KCSAN's warnings, you are free to do so.

> Is this a list with strange locking rules?
> The only deletes are from within the loop.
> Adds and deletes are locked.
> The list traversal isn't locked.
> 
> I suspect kcsan bleats because it doesn't assume the compiler
> will use a single instruction/memory operation to read p->next.
> That is just stupid.

Heh!  If I am still around, I will ask you for your evaluation of the
above statement in 40 years.  Actually, 10 years will likely suffice.  ;-)

Thanx, Paul
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] list: Prevent compiler reloads inside 'safe' list iteration

2020-03-10 Thread Paul E. McKenney
On Tue, Mar 10, 2020 at 03:05:57PM +, David Laight wrote:
> From: Marco Elver
> > Sent: 10 March 2020 14:10
> ...
> > FWIW, for writes we're already being quite generous, in that plain
> > aligned writes up to word-size are assumed to be "atomic" with the
> > default (conservative) config, i.e. marking such writes is optional.
> > Although, that's a generous assumption that is not always guaranteed
> > to hold 
> > (https://lore.kernel.org/lkml/20190821103200.kpufwtviqhpbuv2n@willie-the-truck/).
> 
> Remind me to start writing everything in assembler.

Been there, done that.  :-/

> That and to mark all structure members 'volatile'.

Indeed.  READ_ONCE() and WRITE_ONCE() get this same effect, but without
pessimizing non-concurrent accesses to those same members.  Plus KCSAN
knows about READ_ONCE(), WRITE_ONCE(), and also volatile members.

Thanx, Paul
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [tip: core/rcu] drm/i915: Replace rcu_swap_protected() with rcu_replace_pointer()

2019-10-31 Thread tip-bot2 for Paul E. McKenney
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: 1feace5d6a4a1acf44dde2bfb5c36cc0b1cf559c
Gitweb:
https://git.kernel.org/tip/1feace5d6a4a1acf44dde2bfb5c36cc0b1cf559c
Author:Paul E. McKenney 
AuthorDate:Mon, 23 Sep 2019 15:22:15 -07:00
Committer: Paul E. McKenney 
CommitterDate: Wed, 30 Oct 2019 08:44:04 -07:00

drm/i915: Replace rcu_swap_protected() with rcu_replace_pointer()

This commit replaces the use of rcu_swap_protected() with the more
intuitively appealing rcu_replace_pointer() as a step towards removing
rcu_swap_protected().

Link: 
https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=z7-ggtm6wcvtyytxza1+bhqta4gg...@mail.gmail.com/
Reported-by: Linus Torvalds 
[ paulmck: From rcu_replace() to rcu_replace_pointer() per Ingo Molnar. ]
Signed-off-by: Paul E. McKenney 
Reviewed-by: Joonas Lahtinen 
Cc: Jani Nikula 
Cc: Rodrigo Vivi 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Chris Wilson 
Cc: Tvrtko Ursulin 
Cc: 
Cc: 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 1cdfe05..3f3e803 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1629,7 +1629,7 @@ replace:
i915_gem_context_set_user_engines(ctx);
else
i915_gem_context_clear_user_engines(ctx);
-   rcu_swap_protected(ctx->engines, set.engines, 1);
+   set.engines = rcu_replace_pointer(ctx->engines, set.engines, 1);
mutex_unlock(&ctx->engines_mutex);
 
call_rcu(&set.engines->rcu, free_engines_rcu);
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH tip/core/rcu 03/10] drivers/gpu: Replace rcu_swap_protected() with rcu_replace()

2019-10-28 Thread Paul E. McKenney
On Mon, Oct 28, 2019 at 02:57:26PM +0200, Joonas Lahtinen wrote:
> Quoting paul...@kernel.org (2019-10-22 22:12:08)
> > From: "Paul E. McKenney" 
> > 
> > This commit replaces the use of rcu_swap_protected() with the more
> > intuitively appealing rcu_replace() as a step towards removing
> > rcu_swap_protected().
> > 
> > Link: 
> > https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=z7-ggtm6wcvtyytxza1+bhqta4gg...@mail.gmail.com/
> > Reported-by: Linus Torvalds 
> > [ paulmck: From rcu_replace() to rcu_replace_pointer() per Ingo Molnar. ]
> > Signed-off-by: Paul E. McKenney 
> > Cc: Jani Nikula 
> > Cc: Joonas Lahtinen 
> > Cc: Rodrigo Vivi 
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Cc: Chris Wilson 
> > Cc: Tvrtko Ursulin 
> > Cc: 
> > Cc: 
> 
> "drm/i915:" preferred as the subject prefix for increased specificity.

"drm/i915" it is!

> Let me know which tree you end up merging with.

I expect to be sending a pull request for inclusion into the -tip
tree in a day or three.

> Reviewed-by: Joonas Lahtinen 

Applied, thank you!

Thanx, Paul
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] linux-next: build failure after merge of the rcu tree

2017-03-08 Thread Paul E. McKenney
On Wed, Mar 08, 2017 at 11:13:38AM +0100, Daniel Vetter wrote:
> On Wed, Mar 08, 2017 at 12:16:45PM +1100, Stephen Rothwell wrote:
> > Hi Paul,
> > 
> > After merging the rcu tree, today's linux-next build (x86_64 allmodconfig)
> > failed like this:
> > 
> > In file included from include/linux/resource_ext.h:19:0,
> >  from include/linux/pci.h:32,
> >  from include/drm/drmP.h:50,
> >  from drivers/gpu/drm/i915/i915_gem.c:28:
> > drivers/gpu/drm/i915/selftests/mock_gem_device.c: In function 
> > 'mock_gem_device':
> > drivers/gpu/drm/i915/selftests/mock_gem_device.c:177:9: error: 
> > 'SLAB_DESTROY_BY_RCU' undeclared (first use in this function)
> >  SLAB_DESTROY_BY_RCU);
> >  ^
> > include/linux/slab.h:149:4: note: in definition of macro 'KMEM_CACHE'
> >(__flags), NULL)
> > ^
> > drivers/gpu/drm/i915/selftests/mock_gem_device.c:177:9: note: each 
> > undeclared identifier is reported only once for each function it appears in
> >  SLAB_DESTROY_BY_RCU);
> >  ^
> > include/linux/slab.h:149:4: note: in definition of macro 'KMEM_CACHE'
> >(__flags), NULL)
> > ^
> > /
> > 
> > Caused by commit
> > 
> >   24b7cb25b8d1 ("mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU")
> 
> Awesome rename. Count us in among the people who first thought this
> provides more guarantees than it does.

Glad you like it!  ;-)

> > interacting with commit
> > 
> >   0daf0113cff6 ("drm/i915: Mock infrastructure for request emission")
> > 
> > from the drm-intel tree.
> > 
> > I added the following merge fix patch:
> > 
> > From: Stephen Rothwell 
> > Date: Wed, 8 Mar 2017 12:09:49 +1100
> > Subject: [PATCH] drm/i915: merge fix for "mm: Rename SLAB_DESTROY_BY_RCU to
> >  SLAB_TYPESAFE_BY_RCU"
> > 
> > Signed-off-by: Stephen Rothwell 
> 
> Should we handle this with a topic branch? It's trivial to resolve, but I
> fear the note that this conflict exists might get lost somewhere between
> now and when the drm pull lands in Linus' inbox in 2 months ...
> 
> Otoh he's probably going to compile test drm extra carefully and will
> notice :-)

If it gets too ugly, I can always allow both SLAB_TYPESAFE_BY_RCU
and SLAB_DESTROY_BY_RCU as synonyms in 4.12, and then remove
SLAB_DESTROY_BY_RCU in 4.13.

Thanx, Paul

> -Daniel
> 
> > ---
> >  drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c 
> > b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > index 6a8258eacdcb..9f24c5da3f8d 100644
> > --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > @@ -174,7 +174,7 @@ struct drm_i915_private *mock_gem_device(void)
> > i915->requests = KMEM_CACHE(mock_request,
> > SLAB_HWCACHE_ALIGN |
> > SLAB_RECLAIM_ACCOUNT |
> > -   SLAB_DESTROY_BY_RCU);
> > +   SLAB_TYPESAFE_BY_RCU);
> > if (!i915->requests)
> > goto err_vmas;
> >  
> > -- 
> > 2.11.0
> > 
> > -- 
> > Cheers,
> > Stephen Rothwell
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
> 

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/2] Revert "drm/fb-helper: Reduce READ_ONCE(master) to lockless_dereference"

2016-08-12 Thread Paul E. McKenney
On Fri, Aug 12, 2016 at 08:25:43PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 11, 2016 at 11:26:47AM -0700, Paul E. McKenney wrote:
> > If my upcoming testing of the two changes together pans out, I will
> > give you a Tested-by -- I am guessing that you don't want to wait
> > until the next merge window for these changes.
> 
> I was planning to stuff them in tip/locking/urgent, so they'd end up in
> this release.

They seem to work fine for me, so for both:

Tested-by: Paul E. McKenney 

Thanx, Paul

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] drm/fb-helper: Reduce READ_ONCE(master) to lockless_dereference

2016-08-11 Thread Paul E. McKenney
On Thu, Aug 11, 2016 at 11:44:08AM +0200, Peter Zijlstra wrote:
> On Wed, Jun 22, 2016 at 08:46:12AM +0100, Chris Wilson wrote:
> > We are only documenting that the read is outside of the lock, and do not
> > require strict ordering on the operation. In this case the more relaxed
> > lockless_dereference() will suffice.
> 
> No, no, no... This is 'broken'. lockless_dereference() is _stronger_
> than READ_ONCE(), not weaker.
> 
> lockless_dereference() is a wrapper around smp_read_barrier_depends()
> and is used to form read dependencies. There is no read dependency here,
> therefore using lockless_dereference() is entirely pointless.
> 
> Look at the definition of lockless_dereference(), it does a READ_ONCE()
> and then smp_read_barrier_depends().
> 
> Also, clue is in the name: 'dereference', you don't actually dereference
> the pointer here, only load it.

What Peter said!

If you are just checking the value of an RCU-protected pointer,
that is, one on which you would use rcu_dereference() rather than
lockless_dereference(), rcu_access_pointer() does the job of READ_ONCE()
while keeping sparse happy.

Thanx, Paul

> > +++ b/drivers/gpu/drm/drm_fb_helper.c
> > @@ -464,7 +464,7 @@ static bool drm_fb_helper_is_bound(struct drm_fb_helper 
> > *fb_helper)
> >  
> > /* Sometimes user space wants everything disabled, so don't steal the
> >  * display if there's a master. */
> > -   if (READ_ONCE(dev->master))
> > +   if (lockless_dereference(dev->master))
> > return false;
> >  
> > drm_for_each_crtc(crtc, dev) {
> 

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/2] Revert "drm/fb-helper: Reduce READ_ONCE(master) to lockless_dereference"

2016-08-11 Thread Paul E. McKenney
On Thu, Aug 11, 2016 at 12:38:59PM +0200, Daniel Vetter wrote:
> On Thu, Aug 11, 2016 at 11:50:21AM +0200, Johannes Berg wrote:
> > From: Johannes Berg 
> > 
> > This reverts commit fa7d81bb3c269a2ee38b6e4d569d9eb8be1a78ad.
> > 
> > As Peter explained:
> >   [...] lockless_dereference() is _stronger_ than READ_ONCE(), not weaker.
> > 
> >   [...]
> > 
> >   Also, clue is in the name: 'dereference', you don't actually dereference
> >   the pointer here, only load it.
> > 
> > My next patch breaks compile on this, because it assumes you want to
> > deference and thus also need the struct type visible (which it isn't
> > here), so revert it.
> > 
> > Cc: Peter Zijlstra 
> > Signed-off-by: Johannes Berg 
> 
> Reviewed-by: Daniel Vetter 
> 
> And ack-by: me for merging through whatever tree this makes sense for.
> -Daniel

Initial testing says that the change below must precede the change
to the definition of lockless_dereference(), so the two should go
together.

If my upcoming testing of the two changes together pans out, I will
give you a Tested-by -- I am guessing that you don't want to wait
until the next merge window for these changes.

Thanx, Paul

> > ---
> >  drivers/gpu/drm/drm_fb_helper.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_fb_helper.c 
> > b/drivers/gpu/drm/drm_fb_helper.c
> > index ce54e985d91b..0a06f9120b5a 100644
> > --- a/drivers/gpu/drm/drm_fb_helper.c
> > +++ b/drivers/gpu/drm/drm_fb_helper.c
> > @@ -464,7 +464,7 @@ static bool drm_fb_helper_is_bound(struct drm_fb_helper 
> > *fb_helper)
> >  
> > /* Sometimes user space wants everything disabled, so don't steal the
> >  * display if there's a master. */
> > -   if (lockless_dereference(dev->master))
> > +   if (READ_ONCE(dev->master))
> > return false;
> >  
> > drm_for_each_crtc(crtc, dev) {
> > -- 
> > 2.8.1
> > 
> > ___
> > dri-devel mailing list
> > dri-de...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
> 

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] linux-next: build failure after merge of the tip tree (from the drm-intel tree)

2016-07-05 Thread Paul E. McKenney
On Tue, Jul 05, 2016 at 10:25:12AM +0200, Peter Zijlstra wrote:
> On Tue, Jul 05, 2016 at 01:53:03PM +1000, Stephen Rothwell wrote:
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index d3502c0603e5..1f91f187b2a8 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3290,7 +3290,7 @@ i915_gem_retire_work_handler(struct work_struct *work)
> >  * We do not need to do this test under locking as in the worst-case
> >  * we queue the retire worker once too often.
> >  */
> > -   if (lockless_dereference(dev_priv->gt.awake))
> > +   if (/*lockless_dereference*/(dev_priv->gt.awake))
> > queue_delayed_work(dev_priv->wq,
> >&dev_priv->gt.retire_work,
> >round_jiffies_up_relative(HZ));
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c 
> > b/drivers/gpu/drm/i915/i915_irq.c
> > index f6de8dd567a2..2c1926418691 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -3095,7 +3095,7 @@ static void i915_hangcheck_elapsed(struct work_struct 
> > *work)
> > if (!i915.enable_hangcheck)
> > return;
> >  
> > -   if (!lockless_dereference(dev_priv->gt.awake))
> > +   if (!/*lockless_dereference*/(dev_priv->gt.awake))
> > return;
> >  
> > /* As enabling the GPU requires fairly extensive mmio access,
> 
> Right, neither case appears to include a data dependency and thus
> lockless_dereference() seems misguided.

Agreed!  At first glance, this code wants READ_ONCE() rather than
lockless_dereference().

Thanx, Paul

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Enable lockless lookup of request tracking via RCU

2016-01-06 Thread Paul E. McKenney
On Tue, Jan 05, 2016 at 04:06:48PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 05, 2016 at 04:02:13PM +0100, Peter Zijlstra wrote:
> > > Shouldn't the slab subsystem do this for us if we request it delays the
> > > actual kfree? Seems like a core bug to me ... Adding more folks.
> > 
> > note that sync_rcu() can take a terribly long time.. but yes, I seem to
> > remember Paul talking about adding this to reclaim paths for just this
> > reason. Not sure that ever happened thouhg.

There is an RCU OOM notifier, but it just ensures that existing callbacks
get processed in a timely fashion.  It does not block, as that would
prevent other OOM notifiers from getting their memory freed quickly.

> Also, you might be wanting rcu_barrier() instead, that not only waits
> for a GP to complete, but also for all pending callbacks to be
> processed.

And in fact what the RCU OOM notifier does can be thought of as an
asynchronous open-coded rcu_barrier().  If you are interested, please
see rcu_register_oom_notifier() and friends.

> Without the latter there might still not be anything to free after it.

Another approach is synchronize_rcu() after some largish number of
requests.  The advantage of this approach is that it throttles the
production of callbacks at the source.  The corresponding disadvantage
is that it slows things up.

Another approach is to use call_rcu(), but if the previous call_rcu()
is still in flight, block waiting for it.  Yet another approach is
the get_state_synchronize_rcu() / cond_synchronize_rcu() pair.  The
idea is to do something like this:

cond_synchronize_rcu(cookie);
cookie = get_state_synchronize_rcu();

You would of course do an initial get_state_synchronize_rcu() to
get things going.  This would not block unless there was less than
one grace period's worth of time between invocations.  But this
assumes a busy system, where there is almost always a grace period
in flight.  But you can make that happen as follows:

cond_synchronize_rcu(cookie);
cookie = get_state_synchronize_rcu();
call_rcu(&my_rcu_head, noop_function);

Note that you need additional code to make sure that the old callback
has completed before doing a new one.  Setting and clearing a flag
with appropriate memory ordering control suffices (e.g,. smp_load_acquire()
and smp_store_release()).

And there are probably other approaches as well...

Thanx, Paul

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Enable lockless lookup of request tracking via RCU

2016-01-06 Thread Paul E. McKenney
On Wed, Jan 06, 2016 at 09:38:30AM +0100, Peter Zijlstra wrote:
> On Wed, Jan 06, 2016 at 09:06:58AM +0100, Daniel Vetter wrote:
> > This pretty much went over my head ;-) What I naively hoped for is that
> > kfree() on an rcu-freeing slab could be tought to magically stall a bit
> > (or at least expedite the delayed freeing) if we're piling up too many
> > freed objects.
> 
> Well, RCU does try harder when the callback list is getting 'big' (10k
> IIRC).

You got it, 10k by default, can be adjusted with the rcutree.qhimark
kernel-boot/sysfs parameter.  When a given CPU's callback list exceeds
this limit, it more aggressively starts a grace period, and if a grace
period is already in progress, it does more aggressive quiescent-state
forcing.  It does nothing to push back on processes generating callbacks,
other than by soaking up extra CPU cycles.

So, Daniel, if you haven't tried hammering the system hard, give it a
shot and see if qhimark is helping enough.  And perhaps adjust its value
if need be.  (Though please let me know if this is necessary -- if it is,
we should try to automate its setting.)

> > Doing that only in OOM is probably too late since OOM
> > handling is a bit unreliable/unpredictable. And I thought we're not the
> > first ones running into this problem.
> 
> The whole memory pressure thing is unreliable/unpredictable last time I
> looked at it, but sure, I suppose we could try and poke RCU sooner, but
> then you get into the problem of when, doing it too soon will be
> detrimental to performance, doing it too late is, well, too late.
> 
> > Do all the other users of rcu-freed slabs just open-code their own custom
> > approach? If that's the recommendation we can certainly follow that, too.
> 
> The ones I know of seem to simply ignore this problem..

I believe that there are a few that do the occasional synchronize_rcu()
to throttle themselves, but have not checked recently.

Thanx, Paul

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx