Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-29 Thread Michal Hocko
On Tue 29-09-20 11:00:03, Daniel Vetter wrote:
> On Tue, Sep 29, 2020 at 10:19:38AM +0200, Michal Hocko wrote:
> > On Wed 16-09-20 23:43:02, Daniel Vetter wrote:
> > > I can
> > > then figure out whether it's better to risk not spotting issues with
> > > call_rcu vs slapping a memalloc_noio_save/restore around all these
> > > critical section which force-degrades any allocation to GFP_ATOMIC at
> > 
> > did you mean memalloc_noreclaim_* here?
> 
> Yeah I picked the wrong one of that family of functions.
> 
> > > most, but has the risk that we run into code that assumes "GFP_KERNEL
> > > never fails for small stuff" and has a decidedly less tested fallback
> > > path than rcu code.
> > 
> > Even if the above then please note that memalloc_noreclaim_* or
> > PF_MEMALLOC should be used with an extreme care. Essentially only for
> > internal memory reclaimers. It grants access to _all_ the available
> > memory so any abuse can be detrimental to the overall system operation.
> > Allocation failure in this mode means that we are out of memory and any
> > code relying on such an allocation has to carefuly consider failure.
> > This is not a random allocation mode.
> 
> Agreed, that's why I don't like having these kind of automagic critical
> sections. It's a bit a shotgun approach. Paul said that the code would
> handle failures, but the problem is that it applies everywhere.

Ohh, in the ideal world we wouldn't need anything like that. But then
the reality fires:
* PF_MEMALLOC (resp memalloc_noreclaim_* for that matter) is primarily used
  to make sure that allocations from inside the memory reclaim - yeah that
  happens - will not recurse.
* PF_MEMALLOC_NO{FS,IO} (resp memalloc_no{fs,io}*) are used to mark no
  fs/io reclaim recursion critical sections because controling that for
  each allocation inside fs transaction (or other sensitive) or IO
  contexts turned out to be unmaintainable and people simply fallen into
  using NOFS/NOIO unconditionally which is causing reclaim imbalance
  problems.
* PF_MEMALLOC_NOCMA (resp memalloc_nocma*) is used for long term pinning
  when CMA pages cannot be pinned because that would break the CMA
  guarantees. Communicating this to all potential allocations during
  pinning is simply unfeasible.

So you are absolutely right that these critical sections with side
effects on all allocations are far from ideal from the API point of view
but they are mostly mirroring a demand for functionality which is
_practically_ impossible to achieve with our current code base. Not that
we couldn't get back to drawing board and come up with a saner thing and
rework the world...
-- 
Michal Hocko
SUSE Labs


Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-29 Thread Daniel Vetter
On Tue, Sep 29, 2020 at 10:19:38AM +0200, Michal Hocko wrote:
> On Wed 16-09-20 23:43:02, Daniel Vetter wrote:
> > I can
> > then figure out whether it's better to risk not spotting issues with
> > call_rcu vs slapping a memalloc_noio_save/restore around all these
> > critical section which force-degrades any allocation to GFP_ATOMIC at
> 
> did you mean memalloc_noreclaim_* here?

Yeah I picked the wrong one of that family of functions.

> > most, but has the risk that we run into code that assumes "GFP_KERNEL
> > never fails for small stuff" and has a decidedly less tested fallback
> > path than rcu code.
> 
> Even if the above then please note that memalloc_noreclaim_* or
> PF_MEMALLOC should be used with an extreme care. Essentially only for
> internal memory reclaimers. It grants access to _all_ the available
> memory so any abuse can be detrimental to the overall system operation.
> Allocation failure in this mode means that we are out of memory and any
> code relying on such an allocation has to carefuly consider failure.
> This is not a random allocation mode.

Agreed, that's why I don't like having these kind of automagic critical
sections. It's a bit a shotgun approach. Paul said that the code would
handle failures, but the problem is that it applies everywhere.

Anyway my understanding is that call_rcu will be reworked and gain a pile
of tricks so that these problems for the callchains leading to call_rcu
all disappear.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-29 Thread Michal Hocko
On Wed 16-09-20 23:43:02, Daniel Vetter wrote:
> I can
> then figure out whether it's better to risk not spotting issues with
> call_rcu vs slapping a memalloc_noio_save/restore around all these
> critical section which force-degrades any allocation to GFP_ATOMIC at

did you mean memalloc_noreclaim_* here?

> most, but has the risk that we run into code that assumes "GFP_KERNEL
> never fails for small stuff" and has a decidedly less tested fallback
> path than rcu code.

Even if the above then please note that memalloc_noreclaim_* or
PF_MEMALLOC should be used with an extreme care. Essentially only for
internal memory reclaimers. It grants access to _all_ the available
memory so any abuse can be detrimental to the overall system operation.
Allocation failure in this mode means that we are out of memory and any
code relying on such an allocation has to carefuly consider failure.
This is not a random allocation mode.

-- 
Michal Hocko
SUSE Labs


Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-17 Thread Paul E. McKenney
On Thu, Sep 17, 2020 at 09:52:30AM +0200, Daniel Vetter wrote:
> On Thu, Sep 17, 2020 at 12:39 AM Paul E. McKenney  wrote:
> >
> > On Wed, Sep 16, 2020 at 11:43:02PM +0200, Daniel Vetter wrote:
> > > On Wed, Sep 16, 2020 at 10:58 PM Paul E. McKenney  
> > > wrote:
> > > >
> > > > On Wed, Sep 16, 2020 at 10:29:06PM +0200, Daniel Vetter wrote:
> > > > > On Wed, Sep 16, 2020 at 5:29 PM Paul E. McKenney  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Sep 16, 2020 at 09:37:17AM +0200, Daniel Vetter wrote:
> > > > > > > On Tue, Sep 15, 2020 at 7:35 PM Linus Torvalds
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, Sep 15, 2020 at 1:39 AM Thomas Gleixner 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > OTOH, having a working 'preemptible()' or maybe better named
> > > > > > > > > 'can_schedule()' check makes tons of sense to make decisions 
> > > > > > > > > about
> > > > > > > > > allocation modes or other things.
> > > > > > > >
> > > > > > > > No. I think that those kinds of decisions about actual behavior 
> > > > > > > > are
> > > > > > > > always simply fundamentally wrong.
> > > > > > > >
> > > > > > > > Note that this is very different from having warnings about 
> > > > > > > > invalid
> > > > > > > > use. THAT is correct. It may not warn in all configurations, 
> > > > > > > > but that
> > > > > > > > doesn't matter: what matters is that it warns in common enough
> > > > > > > > configurations that developers will catch it.
> > > > > > > >
> > > > > > > > So having a warning in "might_sleep()" that doesn't always 
> > > > > > > > trigger,
> > > > > > > > because you have a limited configuration that can't even detect 
> > > > > > > > the
> > > > > > > > situation, that's fine and dandy and intentional.
> > > > > > > >
> > > > > > > > But having code like
> > > > > > > >
> > > > > > > >if (can_schedule())
> > > > > > > >.. do something different ..
> > > > > > > >
> > > > > > > > is fundamentally complete and utter garbage.
> > > > > > > >
> > > > > > > > It's one thing if you test for "am I in hardware interrupt 
> > > > > > > > context".
> > > > > > > > Those tests aren't great either, but at least they make sense.
> > > > > > > >
> > > > > > > > But a driver - or some library routine - making a difference 
> > > > > > > > based on
> > > > > > > > some nebulous "can I schedule" is fundamentally and basically 
> > > > > > > > WRONG.
> > > > > > > >
> > > > > > > > If some code changes behavior, it needs to be explicit to the 
> > > > > > > > *caller*
> > > > > > > > of that code.
> > > > > > > >
> > > > > > > > So this is why GFP_ATOMIC is fine, but "if (!can_schedule())
> > > > > > > > do_something_atomic()" is pure shite.
> > > > > > > >
> > > > > > > > And I am not IN THE LEAST interested in trying to help people 
> > > > > > > > doing
> > > > > > > > pure shite. We need to fix them. Like the crypto code is getting
> > > > > > > > fixed.
> > > > > > >
> > > > > > > Just figured I'll throw my +1 in from reading too many (gpu) 
> > > > > > > drivers.
> > > > > > > Code that tries to cleverly adjust its behaviour depending upon 
> > > > > > > the
> > > > > > > context it's running in is harder to understand and blows up in 
> > > > > > > more
> > > > > > > interesting ways. We still have drm_can_sleep() and it's mostly 
> > > > > > > just
> > > > > > > used for debug code, and I've largely ended up just deleting
> > > > > > > everything that used it because when you're driver is blowing up 
> > > > > > > the
> > > > > > > last thing you want is to realize your debug code and output 
> > > > > > > can't be
> > > > > > > relied upon. Or worse, that the only Oops you have is the one in 
> > > > > > > the
> > > > > > > debug code, because the real one scrolled away - the original idea
> > > > > > > behind drm_can_sleep was to make all the modeset code work
> > > > > > > automagically both in normal ioctl/kworker context and in the 
> > > > > > > panic
> > > > > > > handlers or kgdb callbacks. Wishful thinking at best.
> > > > > > >
> > > > > > > Also at least for me that extends to everything, e.g. I much 
> > > > > > > prefer
> > > > > > > explicit spin_lock and spin_lock_irq vs magic spin_lock_irqsave 
> > > > > > > for
> > > > > > > locks shared with interrupt handlers, since the former two gives 
> > > > > > > me
> > > > > > > clear information from which contexts such function can be called.
> > > > > > > Other end is the memalloc_no*_save/restore functions, where I 
> > > > > > > recently
> > > > > > > made a real big fool of myself because I didn't realize how much 
> > > > > > > that
> > > > > > > impacts everything that's run within - suddenly "GFP_KERNEL for 
> > > > > > > small
> > > > > > > stuff never fails" is wrong everywhere.
> > > > > > >
> > > > > > > It's all great for debugging and sanity checks (and we run with 
> > > > > > > all
> > > > > > > that stuff enabled in our CI), but really semantic changes 
> > > > > > > depending
> > > > > > 

Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-17 Thread Daniel Vetter
On Thu, Sep 17, 2020 at 12:39 AM Paul E. McKenney  wrote:
>
> On Wed, Sep 16, 2020 at 11:43:02PM +0200, Daniel Vetter wrote:
> > On Wed, Sep 16, 2020 at 10:58 PM Paul E. McKenney  
> > wrote:
> > >
> > > On Wed, Sep 16, 2020 at 10:29:06PM +0200, Daniel Vetter wrote:
> > > > On Wed, Sep 16, 2020 at 5:29 PM Paul E. McKenney  
> > > > wrote:
> > > > >
> > > > > On Wed, Sep 16, 2020 at 09:37:17AM +0200, Daniel Vetter wrote:
> > > > > > On Tue, Sep 15, 2020 at 7:35 PM Linus Torvalds
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Sep 15, 2020 at 1:39 AM Thomas Gleixner 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > OTOH, having a working 'preemptible()' or maybe better named
> > > > > > > > 'can_schedule()' check makes tons of sense to make decisions 
> > > > > > > > about
> > > > > > > > allocation modes or other things.
> > > > > > >
> > > > > > > No. I think that those kinds of decisions about actual behavior 
> > > > > > > are
> > > > > > > always simply fundamentally wrong.
> > > > > > >
> > > > > > > Note that this is very different from having warnings about 
> > > > > > > invalid
> > > > > > > use. THAT is correct. It may not warn in all configurations, but 
> > > > > > > that
> > > > > > > doesn't matter: what matters is that it warns in common enough
> > > > > > > configurations that developers will catch it.
> > > > > > >
> > > > > > > So having a warning in "might_sleep()" that doesn't always 
> > > > > > > trigger,
> > > > > > > because you have a limited configuration that can't even detect 
> > > > > > > the
> > > > > > > situation, that's fine and dandy and intentional.
> > > > > > >
> > > > > > > But having code like
> > > > > > >
> > > > > > >if (can_schedule())
> > > > > > >.. do something different ..
> > > > > > >
> > > > > > > is fundamentally complete and utter garbage.
> > > > > > >
> > > > > > > It's one thing if you test for "am I in hardware interrupt 
> > > > > > > context".
> > > > > > > Those tests aren't great either, but at least they make sense.
> > > > > > >
> > > > > > > But a driver - or some library routine - making a difference 
> > > > > > > based on
> > > > > > > some nebulous "can I schedule" is fundamentally and basically 
> > > > > > > WRONG.
> > > > > > >
> > > > > > > If some code changes behavior, it needs to be explicit to the 
> > > > > > > *caller*
> > > > > > > of that code.
> > > > > > >
> > > > > > > So this is why GFP_ATOMIC is fine, but "if (!can_schedule())
> > > > > > > do_something_atomic()" is pure shite.
> > > > > > >
> > > > > > > And I am not IN THE LEAST interested in trying to help people 
> > > > > > > doing
> > > > > > > pure shite. We need to fix them. Like the crypto code is getting
> > > > > > > fixed.
> > > > > >
> > > > > > Just figured I'll throw my +1 in from reading too many (gpu) 
> > > > > > drivers.
> > > > > > Code that tries to cleverly adjust its behaviour depending upon the
> > > > > > context it's running in is harder to understand and blows up in more
> > > > > > interesting ways. We still have drm_can_sleep() and it's mostly just
> > > > > > used for debug code, and I've largely ended up just deleting
> > > > > > everything that used it because when you're driver is blowing up the
> > > > > > last thing you want is to realize your debug code and output can't 
> > > > > > be
> > > > > > relied upon. Or worse, that the only Oops you have is the one in the
> > > > > > debug code, because the real one scrolled away - the original idea
> > > > > > behind drm_can_sleep was to make all the modeset code work
> > > > > > automagically both in normal ioctl/kworker context and in the panic
> > > > > > handlers or kgdb callbacks. Wishful thinking at best.
> > > > > >
> > > > > > Also at least for me that extends to everything, e.g. I much prefer
> > > > > > explicit spin_lock and spin_lock_irq vs magic spin_lock_irqsave for
> > > > > > locks shared with interrupt handlers, since the former two gives me
> > > > > > clear information from which contexts such function can be called.
> > > > > > Other end is the memalloc_no*_save/restore functions, where I 
> > > > > > recently
> > > > > > made a real big fool of myself because I didn't realize how much 
> > > > > > that
> > > > > > impacts everything that's run within - suddenly "GFP_KERNEL for 
> > > > > > small
> > > > > > stuff never fails" is wrong everywhere.
> > > > > >
> > > > > > It's all great for debugging and sanity checks (and we run with all
> > > > > > that stuff enabled in our CI), but really semantic changes depending
> > > > > > upon magic context checks freak my out :-)
> > > > >
> > > > > All fair, but some of us need to write code that must handle being
> > > > > invoked from a wide variety of contexts.  Now perhaps you like the 
> > > > > idea of
> > > > > call_rcu() for schedulable contexts, call_rcu_nosched() when 
> > > > > preemption
> > > > > is disabled, call_rcu_irqs_are_disabled() when interrupts are 
> > > > > disabled,
> > > > 

Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-17 Thread Ard Biesheuvel
On Wed, 16 Sep 2020 at 21:32, Linus Torvalds
 wrote:
>
> But something like a driver list walking thing should not be doing
> different things behind peoples back depending on whether they hold
> spinlocks or not. It should either just work regardless, or there
> should be a flag (or special interface) for the "you're being called
> in a crtitical region".
>
> Because dynamically changing behavior really is very confusing.
>

By the same reasoning, I don't think a generic crypto library should
be playing tricks with preemption en/disabling under the hood when
iterating over some data that is all directly accessible via the
linear map on the platforms that most people care about. And using
kmap_atomic() unconditionally achieves exactly that.

As I argued before, the fact that kmap_atomic() can be called from an
atomic context, and the fact that its implementation on HIGHMEM
platforms requires preemption to be disabled until the next kunmap()
are two different things, and I don't agree with your assertion that
the name kmap_atomic() implies the latter semantics. If we can avoid
disabling preemption on HIGHMEM, as Thomas suggests, we surely don't
need it on !HIGHMEM either, and given that kmap_atomic() is preferred
today anyway, we can just merge the two implementations. Are there any
existing debug features that could help us spot [ab]use of things like
raw per-CPU data within kmap_atomic regions?

Re your point about deprecating HIGHMEM: some work is underway on ARM
to implement a 3.75/3.75 GB kernel/user split on recent LPAE capable
hardware (which shouldn't suffer from the performance issues that
plagued the 4/4 split on i686), and so hopefully, there is a path
forward for ARM that does not rely on HIGHMEM as it does today.


Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-16 Thread Paul E. McKenney
On Wed, Sep 16, 2020 at 11:43:02PM +0200, Daniel Vetter wrote:
> On Wed, Sep 16, 2020 at 10:58 PM Paul E. McKenney  wrote:
> >
> > On Wed, Sep 16, 2020 at 10:29:06PM +0200, Daniel Vetter wrote:
> > > On Wed, Sep 16, 2020 at 5:29 PM Paul E. McKenney  
> > > wrote:
> > > >
> > > > On Wed, Sep 16, 2020 at 09:37:17AM +0200, Daniel Vetter wrote:
> > > > > On Tue, Sep 15, 2020 at 7:35 PM Linus Torvalds
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Sep 15, 2020 at 1:39 AM Thomas Gleixner 
> > > > > >  wrote:
> > > > > > >
> > > > > > > OTOH, having a working 'preemptible()' or maybe better named
> > > > > > > 'can_schedule()' check makes tons of sense to make decisions about
> > > > > > > allocation modes or other things.
> > > > > >
> > > > > > No. I think that those kinds of decisions about actual behavior are
> > > > > > always simply fundamentally wrong.
> > > > > >
> > > > > > Note that this is very different from having warnings about invalid
> > > > > > use. THAT is correct. It may not warn in all configurations, but 
> > > > > > that
> > > > > > doesn't matter: what matters is that it warns in common enough
> > > > > > configurations that developers will catch it.
> > > > > >
> > > > > > So having a warning in "might_sleep()" that doesn't always trigger,
> > > > > > because you have a limited configuration that can't even detect the
> > > > > > situation, that's fine and dandy and intentional.
> > > > > >
> > > > > > But having code like
> > > > > >
> > > > > >if (can_schedule())
> > > > > >.. do something different ..
> > > > > >
> > > > > > is fundamentally complete and utter garbage.
> > > > > >
> > > > > > It's one thing if you test for "am I in hardware interrupt context".
> > > > > > Those tests aren't great either, but at least they make sense.
> > > > > >
> > > > > > But a driver - or some library routine - making a difference based 
> > > > > > on
> > > > > > some nebulous "can I schedule" is fundamentally and basically WRONG.
> > > > > >
> > > > > > If some code changes behavior, it needs to be explicit to the 
> > > > > > *caller*
> > > > > > of that code.
> > > > > >
> > > > > > So this is why GFP_ATOMIC is fine, but "if (!can_schedule())
> > > > > > do_something_atomic()" is pure shite.
> > > > > >
> > > > > > And I am not IN THE LEAST interested in trying to help people doing
> > > > > > pure shite. We need to fix them. Like the crypto code is getting
> > > > > > fixed.
> > > > >
> > > > > Just figured I'll throw my +1 in from reading too many (gpu) drivers.
> > > > > Code that tries to cleverly adjust its behaviour depending upon the
> > > > > context it's running in is harder to understand and blows up in more
> > > > > interesting ways. We still have drm_can_sleep() and it's mostly just
> > > > > used for debug code, and I've largely ended up just deleting
> > > > > everything that used it because when you're driver is blowing up the
> > > > > last thing you want is to realize your debug code and output can't be
> > > > > relied upon. Or worse, that the only Oops you have is the one in the
> > > > > debug code, because the real one scrolled away - the original idea
> > > > > behind drm_can_sleep was to make all the modeset code work
> > > > > automagically both in normal ioctl/kworker context and in the panic
> > > > > handlers or kgdb callbacks. Wishful thinking at best.
> > > > >
> > > > > Also at least for me that extends to everything, e.g. I much prefer
> > > > > explicit spin_lock and spin_lock_irq vs magic spin_lock_irqsave for
> > > > > locks shared with interrupt handlers, since the former two gives me
> > > > > clear information from which contexts such function can be called.
> > > > > Other end is the memalloc_no*_save/restore functions, where I recently
> > > > > made a real big fool of myself because I didn't realize how much that
> > > > > impacts everything that's run within - suddenly "GFP_KERNEL for small
> > > > > stuff never fails" is wrong everywhere.
> > > > >
> > > > > It's all great for debugging and sanity checks (and we run with all
> > > > > that stuff enabled in our CI), but really semantic changes depending
> > > > > upon magic context checks freak my out :-)
> > > >
> > > > All fair, but some of us need to write code that must handle being
> > > > invoked from a wide variety of contexts.  Now perhaps you like the idea 
> > > > of
> > > > call_rcu() for schedulable contexts, call_rcu_nosched() when preemption
> > > > is disabled, call_rcu_irqs_are_disabled() when interrupts are disabled,
> > > > call_rcu_raw_atomic() from contexts where (for example) raw spinlocks
> > > > are held, and so on.  However, from what I can see, most people instead
> > > > consistently prefer that the RCU API instead be consolidated.
> > > >
> > > > Some in-flight cache-efficiency work for kvfree_rcu() and call_rcu()
> > > > needs to be able to allocate memory occasionally.  It can do that when
> > > > invoked from some contexts, but not when invoked from 

Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-16 Thread Daniel Vetter
On Wed, Sep 16, 2020 at 10:58 PM Paul E. McKenney  wrote:
>
> On Wed, Sep 16, 2020 at 10:29:06PM +0200, Daniel Vetter wrote:
> > On Wed, Sep 16, 2020 at 5:29 PM Paul E. McKenney  wrote:
> > >
> > > On Wed, Sep 16, 2020 at 09:37:17AM +0200, Daniel Vetter wrote:
> > > > On Tue, Sep 15, 2020 at 7:35 PM Linus Torvalds
> > > >  wrote:
> > > > >
> > > > > On Tue, Sep 15, 2020 at 1:39 AM Thomas Gleixner  
> > > > > wrote:
> > > > > >
> > > > > > OTOH, having a working 'preemptible()' or maybe better named
> > > > > > 'can_schedule()' check makes tons of sense to make decisions about
> > > > > > allocation modes or other things.
> > > > >
> > > > > No. I think that those kinds of decisions about actual behavior are
> > > > > always simply fundamentally wrong.
> > > > >
> > > > > Note that this is very different from having warnings about invalid
> > > > > use. THAT is correct. It may not warn in all configurations, but that
> > > > > doesn't matter: what matters is that it warns in common enough
> > > > > configurations that developers will catch it.
> > > > >
> > > > > So having a warning in "might_sleep()" that doesn't always trigger,
> > > > > because you have a limited configuration that can't even detect the
> > > > > situation, that's fine and dandy and intentional.
> > > > >
> > > > > But having code like
> > > > >
> > > > >if (can_schedule())
> > > > >.. do something different ..
> > > > >
> > > > > is fundamentally complete and utter garbage.
> > > > >
> > > > > It's one thing if you test for "am I in hardware interrupt context".
> > > > > Those tests aren't great either, but at least they make sense.
> > > > >
> > > > > But a driver - or some library routine - making a difference based on
> > > > > some nebulous "can I schedule" is fundamentally and basically WRONG.
> > > > >
> > > > > If some code changes behavior, it needs to be explicit to the *caller*
> > > > > of that code.
> > > > >
> > > > > So this is why GFP_ATOMIC is fine, but "if (!can_schedule())
> > > > > do_something_atomic()" is pure shite.
> > > > >
> > > > > And I am not IN THE LEAST interested in trying to help people doing
> > > > > pure shite. We need to fix them. Like the crypto code is getting
> > > > > fixed.
> > > >
> > > > Just figured I'll throw my +1 in from reading too many (gpu) drivers.
> > > > Code that tries to cleverly adjust its behaviour depending upon the
> > > > context it's running in is harder to understand and blows up in more
> > > > interesting ways. We still have drm_can_sleep() and it's mostly just
> > > > used for debug code, and I've largely ended up just deleting
> > > > everything that used it because when you're driver is blowing up the
> > > > last thing you want is to realize your debug code and output can't be
> > > > relied upon. Or worse, that the only Oops you have is the one in the
> > > > debug code, because the real one scrolled away - the original idea
> > > > behind drm_can_sleep was to make all the modeset code work
> > > > automagically both in normal ioctl/kworker context and in the panic
> > > > handlers or kgdb callbacks. Wishful thinking at best.
> > > >
> > > > Also at least for me that extends to everything, e.g. I much prefer
> > > > explicit spin_lock and spin_lock_irq vs magic spin_lock_irqsave for
> > > > locks shared with interrupt handlers, since the former two gives me
> > > > clear information from which contexts such function can be called.
> > > > Other end is the memalloc_no*_save/restore functions, where I recently
> > > > made a real big fool of myself because I didn't realize how much that
> > > > impacts everything that's run within - suddenly "GFP_KERNEL for small
> > > > stuff never fails" is wrong everywhere.
> > > >
> > > > It's all great for debugging and sanity checks (and we run with all
> > > > that stuff enabled in our CI), but really semantic changes depending
> > > > upon magic context checks freak my out :-)
> > >
> > > All fair, but some of us need to write code that must handle being
> > > invoked from a wide variety of contexts.  Now perhaps you like the idea of
> > > call_rcu() for schedulable contexts, call_rcu_nosched() when preemption
> > > is disabled, call_rcu_irqs_are_disabled() when interrupts are disabled,
> > > call_rcu_raw_atomic() from contexts where (for example) raw spinlocks
> > > are held, and so on.  However, from what I can see, most people instead
> > > consistently prefer that the RCU API instead be consolidated.
> > >
> > > Some in-flight cache-efficiency work for kvfree_rcu() and call_rcu()
> > > needs to be able to allocate memory occasionally.  It can do that when
> > > invoked from some contexts, but not when invoked from others.  Right now,
> > > in !PREEMPT kernels, it cannot tell, and must either do things to the
> > > memory allocators that some of the MM hate or must unnecessarily invoke
> > > workqueues.  Thomas's patches would allow the code to just allocate in
> > > the common case when these primitives 

Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-16 Thread Paul E. McKenney
On Wed, Sep 16, 2020 at 10:29:06PM +0200, Daniel Vetter wrote:
> On Wed, Sep 16, 2020 at 5:29 PM Paul E. McKenney  wrote:
> >
> > On Wed, Sep 16, 2020 at 09:37:17AM +0200, Daniel Vetter wrote:
> > > On Tue, Sep 15, 2020 at 7:35 PM Linus Torvalds
> > >  wrote:
> > > >
> > > > On Tue, Sep 15, 2020 at 1:39 AM Thomas Gleixner  
> > > > wrote:
> > > > >
> > > > > OTOH, having a working 'preemptible()' or maybe better named
> > > > > 'can_schedule()' check makes tons of sense to make decisions about
> > > > > allocation modes or other things.
> > > >
> > > > No. I think that those kinds of decisions about actual behavior are
> > > > always simply fundamentally wrong.
> > > >
> > > > Note that this is very different from having warnings about invalid
> > > > use. THAT is correct. It may not warn in all configurations, but that
> > > > doesn't matter: what matters is that it warns in common enough
> > > > configurations that developers will catch it.
> > > >
> > > > So having a warning in "might_sleep()" that doesn't always trigger,
> > > > because you have a limited configuration that can't even detect the
> > > > situation, that's fine and dandy and intentional.
> > > >
> > > > But having code like
> > > >
> > > >if (can_schedule())
> > > >.. do something different ..
> > > >
> > > > is fundamentally complete and utter garbage.
> > > >
> > > > It's one thing if you test for "am I in hardware interrupt context".
> > > > Those tests aren't great either, but at least they make sense.
> > > >
> > > > But a driver - or some library routine - making a difference based on
> > > > some nebulous "can I schedule" is fundamentally and basically WRONG.
> > > >
> > > > If some code changes behavior, it needs to be explicit to the *caller*
> > > > of that code.
> > > >
> > > > So this is why GFP_ATOMIC is fine, but "if (!can_schedule())
> > > > do_something_atomic()" is pure shite.
> > > >
> > > > And I am not IN THE LEAST interested in trying to help people doing
> > > > pure shite. We need to fix them. Like the crypto code is getting
> > > > fixed.
> > >
> > > Just figured I'll throw my +1 in from reading too many (gpu) drivers.
> > > Code that tries to cleverly adjust its behaviour depending upon the
> > > context it's running in is harder to understand and blows up in more
> > > interesting ways. We still have drm_can_sleep() and it's mostly just
> > > used for debug code, and I've largely ended up just deleting
> > > everything that used it because when you're driver is blowing up the
> > > last thing you want is to realize your debug code and output can't be
> > > relied upon. Or worse, that the only Oops you have is the one in the
> > > debug code, because the real one scrolled away - the original idea
> > > behind drm_can_sleep was to make all the modeset code work
> > > automagically both in normal ioctl/kworker context and in the panic
> > > handlers or kgdb callbacks. Wishful thinking at best.
> > >
> > > Also at least for me that extends to everything, e.g. I much prefer
> > > explicit spin_lock and spin_lock_irq vs magic spin_lock_irqsave for
> > > locks shared with interrupt handlers, since the former two gives me
> > > clear information from which contexts such function can be called.
> > > Other end is the memalloc_no*_save/restore functions, where I recently
> > > made a real big fool of myself because I didn't realize how much that
> > > impacts everything that's run within - suddenly "GFP_KERNEL for small
> > > stuff never fails" is wrong everywhere.
> > >
> > > It's all great for debugging and sanity checks (and we run with all
> > > that stuff enabled in our CI), but really semantic changes depending
> > > upon magic context checks freak my out :-)
> >
> > All fair, but some of us need to write code that must handle being
> > invoked from a wide variety of contexts.  Now perhaps you like the idea of
> > call_rcu() for schedulable contexts, call_rcu_nosched() when preemption
> > is disabled, call_rcu_irqs_are_disabled() when interrupts are disabled,
> > call_rcu_raw_atomic() from contexts where (for example) raw spinlocks
> > are held, and so on.  However, from what I can see, most people instead
> > consistently prefer that the RCU API instead be consolidated.
> >
> > Some in-flight cache-efficiency work for kvfree_rcu() and call_rcu()
> > needs to be able to allocate memory occasionally.  It can do that when
> > invoked from some contexts, but not when invoked from others.  Right now,
> > in !PREEMPT kernels, it cannot tell, and must either do things to the
> > memory allocators that some of the MM hate or must unnecessarily invoke
> > workqueues.  Thomas's patches would allow the code to just allocate in
> > the common case when these primitives are invoked from contexts where
> > allocation is permitted.
> >
> > If we want to restrict access to the can_schedule() or whatever primitive,
> > fine and good.  We can add a check to checkpatch.pl, for example.  Maybe
> > we can go back 

Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-16 Thread Paul E. McKenney
On Wed, Sep 16, 2020 at 08:23:52PM +0100, Matthew Wilcox wrote:
> On Mon, Sep 14, 2020 at 11:55:24PM +0200, Thomas Gleixner wrote:
> > But just look at any check which uses preemptible(), especially those
> > which check !preemptible():
> 
> hmm.
> 
> +++ b/include/linux/preempt.h
> @@ -180,7 +180,9 @@ do { \
>  
>  #define preempt_enable_no_resched() sched_preempt_enable_no_resched()
>  
> +#ifndef MODULE
>  #define preemptible()  (preempt_count() == 0 && !irqs_disabled())
> +#endif
>  
>  #ifdef CONFIG_PREEMPTION
>  #define preempt_enable() \
> 
> 
> $ git grep -w preemptible drivers
> (slightly trimmed by hand to remove, eg, comments)
> drivers/firmware/arm_sdei.c:WARN_ON_ONCE(preemptible());
> drivers/firmware/arm_sdei.c:WARN_ON_ONCE(preemptible());
> drivers/firmware/arm_sdei.c:WARN_ON_ONCE(preemptible());
> drivers/firmware/arm_sdei.c:WARN_ON_ONCE(preemptible());
> drivers/firmware/arm_sdei.c:WARN_ON(preemptible());
> drivers/firmware/efi/efi-pstore.c:preemptible(), 
> record->size, record->psi->buf);
> drivers/irqchip/irq-gic-v4.c:   WARN_ON(preemptible());
> drivers/irqchip/irq-gic-v4.c:   WARN_ON(preemptible());
> drivers/scsi/hisi_sas/hisi_sas_main.c:  if (!preemptible())
> drivers/xen/time.c: BUG_ON(preemptible());
> 
> That only looks like two drivers that need more than WARNectomies.

I could easily imagine someone thinking that these did something in
CONFIG_PREEMPT_NONE=y kernels.  In fact, I could easily imagine myself
making that mistake.  :-/

> Although maybe rcu_read_load_sched_held() or rcu_read_lock_any_held()
> might get called from a module ...

But yes, from the rcutorture module for certain and also from any other
RCU-using module that includes the usual RCU debug checks.

Thanx, Paul


Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-16 Thread Paul E. McKenney
On Wed, Sep 16, 2020 at 11:32:00AM -0700, Linus Torvalds wrote:
> On Wed, Sep 16, 2020 at 8:29 AM Paul E. McKenney  wrote:
> >
> > All fair, but some of us need to write code that must handle being
> > invoked from a wide variety of contexts.
> 
> Note that I think that core functionality is different from random drivers.
> 
> Of course core code can (and will) look at things like
> 
> if (in_interrupt())
> .. schedule work asynchronously ..
> 
> because core code ends up being called from odd places, and code like
> that is expected to have understanding of the rules it plays with.
> 
> But something like RCU is a very different beast from some "walk the
> scatter-gather list" code.
> 
> RCU does its work in the background, and works with lots of different
> things. And it's so core and used everywhere that it knows about these
> things. I mean, we literally have special code explicitly to let RCU
> know "we entered kernel context now".
> 
> But something like a driver list walking thing should not be doing
> different things behind peoples back depending on whether they hold
> spinlocks or not. It should either just work regardless, or there
> should be a flag (or special interface) for the "you're being called
> in a crtitical region".
> 
> Because dynamically changing behavior really is very confusing.

Whew!  I feel much better now.  ;-)

Thanx, Paul


Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-16 Thread Daniel Vetter
On Wed, Sep 16, 2020 at 5:29 PM Paul E. McKenney  wrote:
>
> On Wed, Sep 16, 2020 at 09:37:17AM +0200, Daniel Vetter wrote:
> > On Tue, Sep 15, 2020 at 7:35 PM Linus Torvalds
> >  wrote:
> > >
> > > On Tue, Sep 15, 2020 at 1:39 AM Thomas Gleixner  
> > > wrote:
> > > >
> > > > OTOH, having a working 'preemptible()' or maybe better named
> > > > 'can_schedule()' check makes tons of sense to make decisions about
> > > > allocation modes or other things.
> > >
> > > No. I think that those kinds of decisions about actual behavior are
> > > always simply fundamentally wrong.
> > >
> > > Note that this is very different from having warnings about invalid
> > > use. THAT is correct. It may not warn in all configurations, but that
> > > doesn't matter: what matters is that it warns in common enough
> > > configurations that developers will catch it.
> > >
> > > So having a warning in "might_sleep()" that doesn't always trigger,
> > > because you have a limited configuration that can't even detect the
> > > situation, that's fine and dandy and intentional.
> > >
> > > But having code like
> > >
> > >if (can_schedule())
> > >.. do something different ..
> > >
> > > is fundamentally complete and utter garbage.
> > >
> > > It's one thing if you test for "am I in hardware interrupt context".
> > > Those tests aren't great either, but at least they make sense.
> > >
> > > But a driver - or some library routine - making a difference based on
> > > some nebulous "can I schedule" is fundamentally and basically WRONG.
> > >
> > > If some code changes behavior, it needs to be explicit to the *caller*
> > > of that code.
> > >
> > > So this is why GFP_ATOMIC is fine, but "if (!can_schedule())
> > > do_something_atomic()" is pure shite.
> > >
> > > And I am not IN THE LEAST interested in trying to help people doing
> > > pure shite. We need to fix them. Like the crypto code is getting
> > > fixed.
> >
> > Just figured I'll throw my +1 in from reading too many (gpu) drivers.
> > Code that tries to cleverly adjust its behaviour depending upon the
> > context it's running in is harder to understand and blows up in more
> > interesting ways. We still have drm_can_sleep() and it's mostly just
> > used for debug code, and I've largely ended up just deleting
> > everything that used it because when you're driver is blowing up the
> > last thing you want is to realize your debug code and output can't be
> > relied upon. Or worse, that the only Oops you have is the one in the
> > debug code, because the real one scrolled away - the original idea
> > behind drm_can_sleep was to make all the modeset code work
> > automagically both in normal ioctl/kworker context and in the panic
> > handlers or kgdb callbacks. Wishful thinking at best.
> >
> > Also at least for me that extends to everything, e.g. I much prefer
> > explicit spin_lock and spin_lock_irq vs magic spin_lock_irqsave for
> > locks shared with interrupt handlers, since the former two gives me
> > clear information from which contexts such function can be called.
> > Other end is the memalloc_no*_save/restore functions, where I recently
> > made a real big fool of myself because I didn't realize how much that
> > impacts everything that's run within - suddenly "GFP_KERNEL for small
> > stuff never fails" is wrong everywhere.
> >
> > It's all great for debugging and sanity checks (and we run with all
> > that stuff enabled in our CI), but really semantic changes depending
> > upon magic context checks freak my out :-)
>
> All fair, but some of us need to write code that must handle being
> invoked from a wide variety of contexts.  Now perhaps you like the idea of
> call_rcu() for schedulable contexts, call_rcu_nosched() when preemption
> is disabled, call_rcu_irqs_are_disabled() when interrupts are disabled,
> call_rcu_raw_atomic() from contexts where (for example) raw spinlocks
> are held, and so on.  However, from what I can see, most people instead
> consistently prefer that the RCU API instead be consolidated.
>
> Some in-flight cache-efficiency work for kvfree_rcu() and call_rcu()
> needs to be able to allocate memory occasionally.  It can do that when
> invoked from some contexts, but not when invoked from others.  Right now,
> in !PREEMPT kernels, it cannot tell, and must either do things to the
> memory allocators that some of the MM hate or must unnecessarily invoke
> workqueues.  Thomas's patches would allow the code to just allocate in
> the common case when these primitives are invoked from contexts where
> allocation is permitted.
>
> If we want to restrict access to the can_schedule() or whatever primitive,
> fine and good.  We can add a check to checkpatch.pl, for example.  Maybe
> we can go back to the old brlock approach of requiring certain people's
> review for each addition to the kernel.
>
> But there really are use cases that it would greatly help.

We can deadlock in random fun places if random stuff we're calling
suddenly starts 

Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-16 Thread Matthew Wilcox
On Mon, Sep 14, 2020 at 11:55:24PM +0200, Thomas Gleixner wrote:
> But just look at any check which uses preemptible(), especially those
> which check !preemptible():

hmm.

+++ b/include/linux/preempt.h
@@ -180,7 +180,9 @@ do { \
 
 #define preempt_enable_no_resched() sched_preempt_enable_no_resched()
 
+#ifndef MODULE
 #define preemptible()  (preempt_count() == 0 && !irqs_disabled())
+#endif
 
 #ifdef CONFIG_PREEMPTION
 #define preempt_enable() \


$ git grep -w preemptible drivers
(slightly trimmed by hand to remove, eg, comments)
drivers/firmware/arm_sdei.c:WARN_ON_ONCE(preemptible());
drivers/firmware/arm_sdei.c:WARN_ON_ONCE(preemptible());
drivers/firmware/arm_sdei.c:WARN_ON_ONCE(preemptible());
drivers/firmware/arm_sdei.c:WARN_ON_ONCE(preemptible());
drivers/firmware/arm_sdei.c:WARN_ON(preemptible());
drivers/firmware/efi/efi-pstore.c:preemptible(), 
record->size, record->psi->buf);
drivers/irqchip/irq-gic-v4.c:   WARN_ON(preemptible());
drivers/irqchip/irq-gic-v4.c:   WARN_ON(preemptible());
drivers/scsi/hisi_sas/hisi_sas_main.c:  if (!preemptible())
drivers/xen/time.c: BUG_ON(preemptible());

That only looks like two drivers that need more than WARNectomies.

Although maybe rcu_read_load_sched_held() or rcu_read_lock_any_held()
might get called from a module ...


Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-16 Thread Paul E. McKenney
On Wed, Sep 16, 2020 at 09:37:17AM +0200, Daniel Vetter wrote:
> On Tue, Sep 15, 2020 at 7:35 PM Linus Torvalds
>  wrote:
> >
> > On Tue, Sep 15, 2020 at 1:39 AM Thomas Gleixner  wrote:
> > >
> > > OTOH, having a working 'preemptible()' or maybe better named
> > > 'can_schedule()' check makes tons of sense to make decisions about
> > > allocation modes or other things.
> >
> > No. I think that those kinds of decisions about actual behavior are
> > always simply fundamentally wrong.
> >
> > Note that this is very different from having warnings about invalid
> > use. THAT is correct. It may not warn in all configurations, but that
> > doesn't matter: what matters is that it warns in common enough
> > configurations that developers will catch it.
> >
> > So having a warning in "might_sleep()" that doesn't always trigger,
> > because you have a limited configuration that can't even detect the
> > situation, that's fine and dandy and intentional.
> >
> > But having code like
> >
> >if (can_schedule())
> >.. do something different ..
> >
> > is fundamentally complete and utter garbage.
> >
> > It's one thing if you test for "am I in hardware interrupt context".
> > Those tests aren't great either, but at least they make sense.
> >
> > But a driver - or some library routine - making a difference based on
> > some nebulous "can I schedule" is fundamentally and basically WRONG.
> >
> > If some code changes behavior, it needs to be explicit to the *caller*
> > of that code.
> >
> > So this is why GFP_ATOMIC is fine, but "if (!can_schedule())
> > do_something_atomic()" is pure shite.
> >
> > And I am not IN THE LEAST interested in trying to help people doing
> > pure shite. We need to fix them. Like the crypto code is getting
> > fixed.
> 
> Just figured I'll throw my +1 in from reading too many (gpu) drivers.
> Code that tries to cleverly adjust its behaviour depending upon the
> context it's running in is harder to understand and blows up in more
> interesting ways. We still have drm_can_sleep() and it's mostly just
> used for debug code, and I've largely ended up just deleting
> everything that used it because when you're driver is blowing up the
> last thing you want is to realize your debug code and output can't be
> relied upon. Or worse, that the only Oops you have is the one in the
> debug code, because the real one scrolled away - the original idea
> behind drm_can_sleep was to make all the modeset code work
> automagically both in normal ioctl/kworker context and in the panic
> handlers or kgdb callbacks. Wishful thinking at best.
> 
> Also at least for me that extends to everything, e.g. I much prefer
> explicit spin_lock and spin_lock_irq vs magic spin_lock_irqsave for
> locks shared with interrupt handlers, since the former two gives me
> clear information from which contexts such function can be called.
> Other end is the memalloc_no*_save/restore functions, where I recently
> made a real big fool of myself because I didn't realize how much that
> impacts everything that's run within - suddenly "GFP_KERNEL for small
> stuff never fails" is wrong everywhere.
> 
> It's all great for debugging and sanity checks (and we run with all
> that stuff enabled in our CI), but really semantic changes depending
> upon magic context checks freak my out :-)

All fair, but some of us need to write code that must handle being
invoked from a wide variety of contexts.  Now perhaps you like the idea of
call_rcu() for schedulable contexts, call_rcu_nosched() when preemption
is disabled, call_rcu_irqs_are_disabled() when interrupts are disabled,
call_rcu_raw_atomic() from contexts where (for example) raw spinlocks
are held, and so on.  However, from what I can see, most people instead
consistently prefer that the RCU API instead be consolidated.

Some in-flight cache-efficiency work for kvfree_rcu() and call_rcu()
needs to be able to allocate memory occasionally.  It can do that when
invoked from some contexts, but not when invoked from others.  Right now,
in !PREEMPT kernels, it cannot tell, and must either do things to the
memory allocators that some of the MM hate or must unnecessarily invoke
workqueues.  Thomas's patches would allow the code to just allocate in
the common case when these primitives are invoked from contexts where
allocation is permitted.

If we want to restrict access to the can_schedule() or whatever primitive,
fine and good.  We can add a check to checkpatch.pl, for example.  Maybe
we can go back to the old brlock approach of requiring certain people's
review for each addition to the kernel.

But there really are use cases that it would greatly help.

Thanx, Paul


Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-16 Thread Linus Torvalds
On Tue, Sep 15, 2020 at 12:57 PM Thomas Gleixner  wrote:
>
> You wish. I just found a 7 year old bug in a 10G network driver which
> surely would have been found if people would enable debug configs and
> not just run the crap on their PREEMPT_NONE, all debug off kernel. And
> that driver is not subject to bitrot, it gets regular bug fixes from
> people who seem to care (distro folks).

That driver clearly cannot be very well maintained. All the distro
kernels have the basic debug checks in place, afaik.

Is it some wonderful "enterprise hardware" garbage again that only
gets used in special data centers?

Becasue the "enterprise" people really are special. Very much in the
"short bus" special kind of way. The fact that they have fooled so
much of the industry into thinking that they are the competent and
serious people is a disgrace.

  Linus


Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-16 Thread Linus Torvalds
On Wed, Sep 16, 2020 at 8:29 AM Paul E. McKenney  wrote:
>
> All fair, but some of us need to write code that must handle being
> invoked from a wide variety of contexts.

Note that I think that core functionality is different from random drivers.

Of course core code can (and will) look at things like

if (in_interrupt())
.. schedule work asynchronously ..

because core code ends up being called from odd places, and code like
that is expected to have understanding of the rules it plays with.

But something like RCU is a very different beast from some "walk the
scatter-gather list" code.

RCU does its work in the background, and works with lots of different
things. And it's so core and used everywhere that it knows about these
things. I mean, we literally have special code explicitly to let RCU
know "we entered kernel context now".

But something like a driver list walking thing should not be doing
different things behind peoples back depending on whether they hold
spinlocks or not. It should either just work regardless, or there
should be a flag (or special interface) for the "you're being called
in a crtitical region".

Because dynamically changing behavior really is very confusing.

   Linus


Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-16 Thread Daniel Vetter
On Tue, Sep 15, 2020 at 7:35 PM Linus Torvalds
 wrote:
>
> On Tue, Sep 15, 2020 at 1:39 AM Thomas Gleixner  wrote:
> >
> > OTOH, having a working 'preemptible()' or maybe better named
> > 'can_schedule()' check makes tons of sense to make decisions about
> > allocation modes or other things.
>
> No. I think that those kinds of decisions about actual behavior are
> always simply fundamentally wrong.
>
> Note that this is very different from having warnings about invalid
> use. THAT is correct. It may not warn in all configurations, but that
> doesn't matter: what matters is that it warns in common enough
> configurations that developers will catch it.
>
> So having a warning in "might_sleep()" that doesn't always trigger,
> because you have a limited configuration that can't even detect the
> situation, that's fine and dandy and intentional.
>
> But having code like
>
>if (can_schedule())
>.. do something different ..
>
> is fundamentally complete and utter garbage.
>
> It's one thing if you test for "am I in hardware interrupt context".
> Those tests aren't great either, but at least they make sense.
>
> But a driver - or some library routine - making a difference based on
> some nebulous "can I schedule" is fundamentally and basically WRONG.
>
> If some code changes behavior, it needs to be explicit to the *caller*
> of that code.
>
> So this is why GFP_ATOMIC is fine, but "if (!can_schedule())
> do_something_atomic()" is pure shite.
>
> And I am not IN THE LEAST interested in trying to help people doing
> pure shite. We need to fix them. Like the crypto code is getting
> fixed.

Just figured I'll throw my +1 in from reading too many (gpu) drivers.
Code that tries to cleverly adjust its behaviour depending upon the
context it's running in is harder to understand and blows up in more
interesting ways. We still have drm_can_sleep() and it's mostly just
used for debug code, and I've largely ended up just deleting
everything that used it because when you're driver is blowing up the
last thing you want is to realize your debug code and output can't be
relied upon. Or worse, that the only Oops you have is the one in the
debug code, because the real one scrolled away - the original idea
behind drm_can_sleep was to make all the modeset code work
automagically both in normal ioctl/kworker context and in the panic
handlers or kgdb callbacks. Wishful thinking at best.

Also at least for me that extends to everything, e.g. I much prefer
explicit spin_lock and spin_lock_irq vs magic spin_lock_irqsave for
locks shared with interrupt handlers, since the former two gives me
clear information from which contexts such function can be called.
Other end is the memalloc_no*_save/restore functions, where I recently
made a real big fool of myself because I didn't realize how much that
impacts everything that's run within - suddenly "GFP_KERNEL for small
stuff never fails" is wrong everywhere.

It's all great for debugging and sanity checks (and we run with all
that stuff enabled in our CI), but really semantic changes depending
upon magic context checks freak my out :-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-15 Thread Thomas Gleixner
On Tue, Sep 15 2020 at 10:35, Linus Torvalds wrote:
> On Tue, Sep 15, 2020 at 1:39 AM Thomas Gleixner  wrote:
>>
>> OTOH, having a working 'preemptible()' or maybe better named
>> 'can_schedule()' check makes tons of sense to make decisions about
>> allocation modes or other things.
>
> No. I think that those kinds of decisions about actual behavior are
> always simply fundamentally wrong.
>
> Note that this is very different from having warnings about invalid
> use. THAT is correct. It may not warn in all configurations, but that
> doesn't matter: what matters is that it warns in common enough
> configurations that developers will catch it.

You wish. I just found a 7 year old bug in a 10G network driver which
surely would have been found if people would enable debug configs and
not just run the crap on their PREEMPT_NONE, all debug off kernel. And
that driver is not subject to bitrot, it gets regular bug fixes from
people who seem to care (distro folks).

> So having a warning in "might_sleep()" that doesn't always trigger,
> because you have a limited configuration that can't even detect the
> situation, that's fine and dandy and intentional.

and lets people get away with their crap.

> But having code like
>
>if (can_schedule())
>.. do something different ..
>
> is fundamentally complete and utter garbage.
>
> It's one thing if you test for "am I in hardware interrupt context".
> Those tests aren't great either, but at least they make sense.

They make sense in limited situations like exception handlers and such
which really have to know from which context an exception was raised.

But with the above reasoning such checks do not make sense in any other
general code. 'in hard interrupt context' is just another context where
you can't do stuff which you can do when in preemptible task context.

Most tests are way broader than a single context. in_interrupt() is true
for hard interrupt, soft interrupt delivery and all BH disabled
contexts, which is completely ill defined.

> But a driver - or some library routine - making a difference based on
> some nebulous "can I schedule" is fundamentally and basically WRONG.
>
> If some code changes behavior, it needs to be explicit to the *caller*
> of that code.

I'm fine with that, but then we have to be consequent and ban _all_ of
these and not just declare can_schedule() to be a bad one.

Thanks,

tglx


Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-15 Thread Linus Torvalds
On Tue, Sep 15, 2020 at 1:39 AM Thomas Gleixner  wrote:
>
> OTOH, having a working 'preemptible()' or maybe better named
> 'can_schedule()' check makes tons of sense to make decisions about
> allocation modes or other things.

No. I think that those kinds of decisions about actual behavior are
always simply fundamentally wrong.

Note that this is very different from having warnings about invalid
use. THAT is correct. It may not warn in all configurations, but that
doesn't matter: what matters is that it warns in common enough
configurations that developers will catch it.

So having a warning in "might_sleep()" that doesn't always trigger,
because you have a limited configuration that can't even detect the
situation, that's fine and dandy and intentional.

But having code like

   if (can_schedule())
   .. do something different ..

is fundamentally complete and utter garbage.

It's one thing if you test for "am I in hardware interrupt context".
Those tests aren't great either, but at least they make sense.

But a driver - or some library routine - making a difference based on
some nebulous "can I schedule" is fundamentally and basically WRONG.

If some code changes behavior, it needs to be explicit to the *caller*
of that code.

So this is why GFP_ATOMIC is fine, but "if (!can_schedule())
do_something_atomic()" is pure shite.

And I am not IN THE LEAST interested in trying to help people doing
pure shite. We need to fix them. Like the crypto code is getting
fixed.

   Linus


Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-15 Thread Linus Torvalds
On Tue, Sep 15, 2020 at 12:24 AM Thomas Gleixner  wrote:
>
> Alternatively we just make highmem a bit more expensive by making these
> maps preemptible. RT is doing this for a long time and it's not that
> horrible.

Ack.

In fact, I've wanted to start just removing kmap support entirely. At
some point it's not so much about "I have an old machine that wants
HIGHMEM" but about "I have an old CPU, and I'll just run an old
kernel".

It's not that 32-bit is irrelevant, it's that 32-bit with large
amounts of memory is irrelevant.

Last time this was discussed, iirc the main issue was some
questionable old ARM chips that were still very common in embedded
environments, even with large memory.

But we could definitely start de-emphasizing HIGHMEM.

 Linus


Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-15 Thread Paul E. McKenney
On Mon, Sep 14, 2020 at 01:59:15PM -0700, Linus Torvalds wrote:
> On Mon, Sep 14, 2020 at 1:45 PM Thomas Gleixner  wrote:
> >
> > Recently merged code does:
> >
> >  gfp = preemptible() ? GFP_KERNEL : GFP_ATOMIC;
> >
> > Looks obviously correct, except for the fact that preemptible() is
> > unconditionally false for CONFIF_PREEMPT_COUNT=n, i.e. all allocations in
> > that code use GFP_ATOMIC on such kernels.
> 
> I don't think this is a good reason to entirely get rid of the no-preempt 
> thing.
> 
> The above is just garbage. It's bogus. You can't do it.
> 
> Blaming the no-preempt code for this bug is extremely unfair, imho.
> 
> And the no-preempt code does help make for much better code generation
> for simple spinlocks.
> 
> Where is that horribly buggy recent code? It's not in that exact
> format, certainly, since 'grep' doesn't find it.

It would be convenient for that "gfp =" code to work, as this would
allow better cache locality while invoking RCU callbacks, and would
further provide better robustness to callback floods.  The full story
is quite long, but here are alternatives have not yet been proven to be
abject failures:

1.  Use workqueues to do the allocations in a clean context.
While waiting for the allocations, the callbacks are queued
in the old cache-busting manner.  This functions correctly,
but in the meantime (which on busy systems can be some time)
the cache locality and robustness are lost.

2.  Provide the ability to allocate memory in raw atomic context.
This is extremely effective, especially when used in combination
with #1 above, but as you might suspect, the MM guys don't like
it much.

In contrast, with Thomas's patch series, call_rcu() and kvfree_rcu()
could just look at preemptible() to see whether or not it was safe to
allocate memory, even in !PREEMPT kernels -- and in the common case,
it almost always would be safe.  It is quite possible that this approach
would work in isolation, or failing that, that adding #1 above would do
the trick.

I understand that this is all very hand-wavy, and I do apologize for that.
If you really want the full sad story with performance numbers and the
works, let me know!

Thanx, Paul


Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-15 Thread Thomas Gleixner
On Mon, Sep 14 2020 at 15:24, Linus Torvalds wrote:
> On Mon, Sep 14, 2020 at 2:55 PM Thomas Gleixner  wrote:
>>
>> Yes it does generate better code, but I tried hard to spot a difference
>> in various metrics exposed by perf. It's all in the noise and I only
>> can spot a difference when the actual preemption check after the
>> decrement
>
> I'm somewhat more worried about the small-device case.

I just checked on one of my old UP ARM toys which I run at home. The .text
increase is about 2% (75k) and none of the tests I ran showed any
significant difference. Couldn't verify with perf though as the PMU on
that piece of art is unusable.

> That said, the diffstat certainly has its very clear charm, and I do
> agree that it makes things simpler.
>
> I'm just not convinced people should ever EVER do things like that "if
> (preemptible())" garbage. It sounds like somebody is doing seriously
> bad things.

OTOH, having a working 'preemptible()' or maybe better named
'can_schedule()' check makes tons of sense to make decisions about
allocation modes or other things.

We're currently looking through all of in_atomic(), in_interrupt()
etc. usage sites and quite some of them are historic and have the clear
intent of checking whether the code is called from task context or
hard/softirq context. Lots of them are completely broken or just work by
chance.

But there is clearly historic precendence that context checks are
useful, but they only can be useful if we have a consistent mechanism
which works everywhere.

Of course we could mandate that every interface which might be called
from one or the other context has a context argument or provides two
variants of the same thing. But I'm not really convinced whether that's
a win over having a consistent and reliable set of checks.

Thanks,

tglx






Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-15 Thread Thomas Gleixner
On Mon, Sep 14 2020 at 23:39, Linus Torvalds wrote:
> On Mon, Sep 14, 2020 at 11:24 PM Herbert Xu  
> wrote:
>> > But another reason I tried to avoid kmap_atomic() is that it disables
>> > preemption unconditionally, even on 64-bit architectures where HIGHMEM
>> > is irrelevant. So using kmap_atomic() here means that the bulk of
>> > WireGuard packet encryption runs with preemption disabled, essentially
>> > for legacy reasons.
>>
>> Agreed.  We should definitely fix that.
>
> Well, honestly, one big reason for that is debugging.
>
> The *semantics* of the kmap_atomic() is in the name - you can't sleep
> in between it and the kunmap_atomic().
>
> On any sane architecture, kmap_atomic() ends up being a no-op from an
> implementation standpoint, and sleeping would work just fine.
>
> But we very much want to make sure that people don't then write code
> that doesn't work on the bad old 32-bit machines where it really needs
> that sequence to be safe from preemption.

Alternatively we just make highmem a bit more expensive by making these
maps preemptible. RT is doing this for a long time and it's not that
horrible.

The approach is to keep track about the number of active maps in a task
and on an eventual context switch save them away in the task struct and
restore them when the task is scheduled back in.

Thanks,

tglx




Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-15 Thread Linus Torvalds
On Mon, Sep 14, 2020 at 11:24 PM Herbert Xu  wrote:
>
> On Tue, Sep 15, 2020 at 09:20:59AM +0300, Ard Biesheuvel wrote:
> >
> > The documentation of kmap_atomic() states the following:
> >
> >  * The use of kmap_atomic/kunmap_atomic is discouraged - kmap/kunmap
> >  * gives a more generic (and caching) interface. But kmap_atomic can
> >  * be used in IRQ contexts, so in some (very limited) cases we need
> >  * it.
> >
> > so if this is no longer accurate, perhaps we should fix it?
>
> This hasn't been accurate for at least ten years :)

Yeah, that used to be true a long long time ago, but the comment is very stale.

> > But another reason I tried to avoid kmap_atomic() is that it disables
> > preemption unconditionally, even on 64-bit architectures where HIGHMEM
> > is irrelevant. So using kmap_atomic() here means that the bulk of
> > WireGuard packet encryption runs with preemption disabled, essentially
> > for legacy reasons.
>
> Agreed.  We should definitely fix that.

Well, honestly, one big reason for that is debugging.

The *semantics* of the kmap_atomic() is in the name - you can't sleep
in between it and the kunmap_atomic().

On any sane architecture, kmap_atomic() ends up being a no-op from an
implementation standpoint, and sleeping would work just fine.

But we very much want to make sure that people don't then write code
that doesn't work on the bad old 32-bit machines where it really needs
that sequence to be safe from preemption.

So it's mostly a debug thing.

I say "mostly", because there might be small other details too, like
shared code, and perhaps even a couple of users out in the wild that
depend on the pagefault_disable() inherent in the current
kmap_atomic(), who knows..

So no, the preemption disabling isn't inherent in the operation
itself. But it does have some argument for it.

   Linus


Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-15 Thread Ard Biesheuvel
On Tue, 15 Sep 2020 at 01:43, Linus Torvalds
 wrote:
>
> On Mon, Sep 14, 2020 at 3:24 PM Linus Torvalds
>  wrote:
> >
> > Ard and Herbert added to participants: see
> > chacha20poly1305_crypt_sg_inplace(), which does
> >
> > flags = SG_MITER_TO_SG;
> > if (!preemptible())
> > flags |= SG_MITER_ATOMIC;
> >
> > introduced in commit d95312a3ccc0 ("crypto: lib/chacha20poly1305 -
> > reimplement crypt_from_sg() routine").
>
> As far as I can tell, the only reason for this all is to try to use
> "kmap()" rather than "kmap_atomic()".
>
> And kmap() actually has the much more complex "might_sleep()" tests,
> and apparently the "preemptible()" check wasn't even the proper full
> debug check, it was just a complete hack to catch the one that
> triggered.
>

This was not driven by a failing check.

The documentation of kmap_atomic() states the following:

 * The use of kmap_atomic/kunmap_atomic is discouraged - kmap/kunmap
 * gives a more generic (and caching) interface. But kmap_atomic can
 * be used in IRQ contexts, so in some (very limited) cases we need
 * it.

so if this is no longer accurate, perhaps we should fix it?

But another reason I tried to avoid kmap_atomic() is that it disables
preemption unconditionally, even on 64-bit architectures where HIGHMEM
is irrelevant. So using kmap_atomic() here means that the bulk of
WireGuard packet encryption runs with preemption disabled, essentially
for legacy reasons.


> From a quick look, that code should probably just get rid of
> SG_MITER_ATOMIC entirely, and alwayse use kmap_atomic().
>
> kmap_atomic() is actually the faster and proper interface to use
> anyway (never mind that any of this matters on any sane hardware). The
> old kmap() and kunmap() interfaces should generally be avoided like
> the plague - yes, they allow sleeping in the middle and that is
> sometimes required, but if you don't need that, you should never ever
> use them.
>
> We used to have a very nasty kmap_atomic() that required people to be
> very careful and know exactly which atomic entry to use, and that was
> admitedly quite nasty.
>
> So it _looks_ like this code started using kmap() - probably back when
> kmap_atomic() was so cumbersome to use - and was then converted
> (conditionally) to kmap_atomic() rather than just changed whole-sale.
> Is there actually something that wants to use those sg_miter functions
> and sleep?
>
> Because if there is, that choice should come from the outside, not
> from inside lib/scatterlist.c trying to make some bad guess based on
> the wrong thing entirely.
>
>  Linus