Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible

2020-10-02 Thread Paul E. McKenney
On Fri, Oct 02, 2020 at 08:57:46AM +0200, Michal Hocko wrote:
> On Thu 01-10-20 09:27:09, Paul E. McKenney wrote:
> [...]
> > commit ea5c19d21233b5e8d3d06c0d4ecd6be9f2829dc3
> > Author: Paul E. McKenney 
> > Date:   Thu Oct 1 09:24:40 2020 -0700
> > 
> > kvfree_rcu: Use __GFP_NOMEMALLOC for single-argument kvfree_rcu()
> > 
> > This commit applies the __GFP_NOMEMALLOC gfp flag to memory allocations
> > carried out by the single-argument variant of kvfree_rcu(), thus 
> > avoiding
> > this can-sleep code path from dipping into the emergency reserves.
> > 
> > Suggested-by: Michal Hocko 
> > Signed-off-by: Paul E. McKenney 
> 
> LGTM. At least for this one I feel competent to give you
> Acked-by: Michal Hocko 

Thank you very much!  I will apply this on the next rebase later today,
Pacific Time.

Thanx, Paul

> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 242f0f0..6132452 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3364,7 +3364,8 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
> >  {
> > struct kvfree_rcu_bulk_data *bnode;
> > bool can_alloc_page = preemptible();
> > -   gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) 
> > | __GFP_NOWARN;
> > +   gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL | 
> > __GFP_NOMEMALLOC
> > +  : GFP_ATOMIC) | __GFP_NOWARN;
> > int idx;
> >  
> > *krcp = krc_this_cpu_lock(flags);
> 
> -- 
> Michal Hocko
> SUSE Labs


Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible

2020-10-02 Thread Michal Hocko
On Thu 01-10-20 09:27:09, Paul E. McKenney wrote:
[...]
> commit ea5c19d21233b5e8d3d06c0d4ecd6be9f2829dc3
> Author: Paul E. McKenney 
> Date:   Thu Oct 1 09:24:40 2020 -0700
> 
> kvfree_rcu: Use __GFP_NOMEMALLOC for single-argument kvfree_rcu()
> 
> This commit applies the __GFP_NOMEMALLOC gfp flag to memory allocations
> carried out by the single-argument variant of kvfree_rcu(), thus avoiding
> this can-sleep code path from dipping into the emergency reserves.
> 
> Suggested-by: Michal Hocko 
> Signed-off-by: Paul E. McKenney 

LGTM. At least for this one I feel competent to give you
Acked-by: Michal Hocko 

> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 242f0f0..6132452 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3364,7 +3364,8 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
>  {
>   struct kvfree_rcu_bulk_data *bnode;
>   bool can_alloc_page = preemptible();
> - gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) 
> | __GFP_NOWARN;
> + gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL | 
> __GFP_NOMEMALLOC
> +: GFP_ATOMIC) | __GFP_NOWARN;
>   int idx;
>  
>   *krcp = krc_this_cpu_lock(flags);

-- 
Michal Hocko
SUSE Labs


Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible

2020-10-01 Thread Uladzislau Rezki
On Thu, Oct 01, 2020 at 11:02:20AM +0200, Michal Hocko wrote:
> On Wed 30-09-20 16:21:54, Paul E. McKenney wrote:
> > On Wed, Sep 30, 2020 at 10:41:39AM +0200, Michal Hocko wrote:
> > > On Tue 29-09-20 18:53:27, Paul E. McKenney wrote:
> [...]
> > > > No argument on it being confusing, and I hope that the added header
> > > > comment helps.  But specifically, can_sleep==true is a promise by the
> > > > caller to be schedulable and not to be holding any lock/mutex/whatever
> > > > that might possibly be acquired by the memory allocator or by anything
> > > > else that the memory allocator might invoke, to your point, including
> > > > for but one example the reclaim logic.
> > > > 
> > > > The only way that can_sleep==true is if this function was invoked due
> > > > to a call to single-argument kvfree_rcu(), which must be schedulable
> > > > because its fallback is to invoke synchronize_rcu().
> > > 
> > > OK. I have to say that it is still not clear to me whether this call
> > > path can be called from the memory reclaim context. If yes then you need
> > > __GFP_NOMEMALLOC as well.
> > 
> > Right now the restriction is that single-argument (AKA can_sleep==true)
> > kvfree_rcu() cannot be invoked from memory reclaim context.
> > 
> > But would adding __GFP_NOMEMALLOC to the can_sleep==true GFP_ flags
> > allow us to remove this restriction?  If so, I will queue a separate
> > patch making this change.  The improved ease of use would be well
> > worth it, if I understand correctly (ha!!!).
> 
> It would be quite daring to claim it will be ok but it will certainly be
> less problematic. Adding the flag will not hurt in any case. As this is
> a shared called that might be called from many contexts I think it will
> be safer to have it there. The justification is that it will prevent
> consumption of memory reserves from MEMALLOC contexts.
> 
> > 
> > > [...]
> > > 
> > > > > What is the point of calling kmalloc  for a PAGE_SIZE object? Wouldn't
> > > > > using the page allocator directly be better?
> > > > 
> > > > Well, you guys gave me considerable heat about abusing internal 
> > > > allocator
> > > > interfaces, and kmalloc() and kfree() seem to be about as non-internal
> > > > as you can get and still be invoking the allocator.  ;-)
> > > 
> > > alloc_pages resp. __get_free_pages is a normal page allocator interface
> > > to use for page size granular allocations. kmalloc is for more fine
> > > grained allocations.
> > 
> > OK, in the short term, both work, but I have queued a separate patch
> > making this change and recording the tradeoffs.  This is not yet a
> > promise to push this patch, but it is a promise not to lose this part
> > of the picture.  Please see below.
> 
> It doesn't matter all that much. Both allocators will work. It is just a
> matter of using optimal tool for the specific purose.
> 
> > You mentioned alloc_pages().  I reverted to __get_free_pages(), but
> > alloc_pages() of course looks nicer.  What are the tradeoffs between
> > __get_free_pages() and alloc_pages()?
> 
> alloc_pages will return struct page but you need a kernel pointer. That
> is what __get_free_pages will give you (or you can call page_address
> directly).
> 
> > Thanx, Paul
> > 
> > 
> > 
> > commit 490b638d7c241ac06cee168ccf8688bb8b872478
> > Author: Paul E. McKenney 
> > Date:   Wed Sep 30 16:16:39 2020 -0700
> > 
> > kvfree_rcu(): Switch from kmalloc/kfree to __get_free_page/free_page.
> > 
> > The advantages of using kmalloc() and kfree() are a possible small 
> > speedup
> > on CONFIG_SLAB=y systems, avoiding the allocation-side cast, and use of
> > more-familiar API members.  The advantages of using __get_free_page()
> > and free_page() are a possible reduction in fragmentation and direct
> > access to the buddy allocator.
> > 
> > To help settle the question as to which to use, this commit switches
> > from kmalloc() and kfree() to __get_free_page() and free_page().
> > 
> > Suggested-by: Michal Hocko 
> > Suggested-by: "Uladzislau Rezki (Sony)" 
> > Signed-off-by: Paul E. McKenney 
> 
> Yes, looks good to me. I am not entirely sure about the fragmentation
> argument. It really depends on the SL.B allocator internals. The same
> applies for the potential speed up. I would be even surprised if the
> SLAB was faster in average considering it has to use the page allocator
> as well. So to me the primary motivation would be "use the right tool
> for the purpose".
> 
As for raised a concern about fragmentation, i mostly was thinking about
that SLAbs where not designed to do an efficient allocations for sizes
which are >= than PAGE_SIZE. But it depends on three different
implementations, actually it also a good argument to switch to the page
allocator. I mean to get rid of such dependency.

Other side is, SLABs, at least SLAB and SLUB 

Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible

2020-10-01 Thread Paul E. McKenney
On Thu, Oct 01, 2020 at 11:02:20AM +0200, Michal Hocko wrote:
> On Wed 30-09-20 16:21:54, Paul E. McKenney wrote:

[ . . . ]

Hit "send" too soon, apologies...

> > 
> > 
> > commit 490b638d7c241ac06cee168ccf8688bb8b872478
> > Author: Paul E. McKenney 
> > Date:   Wed Sep 30 16:16:39 2020 -0700
> > 
> > kvfree_rcu(): Switch from kmalloc/kfree to __get_free_page/free_page.
> > 
> > The advantages of using kmalloc() and kfree() are a possible small 
> > speedup
> > on CONFIG_SLAB=y systems, avoiding the allocation-side cast, and use of
> > more-familiar API members.  The advantages of using __get_free_page()
> > and free_page() are a possible reduction in fragmentation and direct
> > access to the buddy allocator.
> > 
> > To help settle the question as to which to use, this commit switches
> > from kmalloc() and kfree() to __get_free_page() and free_page().
> > 
> > Suggested-by: Michal Hocko 
> > Suggested-by: "Uladzislau Rezki (Sony)" 
> > Signed-off-by: Paul E. McKenney 
> 
> Yes, looks good to me. I am not entirely sure about the fragmentation
> argument. It really depends on the SL.B allocator internals. The same
> applies for the potential speed up. I would be even surprised if the
> SLAB was faster in average considering it has to use the page allocator
> as well. So to me the primary motivation would be "use the right tool
> for the purpose".

Very well, I will update the commit message, and thank you!

Thanx, Paul

> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 2886e81..242f0f0 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3225,7 +3225,8 @@ static void kfree_rcu_work(struct work_struct *work)
> > bkvhead[i] = NULL;
> > krc_this_cpu_unlock(krcp, flags);
> >  
> > -   kfree(bkvhead[i]);
> > +   if (bkvhead[i])
> > +   free_page((unsigned long)bkvhead[i]);
> >  
> > cond_resched_tasks_rcu_qs();
> > }
> > @@ -3378,7 +3379,7 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
> > bnode = get_cached_bnode(*krcp);
> > if (!bnode && can_alloc_page) {
> > krc_this_cpu_unlock(*krcp, *flags);
> > -   bnode = kmalloc(PAGE_SIZE, gfp);
> > +   bnode = (struct kvfree_rcu_bulk_data 
> > *)__get_free_page(gfp);
> > *krcp = krc_this_cpu_lock(flags);
> > }
> >  
> 
> -- 
> Michal Hocko
> SUSE Labs


Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible

2020-10-01 Thread Paul E. McKenney
On Thu, Oct 01, 2020 at 11:02:20AM +0200, Michal Hocko wrote:
> On Wed 30-09-20 16:21:54, Paul E. McKenney wrote:
> > On Wed, Sep 30, 2020 at 10:41:39AM +0200, Michal Hocko wrote:
> > > On Tue 29-09-20 18:53:27, Paul E. McKenney wrote:
> [...]
> > > > No argument on it being confusing, and I hope that the added header
> > > > comment helps.  But specifically, can_sleep==true is a promise by the
> > > > caller to be schedulable and not to be holding any lock/mutex/whatever
> > > > that might possibly be acquired by the memory allocator or by anything
> > > > else that the memory allocator might invoke, to your point, including
> > > > for but one example the reclaim logic.
> > > > 
> > > > The only way that can_sleep==true is if this function was invoked due
> > > > to a call to single-argument kvfree_rcu(), which must be schedulable
> > > > because its fallback is to invoke synchronize_rcu().
> > > 
> > > OK. I have to say that it is still not clear to me whether this call
> > > path can be called from the memory reclaim context. If yes then you need
> > > __GFP_NOMEMALLOC as well.
> > 
> > Right now the restriction is that single-argument (AKA can_sleep==true)
> > kvfree_rcu() cannot be invoked from memory reclaim context.
> > 
> > But would adding __GFP_NOMEMALLOC to the can_sleep==true GFP_ flags
> > allow us to remove this restriction?  If so, I will queue a separate
> > patch making this change.  The improved ease of use would be well
> > worth it, if I understand correctly (ha!!!).
> 
> It would be quite daring to claim it will be ok but it will certainly be
> less problematic. Adding the flag will not hurt in any case. As this is
> a shared called that might be called from many contexts I think it will
> be safer to have it there. The justification is that it will prevent
> consumption of memory reserves from MEMALLOC contexts.

Ah, so a different goal (and yes, I finally went over and read the
relevant documentation).  Agreed, the can_sleep path does not really
need to be dipping into the emergency reserves.  And it looks like the
not-from-reclaim restriction is still at least partially in effect,
but one step at a time.

The patch is shown below, which I have queued for a later release.

> > > [...]
> > > 
> > > > > What is the point of calling kmalloc  for a PAGE_SIZE object? Wouldn't
> > > > > using the page allocator directly be better?
> > > > 
> > > > Well, you guys gave me considerable heat about abusing internal 
> > > > allocator
> > > > interfaces, and kmalloc() and kfree() seem to be about as non-internal
> > > > as you can get and still be invoking the allocator.  ;-)
> > > 
> > > alloc_pages resp. __get_free_pages is a normal page allocator interface
> > > to use for page size granular allocations. kmalloc is for more fine
> > > grained allocations.
> > 
> > OK, in the short term, both work, but I have queued a separate patch
> > making this change and recording the tradeoffs.  This is not yet a
> > promise to push this patch, but it is a promise not to lose this part
> > of the picture.  Please see below.
> 
> It doesn't matter all that much. Both allocators will work. It is just a
> matter of using optimal tool for the specific purose.
> 
> > You mentioned alloc_pages().  I reverted to __get_free_pages(), but
> > alloc_pages() of course looks nicer.  What are the tradeoffs between
> > __get_free_pages() and alloc_pages()?
> 
> alloc_pages will return struct page but you need a kernel pointer. That
> is what __get_free_pages will give you (or you can call page_address
> directly).

Thank you, looks like __get_free_pages() is the tool for this job.

Please see below for the aforementioned patch.

Thanx, Paul



commit ea5c19d21233b5e8d3d06c0d4ecd6be9f2829dc3
Author: Paul E. McKenney 
Date:   Thu Oct 1 09:24:40 2020 -0700

kvfree_rcu: Use __GFP_NOMEMALLOC for single-argument kvfree_rcu()

This commit applies the __GFP_NOMEMALLOC gfp flag to memory allocations
carried out by the single-argument variant of kvfree_rcu(), thus avoiding
this can-sleep code path from dipping into the emergency reserves.

Suggested-by: Michal Hocko 
Signed-off-by: Paul E. McKenney 

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 242f0f0..6132452 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3364,7 +3364,8 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
 {
struct kvfree_rcu_bulk_data *bnode;
bool can_alloc_page = preemptible();
-   gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) 
| __GFP_NOWARN;
+   gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL | 
__GFP_NOMEMALLOC
+  : GFP_ATOMIC) | __GFP_NOWARN;
int idx;
 
*krcp = krc_this_cpu_lock(flags);


Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible

2020-10-01 Thread Michal Hocko
On Wed 30-09-20 16:21:54, Paul E. McKenney wrote:
> On Wed, Sep 30, 2020 at 10:41:39AM +0200, Michal Hocko wrote:
> > On Tue 29-09-20 18:53:27, Paul E. McKenney wrote:
[...]
> > > No argument on it being confusing, and I hope that the added header
> > > comment helps.  But specifically, can_sleep==true is a promise by the
> > > caller to be schedulable and not to be holding any lock/mutex/whatever
> > > that might possibly be acquired by the memory allocator or by anything
> > > else that the memory allocator might invoke, to your point, including
> > > for but one example the reclaim logic.
> > > 
> > > The only way that can_sleep==true is if this function was invoked due
> > > to a call to single-argument kvfree_rcu(), which must be schedulable
> > > because its fallback is to invoke synchronize_rcu().
> > 
> > OK. I have to say that it is still not clear to me whether this call
> > path can be called from the memory reclaim context. If yes then you need
> > __GFP_NOMEMALLOC as well.
> 
> Right now the restriction is that single-argument (AKA can_sleep==true)
> kvfree_rcu() cannot be invoked from memory reclaim context.
> 
> But would adding __GFP_NOMEMALLOC to the can_sleep==true GFP_ flags
> allow us to remove this restriction?  If so, I will queue a separate
> patch making this change.  The improved ease of use would be well
> worth it, if I understand correctly (ha!!!).

It would be quite daring to claim it will be ok but it will certainly be
less problematic. Adding the flag will not hurt in any case. As this is
a shared called that might be called from many contexts I think it will
be safer to have it there. The justification is that it will prevent
consumption of memory reserves from MEMALLOC contexts.

> 
> > [...]
> > 
> > > > What is the point of calling kmalloc  for a PAGE_SIZE object? Wouldn't
> > > > using the page allocator directly be better?
> > > 
> > > Well, you guys gave me considerable heat about abusing internal allocator
> > > interfaces, and kmalloc() and kfree() seem to be about as non-internal
> > > as you can get and still be invoking the allocator.  ;-)
> > 
> > alloc_pages resp. __get_free_pages is a normal page allocator interface
> > to use for page size granular allocations. kmalloc is for more fine
> > grained allocations.
> 
> OK, in the short term, both work, but I have queued a separate patch
> making this change and recording the tradeoffs.  This is not yet a
> promise to push this patch, but it is a promise not to lose this part
> of the picture.  Please see below.

It doesn't matter all that much. Both allocators will work. It is just a
matter of using optimal tool for the specific purose.

> You mentioned alloc_pages().  I reverted to __get_free_pages(), but
> alloc_pages() of course looks nicer.  What are the tradeoffs between
> __get_free_pages() and alloc_pages()?

alloc_pages will return struct page but you need a kernel pointer. That
is what __get_free_pages will give you (or you can call page_address
directly).

>   Thanx, Paul
> 
> 
> 
> commit 490b638d7c241ac06cee168ccf8688bb8b872478
> Author: Paul E. McKenney 
> Date:   Wed Sep 30 16:16:39 2020 -0700
> 
> kvfree_rcu(): Switch from kmalloc/kfree to __get_free_page/free_page.
> 
> The advantages of using kmalloc() and kfree() are a possible small speedup
> on CONFIG_SLAB=y systems, avoiding the allocation-side cast, and use of
> more-familiar API members.  The advantages of using __get_free_page()
> and free_page() are a possible reduction in fragmentation and direct
> access to the buddy allocator.
> 
> To help settle the question as to which to use, this commit switches
> from kmalloc() and kfree() to __get_free_page() and free_page().
> 
> Suggested-by: Michal Hocko 
> Suggested-by: "Uladzislau Rezki (Sony)" 
> Signed-off-by: Paul E. McKenney 

Yes, looks good to me. I am not entirely sure about the fragmentation
argument. It really depends on the SL.B allocator internals. The same
applies for the potential speed up. I would be even surprised if the
SLAB was faster in average considering it has to use the page allocator
as well. So to me the primary motivation would be "use the right tool
for the purpose".

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 2886e81..242f0f0 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3225,7 +3225,8 @@ static void kfree_rcu_work(struct work_struct *work)
>   bkvhead[i] = NULL;
>   krc_this_cpu_unlock(krcp, flags);
>  
> - kfree(bkvhead[i]);
> + if (bkvhead[i])
> + free_page((unsigned long)bkvhead[i]);
>  
>   cond_resched_tasks_rcu_qs();
>   }
> @@ -3378,7 +3379,7 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,

Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible

2020-09-30 Thread Paul E. McKenney
On Wed, Sep 30, 2020 at 10:41:39AM +0200, Michal Hocko wrote:
> On Tue 29-09-20 18:53:27, Paul E. McKenney wrote:
> > On Tue, Sep 29, 2020 at 02:07:56PM +0200, Michal Hocko wrote:
> > > On Mon 28-09-20 16:31:01, paul...@kernel.org wrote:
> > > [...]
> > 
> > Apologies for the delay, but today has not been boring.
> > 
> > > > This commit therefore uses preemptible() to determine whether allocation
> > > > is possible at all for double-argument kvfree_rcu().
> > > 
> > > This deserves a comment. Because GFP_ATOMIC is possible for many
> > > !preemptible() contexts. It is the raw_spin_lock, NMIs and likely few
> > > others that are a problem. You are taking a conservative approach which
> > > is fine but it would be good to articulate that explicitly.
> > 
> > Good point, and so I have added the following as a header comment to
> > the add_ptr_to_bulk_krc_lock() function:
> > 
> > // Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock()
> > // state specified by flags.  If can_sleep is true, the caller must
> > // be schedulable and not be holding any locks or mutexes that might be
> > // acquired by the memory allocator or anything that it might invoke.
> > // If !can_sleep, then if !preemptible() no allocation will be undertaken,
> > // otherwise the allocation will use GFP_ATOMIC to avoid the remainder of
> > // the aforementioned deadlock possibilities.  Returns true iff ptr was
> > // successfully recorded, else the caller must use a fallback.
> 
> OK, not trivial to follow but at least verbose enough to understand the
> intention after some mulling. Definitely an improvement, thanks!

Glad it helped!  With some luck, perhaps it will improve with time...

> [...]
> > > > -kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > > > +add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
> > > > +   unsigned long *flags, void *ptr, bool can_sleep)
> > > >  {
> > > > struct kvfree_rcu_bulk_data *bnode;
> > > > +   bool can_alloc_page = preemptible();
> > > > +   gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : 
> > > > GFP_ATOMIC) | __GFP_NOWARN;
> > > 
> > > This is quite confusing IMHO. At least without a further explanation.
> > > can_sleep is not as much about sleeping as it is about the reclaim
> > > recursion AFAIU your changelog, right?
> > 
> > No argument on it being confusing, and I hope that the added header
> > comment helps.  But specifically, can_sleep==true is a promise by the
> > caller to be schedulable and not to be holding any lock/mutex/whatever
> > that might possibly be acquired by the memory allocator or by anything
> > else that the memory allocator might invoke, to your point, including
> > for but one example the reclaim logic.
> > 
> > The only way that can_sleep==true is if this function was invoked due
> > to a call to single-argument kvfree_rcu(), which must be schedulable
> > because its fallback is to invoke synchronize_rcu().
> 
> OK. I have to say that it is still not clear to me whether this call
> path can be called from the memory reclaim context. If yes then you need
> __GFP_NOMEMALLOC as well.

Right now the restriction is that single-argument (AKA can_sleep==true)
kvfree_rcu() cannot be invoked from memory reclaim context.

But would adding __GFP_NOMEMALLOC to the can_sleep==true GFP_ flags
allow us to remove this restriction?  If so, I will queue a separate
patch making this change.  The improved ease of use would be well
worth it, if I understand correctly (ha!!!).

> [...]
> 
> > > What is the point of calling kmalloc  for a PAGE_SIZE object? Wouldn't
> > > using the page allocator directly be better?
> > 
> > Well, you guys gave me considerable heat about abusing internal allocator
> > interfaces, and kmalloc() and kfree() seem to be about as non-internal
> > as you can get and still be invoking the allocator.  ;-)
> 
> alloc_pages resp. __get_free_pages is a normal page allocator interface
> to use for page size granular allocations. kmalloc is for more fine
> grained allocations.

OK, in the short term, both work, but I have queued a separate patch
making this change and recording the tradeoffs.  This is not yet a
promise to push this patch, but it is a promise not to lose this part
of the picture.  Please see below.

You mentioned alloc_pages().  I reverted to __get_free_pages(), but
alloc_pages() of course looks nicer.  What are the tradeoffs between
__get_free_pages() and alloc_pages()?

Thanx, Paul



commit 490b638d7c241ac06cee168ccf8688bb8b872478
Author: Paul E. McKenney 
Date:   Wed Sep 30 16:16:39 2020 -0700

kvfree_rcu(): Switch from kmalloc/kfree to __get_free_page/free_page.

The advantages of using kmalloc() and kfree() are a possible small speedup
on CONFIG_SLAB=y systems, avoiding the allocation-side cast, and use of
more-familiar API 

Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible

2020-09-30 Thread Uladzislau Rezki
>
> > > What is the point of calling kmalloc  for a PAGE_SIZE object? Wouldn't
> > > using the page allocator directly be better?
> > 
> > Well, you guys gave me considerable heat about abusing internal allocator
> > interfaces, and kmalloc() and kfree() seem to be about as non-internal
> > as you can get and still be invoking the allocator.  ;-)
> 
> alloc_pages resp. __get_free_pages is a normal page allocator interface
> to use for page size granular allocations. kmalloc is for more fine
> grained allocations.
>
I do not have a strong opinion here but i would tend to using of the page
allocator directly. Because we operate on a page level where the page
allocator is supposed to be used. From the other hand i saw a slightly
better performance in case of SLAB only. I think that is because of
a) slab caches; b) more objects which can be cached. But i have one more
concern about using of kmalloc(). That is about fragmentations it can
cause with PAGE_SIZE requests.

As for SLUB it was identical whereas the SLOB i din not check.

--
Vlad Rezki


Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible

2020-09-30 Thread Michal Hocko
On Tue 29-09-20 18:53:27, Paul E. McKenney wrote:
> On Tue, Sep 29, 2020 at 02:07:56PM +0200, Michal Hocko wrote:
> > On Mon 28-09-20 16:31:01, paul...@kernel.org wrote:
> > [...]
> 
> Apologies for the delay, but today has not been boring.
> 
> > > This commit therefore uses preemptible() to determine whether allocation
> > > is possible at all for double-argument kvfree_rcu().
> > 
> > This deserves a comment. Because GFP_ATOMIC is possible for many
> > !preemptible() contexts. It is the raw_spin_lock, NMIs and likely few
> > others that are a problem. You are taking a conservative approach which
> > is fine but it would be good to articulate that explicitly.
> 
> Good point, and so I have added the following as a header comment to
> the add_ptr_to_bulk_krc_lock() function:
> 
> // Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock()
> // state specified by flags.  If can_sleep is true, the caller must
> // be schedulable and not be holding any locks or mutexes that might be
> // acquired by the memory allocator or anything that it might invoke.
> // If !can_sleep, then if !preemptible() no allocation will be undertaken,
> // otherwise the allocation will use GFP_ATOMIC to avoid the remainder of
> // the aforementioned deadlock possibilities.  Returns true iff ptr was
> // successfully recorded, else the caller must use a fallback.

OK, not trivial to follow but at least verbose enough to understand the
intention after some mulling. Definitely an improvement, thanks!

[...]
> > > -kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > > +add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
> > > + unsigned long *flags, void *ptr, bool can_sleep)
> > >  {
> > >   struct kvfree_rcu_bulk_data *bnode;
> > > + bool can_alloc_page = preemptible();
> > > + gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) 
> > > | __GFP_NOWARN;
> > 
> > This is quite confusing IMHO. At least without a further explanation.
> > can_sleep is not as much about sleeping as it is about the reclaim
> > recursion AFAIU your changelog, right?
> 
> No argument on it being confusing, and I hope that the added header
> comment helps.  But specifically, can_sleep==true is a promise by the
> caller to be schedulable and not to be holding any lock/mutex/whatever
> that might possibly be acquired by the memory allocator or by anything
> else that the memory allocator might invoke, to your point, including
> for but one example the reclaim logic.
> 
> The only way that can_sleep==true is if this function was invoked due
> to a call to single-argument kvfree_rcu(), which must be schedulable
> because its fallback is to invoke synchronize_rcu().

OK. I have to say that it is still not clear to me whether this call
path can be called from the memory reclaim context. If yes then you need
__GFP_NOMEMALLOC as well.

[...]

> > What is the point of calling kmalloc  for a PAGE_SIZE object? Wouldn't
> > using the page allocator directly be better?
> 
> Well, you guys gave me considerable heat about abusing internal allocator
> interfaces, and kmalloc() and kfree() seem to be about as non-internal
> as you can get and still be invoking the allocator.  ;-)

alloc_pages resp. __get_free_pages is a normal page allocator interface
to use for page size granular allocations. kmalloc is for more fine
grained allocations.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible

2020-09-29 Thread Paul E. McKenney
On Tue, Sep 29, 2020 at 02:07:56PM +0200, Michal Hocko wrote:
> On Mon 28-09-20 16:31:01, paul...@kernel.org wrote:
> [...]

Apologies for the delay, but today has not been boring.

> > This commit therefore uses preemptible() to determine whether allocation
> > is possible at all for double-argument kvfree_rcu().
> 
> This deserves a comment. Because GFP_ATOMIC is possible for many
> !preemptible() contexts. It is the raw_spin_lock, NMIs and likely few
> others that are a problem. You are taking a conservative approach which
> is fine but it would be good to articulate that explicitly.

Good point, and so I have added the following as a header comment to
the add_ptr_to_bulk_krc_lock() function:

// Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock()
// state specified by flags.  If can_sleep is true, the caller must
// be schedulable and not be holding any locks or mutexes that might be
// acquired by the memory allocator or anything that it might invoke.
// If !can_sleep, then if !preemptible() no allocation will be undertaken,
// otherwise the allocation will use GFP_ATOMIC to avoid the remainder of
// the aforementioned deadlock possibilities.  Returns true iff ptr was
// successfully recorded, else the caller must use a fallback.

The updated patch is shown below.

Of course, to Vlastimil's point, lockless access to the allocator
would significantly reduce the level of confusion posed by this code.
But that is a separate issue at the moment.

> > If !preemptible(),
> > then allocation is not possible, and kvfree_rcu() falls back to using
> > the less cache-friendly rcu_head approach.  Even when preemptible(),
> > the caller might be involved in reclaim, so the GFP_ flags used by
> > double-argument kvfree_rcu() must avoid invoking reclaim processing.
> 
> Could you be more specific? Is this about being called directly in the
> reclaim context and you want to prevent a recursion? If that is the
> case, do you really need to special case this in any way? Any memory
> reclaim will set PF_MEMALLOC so allocations called from that context
> will not perform reclaim. So if you are called from the reclaim directly
> then you might want to do GFP_KERNEL | __GFP_NOMEMALLOC | __GFP_NOWARN.
> That should handle both from-the-recalim and outside of reclaim contexts
> just fine (assuming you don't allocated from !preemptible()) context.

My thought was to point you at Daniel Vetter, but you already got in
touch with him, so thank you for that!

> > Note that single-argument kvfree_rcu() must be invoked in sleepable
> > contexts, and that its fallback is the relatively high latency
> > synchronize_rcu().  Single-argument kvfree_rcu() therefore uses
> > GFP_KERNEL|__GFP_RETRY_MAYFAIL to allow limited sleeping within the
> > memory allocator.
> [...]
> >  static inline bool
> > -kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > +add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
> > +   unsigned long *flags, void *ptr, bool can_sleep)
> >  {
> > struct kvfree_rcu_bulk_data *bnode;
> > +   bool can_alloc_page = preemptible();
> > +   gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) 
> > | __GFP_NOWARN;
> 
> This is quite confusing IMHO. At least without a further explanation.
> can_sleep is not as much about sleeping as it is about the reclaim
> recursion AFAIU your changelog, right?

No argument on it being confusing, and I hope that the added header
comment helps.  But specifically, can_sleep==true is a promise by the
caller to be schedulable and not to be holding any lock/mutex/whatever
that might possibly be acquired by the memory allocator or by anything
else that the memory allocator might invoke, to your point, including
for but one example the reclaim logic.

The only way that can_sleep==true is if this function was invoked due
to a call to single-argument kvfree_rcu(), which must be schedulable
because its fallback is to invoke synchronize_rcu().

> > int idx;
> >  
> > -   if (unlikely(!krcp->initialized))
> > +   *krcp = krc_this_cpu_lock(flags);
> > +   if (unlikely(!(*krcp)->initialized))
> > return false;
> >  
> > -   lockdep_assert_held(>lock);
> > idx = !!is_vmalloc_addr(ptr);
> >  
> > /* Check if a new block is required. */
> > -   if (!krcp->bkvhead[idx] ||
> > -   krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) 
> > {
> > -   bnode = get_cached_bnode(krcp);
> > -   if (!bnode) {
> > -   /*
> > -* To keep this path working on raw non-preemptible
> > -* sections, prevent the optional entry into the
> > -* allocator as it uses sleeping locks. In fact, even
> > -* if the caller of kfree_rcu() is preemptible, this
> > -* path still is not, as krcp->lock is a raw spinlock.
> > -* With additional page pre-allocation in the works,
> > - 

Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible

2020-09-29 Thread Michal Hocko
On Mon 28-09-20 16:31:01, paul...@kernel.org wrote:
[...]
> This commit therefore uses preemptible() to determine whether allocation
> is possible at all for double-argument kvfree_rcu().

This deserves a comment. Because GFP_ATOMIC is possible for many
!preemptible() contexts. It is the raw_spin_lock, NMIs and likely few
others that are a problem. You are taking a conservative approach which
is fine but it would be good to articulate that explicitly.

> If !preemptible(),
> then allocation is not possible, and kvfree_rcu() falls back to using
> the less cache-friendly rcu_head approach.  Even when preemptible(),
> the caller might be involved in reclaim, so the GFP_ flags used by
> double-argument kvfree_rcu() must avoid invoking reclaim processing.

Could you be more specific? Is this about being called directly in the
reclaim context and you want to prevent a recursion? If that is the
case, do you really need to special case this in any way? Any memory
reclaim will set PF_MEMALLOC so allocations called from that context
will not perform reclaim. So if you are called from the reclaim directly
then you might want to do GFP_KERNEL | __GFP_NOMEMALLOC | __GFP_NOWARN.
That should handle both from-the-recalim and outside of reclaim contexts
just fine (assuming you don't allocated from !preemptible()) context.

> Note that single-argument kvfree_rcu() must be invoked in sleepable
> contexts, and that its fallback is the relatively high latency
> synchronize_rcu().  Single-argument kvfree_rcu() therefore uses
> GFP_KERNEL|__GFP_RETRY_MAYFAIL to allow limited sleeping within the
> memory allocator.
[...]
>  static inline bool
> -kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> +add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
> + unsigned long *flags, void *ptr, bool can_sleep)
>  {
>   struct kvfree_rcu_bulk_data *bnode;
> + bool can_alloc_page = preemptible();
> + gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) 
> | __GFP_NOWARN;

This is quite confusing IMHO. At least without a further explanation.
can_sleep is not as much about sleeping as it is about the reclaim
recursion AFAIU your changelog, right?

>   int idx;
>  
> - if (unlikely(!krcp->initialized))
> + *krcp = krc_this_cpu_lock(flags);
> + if (unlikely(!(*krcp)->initialized))
>   return false;
>  
> - lockdep_assert_held(>lock);
>   idx = !!is_vmalloc_addr(ptr);
>  
>   /* Check if a new block is required. */
> - if (!krcp->bkvhead[idx] ||
> - krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) 
> {
> - bnode = get_cached_bnode(krcp);
> - if (!bnode) {
> - /*
> -  * To keep this path working on raw non-preemptible
> -  * sections, prevent the optional entry into the
> -  * allocator as it uses sleeping locks. In fact, even
> -  * if the caller of kfree_rcu() is preemptible, this
> -  * path still is not, as krcp->lock is a raw spinlock.
> -  * With additional page pre-allocation in the works,
> -  * hitting this return is going to be much less likely.
> -  */
> - if (IS_ENABLED(CONFIG_PREEMPT_RT))
> - return false;
> -
> - /*
> -  * NOTE: For one argument of kvfree_rcu() we can
> -  * drop the lock and get the page in sleepable
> -  * context. That would allow to maintain an array
> -  * for the CONFIG_PREEMPT_RT as well if no cached
> -  * pages are available.
> -  */
> - bnode = (struct kvfree_rcu_bulk_data *)
> - __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> + if (!(*krcp)->bkvhead[idx] ||
> + (*krcp)->bkvhead[idx]->nr_records == 
> KVFREE_BULK_MAX_ENTR) {
> + bnode = get_cached_bnode(*krcp);
> + if (!bnode && can_alloc_page) {
> + krc_this_cpu_unlock(*krcp, *flags);
> + bnode = kmalloc(PAGE_SIZE, gfp);

What is the point of calling kmalloc  for a PAGE_SIZE object? Wouldn't
using the page allocator directly be better?
-- 
Michal Hocko
SUSE Labs