Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
On Fri, Oct 02, 2020 at 08:57:46AM +0200, Michal Hocko wrote: > On Thu 01-10-20 09:27:09, Paul E. McKenney wrote: > [...] > > commit ea5c19d21233b5e8d3d06c0d4ecd6be9f2829dc3 > > Author: Paul E. McKenney > > Date: Thu Oct 1 09:24:40 2020 -0700 > > > > kvfree_rcu: Use __GFP_NOMEMALLOC for single-argument kvfree_rcu() > > > > This commit applies the __GFP_NOMEMALLOC gfp flag to memory allocations > > carried out by the single-argument variant of kvfree_rcu(), thus > > avoiding > > this can-sleep code path from dipping into the emergency reserves. > > > > Suggested-by: Michal Hocko > > Signed-off-by: Paul E. McKenney > > LGTM. At least for this one I feel competent to give you > Acked-by: Michal Hocko Thank you very much! I will apply this on the next rebase later today, Pacific Time. Thanx, Paul > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index 242f0f0..6132452 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -3364,7 +3364,8 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, > > { > > struct kvfree_rcu_bulk_data *bnode; > > bool can_alloc_page = preemptible(); > > - gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) > > | __GFP_NOWARN; > > + gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL | > > __GFP_NOMEMALLOC > > + : GFP_ATOMIC) | __GFP_NOWARN; > > int idx; > > > > *krcp = krc_this_cpu_lock(flags); > > -- > Michal Hocko > SUSE Labs
Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
On Thu 01-10-20 09:27:09, Paul E. McKenney wrote: [...] > commit ea5c19d21233b5e8d3d06c0d4ecd6be9f2829dc3 > Author: Paul E. McKenney > Date: Thu Oct 1 09:24:40 2020 -0700 > > kvfree_rcu: Use __GFP_NOMEMALLOC for single-argument kvfree_rcu() > > This commit applies the __GFP_NOMEMALLOC gfp flag to memory allocations > carried out by the single-argument variant of kvfree_rcu(), thus avoiding > this can-sleep code path from dipping into the emergency reserves. > > Suggested-by: Michal Hocko > Signed-off-by: Paul E. McKenney LGTM. At least for this one I feel competent to give you Acked-by: Michal Hocko > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 242f0f0..6132452 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -3364,7 +3364,8 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, > { > struct kvfree_rcu_bulk_data *bnode; > bool can_alloc_page = preemptible(); > - gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) > | __GFP_NOWARN; > + gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL | > __GFP_NOMEMALLOC > +: GFP_ATOMIC) | __GFP_NOWARN; > int idx; > > *krcp = krc_this_cpu_lock(flags); -- Michal Hocko SUSE Labs
Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
On Thu, Oct 01, 2020 at 11:02:20AM +0200, Michal Hocko wrote: > On Wed 30-09-20 16:21:54, Paul E. McKenney wrote: > > On Wed, Sep 30, 2020 at 10:41:39AM +0200, Michal Hocko wrote: > > > On Tue 29-09-20 18:53:27, Paul E. McKenney wrote: > [...] > > > > No argument on it being confusing, and I hope that the added header > > > > comment helps. But specifically, can_sleep==true is a promise by the > > > > caller to be schedulable and not to be holding any lock/mutex/whatever > > > > that might possibly be acquired by the memory allocator or by anything > > > > else that the memory allocator might invoke, to your point, including > > > > for but one example the reclaim logic. > > > > > > > > The only way that can_sleep==true is if this function was invoked due > > > > to a call to single-argument kvfree_rcu(), which must be schedulable > > > > because its fallback is to invoke synchronize_rcu(). > > > > > > OK. I have to say that it is still not clear to me whether this call > > > path can be called from the memory reclaim context. If yes then you need > > > __GFP_NOMEMALLOC as well. > > > > Right now the restriction is that single-argument (AKA can_sleep==true) > > kvfree_rcu() cannot be invoked from memory reclaim context. > > > > But would adding __GFP_NOMEMALLOC to the can_sleep==true GFP_ flags > > allow us to remove this restriction? If so, I will queue a separate > > patch making this change. The improved ease of use would be well > > worth it, if I understand correctly (ha!!!). > > It would be quite daring to claim it will be ok but it will certainly be > less problematic. Adding the flag will not hurt in any case. As this is > a shared called that might be called from many contexts I think it will > be safer to have it there. The justification is that it will prevent > consumption of memory reserves from MEMALLOC contexts. > > > > > > [...] > > > > > > > > What is the point of calling kmalloc for a PAGE_SIZE object? Wouldn't > > > > > using the page allocator directly be better? > > > > > > > > Well, you guys gave me considerable heat about abusing internal > > > > allocator > > > > interfaces, and kmalloc() and kfree() seem to be about as non-internal > > > > as you can get and still be invoking the allocator. ;-) > > > > > > alloc_pages resp. __get_free_pages is a normal page allocator interface > > > to use for page size granular allocations. kmalloc is for more fine > > > grained allocations. > > > > OK, in the short term, both work, but I have queued a separate patch > > making this change and recording the tradeoffs. This is not yet a > > promise to push this patch, but it is a promise not to lose this part > > of the picture. Please see below. > > It doesn't matter all that much. Both allocators will work. It is just a > matter of using optimal tool for the specific purose. > > > You mentioned alloc_pages(). I reverted to __get_free_pages(), but > > alloc_pages() of course looks nicer. What are the tradeoffs between > > __get_free_pages() and alloc_pages()? > > alloc_pages will return struct page but you need a kernel pointer. That > is what __get_free_pages will give you (or you can call page_address > directly). > > > Thanx, Paul > > > > > > > > commit 490b638d7c241ac06cee168ccf8688bb8b872478 > > Author: Paul E. McKenney > > Date: Wed Sep 30 16:16:39 2020 -0700 > > > > kvfree_rcu(): Switch from kmalloc/kfree to __get_free_page/free_page. > > > > The advantages of using kmalloc() and kfree() are a possible small > > speedup > > on CONFIG_SLAB=y systems, avoiding the allocation-side cast, and use of > > more-familiar API members. The advantages of using __get_free_page() > > and free_page() are a possible reduction in fragmentation and direct > > access to the buddy allocator. > > > > To help settle the question as to which to use, this commit switches > > from kmalloc() and kfree() to __get_free_page() and free_page(). > > > > Suggested-by: Michal Hocko > > Suggested-by: "Uladzislau Rezki (Sony)" > > Signed-off-by: Paul E. McKenney > > Yes, looks good to me. I am not entirely sure about the fragmentation > argument. It really depends on the SL.B allocator internals. The same > applies for the potential speed up. I would be even surprised if the > SLAB was faster in average considering it has to use the page allocator > as well. So to me the primary motivation would be "use the right tool > for the purpose". > As for raised a concern about fragmentation, i mostly was thinking about that SLAbs where not designed to do an efficient allocations for sizes which are >= than PAGE_SIZE. But it depends on three different implementations, actually it also a good argument to switch to the page allocator. I mean to get rid of such dependency. Other side is, SLABs, at least SLAB and SLUB
Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
On Thu, Oct 01, 2020 at 11:02:20AM +0200, Michal Hocko wrote: > On Wed 30-09-20 16:21:54, Paul E. McKenney wrote: [ . . . ] Hit "send" too soon, apologies... > > > > > > commit 490b638d7c241ac06cee168ccf8688bb8b872478 > > Author: Paul E. McKenney > > Date: Wed Sep 30 16:16:39 2020 -0700 > > > > kvfree_rcu(): Switch from kmalloc/kfree to __get_free_page/free_page. > > > > The advantages of using kmalloc() and kfree() are a possible small > > speedup > > on CONFIG_SLAB=y systems, avoiding the allocation-side cast, and use of > > more-familiar API members. The advantages of using __get_free_page() > > and free_page() are a possible reduction in fragmentation and direct > > access to the buddy allocator. > > > > To help settle the question as to which to use, this commit switches > > from kmalloc() and kfree() to __get_free_page() and free_page(). > > > > Suggested-by: Michal Hocko > > Suggested-by: "Uladzislau Rezki (Sony)" > > Signed-off-by: Paul E. McKenney > > Yes, looks good to me. I am not entirely sure about the fragmentation > argument. It really depends on the SL.B allocator internals. The same > applies for the potential speed up. I would be even surprised if the > SLAB was faster in average considering it has to use the page allocator > as well. So to me the primary motivation would be "use the right tool > for the purpose". Very well, I will update the commit message, and thank you! Thanx, Paul > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index 2886e81..242f0f0 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -3225,7 +3225,8 @@ static void kfree_rcu_work(struct work_struct *work) > > bkvhead[i] = NULL; > > krc_this_cpu_unlock(krcp, flags); > > > > - kfree(bkvhead[i]); > > + if (bkvhead[i]) > > + free_page((unsigned long)bkvhead[i]); > > > > cond_resched_tasks_rcu_qs(); > > } > > @@ -3378,7 +3379,7 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, > > bnode = get_cached_bnode(*krcp); > > if (!bnode && can_alloc_page) { > > krc_this_cpu_unlock(*krcp, *flags); > > - bnode = kmalloc(PAGE_SIZE, gfp); > > + bnode = (struct kvfree_rcu_bulk_data > > *)__get_free_page(gfp); > > *krcp = krc_this_cpu_lock(flags); > > } > > > > -- > Michal Hocko > SUSE Labs
Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
On Thu, Oct 01, 2020 at 11:02:20AM +0200, Michal Hocko wrote: > On Wed 30-09-20 16:21:54, Paul E. McKenney wrote: > > On Wed, Sep 30, 2020 at 10:41:39AM +0200, Michal Hocko wrote: > > > On Tue 29-09-20 18:53:27, Paul E. McKenney wrote: > [...] > > > > No argument on it being confusing, and I hope that the added header > > > > comment helps. But specifically, can_sleep==true is a promise by the > > > > caller to be schedulable and not to be holding any lock/mutex/whatever > > > > that might possibly be acquired by the memory allocator or by anything > > > > else that the memory allocator might invoke, to your point, including > > > > for but one example the reclaim logic. > > > > > > > > The only way that can_sleep==true is if this function was invoked due > > > > to a call to single-argument kvfree_rcu(), which must be schedulable > > > > because its fallback is to invoke synchronize_rcu(). > > > > > > OK. I have to say that it is still not clear to me whether this call > > > path can be called from the memory reclaim context. If yes then you need > > > __GFP_NOMEMALLOC as well. > > > > Right now the restriction is that single-argument (AKA can_sleep==true) > > kvfree_rcu() cannot be invoked from memory reclaim context. > > > > But would adding __GFP_NOMEMALLOC to the can_sleep==true GFP_ flags > > allow us to remove this restriction? If so, I will queue a separate > > patch making this change. The improved ease of use would be well > > worth it, if I understand correctly (ha!!!). > > It would be quite daring to claim it will be ok but it will certainly be > less problematic. Adding the flag will not hurt in any case. As this is > a shared called that might be called from many contexts I think it will > be safer to have it there. The justification is that it will prevent > consumption of memory reserves from MEMALLOC contexts. Ah, so a different goal (and yes, I finally went over and read the relevant documentation). Agreed, the can_sleep path does not really need to be dipping into the emergency reserves. And it looks like the not-from-reclaim restriction is still at least partially in effect, but one step at a time. The patch is shown below, which I have queued for a later release. > > > [...] > > > > > > > > What is the point of calling kmalloc for a PAGE_SIZE object? Wouldn't > > > > > using the page allocator directly be better? > > > > > > > > Well, you guys gave me considerable heat about abusing internal > > > > allocator > > > > interfaces, and kmalloc() and kfree() seem to be about as non-internal > > > > as you can get and still be invoking the allocator. ;-) > > > > > > alloc_pages resp. __get_free_pages is a normal page allocator interface > > > to use for page size granular allocations. kmalloc is for more fine > > > grained allocations. > > > > OK, in the short term, both work, but I have queued a separate patch > > making this change and recording the tradeoffs. This is not yet a > > promise to push this patch, but it is a promise not to lose this part > > of the picture. Please see below. > > It doesn't matter all that much. Both allocators will work. It is just a > matter of using optimal tool for the specific purose. > > > You mentioned alloc_pages(). I reverted to __get_free_pages(), but > > alloc_pages() of course looks nicer. What are the tradeoffs between > > __get_free_pages() and alloc_pages()? > > alloc_pages will return struct page but you need a kernel pointer. That > is what __get_free_pages will give you (or you can call page_address > directly). Thank you, looks like __get_free_pages() is the tool for this job. Please see below for the aforementioned patch. Thanx, Paul commit ea5c19d21233b5e8d3d06c0d4ecd6be9f2829dc3 Author: Paul E. McKenney Date: Thu Oct 1 09:24:40 2020 -0700 kvfree_rcu: Use __GFP_NOMEMALLOC for single-argument kvfree_rcu() This commit applies the __GFP_NOMEMALLOC gfp flag to memory allocations carried out by the single-argument variant of kvfree_rcu(), thus avoiding this can-sleep code path from dipping into the emergency reserves. Suggested-by: Michal Hocko Signed-off-by: Paul E. McKenney diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 242f0f0..6132452 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -3364,7 +3364,8 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, { struct kvfree_rcu_bulk_data *bnode; bool can_alloc_page = preemptible(); - gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) | __GFP_NOWARN; + gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOMEMALLOC + : GFP_ATOMIC) | __GFP_NOWARN; int idx; *krcp = krc_this_cpu_lock(flags);
Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
On Wed 30-09-20 16:21:54, Paul E. McKenney wrote: > On Wed, Sep 30, 2020 at 10:41:39AM +0200, Michal Hocko wrote: > > On Tue 29-09-20 18:53:27, Paul E. McKenney wrote: [...] > > > No argument on it being confusing, and I hope that the added header > > > comment helps. But specifically, can_sleep==true is a promise by the > > > caller to be schedulable and not to be holding any lock/mutex/whatever > > > that might possibly be acquired by the memory allocator or by anything > > > else that the memory allocator might invoke, to your point, including > > > for but one example the reclaim logic. > > > > > > The only way that can_sleep==true is if this function was invoked due > > > to a call to single-argument kvfree_rcu(), which must be schedulable > > > because its fallback is to invoke synchronize_rcu(). > > > > OK. I have to say that it is still not clear to me whether this call > > path can be called from the memory reclaim context. If yes then you need > > __GFP_NOMEMALLOC as well. > > Right now the restriction is that single-argument (AKA can_sleep==true) > kvfree_rcu() cannot be invoked from memory reclaim context. > > But would adding __GFP_NOMEMALLOC to the can_sleep==true GFP_ flags > allow us to remove this restriction? If so, I will queue a separate > patch making this change. The improved ease of use would be well > worth it, if I understand correctly (ha!!!). It would be quite daring to claim it will be ok but it will certainly be less problematic. Adding the flag will not hurt in any case. As this is a shared called that might be called from many contexts I think it will be safer to have it there. The justification is that it will prevent consumption of memory reserves from MEMALLOC contexts. > > > [...] > > > > > > What is the point of calling kmalloc for a PAGE_SIZE object? Wouldn't > > > > using the page allocator directly be better? > > > > > > Well, you guys gave me considerable heat about abusing internal allocator > > > interfaces, and kmalloc() and kfree() seem to be about as non-internal > > > as you can get and still be invoking the allocator. ;-) > > > > alloc_pages resp. __get_free_pages is a normal page allocator interface > > to use for page size granular allocations. kmalloc is for more fine > > grained allocations. > > OK, in the short term, both work, but I have queued a separate patch > making this change and recording the tradeoffs. This is not yet a > promise to push this patch, but it is a promise not to lose this part > of the picture. Please see below. It doesn't matter all that much. Both allocators will work. It is just a matter of using optimal tool for the specific purose. > You mentioned alloc_pages(). I reverted to __get_free_pages(), but > alloc_pages() of course looks nicer. What are the tradeoffs between > __get_free_pages() and alloc_pages()? alloc_pages will return struct page but you need a kernel pointer. That is what __get_free_pages will give you (or you can call page_address directly). > Thanx, Paul > > > > commit 490b638d7c241ac06cee168ccf8688bb8b872478 > Author: Paul E. McKenney > Date: Wed Sep 30 16:16:39 2020 -0700 > > kvfree_rcu(): Switch from kmalloc/kfree to __get_free_page/free_page. > > The advantages of using kmalloc() and kfree() are a possible small speedup > on CONFIG_SLAB=y systems, avoiding the allocation-side cast, and use of > more-familiar API members. The advantages of using __get_free_page() > and free_page() are a possible reduction in fragmentation and direct > access to the buddy allocator. > > To help settle the question as to which to use, this commit switches > from kmalloc() and kfree() to __get_free_page() and free_page(). > > Suggested-by: Michal Hocko > Suggested-by: "Uladzislau Rezki (Sony)" > Signed-off-by: Paul E. McKenney Yes, looks good to me. I am not entirely sure about the fragmentation argument. It really depends on the SL.B allocator internals. The same applies for the potential speed up. I would be even surprised if the SLAB was faster in average considering it has to use the page allocator as well. So to me the primary motivation would be "use the right tool for the purpose". > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 2886e81..242f0f0 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -3225,7 +3225,8 @@ static void kfree_rcu_work(struct work_struct *work) > bkvhead[i] = NULL; > krc_this_cpu_unlock(krcp, flags); > > - kfree(bkvhead[i]); > + if (bkvhead[i]) > + free_page((unsigned long)bkvhead[i]); > > cond_resched_tasks_rcu_qs(); > } > @@ -3378,7 +3379,7 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
On Wed, Sep 30, 2020 at 10:41:39AM +0200, Michal Hocko wrote: > On Tue 29-09-20 18:53:27, Paul E. McKenney wrote: > > On Tue, Sep 29, 2020 at 02:07:56PM +0200, Michal Hocko wrote: > > > On Mon 28-09-20 16:31:01, paul...@kernel.org wrote: > > > [...] > > > > Apologies for the delay, but today has not been boring. > > > > > > This commit therefore uses preemptible() to determine whether allocation > > > > is possible at all for double-argument kvfree_rcu(). > > > > > > This deserves a comment. Because GFP_ATOMIC is possible for many > > > !preemptible() contexts. It is the raw_spin_lock, NMIs and likely few > > > others that are a problem. You are taking a conservative approach which > > > is fine but it would be good to articulate that explicitly. > > > > Good point, and so I have added the following as a header comment to > > the add_ptr_to_bulk_krc_lock() function: > > > > // Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock() > > // state specified by flags. If can_sleep is true, the caller must > > // be schedulable and not be holding any locks or mutexes that might be > > // acquired by the memory allocator or anything that it might invoke. > > // If !can_sleep, then if !preemptible() no allocation will be undertaken, > > // otherwise the allocation will use GFP_ATOMIC to avoid the remainder of > > // the aforementioned deadlock possibilities. Returns true iff ptr was > > // successfully recorded, else the caller must use a fallback. > > OK, not trivial to follow but at least verbose enough to understand the > intention after some mulling. Definitely an improvement, thanks! Glad it helped! With some luck, perhaps it will improve with time... > [...] > > > > -kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr) > > > > +add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, > > > > + unsigned long *flags, void *ptr, bool can_sleep) > > > > { > > > > struct kvfree_rcu_bulk_data *bnode; > > > > + bool can_alloc_page = preemptible(); > > > > + gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : > > > > GFP_ATOMIC) | __GFP_NOWARN; > > > > > > This is quite confusing IMHO. At least without a further explanation. > > > can_sleep is not as much about sleeping as it is about the reclaim > > > recursion AFAIU your changelog, right? > > > > No argument on it being confusing, and I hope that the added header > > comment helps. But specifically, can_sleep==true is a promise by the > > caller to be schedulable and not to be holding any lock/mutex/whatever > > that might possibly be acquired by the memory allocator or by anything > > else that the memory allocator might invoke, to your point, including > > for but one example the reclaim logic. > > > > The only way that can_sleep==true is if this function was invoked due > > to a call to single-argument kvfree_rcu(), which must be schedulable > > because its fallback is to invoke synchronize_rcu(). > > OK. I have to say that it is still not clear to me whether this call > path can be called from the memory reclaim context. If yes then you need > __GFP_NOMEMALLOC as well. Right now the restriction is that single-argument (AKA can_sleep==true) kvfree_rcu() cannot be invoked from memory reclaim context. But would adding __GFP_NOMEMALLOC to the can_sleep==true GFP_ flags allow us to remove this restriction? If so, I will queue a separate patch making this change. The improved ease of use would be well worth it, if I understand correctly (ha!!!). > [...] > > > > What is the point of calling kmalloc for a PAGE_SIZE object? Wouldn't > > > using the page allocator directly be better? > > > > Well, you guys gave me considerable heat about abusing internal allocator > > interfaces, and kmalloc() and kfree() seem to be about as non-internal > > as you can get and still be invoking the allocator. ;-) > > alloc_pages resp. __get_free_pages is a normal page allocator interface > to use for page size granular allocations. kmalloc is for more fine > grained allocations. OK, in the short term, both work, but I have queued a separate patch making this change and recording the tradeoffs. This is not yet a promise to push this patch, but it is a promise not to lose this part of the picture. Please see below. You mentioned alloc_pages(). I reverted to __get_free_pages(), but alloc_pages() of course looks nicer. What are the tradeoffs between __get_free_pages() and alloc_pages()? Thanx, Paul commit 490b638d7c241ac06cee168ccf8688bb8b872478 Author: Paul E. McKenney Date: Wed Sep 30 16:16:39 2020 -0700 kvfree_rcu(): Switch from kmalloc/kfree to __get_free_page/free_page. The advantages of using kmalloc() and kfree() are a possible small speedup on CONFIG_SLAB=y systems, avoiding the allocation-side cast, and use of more-familiar API
Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
> > > > What is the point of calling kmalloc for a PAGE_SIZE object? Wouldn't > > > using the page allocator directly be better? > > > > Well, you guys gave me considerable heat about abusing internal allocator > > interfaces, and kmalloc() and kfree() seem to be about as non-internal > > as you can get and still be invoking the allocator. ;-) > > alloc_pages resp. __get_free_pages is a normal page allocator interface > to use for page size granular allocations. kmalloc is for more fine > grained allocations. > I do not have a strong opinion here but i would tend to using of the page allocator directly. Because we operate on a page level where the page allocator is supposed to be used. From the other hand i saw a slightly better performance in case of SLAB only. I think that is because of a) slab caches; b) more objects which can be cached. But i have one more concern about using of kmalloc(). That is about fragmentations it can cause with PAGE_SIZE requests. As for SLUB it was identical whereas the SLOB i din not check. -- Vlad Rezki
Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
On Tue 29-09-20 18:53:27, Paul E. McKenney wrote: > On Tue, Sep 29, 2020 at 02:07:56PM +0200, Michal Hocko wrote: > > On Mon 28-09-20 16:31:01, paul...@kernel.org wrote: > > [...] > > Apologies for the delay, but today has not been boring. > > > > This commit therefore uses preemptible() to determine whether allocation > > > is possible at all for double-argument kvfree_rcu(). > > > > This deserves a comment. Because GFP_ATOMIC is possible for many > > !preemptible() contexts. It is the raw_spin_lock, NMIs and likely few > > others that are a problem. You are taking a conservative approach which > > is fine but it would be good to articulate that explicitly. > > Good point, and so I have added the following as a header comment to > the add_ptr_to_bulk_krc_lock() function: > > // Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock() > // state specified by flags. If can_sleep is true, the caller must > // be schedulable and not be holding any locks or mutexes that might be > // acquired by the memory allocator or anything that it might invoke. > // If !can_sleep, then if !preemptible() no allocation will be undertaken, > // otherwise the allocation will use GFP_ATOMIC to avoid the remainder of > // the aforementioned deadlock possibilities. Returns true iff ptr was > // successfully recorded, else the caller must use a fallback. OK, not trivial to follow but at least verbose enough to understand the intention after some mulling. Definitely an improvement, thanks! [...] > > > -kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr) > > > +add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, > > > + unsigned long *flags, void *ptr, bool can_sleep) > > > { > > > struct kvfree_rcu_bulk_data *bnode; > > > + bool can_alloc_page = preemptible(); > > > + gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) > > > | __GFP_NOWARN; > > > > This is quite confusing IMHO. At least without a further explanation. > > can_sleep is not as much about sleeping as it is about the reclaim > > recursion AFAIU your changelog, right? > > No argument on it being confusing, and I hope that the added header > comment helps. But specifically, can_sleep==true is a promise by the > caller to be schedulable and not to be holding any lock/mutex/whatever > that might possibly be acquired by the memory allocator or by anything > else that the memory allocator might invoke, to your point, including > for but one example the reclaim logic. > > The only way that can_sleep==true is if this function was invoked due > to a call to single-argument kvfree_rcu(), which must be schedulable > because its fallback is to invoke synchronize_rcu(). OK. I have to say that it is still not clear to me whether this call path can be called from the memory reclaim context. If yes then you need __GFP_NOMEMALLOC as well. [...] > > What is the point of calling kmalloc for a PAGE_SIZE object? Wouldn't > > using the page allocator directly be better? > > Well, you guys gave me considerable heat about abusing internal allocator > interfaces, and kmalloc() and kfree() seem to be about as non-internal > as you can get and still be invoking the allocator. ;-) alloc_pages resp. __get_free_pages is a normal page allocator interface to use for page size granular allocations. kmalloc is for more fine grained allocations. -- Michal Hocko SUSE Labs
Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
On Tue, Sep 29, 2020 at 02:07:56PM +0200, Michal Hocko wrote: > On Mon 28-09-20 16:31:01, paul...@kernel.org wrote: > [...] Apologies for the delay, but today has not been boring. > > This commit therefore uses preemptible() to determine whether allocation > > is possible at all for double-argument kvfree_rcu(). > > This deserves a comment. Because GFP_ATOMIC is possible for many > !preemptible() contexts. It is the raw_spin_lock, NMIs and likely few > others that are a problem. You are taking a conservative approach which > is fine but it would be good to articulate that explicitly. Good point, and so I have added the following as a header comment to the add_ptr_to_bulk_krc_lock() function: // Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock() // state specified by flags. If can_sleep is true, the caller must // be schedulable and not be holding any locks or mutexes that might be // acquired by the memory allocator or anything that it might invoke. // If !can_sleep, then if !preemptible() no allocation will be undertaken, // otherwise the allocation will use GFP_ATOMIC to avoid the remainder of // the aforementioned deadlock possibilities. Returns true iff ptr was // successfully recorded, else the caller must use a fallback. The updated patch is shown below. Of course, to Vlastimil's point, lockless access to the allocator would significantly reduce the level of confusion posed by this code. But that is a separate issue at the moment. > > If !preemptible(), > > then allocation is not possible, and kvfree_rcu() falls back to using > > the less cache-friendly rcu_head approach. Even when preemptible(), > > the caller might be involved in reclaim, so the GFP_ flags used by > > double-argument kvfree_rcu() must avoid invoking reclaim processing. > > Could you be more specific? Is this about being called directly in the > reclaim context and you want to prevent a recursion? If that is the > case, do you really need to special case this in any way? Any memory > reclaim will set PF_MEMALLOC so allocations called from that context > will not perform reclaim. So if you are called from the reclaim directly > then you might want to do GFP_KERNEL | __GFP_NOMEMALLOC | __GFP_NOWARN. > That should handle both from-the-recalim and outside of reclaim contexts > just fine (assuming you don't allocated from !preemptible()) context. My thought was to point you at Daniel Vetter, but you already got in touch with him, so thank you for that! > > Note that single-argument kvfree_rcu() must be invoked in sleepable > > contexts, and that its fallback is the relatively high latency > > synchronize_rcu(). Single-argument kvfree_rcu() therefore uses > > GFP_KERNEL|__GFP_RETRY_MAYFAIL to allow limited sleeping within the > > memory allocator. > [...] > > static inline bool > > -kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr) > > +add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, > > + unsigned long *flags, void *ptr, bool can_sleep) > > { > > struct kvfree_rcu_bulk_data *bnode; > > + bool can_alloc_page = preemptible(); > > + gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) > > | __GFP_NOWARN; > > This is quite confusing IMHO. At least without a further explanation. > can_sleep is not as much about sleeping as it is about the reclaim > recursion AFAIU your changelog, right? No argument on it being confusing, and I hope that the added header comment helps. But specifically, can_sleep==true is a promise by the caller to be schedulable and not to be holding any lock/mutex/whatever that might possibly be acquired by the memory allocator or by anything else that the memory allocator might invoke, to your point, including for but one example the reclaim logic. The only way that can_sleep==true is if this function was invoked due to a call to single-argument kvfree_rcu(), which must be schedulable because its fallback is to invoke synchronize_rcu(). > > int idx; > > > > - if (unlikely(!krcp->initialized)) > > + *krcp = krc_this_cpu_lock(flags); > > + if (unlikely(!(*krcp)->initialized)) > > return false; > > > > - lockdep_assert_held(>lock); > > idx = !!is_vmalloc_addr(ptr); > > > > /* Check if a new block is required. */ > > - if (!krcp->bkvhead[idx] || > > - krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) > > { > > - bnode = get_cached_bnode(krcp); > > - if (!bnode) { > > - /* > > -* To keep this path working on raw non-preemptible > > -* sections, prevent the optional entry into the > > -* allocator as it uses sleeping locks. In fact, even > > -* if the caller of kfree_rcu() is preemptible, this > > -* path still is not, as krcp->lock is a raw spinlock. > > -* With additional page pre-allocation in the works, > > -
Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
On Mon 28-09-20 16:31:01, paul...@kernel.org wrote: [...] > This commit therefore uses preemptible() to determine whether allocation > is possible at all for double-argument kvfree_rcu(). This deserves a comment. Because GFP_ATOMIC is possible for many !preemptible() contexts. It is the raw_spin_lock, NMIs and likely few others that are a problem. You are taking a conservative approach which is fine but it would be good to articulate that explicitly. > If !preemptible(), > then allocation is not possible, and kvfree_rcu() falls back to using > the less cache-friendly rcu_head approach. Even when preemptible(), > the caller might be involved in reclaim, so the GFP_ flags used by > double-argument kvfree_rcu() must avoid invoking reclaim processing. Could you be more specific? Is this about being called directly in the reclaim context and you want to prevent a recursion? If that is the case, do you really need to special case this in any way? Any memory reclaim will set PF_MEMALLOC so allocations called from that context will not perform reclaim. So if you are called from the reclaim directly then you might want to do GFP_KERNEL | __GFP_NOMEMALLOC | __GFP_NOWARN. That should handle both from-the-recalim and outside of reclaim contexts just fine (assuming you don't allocated from !preemptible()) context. > Note that single-argument kvfree_rcu() must be invoked in sleepable > contexts, and that its fallback is the relatively high latency > synchronize_rcu(). Single-argument kvfree_rcu() therefore uses > GFP_KERNEL|__GFP_RETRY_MAYFAIL to allow limited sleeping within the > memory allocator. [...] > static inline bool > -kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr) > +add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, > + unsigned long *flags, void *ptr, bool can_sleep) > { > struct kvfree_rcu_bulk_data *bnode; > + bool can_alloc_page = preemptible(); > + gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) > | __GFP_NOWARN; This is quite confusing IMHO. At least without a further explanation. can_sleep is not as much about sleeping as it is about the reclaim recursion AFAIU your changelog, right? > int idx; > > - if (unlikely(!krcp->initialized)) > + *krcp = krc_this_cpu_lock(flags); > + if (unlikely(!(*krcp)->initialized)) > return false; > > - lockdep_assert_held(>lock); > idx = !!is_vmalloc_addr(ptr); > > /* Check if a new block is required. */ > - if (!krcp->bkvhead[idx] || > - krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) > { > - bnode = get_cached_bnode(krcp); > - if (!bnode) { > - /* > - * To keep this path working on raw non-preemptible > - * sections, prevent the optional entry into the > - * allocator as it uses sleeping locks. In fact, even > - * if the caller of kfree_rcu() is preemptible, this > - * path still is not, as krcp->lock is a raw spinlock. > - * With additional page pre-allocation in the works, > - * hitting this return is going to be much less likely. > - */ > - if (IS_ENABLED(CONFIG_PREEMPT_RT)) > - return false; > - > - /* > - * NOTE: For one argument of kvfree_rcu() we can > - * drop the lock and get the page in sleepable > - * context. That would allow to maintain an array > - * for the CONFIG_PREEMPT_RT as well if no cached > - * pages are available. > - */ > - bnode = (struct kvfree_rcu_bulk_data *) > - __get_free_page(GFP_NOWAIT | __GFP_NOWARN); > + if (!(*krcp)->bkvhead[idx] || > + (*krcp)->bkvhead[idx]->nr_records == > KVFREE_BULK_MAX_ENTR) { > + bnode = get_cached_bnode(*krcp); > + if (!bnode && can_alloc_page) { > + krc_this_cpu_unlock(*krcp, *flags); > + bnode = kmalloc(PAGE_SIZE, gfp); What is the point of calling kmalloc for a PAGE_SIZE object? Wouldn't using the page allocator directly be better? -- Michal Hocko SUSE Labs