Re: [PATCH 1/8] mm: cma: introduce cma_release_nowait()

2021-03-29 Thread Mike Kravetz
On 3/29/21 12:46 AM, Michal Hocko wrote:
> On Fri 26-03-21 14:32:01, Mike Kravetz wrote:
> [...]
>> - Just change the mutex to an irq safe spinlock.
> 
> Yes please.
> 
>>   AFAICT, the potential
>>   downsides could be:
>>   - Interrupts disabled during long bitmap scans
> 
> How large are those bitmaps in practice?
> 
>>   - Wasted cpu cycles (spinning) if there is much contention on the lock
>>   Both of these would be more of an issue on small/embedded systems. I
>>   took a quick look at the callers of cma_alloc/cma_release and nothing
>>   stood out that could lead to high degrees of contention.  However, I
>>   could have missed something.
> 
> If this is really a practical concern then we can try a more complex
> solution based on some data.
> 

Ok, I will send v2 with this approach.  Adding Barry and Will on Cc: as
they were involved in adding more cma use cases for dma on arm.

-- 
Mike Kravetz
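
For reference, the conversion being discussed would look roughly like the
following. This is a minimal sketch against mm/cma.c (names follow the
existing code there), assuming the bitmap is the only state the lock
protects; it is not the actual v2 patch.

/* Sketch only -- not the posted v2 patch. Replace the sleepable mutex
 * in struct cma with an IRQ-safe spinlock so the bitmap can be cleared
 * from atomic context (e.g. while hugetlb_lock is held). */
static void cma_clear_bitmap(struct cma *cma, unsigned long pfn,
			     unsigned int count)
{
	unsigned long bitmap_no, bitmap_count;
	unsigned long flags;

	bitmap_no = (pfn - cma->base_pfn) >> cma->order_per_bit;
	bitmap_count = cma_bitmap_pages_to_bits(cma, count);

	spin_lock_irqsave(&cma->lock, flags);	/* was: mutex_lock(&cma->lock) */
	bitmap_clear(cma->bitmap, bitmap_no, bitmap_count);
	spin_unlock_irqrestore(&cma->lock, flags);
}

cma_alloc() would take the same spinlock around its
bitmap_find_next_zero_area_off()/bitmap_set() step, so the interrupts-off
window is bounded by the bitmap scan length discussed above.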


Re: [PATCH 1/8] mm: cma: introduce cma_release_nowait()

2021-03-29 Thread Michal Hocko
On Fri 26-03-21 14:32:01, Mike Kravetz wrote:
[...]
> - Just change the mutex to an irq safe spinlock.

Yes please.

>   AFAICT, the potential
>   downsides could be:
>   - Interrupts disabled during long bitmap scans

How large are those bitmaps in practice?

>   - Wasted cpu cycles (spinning) if there is much contention on the lock
>   Both of these would be more of an issue on small/embedded systems. I
>   took a quick look at the callers of cma_alloc/cma_release and nothing
>   stood out that could lead to high degrees of contention.  However, I
>   could have missed something.

If this is really a practical concern then we can try a more complex
solution based on some data.

-- 
Michal Hocko
SUSE Labs
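
For scale, the bitmap carries one bit per (1 << order_per_bit) pages;
mm/cma.c already has a helper for the bit count:

static unsigned long cma_bitmap_maxno(struct cma *cma)
{
	return cma->count >> cma->order_per_bit;
}

So a hypothetical 1 GiB area with 4 KiB pages and order_per_bit == 0 needs
262144 bits, a 32 KiB bitmap; with order_per_bit == 9 (one bit per 2 MiB)
the same area needs only 512 bits, or 64 bytes.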


Re: [PATCH 1/8] mm: cma: introduce cma_release_nowait()

2021-03-26 Thread Mike Kravetz
On 3/25/21 4:49 PM, Mike Kravetz wrote:
> On 3/25/21 4:19 PM, Roman Gushchin wrote:
>> On Thu, Mar 25, 2021 at 01:12:51PM -0700, Minchan Kim wrote:
>>> On Thu, Mar 25, 2021 at 06:15:11PM +0100, David Hildenbrand wrote:
>>>> On 25.03.21 17:56, Mike Kravetz wrote:
>>>>> On 3/25/21 3:22 AM, Michal Hocko wrote:
>>>>>> On Thu 25-03-21 10:56:38, David Hildenbrand wrote:
>>>>>>> On 25.03.21 01:28, Mike Kravetz wrote:
>>>>>>>> From: Roman Gushchin 
>>>>>>>>
>>>>>>>> cma_release() has to lock the cma_lock mutex to clear the cma bitmap.
>>>>>>>> It makes it a blocking function, which complicates its usage from
>>>>>>>> non-blocking contexts. For instance, hugetlbfs code is temporarily
>>>>>>>> dropping the hugetlb_lock spinlock to call cma_release().
>>>>>>>>
>>>>>>>> This patch introduces a non-blocking cma_release_nowait(), which
>>>>>>>> postpones the cma bitmap clearance. It's done later from a work
>>>>>>>> context. The first page in the cma allocation is used to store
>>>>>>>> the work struct. Because CMA allocations and de-allocations are
>>>>>>>> usually not that frequent, a single global workqueue is used.
>>>>>>>>
>>>>>>>> To make sure that subsequent cma_alloc() call will pass, cma_alloc()
>>>>>>>> flushes the cma_release_wq workqueue. To avoid a performance
>>>>>>>> regression in the case when only cma_release() is used, gate it
>>>>>>>> by a per-cma area flag, which is set by the first call
>>>>>>>> of cma_release_nowait().
>>>>>>>>
>>>>>>>> Signed-off-by: Roman Gushchin 
>>>>>>>> [mike.krav...@oracle.com: rebased to v5.12-rc3-mmotm-2021-03-17-22-24]
>>>>>>>> Signed-off-by: Mike Kravetz 
>>>>>>>> ---
>>>>>>>
>>>>>>> 1. Is there a real reason this is a mutex and not a spin lock? It seems to
>>>>>>> only protect the bitmap. Are bitmaps that huge that we spend a significant
>>>>>>> amount of time in there?
>>>>>>
>>>>>> Good question. Looking at the code it doesn't seem that there is any
>>>>>> blockable operation or any heavy lifting done under the lock.
>>>>>> 7ee793a62fa8 ("cma: Remove potential deadlock situation") has introduced
>>>>>> the lock and there was a simple bitmap protection back then. I suspect
>>>>>> the patch just followed the cma_mutex lead and used the same type of the
>>>>>> lock. cma_mutex used to protect alloc_contig_range which is sleepable.
>>>>>>
>>>>>> This all suggests that there is no real reason to use a sleepable lock
>>>>>> at all and we do not need all this heavy lifting.
>>>>>
>>>>> When Roman first proposed these patches, I brought up the same issue:
>>>>>
>>>>> https://lore.kernel.org/linux-mm/20201022023352.gc300...@carbon.dhcp.thefacebook.com/
>>>>>
>>>>> Previously, Roman proposed replacing the mutex with a spinlock but
>>>>> Joonsoo was opposed.
>>>>>
>>>>> Adding Joonsoo on Cc:
>>>>
>>>> There has to be a good reason not to. And if there is a good reason,
>>>> lockless clearing might be one feasible alternative.
>>>
>>> I also don't think the nowait variant is a good idea. If the scanning of
>>> the bitmap is *really* significant, it might be a signal that we need to
>>> introduce a different technique or data structure than a bitmap, rather
>>> than a new API variant.
>>
>> I'd also prefer to just replace the mutex with a spinlock rather than
>> fiddling with a delayed release.
>>
> 
> I hope Joonsoo or someone else brings up specific concerns.  I do not
> know enough about all CMA use cases.  Certainly, in this specific use
> case converting to a spinlock would not be an issue.  Do note that we
> would want to convert to an irq safe spinlock and disable irqs if that
> makes any difference in the discussion.
> 

Suggestions on how to move forward would be appreciated.  I can think of
the following options.

- Use the cma_release_nowait() routine as defined in this patch.

- Just change the mutex to an irq safe spinlock.  AFAICT, the potential
  downsides could be:
  - Interrupts disabled during long bitmap scans
  - Wasted cpu cycles (spinning) if there is much contention on the lock
  Both of these would be more of an issue on small/embedded systems. I
  took a quick look at the callers of cma_alloc/cma_release and nothing
  stood out that could lead to high degrees of contention.  However, I
  could have missed something.

- Another idea I had was to allow the user to specify the locking type
  when creating a cma area.  In this way, cma areas which may have
  release calls from atomic context would be set up with an irq safe
  spinlock.  Others would use the mutex.  I admit this is a hackish
  way to address the issue, but perhaps not much worse than the separate
  cma_release_nowait interface?  (One hypothetical shape of this is
  sketched after this message.)

- Change the CMA bitmap to some other data structure and algorithm.
  This would obviously take more work.

Thanks,
-- 
Mike Kravetz
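
One hypothetical shape of the third option above (the CMA_ATOMIC_RELEASE
flag is an invented name, and struct cma would have to carry both lock
types; this is an illustration, not proposed code):

/* Hypothetical sketch of per-area lock selection. Areas created for
 * atomic-context release set the flag and pay the irqsave cost;
 * everyone else keeps the mutex. */
static void cma_clear_bitmap(struct cma *cma, unsigned long pfn,
			     unsigned int count)
{
	unsigned long bitmap_no = (pfn - cma->base_pfn) >> cma->order_per_bit;
	unsigned long bitmap_count = cma_bitmap_pages_to_bits(cma, count);
	unsigned long flags;

	if (test_bit(CMA_ATOMIC_RELEASE, &cma->flags)) {
		spin_lock_irqsave(&cma->spinlock, flags);
		bitmap_clear(cma->bitmap, bitmap_no, bitmap_count);
		spin_unlock_irqrestore(&cma->spinlock, flags);
	} else {
		mutex_lock(&cma->mutex);
		bitmap_clear(cma->bitmap, bitmap_no, bitmap_count);
		mutex_unlock(&cma->mutex);
	}
}

The two lock fields and the branch on every bitmap access are what make
this feel hackish compared to simply using the spinlock everywhere.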


Re: [PATCH 1/8] mm: cma: introduce cma_release_nowait()

2021-03-25 Thread Mike Kravetz
On 3/25/21 4:19 PM, Roman Gushchin wrote:
> On Thu, Mar 25, 2021 at 01:12:51PM -0700, Minchan Kim wrote:
>> On Thu, Mar 25, 2021 at 06:15:11PM +0100, David Hildenbrand wrote:
>>> On 25.03.21 17:56, Mike Kravetz wrote:
>>>> On 3/25/21 3:22 AM, Michal Hocko wrote:
>>>>> On Thu 25-03-21 10:56:38, David Hildenbrand wrote:
>>>>>> On 25.03.21 01:28, Mike Kravetz wrote:
>>>>>>> From: Roman Gushchin 
>>>>>>>
>>>>>>> cma_release() has to lock the cma_lock mutex to clear the cma bitmap.
>>>>>>> It makes it a blocking function, which complicates its usage from
>>>>>>> non-blocking contexts. For instance, hugetlbfs code is temporarily
>>>>>>> dropping the hugetlb_lock spinlock to call cma_release().
>>>>>>>
>>>>>>> This patch introduces a non-blocking cma_release_nowait(), which
>>>>>>> postpones the cma bitmap clearance. It's done later from a work
>>>>>>> context. The first page in the cma allocation is used to store
>>>>>>> the work struct. Because CMA allocations and de-allocations are
>>>>>>> usually not that frequent, a single global workqueue is used.
>>>>>>>
>>>>>>> To make sure that subsequent cma_alloc() call will pass, cma_alloc()
>>>>>>> flushes the cma_release_wq workqueue. To avoid a performance
>>>>>>> regression in the case when only cma_release() is used, gate it
>>>>>>> by a per-cma area flag, which is set by the first call
>>>>>>> of cma_release_nowait().
>>>>>>>
>>>>>>> Signed-off-by: Roman Gushchin 
>>>>>>> [mike.krav...@oracle.com: rebased to v5.12-rc3-mmotm-2021-03-17-22-24]
>>>>>>> Signed-off-by: Mike Kravetz 
>>>>>>> ---
>>>>>>
>>>>>> 1. Is there a real reason this is a mutex and not a spin lock? It seems to
>>>>>> only protect the bitmap. Are bitmaps that huge that we spend a significant
>>>>>> amount of time in there?
>>>>>
>>>>> Good question. Looking at the code it doesn't seem that there is any
>>>>> blockable operation or any heavy lifting done under the lock.
>>>>> 7ee793a62fa8 ("cma: Remove potential deadlock situation") has introduced
>>>>> the lock and there was a simple bitmap protection back then. I suspect
>>>>> the patch just followed the cma_mutex lead and used the same type of the
>>>>> lock. cma_mutex used to protect alloc_contig_range which is sleepable.
>>>>>
>>>>> This all suggests that there is no real reason to use a sleepable lock
>>>>> at all and we do not need all this heavy lifting.
>>>>
>>>> When Roman first proposed these patches, I brought up the same issue:
>>>>
>>>> https://lore.kernel.org/linux-mm/20201022023352.gc300...@carbon.dhcp.thefacebook.com/
>>>>
>>>> Previously, Roman proposed replacing the mutex with a spinlock but
>>>> Joonsoo was opposed.
>>>>
>>>> Adding Joonsoo on Cc:
>>>
>>> There has to be a good reason not to. And if there is a good reason,
>>> lockless clearing might be one feasible alternative.
>>
>> I also don't think the nowait variant is a good idea. If the scanning of
>> the bitmap is *really* significant, it might be a signal that we need to
>> introduce a different technique or data structure than a bitmap, rather
>> than a new API variant.
> 
> I'd also prefer to just replace the mutex with a spinlock rather than fiddling
> with a delayed release.
> 

I hope Joonsoo or someone else brings up specific concerns.  I do not
know enough about all CMA use cases.  Certainly, in this specific use
case converting to a spinlock would not be an issue.  Do note that we
would want to convert to an irq safe spinlock and disable irqs if that
makes any difference in the discussion.
-- 
Mike Kravetz


Re: [PATCH 1/8] mm: cma: introduce cma_release_nowait()

2021-03-25 Thread Roman Gushchin
On Thu, Mar 25, 2021 at 01:12:51PM -0700, Minchan Kim wrote:
> On Thu, Mar 25, 2021 at 06:15:11PM +0100, David Hildenbrand wrote:
> > On 25.03.21 17:56, Mike Kravetz wrote:
> > > On 3/25/21 3:22 AM, Michal Hocko wrote:
> > > > On Thu 25-03-21 10:56:38, David Hildenbrand wrote:
> > > > > On 25.03.21 01:28, Mike Kravetz wrote:
> > > > > > From: Roman Gushchin 
> > > > > > 
> > > > > > cma_release() has to lock the cma_lock mutex to clear the cma bitmap.
> > > > > > It makes it a blocking function, which complicates its usage from
> > > > > > non-blocking contexts. For instance, hugetlbfs code is temporarily
> > > > > > dropping the hugetlb_lock spinlock to call cma_release().
> > > > > > 
> > > > > > This patch introduces a non-blocking cma_release_nowait(), which
> > > > > > postpones the cma bitmap clearance. It's done later from a work
> > > > > > context. The first page in the cma allocation is used to store
> > > > > > the work struct. Because CMA allocations and de-allocations are
> > > > > > usually not that frequent, a single global workqueue is used.
> > > > > > 
> > > > > > To make sure that subsequent cma_alloc() call will pass, cma_alloc()
> > > > > > flushes the cma_release_wq workqueue. To avoid a performance
> > > > > > regression in the case when only cma_release() is used, gate it
> > > > > > by a per-cma area flag, which is set by the first call
> > > > > > of cma_release_nowait().
> > > > > > 
> > > > > > Signed-off-by: Roman Gushchin 
> > > > > > [mike.krav...@oracle.com: rebased to v5.12-rc3-mmotm-2021-03-17-22-24]
> > > > > > Signed-off-by: Mike Kravetz 
> > > > > > ---
> > > > > 
> > > > > 
> > > > > 1. Is there a real reason this is a mutex and not a spin lock? It seems to
> > > > > only protect the bitmap. Are bitmaps that huge that we spend a significant
> > > > > amount of time in there?
> > > > 
> > > > Good question. Looking at the code it doesn't seem that there is any
> > > > blockable operation or any heavy lifting done under the lock.
> > > > 7ee793a62fa8 ("cma: Remove potential deadlock situation") has introduced
> > > > the lock and there was a simple bitmap protection back then. I suspect
> > > > the patch just followed the cma_mutex lead and used the same type of the
> > > > lock. cma_mutex used to protect alloc_contig_range which is sleepable.
> > > > 
> > > > This all suggests that there is no real reason to use a sleepable lock
> > > > at all and we do not need all this heavy lifting.
> > > > 
> > > 
> > > When Roman first proposed these patches, I brought up the same issue:
> > > 
> > > https://lore.kernel.org/linux-mm/20201022023352.gc300...@carbon.dhcp.thefacebook.com/
> > > 
> > > Previously, Roman proposed replacing the mutex with a spinlock but
> > > Joonsoo was opposed.
> > > 
> > > Adding Joonsoo on Cc:
> > > 
> > 
> > There has to be a good reason not to. And if there is a good reason,
> > lockless clearing might be one feasible alternative.
> 
> I also don't think the nowait variant is a good idea. If the scanning of
> the bitmap is *really* significant, it might be a signal that we need to
> introduce a different technique or data structure than a bitmap, rather
> than a new API variant.

I'd also prefer to just replace the mutex with a spinlock rather than fiddling
with a delayed release.

Thanks!


Re: [PATCH 1/8] mm: cma: introduce cma_release_nowait()

2021-03-25 Thread Minchan Kim
On Thu, Mar 25, 2021 at 06:15:11PM +0100, David Hildenbrand wrote:
> On 25.03.21 17:56, Mike Kravetz wrote:
> > On 3/25/21 3:22 AM, Michal Hocko wrote:
> > > On Thu 25-03-21 10:56:38, David Hildenbrand wrote:
> > > > On 25.03.21 01:28, Mike Kravetz wrote:
> > > > > From: Roman Gushchin 
> > > > > 
> > > > > cma_release() has to lock the cma_lock mutex to clear the cma bitmap.
> > > > > It makes it a blocking function, which complicates its usage from
> > > > > non-blocking contexts. For instance, hugetlbfs code is temporarily
> > > > > dropping the hugetlb_lock spinlock to call cma_release().
> > > > > 
> > > > > This patch introduces a non-blocking cma_release_nowait(), which
> > > > > postpones the cma bitmap clearance. It's done later from a work
> > > > > context. The first page in the cma allocation is used to store
> > > > > the work struct. Because CMA allocations and de-allocations are
> > > > > usually not that frequent, a single global workqueue is used.
> > > > > 
> > > > > To make sure that subsequent cma_alloc() call will pass, cma_alloc()
> > > > > flushes the cma_release_wq workqueue. To avoid a performance
> > > > > regression in the case when only cma_release() is used, gate it
> > > > > by a per-cma area flag, which is set by the first call
> > > > > of cma_release_nowait().
> > > > > 
> > > > > Signed-off-by: Roman Gushchin 
> > > > > [mike.krav...@oracle.com: rebased to v5.12-rc3-mmotm-2021-03-17-22-24]
> > > > > Signed-off-by: Mike Kravetz 
> > > > > ---
> > > > 
> > > > 
> > > > 1. Is there a real reason this is a mutex and not a spin lock? It seems to
> > > > only protect the bitmap. Are bitmaps that huge that we spend a significant
> > > > amount of time in there?
> > > 
> > > Good question. Looking at the code it doesn't seem that there is any
> > > blockable operation or any heavy lifting done under the lock.
> > > 7ee793a62fa8 ("cma: Remove potential deadlock situation") has introduced
> > > the lock and there was a simple bitmap protection back then. I suspect
> > > the patch just followed the cma_mutex lead and used the same type of the
> > > lock. cma_mutex used to protect alloc_contig_range which is sleepable.
> > > 
> > > This all suggests that there is no real reason to use a sleepable lock
> > > at all and we do not need all this heavy lifting.
> > > 
> > 
> > When Roman first proposed these patches, I brought up the same issue:
> > 
> > https://lore.kernel.org/linux-mm/20201022023352.gc300...@carbon.dhcp.thefacebook.com/
> > 
> > Previously, Roman proposed replacing the mutex with a spinlock but
> > Joonsoo was opposed.
> > 
> > Adding Joonsoo on Cc:
> > 
> 
> There has to be a good reason not to. And if there is a good reason,
> lockless clearing might be one feasible alternative.

I also don't think the nowait variant is a good idea. If the scanning of
the bitmap is *really* significant, it might be a signal that we need to
introduce a different technique or data structure than a bitmap, rather
than a new API variant.


Re: [PATCH 1/8] mm: cma: introduce cma_release_nowait()

2021-03-25 Thread David Hildenbrand

On 25.03.21 17:56, Mike Kravetz wrote:
> On 3/25/21 3:22 AM, Michal Hocko wrote:
>> On Thu 25-03-21 10:56:38, David Hildenbrand wrote:
>>> On 25.03.21 01:28, Mike Kravetz wrote:
>>>> From: Roman Gushchin 
>>>>
>>>> cma_release() has to lock the cma_lock mutex to clear the cma bitmap.
>>>> It makes it a blocking function, which complicates its usage from
>>>> non-blocking contexts. For instance, hugetlbfs code is temporarily
>>>> dropping the hugetlb_lock spinlock to call cma_release().
>>>>
>>>> This patch introduces a non-blocking cma_release_nowait(), which
>>>> postpones the cma bitmap clearance. It's done later from a work
>>>> context. The first page in the cma allocation is used to store
>>>> the work struct. Because CMA allocations and de-allocations are
>>>> usually not that frequent, a single global workqueue is used.
>>>>
>>>> To make sure that subsequent cma_alloc() call will pass, cma_alloc()
>>>> flushes the cma_release_wq workqueue. To avoid a performance
>>>> regression in the case when only cma_release() is used, gate it
>>>> by a per-cma area flag, which is set by the first call
>>>> of cma_release_nowait().
>>>>
>>>> Signed-off-by: Roman Gushchin 
>>>> [mike.krav...@oracle.com: rebased to v5.12-rc3-mmotm-2021-03-17-22-24]
>>>> Signed-off-by: Mike Kravetz 
>>>> ---
>>>
>>> 1. Is there a real reason this is a mutex and not a spin lock? It seems to
>>> only protect the bitmap. Are bitmaps that huge that we spend a significant
>>> amount of time in there?
>>
>> Good question. Looking at the code it doesn't seem that there is any
>> blockable operation or any heavy lifting done under the lock.
>> 7ee793a62fa8 ("cma: Remove potential deadlock situation") has introduced
>> the lock and there was a simple bitmap protection back then. I suspect
>> the patch just followed the cma_mutex lead and used the same type of the
>> lock. cma_mutex used to protect alloc_contig_range which is sleepable.
>>
>> This all suggests that there is no real reason to use a sleepable lock
>> at all and we do not need all this heavy lifting.
>
> When Roman first proposed these patches, I brought up the same issue:
>
> https://lore.kernel.org/linux-mm/20201022023352.gc300...@carbon.dhcp.thefacebook.com/
>
> Previously, Roman proposed replacing the mutex with a spinlock but
> Joonsoo was opposed.
>
> Adding Joonsoo on Cc:

There has to be a good reason not to. And if there is a good reason,
lockless clearing might be one feasible alternative.


--
Thanks,

David / dhildenb



Re: [PATCH 1/8] mm: cma: introduce cma_release_nowait()

2021-03-25 Thread Mike Kravetz
On 3/25/21 3:22 AM, Michal Hocko wrote:
> On Thu 25-03-21 10:56:38, David Hildenbrand wrote:
>> On 25.03.21 01:28, Mike Kravetz wrote:
>>> From: Roman Gushchin 
>>>
>>> cma_release() has to lock the cma_lock mutex to clear the cma bitmap.
>>> It makes it a blocking function, which complicates its usage from
>>> non-blocking contexts. For instance, hugetlbfs code is temporarily
>>> dropping the hugetlb_lock spinlock to call cma_release().
>>>
>>> This patch introduces a non-blocking cma_release_nowait(), which
>>> postpones the cma bitmap clearance. It's done later from a work
>>> context. The first page in the cma allocation is used to store
>>> the work struct. Because CMA allocations and de-allocations are
>>> usually not that frequent, a single global workqueue is used.
>>>
>>> To make sure that subsequent cma_alloc() call will pass, cma_alloc()
>>> flushes the cma_release_wq workqueue. To avoid a performance
>>> regression in the case when only cma_release() is used, gate it
>>> by a per-cma area flag, which is set by the first call
>>> of cma_release_nowait().
>>>
>>> Signed-off-by: Roman Gushchin 
>>> [mike.krav...@oracle.com: rebased to v5.12-rc3-mmotm-2021-03-17-22-24]
>>> Signed-off-by: Mike Kravetz 
>>> ---
>>
>>
>> 1. Is there a real reason this is a mutex and not a spin lock? It seems to
>> only protect the bitmap. Are bitmaps that huge that we spend a significant
>> amount of time in there?
> 
> Good question. Looking at the code it doesn't seem that there is any
> blockable operation or any heavy lifting done under the lock.
> 7ee793a62fa8 ("cma: Remove potential deadlock situation") has introduced
> the lock and there was a simple bitmap protection back then. I suspect
> the patch just followed the cma_mutex lead and used the same type of the
> lock. cma_mutex used to protect alloc_contig_range which is sleepable.
> 
> This all suggests that there is no real reason to use a sleepable lock
> at all and we do not need all this heavy lifting.
> 

When Roman first proposed these patches, I brought up the same issue:

https://lore.kernel.org/linux-mm/20201022023352.gc300...@carbon.dhcp.thefacebook.com/

Previously, Roman proposed replacing the mutex with a spinlock but
Joonsoo was opposed.

Adding Joonsoo on Cc:
-- 
Mike Kravetz


Re: [PATCH 1/8] mm: cma: introduce cma_release_nowait()

2021-03-25 Thread Michal Hocko
On Thu 25-03-21 11:17:32, Oscar Salvador wrote:
> On Thu, Mar 25, 2021 at 11:11:49AM +0100, Michal Hocko wrote:
> > I have overlooked that
> > +static void cma_clear_bitmap_fn(struct work_struct *work)
> > +{
> > +   struct cma_clear_bitmap_work *w;
> > +
> > +   w = container_of(work, struct cma_clear_bitmap_work, work);
> > +
> > +   cma_clear_bitmap(w->cma, w->pfn, w->count);
> > +
> > +   __free_page(pfn_to_page(w->pfn));
> > +}
> > 
> should be doing free_contig_range() with w->count as the target.
> 
> That is currently done in cma_release_nowait().
> You meant we should move that work there in the wq?

I have missed that part. My bad. But it seems this whole thing is moot
because we can make the lock a spinlock as pointed out by David.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/8] mm: cma: introduce cma_release_nowait()

2021-03-25 Thread Michal Hocko
On Thu 25-03-21 10:56:38, David Hildenbrand wrote:
> On 25.03.21 01:28, Mike Kravetz wrote:
> > From: Roman Gushchin 
> > 
> > cma_release() has to lock the cma_lock mutex to clear the cma bitmap.
> > It makes it a blocking function, which complicates its usage from
> > non-blocking contexts. For instance, hugetlbfs code is temporarily
> > dropping the hugetlb_lock spinlock to call cma_release().
> > 
> > This patch introduces a non-blocking cma_release_nowait(), which
> > postpones the cma bitmap clearance. It's done later from a work
> > context. The first page in the cma allocation is used to store
> > the work struct. Because CMA allocations and de-allocations are
> > usually not that frequent, a single global workqueue is used.
> > 
> > To make sure that subsequent cma_alloc() call will pass, cma_alloc()
> > flushes the cma_release_wq workqueue. To avoid a performance
> > regression in the case when only cma_release() is used, gate it
> > by a per-cma area flag, which is set by the first call
> > of cma_release_nowait().
> > 
> > Signed-off-by: Roman Gushchin 
> > [mike.krav...@oracle.com: rebased to v5.12-rc3-mmotm-2021-03-17-22-24]
> > Signed-off-by: Mike Kravetz 
> > ---
> 
> 
> 1. Is there a real reason this is a mutex and not a spin lock? It seems to
> only protect the bitmap. Are bitmaps that huge that we spend a significant
> amount of time in there?

Good question. Looking at the code it doesn't seem that there is any
blockable operation or any heavy lifting done under the lock.
7ee793a62fa8 ("cma: Remove potential deadlock situation") has introduced
the lock and there was a simple bitmap protection back then. I suspect
the patch just followed the cma_mutex lead and used the same type of the
lock. cma_mutex used to protect alloc_contig_range which is sleepable.

This all suggests that there is no real reason to use a sleepable lock
at all and we do not need all this heavy lifting.

Thanks!
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/8] mm: cma: introduce cma_release_nowait()

2021-03-25 Thread Oscar Salvador
On Thu, Mar 25, 2021 at 11:11:49AM +0100, Michal Hocko wrote:
> I have overlooked that
> +static void cma_clear_bitmap_fn(struct work_struct *work)
> +{
> +   struct cma_clear_bitmap_work *w;
> +
> +   w = container_of(work, struct cma_clear_bitmap_work, work);
> +
> +   cma_clear_bitmap(w->cma, w->pfn, w->count);
> +
> +   __free_page(pfn_to_page(w->pfn));
> +}
> 
> should be doing free_contig_range() with w->count as the target.

That is currently done in cma_release_nowait().
You meant we should move that work there in the wq?

I thought the reason for creating a WQ in the first place was to have a
non-blocking cma_release() variant, as we might block down the chain
when clearing the bitmap. I do not think this was about freeing up
memory, or at least the changelog did not mention it.

Also, IIUC, we use the first page to store the work information (it is
used as storage in cma_release_nowait()), which is why once we are done
with the workqueue work in cma_clear_bitmap_fn(), we can free that page.

-- 
Oscar Salvador
SUSE L3


Re: [PATCH 1/8] mm: cma: introduce cma_release_nowait()

2021-03-25 Thread David Hildenbrand

On 25.03.21 11:11, Michal Hocko wrote:
> On Thu 25-03-21 10:45:19, Michal Hocko wrote:
>> On Wed 24-03-21 17:28:28, Mike Kravetz wrote:
>> [...]
>>>  phys_addr_t cma_get_base(const struct cma *cma)
>>>  {
>>>	return PFN_PHYS(cma->base_pfn);
>>> @@ -146,6 +155,10 @@ static int __init cma_init_reserved_areas(void)
>>>	for (i = 0; i < cma_area_count; i++)
>>>		cma_activate_area(&cma_areas[i]);
>>>
>>> +	cma_release_wq = create_workqueue("cma_release");
>>
>> Considering the workqueue is used to free up memory it should likely be
>> WQ_MEM_RECLAIM to prevent from long stalls when WQs are overloaded.
>>
>> I cannot judge the CMA parts, but from a very quick glance this looks
>> reasonable.
>
> I have overlooked that
> +static void cma_clear_bitmap_fn(struct work_struct *work)
> +{
> +	struct cma_clear_bitmap_work *w;
> +
> +	w = container_of(work, struct cma_clear_bitmap_work, work);
> +
> +	cma_clear_bitmap(w->cma, w->pfn, w->count);
> +
> +	__free_page(pfn_to_page(w->pfn));
> +}
>
> should be doing free_contig_range() with w->count as the target.



Then the name "cma_clear_bitmap_fn" is misleading.

--
Thanks,

David / dhildenb



Re: [PATCH 1/8] mm: cma: introduce cma_release_nowait()

2021-03-25 Thread Michal Hocko
On Thu 25-03-21 10:45:19, Michal Hocko wrote:
> On Wed 24-03-21 17:28:28, Mike Kravetz wrote:
> [...]
> >  phys_addr_t cma_get_base(const struct cma *cma)
> >  {
> > return PFN_PHYS(cma->base_pfn);
> > @@ -146,6 +155,10 @@ static int __init cma_init_reserved_areas(void)
> > for (i = 0; i < cma_area_count; i++)
> > cma_activate_area(&cma_areas[i]);
> >  
> > +   cma_release_wq = create_workqueue("cma_release");
> 
> Considering the workqueue is used to free up memory it should likely be
> WQ_MEM_RECLAIM to prevent from long stalls when WQs are overloaded.
> 
> I cannot judge the CMA parts, but from a very quick glance this looks
> reasonable.

I have overlooked that
+static void cma_clear_bitmap_fn(struct work_struct *work)
+{
+   struct cma_clear_bitmap_work *w;
+
+   w = container_of(work, struct cma_clear_bitmap_work, work);
+
+   cma_clear_bitmap(w->cma, w->pfn, w->count);
+
+   __free_page(pfn_to_page(w->pfn));
+}

should be doing free_contig_range() with w->count as the target.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/8] mm: cma: introduce cma_release_nowait()

2021-03-25 Thread Michal Hocko
On Thu 25-03-21 10:54:10, Oscar Salvador wrote:
> On Thu, Mar 25, 2021 at 10:45:18AM +0100, Michal Hocko wrote:
> > On Wed 24-03-21 17:28:28, Mike Kravetz wrote:
> > [...]
> > >  phys_addr_t cma_get_base(const struct cma *cma)
> > >  {
> > >   return PFN_PHYS(cma->base_pfn);
> > > @@ -146,6 +155,10 @@ static int __init cma_init_reserved_areas(void)
> > >   for (i = 0; i < cma_area_count; i++)
> > >   cma_activate_area(&cma_areas[i]);
> > >  
> > > + cma_release_wq = create_workqueue("cma_release");
> > 
> > Considering the workqueue is used to free up memory it should likely be
> > WQ_MEM_RECLAIM to prevent from long stalls when WQs are overloaded.
> 
> I might be missing something, but the workqueue->func() does not seem to do
> anything for memory reclaim related purposes.
> The workqueue calls cma_clear_bitmap_fn(), and the only thing that does
> is freeing one page, and clearing the bitmap.

No, it shouldn't release a single page. That is a bug which I have
overlooked as well. It should free up the whole count worth of pages.
But fundamentally the worker does all the heavy lifting and frees up the
memory.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/8] mm: cma: introduce cma_release_nowait()

2021-03-25 Thread David Hildenbrand

On 25.03.21 01:28, Mike Kravetz wrote:

From: Roman Gushchin 

cma_release() has to lock the cma_lock mutex to clear the cma bitmap.
It makes it a blocking function, which complicates its usage from
non-blocking contexts. For instance, hugetlbfs code is temporarily
dropping the hugetlb_lock spinlock to call cma_release().

This patch introduces a non-blocking cma_release_nowait(), which
postpones the cma bitmap clearance. It's done later from a work
context. The first page in the cma allocation is used to store
the work struct. Because CMA allocations and de-allocations are
usually not that frequent, a single global workqueue is used.

To make sure that subsequent cma_alloc() call will pass, cma_alloc()
flushes the cma_release_wq workqueue. To avoid a performance
regression in the case when only cma_release() is used, gate it
by a per-cma area flag, which is set by the first call
of cma_release_nowait().

Signed-off-by: Roman Gushchin 
[mike.krav...@oracle.com: rebased to v5.12-rc3-mmotm-2021-03-17-22-24]
Signed-off-by: Mike Kravetz 
---



1. Is there a real reason this is a mutex and not a spin lock? It seems 
to only protect the bitmap. Are bitmaps that huge that we spend a 
significant amount of time in there?


Because I also read "Because CMA allocations and de-allocations are
usually not that frequent".

With a spinlock, you would no longer be sleeping, but obviously you 
might end up waiting for the lock ;) Not sure if that would help.


2. IIUC, if we would do the clearing completely lockless and use atomic 
bitmap ops instead, only cma_debug_show_areas() would see slight 
inconsistencies. As long as the setting code (-> allocation code) holds 
the lock, I think this should be fine (-> no double allocations).


(sorry if that has already been discussed)

--
Thanks,

David / dhildenb
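
A minimal sketch of that lockless clearing, assuming (as David says) that
the allocation side keeps taking cma->lock while scanning and setting bits:

/* Sketch only: clear each bit atomically instead of taking cma->lock.
 * A concurrent reader such as cma_debug_show_areas() may observe a
 * half-cleared range; allocators still serialize among themselves. */
static void cma_clear_bitmap_lockless(struct cma *cma, unsigned long pfn,
				      unsigned int count)
{
	unsigned long bitmap_no = (pfn - cma->base_pfn) >> cma->order_per_bit;
	unsigned long bitmap_count = cma_bitmap_pages_to_bits(cma, count);
	unsigned long i;

	for (i = 0; i < bitmap_count; i++)
		clear_bit(bitmap_no + i, cma->bitmap);
}

One detail such a change would need to audit: bitmap_set() on the
allocation side is a non-atomic read-modify-write, so an atomic
clear_bit() racing with it on the same word could be lost. That would
leak CMA pages rather than double-allocate them, but it is not obviously
harmless.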



Re: [PATCH 1/8] mm: cma: introduce cma_release_nowait()

2021-03-25 Thread Oscar Salvador
On Thu, Mar 25, 2021 at 10:45:18AM +0100, Michal Hocko wrote:
> On Wed 24-03-21 17:28:28, Mike Kravetz wrote:
> [...]
> >  phys_addr_t cma_get_base(const struct cma *cma)
> >  {
> > return PFN_PHYS(cma->base_pfn);
> > @@ -146,6 +155,10 @@ static int __init cma_init_reserved_areas(void)
> > for (i = 0; i < cma_area_count; i++)
> > cma_activate_area(&cma_areas[i]);
> >  
> > +   cma_release_wq = create_workqueue("cma_release");
> 
> Considering the workqueue is used to free up memory it should likely be
> WQ_MEM_RECLAIM to prevent from long stalls when WQs are overloaded.

I might be missing something, but the workqueue->func() does not seem to do
anything for memory reclaim related purposes.

The workqueue calls cma_clear_bitmap_fn(), and the only thing that does
is freeing one page, and clearing the bitmap.
I might be reading the CMA code wrongly, but I cannot see any freeing up
of memory in cma_clear_bitmap()->__bitmap_clear() (or its siblings).

The actual freeing of memory seems to happen in cma_release_nowait()
with: 

 if (count > 1)
   free_contig_range(pfn + 1, count - 1);

-- 
Oscar Salvador
SUSE L3


Re: [PATCH 1/8] mm: cma: introduce cma_release_nowait()

2021-03-25 Thread Michal Hocko
On Wed 24-03-21 17:28:28, Mike Kravetz wrote:
[...]
>  phys_addr_t cma_get_base(const struct cma *cma)
>  {
>   return PFN_PHYS(cma->base_pfn);
> @@ -146,6 +155,10 @@ static int __init cma_init_reserved_areas(void)
>   for (i = 0; i < cma_area_count; i++)
>   cma_activate_area(&cma_areas[i]);
>  
> + cma_release_wq = create_workqueue("cma_release");

Considering the workqueue is used to free up memory it should likely be
WQ_MEM_RECLAIM to prevent from long stalls when WQs are overloaded.

I cannot judge the CMA parts, but from a very quick glance this looks
reasonable.
-- 
Michal Hocko
SUSE Labs
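
Concretely, the suggestion would look something like this sketch
(alloc_workqueue() is the interface that takes flags directly;
create_workqueue() is the legacy wrapper):

	/* Sketch: WQ_MEM_RECLAIM gives the queue a rescuer thread, so the
	 * deferred bitmap clearing and page freeing can make progress even
	 * when worker creation is blocked by memory pressure. */
	cma_release_wq = alloc_workqueue("cma_release", WQ_MEM_RECLAIM, 0);
	if (!cma_release_wq)
		return -ENOMEM;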


Re: [PATCH 1/8] mm: cma: introduce cma_release_nowait()

2021-03-25 Thread Oscar Salvador
On Wed, Mar 24, 2021 at 05:28:28PM -0700, Mike Kravetz wrote:
> +struct cma_clear_bitmap_work {
> + struct work_struct work;
> + struct cma *cma;
> + unsigned long pfn;
> + unsigned int count;
> +};
> +
>  struct cma cma_areas[MAX_CMA_AREAS];
>  unsigned cma_area_count;
>  
> +struct workqueue_struct *cma_release_wq;

should this be static?

> +
>  phys_addr_t cma_get_base(const struct cma *cma)
>  {
>   return PFN_PHYS(cma->base_pfn);
> @@ -146,6 +155,10 @@ static int __init cma_init_reserved_areas(void)
>   for (i = 0; i < cma_area_count; i++)
>   cma_activate_area(&cma_areas[i]);
>  
> + cma_release_wq = create_workqueue("cma_release");
> + if (!cma_release_wq)
> + return -ENOMEM;
> +

I did not check the code that goes through the initcalls, and maybe we
cannot really have this scenario, but what happens if we return -ENOMEM?
Because I can see that later in cma_release_nowait() you use
cma_release_wq. Can it be that at that stage cma_release_wq == NULL due
to -ENOMEM, or are we guaranteed to never reach that point?

Also, should the cma_release_wq go before the cma_activate_area?

> +bool cma_release_nowait(struct cma *cma, const struct page *pages,
> + unsigned int count)
> +{
> + struct cma_clear_bitmap_work *work;
> + unsigned long pfn;
> +
> + if (!cma || !pages)
> + return false;
> +
> + pr_debug("%s(page %p)\n", __func__, (void *)pages);

cma_release() seems to have:

pr_debug("%s(page %p, count %u)\n", __func__, (void *)pages, count);

any reason to not have the same here?


> +
> + pfn = page_to_pfn(pages);
> +
> + if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
> + return false;
> +
> + VM_BUG_ON(pfn + count > cma->base_pfn + cma->count);
> +
> + /*
> +  * Set CMA_DELAYED_RELEASE flag: subsequent cma_alloc()'s
> +  * will wait for the async part of cma_release_nowait() to
> +  * finish.
> +  */
> + if (unlikely(!test_bit(CMA_DELAYED_RELEASE, &cma->flags)))
> + set_bit(CMA_DELAYED_RELEASE, &cma->flags);

It seems this cannot really happen? This is the only place we set the
bit, right?
Why not set the bit unconditionally? What is this guarding us against?

> +
> + /*
> +  * To make cma_release_nowait() non-blocking, cma bitmap is cleared
> +  * from a work context (see cma_clear_bitmap_fn()). The first page
> +  * in the cma allocation is used to store the work structure,
> +  * so it's released after the cma bitmap clearance. Other pages
> +  * are released immediately as previously.
> +  */
> + if (count > 1)
> + free_contig_range(pfn + 1, count - 1);
> +
> + work = (struct cma_clear_bitmap_work *)page_to_virt(pages);
> + INIT_WORK(&work->work, cma_clear_bitmap_fn);
> + work->cma = cma;
> + work->pfn = pfn;
> + work->count = count;
> + queue_work(cma_release_wq, &work->work);

As I said above, can cma_release_wq be NULL if we had -ENOMEM before?


-- 
Oscar Salvador
SUSE L3
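
One possible answer to the -ENOMEM question above, as a sketch only
(nothing like this was posted in the thread): guard the nowait path so a
failed workqueue allocation degrades to the blocking API.

/* Sketch: if the workqueue never came up, decline the nowait path;
 * the caller can then fall back to the blocking cma_release(). */
bool cma_release_nowait(struct cma *cma, const struct page *pages,
			unsigned int count)
{
	if (!cma || !pages || !cma_release_wq)
		return false;
	/* ... rest as in the patch ... */
}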