Re: [PATCH 2/2] percpu: km: no need to consider pcpu_group_offsets[0]

2019-02-25 Thread den...@kernel.org
On Sun, Feb 24, 2019 at 01:13:50PM +, Peng Fan wrote:
> percpu-km is used on UP systems, which have only one group, so the
> group offset is always 0. There is no need to subtract
> pcpu_group_offsets[0] when assigning chunk->base_addr.
> 
> Signed-off-by: Peng Fan 
> ---
>  mm/percpu-km.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/percpu-km.c b/mm/percpu-km.c
> index 66e5598be876..8872c21a487b 100644
> --- a/mm/percpu-km.c
> +++ b/mm/percpu-km.c
> @@ -67,7 +67,7 @@ static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp)
>   pcpu_set_page_chunk(nth_page(pages, i), chunk);
>  
>   chunk->data = pages;
> - chunk->base_addr = page_address(pages) - pcpu_group_offsets[0];
> + chunk->base_addr = page_address(pages);
>  
>   spin_lock_irqsave(&pcpu_lock, flags);
>   pcpu_chunk_populated(chunk, 0, nr_pages, false);
> -- 
> 2.16.4
> 

While I do think you're right, creating a chunk is not a part of the
critical path and subtracting 0 is incredibly minor overhead. So I'd
rather keep the code as is to maintain consistency between percpu-vm.c
and percpu-km.c.
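For reference, the percpu-vm.c side of pcpu_create_chunk() computes
base_addr the same way (quoting from memory of the tree around this
time, so treat it as a sketch rather than the authoritative source):

    /* mm/percpu-vm.c: vms[0] is group 0's vm area */
    chunk->data = vms;
    chunk->base_addr = vms[0]->addr - pcpu_group_offsets[0];

On UP the subtraction is a no-op either way, so the only real question
is whether the two files should keep reading identically.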

Thanks,
Dennis


Re: [RFC] percpu: decrease pcpu_nr_slots by 1

2019-02-25 Thread den...@kernel.org
On Sun, Feb 24, 2019 at 09:17:08AM +, Peng Fan wrote:
> Entry pcpu_slot[pcpu_nr_slots - 2] is wasted with the current code logic.
> pcpu_nr_slots is calculated as `__pcpu_size_to_slot(size) + 2`.
> Take pcpu_unit_size of 1024 as an example: __pcpu_size_to_slot will
> return max(11 - PCPU_SLOT_BASE_SHIFT + 2, 1), which is 8, so
> pcpu_nr_slots will be 10.
> 
> The chunk with free_bytes 1024 will be linked into pcpu_slot[9].
> However, a chunk with free_bytes in the range [512, 1024) will be
> linked into pcpu_slot[7], because `fls(512) - PCPU_SLOT_BASE_SHIFT + 2`
> is 7. So pcpu_slot[8] has no chance of being used.
> 
> According to the comment on PCPU_SLOT_BASE_SHIFT, sizes of 1~31 bytes
> share the same slot, and PCPU_SLOT_BASE_SHIFT is defined as 5. But
> actually, if we do not take PCPU_MIN_ALLOC_SIZE into consideration,
> 1~15 share slot 1 and 16~31 share slot 2. Calculation as below:
> highbit = fls(16) -> highbit = 5
> max(5 - PCPU_SLOT_BASE_SHIFT + 2, 1) equals 2, not 1.
> 
> This patch decreases pcpu_nr_slots to avoid wasting one slot and lets
> [PCPU_MIN_ALLOC_SIZE, 31) really share the same slot.
> 
> Signed-off-by: Peng Fan 
> ---
> 
>  V1:
>  Not very sure whether it is intentional to leave the slot there.
> 
>  mm/percpu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 8d9933db6162..12a9ba38f0b5 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -219,7 +219,7 @@ static bool pcpu_addr_in_chunk(struct pcpu_chunk *chunk, void *addr)
>  static int __pcpu_size_to_slot(int size)
>  {
>   int highbit = fls(size);/* size is in bytes */
> - return max(highbit - PCPU_SLOT_BASE_SHIFT + 2, 1);
> + return max(highbit - PCPU_SLOT_BASE_SHIFT + 1, 1);
>  }

Honestly, it may be better to just have [1-16) and [16-31) be separate.
I'm working on a change to this area, so I may change what's going on
here.
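To make the small-size behavior concrete, here is a quick userspace
model of the current formula (fls() stubbed out, and
PCPU_SLOT_BASE_SHIFT assumed to be 5 as in mm/percpu.c):

    #include <stdio.h>

    #define PCPU_SLOT_BASE_SHIFT 5

    /* 1-based index of the highest set bit, like the kernel's fls() */
    static int fls_stub(int x)
    {
            int bit = 0;

            while (x) {
                    bit++;
                    x >>= 1;
            }
            return bit;
    }

    /* the current "+2" slot formula from __pcpu_size_to_slot() */
    static int size_to_slot(int size)
    {
            int slot = fls_stub(size) - PCPU_SLOT_BASE_SHIFT + 2;

            return slot > 1 ? slot : 1;
    }

    int main(void)
    {
            int sizes[] = { 1, 15, 16, 31, 32, 512, 1023, 1024 };

            for (unsigned int i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
                    printf("size %4d -> slot %d\n", sizes[i],
                           size_to_slot(sizes[i]));
            return 0;
    }

This prints slot 1 for sizes 1..15 and slot 2 for 16..31, i.e. the two
ranges already land in different slots today; the mismatch the changelog
points out is with the comment above PCPU_SLOT_BASE_SHIFT, not with the
arithmetic itself.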

>  
>  static int pcpu_size_to_slot(int size)
> @@ -2145,7 +2145,7 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
>* Allocate chunk slots.  The additional last slot is for
>* empty chunks.
>*/
> - pcpu_nr_slots = __pcpu_size_to_slot(pcpu_unit_size) + 2;
> + pcpu_nr_slots = __pcpu_size_to_slot(pcpu_unit_size) + 1;
>   pcpu_slot = memblock_alloc(pcpu_nr_slots * sizeof(pcpu_slot[0]),
>  SMP_CACHE_BYTES);
>   for (i = 0; i < pcpu_nr_slots; i++)
> -- 
> 2.16.4
> 

This is a tricky change. The nice thing about keeping the additional
slot around is that it ensures a distinction between a completely empty
chunk and a nearly empty chunk. It happens that the logic creates
power-of-2 chunks, which ends up being an additional slot anyway. So,
given that this logic is tricky and architecture-dependent, I don't feel
comfortable making this change, as the risk greatly outweighs the
benefit.
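To spell out the distinction with some arithmetic (assuming the
pcpu_size_to_slot() special case for size == pcpu_unit_size that
mm/percpu.c has at this point, and a hypothetical non-power-of-2 unit
size purely for illustration):

    /*
     * pcpu_size_to_slot(), as I read it in the tree today:
     *
     *      if (size == pcpu_unit_size)
     *              return pcpu_nr_slots - 1;
     *      return __pcpu_size_to_slot(size);
     *
     * Current "+2" sizing, pcpu_unit_size = 1024:
     *      pcpu_nr_slots = 8 + 2 = 10, empty chunk  -> slot 9
     *      free_bytes = 1023: fls = 10, 10 - 5 + 2  -> slot 7
     * so the top slot only ever holds fully empty chunks, and slot 8 is
     * the unused gap the changelog complains about.
     *
     * Proposed "+1" sizing with a hypothetical unit size of 1536:
     *      pcpu_nr_slots = 7 + 1 = 8, empty chunk   -> slot 7
     *      free_bytes = 1535: fls = 11, 11 - 5 + 1  -> slot 7
     * and a nearly empty chunk now shares the top slot with fully empty
     * ones, which is exactly the distinction the extra slot guarantees.
     */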

Thanks,
Dennis


Re: [PATCH 2/2] percpu: pcpu_next_md_free_region: inclusive check for PCPU_BITMAP_BLOCK_BITS

2019-03-04 Thread den...@kernel.org
Hi Peng,

On Mon, Mar 04, 2019 at 10:33:55AM +, Peng Fan wrote:
> If the region [contig_hint_start, contig_hint_start + contig_hint)
> matches the block->right_free area, we need to use "<=", not "<".
> 
> Signed-off-by: Peng Fan 
> ---
> 
> V1:
>   Based on https://patchwork.kernel.org/cover/10832459/ applied to linux-next;
>   boot tested on qemu aarch64
> 
>  mm/percpu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 5ee90fc34ea3..0f91f1d883c6 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -390,7 +390,8 @@ static void pcpu_next_md_free_region(struct pcpu_chunk *chunk, int *bit_off,
>*/
>   *bits = block->contig_hint;
>   if (*bits && block->contig_hint_start >= block_off &&
> - *bits + block->contig_hint_start < PCPU_BITMAP_BLOCK_BITS) {
> + *bits + block->contig_hint_start <=
> + PCPU_BITMAP_BLOCK_BITS) {
>   *bit_off = pcpu_block_off_to_off(i,
>   block->contig_hint_start);
>   return;
> -- 
> 2.16.4
> 

This is wrong. This iterator is for updating contig hints, not for
finding a fit.

Have you tried reproducing and proving the issue you are seeing? In
general, making changes to percpu carries a lot of risk. I really only
want to take code that provably solves a problem, not changes supported
by code inspection alone. Boot testing is really not enough for a change
like this, as we need to be sure such changes are correct.
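To make the boundary case concrete, here is a rough userspace model of
the spill-over situation, under the assumption (from my reading of
pcpu_next_md_free_region()) that a contig hint running flush to
PCPU_BITMAP_BLOCK_BITS is really the block's right_free and is meant to
be merged with the next block's left_free by the cross-block path
rather than returned on its own:

    #include <stdio.h>

    /* typical value: PAGE_SIZE / PCPU_MIN_ALLOC_SIZE = 4096 / 4 */
    #define PCPU_BITMAP_BLOCK_BITS 1024

    struct md_block {
            int contig_hint;        /* largest free run in the block */
            int contig_hint_start;  /* its offset within the block */
            int left_free;          /* free bits flush with block start */
            int right_free;         /* free bits flush with block end */
    };

    int main(void)
    {
            /* one 500-bit free region straddling the block boundary */
            struct md_block b0 = { .contig_hint = 200, .contig_hint_start = 824,
                                   .left_free = 0, .right_free = 200 };
            struct md_block b1 = { .contig_hint = 300, .contig_hint_start = 0,
                                   .left_free = 300, .right_free = 0 };

            /* b0's hint ends exactly at the block boundary */
            int flush = b0.contig_hint_start + b0.contig_hint ==
                        PCPU_BITMAP_BLOCK_BITS;

            /*
             * With "<" the iterator skips the flush hint and the
             * right_free/left_free accumulation reports the true
             * 500-bit region; with "<=" it would return early with
             * only the 200 bits inside b0.
             */
            int bits = flush ? b0.right_free + b1.left_free : b0.contig_hint;

            printf("md free region seen by the iterator: %d bits\n", bits);
            return 0;
    }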

Thanks,
Dennis