Re: [PATCH 2/2] mm, util: account_locked_vm() does not hold mmap_lock
On Thu, 30 Jul 2020 16:57:05 -0400 Daniel Jordan wrote:
> On Wed, Jul 29, 2020 at 12:21:11PM -0700, Hugh Dickins wrote:
> > On Sun, 26 Jul 2020, Pengfei Li wrote:
> >
> > > Since mm->locked_vm is already an atomic counter,
> > > account_locked_vm() does not need to hold mmap_lock.
> >
> > I am worried that this patch, already added to mmotm, along with its
> > 1/2 making locked_vm an atomic64, might be rushed into v5.9 with
> > just that two-line commit description, and no discussion at all.
> >
> > locked_vm belongs fundamentally to mm/mlock.c, and the lock to guard
> > it is mmap_lock; and mlock() has some complicated stuff to do under
> > that lock while it decides how to adjust locked_vm.
> >
> > It is very easy to convert an unsigned long to an atomic64_t, but
> > "atomic read, check limit and do stuff, atomic add" does not give
> > the same guarantee as holding the right lock around it all.
>
> Yes, this is why I withdrew my attempt to do something similar last
> year, I didn't want to make the accounting racy. Stack and heap
> growing and mremap would be affected in addition to mlock.
>
> It'd help to hear more about the motivation for this.

Thanks for your comments.

My motivation is to allow mm-related counters to be safely read and
written without holding mmap_lock. But sorry, I didn't do it well.

--
Pengfei
Re: [PATCH 2/2] mm, util: account_locked_vm() does not hold mmap_lock
On Wed, 29 Jul 2020 12:21:11 -0700 (PDT) Hugh Dickins wrote:

Sorry for the late reply.

> On Sun, 26 Jul 2020, Pengfei Li wrote:
>
> > Since mm->locked_vm is already an atomic counter,
> > account_locked_vm() does not need to hold mmap_lock.
>
> I am worried that this patch, already added to mmotm, along with its
> 1/2 making locked_vm an atomic64, might be rushed into v5.9 with just
> that two-line commit description, and no discussion at all.
>
> locked_vm belongs fundamentally to mm/mlock.c, and the lock to guard
> it is mmap_lock; and mlock() has some complicated stuff to do under
> that lock while it decides how to adjust locked_vm.
>
> It is very easy to convert an unsigned long to an atomic64_t, but
> "atomic read, check limit and do stuff, atomic add" does not give
> the same guarantee as holding the right lock around it all.
>
> (At the very least, __account_locked_vm() in 1/2 should be changed to
> replace its atomic64_add by an atomic64_cmpxchg, to enforce the limit
> that it just checked. But that will be no more than lipstick on a
> pig, when the right lock that everyone else agrees upon is not being
> held.)

Thank you for your detailed comment.

You are right, I should use atomic64_cmpxchg to guarantee the limit of
RLIMIT_MEMLOCK.

> Now, it can be argued that our locked_vm and pinned_vm maintenance
> is so random and deficient, and too difficult to keep right across
> a sprawl of drivers, that we should just be grateful for those that
> do volunteer to subject themselves to RLIMIT_MEMLOCK limitation,
> and never mind if it's a little racy.
>
> And it may well be that all those who have made considerable efforts
> in the past to improve the situation, have more interesting things to
> devote their time to, and would prefer not to get dragged back here.
>
> But let's at least give this a little more visibility, and hope
> to hear opinions one way or the other from those who care.

Thank you. My patch should have been more thoughtful.
I will send an email to Stephen soon asking to remove these two patches
from the -mm tree.

--
Pengfei
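The atomic64_cmpxchg approach Hugh suggests — making the limit check and the add a single atomic step instead of a racy "read, check, add" — can be sketched in userspace with C11 atomics. This is only an illustration of the technique, not kernel code; `account_pages` and its arguments are made-up names standing in for `__account_locked_vm()` and RLIMIT_MEMLOCK:

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace sketch (C11 atomics, not kernel code): enforce a page
 * limit with a compare-exchange loop, so the limit check and the
 * increment happen as one atomic step. */
static _Atomic unsigned long long locked_vm;

/* Returns 0 on success, -1 if the limit would be exceeded. */
int account_pages(unsigned long long pages, unsigned long long limit)
{
	unsigned long long old = atomic_load(&locked_vm);

	do {
		if (old + pages > limit)
			return -1;	/* over the RLIMIT_MEMLOCK analogue */
		/* on CAS failure, 'old' is reloaded with the current value
		 * and the check is redone before retrying */
	} while (!atomic_compare_exchange_weak(&locked_vm, &old,
					       old + pages));

	return 0;
}
```

With a plain `atomic64_add` after the check, two racing callers can both pass the check and together overshoot the limit; the compare-exchange loop rechecks against the value it is about to replace, so the limit cannot be exceeded (though, as Hugh notes, this still does not restore the exclusion mmap_lock provided).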
[PATCH 2/2] mm, util: account_locked_vm() does not hold mmap_lock
Since mm->locked_vm is already an atomic counter, account_locked_vm()
does not need to hold mmap_lock.

Signed-off-by: Pengfei Li
---
 drivers/vfio/vfio_iommu_type1.c |  8 ++--
 mm/util.c                       | 15 +++
 2 files changed, 5 insertions(+), 18 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 78013be07fe7..53818fce78a6 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -376,12 +376,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
 	if (!mm)
 		return -ESRCH; /* process exited */
 
-	ret = mmap_write_lock_killable(mm);
-	if (!ret) {
-		ret = __account_locked_vm(mm, abs(npage), npage > 0, dma->task,
-					  dma->lock_cap);
-		mmap_write_unlock(mm);
-	}
+	ret = __account_locked_vm(mm, abs(npage), npage > 0,
+				  dma->task, dma->lock_cap);
 
 	if (async)
 		mmput(mm);
diff --git a/mm/util.c b/mm/util.c
index 473add0dc275..320fdd537aea 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -424,8 +424,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
  * @task:        task used to check RLIMIT_MEMLOCK
  * @bypass_rlim: %true if checking RLIMIT_MEMLOCK should be skipped
  *
- * Assumes @task and @mm are valid (i.e. at least one reference on each), and
- * that mmap_lock is held as writer.
+ * Assumes @task and @mm are valid (i.e. at least one reference on each).
  *
  * Return:
  * * 0       on success
@@ -437,8 +436,6 @@ int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
 	unsigned long locked_vm, limit;
 	int ret = 0;
 
-	mmap_assert_write_locked(mm);
-
 	locked_vm = atomic64_read(&mm->locked_vm);
 	if (inc) {
 		if (!bypass_rlim) {
@@ -476,17 +473,11 @@ EXPORT_SYMBOL_GPL(__account_locked_vm);
  */
 int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc)
 {
-	int ret;
-
 	if (pages == 0 || !mm)
 		return 0;
 
-	mmap_write_lock(mm);
-	ret = __account_locked_vm(mm, pages, inc, current,
-				  capable(CAP_IPC_LOCK));
-	mmap_write_unlock(mm);
-
-	return ret;
+	return __account_locked_vm(mm, pages, inc,
+				   current, capable(CAP_IPC_LOCK));
 }
 EXPORT_SYMBOL_GPL(account_locked_vm);
-- 
2.26.2
[PATCH 1/2] mm: make mm->locked_vm an atomic64 counter
Like commit 70f8a3ca68d3 ("mm: make mm->pinned_vm an atomic64 counter").

By making mm->locked_vm an atomic64 counter, we can safely modify it
without holding mmap_lock.

The reason for using atomic64 instead of atomic_long is to keep the
same type as mm->pinned_vm, and there is no need to worry about
overflow.

Signed-off-by: Pengfei Li
---
 drivers/infiniband/sw/siw/siw_verbs.c | 12 +++-
 drivers/vfio/vfio_iommu_type1.c       |  6 --
 fs/io_uring.c                         |  4 ++--
 fs/proc/task_mmu.c                    |  2 +-
 include/linux/mm_types.h              |  4 ++--
 kernel/fork.c                         |  2 +-
 mm/debug.c                            |  5 +++--
 mm/mlock.c                            |  4 ++--
 mm/mmap.c                             | 18 +-
 mm/mremap.c                           |  6 +++---
 mm/util.c                             |  6 +++---
 11 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
index adafa1b8bebe..bf78d7988442 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.c
+++ b/drivers/infiniband/sw/siw/siw_verbs.c
@@ -1293,14 +1293,16 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len,
 		goto err_out;
 	}
 	if (mem_limit != RLIM_INFINITY) {
-		unsigned long num_pages =
-			(PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT;
+		unsigned long num_pages, locked_pages;
+
+		num_pages = (PAGE_ALIGN(len + (start & ~PAGE_MASK)))
+				>> PAGE_SHIFT;
+		locked_pages = atomic64_read(&current->mm->locked_vm);
 		mem_limit >>= PAGE_SHIFT;
 
-		if (num_pages > mem_limit - current->mm->locked_vm) {
+		if (num_pages > mem_limit - locked_pages) {
 			siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n",
-				   num_pages, mem_limit,
-				   current->mm->locked_vm);
+				   num_pages, mem_limit, locked_pages);
 			rv = -ENOMEM;
 			goto err_out;
 		}
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 9d41105bfd01..78013be07fe7 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -509,7 +509,8 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 	 * pages are already counted against the user.
 	 */
 	if (!rsvd && !vfio_find_vpfn(dma, iova)) {
-		if (!dma->lock_cap && current->mm->locked_vm + 1 > limit) {
+		if (!dma->lock_cap &&
+		    atomic64_read(&current->mm->locked_vm) + 1 > limit) {
 			put_pfn(*pfn_base, dma->prot);
 			pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n", __func__,
 				limit << PAGE_SHIFT);
@@ -536,7 +537,8 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 		if (!rsvd && !vfio_find_vpfn(dma, iova)) {
 			if (!dma->lock_cap &&
-			    current->mm->locked_vm + lock_acct + 1 > limit) {
+			    atomic64_read(&current->mm->locked_vm) +
+			    lock_acct + 1 > limit) {
 				put_pfn(pfn, dma->prot);
 				pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
 					__func__, limit << PAGE_SHIFT);
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7cf2f295fba7..f1241c6314e6 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7371,7 +7371,7 @@ static void io_unaccount_mem(struct io_ring_ctx *ctx, unsigned long nr_pages,
 	if (ctx->sqo_mm) {
 		if (acct == ACCT_LOCKED)
-			ctx->sqo_mm->locked_vm -= nr_pages;
+			atomic64_sub(nr_pages, &ctx->sqo_mm->locked_vm);
 		else if (acct == ACCT_PINNED)
 			atomic64_sub(nr_pages, &ctx->sqo_mm->pinned_vm);
 	}
@@ -7390,7 +7390,7 @@ static int io_account_mem(struct io_ring_ctx *ctx, unsigned long nr_pages,
 	if (ctx->sqo_mm) {
 		if (acct == ACCT_LOCKED)
-			ctx->sqo_mm->locked_vm += nr_pages;
+			atomic64_add(nr_pages, &ctx->sqo_mm->locked_vm);
 		else if (acct == ACCT_PINNED)
 			atomic64_add(nr_pages, &ctx->sqo_mm->pinned_vm);
 	}
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index df2f0f05f5ba..2af56e68766e 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -58,
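The point of the atomic64 conversion is that concurrent adders need no external lock to keep the counter itself exact. A userspace analogue with C11 atomics and POSIX threads (illustrative only — `run_adders` and the thread/iteration counts are made-up; the kernel primitive being mirrored is `atomic64_add`):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Userspace analogue of an atomic64 counter: N threads each add
 * 100000 without any lock, and the total is still exact. */
static _Atomic unsigned long long counter;

static void *adder(void *arg)
{
	(void)arg;
	for (int i = 0; i < 100000; i++)
		atomic_fetch_add(&counter, 1);
	return NULL;
}

/* Spawn nthreads adders (nthreads <= 8), join them, return the total. */
unsigned long long run_adders(int nthreads)
{
	pthread_t t[8];

	for (int i = 0; i < nthreads; i++)
		pthread_create(&t[i], NULL, adder, NULL);
	for (int i = 0; i < nthreads; i++)
		pthread_join(t[i], NULL);
	return atomic_load(&counter);
}
```

Note this only makes each individual add/sub/read safe; as the review thread points out, compound sequences such as "read, check limit, add" still need either a lock or a compare-exchange loop.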
Re: [PATCH v6 0/3] mm, slab: Make kmalloc_info[] contain all types of names
On Thu, Oct 3, 2019 at 7:06 AM Andrew Morton wrote:
>
> On Mon, 23 Sep 2019 20:27:25 +0800 Pengfei Li wrote:
>
> > Changes in v6
> > --
> > 1. abandon patch 4-7 (Because there is not enough reason to explain
> >    that they are beneficial)
>
> So http://lkml.kernel.org/r/20190923004022.GC15734@shao2-debian can no
> longer occur?

Sorry for such a late reply.

Yes, it's caused by [patch v5 5/7], so it can no longer occur in v6.

> > Changes in v5
> > --
> > 1. patch 1/7:
> >    - rename SET_KMALLOC_SIZE to INIT_KMALLOC_INFO
> > 2. patch 5/7:
> >    - fix build errors (Reported-by: kbuild test robot)
> >    - make all_kmalloc_info[] static (Reported-by: kbuild test robot)
> > 3. patch 6/7:
> >    - for robustness, determine kmalloc_cache is !NULL in
> >      new_kmalloc_cache()
> > 4. add ack tag from David Rientjes
[PATCH v6 2/3] mm, slab: Remove unused kmalloc_size()
The size of kmalloc can be obtained from kmalloc_info[], so remove
kmalloc_size() that will not be used anymore.

Signed-off-by: Pengfei Li
Acked-by: Vlastimil Babka
Acked-by: Roman Gushchin
Acked-by: David Rientjes
---
 include/linux/slab.h | 20 --------------------
 mm/slab.c            |  5 +++--
 mm/slab_common.c     |  5 ++---
 3 files changed, 5 insertions(+), 25 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 56c9c7e..e773e57 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -557,26 +557,6 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
 	return __kmalloc(size, flags);
 }
 
-/*
- * Determine size used for the nth kmalloc cache.
- * return size or 0 if a kmalloc cache for that
- * size does not exist
- */
-static __always_inline unsigned int kmalloc_size(unsigned int n)
-{
-#ifndef CONFIG_SLOB
-	if (n > 2)
-		return 1U << n;
-
-	if (n == 1 && KMALLOC_MIN_SIZE <= 32)
-		return 96;
-
-	if (n == 2 && KMALLOC_MIN_SIZE <= 64)
-		return 192;
-#endif
-	return 0;
-}
-
 static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
 {
 #ifndef CONFIG_SLOB
diff --git a/mm/slab.c b/mm/slab.c
index c42b621..7bc4e90 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1248,8 +1248,9 @@ void __init kmem_cache_init(void)
 	 */
 	kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache(
 				kmalloc_info[INDEX_NODE].name[KMALLOC_NORMAL],
-				kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS,
-				0, kmalloc_size(INDEX_NODE));
+				kmalloc_info[INDEX_NODE].size,
+				ARCH_KMALLOC_FLAGS, 0,
+				kmalloc_info[INDEX_NODE].size);
 	slab_state = PARTIAL_NODE;
 	setup_kmalloc_cache_index_table();
 
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 4e78490..df030cf 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1239,11 +1239,10 @@ void __init create_kmalloc_caches(slab_flags_t flags)
 		struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i];
 
 		if (s) {
-			unsigned int size = kmalloc_size(i);
-
 			kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
 				kmalloc_info[i].name[KMALLOC_DMA],
-				size, SLAB_CACHE_DMA | flags, 0, 0);
+				kmalloc_info[i].size,
+				SLAB_CACHE_DMA | flags, 0, 0);
 		}
 	}
 #endif
-- 
2.7.4
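The removed helper computed each cache size as a power of two with special cases for 96 and 192; the patch replaces those computations with lookups into the precomputed table. A userspace sketch contrasting the two (the function and `info_size[]` here are simplified stand-ins that assume `KMALLOC_MIN_SIZE <= 32`, so both special cases exist):

```c
#include <assert.h>

/* Simplified copy of the removed kmalloc_size(): index n maps to
 * 1 << n, except indices 1 and 2, which are the odd 96/192 caches. */
unsigned int kmalloc_size(unsigned int n)
{
	if (n > 2)
		return 1U << n;
	if (n == 1)
		return 96;
	if (n == 2)
		return 192;
	return 0;
}

/* kmalloc_info[]-style table: the same sizes stored directly, so no
 * per-call computation or special casing is needed. */
const unsigned int info_size[] = { 0, 96, 192, 8, 16, 32, 64, 128 };
```

The table lookup and the computation agree entry for entry, which is why the helper could be deleted once every caller switched to `kmalloc_info[i].size`.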
[PATCH v6 1/3] mm, slab: Make kmalloc_info[] contain all types of names
There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM
and KMALLOC_DMA.

The name of KMALLOC_NORMAL is contained in kmalloc_info[].name,
but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically
generated by kmalloc_cache_name().

This patch predefines the names of all types of kmalloc to save
the time spent dynamically generating names.

Besides, remove the kmalloc_cache_name() that is no longer used.

Signed-off-by: Pengfei Li
Acked-by: Vlastimil Babka
Acked-by: Roman Gushchin
Acked-by: David Rientjes
---
 mm/slab.c        |  2 +-
 mm/slab.h        |  2 +-
 mm/slab_common.c | 91 ++--
 3 files changed, 51 insertions(+), 44 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 9df3705..c42b621 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1247,7 +1247,7 @@ void __init kmem_cache_init(void)
 	 * structures first.  Without this, further allocations will bug.
 	 */
 	kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache(
-				kmalloc_info[INDEX_NODE].name,
+				kmalloc_info[INDEX_NODE].name[KMALLOC_NORMAL],
 				kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS,
 				0, kmalloc_size(INDEX_NODE));
 	slab_state = PARTIAL_NODE;
diff --git a/mm/slab.h b/mm/slab.h
index 9057b80..2fc8f95 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -76,7 +76,7 @@ extern struct kmem_cache *kmem_cache;
 
 /* A table of kmalloc cache names and sizes */
 extern const struct kmalloc_info_struct {
-	const char *name;
+	const char *name[NR_KMALLOC_TYPES];
 	unsigned int size;
 } kmalloc_info[];
 
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 807490f..4e78490 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1092,26 +1092,56 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
 	return kmalloc_caches[kmalloc_type(flags)][index];
 }
 
+#ifdef CONFIG_ZONE_DMA
+#define INIT_KMALLOC_INFO(__size, __short_size)			\
+{								\
+	.name[KMALLOC_NORMAL]  = "kmalloc-" #__short_size,	\
+	.name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size,	\
+	.name[KMALLOC_DMA]     = "dma-kmalloc-" #__short_size,	\
+	.size = __size,						\
+}
+#else
+#define INIT_KMALLOC_INFO(__size, __short_size)			\
+{								\
+	.name[KMALLOC_NORMAL]  = "kmalloc-" #__short_size,	\
+	.name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size,	\
+	.size = __size,						\
+}
+#endif
+
 /*
  * kmalloc_info[] is to make slub_debug=,kmalloc-xx option work at boot time.
  * kmalloc_index() supports up to 2^26=64MB, so the final entry of the table is
  * kmalloc-67108864.
  */
 const struct kmalloc_info_struct kmalloc_info[] __initconst = {
-	{NULL, 0},                {"kmalloc-96", 96},
-	{"kmalloc-192", 192},     {"kmalloc-8", 8},
-	{"kmalloc-16", 16},       {"kmalloc-32", 32},
-	{"kmalloc-64", 64},       {"kmalloc-128", 128},
-	{"kmalloc-256", 256},     {"kmalloc-512", 512},
-	{"kmalloc-1k", 1024},     {"kmalloc-2k", 2048},
-	{"kmalloc-4k", 4096},     {"kmalloc-8k", 8192},
-	{"kmalloc-16k", 16384},   {"kmalloc-32k", 32768},
-	{"kmalloc-64k", 65536},   {"kmalloc-128k", 131072},
-	{"kmalloc-256k", 262144}, {"kmalloc-512k", 524288},
-	{"kmalloc-1M", 1048576},  {"kmalloc-2M", 2097152},
-	{"kmalloc-4M", 4194304},  {"kmalloc-8M", 8388608},
-	{"kmalloc-16M", 16777216}, {"kmalloc-32M", 33554432},
-	{"kmalloc-64M", 67108864}
+	INIT_KMALLOC_INFO(0, 0),
+	INIT_KMALLOC_INFO(96, 96),
+	INIT_KMALLOC_INFO(192, 192),
+	INIT_KMALLOC_INFO(8, 8),
+	INIT_KMALLOC_INFO(16, 16),
+	INIT_KMALLOC_INFO(32, 32),
+	INIT_KMALLOC_INFO(64, 64),
+	INIT_KMALLOC_INFO(128, 128),
+	INIT_KMALLOC_INFO(256, 256),
+	INIT_KMALLOC_INFO(512, 512),
+	INIT_KMALLOC_INFO(1024, 1k),
+	INIT_KMALLOC_INFO(2048, 2k),
+	INIT_KMALLOC_INFO(4096, 4k),
+	INIT_KMALLOC_INFO(8192, 8k),
+	INIT_KMALLOC_INFO(16384, 16k),
+	INIT_KMALLOC_INFO(32768, 32k
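The INIT_KMALLOC_INFO macro relies on two standard C features: stringification (`#__short_size`) to build all three per-type names at compile time, and designated initializers to fill the `name[]` array by enum index. A standalone userspace sketch of the same pattern (`cache_info`/`INIT_CACHE_INFO` are illustrative names, not the kernel's):

```c
#include <assert.h>
#include <string.h>

/* Userspace sketch of the INIT_KMALLOC_INFO idea: one macro emits all
 * per-type cache names as string literals at compile time, instead of
 * generating them at runtime with kasprintf(). */
enum cache_type { T_NORMAL, T_RECLAIM, T_DMA, NR_TYPES };

struct cache_info {
	const char *name[NR_TYPES];
	unsigned int size;
};

/* #__short_size stringifies the macro argument, so INIT_CACHE_INFO(1024, 1k)
 * expands "kmalloc-" #1k into the literal "kmalloc-1k". */
#define INIT_CACHE_INFO(__size, __short_size)			\
{								\
	.name[T_NORMAL]  = "kmalloc-" #__short_size,		\
	.name[T_RECLAIM] = "kmalloc-rcl-" #__short_size,	\
	.name[T_DMA]     = "dma-kmalloc-" #__short_size,	\
	.size = __size,						\
}

const struct cache_info cache_info[] = {
	INIT_CACHE_INFO(1024, 1k),
	INIT_CACHE_INFO(2048, 2k),
};
```

Note that `1k` is a valid preprocessing token (a pp-number), which is what lets the kernel pass human-readable size suffixes straight into the macro.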
[PATCH v6 3/3] mm, slab_common: Use enum kmalloc_cache_type to iterate over kmalloc caches
The type of the local variable *type* of new_kmalloc_cache() should be
enum kmalloc_cache_type instead of int, so correct it.

Signed-off-by: Pengfei Li
Acked-by: Vlastimil Babka
Acked-by: Roman Gushchin
Acked-by: David Rientjes
---
 mm/slab_common.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index df030cf..23054b8 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1192,7 +1192,7 @@ void __init setup_kmalloc_cache_index_table(void)
 }
 
 static void __init
-new_kmalloc_cache(int idx, int type, slab_flags_t flags)
+new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
 {
 	if (type == KMALLOC_RECLAIM)
 		flags |= SLAB_RECLAIM_ACCOUNT;
@@ -1210,7 +1210,8 @@ new_kmalloc_cache(int idx, int type, slab_flags_t flags)
  */
 void __init create_kmalloc_caches(slab_flags_t flags)
 {
-	int i, type;
+	int i;
+	enum kmalloc_cache_type type;
 
 	for (type = KMALLOC_NORMAL; type <= KMALLOC_RECLAIM; type++) {
 		for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
-- 
2.7.4
[PATCH v6 0/3] mm, slab: Make kmalloc_info[] contain all types of names
Changes in v6
--
1. abandon patch 4-7 (Because there is not enough reason to explain
   that they are beneficial)

Changes in v5
--
1. patch 1/7:
   - rename SET_KMALLOC_SIZE to INIT_KMALLOC_INFO
2. patch 5/7:
   - fix build errors (Reported-by: kbuild test robot)
   - make all_kmalloc_info[] static (Reported-by: kbuild test robot)
3. patch 6/7:
   - for robustness, determine kmalloc_cache is !NULL in
     new_kmalloc_cache()
4. add ack tag from David Rientjes

Changes in v4
--
1. [old] abandon patch 4/4
2. [new] patch 4/7:
   - return ZERO_SIZE_ALLOC instead of 0 for zero sized requests
3. [new] patch 5/7:
   - reorder kmalloc_info[], kmalloc_caches[] (in order of size)
   - hard to split, so slightly larger
4. [new] patch 6/7:
   - initialize kmalloc_cache[] with the same size but different types
5. [new] patch 7/7:
   - modify kmalloc_caches[type][idx] to kmalloc_caches[idx][type]

Patch 4-7 are newly added; more information can be obtained from their
commit messages.

Changes in v3
--
1. restore __initconst (patch 1/4)
2. rename patch 3/4
3. add more clarification for patch 4/4
4. add ack tag from Roman Gushchin

Changes in v2
--
1. remove __initconst (patch 1/5)
2. squash patch 2/5
3. add ack tag from Vlastimil Babka

There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM and
KMALLOC_DMA. The name of KMALLOC_NORMAL is contained in
kmalloc_info[].name, but the names of KMALLOC_RECLAIM and KMALLOC_DMA
are dynamically generated by kmalloc_cache_name().

Patch 1 predefines the names of all types of kmalloc to save the time
spent dynamically generating names.

These changes make sense, and the time spent by new_kmalloc_cache()
has been reduced by approximately 36.3%.

Time spent by new_kmalloc_cache() (CPU cycles)
5.3-rc7          66264
5.3-rc7+patch    42188

Pengfei Li (3):
  mm, slab: Make kmalloc_info[] contain all types of names
  mm, slab: Remove unused kmalloc_size()
  mm, slab_common: Use enum kmalloc_cache_type to iterate over kmalloc
    caches

 include/linux/slab.h | 20 ---
 mm/slab.c            |  7 ++--
 mm/slab.h            |  2 +-
 mm/slab_common.c     | 99
 4 files changed, 58 insertions(+), 70 deletions(-)

-- 
2.7.4
Re: [RESEND v4 5/7] mm, slab_common: Make kmalloc_caches[] start at size KMALLOC_MIN_SIZE
On Mon, Sep 16, 2019 at 5:38 AM David Rientjes wrote:
>
> On Mon, 16 Sep 2019, Pengfei Li wrote:
>
> > Currently, kmalloc_cache[] is not sorted by size: kmalloc_cache[0]
> > is kmalloc-96, kmalloc_cache[1] is kmalloc-192 (when ARCH_DMA_MINALIGN
> > is not defined).
> >
> > As suggested by Vlastimil Babka,
> >
> > "Since you're doing these cleanups, have you considered reordering
> > kmalloc_info, size_index, kmalloc_index() etc so that sizes 96 and 192
> > are ordered naturally between 64, 128 and 256? That should remove
> > various special casing such as in create_kmalloc_caches(). I can't
> > guarantee it will be possible without breaking e.g. constant folding
> > optimizations etc., but seems to me it should be feasible. (There are
> > definitely more places to change than those I listed.)"
> >
> > So this patch reordered kmalloc_info[], kmalloc_caches[], and modified
> > kmalloc_index() and kmalloc_slab() accordingly.
> >
> > As a result, there is no subtle judgment about size in
> > create_kmalloc_caches(). And initialize kmalloc_cache[] from 0 instead
> > of KMALLOC_SHIFT_LOW.
> >
> > I used ./scripts/bloat-o-meter to measure the impact of this patch on
> > performance. The results show that it brings some benefits.
> >
> > Considering the size change of kmalloc_info[], the size of the code is
> > actually about 641 bytes less.
>
> bloat-o-meter is reporting a net benefit of -241 bytes for this, so not
> sure about relevancy of the difference for only kmalloc_info.

Thanks for your comments.

The size of kmalloc_info has been increased from 432 to 832 (it was
renamed to all_kmalloc_info). So when the change in kmalloc_info size
is not included, it actually reduces 641 bytes.

> This, to me, looks like increased complexity for the statically allocated
> arrays vs the runtime complexity when initializing the caches themselves.

For runtime kmalloc requests, the implementation of kmalloc_slab() is
no different than before.

For constant kmalloc requests, the smaller size of .text is better
(the compiler does constant optimization). Therefore, I don't think
this patch adds complexity.

> Not sure that this is an improvement given that you still need to do
> things like
>
> +#if KMALLOC_SIZE_96_EXIST == 1
> +	if (size > 64 && size <= 96) return (7 - KMALLOC_IDX_ADJ_0);
> +#endif
> +
> +#if KMALLOC_SIZE_192_EXIST == 1
> +	if (size > 128 && size <= 192) return (8 - KMALLOC_IDX_ADJ_1);
> +#endif

kmalloc_index() is difficult to handle for me.

At first, I made the judgment in the order of size in kmalloc_index(),

/* Order 96, 192 */
static __always_inline unsigned int kmalloc_index(size_t size)
{
	...
	if (size <=   8) return ( 3 - KMALLOC_IDX_ADJ_0);
	if (size <=  16) return ( 4 - KMALLOC_IDX_ADJ_0);
	if (size <=  32) return ( 5 - KMALLOC_IDX_ADJ_0);
	if (size <=  64) return ( 6 - KMALLOC_IDX_ADJ_0);
#if KMALLOC_SIZE_96_EXIST == 1
	if (size <=  96) return ( 7 - KMALLOC_IDX_ADJ_0);
#endif
	if (size <= 128) return ( 7 - KMALLOC_IDX_ADJ_1);
#if KMALLOC_SIZE_192_EXIST == 1
	if (size <= 192) return ( 8 - KMALLOC_IDX_ADJ_1);
#endif
	if (size <= 256) return ( 8 - KMALLOC_IDX_ADJ_2);
	...
}

but bloat-o-meter shows that I did a bad job.

$ ./scripts/bloat-o-meter vmlinux-base vmlinux-patch_1-5-order_96_192
add/remove: 3/7 grow/shrink: 129/167 up/down: 3691/-2530 (1161)
Function                                  old    new   delta
all_kmalloc_info                            -    832    +832
jhash                                     744   1119    +375
__regmap_init                            3252   3411    +159
drm_mode_atomic_ioctl                    2373   2479    +106
apply_wqattrs_prepare                     449    531     +82
process_preds                            1772   1851     +79
amd_uncore_cpu_up_prepare                 251    327     +76
property_entries_dup.part                 789    861     +72
pnp_register_port_resource                 98    167     +69
pnp_register_mem_resource                  98    167     +69
pnp_register_irq_resource                 146    206     +60
pnp_register_dma_resource                  61    121     +60
pcpu_get_vm_areas                        3086   3139     +53
sr_probe                                 1360   1409     +49
fl_create                                 675    724     +49
ext4_expand_extra_isize_ea               2218   2265     +47
fib6_info_alloc                            60    105     +45
init_worker_pool                          247    291     +44
ctnetlink_alloc_filter.part
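The reordering being discussed can be sketched as a size-to-index mapper where 96 and 192 sit naturally between the powers of two, so no index adjustment constants are needed. This is a hypothetical userspace layout for illustration (`size_to_index` and the table are made-up names, not the kernel's `kmalloc_index()`, which is a chain of compile-time comparisons so that constant sizes fold to a constant index):

```c
#include <assert.h>

/* Hypothetical naturally-ordered cache size table: 96 and 192 slotted
 * between 64/128 and 128/256, so lookup is a plain linear scan with
 * no special cases. */
static const unsigned int sizes[] =
	{ 8, 16, 32, 64, 96, 128, 192, 256, 512, 1024 };

/* Return the index of the smallest cache that fits 'size'
 * (returns 10, one past the end, if size > 1024). */
unsigned int size_to_index(unsigned int size)
{
	unsigned int idx = 0;

	while (idx < 10 && sizes[idx] < size)
		idx++;
	return idx;
}
```

With this ordering, `create_kmalloc_caches()`-style init code can simply walk indices 0..N in order; the trade-off raised in the thread is whether the compiler still constant-folds as well as it does with the original power-of-two arrangement.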
Re: [RFC PATCH] mm/slub: remove left-over debugging code
On Tue, Sep 17, 2019 at 2:32 AM David Rientjes wrote:
>
> On Mon, 16 Sep 2019, Qian Cai wrote:
>
> > SLUB_RESILIENCY_TEST and SLUB_DEBUG_CMPXCHG look like some left-over
> > debugging code during the internal development that probably nobody uses
> > it anymore. Remove them to make the world greener.
>
> Adding Pengfei Li who has been working on a patchset for modified handling
> of kmalloc cache initialization and touches the resiliency test.

Thanks for looping me in.

My opinion is the same as David Rientjes'. The resiliency test should
not be removed but should be improved.

> I still find the resiliency test to be helpful/instructional for handling
> unexpected conditions in these caches, so I'd suggest against removing it:
> the only downside is that it's additional source code. But it's helpful
> source code for reference.
>
> The cmpxchg failures could likely be more generalized beyond SLUB since
> there will be other dependencies in the kernel than just this allocator.
>
> (I assume you didn't send a Signed-off-by line because this is an RFC.)
>
> > ---
> >  mm/slub.c | 110 --------------------------------------------------------
> >  1 file changed, 110 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 8834563cdb4b..f97155ba097d 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -150,12 +150,6 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
> >   *			- Variable sizing of the per node arrays
> >   */
> >
> > -/* Enable to test recovery from slab corruption on boot */
> > -#undef SLUB_RESILIENCY_TEST
> > -
> > -/* Enable to log cmpxchg failures */
> > -#undef SLUB_DEBUG_CMPXCHG
> > -
> >  /*
> >   * Mininum number of partial slabs. These will be left on the partial
> >   * lists even if they are empty. kmem_cache_shrink may reclaim them.
> > @@ -392,10 +386,6 @@ static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct page *page
> >  		cpu_relax();
> >  	stat(s, CMPXCHG_DOUBLE_FAIL);
> >
> > -#ifdef SLUB_DEBUG_CMPXCHG
> > -	pr_info("%s %s: cmpxchg double redo ", n, s->name);
> > -#endif
> > -
> >  	return false;
> >  }
> >
> > @@ -433,10 +423,6 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
> >  		cpu_relax();
> >  	stat(s, CMPXCHG_DOUBLE_FAIL);
> >
> > -#ifdef SLUB_DEBUG_CMPXCHG
> > -	pr_info("%s %s: cmpxchg double redo ", n, s->name);
> > -#endif
> > -
> >  	return false;
> >  }
> >
> > @@ -2004,45 +1990,11 @@ static inline unsigned long next_tid(unsigned long tid)
> >  	return tid + TID_STEP;
> >  }
> >
> > -static inline unsigned int tid_to_cpu(unsigned long tid)
> > -{
> > -	return tid % TID_STEP;
> > -}
> > -
> > -static inline unsigned long tid_to_event(unsigned long tid)
> > -{
> > -	return tid / TID_STEP;
> > -}
> > -
> >  static inline unsigned int init_tid(int cpu)
> >  {
> >  	return cpu;
> >  }
> >
> > -static inline void note_cmpxchg_failure(const char *n,
> > -		const struct kmem_cache *s, unsigned long tid)
> > -{
> > -#ifdef SLUB_DEBUG_CMPXCHG
> > -	unsigned long actual_tid = __this_cpu_read(s->cpu_slab->tid);
> > -
> > -	pr_info("%s %s: cmpxchg redo ", n, s->name);
> > -
> > -#ifdef CONFIG_PREEMPT
> > -	if (tid_to_cpu(tid) != tid_to_cpu(actual_tid))
> > -		pr_warn("due to cpu change %d -> %d\n",
> > -			tid_to_cpu(tid), tid_to_cpu(actual_tid));
> > -	else
> > -#endif
> > -	if (tid_to_event(tid) != tid_to_event(actual_tid))
> > -		pr_warn("due to cpu running other code. Event %ld->%ld\n",
> > -			tid_to_event(tid), tid_to_event(actual_tid));
> > -	else
> > -		pr_warn("for unknown reason: actual=%lx was=%lx target=%lx\n",
> > -			actual_tid, tid, next_tid(tid));
> > -#endif
> > -	stat(s, CMPXCHG_DOUBLE_CPU_FAIL);
> > -}
> > -
> >  static void init_kmem_cache_cpus(struct kmem_cache *s)
> >  {
> >  	int cpu;
> > @@ -2751,7 +2703,6 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
> >  				object, tid,
> >  				next_object, next_tid(tid)))) {
> >
> > -		note_cmpxc
Re: [PATCH v5 0/7] mm, slab: Make kmalloc_info[] contain all types of names
On Tue, Sep 17, 2019 at 12:04 AM Christopher Lameter wrote:
>
> On Mon, 16 Sep 2019, Pengfei Li wrote:
>
> > The name of KMALLOC_NORMAL is contained in kmalloc_info[].name,
> > but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically
> > generated by kmalloc_cache_name().
> >
> > Patch 1 predefines the names of all types of kmalloc to save
> > the time spent dynamically generating names.
> >
> > These changes make sense, and the time spent by new_kmalloc_cache()
> > has been reduced by approximately 36.3%.
>
> This is time spent during boot and does not affect later system
> performance.

Yes.

> > Time spent by new_kmalloc_cache() (CPU cycles)
> > 5.3-rc7            66264
> > 5.3-rc7+patch_1-3  42188
>
> Ok. 15k cycles during boot saved. So we save 5 microseconds during bootup?

Yes, this is a very small benefit.

> The current approach was created with the view on future setups allowing a
> dynamic configuration of kmalloc caches based on need. I.e. ZONE_DMA may
> not be needed once the floppy driver no longer makes use of it.

Yes. With this in mind, I also used #ifdef for INIT_KMALLOC_INFO.
Re: [RESEND v4 5/7] mm, slab_common: Make kmalloc_caches[] start at size KMALLOC_MIN_SIZE
On Mon, Sep 16, 2019 at 12:54 PM kbuild test robot wrote:
>
> Hi Pengfei,
>
> Thank you for the patch! Perhaps something to improve:
>
> [auto build test WARNING on linus/master]
> [cannot apply to v5.3 next-20190915]
> [if your patch is applied to the wrong git tree, please drop us a note to
> help improve the system]
>
> url: https://github.com/0day-ci/linux/commits/Pengfei-Li/mm-slab-Make-kmalloc_info-contain-all-types-of-names/20190916-065820
> reproduce:
>         # apt-get install sparse
>         # sparse version: v0.6.1-rc1-7-g2b96cd8-dirty
>         make ARCH=x86_64 allmodconfig
>         make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'
>
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot
>
> sparse warnings: (new ones prefixed by >>)
>
> >> mm/slab_common.c:1121:34: sparse: sparse: symbol 'all_kmalloc_info' was
> >> not declared. Should it be static?

Thanks. I will fix it in v5.

> Please review and possibly fold the followup patch.
>
> ---
> 0-DAY kernel test infrastructure            Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
Re: [RESEND v4 5/7] mm, slab_common: Make kmalloc_caches[] start at size KMALLOC_MIN_SIZE
On Mon, Sep 16, 2019 at 9:46 AM kbuild test robot wrote:
>
> Hi Pengfei,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on linus/master]
> [cannot apply to v5.3 next-20190904]
> [if your patch is applied to the wrong git tree, please drop us a note to
> help improve the system]
>
> url: https://github.com/0day-ci/linux/commits/Pengfei-Li/mm-slab-Make-kmalloc_info-contain-all-types-of-names/20190916-065820
> config: parisc-allmodconfig (attached as .config)
> compiler: hppa-linux-gcc (GCC) 7.4.0
> reproduce:
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         GCC_VERSION=7.4.0 make.cross ARCH=parisc
>
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot
>
> All errors (new ones prefixed by >>):
>
> >> mm/slab_common.c:1144:34: error: 'KMALLOC_INFO_START_IDX' undeclared here
> >> (not in a function); did you mean 'VMALLOC_START'?
>      kmalloc_info = &all_kmalloc_info[KMALLOC_INFO_START_IDX];
>                     ^~
>                     VMALLOC_START
>
>    vim +1144 mm/slab_common.c
>
>    1142
>    1143  const struct kmalloc_info_struct * const __initconst
>  > 1144  kmalloc_info = &all_kmalloc_info[KMALLOC_INFO_START_IDX];
>    1145

Thanks.

This error was caused by my mistakenly placing KMALLOC_INFO_SHIFT_LOW
and KMALLOC_INFO_START_IDX in the wrong place. (ARCH=sh is the same.)

I will fix it in v5.

> ---
> 0-DAY kernel test infrastructure            Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
Re: [RESEND v4 6/7] mm, slab_common: Initialize the same size of kmalloc_caches[]
On Mon, Sep 16, 2019 at 5:38 AM David Rientjes wrote: Thanks for your review comments! > > On Mon, 16 Sep 2019, Pengfei Li wrote: > > > diff --git a/mm/slab_common.c b/mm/slab_common.c > > index 2aed30deb071..e7903bd28b1f 100644 > > --- a/mm/slab_common.c > > +++ b/mm/slab_common.c > > @@ -1165,12 +1165,9 @@ void __init setup_kmalloc_cache_index_table(void) > > size_index[size_index_elem(i)] = 0; > > } > > > > -static void __init > > +static __always_inline void __init > > new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t > > flags) > > { > > - if (type == KMALLOC_RECLAIM) > > - flags |= SLAB_RECLAIM_ACCOUNT; > > - > > kmalloc_caches[type][idx] = create_kmalloc_cache( > > kmalloc_info[idx].name[type], > > kmalloc_info[idx].size, flags, 0, > > @@ -1185,30 +1182,22 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type > > type, slab_flags_t flags) > > void __init create_kmalloc_caches(slab_flags_t flags) > > { > > int i; > > - enum kmalloc_cache_type type; > > > > - for (type = KMALLOC_NORMAL; type <= KMALLOC_RECLAIM; type++) { > > - for (i = 0; i < KMALLOC_CACHE_NUM; i++) { > > - if (!kmalloc_caches[type][i]) > > - new_kmalloc_cache(i, type, flags); > > - } > > - } > > + for (i = 0; i < KMALLOC_CACHE_NUM; i++) { > > + if (!kmalloc_caches[KMALLOC_NORMAL][i]) > > + new_kmalloc_cache(i, KMALLOC_NORMAL, flags); > > > > - /* Kmalloc array is now usable */ > > - slab_state = UP; > > + new_kmalloc_cache(i, KMALLOC_RECLAIM, > > + flags | SLAB_RECLAIM_ACCOUNT); > > This seems less robust, no? Previously we verified that the cache doesn't > exist before creating a new cache over top of it (for NORMAL and RECLAIM). > Now we presume that the RECLAIM cache never exists. > Agree, this is really less robust. I have checked the code and found that there is no place to initialize kmalloc-rcl-xxx before create_kmalloc_caches(). So I assume that kmalloc-rcl-xxx is NULL. 
> Can we just move a check to new_kmalloc_cache() to see if
> kmalloc_caches[type][idx] already exists and, if so, just return? This
> should be more robust and simplify create_kmalloc_caches() slightly more.

For better robustness, I will do it as you suggested in v5.
Re: [RESEND v4 1/7] mm, slab: Make kmalloc_info[] contain all types of names
On Mon, Sep 16, 2019 at 5:38 AM David Rientjes wrote:
>
> On Mon, 16 Sep 2019, Pengfei Li wrote:
>
> > There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM
> > and KMALLOC_DMA.
> >
> > The name of KMALLOC_NORMAL is contained in kmalloc_info[].name,
> > but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically
> > generated by kmalloc_cache_name().
> >
> > This patch predefines the names of all types of kmalloc to save
> > the time spent dynamically generating names.
> >
> > Besides, remove the kmalloc_cache_name() that is no longer used.
> >
> > Signed-off-by: Pengfei Li
> > Acked-by: Vlastimil Babka
> > Acked-by: Roman Gushchin
> > Acked-by: David Rientjes

Thanks.

> It's unfortunate the existing names are kmalloc-, dma-kmalloc-, and
> kmalloc-rcl- since they aren't following any standard naming convention.
>
> Also not sure I understand the SET_KMALLOC_SIZE naming since this isn't
> just setting a size. Maybe better off as INIT_KMALLOC_INFO?

Yes, this name is really better. I will rename SET_KMALLOC_SIZE to
INIT_KMALLOC_INFO in v5.

> Nothing major though, so:
>
> Acked-by: David Rientjes
[PATCH v5 7/7] mm, slab_common: Modify kmalloc_caches[type][idx] to kmalloc_caches[idx][type]
KMALLOC_NORMAL is the most frequently accessed, and kmalloc_caches[] is initialized by different types of the same size. So modifying kmalloc_caches[type][idx] to kmalloc_caches[idx][type] will benefit performance. $ ./scripts/bloat-o-meter vmlinux.patch_1-6 vmlinux.patch_1-7 add/remove: 0/0 grow/shrink: 2/57 up/down: 8/-458 (-450) Function old new delta tg3_self_test 42554259 +4 nf_queue 666 670 +4 kmalloc_slab 97 93 -4 i915_sw_fence_await_dma_fence441 437 -4 __igmp_group_dropped 619 615 -4 gss_import_sec_context 176 170 -6 xhci_alloc_command 212 205 -7 xprt_switch_alloc136 128 -8 xhci_segment_alloc 297 289 -8 xhci_ring_alloc 369 361 -8 xhci_mem_init 36643656 -8 xhci_alloc_virt_device 496 488 -8 xhci_alloc_tt_info 346 338 -8 xhci_alloc_stream_info 718 710 -8 xhci_alloc_container_ctx 215 207 -8 xfrm_policy_alloc271 263 -8 tcp_sendmsg_locked 31203112 -8 tcp_md5_do_add 774 766 -8 tcp_fastopen_defer_connect 270 262 -8 sr_read_tochdr.isra 251 243 -8 sr_read_tocentry.isra328 320 -8 sr_is_xa 376 368 -8 sr_get_mcn 260 252 -8 selinux_sk_alloc_security113 105 -8 sdev_evt_send_simple 118 110 -8 sdev_evt_alloc79 71 -8 scsi_probe_and_add_lun 29382930 -8 sbitmap_queue_init_node 418 410 -8 ring_buffer_read_prepare 94 86 -8 request_firmware_nowait 396 388 -8 regulatory_hint_found_beacon 394 386 -8 ohci_urb_enqueue31763168 -8 nla_strdup 142 134 -8 nfs_alloc_seqid 87 79 -8 nfs4_get_state_owner10401032 -8 nfs4_do_close578 570 -8 nf_ct_tmpl_alloc 85 77 -8 mempool_create_node 164 156 -8 ip_setup_cork362 354 -8 ip6_setup_cork 10211013 -8 gss_create_cred 140 132 -8 drm_flip_work_allocate_task 70 62 -8 dma_pool_alloc 410 402 -8 devres_open_group214 206 -8 create_kmalloc_caches203 195 -8 cfg80211_stop_iface 260 252 -8 cfg80211_sinfo_alloc_tid_stats77 69 -8 cfg80211_port_authorized 212 204 -8 cfg80211_parse_mbssid_data 23972389 -8 cfg80211_ibss_joined 335 327 -8 call_usermodehelper_setup149 141 -8 bpf_prog_alloc_no_stats 182 174 -8 blk_alloc_flush_queue191 183 -8 bdi_alloc_node 195 187 -8 
audit_log_d_path 196 188 -8 _netlbl_catmap_getnode 247 239 -8 ip_mc_inc_group 475 467 -8 __i915_sw_fence_await_sw_fence 417 405 -12 ida_alloc_range 955 934 -21 Total: Before=14788957, After=14788507, chg -0.00% Signed-off-by: Pengfei Li Acked-by: David Rientjes --- include/linux/slab.h | 6 +++--- mm/slab.c| 4 ++-- mm/slab_common.c | 8 mm/slub.c| 12 ++-- 4 files changed, 15 insertions(+), 15 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index f53bb6980110..0842db5f7053 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -340,7 +340,7 @@ enum kmalloc_cache_type { #ifndef CONFIG_SLOB extern struct kmem_cache * -kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_CACHE_NUM]; +kmalloc_caches[KMALLOC_CACHE_NUM][NR_KMALLOC_TYPES]; static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags) { @@ -582,7 +582,7
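The locality argument behind this patch can be seen from the array layout itself: with kmalloc_caches[idx][type], the NORMAL, RECLAIM and DMA caches for one size class sit in adjacent elements, whereas with kmalloc_caches[type][idx] they are a whole row apart. Below is a minimal userspace sketch of that difference; the array dimensions and the helper names (stride_old, stride_new) are illustrative, not taken from slab.h.

```c
#include <stddef.h>

/* Illustrative dimensions only; the real values come from slab.h. */
#define NR_TYPES   3    /* NORMAL, RECLAIM, DMA */
#define NR_SIZES  13    /* hypothetical number of kmalloc size classes */

static void *caches_by_type[NR_TYPES][NR_SIZES]; /* old layout: [type][idx] */
static void *caches_by_size[NR_SIZES][NR_TYPES]; /* new layout: [idx][type] */

/* Distance, in array elements, between two types of the same size class. */
static size_t stride_old(void)
{
	/* Same size class (index 5), different types: NR_SIZES apart. */
	return (size_t)(&caches_by_type[1][5] - &caches_by_type[0][5]);
}

static size_t stride_new(void)
{
	/* Same size class, different types: directly adjacent. */
	return (size_t)(&caches_by_size[5][1] - &caches_by_size[5][0]);
}
```

So after the transposition, kmalloc_caches[index][kmalloc_type(flags)] touches one cache line for all types of a given size, which is consistent with the small size reductions bloat-o-meter reports above.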
[PATCH v5 6/7] mm, slab_common: Initialize the same size of kmalloc_caches[]
In the current implementation, KMALLOC_RECLAIM is not initialized until all of the KMALLOC_NORMAL sizes have been initialized. But by initializing the different kmalloc types of each size together, create_kmalloc_caches() becomes smaller and simpler.

$ ./scripts/bloat-o-meter vmlinux.patch_1-5 vmlinux.patch_1-6
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-11 (-11)
Function                    old     new   delta
create_kmalloc_caches       214     203     -11
Total: Before=14788968, After=14788957, chg -0.00%

Although the benefit is small (an extra check is made for robustness), create_kmalloc_caches() is much simpler.

Besides, KMALLOC_DMA is currently initialized after "slab_state = UP", which does not seem to be necessary. Commit f97d5f634d3b ("slab: Common function to create the kmalloc array") introduced create_kmalloc_caches(), and I found that for SLAB, KMALLOC_DMA is initialized before "slab_state = UP", while for SLUB it is initialized after. Based on this fact, I think it is okay to initialize KMALLOC_DMA before "slab_state = UP".
Signed-off-by: Pengfei Li --- mm/slab_common.c | 35 +-- 1 file changed, 13 insertions(+), 22 deletions(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index eeef5ac8d04d..00f2cfc66dbd 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1168,11 +1168,11 @@ void __init setup_kmalloc_cache_index_table(void) size_index[size_index_elem(i)] = 0; } -static void __init +static __always_inline void __init new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags) { - if (type == KMALLOC_RECLAIM) - flags |= SLAB_RECLAIM_ACCOUNT; + if (kmalloc_caches[type][idx]) + return; kmalloc_caches[type][idx] = create_kmalloc_cache( kmalloc_info[idx].name[type], @@ -1188,30 +1188,21 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags) void __init create_kmalloc_caches(slab_flags_t flags) { int i; - enum kmalloc_cache_type type; - for (type = KMALLOC_NORMAL; type <= KMALLOC_RECLAIM; type++) { - for (i = 0; i < KMALLOC_CACHE_NUM; i++) { - if (!kmalloc_caches[type][i]) - new_kmalloc_cache(i, type, flags); - } - } + for (i = 0; i < KMALLOC_CACHE_NUM; i++) { + new_kmalloc_cache(i, KMALLOC_NORMAL, flags); - /* Kmalloc array is now usable */ - slab_state = UP; + new_kmalloc_cache(i, KMALLOC_RECLAIM, + flags | SLAB_RECLAIM_ACCOUNT); #ifdef CONFIG_ZONE_DMA - for (i = 0; i < KMALLOC_CACHE_NUM; i++) { - struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i]; - - if (s) { - kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache( - kmalloc_info[i].name[KMALLOC_DMA], - kmalloc_info[i].size, - SLAB_CACHE_DMA | flags, 0, 0); - } - } + new_kmalloc_cache(i, KMALLOC_DMA, + flags | SLAB_CACHE_DMA); #endif + } + + /* Kmalloc array is now usable */ + slab_state = UP; } #endif /* !CONFIG_SLOB */ -- 2.21.0
[PATCH v5 5/7] mm, slab_common: Make kmalloc_caches[] start at size KMALLOC_MIN_SIZE
96 -9 ohci_urb_enqueue31853176 -9 nfs_alloc_seqid 96 87 -9 nfs4_get_state_owner10491040 -9 nfs4_do_close587 578 -9 mempool_create_node 173 164 -9 ip6_setup_cork 10301021 -9 dma_pool_alloc 419 410 -9 devres_open_group223 214 -9 cfg80211_parse_mbssid_data 24062397 -9 __igmp_group_dropped 629 619 -10 gss_import_sec_context 187 176 -11 ip_setup_cork374 362 -12 __i915_sw_fence_await_sw_fence 429 417 -12 kmalloc_caches 336 312 -24 create_kmalloc_caches270 214 -56 kmalloc_cache_name57 - -57 new_kmalloc_cache112 --112 kmalloc_info 432 8-424 Total: Before=14789209, After=14788968, chg -0.00% Signed-off-by: Pengfei Li --- include/linux/slab.h | 96 -- mm/slab.h| 10 +-- mm/slab_common.c | 161 +-- mm/slub.c| 12 ++-- 4 files changed, 136 insertions(+), 143 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index 1f05f68f2c3e..f53bb6980110 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -297,6 +297,23 @@ static inline void __check_heap_object(const void *ptr, unsigned long n, #define KMALLOC_MIN_SIZE (1 << KMALLOC_SHIFT_LOW) #endif +#define KMALLOC_CACHE_MIN_NUM (KMALLOC_SHIFT_HIGH - KMALLOC_SHIFT_LOW + 1) + +#if KMALLOC_MIN_SIZE > 64 + #define KMALLOC_SIZE_96_EXIST (0) + #define KMALLOC_SIZE_192_EXIST (0) +#elif KMALLOC_MIN_SIZE > 32 + #define KMALLOC_SIZE_96_EXIST (0) + #define KMALLOC_SIZE_192_EXIST (1) +#else + #define KMALLOC_SIZE_96_EXIST (1) + #define KMALLOC_SIZE_192_EXIST (1) +#endif + +#define KMALLOC_CACHE_NUM (KMALLOC_CACHE_MIN_NUM \ + + KMALLOC_SIZE_96_EXIST \ + + KMALLOC_SIZE_192_EXIST) + /* * This restriction comes from byte sized index implementation. 
* Page size is normally 2^12 bytes and, in this case, if we want to use @@ -323,7 +340,7 @@ enum kmalloc_cache_type { #ifndef CONFIG_SLOB extern struct kmem_cache * -kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1]; +kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_CACHE_NUM]; static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags) { @@ -345,13 +362,18 @@ static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags) #endif } +/* kmalloc_index adjust size: (0, 96] */ +#define KMALLOC_IDX_ADJ_0 (KMALLOC_SHIFT_LOW) + +/* kmalloc_index adjust size: (96, 192] */ +#define KMALLOC_IDX_ADJ_1 (KMALLOC_IDX_ADJ_0 - KMALLOC_SIZE_96_EXIST) + +/* kmalloc_index adjust size: (192, N] */ +#define KMALLOC_IDX_ADJ_2 (KMALLOC_IDX_ADJ_1 - KMALLOC_SIZE_192_EXIST) + /* * Figure out which kmalloc slab an allocation of a certain size * belongs to. - * 0 = zero alloc - * 1 = 65 .. 96 bytes - * 2 = 129 .. 192 bytes - * n = 2^(n-1)+1 .. 2^n */ static __always_inline unsigned int kmalloc_index(size_t size) { @@ -359,36 +381,40 @@ static __always_inline unsigned int kmalloc_index(size_t size) return ZERO_SIZE_ALLOC; if (size <= KMALLOC_MIN_SIZE) - return KMALLOC_SHIFT_LOW; - - if (KMALLOC_MIN_SIZE <= 32 && size > 64 && size <= 96) - return 1; - if (KMALLOC_MIN_SIZE <= 64 && size > 128 && size <= 192) - return 2; - if (size <= 8) return 3; - if (size <= 16) return 4; - if (size <= 32) return 5; - if (size <= 64) return 6; - if (size <=128) return 7; - if (size <=256) return 8; - if (size <=512) return 9; - if (size <= 1024) return 10; - if (size <= 2 * 1024) return 11; - if (size <= 4 * 1024) return 12; - if (size <= 8 * 1024) return 13; - if (size <= 16 * 1024) return 14; - if (size <= 32 * 1024) return 15; - if (size <= 64 * 1024) return 16; - if (size <= 128 * 1024) return 17; - if (size <= 256 * 1024) return 18; - if (size <= 512 * 1024) return 19; - if (size <= 1024 * 1024) return 20; - if (size <= 2 * 1024 * 1024) return 21; - if (size <= 4 * 1024 * 1024) return 
22; - if (size <= 8 * 1024 * 1024) ret
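The comment being deleted above documents the classic index scheme (0 = zero alloc, 1 = 65..96 bytes, 2 = 129..192 bytes, n = 2^(n-1)+1 .. 2^n), which is what the new KMALLOC_IDX_ADJ_* constants then shift down so the array can start at KMALLOC_MIN_SIZE. As a reference point, here is a userspace sketch of that classic mapping, assuming KMALLOC_MIN_SIZE == 8 (so the 96- and 192-byte caches exist); kmalloc_index_demo is a hypothetical name, not the kernel function.

```c
#include <limits.h>
#include <stddef.h>

#define ZERO_SIZE_ALLOC_DEMO UINT_MAX	/* sentinel for zero-sized requests */

/*
 * Sketch of the classic kmalloc_index() mapping, assuming
 * KMALLOC_MIN_SIZE == 8:
 *   0              -> ZERO_SIZE_ALLOC_DEMO
 *   65 .. 96       -> 1
 *   129 .. 192     -> 2
 *   otherwise      -> n such that 2^(n-1) < size <= 2^n
 */
static unsigned int kmalloc_index_demo(size_t size)
{
	unsigned int n;

	if (!size)
		return ZERO_SIZE_ALLOC_DEMO;
	if (size > 64 && size <= 96)
		return 1;
	if (size > 128 && size <= 192)
		return 2;
	if (size <= 8)
		return 3;	/* KMALLOC_SHIFT_LOW for this config */
	for (n = 4; (1UL << n) < size; n++)
		;		/* smallest power of two >= size */
	return n;
}
```

With this baseline in mind, KMALLOC_IDX_ADJ_0/1/2 above are just the per-range offsets that compress these indices into a dense 0-based array.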
[PATCH v5 4/7] mm, slab: Return ZERO_SIZE_ALLOC for zero sized kmalloc requests
This is a preparation patch, just replace 0 with ZERO_SIZE_ALLOC as the return value of zero sized requests. Signed-off-by: Pengfei Li Acked-by: David Rientjes --- include/linux/slab.h | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index e773e5764b7b..1f05f68f2c3e 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -121,14 +121,20 @@ #define SLAB_DEACTIVATED ((slab_flags_t __force)0x1000U) /* - * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests. + * ZERO_SIZE_ALLOC will be returned by kmalloc_index() if it was zero sized + * requests. * + * After that, ZERO_SIZE_PTR will be returned by the function that called + * kmalloc_index(). + * Dereferencing ZERO_SIZE_PTR will lead to a distinct access fault. * * ZERO_SIZE_PTR can be passed to kfree though in the same way that NULL can. * Both make kfree a no-op. */ -#define ZERO_SIZE_PTR ((void *)16) +#define ZERO_SIZE_ALLOC(UINT_MAX) + +#define ZERO_SIZE_PTR ((void *)16) #define ZERO_OR_NULL_PTR(x) ((unsigned long)(x) <= \ (unsigned long)ZERO_SIZE_PTR) @@ -350,7 +356,7 @@ static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags) static __always_inline unsigned int kmalloc_index(size_t size) { if (!size) - return 0; + return ZERO_SIZE_ALLOC; if (size <= KMALLOC_MIN_SIZE) return KMALLOC_SHIFT_LOW; @@ -546,7 +552,7 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags) #ifndef CONFIG_SLOB index = kmalloc_index(size); - if (!index) + if (index == ZERO_SIZE_ALLOC) return ZERO_SIZE_PTR; return kmem_cache_alloc_trace( @@ -564,7 +570,7 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node) size <= KMALLOC_MAX_CACHE_SIZE) { unsigned int i = kmalloc_index(size); - if (!i) + if (i == ZERO_SIZE_ALLOC) return ZERO_SIZE_PTR; return kmem_cache_alloc_node_trace( -- 2.21.0
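The ZERO_SIZE_PTR convention the patch builds on can be demonstrated outside the kernel: zero-sized allocations return a distinct non-NULL sentinel that faults on dereference, and kfree() treats it like NULL. The sketch below mimics this in userspace with malloc/free; kmalloc_demo/kfree_demo are illustrative stand-ins, not kernel APIs.

```c
#include <stdlib.h>

/* Sentinel for zero-sized allocations: non-NULL, but faults on deref. */
#define ZERO_SIZE_PTR ((void *)16)

/* True for NULL and ZERO_SIZE_PTR alike, as in include/linux/slab.h. */
#define ZERO_OR_NULL_PTR(x) \
	((unsigned long)(x) <= (unsigned long)ZERO_SIZE_PTR)

static void *kmalloc_demo(size_t size)
{
	if (!size)
		return ZERO_SIZE_PTR;	/* kmalloc(0) never returns NULL */
	return malloc(size);
}

static void kfree_demo(void *p)
{
	if (ZERO_OR_NULL_PTR(p))
		return;			/* kfree is a no-op for both */
	free(p);
}
```

The patch's ZERO_SIZE_ALLOC (UINT_MAX) is orthogonal to this: it is only an in-band sentinel for kmalloc_index(), while ZERO_SIZE_PTR remains the value handed back to callers.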
[PATCH v5 3/7] mm, slab_common: Use enum kmalloc_cache_type to iterate over kmalloc caches
The type of local variable *type* of new_kmalloc_cache() should be enum kmalloc_cache_type instead of int, so correct it. Signed-off-by: Pengfei Li Acked-by: Vlastimil Babka Acked-by: Roman Gushchin Acked-by: David Rientjes --- mm/slab_common.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index df030cf9f44f..23054b8b75b6 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1192,7 +1192,7 @@ void __init setup_kmalloc_cache_index_table(void) } static void __init -new_kmalloc_cache(int idx, int type, slab_flags_t flags) +new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags) { if (type == KMALLOC_RECLAIM) flags |= SLAB_RECLAIM_ACCOUNT; @@ -1210,7 +1210,8 @@ new_kmalloc_cache(int idx, int type, slab_flags_t flags) */ void __init create_kmalloc_caches(slab_flags_t flags) { - int i, type; + int i; + enum kmalloc_cache_type type; for (type = KMALLOC_NORMAL; type <= KMALLOC_RECLAIM; type++) { for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) { -- 2.21.0
[PATCH v5 2/7] mm, slab: Remove unused kmalloc_size()
The size of kmalloc can be obtained from kmalloc_info[], so remove kmalloc_size() that will not be used anymore. Signed-off-by: Pengfei Li Acked-by: Vlastimil Babka Acked-by: Roman Gushchin Acked-by: David Rientjes --- include/linux/slab.h | 20 mm/slab.c| 5 +++-- mm/slab_common.c | 5 ++--- 3 files changed, 5 insertions(+), 25 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index 56c9c7eed34e..e773e5764b7b 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -557,26 +557,6 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags) return __kmalloc(size, flags); } -/* - * Determine size used for the nth kmalloc cache. - * return size or 0 if a kmalloc cache for that - * size does not exist - */ -static __always_inline unsigned int kmalloc_size(unsigned int n) -{ -#ifndef CONFIG_SLOB - if (n > 2) - return 1U << n; - - if (n == 1 && KMALLOC_MIN_SIZE <= 32) - return 96; - - if (n == 2 && KMALLOC_MIN_SIZE <= 64) - return 192; -#endif - return 0; -} - static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node) { #ifndef CONFIG_SLOB diff --git a/mm/slab.c b/mm/slab.c index c42b6211f42e..7bc4e90e1147 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1248,8 +1248,9 @@ void __init kmem_cache_init(void) */ kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache( kmalloc_info[INDEX_NODE].name[KMALLOC_NORMAL], - kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS, - 0, kmalloc_size(INDEX_NODE)); + kmalloc_info[INDEX_NODE].size, + ARCH_KMALLOC_FLAGS, 0, + kmalloc_info[INDEX_NODE].size); slab_state = PARTIAL_NODE; setup_kmalloc_cache_index_table(); diff --git a/mm/slab_common.c b/mm/slab_common.c index 4e78490292df..df030cf9f44f 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1239,11 +1239,10 @@ void __init create_kmalloc_caches(slab_flags_t flags) struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i]; if (s) { - unsigned int size = kmalloc_size(i); - kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache( 
kmalloc_info[i].name[KMALLOC_DMA], - size, SLAB_CACHE_DMA | flags, 0, 0); + kmalloc_info[i].size, + SLAB_CACHE_DMA | flags, 0, 0); } } #endif -- 2.21.0
[PATCH v5 1/7] mm, slab: Make kmalloc_info[] contain all types of names
There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM and KMALLOC_DMA. The name of KMALLOC_NORMAL is contained in kmalloc_info[].name, but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically generated by kmalloc_cache_name(). This patch predefines the names of all types of kmalloc to save the time spent dynamically generating names. Besides, remove the kmalloc_cache_name() that is no longer used. Signed-off-by: Pengfei Li Acked-by: Vlastimil Babka Acked-by: Roman Gushchin Acked-by: David Rientjes --- mm/slab.c| 2 +- mm/slab.h| 2 +- mm/slab_common.c | 91 ++-- 3 files changed, 51 insertions(+), 44 deletions(-) diff --git a/mm/slab.c b/mm/slab.c index 9df370558e5d..c42b6211f42e 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1247,7 +1247,7 @@ void __init kmem_cache_init(void) * structures first. Without this, further allocations will bug. */ kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache( - kmalloc_info[INDEX_NODE].name, + kmalloc_info[INDEX_NODE].name[KMALLOC_NORMAL], kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS, 0, kmalloc_size(INDEX_NODE)); slab_state = PARTIAL_NODE; diff --git a/mm/slab.h b/mm/slab.h index 9057b8056b07..2fc8f956906a 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -76,7 +76,7 @@ extern struct kmem_cache *kmem_cache; /* A table of kmalloc cache names and sizes */ extern const struct kmalloc_info_struct { - const char *name; + const char *name[NR_KMALLOC_TYPES]; unsigned int size; } kmalloc_info[]; diff --git a/mm/slab_common.c b/mm/slab_common.c index 807490fe217a..4e78490292df 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1092,26 +1092,56 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags) return kmalloc_caches[kmalloc_type(flags)][index]; } +#ifdef CONFIG_ZONE_DMA +#define INIT_KMALLOC_INFO(__size, __short_size)\ +{ \ + .name[KMALLOC_NORMAL] = "kmalloc-" #__short_size, \ + .name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size, \ + .name[KMALLOC_DMA] = "dma-kmalloc-" #__short_size, \ + .size = 
__size, \ +} +#else +#define INIT_KMALLOC_INFO(__size, __short_size)\ +{ \ + .name[KMALLOC_NORMAL] = "kmalloc-" #__short_size, \ + .name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size, \ + .size = __size, \ +} +#endif + /* * kmalloc_info[] is to make slub_debug=,kmalloc-xx option work at boot time. * kmalloc_index() supports up to 2^26=64MB, so the final entry of the table is * kmalloc-67108864. */ const struct kmalloc_info_struct kmalloc_info[] __initconst = { - {NULL, 0}, {"kmalloc-96", 96}, - {"kmalloc-192", 192}, {"kmalloc-8", 8}, - {"kmalloc-16", 16}, {"kmalloc-32", 32}, - {"kmalloc-64", 64}, {"kmalloc-128", 128}, - {"kmalloc-256", 256}, {"kmalloc-512", 512}, - {"kmalloc-1k", 1024}, {"kmalloc-2k", 2048}, - {"kmalloc-4k", 4096}, {"kmalloc-8k", 8192}, - {"kmalloc-16k", 16384}, {"kmalloc-32k", 32768}, - {"kmalloc-64k", 65536}, {"kmalloc-128k", 131072}, - {"kmalloc-256k", 262144}, {"kmalloc-512k", 524288}, - {"kmalloc-1M",1048576}, {"kmalloc-2M",2097152}, - {"kmalloc-4M",4194304}, {"kmalloc-8M",8388608}, - {"kmalloc-16M", 16777216}, {"kmalloc-32M", 33554432}, - {"kmalloc-64M", 67108864} + INIT_KMALLOC_INFO(0, 0), + INIT_KMALLOC_INFO(96, 96), + INIT_KMALLOC_INFO(192, 192), + INIT_KMALLOC_INFO(8, 8), + INIT_KMALLOC_INFO(16, 16), + INIT_KMALLOC_INFO(32, 32), + INIT_KMALLOC_INFO(64, 64), + INIT_KMALLOC_INFO(128, 128), + INIT_KMALLOC_INFO(256, 256), + INIT_KMALLOC_INFO(512, 512), + INIT_KMALLOC_INFO(1024, 1k), + INIT_KMALLOC_INFO(2048, 2k), + INIT_KMALLOC_INFO(4096, 4k), + INIT_KMALLOC_INFO(8192, 8k), + INIT_KMALLOC_INFO(16384, 16k), +
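The INIT_KMALLOC_INFO macro above relies on preprocessor stringizing (#__short_size) to build all three cache-name strings at compile time from one short token like 1k or 4M. A self-contained sketch of the same trick, with illustrative struct and field names rather than the kernel's kmalloc_info_struct:

```c
#include <string.h>

/* Illustrative stand-in for the kernel's kmalloc_info_struct. */
struct kinfo_demo {
	const char *normal_name;
	const char *rcl_name;
	const char *dma_name;
	unsigned int size;
};

/* #__short_size stringizes the token, so INIT_KINFO(1024, 1k) yields
 * "kmalloc-1k", "kmalloc-rcl-1k" and "dma-kmalloc-1k". */
#define INIT_KINFO(__size, __short_size) {		\
	.normal_name = "kmalloc-" #__short_size,	\
	.rcl_name    = "kmalloc-rcl-" #__short_size,	\
	.dma_name    = "dma-kmalloc-" #__short_size,	\
	.size        = __size,				\
}

static const struct kinfo_demo kinfo_demo[] = {
	INIT_KINFO(1024, 1k),
	INIT_KINFO(4194304, 4M),
};
```

Because the names are string literals in a const __initconst table, no runtime formatting (the old kmalloc_cache_name()) is needed, which is where the kmalloc_info size reduction in the bloat-o-meter output comes from.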
[PATCH v5 0/7] mm, slab: Make kmalloc_info[] contain all types of names
tcp_sendmsg_locked 31293112 -17 tcp_md5_do_add 783 766 -17 tcp_fastopen_defer_connect 279 262 -17 sr_read_tochdr.isra 260 243 -17 sr_read_tocentry.isra337 320 -17 sr_is_xa 385 368 -17 sr_get_mcn 269 252 -17 scsi_probe_and_add_lun 29472930 -17 ring_buffer_read_prepare 103 86 -17 request_firmware_nowait 405 388 -17 ohci_urb_enqueue31853168 -17 nfs_alloc_seqid 96 79 -17 nfs4_get_state_owner10491032 -17 nfs4_do_close587 570 -17 mempool_create_node 173 156 -17 ip6_setup_cork 10301013 -17 ida_alloc_range 951 934 -17 gss_import_sec_context 187 170 -17 dma_pool_alloc 419 402 -17 devres_open_group223 206 -17 cfg80211_parse_mbssid_data 24062389 -17 ip_setup_cork374 354 -20 kmalloc_caches 336 312 -24 __i915_sw_fence_await_sw_fence 429 405 -24 kmalloc_cache_name57 - -57 create_kmalloc_caches270 195 -75 new_kmalloc_cache112 --112 kmalloc_info 432 8-424 Total: Before=14789209, After=14788507, chg -0.00% Pengfei Li (7): mm, slab: Make kmalloc_info[] contain all types of names mm, slab: Remove unused kmalloc_size() mm, slab_common: Use enum kmalloc_cache_type to iterate over kmalloc caches mm, slab: Return ZERO_SIZE_ALLOC for zero sized kmalloc requests mm, slab_common: Make kmalloc_caches[] start at size KMALLOC_MIN_SIZE mm, slab_common: Initialize the same size of kmalloc_caches[] mm, slab_common: Modify kmalloc_caches[type][idx] to kmalloc_caches[idx][type] include/linux/slab.h | 136 ++ mm/slab.c| 11 ++- mm/slab.h| 10 +- mm/slab_common.c | 227 ++- mm/slub.c| 12 +-- 5 files changed, 187 insertions(+), 209 deletions(-) -- 2.21.0
[RESEND v4 5/7] mm, slab_common: Make kmalloc_caches[] start at size KMALLOC_MIN_SIZE
96 -9 ohci_urb_enqueue31853176 -9 nfs_alloc_seqid 96 87 -9 nfs4_get_state_owner10491040 -9 nfs4_do_close587 578 -9 mempool_create_node 173 164 -9 ip6_setup_cork 10301021 -9 dma_pool_alloc 419 410 -9 devres_open_group223 214 -9 cfg80211_parse_mbssid_data 24062397 -9 __igmp_group_dropped 629 619 -10 gss_import_sec_context 187 176 -11 ip_setup_cork374 362 -12 __i915_sw_fence_await_sw_fence 429 417 -12 kmalloc_caches 336 312 -24 create_kmalloc_caches270 214 -56 kmalloc_cache_name57 - -57 new_kmalloc_cache112 --112 kmalloc_info 432 8-424 Total: Before=14874616, After=14874375, chg -0.00% Signed-off-by: Pengfei Li --- include/linux/slab.h | 96 -- mm/slab.h| 10 +-- mm/slab_common.c | 158 --- mm/slub.c| 12 ++-- 4 files changed, 133 insertions(+), 143 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index 1f05f68f2c3e..f53bb6980110 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -297,6 +297,23 @@ static inline void __check_heap_object(const void *ptr, unsigned long n, #define KMALLOC_MIN_SIZE (1 << KMALLOC_SHIFT_LOW) #endif +#define KMALLOC_CACHE_MIN_NUM (KMALLOC_SHIFT_HIGH - KMALLOC_SHIFT_LOW + 1) + +#if KMALLOC_MIN_SIZE > 64 + #define KMALLOC_SIZE_96_EXIST (0) + #define KMALLOC_SIZE_192_EXIST (0) +#elif KMALLOC_MIN_SIZE > 32 + #define KMALLOC_SIZE_96_EXIST (0) + #define KMALLOC_SIZE_192_EXIST (1) +#else + #define KMALLOC_SIZE_96_EXIST (1) + #define KMALLOC_SIZE_192_EXIST (1) +#endif + +#define KMALLOC_CACHE_NUM (KMALLOC_CACHE_MIN_NUM \ + + KMALLOC_SIZE_96_EXIST \ + + KMALLOC_SIZE_192_EXIST) + /* * This restriction comes from byte sized index implementation. 
* Page size is normally 2^12 bytes and, in this case, if we want to use @@ -323,7 +340,7 @@ enum kmalloc_cache_type { #ifndef CONFIG_SLOB extern struct kmem_cache * -kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1]; +kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_CACHE_NUM]; static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags) { @@ -345,13 +362,18 @@ static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags) #endif } +/* kmalloc_index adjust size: (0, 96] */ +#define KMALLOC_IDX_ADJ_0 (KMALLOC_SHIFT_LOW) + +/* kmalloc_index adjust size: (96, 192] */ +#define KMALLOC_IDX_ADJ_1 (KMALLOC_IDX_ADJ_0 - KMALLOC_SIZE_96_EXIST) + +/* kmalloc_index adjust size: (192, N] */ +#define KMALLOC_IDX_ADJ_2 (KMALLOC_IDX_ADJ_1 - KMALLOC_SIZE_192_EXIST) + /* * Figure out which kmalloc slab an allocation of a certain size * belongs to. - * 0 = zero alloc - * 1 = 65 .. 96 bytes - * 2 = 129 .. 192 bytes - * n = 2^(n-1)+1 .. 2^n */ static __always_inline unsigned int kmalloc_index(size_t size) { @@ -359,36 +381,40 @@ static __always_inline unsigned int kmalloc_index(size_t size) return ZERO_SIZE_ALLOC; if (size <= KMALLOC_MIN_SIZE) - return KMALLOC_SHIFT_LOW; - - if (KMALLOC_MIN_SIZE <= 32 && size > 64 && size <= 96) - return 1; - if (KMALLOC_MIN_SIZE <= 64 && size > 128 && size <= 192) - return 2; - if (size <= 8) return 3; - if (size <= 16) return 4; - if (size <= 32) return 5; - if (size <= 64) return 6; - if (size <=128) return 7; - if (size <=256) return 8; - if (size <=512) return 9; - if (size <= 1024) return 10; - if (size <= 2 * 1024) return 11; - if (size <= 4 * 1024) return 12; - if (size <= 8 * 1024) return 13; - if (size <= 16 * 1024) return 14; - if (size <= 32 * 1024) return 15; - if (size <= 64 * 1024) return 16; - if (size <= 128 * 1024) return 17; - if (size <= 256 * 1024) return 18; - if (size <= 512 * 1024) return 19; - if (size <= 1024 * 1024) return 20; - if (size <= 2 * 1024 * 1024) return 21; - if (size <= 4 * 1024 * 1024) return 
22; - if (size <= 8 * 1024 * 1024) ret
[RESEND v4 7/7] mm, slab_common: Modify kmalloc_caches[type][idx] to kmalloc_caches[idx][type]
KMALLOC_NORMAL is the most frequently accessed, and kmalloc_caches[] is initialized by different types of the same size. So modifying kmalloc_caches[type][idx] to kmalloc_caches[idx][type] will benefit performance. $ ./scripts/bloat-o-meter vmlinux.patch_1-6 vmlinux.patch_1-7 add/remove: 0/0 grow/shrink: 2/57 up/down: 8/-457 (-449) Function old new delta tg3_self_test 42554259 +4 nf_queue 666 670 +4 kmalloc_slab 97 93 -4 i915_sw_fence_await_dma_fence441 437 -4 __igmp_group_dropped 619 615 -4 gss_import_sec_context 176 170 -6 xhci_alloc_command 212 205 -7 create_kmalloc_caches155 148 -7 xprt_switch_alloc136 128 -8 xhci_segment_alloc 297 289 -8 xhci_ring_alloc 369 361 -8 xhci_mem_init 36643656 -8 xhci_alloc_virt_device 496 488 -8 xhci_alloc_tt_info 346 338 -8 xhci_alloc_stream_info 718 710 -8 xhci_alloc_container_ctx 215 207 -8 xfrm_policy_alloc271 263 -8 tcp_sendmsg_locked 31203112 -8 tcp_md5_do_add 774 766 -8 tcp_fastopen_defer_connect 270 262 -8 sr_read_tochdr.isra 251 243 -8 sr_read_tocentry.isra328 320 -8 sr_is_xa 376 368 -8 sr_get_mcn 260 252 -8 selinux_sk_alloc_security113 105 -8 sdev_evt_send_simple 118 110 -8 sdev_evt_alloc79 71 -8 scsi_probe_and_add_lun 29382930 -8 sbitmap_queue_init_node 418 410 -8 ring_buffer_read_prepare 94 86 -8 request_firmware_nowait 396 388 -8 regulatory_hint_found_beacon 394 386 -8 ohci_urb_enqueue31763168 -8 nla_strdup 142 134 -8 nfs_alloc_seqid 87 79 -8 nfs4_get_state_owner10401032 -8 nfs4_do_close578 570 -8 nf_ct_tmpl_alloc 85 77 -8 mempool_create_node 164 156 -8 ip_setup_cork362 354 -8 ip6_setup_cork 10211013 -8 gss_create_cred 140 132 -8 drm_flip_work_allocate_task 70 62 -8 dma_pool_alloc 410 402 -8 devres_open_group214 206 -8 cfg80211_stop_iface 260 252 -8 cfg80211_sinfo_alloc_tid_stats77 69 -8 cfg80211_port_authorized 212 204 -8 cfg80211_parse_mbssid_data 23972389 -8 cfg80211_ibss_joined 335 327 -8 call_usermodehelper_setup149 141 -8 bpf_prog_alloc_no_stats 182 174 -8 blk_alloc_flush_queue191 183 -8 bdi_alloc_node 195 187 -8 
audit_log_d_path 196 188 -8 _netlbl_catmap_getnode 247 239 -8 ip_mc_inc_group 475 467 -8 __i915_sw_fence_await_sw_fence 417 405 -12 ida_alloc_range 955 934 -21 Total: Before=14874316, After=14873867, chg -0.00% Signed-off-by: Pengfei Li --- include/linux/slab.h | 6 +++--- mm/slab.c| 4 ++-- mm/slab_common.c | 8 mm/slub.c| 12 ++-- 4 files changed, 15 insertions(+), 15 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index f53bb6980110..0842db5f7053 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -340,7 +340,7 @@ enum kmalloc_cache_type { #ifndef CONFIG_SLOB extern struct kmem_cache * -kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_CACHE_NUM]; +kmalloc_caches[KMALLOC_CACHE_NUM][NR_KMALLOC_TYPES]; static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags) { @@ -582,7 +582,7 @@ static __always_inline void
[RESEND v4 3/7] mm, slab_common: Use enum kmalloc_cache_type to iterate over kmalloc caches
The type of local variable *type* of new_kmalloc_cache() should be enum kmalloc_cache_type instead of int, so correct it. Signed-off-by: Pengfei Li Acked-by: Vlastimil Babka Acked-by: Roman Gushchin --- mm/slab_common.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index 8b542cfcc4f2..af45b5278fdc 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1192,7 +1192,7 @@ void __init setup_kmalloc_cache_index_table(void) } static void __init -new_kmalloc_cache(int idx, int type, slab_flags_t flags) +new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags) { if (type == KMALLOC_RECLAIM) flags |= SLAB_RECLAIM_ACCOUNT; @@ -1210,7 +1210,8 @@ new_kmalloc_cache(int idx, int type, slab_flags_t flags) */ void __init create_kmalloc_caches(slab_flags_t flags) { - int i, type; + int i; + enum kmalloc_cache_type type; for (type = KMALLOC_NORMAL; type <= KMALLOC_RECLAIM; type++) { for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) { -- 2.21.0
[RESEND v4 4/7] mm, slab: Return ZERO_SIZE_ALLOC for zero sized kmalloc requests
This is a preparation patch, just replace 0 with ZERO_SIZE_ALLOC as the return value of zero sized requests. Signed-off-by: Pengfei Li --- include/linux/slab.h | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index e773e5764b7b..1f05f68f2c3e 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -121,14 +121,20 @@ #define SLAB_DEACTIVATED ((slab_flags_t __force)0x1000U) /* - * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests. + * ZERO_SIZE_ALLOC will be returned by kmalloc_index() if it was zero sized + * requests. * + * After that, ZERO_SIZE_PTR will be returned by the function that called + * kmalloc_index(). + * Dereferencing ZERO_SIZE_PTR will lead to a distinct access fault. * * ZERO_SIZE_PTR can be passed to kfree though in the same way that NULL can. * Both make kfree a no-op. */ -#define ZERO_SIZE_PTR ((void *)16) +#define ZERO_SIZE_ALLOC(UINT_MAX) + +#define ZERO_SIZE_PTR ((void *)16) #define ZERO_OR_NULL_PTR(x) ((unsigned long)(x) <= \ (unsigned long)ZERO_SIZE_PTR) @@ -350,7 +356,7 @@ static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags) static __always_inline unsigned int kmalloc_index(size_t size) { if (!size) - return 0; + return ZERO_SIZE_ALLOC; if (size <= KMALLOC_MIN_SIZE) return KMALLOC_SHIFT_LOW; @@ -546,7 +552,7 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags) #ifndef CONFIG_SLOB index = kmalloc_index(size); - if (!index) + if (index == ZERO_SIZE_ALLOC) return ZERO_SIZE_PTR; return kmem_cache_alloc_trace( @@ -564,7 +570,7 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node) size <= KMALLOC_MAX_CACHE_SIZE) { unsigned int i = kmalloc_index(size); - if (!i) + if (i == ZERO_SIZE_ALLOC) return ZERO_SIZE_PTR; return kmem_cache_alloc_node_trace( -- 2.21.0
[RESEND v4 6/7] mm, slab_common: Initialize the same size of kmalloc_caches[]
In the current implementation, KMALLOC_RECLAIM is not initialized until all the KMALLOC_NORMAL sizes have been initialized. But for a particular size, create_kmalloc_caches() can be executed faster by initializing different types of kmalloc in order. $ ./scripts/bloat-o-meter vmlinux.old vmlinux.patch_1-5 add/remove: 1/2 grow/shrink: 6/64 up/down: 872/-1113 (-241) Function old new delta create_kmalloc_caches270 214 -56 $ ./scripts/bloat-o-meter vmlinux.old vmlinux.patch_1-6 add/remove: 1/2 grow/shrink: 6/64 up/down: 872/-1172 (-300) Function old new delta create_kmalloc_caches270 155-115 We can see that it really gets the benefits. Besides, KMALLOC_DMA will be initialized after "slab_state = UP", this does not seem to be necessary. Commit f97d5f634d3b ("slab: Common function to create the kmalloc array") introduces create_kmalloc_caches(). And I found that for SLAB, KMALLOC_DMA is initialized before "slab_state = UP". But for SLUB, KMALLOC_DMA is initialized after "slab_state = UP". Based on this fact, I think it is okay to initialize KMALLOC_DMA before "slab_state = UP". 
Signed-off-by: Pengfei Li
---
 mm/slab_common.c | 35 ++++++++++++-----------------------
 1 file changed, 12 insertions(+), 23 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 2aed30deb071..e7903bd28b1f 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1165,12 +1165,9 @@ void __init setup_kmalloc_cache_index_table(void)
 		size_index[size_index_elem(i)] = 0;
 }

-static void __init
+static __always_inline void __init
 new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
 {
-	if (type == KMALLOC_RECLAIM)
-		flags |= SLAB_RECLAIM_ACCOUNT;
-
 	kmalloc_caches[type][idx] = create_kmalloc_cache(
 					kmalloc_info[idx].name[type],
 					kmalloc_info[idx].size, flags, 0,
@@ -1185,30 +1182,22 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
 void __init create_kmalloc_caches(slab_flags_t flags)
 {
 	int i;
-	enum kmalloc_cache_type type;

-	for (type = KMALLOC_NORMAL; type <= KMALLOC_RECLAIM; type++) {
-		for (i = 0; i < KMALLOC_CACHE_NUM; i++) {
-			if (!kmalloc_caches[type][i])
-				new_kmalloc_cache(i, type, flags);
-		}
-	}
+	for (i = 0; i < KMALLOC_CACHE_NUM; i++) {
+		if (!kmalloc_caches[KMALLOC_NORMAL][i])
+			new_kmalloc_cache(i, KMALLOC_NORMAL, flags);

-	/* Kmalloc array is now usable */
-	slab_state = UP;
+		new_kmalloc_cache(i, KMALLOC_RECLAIM,
+				  flags | SLAB_RECLAIM_ACCOUNT);

 #ifdef CONFIG_ZONE_DMA
-	for (i = 0; i < KMALLOC_CACHE_NUM; i++) {
-		struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i];
-
-		if (s) {
-			kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
-				kmalloc_info[i].name[KMALLOC_DMA],
-				kmalloc_info[i].size,
-				SLAB_CACHE_DMA | flags, 0, 0);
-		}
-	}
+		new_kmalloc_cache(i, KMALLOC_DMA,
+				  flags | SLAB_CACHE_DMA);
 #endif
+	}
+
+	/* Kmalloc array is now usable */
+	slab_state = UP;
 }
 #endif /* !CONFIG_SLOB */
--
2.21.0
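The loop reordering in this patch can be sketched in isolation. The
following is a minimal userspace model, not kernel code: the names
(cache_created[], create_cache(), create_all_caches(), NR_SIZES) are
invented stand-ins. It only illustrates that a single pass over the
size indices creates every cache type of one size before moving on,
instead of one full pass per type.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented stand-ins mirroring the patch's shape. */
enum cache_type { NORMAL, RECLAIM, DMA, NR_TYPES };
#define NR_SIZES 4

static bool cache_created[NR_TYPES][NR_SIZES];

static void create_cache(int idx, enum cache_type type)
{
	cache_created[type][idx] = true;
}

static void create_all_caches(void)
{
	/* One pass over sizes; all types of a given size together. */
	for (int i = 0; i < NR_SIZES; i++) {
		/* NORMAL may already exist from early boot, so check. */
		if (!cache_created[NORMAL][i])
			create_cache(i, NORMAL);
		create_cache(i, RECLAIM);
		create_cache(i, DMA);
	}
}
```

After create_all_caches() returns, every (size, type) slot is populated,
matching what the reordered create_kmalloc_caches() achieves.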
[RESEND v4 2/7] mm, slab: Remove unused kmalloc_size()
The size of kmalloc can be obtained from kmalloc_info[], so remove
kmalloc_size(), which will not be used anymore.

Signed-off-by: Pengfei Li
Acked-by: Vlastimil Babka
Acked-by: Roman Gushchin
---
 include/linux/slab.h | 20 --------------------
 mm/slab.c            |  5 +++--
 mm/slab_common.c     |  5 ++---
 3 files changed, 5 insertions(+), 25 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 56c9c7eed34e..e773e5764b7b 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -557,26 +557,6 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
 	return __kmalloc(size, flags);
 }

-/*
- * Determine size used for the nth kmalloc cache.
- * return size or 0 if a kmalloc cache for that
- * size does not exist
- */
-static __always_inline unsigned int kmalloc_size(unsigned int n)
-{
-#ifndef CONFIG_SLOB
-	if (n > 2)
-		return 1U << n;
-
-	if (n == 1 && KMALLOC_MIN_SIZE <= 32)
-		return 96;
-
-	if (n == 2 && KMALLOC_MIN_SIZE <= 64)
-		return 192;
-#endif
-	return 0;
-}
-
 static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
 {
 #ifndef CONFIG_SLOB
diff --git a/mm/slab.c b/mm/slab.c
index c42b6211f42e..7bc4e90e1147 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1248,8 +1248,9 @@ void __init kmem_cache_init(void)
 	 */
 	kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache(
 				kmalloc_info[INDEX_NODE].name[KMALLOC_NORMAL],
-				kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS,
-				0, kmalloc_size(INDEX_NODE));
+				kmalloc_info[INDEX_NODE].size,
+				ARCH_KMALLOC_FLAGS, 0,
+				kmalloc_info[INDEX_NODE].size);
 	slab_state = PARTIAL_NODE;
 	setup_kmalloc_cache_index_table();
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 002e16673581..8b542cfcc4f2 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1239,11 +1239,10 @@ void __init create_kmalloc_caches(slab_flags_t flags)
 		struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i];

 		if (s) {
-			unsigned int size = kmalloc_size(i);
-
 			kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
 				kmalloc_info[i].name[KMALLOC_DMA],
-				size, SLAB_CACHE_DMA | flags, 0, 0);
+				kmalloc_info[i].size,
+				SLAB_CACHE_DMA | flags, 0, 0);
 		}
 	}
 #endif
--
2.21.0
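The removed helper computed the same sizes that kmalloc_info[] already
stores. A small sketch can check that the two sources agree; it is a
simplified userspace model assuming KMALLOC_MIN_SIZE <= 32, with an
invented table (info_size[]) standing in for kmalloc_info[].size.

```c
#include <assert.h>

/* Size table indexed like kmalloc_info[]: slot 0 unused, slots 1 and 2
 * are the odd 96/192 caches, slot n >= 3 holds 2^n. */
static const unsigned int info_size[] = {
	0, 96, 192, 8, 16, 32, 64, 128, 256, 512, 1024,
};

/* The computed-size helper the patch removes (simplified). */
static unsigned int old_kmalloc_size(unsigned int n)
{
	if (n > 2)
		return 1U << n;
	if (n == 1)
		return 96;	/* assumes KMALLOC_MIN_SIZE <= 32 */
	if (n == 2)
		return 192;	/* assumes KMALLOC_MIN_SIZE <= 64 */
	return 0;
}
```

Since every index yields the same value from both, the table lookup can
replace the computation with no behavioral change.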
[RESEND v4 1/7] mm, slab: Make kmalloc_info[] contain all types of names
There are three types of kmalloc: KMALLOC_NORMAL, KMALLOC_RECLAIM and
KMALLOC_DMA. The name of KMALLOC_NORMAL is contained in
kmalloc_info[].name, but the names of KMALLOC_RECLAIM and KMALLOC_DMA
are dynamically generated by kmalloc_cache_name().

This patch predefines the names of all types of kmalloc to save the
time spent dynamically generating the names. Besides, it removes
kmalloc_cache_name(), which is no longer used.

Signed-off-by: Pengfei Li
Acked-by: Vlastimil Babka
Acked-by: Roman Gushchin
---
 mm/slab.c        |  2 +-
 mm/slab.h        |  2 +-
 mm/slab_common.c | 91 ++++++++++++++++++++++++++-----------------------
 3 files changed, 51 insertions(+), 44 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 9df370558e5d..c42b6211f42e 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1247,7 +1247,7 @@ void __init kmem_cache_init(void)
 	 * structures first.  Without this, further allocations will bug.
 	 */
 	kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache(
-				kmalloc_info[INDEX_NODE].name,
+				kmalloc_info[INDEX_NODE].name[KMALLOC_NORMAL],
 				kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS,
 				0, kmalloc_size(INDEX_NODE));
 	slab_state = PARTIAL_NODE;
diff --git a/mm/slab.h b/mm/slab.h
index 9057b8056b07..2fc8f956906a 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -76,7 +76,7 @@ extern struct kmem_cache *kmem_cache;

 /* A table of kmalloc cache names and sizes */
 extern const struct kmalloc_info_struct {
-	const char *name;
+	const char *name[NR_KMALLOC_TYPES];
 	unsigned int size;
 } kmalloc_info[];

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 807490fe217a..002e16673581 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1092,26 +1092,56 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
 	return kmalloc_caches[kmalloc_type(flags)][index];
 }

+#ifdef CONFIG_ZONE_DMA
+#define SET_KMALLOC_SIZE(__size, __short_size)			\
+{								\
+	.name[KMALLOC_NORMAL] = "kmalloc-" #__short_size,	\
+	.name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size,	\
+	.name[KMALLOC_DMA] = "dma-kmalloc-" #__short_size,	\
+	.size = __size,						\
+}
+#else
+#define SET_KMALLOC_SIZE(__size, __short_size)			\
+{								\
+	.name[KMALLOC_NORMAL] = "kmalloc-" #__short_size,	\
+	.name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size,	\
+	.size = __size,						\
+}
+#endif
+
 /*
  * kmalloc_info[] is to make slub_debug=,kmalloc-xx option work at boot time.
  * kmalloc_index() supports up to 2^26=64MB, so the final entry of the table is
  * kmalloc-67108864.
  */
 const struct kmalloc_info_struct kmalloc_info[] __initconst = {
-	{NULL, 0},			{"kmalloc-96", 96},
-	{"kmalloc-192", 192},		{"kmalloc-8", 8},
-	{"kmalloc-16", 16},		{"kmalloc-32", 32},
-	{"kmalloc-64", 64},		{"kmalloc-128", 128},
-	{"kmalloc-256", 256},		{"kmalloc-512", 512},
-	{"kmalloc-1k", 1024},		{"kmalloc-2k", 2048},
-	{"kmalloc-4k", 4096},		{"kmalloc-8k", 8192},
-	{"kmalloc-16k", 16384},		{"kmalloc-32k", 32768},
-	{"kmalloc-64k", 65536},		{"kmalloc-128k", 131072},
-	{"kmalloc-256k", 262144},	{"kmalloc-512k", 524288},
-	{"kmalloc-1M", 1048576},	{"kmalloc-2M", 2097152},
-	{"kmalloc-4M", 4194304},	{"kmalloc-8M", 8388608},
-	{"kmalloc-16M", 16777216},	{"kmalloc-32M", 33554432},
-	{"kmalloc-64M", 67108864}
+	SET_KMALLOC_SIZE(0, 0),
+	SET_KMALLOC_SIZE(96, 96),
+	SET_KMALLOC_SIZE(192, 192),
+	SET_KMALLOC_SIZE(8, 8),
+	SET_KMALLOC_SIZE(16, 16),
+	SET_KMALLOC_SIZE(32, 32),
+	SET_KMALLOC_SIZE(64, 64),
+	SET_KMALLOC_SIZE(128, 128),
+	SET_KMALLOC_SIZE(256, 256),
+	SET_KMALLOC_SIZE(512, 512),
+	SET_KMALLOC_SIZE(1024, 1k),
+	SET_KMALLOC_SIZE(2048, 2k),
+	SET_KMALLOC_SIZE(4096, 4k),
+	SET_KMALLOC_SIZE(8192, 8k),
+	SET_KMALLOC_SIZE(16384, 16k),
+	SET_KMALLOC_SIZE(32768, 32k),
+	SET_KMALLOC_SIZE(655
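The SET_KMALLOC_SIZE technique above builds all cache-name strings at
compile time via the preprocessor's stringizing operator. The following
is a self-contained sketch with simplified stand-in identifiers
(cache_info, SET_SIZE, the NORMAL/RECLAIM/DMA enumerators); it mirrors
the macro's shape, not the kernel's exact definitions.

```c
#include <assert.h>
#include <string.h>

enum { NORMAL, RECLAIM, DMA, NR_TYPES };

struct cache_info {
	const char *name[NR_TYPES];
	unsigned int size;
};

/* #__short_size stringizes the short-size token ("1k", "2k", ...),
 * and adjacent string literals concatenate with the prefix, so every
 * name exists as a literal in the binary -- nothing is built at
 * runtime. */
#define SET_SIZE(__size, __short_size)				\
{								\
	.name[NORMAL]  = "kmalloc-" #__short_size,		\
	.name[RECLAIM] = "kmalloc-rcl-" #__short_size,		\
	.name[DMA]     = "dma-kmalloc-" #__short_size,		\
	.size = __size,						\
}

static const struct cache_info info[] = {
	SET_SIZE(1024, 1k),
	SET_SIZE(2048, 2k),
};
```

Note that a token like `1k` is a valid preprocessing number, so it can
be stringized even though it would not compile as an expression.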
[RESEND v4 0/7] mm, slab: Make kmalloc_info[] contain all types of names
sr_get_mcn                       269     252     -17
scsi_probe_and_add_lun          2947    2930     -17
ring_buffer_read_prepare         103      86     -17
request_firmware_nowait          405     388     -17
ohci_urb_enqueue                3185    3168     -17
nfs_alloc_seqid                   96      79     -17
nfs4_get_state_owner            1049    1032     -17
nfs4_do_close                    587     570     -17
mempool_create_node              173     156     -17
ip6_setup_cork                  1030    1013     -17
ida_alloc_range                  951     934     -17
gss_import_sec_context           187     170     -17
dma_pool_alloc                   419     402     -17
devres_open_group                223     206     -17
cfg80211_parse_mbssid_data      2406    2389     -17
ip_setup_cork                    374     354     -20
kmalloc_caches                   336     312     -24
__i915_sw_fence_await_sw_fence   429     405     -24
kmalloc_cache_name                57       -     -57
new_kmalloc_cache                112       -    -112
create_kmalloc_caches            270     148    -122
kmalloc_info                     432       8    -424
Total: Before=14874616, After=14873867, chg -0.01%

Pengfei Li (7):
  mm, slab: Make kmalloc_info[] contain all types of names
  mm, slab: Remove unused kmalloc_size()
  mm, slab_common: Use enum kmalloc_cache_type to iterate over kmalloc
    caches
  mm, slab: Return ZERO_SIZE_ALLOC for zero sized kmalloc requests
  mm, slab_common: Make kmalloc_caches[] start at size KMALLOC_MIN_SIZE
  mm, slab_common: Initialize the same size of kmalloc_caches[]
  mm, slab_common: Modify kmalloc_caches[type][idx] to
    kmalloc_caches[idx][type]

 include/linux/slab.h | 136 ++++++++++++------------
 mm/slab.c            |  11 ++-
 mm/slab.h            |  10 +-
 mm/slab_common.c     | 224 ++++++++++++++++++-------------------
 mm/slub.c            |  12 +--
 5 files changed, 183 insertions(+), 210 deletions(-)

--
2.21.0
[PATCH v4 7/7] mm, slab_common: Modify kmalloc_caches[type][idx] to kmalloc_caches[idx][type]
KMALLOC_NORMAL is the most frequently accessed type, and after the
previous patch kmalloc_caches[] is initialized with all types of the
same size together. So changing kmalloc_caches[type][idx] to
kmalloc_caches[idx][type] benefits performance.

$ ./scripts/bloat-o-meter vmlinux.patch_1-6 vmlinux.patch_1-7
add/remove: 0/0 grow/shrink: 2/57 up/down: 8/-457 (-449)
Function                         old     new   delta
tg3_self_test                   4255    4259      +4
nf_queue                         666     670      +4
kmalloc_slab                      97      93      -4
i915_sw_fence_await_dma_fence    441     437      -4
__igmp_group_dropped             619     615      -4
gss_import_sec_context           176     170      -6
xhci_alloc_command               212     205      -7
create_kmalloc_caches            155     148      -7
xprt_switch_alloc                136     128      -8
xhci_segment_alloc               297     289      -8
xhci_ring_alloc                  369     361      -8
xhci_mem_init                   3664    3656      -8
xhci_alloc_virt_device           496     488      -8
xhci_alloc_tt_info               346     338      -8
xhci_alloc_stream_info           718     710      -8
xhci_alloc_container_ctx         215     207      -8
xfrm_policy_alloc                271     263      -8
tcp_sendmsg_locked              3120    3112      -8
tcp_md5_do_add                   774     766      -8
tcp_fastopen_defer_connect       270     262      -8
sr_read_tochdr.isra              251     243      -8
sr_read_tocentry.isra            328     320      -8
sr_is_xa                         376     368      -8
sr_get_mcn                       260     252      -8
selinux_sk_alloc_security        113     105      -8
sdev_evt_send_simple             118     110      -8
sdev_evt_alloc                    79      71      -8
scsi_probe_and_add_lun          2938    2930      -8
sbitmap_queue_init_node          418     410      -8
ring_buffer_read_prepare          94      86      -8
request_firmware_nowait          396     388      -8
regulatory_hint_found_beacon     394     386      -8
ohci_urb_enqueue                3176    3168      -8
nla_strdup                       142     134      -8
nfs_alloc_seqid                   87      79      -8
nfs4_get_state_owner            1040    1032      -8
nfs4_do_close                    578     570      -8
nf_ct_tmpl_alloc                  85      77      -8
mempool_create_node              164     156      -8
ip_setup_cork                    362     354      -8
ip6_setup_cork                  1021    1013      -8
gss_create_cred                  140     132      -8
drm_flip_work_allocate_task       70      62      -8
dma_pool_alloc                   410     402      -8
devres_open_group                214     206      -8
cfg80211_stop_iface              260     252      -8
cfg80211_sinfo_alloc_tid_stats    77      69      -8
cfg80211_port_authorized         212     204      -8
cfg80211_parse_mbssid_data      2397    2389      -8
cfg80211_ibss_joined             335     327      -8
call_usermodehelper_setup        149     141      -8
bpf_prog_alloc_no_stats          182     174      -8
blk_alloc_flush_queue            191     183      -8
bdi_alloc_node                   195     187      -8
audit_log_d_path                 196     188      -8
_netlbl_catmap_getnode           247     239      -8
ip_mc_inc_group                  475     467      -8
__i915_sw_fence_await_sw_fence   417     405     -12
ida_alloc_range                  955     934     -21
Total: Before=14874316, After=14873867, chg -0.00%

Signed-off-by: Pengfei Li
---
 include/linux/slab.h |  6 +++---
 mm/slab.c            |  4 ++--
 mm/slab_common.c     |  8 ++++----
 mm/slub.c            | 12 ++++++------
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index f53bb6980110..0842db5f7053 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -340,7 +340,7 @@ enum kmalloc_cache_type {

 #ifndef CONFIG_SLOB
 extern struct kmem_cache *
-kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_CACHE_NUM];
+kmalloc_caches[KMALLOC_CACHE_NUM][NR_KMALLOC_TYPES];

 static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
 {
@@ -582,7 +582,7 @@ static __always_inline void
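The effect of the index swap can be illustrated with a toy model. This
is not kernel code: the names (caches[], normal_cache(), NUM, the
NORMAL/RECLAIM/DMA enumerators) are invented. It shows the two layout
properties the commit message relies on: with [idx][type], all types of
one size are adjacent in memory, and the hot KMALLOC_NORMAL lookup
(type 0) needs only a single scaled index, no extra type offset.

```c
#include <assert.h>
#include <stddef.h>

enum cache_type { NORMAL, RECLAIM, DMA, NR_TYPES };
#define NUM 13

/* [idx][type], as in the patch (int stands in for a cache pointer). */
static int caches[NUM][NR_TYPES];

/* The common-case lookup: base + idx * NR_TYPES, nothing more. */
static int *normal_cache(int idx)
{
	return &caches[idx][NORMAL];
}
```

With the old [type][idx] layout the same lookup would be
base + type * NUM + idx, where the runtime-computed type contributes a
second multiply-add; that difference is where the code-size savings in
the bloat-o-meter output above come from.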
[PATCH v4 4/7] mm, slab: return ZERO_SIZE_ALLOC for zero sized kmalloc requests
This is a preparation patch, just replace 0 with ZERO_SIZE_ALLOC as the
return value of zero sized requests.

Signed-off-by: Pengfei Li
---
 include/linux/slab.h | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index e773e5764b7b..1f05f68f2c3e 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -121,14 +121,20 @@
 #define SLAB_DEACTIVATED	((slab_flags_t __force)0x1000U)

 /*
- * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests.
+ * ZERO_SIZE_ALLOC will be returned by kmalloc_index() if it was zero sized
+ * requests.
  *
+ * After that, ZERO_SIZE_PTR will be returned by the function that called
+ * kmalloc_index().
+ *
  * Dereferencing ZERO_SIZE_PTR will lead to a distinct access fault.
  *
  * ZERO_SIZE_PTR can be passed to kfree though in the same way that NULL can.
  * Both make kfree a no-op.
  */
-#define ZERO_SIZE_PTR ((void *)16)
+#define ZERO_SIZE_ALLOC		(UINT_MAX)
+
+#define ZERO_SIZE_PTR		((void *)16)

 #define ZERO_OR_NULL_PTR(x) ((unsigned long)(x) <= \
 				(unsigned long)ZERO_SIZE_PTR)
@@ -350,7 +356,7 @@ static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
 static __always_inline unsigned int kmalloc_index(size_t size)
 {
 	if (!size)
-		return 0;
+		return ZERO_SIZE_ALLOC;

 	if (size <= KMALLOC_MIN_SIZE)
 		return KMALLOC_SHIFT_LOW;
@@ -546,7 +552,7 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
 #ifndef CONFIG_SLOB
 		index = kmalloc_index(size);

-		if (!index)
+		if (index == ZERO_SIZE_ALLOC)
 			return ZERO_SIZE_PTR;

 		return kmem_cache_alloc_trace(
@@ -564,7 +570,7 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
 	    size <= KMALLOC_MAX_CACHE_SIZE) {
 		unsigned int i = kmalloc_index(size);

-		if (!i)
+		if (i == ZERO_SIZE_ALLOC)
 			return ZERO_SIZE_PTR;

 		return kmem_cache_alloc_node_trace(
--
2.21.0
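The sentinel idea can be sketched outside the kernel. The following is
a minimal userspace model, assuming KMALLOC_SHIFT_LOW == 3 (8-byte
minimum) and using a simplified power-of-two-only kmalloc_index(): a
zero-sized request returns an out-of-band index (UINT_MAX) instead of
0, freeing index 0 up for a real cache, and callers compare against the
sentinel before touching the cache array.

```c
#include <limits.h>
#include <stddef.h>

/* Stand-ins mirroring the patch: an out-of-band index for zero-sized
 * requests, and the poison pointer handed back to the caller. */
#define ZERO_SIZE_ALLOC		(UINT_MAX)
#define ZERO_SIZE_PTR		((void *)16)
#define KMALLOC_SHIFT_LOW	3

/* Simplified index computation: once 0 can be a valid slot, "zero
 * size" must be signalled by a sentinel rather than by returning 0. */
static unsigned int kmalloc_index(size_t size)
{
	unsigned int i;

	if (!size)
		return ZERO_SIZE_ALLOC;

	for (i = KMALLOC_SHIFT_LOW; (1UL << i) < size; i++)
		;
	return i;
}
```

A caller then mirrors the patched kmalloc(): if the index equals
ZERO_SIZE_ALLOC it returns ZERO_SIZE_PTR, otherwise it indexes the
cache array.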
[PATCH v4 5/7] mm, slab_common: make kmalloc_caches[] start at size KMALLOC_MIN_SIZE
                                          96      -9
ohci_urb_enqueue                3185    3176      -9
nfs_alloc_seqid                   96      87      -9
nfs4_get_state_owner            1049    1040      -9
nfs4_do_close                    587     578      -9
mempool_create_node              173     164      -9
ip6_setup_cork                  1030    1021      -9
dma_pool_alloc                   419     410      -9
devres_open_group                223     214      -9
cfg80211_parse_mbssid_data      2406    2397      -9
__igmp_group_dropped             629     619     -10
gss_import_sec_context           187     176     -11
ip_setup_cork                    374     362     -12
__i915_sw_fence_await_sw_fence   429     417     -12
kmalloc_caches                   336     312     -24
create_kmalloc_caches            270     214     -56
kmalloc_cache_name                57       -     -57
new_kmalloc_cache                112       -    -112
kmalloc_info                     432       8    -424
Total: Before=14874616, After=14874375, chg -0.00%

Signed-off-by: Pengfei Li
---
 include/linux/slab.h |  96 ++++++++++++----------
 mm/slab.h            |  10 +--
 mm/slab_common.c     | 158 ++++++++++++++-----------------
 mm/slub.c            |  12 ++--
 4 files changed, 133 insertions(+), 143 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 1f05f68f2c3e..f53bb6980110 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -297,6 +297,23 @@ static inline void __check_heap_object(const void *ptr, unsigned long n,
 #define KMALLOC_MIN_SIZE (1 << KMALLOC_SHIFT_LOW)
 #endif

+#define KMALLOC_CACHE_MIN_NUM	(KMALLOC_SHIFT_HIGH - KMALLOC_SHIFT_LOW + 1)
+
+#if KMALLOC_MIN_SIZE > 64
+	#define KMALLOC_SIZE_96_EXIST	(0)
+	#define KMALLOC_SIZE_192_EXIST	(0)
+#elif KMALLOC_MIN_SIZE > 32
+	#define KMALLOC_SIZE_96_EXIST	(0)
+	#define KMALLOC_SIZE_192_EXIST	(1)
+#else
+	#define KMALLOC_SIZE_96_EXIST	(1)
+	#define KMALLOC_SIZE_192_EXIST	(1)
+#endif
+
+#define KMALLOC_CACHE_NUM	(KMALLOC_CACHE_MIN_NUM \
+				 + KMALLOC_SIZE_96_EXIST \
+				 + KMALLOC_SIZE_192_EXIST)
+
 /*
  * This restriction comes from byte sized index implementation.
  * Page size is normally 2^12 bytes and, in this case, if we want to use
@@ -323,7 +340,7 @@ enum kmalloc_cache_type {

 #ifndef CONFIG_SLOB
 extern struct kmem_cache *
-kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1];
+kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_CACHE_NUM];

 static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
 {
@@ -345,13 +362,18 @@ static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
 #endif
 }

+/* kmalloc_index adjust size: (0, 96] */
+#define KMALLOC_IDX_ADJ_0	(KMALLOC_SHIFT_LOW)
+
+/* kmalloc_index adjust size: (96, 192] */
+#define KMALLOC_IDX_ADJ_1	(KMALLOC_IDX_ADJ_0 - KMALLOC_SIZE_96_EXIST)
+
+/* kmalloc_index adjust size: (192, N] */
+#define KMALLOC_IDX_ADJ_2	(KMALLOC_IDX_ADJ_1 - KMALLOC_SIZE_192_EXIST)
+
 /*
  * Figure out which kmalloc slab an allocation of a certain size
  * belongs to.
- * 0 = zero alloc
- * 1 =  65 .. 96 bytes
- * 2 = 129 .. 192 bytes
- * n = 2^(n-1)+1 .. 2^n
  */
 static __always_inline unsigned int kmalloc_index(size_t size)
 {
@@ -359,36 +381,40 @@ static __always_inline unsigned int kmalloc_index(size_t size)
 		return ZERO_SIZE_ALLOC;

 	if (size <= KMALLOC_MIN_SIZE)
-		return KMALLOC_SHIFT_LOW;
-
-	if (KMALLOC_MIN_SIZE <= 32 && size > 64 && size <= 96)
-		return 1;
-	if (KMALLOC_MIN_SIZE <= 64 && size > 128 && size <= 192)
-		return 2;
-	if (size <=          8) return 3;
-	if (size <=         16) return 4;
-	if (size <=         32) return 5;
-	if (size <=         64) return 6;
-	if (size <=        128) return 7;
-	if (size <=        256) return 8;
-	if (size <=        512) return 9;
-	if (size <=       1024) return 10;
-	if (size <=   2 * 1024) return 11;
-	if (size <=   4 * 1024) return 12;
-	if (size <=   8 * 1024) return 13;
-	if (size <=  16 * 1024) return 14;
-	if (size <=  32 * 1024) return 15;
-	if (size <=  64 * 1024) return 16;
-	if (size <= 128 * 1024) return 17;
-	if (size <= 256 * 1024) return 18;
-	if (size <= 512 * 1024) return 19;
-	if (size <= 1024 * 1024) return 20;
-	if (size <=  2 * 1024 * 1024) return 21;
-	if (size <=  4 * 1024 * 1024) return 22;
-	if (size <=  8 * 1024 * 1024) ret
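The compressed indexing this patch introduces can be modeled in
userspace. The sketch below is an assumption-laden simplification: it
fixes KMALLOC_SHIFT_LOW to 3 (8-byte minimum, so both the 96- and
192-byte caches exist) and reproduces the *effect* of the
KMALLOC_IDX_ADJ_* offsets — rebasing the shift-derived index so the
smallest cache lands in slot 0 with 96 and 192 slotted between the
powers of two — rather than their exact form. cache_index() is an
invented name.

```c
#include <stddef.h>

#define SHIFT_LOW 3	/* 2^3 = 8 bytes, the smallest cache */

static unsigned int cache_index(size_t size)
{
	unsigned int shift;

	if (size <= 8)
		return 0;		/* kmalloc-8  -> slot 0 */
	if (size > 64 && size <= 96)
		return 4;		/* kmalloc-96 between 64 and 128 */
	if (size > 128 && size <= 192)
		return 6;		/* kmalloc-192 between 128 and 256 */

	for (shift = SHIFT_LOW; (1UL << shift) < size; shift++)
		;
	/* Rebase by SHIFT_LOW, then make room for the 96/192 slots. */
	return shift - SHIFT_LOW + (size > 96) + (size > 192);
}
```

The resulting slot order is 8, 16, 32, 64, 96, 128, 192, 256, ... with
no wasted leading entries, which is why kmalloc_caches[] shrinks in the
bloat-o-meter output above.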
[PATCH v4 3/7] mm, slab_common: Use enum kmalloc_cache_type to iterate over kmalloc caches
The type of local variable *type* of new_kmalloc_cache() should be enum kmalloc_cache_type instead of int, so correct it. Signed-off-by: Pengfei Li Acked-by: Vlastimil Babka Acked-by: Roman Gushchin --- mm/slab_common.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index 8b542cfcc4f2..af45b5278fdc 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1192,7 +1192,7 @@ void __init setup_kmalloc_cache_index_table(void) } static void __init -new_kmalloc_cache(int idx, int type, slab_flags_t flags) +new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags) { if (type == KMALLOC_RECLAIM) flags |= SLAB_RECLAIM_ACCOUNT; @@ -1210,7 +1210,8 @@ new_kmalloc_cache(int idx, int type, slab_flags_t flags) */ void __init create_kmalloc_caches(slab_flags_t flags) { - int i, type; + int i; + enum kmalloc_cache_type type; for (type = KMALLOC_NORMAL; type <= KMALLOC_RECLAIM; type++) { for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) { -- 2.21.0
[PATCH v4 2/7] mm, slab: Remove unused kmalloc_size()
The size of kmalloc can be obtained from kmalloc_info[], so remove kmalloc_size() that will not be used anymore. Signed-off-by: Pengfei Li Acked-by: Vlastimil Babka Acked-by: Roman Gushchin --- include/linux/slab.h | 20 mm/slab.c| 5 +++-- mm/slab_common.c | 5 ++--- 3 files changed, 5 insertions(+), 25 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index 56c9c7eed34e..e773e5764b7b 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -557,26 +557,6 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags) return __kmalloc(size, flags); } -/* - * Determine size used for the nth kmalloc cache. - * return size or 0 if a kmalloc cache for that - * size does not exist - */ -static __always_inline unsigned int kmalloc_size(unsigned int n) -{ -#ifndef CONFIG_SLOB - if (n > 2) - return 1U << n; - - if (n == 1 && KMALLOC_MIN_SIZE <= 32) - return 96; - - if (n == 2 && KMALLOC_MIN_SIZE <= 64) - return 192; -#endif - return 0; -} - static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node) { #ifndef CONFIG_SLOB diff --git a/mm/slab.c b/mm/slab.c index c42b6211f42e..7bc4e90e1147 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1248,8 +1248,9 @@ void __init kmem_cache_init(void) */ kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache( kmalloc_info[INDEX_NODE].name[KMALLOC_NORMAL], - kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS, - 0, kmalloc_size(INDEX_NODE)); + kmalloc_info[INDEX_NODE].size, + ARCH_KMALLOC_FLAGS, 0, + kmalloc_info[INDEX_NODE].size); slab_state = PARTIAL_NODE; setup_kmalloc_cache_index_table(); diff --git a/mm/slab_common.c b/mm/slab_common.c index 002e16673581..8b542cfcc4f2 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1239,11 +1239,10 @@ void __init create_kmalloc_caches(slab_flags_t flags) struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i]; if (s) { - unsigned int size = kmalloc_size(i); - kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache( kmalloc_info[i].name[KMALLOC_DMA], 
- size, SLAB_CACHE_DMA | flags, 0, 0); + kmalloc_info[i].size, + SLAB_CACHE_DMA | flags, 0, 0); } } #endif -- 2.21.0
[PATCH v4 0/7] mm, slab: Make kmalloc_info[] contain all types of names
sr_get_mcn                        269     252     -17
scsi_probe_and_add_lun           2947    2930     -17
ring_buffer_read_prepare          103      86     -17
request_firmware_nowait           405     388     -17
ohci_urb_enqueue                 3185    3168     -17
nfs_alloc_seqid                    96      79     -17
nfs4_get_state_owner             1049    1032     -17
nfs4_do_close                     587     570     -17
mempool_create_node               173     156     -17
ip6_setup_cork                   1030    1013     -17
ida_alloc_range                   951     934     -17
gss_import_sec_context            187     170     -17
dma_pool_alloc                    419     402     -17
devres_open_group                 223     206     -17
cfg80211_parse_mbssid_data       2406    2389     -17
ip_setup_cork                     374     354     -20
kmalloc_caches                    336     312     -24
__i915_sw_fence_await_sw_fence    429     405     -24
kmalloc_cache_name                 57       -     -57
new_kmalloc_cache                 112       -    -112
create_kmalloc_caches             270     148    -122
kmalloc_info                      432       8    -424
Total: Before=14874616, After=14873867, chg -0.01%

Pengfei Li (7):
  mm, slab: Make kmalloc_info[] contain all types of names
  mm, slab: Remove unused kmalloc_size()
  mm, slab_common: Use enum kmalloc_cache_type to iterate over kmalloc caches
  mm, slab: Return ZERO_SIZE_ALLOC for zero sized kmalloc requests
  mm, slab_common: Make kmalloc_caches[] start at size KMALLOC_MIN_SIZE
  mm, slab_common: Initialize the same size of kmalloc_caches[]
  mm, slab_common: Modify kmalloc_caches[type][idx] to kmalloc_caches[idx][type]

 include/linux/slab.h | 136 ++
 mm/slab.c            |  11 ++-
 mm/slab.h            |  10 +-
 mm/slab_common.c     | 224 ++-
 mm/slub.c            |  12 +--
 5 files changed, 183 insertions(+), 210 deletions(-)

-- 
2.21.0
[PATCH v4 1/7] mm, slab: Make kmalloc_info[] contain all types of names
There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM and KMALLOC_DMA. The name of KMALLOC_NORMAL is contained in kmalloc_info[].name, but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically generated by kmalloc_cache_name(). This patch predefines the names of all types of kmalloc to save the time spent dynamically generating names. Besides, remove the kmalloc_cache_name() that is no longer used. Signed-off-by: Pengfei Li Acked-by: Vlastimil Babka Acked-by: Roman Gushchin --- mm/slab.c| 2 +- mm/slab.h| 2 +- mm/slab_common.c | 91 ++-- 3 files changed, 51 insertions(+), 44 deletions(-) diff --git a/mm/slab.c b/mm/slab.c index 9df370558e5d..c42b6211f42e 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1247,7 +1247,7 @@ void __init kmem_cache_init(void) * structures first. Without this, further allocations will bug. */ kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache( - kmalloc_info[INDEX_NODE].name, + kmalloc_info[INDEX_NODE].name[KMALLOC_NORMAL], kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS, 0, kmalloc_size(INDEX_NODE)); slab_state = PARTIAL_NODE; diff --git a/mm/slab.h b/mm/slab.h index 9057b8056b07..2fc8f956906a 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -76,7 +76,7 @@ extern struct kmem_cache *kmem_cache; /* A table of kmalloc cache names and sizes */ extern const struct kmalloc_info_struct { - const char *name; + const char *name[NR_KMALLOC_TYPES]; unsigned int size; } kmalloc_info[]; diff --git a/mm/slab_common.c b/mm/slab_common.c index 807490fe217a..002e16673581 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1092,26 +1092,56 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags) return kmalloc_caches[kmalloc_type(flags)][index]; } +#ifdef CONFIG_ZONE_DMA +#define SET_KMALLOC_SIZE(__size, __short_size) \ +{ \ + .name[KMALLOC_NORMAL] = "kmalloc-" #__short_size, \ + .name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size, \ + .name[KMALLOC_DMA] = "dma-kmalloc-" #__short_size, \ + .size = __size, \ +} +#else +#define 
SET_KMALLOC_SIZE(__size, __short_size) \ +{ \ + .name[KMALLOC_NORMAL] = "kmalloc-" #__short_size, \ + .name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size, \ + .size = __size, \ +} +#endif + /* * kmalloc_info[] is to make slub_debug=,kmalloc-xx option work at boot time. * kmalloc_index() supports up to 2^26=64MB, so the final entry of the table is * kmalloc-67108864. */ const struct kmalloc_info_struct kmalloc_info[] __initconst = { - {NULL, 0}, {"kmalloc-96", 96}, - {"kmalloc-192", 192}, {"kmalloc-8", 8}, - {"kmalloc-16", 16}, {"kmalloc-32", 32}, - {"kmalloc-64", 64}, {"kmalloc-128", 128}, - {"kmalloc-256", 256}, {"kmalloc-512", 512}, - {"kmalloc-1k", 1024}, {"kmalloc-2k", 2048}, - {"kmalloc-4k", 4096}, {"kmalloc-8k", 8192}, - {"kmalloc-16k", 16384}, {"kmalloc-32k", 32768}, - {"kmalloc-64k", 65536}, {"kmalloc-128k", 131072}, - {"kmalloc-256k", 262144}, {"kmalloc-512k", 524288}, - {"kmalloc-1M",1048576}, {"kmalloc-2M",2097152}, - {"kmalloc-4M",4194304}, {"kmalloc-8M",8388608}, - {"kmalloc-16M", 16777216}, {"kmalloc-32M", 33554432}, - {"kmalloc-64M", 67108864} + SET_KMALLOC_SIZE(0, 0), + SET_KMALLOC_SIZE(96, 96), + SET_KMALLOC_SIZE(192, 192), + SET_KMALLOC_SIZE(8, 8), + SET_KMALLOC_SIZE(16, 16), + SET_KMALLOC_SIZE(32, 32), + SET_KMALLOC_SIZE(64, 64), + SET_KMALLOC_SIZE(128, 128), + SET_KMALLOC_SIZE(256, 256), + SET_KMALLOC_SIZE(512, 512), + SET_KMALLOC_SIZE(1024, 1k), + SET_KMALLOC_SIZE(2048, 2k), + SET_KMALLOC_SIZE(4096, 4k), + SET_KMALLOC_SIZE(8192, 8k), + SET_KMALLOC_SIZE(16384, 16k), + SET_KMALLOC_SIZE(32768, 32k), + SET_KMALLOC_SIZE(655
Re: [PATCH v3 4/4] mm, slab_common: Make the loop for initializing KMALLOC_DMA start from 1
On Tue, Sep 10, 2019 at 6:26 PM Vlastimil Babka wrote: > > On 9/10/19 3:26 AM, Pengfei Li wrote: > > KMALLOC_DMA will be initialized only if KMALLOC_NORMAL with > > the same index exists. > > > > And kmalloc_caches[KMALLOC_NORMAL][0] is always NULL. > > > > Therefore, the loop that initializes KMALLOC_DMA should start > > at 1 instead of 0, which will reduce 1 meaningless attempt. > > IMHO the saving of one iteration isn't worth making the code more > subtle. KMALLOC_SHIFT_LOW would be nice, but that would skip 1 + 2 which > are special. > Yes, I agree with you. This really makes the code more subtle. > Since you're doing these cleanups, have you considered reordering > kmalloc_info, size_index, kmalloc_index() etc so that sizes 96 and 192 > are ordered naturally between 64, 128 and 256? That should remove > various special casing such as in create_kmalloc_caches(). I can't > guarantee it will be possible without breaking e.g. constant folding > optimizations etc., but seems to me it should be feasible. (There are > definitely more places to change than those I listed.) > For the past two days, I have been working on what you suggested. So far, I have completed the coding work, but I need some time to make sure there are no bugs and to verify the impact on performance. I will send v4 soon. Thank you for your review and suggestions. -- Pengfei > > Signed-off-by: Pengfei Li > > --- > > mm/slab_common.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/mm/slab_common.c b/mm/slab_common.c > > index af45b5278fdc..c81fc7dc2946 100644 > > --- a/mm/slab_common.c > > +++ b/mm/slab_common.c > > @@ -1236,7 +1236,7 @@ void __init create_kmalloc_caches(slab_flags_t flags) > > slab_state = UP; > > > > #ifdef CONFIG_ZONE_DMA > > - for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) { > > + for (i = 1; i <= KMALLOC_SHIFT_HIGH; i++) { > > struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i]; > > > > if (s) { > > >
[PATCH v3 4/4] mm, slab_common: Make the loop for initializing KMALLOC_DMA start from 1
KMALLOC_DMA will be initialized only if KMALLOC_NORMAL with the same index exists. And kmalloc_caches[KMALLOC_NORMAL][0] is always NULL. Therefore, the loop that initializes KMALLOC_DMA should start at 1 instead of 0, which avoids one meaningless attempt. Signed-off-by: Pengfei Li --- mm/slab_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index af45b5278fdc..c81fc7dc2946 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1236,7 +1236,7 @@ void __init create_kmalloc_caches(slab_flags_t flags) slab_state = UP; #ifdef CONFIG_ZONE_DMA - for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) { + for (i = 1; i <= KMALLOC_SHIFT_HIGH; i++) { struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i]; if (s) { -- 2.21.0
[PATCH v3 1/4] mm, slab: Make kmalloc_info[] contain all types of names
There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM and KMALLOC_DMA. The name of KMALLOC_NORMAL is contained in kmalloc_info[].name, but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically generated by kmalloc_cache_name(). This patch predefines the names of all types of kmalloc to save the time spent dynamically generating names. Besides, remove the kmalloc_cache_name() that is no longer used. Signed-off-by: Pengfei Li Acked-by: Vlastimil Babka Acked-by: Roman Gushchin --- mm/slab.c| 2 +- mm/slab.h| 2 +- mm/slab_common.c | 91 ++-- 3 files changed, 51 insertions(+), 44 deletions(-) diff --git a/mm/slab.c b/mm/slab.c index 9df370558e5d..c42b6211f42e 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1247,7 +1247,7 @@ void __init kmem_cache_init(void) * structures first. Without this, further allocations will bug. */ kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache( - kmalloc_info[INDEX_NODE].name, + kmalloc_info[INDEX_NODE].name[KMALLOC_NORMAL], kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS, 0, kmalloc_size(INDEX_NODE)); slab_state = PARTIAL_NODE; diff --git a/mm/slab.h b/mm/slab.h index 9057b8056b07..2fc8f956906a 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -76,7 +76,7 @@ extern struct kmem_cache *kmem_cache; /* A table of kmalloc cache names and sizes */ extern const struct kmalloc_info_struct { - const char *name; + const char *name[NR_KMALLOC_TYPES]; unsigned int size; } kmalloc_info[]; diff --git a/mm/slab_common.c b/mm/slab_common.c index 807490fe217a..002e16673581 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1092,26 +1092,56 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags) return kmalloc_caches[kmalloc_type(flags)][index]; } +#ifdef CONFIG_ZONE_DMA +#define SET_KMALLOC_SIZE(__size, __short_size) \ +{ \ + .name[KMALLOC_NORMAL] = "kmalloc-" #__short_size, \ + .name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size, \ + .name[KMALLOC_DMA] = "dma-kmalloc-" #__short_size, \ + .size = __size, \ +} +#else +#define 
SET_KMALLOC_SIZE(__size, __short_size) \ +{ \ + .name[KMALLOC_NORMAL] = "kmalloc-" #__short_size, \ + .name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size, \ + .size = __size, \ +} +#endif + /* * kmalloc_info[] is to make slub_debug=,kmalloc-xx option work at boot time. * kmalloc_index() supports up to 2^26=64MB, so the final entry of the table is * kmalloc-67108864. */ const struct kmalloc_info_struct kmalloc_info[] __initconst = { - {NULL, 0}, {"kmalloc-96", 96}, - {"kmalloc-192", 192}, {"kmalloc-8", 8}, - {"kmalloc-16", 16}, {"kmalloc-32", 32}, - {"kmalloc-64", 64}, {"kmalloc-128", 128}, - {"kmalloc-256", 256}, {"kmalloc-512", 512}, - {"kmalloc-1k", 1024}, {"kmalloc-2k", 2048}, - {"kmalloc-4k", 4096}, {"kmalloc-8k", 8192}, - {"kmalloc-16k", 16384}, {"kmalloc-32k", 32768}, - {"kmalloc-64k", 65536}, {"kmalloc-128k", 131072}, - {"kmalloc-256k", 262144}, {"kmalloc-512k", 524288}, - {"kmalloc-1M",1048576}, {"kmalloc-2M",2097152}, - {"kmalloc-4M",4194304}, {"kmalloc-8M",8388608}, - {"kmalloc-16M", 16777216}, {"kmalloc-32M", 33554432}, - {"kmalloc-64M", 67108864} + SET_KMALLOC_SIZE(0, 0), + SET_KMALLOC_SIZE(96, 96), + SET_KMALLOC_SIZE(192, 192), + SET_KMALLOC_SIZE(8, 8), + SET_KMALLOC_SIZE(16, 16), + SET_KMALLOC_SIZE(32, 32), + SET_KMALLOC_SIZE(64, 64), + SET_KMALLOC_SIZE(128, 128), + SET_KMALLOC_SIZE(256, 256), + SET_KMALLOC_SIZE(512, 512), + SET_KMALLOC_SIZE(1024, 1k), + SET_KMALLOC_SIZE(2048, 2k), + SET_KMALLOC_SIZE(4096, 4k), + SET_KMALLOC_SIZE(8192, 8k), + SET_KMALLOC_SIZE(16384, 16k), + SET_KMALLOC_SIZE(32768, 32k), + SET_KMALLOC_SIZE(655
[PATCH v3 0/4] mm, slab: Make kmalloc_info[] contain all types of names
Changes in v3 -- 1. restore __initconst (patch 1/4) 2. rename patch 3/4 3. add more clarification for patch 4/4 Changes in v2 -- 1. remove __initconst (patch 1/5) 2. squash patch 2/5 3. add ack tag from Vlastimil Babka There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM and KMALLOC_DMA. The name of KMALLOC_NORMAL is contained in kmalloc_info[].name, but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically generated by kmalloc_cache_name(). Patch 1 predefines the names of all types of kmalloc to save the time spent dynamically generating names. The other three patches do some cleanup work. These changes make sense, and the time spent by new_kmalloc_cache() has been reduced by approximately 36.3%. Time spent by new_kmalloc_cache() 5.3-rc7 66264 5.3-rc7+patch 42188 Pengfei Li (4): mm, slab: Make kmalloc_info[] contain all types of names mm, slab: Remove unused kmalloc_size() mm, slab_common: use enum kmalloc_cache_type to iterate over kmalloc caches mm, slab_common: Make the loop for initializing KMALLOC_DMA start from 1 include/linux/slab.h | 20 - mm/slab.c| 7 +-- mm/slab.h| 2 +- mm/slab_common.c | 101 +++ 4 files changed, 59 insertions(+), 71 deletions(-) -- 2.21.0
[PATCH v3 2/4] mm, slab: Remove unused kmalloc_size()
The size of kmalloc can be obtained from kmalloc_info[], so remove kmalloc_size() that will not be used anymore. Signed-off-by: Pengfei Li Acked-by: Vlastimil Babka Acked-by: Roman Gushchin --- include/linux/slab.h | 20 mm/slab.c| 5 +++-- mm/slab_common.c | 5 ++--- 3 files changed, 5 insertions(+), 25 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index 56c9c7eed34e..e773e5764b7b 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -557,26 +557,6 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags) return __kmalloc(size, flags); } -/* - * Determine size used for the nth kmalloc cache. - * return size or 0 if a kmalloc cache for that - * size does not exist - */ -static __always_inline unsigned int kmalloc_size(unsigned int n) -{ -#ifndef CONFIG_SLOB - if (n > 2) - return 1U << n; - - if (n == 1 && KMALLOC_MIN_SIZE <= 32) - return 96; - - if (n == 2 && KMALLOC_MIN_SIZE <= 64) - return 192; -#endif - return 0; -} - static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node) { #ifndef CONFIG_SLOB diff --git a/mm/slab.c b/mm/slab.c index c42b6211f42e..7bc4e90e1147 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1248,8 +1248,9 @@ void __init kmem_cache_init(void) */ kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache( kmalloc_info[INDEX_NODE].name[KMALLOC_NORMAL], - kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS, - 0, kmalloc_size(INDEX_NODE)); + kmalloc_info[INDEX_NODE].size, + ARCH_KMALLOC_FLAGS, 0, + kmalloc_info[INDEX_NODE].size); slab_state = PARTIAL_NODE; setup_kmalloc_cache_index_table(); diff --git a/mm/slab_common.c b/mm/slab_common.c index 002e16673581..8b542cfcc4f2 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1239,11 +1239,10 @@ void __init create_kmalloc_caches(slab_flags_t flags) struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i]; if (s) { - unsigned int size = kmalloc_size(i); - kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache( kmalloc_info[i].name[KMALLOC_DMA], 
- size, SLAB_CACHE_DMA | flags, 0, 0); + kmalloc_info[i].size, + SLAB_CACHE_DMA | flags, 0, 0); } } #endif -- 2.21.0
[PATCH v3 3/4] mm, slab_common: use enum kmalloc_cache_type to iterate over kmalloc caches
The type of local variable *type* of new_kmalloc_cache() should be enum kmalloc_cache_type instead of int, so correct it. Signed-off-by: Pengfei Li Acked-by: Vlastimil Babka Acked-by: Roman Gushchin --- mm/slab_common.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index 8b542cfcc4f2..af45b5278fdc 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1192,7 +1192,7 @@ void __init setup_kmalloc_cache_index_table(void) } static void __init -new_kmalloc_cache(int idx, int type, slab_flags_t flags) +new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags) { if (type == KMALLOC_RECLAIM) flags |= SLAB_RECLAIM_ACCOUNT; @@ -1210,7 +1210,8 @@ new_kmalloc_cache(int idx, int type, slab_flags_t flags) */ void __init create_kmalloc_caches(slab_flags_t flags) { - int i, type; + int i; + enum kmalloc_cache_type type; for (type = KMALLOC_NORMAL; type <= KMALLOC_RECLAIM; type++) { for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) { -- 2.21.0
Re: [PATCH v2 3/4] mm, slab_common: Make 'type' is enum kmalloc_cache_type
On Tue, Sep 10, 2019 at 4:00 AM Roman Gushchin wrote: > > On Tue, Sep 10, 2019 at 01:07:14AM +0800, Pengfei Li wrote: > > Hi Pengfei! > > > The 'type' of the function new_kmalloc_cache should be > > enum kmalloc_cache_type instead of int, so correct it. > > I think you mean type of the 'i' variable, not the type of > new_kmalloc_cache() function. Also the name of the patch is > misleading. How about > mm, slab_common: use enum kmalloc_cache_type to iterate over kmalloc caches ? > Or something like this. > Ok, this name is really better :) > The rest of the series looks good to me. > > Please, feel free to use > Acked-by: Roman Gushchin > for patches [1-3] in the series after fixing this commit message and > restoring __initconst. > Thanks! > Patch [4] needs some additional clarifications, IMO. > I will add more clarification in v3. > Thank you! > > > > > Signed-off-by: Pengfei Li > > Acked-by: Vlastimil Babka > > > > > --- > > mm/slab_common.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/mm/slab_common.c b/mm/slab_common.c > > index cae27210e4c3..d64a64660f86 100644 > > --- a/mm/slab_common.c > > +++ b/mm/slab_common.c > > @@ -1192,7 +1192,7 @@ void __init setup_kmalloc_cache_index_table(void) > > } > > > > static void __init > > -new_kmalloc_cache(int idx, int type, slab_flags_t flags) > > +new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t > > flags) > > { > > if (type == KMALLOC_RECLAIM) > > flags |= SLAB_RECLAIM_ACCOUNT; > > @@ -1210,7 +1210,8 @@ new_kmalloc_cache(int idx, int type, slab_flags_t > > flags) > > */ > > void __init create_kmalloc_caches(slab_flags_t flags) > > { > > - int i, type; > > + int i; > > + enum kmalloc_cache_type type; > > > > for (type = KMALLOC_NORMAL; type <= KMALLOC_RECLAIM; type++) { > > for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) { > > -- > > 2.21.0 > > > >
Re: [PATCH 1/5] mm, slab: Make kmalloc_info[] contain all types of names
On Tue, Sep 10, 2019 at 2:30 AM Rasmus Villemoes wrote: > > On 09/09/2019 18.53, Pengfei Li wrote: > > On Mon, Sep 9, 2019 at 10:59 PM Vlastimil Babka wrote: > > >>> /* > >>>* kmalloc_info[] is to make slub_debug=,kmalloc-xx option work at boot > >>> time. > >>>* kmalloc_index() supports up to 2^26=64MB, so the final entry of the > >>> table is > >>>* kmalloc-67108864. > >>>*/ > >>> const struct kmalloc_info_struct kmalloc_info[] __initconst = { > >> > >> BTW should it really be an __initconst, when references to the names > >> keep on living in kmem_cache structs? Isn't this for data that's > >> discarded after init? > > > > You are right, I will remove __initconst in v2. > > No, __initconst is correct, and should be kept. The string literals > which the .name pointers point to live in .rodata, and we're copying the > values of these .name pointers. Nothing refers to something inside > kmalloc_info[] after init. (It would be a whole different matter if > struct kmalloc_info_struct consisted of { char name[NN]; unsigned int > size; }). > Thank you for your comment. I will keep it in v3. I did learn :) > Rasmus
[PATCH v2 4/4] mm, slab_common: Make initializing KMALLOC_DMA start from 1
kmalloc_caches[KMALLOC_NORMAL][0] will never be initialized, so the loop should start at 1 instead of 0. Signed-off-by: Pengfei Li --- mm/slab_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index d64a64660f86..6b3e526934d9 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1236,7 +1236,7 @@ void __init create_kmalloc_caches(slab_flags_t flags) slab_state = UP; #ifdef CONFIG_ZONE_DMA - for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) { + for (i = 1; i <= KMALLOC_SHIFT_HIGH; i++) { struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i]; if (s) { -- 2.21.0
[PATCH v2 2/4] mm, slab: Remove unused kmalloc_size()
The size of kmalloc can be obtained from kmalloc_info[], so remove kmalloc_size() that will not be used anymore. Signed-off-by: Pengfei Li Acked-by: Vlastimil Babka --- include/linux/slab.h | 20 mm/slab.c| 5 +++-- mm/slab_common.c | 5 ++--- 3 files changed, 5 insertions(+), 25 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index 56c9c7eed34e..e773e5764b7b 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -557,26 +557,6 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags) return __kmalloc(size, flags); } -/* - * Determine size used for the nth kmalloc cache. - * return size or 0 if a kmalloc cache for that - * size does not exist - */ -static __always_inline unsigned int kmalloc_size(unsigned int n) -{ -#ifndef CONFIG_SLOB - if (n > 2) - return 1U << n; - - if (n == 1 && KMALLOC_MIN_SIZE <= 32) - return 96; - - if (n == 2 && KMALLOC_MIN_SIZE <= 64) - return 192; -#endif - return 0; -} - static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node) { #ifndef CONFIG_SLOB diff --git a/mm/slab.c b/mm/slab.c index c42b6211f42e..7bc4e90e1147 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1248,8 +1248,9 @@ void __init kmem_cache_init(void) */ kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache( kmalloc_info[INDEX_NODE].name[KMALLOC_NORMAL], - kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS, - 0, kmalloc_size(INDEX_NODE)); + kmalloc_info[INDEX_NODE].size, + ARCH_KMALLOC_FLAGS, 0, + kmalloc_info[INDEX_NODE].size); slab_state = PARTIAL_NODE; setup_kmalloc_cache_index_table(); diff --git a/mm/slab_common.c b/mm/slab_common.c index 61c1e2e54263..cae27210e4c3 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1239,11 +1239,10 @@ void __init create_kmalloc_caches(slab_flags_t flags) struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i]; if (s) { - unsigned int size = kmalloc_size(i); - kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache( kmalloc_info[i].name[KMALLOC_DMA], - size, SLAB_CACHE_DMA | 
flags, 0, 0); + kmalloc_info[i].size, + SLAB_CACHE_DMA | flags, 0, 0); } } #endif -- 2.21.0
[PATCH v2 1/4] mm, slab: Make kmalloc_info[] contain all types of names
There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM and KMALLOC_DMA. The name of KMALLOC_NORMAL is contained in kmalloc_info[].name, but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically generated by kmalloc_cache_name(). This patch predefines the names of all types of kmalloc to save the time spent dynamically generating names. Besides, remove the kmalloc_cache_name() that is no longer used. Signed-off-by: Pengfei Li Acked-by: Vlastimil Babka --- mm/slab.c| 2 +- mm/slab.h| 2 +- mm/slab_common.c | 93 ++-- 3 files changed, 52 insertions(+), 45 deletions(-) diff --git a/mm/slab.c b/mm/slab.c index 9df370558e5d..c42b6211f42e 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1247,7 +1247,7 @@ void __init kmem_cache_init(void) * structures first. Without this, further allocations will bug. */ kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache( - kmalloc_info[INDEX_NODE].name, + kmalloc_info[INDEX_NODE].name[KMALLOC_NORMAL], kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS, 0, kmalloc_size(INDEX_NODE)); slab_state = PARTIAL_NODE; diff --git a/mm/slab.h b/mm/slab.h index 9057b8056b07..2fc8f956906a 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -76,7 +76,7 @@ extern struct kmem_cache *kmem_cache; /* A table of kmalloc cache names and sizes */ extern const struct kmalloc_info_struct { - const char *name; + const char *name[NR_KMALLOC_TYPES]; unsigned int size; } kmalloc_info[]; diff --git a/mm/slab_common.c b/mm/slab_common.c index 807490fe217a..61c1e2e54263 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1092,26 +1092,56 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags) return kmalloc_caches[kmalloc_type(flags)][index]; } +#ifdef CONFIG_ZONE_DMA +#define SET_KMALLOC_SIZE(__size, __short_size) \ +{ \ + .name[KMALLOC_NORMAL] = "kmalloc-" #__short_size, \ + .name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size, \ + .name[KMALLOC_DMA] = "dma-kmalloc-" #__short_size, \ + .size = __size, \ +} +#else +#define SET_KMALLOC_SIZE(__size, 
__short_size) \ +{ \ + .name[KMALLOC_NORMAL] = "kmalloc-" #__short_size, \ + .name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size, \ + .size = __size, \ +} +#endif + /* * kmalloc_info[] is to make slub_debug=,kmalloc-xx option work at boot time. * kmalloc_index() supports up to 2^26=64MB, so the final entry of the table is * kmalloc-67108864. */ -const struct kmalloc_info_struct kmalloc_info[] __initconst = { - {NULL, 0}, {"kmalloc-96", 96}, - {"kmalloc-192", 192}, {"kmalloc-8", 8}, - {"kmalloc-16", 16}, {"kmalloc-32", 32}, - {"kmalloc-64", 64}, {"kmalloc-128", 128}, - {"kmalloc-256", 256}, {"kmalloc-512", 512}, - {"kmalloc-1k", 1024}, {"kmalloc-2k", 2048}, - {"kmalloc-4k", 4096}, {"kmalloc-8k", 8192}, - {"kmalloc-16k", 16384}, {"kmalloc-32k", 32768}, - {"kmalloc-64k", 65536}, {"kmalloc-128k", 131072}, - {"kmalloc-256k", 262144}, {"kmalloc-512k", 524288}, - {"kmalloc-1M",1048576}, {"kmalloc-2M",2097152}, - {"kmalloc-4M",4194304}, {"kmalloc-8M",8388608}, - {"kmalloc-16M", 16777216}, {"kmalloc-32M", 33554432}, - {"kmalloc-64M", 67108864} +const struct kmalloc_info_struct kmalloc_info[] = { + SET_KMALLOC_SIZE(0, 0), + SET_KMALLOC_SIZE(96, 96), + SET_KMALLOC_SIZE(192, 192), + SET_KMALLOC_SIZE(8, 8), + SET_KMALLOC_SIZE(16, 16), + SET_KMALLOC_SIZE(32, 32), + SET_KMALLOC_SIZE(64, 64), + SET_KMALLOC_SIZE(128, 128), + SET_KMALLOC_SIZE(256, 256), + SET_KMALLOC_SIZE(512, 512), + SET_KMALLOC_SIZE(1024, 1k), + SET_KMALLOC_SIZE(2048, 2k), + SET_KMALLOC_SIZE(4096, 4k), + SET_KMALLOC_SIZE(8192, 8k), + SET_KMALLOC_SIZE(16384, 16k), + SET_KMALLOC_SIZE(32768, 32k), +
[PATCH v2 3/4] mm, slab_common: Make 'type' is enum kmalloc_cache_type
The 'type' of the function new_kmalloc_cache should be enum kmalloc_cache_type instead of int, so correct it. Signed-off-by: Pengfei Li Acked-by: Vlastimil Babka --- mm/slab_common.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index cae27210e4c3..d64a64660f86 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1192,7 +1192,7 @@ void __init setup_kmalloc_cache_index_table(void) } static void __init -new_kmalloc_cache(int idx, int type, slab_flags_t flags) +new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags) { if (type == KMALLOC_RECLAIM) flags |= SLAB_RECLAIM_ACCOUNT; @@ -1210,7 +1210,8 @@ new_kmalloc_cache(int idx, int type, slab_flags_t flags) */ void __init create_kmalloc_caches(slab_flags_t flags) { - int i, type; + int i; + enum kmalloc_cache_type type; for (type = KMALLOC_NORMAL; type <= KMALLOC_RECLAIM; type++) { for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) { -- 2.21.0
[PATCH v2 0/4] mm, slab: Make kmalloc_info[] contain all types of names
Changes in v2 -- 1. remove __initconst (patch 1/5) 2. squash patch 2/5 3. add ack tag from Vlastimil Babka There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM and KMALLOC_DMA. The name of KMALLOC_NORMAL is contained in kmalloc_info[].name, but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically generated by kmalloc_cache_name(). Patch 1 predefines the names of all types of kmalloc to save the time spent dynamically generating names. The other three patches do some cleanup work. These changes make sense, and the time spent by new_kmalloc_cache() has been reduced by approximately 36.3%. Time spent by new_kmalloc_cache() 5.3-rc7 66264 5.3-rc7+patch 42188 Pengfei Li (4): mm, slab: Make kmalloc_info[] contain all types of names mm, slab: Remove unused kmalloc_size() mm, slab_common: Make 'type' is enum kmalloc_cache_type mm, slab_common: Make initializing KMALLOC_DMA start from 1 include/linux/slab.h | 20 - mm/slab.c| 7 +-- mm/slab.h| 2 +- mm/slab_common.c | 103 +++ 4 files changed, 60 insertions(+), 72 deletions(-) -- 2.21.0
Re: [PATCH 2/5] mm, slab_common: Remove unused kmalloc_cache_name()
On Mon, Sep 9, 2019 at 10:59 PM Vlastimil Babka wrote: > > On 9/3/19 6:04 PM, Pengfei Li wrote: > > Since the name of kmalloc can be obtained from kmalloc_info[], > > remove the kmalloc_cache_name() that is no longer used. > > That could simply be part of patch 1/5 really. > Ok, thanks. > > Signed-off-by: Pengfei Li > > Ack > > > --- > > mm/slab_common.c | 15 --- > > 1 file changed, 15 deletions(-)
Re: [PATCH 1/5] mm, slab: Make kmalloc_info[] contain all types of names
On Mon, Sep 9, 2019 at 10:59 PM Vlastimil Babka wrote: > > On 9/3/19 6:04 PM, Pengfei Li wrote: > > There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM > > and KMALLOC_DMA. > > > > The name of KMALLOC_NORMAL is contained in kmalloc_info[].name, > > but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically > > generated by kmalloc_cache_name(). > > > > This patch predefines the names of all types of kmalloc to save > > the time spent dynamically generating names. > > As I said, IMHO it's more useful that we don't need to allocate the > names dynamically anymore, and it's simpler overall. > Thank you very much for your review. > > Signed-off-by: Pengfei Li > > Acked-by: Vlastimil Babka > > > /* > >* kmalloc_info[] is to make slub_debug=,kmalloc-xx option work at boot > > time. > >* kmalloc_index() supports up to 2^26=64MB, so the final entry of the > > table is > >* kmalloc-67108864. > >*/ > > const struct kmalloc_info_struct kmalloc_info[] __initconst = { > > BTW should it really be an __initconst, when references to the names > keep on living in kmem_cache structs? Isn't this for data that's > discarded after init? You are right, I will remove __initconst in v2.
Re: [PATCH 0/5] mm, slab: Make kmalloc_info[] contain all types of names
On Thu, Sep 5, 2019 at 8:25 PM Vlastimil Babka wrote: > > On 9/3/19 6:04 PM, Pengfei Li wrote: > > There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM > > and KMALLOC_DMA. > > > > The name of KMALLOC_NORMAL is contained in kmalloc_info[].name, > > but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically > > generated by kmalloc_cache_name(). > > > > Patch1 predefines the names of all types of kmalloc to save > > the time spent dynamically generating names. > > > > The other 4 patches did some cleanup work. > > > > These changes make sense, and the time spent by new_kmalloc_cache() > > has been reduced by approximately 36.3%. > > > > Time spent by > > new_kmalloc_cache() > > 5.3-rc7 66264 > > 5.3-rc7+patch 42188 > > Note that the caches are created only once upon boot, so I doubt that Thank you for your comments. Yes, kmalloc-xxx are only created at boot time. > these time savings (is it in CPU cycles?) will be noticeable at all. Yes, it is CPU cycles. > But diffstat looks ok, and it avoids using kmalloc() (via kasprintf()) to > allocate names for kmalloc(), so in that sense I think it's worthwhile > to consider. Thanks. > Thanks. > > Pengfei Li (5): > > mm, slab: Make kmalloc_info[] contain all types of names > > mm, slab_common: Remove unused kmalloc_cache_name() > > mm, slab: Remove unused kmalloc_size() > > mm, slab_common: Make 'type' is enum kmalloc_cache_type > > mm, slab_common: Make initializing KMALLOC_DMA start from 1 > > > > include/linux/slab.h | 20 - > > mm/slab.c| 7 +-- > > mm/slab.h| 2 +- > > mm/slab_common.c | 101 +++ > > 4 files changed, 59 insertions(+), 71 deletions(-) > > >
Re: [PATCH 0/5] mm, slab: Make kmalloc_info[] contain all types of names
On Thu, Sep 5, 2019 at 3:27 AM Christopher Lameter wrote: > > On Wed, 4 Sep 2019, Pengfei Li wrote: > > > There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM > > and KMALLOC_DMA. > > I only got a few patches of this set. Can I see the complete patchset > somewhere? Yes, you can get the patches from https://patchwork.kernel.org/cover/11128325/ or https://lore.kernel.org/patchwork/cover/1123412/ Hope you like it :)
[PATCH 5/5] mm, slab_common: Make initializing KMALLOC_DMA start from 1
kmalloc_caches[KMALLOC_NORMAL][0] will never be initialized, so the loop should start at 1 instead of 0 Signed-off-by: Pengfei Li --- mm/slab_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index af45b5278fdc..c81fc7dc2946 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1236,7 +1236,7 @@ void __init create_kmalloc_caches(slab_flags_t flags) slab_state = UP; #ifdef CONFIG_ZONE_DMA - for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) { + for (i = 1; i <= KMALLOC_SHIFT_HIGH; i++) { struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i]; if (s) { -- 2.21.0
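[Editorial illustration] The reasoning in patch 5/5 — entry 0 of kmalloc_info[] is the {NULL, 0} placeholder, so no NORMAL cache ever exists at index 0 for the DMA loop to clone — can be sketched in userspace. All names here (cache_info, count_dma_caches) are illustrative, not kernel code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of kmalloc_info[]: entry 0 is the unused
 * size-0 placeholder, so no cache is ever created for index 0. */
struct cache_info { const char *name; unsigned int size; };

static const struct cache_info info[] = {
    { NULL, 0 }, { "kmalloc-96", 96 }, { "kmalloc-192", 192 },
    { "kmalloc-8", 8 },
};

/* Mirror of the patched loop: start at 1 because info[0] describes no
 * real cache, so there is no NORMAL cache at index 0 to give a DMA twin. */
static int count_dma_caches(void)
{
    int created = 0;
    size_t i;

    for (i = 1; i < sizeof(info) / sizeof(info[0]); i++)
        if (info[i].name)   /* only existing caches get a DMA twin */
            created++;
    return created;
}
```

Starting at 0 would merely test a NULL pointer once per boot, so the change is cosmetic rather than a bug fix, which matches the commit message's wording.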
[PATCH 4/5] mm, slab_common: Make 'type' is enum kmalloc_cache_type
The 'type' of the function new_kmalloc_cache should be enum kmalloc_cache_type instead of int, so correct it. Signed-off-by: Pengfei Li --- mm/slab_common.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index 8b542cfcc4f2..af45b5278fdc 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1192,7 +1192,7 @@ void __init setup_kmalloc_cache_index_table(void) } static void __init -new_kmalloc_cache(int idx, int type, slab_flags_t flags) +new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags) { if (type == KMALLOC_RECLAIM) flags |= SLAB_RECLAIM_ACCOUNT; @@ -1210,7 +1210,8 @@ new_kmalloc_cache(int idx, int type, slab_flags_t flags) */ void __init create_kmalloc_caches(slab_flags_t flags) { - int i, type; + int i; + enum kmalloc_cache_type type; for (type = KMALLOC_NORMAL; type <= KMALLOC_RECLAIM; type++) { for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) { -- 2.21.0
[PATCH 3/5] mm, slab: Remove unused kmalloc_size()
The size of kmalloc can be obtained from kmalloc_info[], so remove kmalloc_size() that will not be used anymore. Signed-off-by: Pengfei Li --- include/linux/slab.h | 20 mm/slab.c| 5 +++-- mm/slab_common.c | 5 ++--- 3 files changed, 5 insertions(+), 25 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index 56c9c7eed34e..e773e5764b7b 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -557,26 +557,6 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags) return __kmalloc(size, flags); } -/* - * Determine size used for the nth kmalloc cache. - * return size or 0 if a kmalloc cache for that - * size does not exist - */ -static __always_inline unsigned int kmalloc_size(unsigned int n) -{ -#ifndef CONFIG_SLOB - if (n > 2) - return 1U << n; - - if (n == 1 && KMALLOC_MIN_SIZE <= 32) - return 96; - - if (n == 2 && KMALLOC_MIN_SIZE <= 64) - return 192; -#endif - return 0; -} - static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node) { #ifndef CONFIG_SLOB diff --git a/mm/slab.c b/mm/slab.c index c42b6211f42e..7bc4e90e1147 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1248,8 +1248,9 @@ void __init kmem_cache_init(void) */ kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache( kmalloc_info[INDEX_NODE].name[KMALLOC_NORMAL], - kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS, - 0, kmalloc_size(INDEX_NODE)); + kmalloc_info[INDEX_NODE].size, + ARCH_KMALLOC_FLAGS, 0, + kmalloc_info[INDEX_NODE].size); slab_state = PARTIAL_NODE; setup_kmalloc_cache_index_table(); diff --git a/mm/slab_common.c b/mm/slab_common.c index 002e16673581..8b542cfcc4f2 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1239,11 +1239,10 @@ void __init create_kmalloc_caches(slab_flags_t flags) struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i]; if (s) { - unsigned int size = kmalloc_size(i); - kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache( kmalloc_info[i].name[KMALLOC_DMA], - size, SLAB_CACHE_DMA | flags, 0, 0); + 
kmalloc_info[i].size, + SLAB_CACHE_DMA | flags, 0, 0); } } #endif -- 2.21.0
[PATCH 2/5] mm, slab_common: Remove unused kmalloc_cache_name()
Since the name of kmalloc can be obtained from kmalloc_info[], remove the kmalloc_cache_name() that is no longer used. Signed-off-by: Pengfei Li --- mm/slab_common.c | 15 --- 1 file changed, 15 deletions(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index 7bd88cc09987..002e16673581 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1191,21 +1191,6 @@ void __init setup_kmalloc_cache_index_table(void) } } -static const char * -kmalloc_cache_name(const char *prefix, unsigned int size) -{ - - static const char units[3] = "\0kM"; - int idx = 0; - - while (size >= 1024 && (size % 1024 == 0)) { - size /= 1024; - idx++; - } - - return kasprintf(GFP_NOWAIT, "%s-%u%c", prefix, size, units[idx]); -} - static void __init new_kmalloc_cache(int idx, int type, slab_flags_t flags) { -- 2.21.0
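[Editorial illustration] For reference, the removed kmalloc_cache_name() scaled the size down by 1024 while it divided evenly and then appended the matching unit suffix. A userspace sketch of the same logic, with snprintf standing in for kasprintf (the static buffer is an assumption for the sketch; the kernel version allocated with GFP_NOWAIT):

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Userspace sketch of the removed kmalloc_cache_name(): divide the
 * size by 1024 while it divides evenly, then pick the matching unit
 * suffix ('\0', 'k' or 'M') from units[]. */
static const char *cache_name(const char *prefix, unsigned int size)
{
    static const char units[3] = "\0kM";
    static char buf[64];
    int idx = 0;

    while (size >= 1024 && size % 1024 == 0) {
        size /= 1024;
        idx++;
    }
    snprintf(buf, sizeof(buf), "%s-%u%c", prefix, size, units[idx]);
    return buf;
}
```

Because patch 1/5 predefines every name as a string literal, this runtime formatting (and its kasprintf allocation) becomes dead code, which is what makes the removal safe.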
[PATCH 1/5] mm, slab: Make kmalloc_info[] contain all types of names
There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM and KMALLOC_DMA. The name of KMALLOC_NORMAL is contained in kmalloc_info[].name, but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically generated by kmalloc_cache_name(). This patch predefines the names of all types of kmalloc to save the time spent dynamically generating names. Signed-off-by: Pengfei Li --- mm/slab.c| 2 +- mm/slab.h| 2 +- mm/slab_common.c | 76 +++- 3 files changed, 51 insertions(+), 29 deletions(-) diff --git a/mm/slab.c b/mm/slab.c index 9df370558e5d..c42b6211f42e 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1247,7 +1247,7 @@ void __init kmem_cache_init(void) * structures first. Without this, further allocations will bug. */ kmalloc_caches[KMALLOC_NORMAL][INDEX_NODE] = create_kmalloc_cache( - kmalloc_info[INDEX_NODE].name, + kmalloc_info[INDEX_NODE].name[KMALLOC_NORMAL], kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS, 0, kmalloc_size(INDEX_NODE)); slab_state = PARTIAL_NODE; diff --git a/mm/slab.h b/mm/slab.h index 9057b8056b07..2fc8f956906a 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -76,7 +76,7 @@ extern struct kmem_cache *kmem_cache; /* A table of kmalloc cache names and sizes */ extern const struct kmalloc_info_struct { - const char *name; + const char *name[NR_KMALLOC_TYPES]; unsigned int size; } kmalloc_info[]; diff --git a/mm/slab_common.c b/mm/slab_common.c index 807490fe217a..7bd88cc09987 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1092,26 +1092,56 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags) return kmalloc_caches[kmalloc_type(flags)][index]; } +#ifdef CONFIG_ZONE_DMA +#define SET_KMALLOC_SIZE(__size, __short_size) \ +{ \ + .name[KMALLOC_NORMAL] = "kmalloc-" #__short_size, \ + .name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size, \ + .name[KMALLOC_DMA] = "dma-kmalloc-" #__short_size, \ + .size = __size, \ +} +#else +#define SET_KMALLOC_SIZE(__size, __short_size) \ +{ \ + .name[KMALLOC_NORMAL] = "kmalloc-" #__short_size, \ + 
.name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size, \ + .size = __size, \ +} +#endif + /* * kmalloc_info[] is to make slub_debug=,kmalloc-xx option work at boot time. * kmalloc_index() supports up to 2^26=64MB, so the final entry of the table is * kmalloc-67108864. */ const struct kmalloc_info_struct kmalloc_info[] __initconst = { - {NULL, 0}, {"kmalloc-96", 96}, - {"kmalloc-192", 192}, {"kmalloc-8", 8}, - {"kmalloc-16", 16}, {"kmalloc-32", 32}, - {"kmalloc-64", 64}, {"kmalloc-128", 128}, - {"kmalloc-256", 256}, {"kmalloc-512", 512}, - {"kmalloc-1k", 1024}, {"kmalloc-2k", 2048}, - {"kmalloc-4k", 4096}, {"kmalloc-8k", 8192}, - {"kmalloc-16k", 16384}, {"kmalloc-32k", 32768}, - {"kmalloc-64k", 65536}, {"kmalloc-128k", 131072}, - {"kmalloc-256k", 262144}, {"kmalloc-512k", 524288}, - {"kmalloc-1M",1048576}, {"kmalloc-2M",2097152}, - {"kmalloc-4M",4194304}, {"kmalloc-8M",8388608}, - {"kmalloc-16M", 16777216}, {"kmalloc-32M", 33554432}, - {"kmalloc-64M", 67108864} + SET_KMALLOC_SIZE(0, 0), + SET_KMALLOC_SIZE(96, 96), + SET_KMALLOC_SIZE(192, 192), + SET_KMALLOC_SIZE(8, 8), + SET_KMALLOC_SIZE(16, 16), + SET_KMALLOC_SIZE(32, 32), + SET_KMALLOC_SIZE(64, 64), + SET_KMALLOC_SIZE(128, 128), + SET_KMALLOC_SIZE(256, 256), + SET_KMALLOC_SIZE(512, 512), + SET_KMALLOC_SIZE(1024, 1k), + SET_KMALLOC_SIZE(2048, 2k), + SET_KMALLOC_SIZE(4096, 4k), + SET_KMALLOC_SIZE(8192, 8k), + SET_KMALLOC_SIZE(16384, 16k), + SET_KMALLOC_SIZE(32768, 32k), + SET_KMALLOC_SIZE(65536, 64k), + SET_KMALLOC_SIZE(131072, 128k), + SET_KMALLOC_SIZE(262144, 256k), + SET_KMALLOC_SIZE(524288, 512k), + SET_KMALLOC_SIZE(1048576, 1M), + SET_KMALLOC_SIZE(2097152, 2M), + SET_KMALLOC_SIZE(4194304, 4M), + SET_KMALLOC_SIZE(8388608, 8M), + SET_KMALLOC_SIZE(16777216, 16M), + SET_KMALLOC_SIZE(33554432, 32M), + SET_KMALLOC_SIZE(67108864, 64M) };
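[Editorial illustration] The SET_KMALLOC_SIZE macro above relies on the preprocessor's # (stringizing) operator: the short size token is pasted into each prefix at compile time, so all three cache names exist as string literals and nothing is formatted at boot. A minimal standalone sketch of the technique (SET_SIZE and struct info are illustrative names):

```c
#include <assert.h>
#include <string.h>

struct info { const char *normal, *rcl, *dma; unsigned int size; };

/* The # operator stringizes __short_size, and adjacent string
 * literals concatenate, e.g. "kmalloc-" "1k" -> "kmalloc-1k". */
#define SET_SIZE(__size, __short_size)              \
{                                                   \
    .normal = "kmalloc-" #__short_size,             \
    .rcl    = "kmalloc-rcl-" #__short_size,         \
    .dma    = "dma-kmalloc-" #__short_size,         \
    .size   = __size,                               \
}

static const struct info table[] = {
    SET_SIZE(1024, 1k),
    SET_SIZE(1048576, 1M),
};
```

Note that a token like 1k is a single preprocessing number, so it passes through the macro and stringizes cleanly; this is why the patch can write SET_KMALLOC_SIZE(1024, 1k) without quotes.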
[PATCH 0/5] mm, slab: Make kmalloc_info[] contain all types of names
There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM and KMALLOC_DMA. The name of KMALLOC_NORMAL is contained in kmalloc_info[].name, but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically generated by kmalloc_cache_name(). Patch1 predefines the names of all types of kmalloc to save the time spent dynamically generating names. The other 4 patches did some cleanup work. These changes make sense, and the time spent by new_kmalloc_cache() has been reduced by approximately 36.3%. Time spent by new_kmalloc_cache() 5.3-rc7 66264 5.3-rc7+patch 42188 Pengfei Li (5): mm, slab: Make kmalloc_info[] contain all types of names mm, slab_common: Remove unused kmalloc_cache_name() mm, slab: Remove unused kmalloc_size() mm, slab_common: Make 'type' is enum kmalloc_cache_type mm, slab_common: Make initializing KMALLOC_DMA start from 1 include/linux/slab.h | 20 - mm/slab.c| 7 +-- mm/slab.h| 2 +- mm/slab_common.c | 101 +++ 4 files changed, 59 insertions(+), 71 deletions(-) -- 2.21.0
Re: [PATCH] mm/page_alloc: cleanup __alloc_pages_direct_compact()
On Mon, Aug 19, 2019 at 9:50 PM Vlastimil Babka wrote: > > On 8/17/19 12:51 PM, Pengfei Li wrote: > > This patch cleans up the if(page). > > > > No functional change. > > > > Signed-off-by: Pengfei Li > > I don't see much benefit here. The indentation wasn't that bad that it > had to be reduced using goto. But the patch is not incorrect so I'm not > NACKing. > Thanks for your review and comments. This patch reduces the number of times the if(page) (as the compiler does), and the downside is that there is a goto. If this improves readability, accept it. Otherwise, leave it as it is. Thanks again. --- Pengfei > > --- > > mm/page_alloc.c | 28 > > 1 file changed, 16 insertions(+), 12 deletions(-) > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 272c6de1bf4e..51f056ac09f5 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -3890,6 +3890,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned > > int order, > > enum compact_priority prio, enum compact_result > > *compact_result) > > { > > struct page *page = NULL; > > + struct zone *zone; > > unsigned long pflags; > > unsigned int noreclaim_flag; > > > > @@ -3911,23 +3912,26 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, > > unsigned int order, > >*/ > > count_vm_event(COMPACTSTALL); > > > > - /* Prep a captured page if available */ > > - if (page) > > + if (page) { > > + /* Prep a captured page if available */ > > prep_new_page(page, order, gfp_mask, alloc_flags); > > - > > - /* Try get a page from the freelist if available */ > > - if (!page) > > + } else { > > + /* Try get a page from the freelist if available */ > > page = get_page_from_freelist(gfp_mask, order, alloc_flags, > > ac); > > > > - if (page) { > > - struct zone *zone = page_zone(page); > > - > > - zone->compact_blockskip_flush = false; > > - compaction_defer_reset(zone, order, true); > > - count_vm_event(COMPACTSUCCESS); > > - return page; > > + if (!page) > > + goto failed; > > } > > > > + zone = page_zone(page); > > + 
zone->compact_blockskip_flush = false; > > + compaction_defer_reset(zone, order, true); > > + > > + count_vm_event(COMPACTSUCCESS); > > + > > + return page; > > + > > +failed: > > /* > >* It's bad if compaction run occurs and fails. The most likely reason > >* is that pages exist, but not enough to satisfy watermarks. > > >
[PATCH] mm/page_alloc: cleanup __alloc_pages_direct_compact()
This patch cleans up the if(page). No functional change. Signed-off-by: Pengfei Li --- mm/page_alloc.c | 28 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 272c6de1bf4e..51f056ac09f5 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3890,6 +3890,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, enum compact_priority prio, enum compact_result *compact_result) { struct page *page = NULL; + struct zone *zone; unsigned long pflags; unsigned int noreclaim_flag; @@ -3911,23 +3912,26 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, */ count_vm_event(COMPACTSTALL); - /* Prep a captured page if available */ - if (page) + if (page) { + /* Prep a captured page if available */ prep_new_page(page, order, gfp_mask, alloc_flags); - - /* Try get a page from the freelist if available */ - if (!page) + } else { + /* Try get a page from the freelist if available */ page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac); - if (page) { - struct zone *zone = page_zone(page); - - zone->compact_blockskip_flush = false; - compaction_defer_reset(zone, order, true); - count_vm_event(COMPACTSUCCESS); - return page; + if (!page) + goto failed; } + zone = page_zone(page); + zone->compact_blockskip_flush = false; + compaction_defer_reset(zone, order, true); + + count_vm_event(COMPACTSUCCESS); + + return page; + +failed: /* * It's bad if compaction run occurs and fails. The most likely reason * is that pages exist, but not enough to satisfy watermarks. -- 2.21.0
[PATCH] mm/compaction: remove unnecessary zone parameter in isolate_migratepages()
Like commit 40cacbcb3240 ("mm, compaction: remove unnecessary zone parameter in some instances"), remove unnecessary zone parameter. No functional change. Signed-off-by: Pengfei Li --- mm/compaction.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 952dc2fb24e5..685c3e3d0a0f 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1737,8 +1737,7 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc) * starting at the block pointed to by the migrate scanner pfn within * compact_control. */ -static isolate_migrate_t isolate_migratepages(struct zone *zone, - struct compact_control *cc) +static isolate_migrate_t isolate_migratepages(struct compact_control *cc) { unsigned long block_start_pfn; unsigned long block_end_pfn; @@ -1756,8 +1755,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, */ low_pfn = fast_find_migrateblock(cc); block_start_pfn = pageblock_start_pfn(low_pfn); - if (block_start_pfn < zone->zone_start_pfn) - block_start_pfn = zone->zone_start_pfn; + if (block_start_pfn < cc->zone->zone_start_pfn) + block_start_pfn = cc->zone->zone_start_pfn; /* * fast_find_migrateblock marks a pageblock skipped so to avoid @@ -1787,8 +1786,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, if (!(low_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages))) cond_resched(); - page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn, - zone); + page = pageblock_pfn_to_page(block_start_pfn, + block_end_pfn, cc->zone); if (!page) continue; @@ -2158,7 +2157,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc) cc->rescan = true; } - switch (isolate_migratepages(cc->zone, cc)) { + switch (isolate_migratepages(cc)) { case ISOLATE_ABORT: ret = COMPACT_CONTENDED; putback_movable_pages(>migratepages); -- 2.21.0
Re: [PATCH 00/10] make "order" unsigned int
On Mon, Jul 29, 2019 at 4:34 PM Mel Gorman wrote: > > On Sun, Jul 28, 2019 at 12:44:36AM +0800, Pengfei Li wrote: > > On Fri, Jul 26, 2019 at 3:26 PM Mel Gorman > > wrote: > > > > > > > Thank you for your comments. > > > > > On Fri, Jul 26, 2019 at 02:42:43AM +0800, Pengfei Li wrote: > > > > Objective > > > > > > > > The motivation for this series of patches is use unsigned int for > > > > "order" in compaction.c, just like in other memory subsystems. > > > > > > > > > > Why? The series is relatively subtle in parts, particularly patch 5. > > > > Before I sent this series of patches, I took a close look at the > > git log for compact.c. > > > > Here is a short history, trouble you to look patiently. > > > > 1) At first, "order" is _unsigned int_ > > > > The commit 56de7263fcf3 ("mm: compaction: direct compact when a > > high-order allocation fails") introduced the "order" in > > compact_control and its type is unsigned int. > > > > Besides, you specify that order == -1 is the flag that triggers > > compaction via proc. > > > > Yes, specifying that compaction in that context is for the entire zone > without any specific allocation context or request. Yes > > > 2) Next, because order equals -1 is special, it causes an error. > > > > The commit 7be62de99adc ("vmscan: kswapd carefully call compaction") > > determines if "order" is less than 0. > > > > This condition is always true because the type of "order" is > > _unsigned int_. > > > > - compact_zone(zone, ); > > + if (cc->order < 0 || !compaction_deferred(zone)) > > > > 3) Finally, in order to fix the above error, the type of the order > > is modified to _int_ > > > > It is done by commit: aad6ec3777bf ("mm: compaction: make > > compact_control order signed"). > > > > The reason I mention this is because I want to express that the > > type of "order" is originally _unsigned int_. And "order" is > > modified to _int_ because of the special value of -1. > > > > And in itself, why does that matter? 
The -1 makes order is int, which breaks the consistency of the type of order. > > > If the special value of "order" is not a negative number (for > > example, -1), but a number greater than MAX_ORDER - 1 (for example, > > MAX_ORDER), then the "order" may still be _unsigned int_ now. > > > > Sure, but then you have to check that every check on order understands > the new special value. > Since this check is done by is_via_compact_memory(), it is easy to modify the special value being checked. I have checked every check many times and now I need some reviews from the community. > > > There have been places where by it was important for order to be able to > > > go negative due to loop exit conditions. > > > > I think that even if "cc->order" is _unsigned int_, it can be done > > with a local temporary variable easily. > > > > Like this, > > > > function(...) > > { > > for(int tmp_order = cc->order; tmp_order >= 0; tmp_order--) { > > ... > > } > > } > > > > Yes, it can be expressed as unsigned but in itself why does that justify > the review of a large series? At first glance it seems that this series of patches is large. But at least half of it is to modify the corresponding trace function. And you can see that except patch 4 and patch 5, other patches only modify the type of order. Even for patch 5 with function modifications, the modified code has only about 50 lines. > There is limited to no performance gain > and functionally it's equivalent. This is just clean up. And others have done similar work before. commit d00181b96eb8 ("mm: use 'unsigned int' for page order") commit 7aeb09f9104b ("mm: page_alloc: use unsigned int for order in more places") > > > > If there was a gain from this > > > or it was a cleanup in the context of another major body of work, I > > > could understand the justification but that does not appear to be the > > > case here. > > > > > > > My final conclusion: > > > > Why "order" is _int_ instead of unsigned int? 
> > => Because order == -1 is used as the flag. > > => So what about making "order" greater than MAX_ORDER - 1? > > => The "order" can be _unsigned int_ just like in most places.
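[Editorial illustration] The proposal in this thread — keep "order" unsigned and flag whole-zone compaction with a value above MAX_ORDER - 1 instead of -1 — can be sketched as follows. The MAX_ORDER value and the function body are assumptions for illustration; the kernel's is_via_compact_memory() at the time tested order == -1:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER 11   /* typical kernel value; illustrative here */

/* Proposed convention: with order kept unsigned, a "compact the whole
 * zone" request is any value above MAX_ORDER - 1, rather than -1
 * (which forces order to be a signed type everywhere it is passed). */
static bool is_via_compact_memory(unsigned int order)
{
    return order > MAX_ORDER - 1;
}
```

Any real allocation order is at most MAX_ORDER - 1, so the sentinel never collides with a valid request, which is the property the -1 flag was providing.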
Re: [PATCH 00/10] make "order" unsigned int
On Fri, Jul 26, 2019 at 3:12 PM Michal Hocko wrote: > Thank you for your comments. > On Fri 26-07-19 07:48:36, Pengfei Li wrote: > [...] > > For the benefit, "order" may be negative, which is confusing and weird. > > order = -1 has a special meaning. > Yes. But I mean -1 can be replaced by any number greater than MAX_ORDER - 1 and there is no reason to be negative. > > There is no good reason not to do this since it can be avoided. > > "This is good because we can do it" doesn't really sound like a > convincing argument to me. I would understand if this reduced a > generated code, made an overall code readability much better or > something along those lines. Also we only use MAX_ORDER range of values > so I could argue that a smaller data type (e.g. short) should be > sufficient for this data type. > I resend an email to interpret the meaning of my commit, and I would be very grateful if you post some comments on this. > Please note that _any_ change, alebit seemingly small, can introduce a > subtle bug. Also each patch requires a man power to review so you have > to understand that "just because we can" is not a strong motivation for > people to spend their time on such a patch. Sincerely thank you, I will keep these in mind. > -- > Michal Hocko > SUSE Labs -- Pengfei
Re: [PATCH 00/10] make "order" unsigned int
On Fri, Jul 26, 2019 at 3:26 PM Mel Gorman wrote: > Thank you for your comments. > On Fri, Jul 26, 2019 at 02:42:43AM +0800, Pengfei Li wrote: > > Objective > > > > The motivation for this series of patches is use unsigned int for > > "order" in compaction.c, just like in other memory subsystems. > > > > Why? The series is relatively subtle in parts, particularly patch 5. Before I sent this series of patches, I took a close look at the git log for compact.c. Here is a short history, trouble you to look patiently. 1) At first, "order" is _unsigned int_ The commit 56de7263fcf3 ("mm: compaction: direct compact when a high-order allocation fails") introduced the "order" in compact_control and its type is unsigned int. Besides, you specify that order == -1 is the flag that triggers compaction via proc. 2) Next, because order equals -1 is special, it causes an error. The commit 7be62de99adc ("vmscan: kswapd carefully call compaction") determines if "order" is less than 0. This condition is always true because the type of "order" is _unsigned int_. - compact_zone(zone, ); + if (cc->order < 0 || !compaction_deferred(zone)) 3) Finally, in order to fix the above error, the type of the order is modified to _int_ It is done by commit: aad6ec3777bf ("mm: compaction: make compact_control order signed"). The reason I mention this is because I want to express that the type of "order" is originally _unsigned int_. And "order" is modified to _int_ because of the special value of -1. If the special value of "order" is not a negative number (for example, -1), but a number greater than MAX_ORDER - 1 (for example, MAX_ORDER), then the "order" may still be _unsigned int_ now. > There have been places where by it was important for order to be able to > go negative due to loop exit conditions. I think that even if "cc->order" is _unsigned int_, it can be done with a local temporary variable easily. Like this, function(...) { for(int tmp_order = cc->order; tmp_order >= 0; tmp_order--) { ... 
} } > If there was a gain from this > or it was a cleanup in the context of another major body of work, I > could understand the justification but that does not appear to be the > case here. > My final conclusion: Why "order" is _int_ instead of unsigned int? => Because order == -1 is used as the flag. => So what about making "order" greater than MAX_ORDER - 1? => The "order" can be _unsigned int_ just like in most places. (Can we only pick -1 as this special value?) This series of patches makes sense because, 1) It guarantees that "order" remains the same type. No one likes to see this __alloc_pages_slowpath(unsigned int order, ...) => should_compact_retry(int order, ...)/* The type changed */ => compaction_zonelist_suitable(int order, ...) => __compaction_suitable(int order, ...) => zone_watermark_ok(unsigned int order, ...) /* The type changed again! */ 2) It eliminates the evil "order == -1". If "order" is specified as any positive number greater than MAX_ORDER - 1 in commit 56de7263fcf3, perhaps no int order will appear in compact.c until now. > -- > Mel Gorman Thank you again for your comments, and sincerely thank you for your patience in reading such a long email. > SUSE Labs
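[Editorial illustration] The bug history recounted above (commit 7be62de99adc's "cc->order < 0" test being always false for an unsigned order) and the proposed signed-local workaround both come down to one fact: an unsigned counter wraps instead of going negative. A sketch of the pattern the email proposes, with count_down_steps as an illustrative stand-in for a real caller:

```c
#include <assert.h>

/* Descending loops need "order >= 0" as an exit condition, which only
 * works for a signed type: an unsigned counter decremented past zero
 * wraps to UINT_MAX and the loop never terminates.  The email's fix
 * is to copy the unsigned cc->order into a signed local. */
static int count_down_steps(unsigned int start_order)
{
    int steps = 0;
    int order;

    for (order = (int)start_order; order >= 0; order--)
        steps++;        /* visits start_order .. 0 inclusive */
    return steps;
}
```

With the loop counter signed and only the stored field unsigned, the storage type question is decoupled from the loop-exit question, which is the crux of the disagreement in this thread.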
Re: [PATCH 02/10] mm/page_alloc: use unsigned int for "order" in __rmqueue_fallback()
On Fri, Jul 26, 2019 at 5:36 PM Rasmus Villemoes wrote: > > On 25/07/2019 20.42, Pengfei Li wrote: > > Because "order" will never be negative in __rmqueue_fallback(), > > so just make "order" unsigned int. > > And modify trace_mm_page_alloc_extfrag() accordingly. > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 75c18f4fd66a..1432cbcd87cd 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -2631,8 +2631,8 @@ static bool unreserve_highatomic_pageblock(const > > struct alloc_context *ac, > > * condition simpler. > > */ > > static __always_inline bool > > -__rmqueue_fallback(struct zone *zone, int order, int start_migratetype, > > - unsigned int alloc_flags) > > +__rmqueue_fallback(struct zone *zone, unsigned int order, > > + int start_migratetype, unsigned int alloc_flags) > > { > > Please read the last paragraph of the comment above this function, run > git blame to figure out when that was introduced, and then read the full > commit description. Thanks for your comments. I have read the commit info of commit b002529d2563 ("mm/page_alloc.c: eliminate unsigned confusion in __rmqueue_fallback"). And I looked at the discussion at https://lkml.org/lkml/2017/6/21/684 in detail. > Here be dragons. At the very least, this patch is > wrong in that it makes that comment inaccurate. I wonder if you noticed the commit 6bb154504f8b ("mm, page_alloc: spread allocations across zones before introducing fragmentation"). Commit 6bb154504f8b introduces a local variable min_order in __rmqueue_fallback(). And you can see for (current_order = MAX_ORDER - 1; current_order >= min_order; --current_order) { The “current_order” and "min_order" are int, so here is ok. Since __rmqueue_fallback() is only called by __rmqueue() and "order" is unsigned int in __rmqueue(), then I think that making "order" is also unsigned int is good. Maybe I should also modify the comments here? > > Rasmus Thank you again for your review. -- Pengfei
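[Editorial illustration] The loop from commit 6bb154504f8b discussed above walks orders from MAX_ORDER - 1 down to min_order, and the exchange hinges on why current_order stays signed even if the order parameter becomes unsigned. A self-contained sketch (scan_orders is an illustrative name; MAX_ORDER assumed at its typical value):

```c
#include <assert.h>

#define MAX_ORDER 11   /* typical kernel value; illustrative here */

/* Mirror of the __rmqueue_fallback() scan: walk from the largest
 * order down to min_order.  With min_order == 0, the condition
 * "current_order >= min_order" terminates only because current_order
 * is signed and can reach -1; an unsigned counter would wrap. */
static int scan_orders(int min_order)
{
    int visited = 0;
    int current_order;

    for (current_order = MAX_ORDER - 1;
         current_order >= min_order; --current_order)
        visited++;
    return visited;
}
```

This is why changing only the function parameter's type, as the patch does, is safe while the internal counters are left signed, and it is also the point the comment referenced by Rasmus warns about.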
Re: [PATCH 00/10] make "order" unsigned int
On Fri, Jul 26, 2019 at 2:52 AM Qian Cai wrote: > > On Fri, 2019-07-26 at 02:42 +0800, Pengfei Li wrote: > > Objective > > > > The motivation for this series of patches is use unsigned int for > > "order" in compaction.c, just like in other memory subsystems. > > I suppose you will need more justification for this change. Right now, I don't Thanks for your comments. > see much real benefit apart from possibly introducing more regressions in > those As you can see, except for patch [05/10], other commits only modify the type of "order". So the change is not big. For the benefit, "order" may be negative, which is confusing and weird. There is no good reason not to do this since it can be avoided. > tricky areas of the code. Also, your testing seems quite lightweight. > Yes, you are right. I use "stress" for stress testing, and made some small code coverage testing. As you said, I need more ideas and comments about testing. Any suggestions for testing? Thanks again. -- Pengfei > > > > In addition, did some cleanup about "order" in page_alloc > > and vmscan. > > > > > > Description > > > > Directly modifying the type of "order" to unsigned int is ok in most > > places, because "order" is always non-negative. > > > > But there are two places that are special, one is next_search_order() > > and the other is compact_node(). > > > > For next_search_order(), order may be negative. It can be avoided by > > some modifications. > > > > For compact_node(), order = -1 means performing manual compaction. > > It can be avoided by specifying order = MAX_ORDER. > > > > Key changes in [PATCH 05/10] mm/compaction: make "order" and > > "search_order" unsigned. > > > > More information can be obtained from commit messages. > > > > > > Test > > > > I have done some stress testing locally and have not found any problems. > > > > In addition, local tests indicate no performance impact. 
> > > > > > Pengfei Li (10): > > mm/page_alloc: use unsigned int for "order" in should_compact_retry() > > mm/page_alloc: use unsigned int for "order" in __rmqueue_fallback() > > mm/page_alloc: use unsigned int for "order" in should_compact_retry() > > mm/page_alloc: remove never used "order" in alloc_contig_range() > > mm/compaction: make "order" and "search_order" unsigned int in struct > > compact_control > > mm/compaction: make "order" unsigned int in compaction.c > > trace/events/compaction: make "order" unsigned int > > mm/compaction: use unsigned int for "compact_order_failed" in struct > > zone > > mm/compaction: use unsigned int for "kcompactd_max_order" in struct > > pglist_data > > mm/vmscan: use unsigned int for "kswapd_order" in struct pglist_data > > > > include/linux/compaction.h| 30 +++ > > include/linux/mmzone.h| 8 +- > > include/trace/events/compaction.h | 40 +- > > include/trace/events/kmem.h | 6 +- > > include/trace/events/oom.h| 6 +- > > include/trace/events/vmscan.h | 4 +- > > mm/compaction.c | 127 +++--- > > mm/internal.h | 6 +- > > mm/page_alloc.c | 16 ++-- > > mm/vmscan.c | 6 +- > > 10 files changed, 126 insertions(+), 123 deletions(-) > >
Re: [PATCH 01/10] mm/page_alloc: use unsigned int for "order" in should_compact_retry()
On Fri, Jul 26, 2019 at 2:58 AM Matthew Wilcox wrote: > > On Fri, Jul 26, 2019 at 02:42:44AM +0800, Pengfei Li wrote: > > static inline bool > > -should_compact_retry(struct alloc_context *ac, int order, int alloc_flags, > > - enum compact_result compact_result, > > - enum compact_priority *compact_priority, > > - int *compaction_retries) > > +should_compact_retry(struct alloc_context *ac, unsigned int order, > > + int alloc_flags, enum compact_result compact_result, > > + enum compact_priority *compact_priority, int *compaction_retries) > > { > > int max_retries = MAX_COMPACT_RETRIES; > > One tab here is insufficient indentation. It should be at least two. Thanks for your comments. > Some parts of the kernel insist on lining up arguments with the opening > parenthesis of the function; I don't know if mm really obeys this rule, > but you're indenting function arguments to the same level as the opening > variables of the function, which is confusing. I will use two tabs in the next version. -- Pengfei
[PATCH 09/10] mm/compaction: use unsigned int for "kcompactd_max_order" in struct pglist_data
Because "kcompactd_max_order" will never be negative, so just make it unsigned int. Signed-off-by: Pengfei Li --- include/linux/compaction.h | 6 -- include/linux/mmzone.h | 2 +- mm/compaction.c| 2 +- 3 files changed, 6 insertions(+), 4 deletions(-) diff --git a/include/linux/compaction.h b/include/linux/compaction.h index a8049d582265..1b296de6efef 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -175,7 +175,8 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, extern int kcompactd_run(int nid); extern void kcompactd_stop(int nid); -extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx); +extern void wakeup_kcompactd(pg_data_t *pgdat, unsigned int order, + int classzone_idx); #else static inline void reset_isolation_suitable(pg_data_t *pgdat) @@ -220,7 +221,8 @@ static inline void kcompactd_stop(int nid) { } -static inline void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx) +static inline void wakeup_kcompactd(pg_data_t *pgdat, unsigned int order, + int classzone_idx) { } diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 0947e7cb4214..60bebdf47661 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -723,7 +723,7 @@ typedef struct pglist_data { int kswapd_failures;/* Number of 'reclaimed == 0' runs */ #ifdef CONFIG_COMPACTION - int kcompactd_max_order; + unsigned int kcompactd_max_order; enum zone_type kcompactd_classzone_idx; wait_queue_head_t kcompactd_wait; struct task_struct *kcompactd; diff --git a/mm/compaction.c b/mm/compaction.c index aad638ad2cc6..909ead244cff 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -2607,7 +2607,7 @@ static void kcompactd_do_work(pg_data_t *pgdat) pgdat->kcompactd_classzone_idx = pgdat->nr_zones - 1; } -void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx) +void wakeup_kcompactd(pg_data_t *pgdat, unsigned int order, int classzone_idx) { if (!order) return; -- 2.21.0
[PATCH 10/10] mm/vmscan: use unsigned int for "kswapd_order" in struct pglist_data
Because "kswapd_order" will never be negative, so just make it unsigned int. And modify wakeup_kswapd(), kswapd_try_to_sleep() and trace_mm_vmscan_kswapd_wake() accordingly. Besides, make "order" unsigned int in two related trace functions. Signed-off-by: Pengfei Li --- include/linux/mmzone.h| 4 ++-- include/trace/events/compaction.h | 10 +- include/trace/events/vmscan.h | 4 ++-- mm/vmscan.c | 6 +++--- 4 files changed, 12 insertions(+), 12 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 60bebdf47661..1196ed0cee67 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -717,7 +717,7 @@ typedef struct pglist_data { wait_queue_head_t pfmemalloc_wait; struct task_struct *kswapd; /* Protected by mem_hotplug_begin/end() */ - int kswapd_order; + unsigned int kswapd_order; enum zone_type kswapd_classzone_idx; int kswapd_failures;/* Number of 'reclaimed == 0' runs */ @@ -802,7 +802,7 @@ static inline bool pgdat_is_empty(pg_data_t *pgdat) #include void build_all_zonelists(pg_data_t *pgdat); -void wakeup_kswapd(struct zone *zone, gfp_t gfp_mask, int order, +void wakeup_kswapd(struct zone *zone, gfp_t gfp_mask, unsigned int order, enum zone_type classzone_idx); bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark, int classzone_idx, unsigned int alloc_flags, diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h index f83ba40f9614..34a9fac3b4d6 100644 --- a/include/trace/events/compaction.h +++ b/include/trace/events/compaction.h @@ -314,13 +314,13 @@ TRACE_EVENT(mm_compaction_kcompactd_sleep, DECLARE_EVENT_CLASS(kcompactd_wake_template, - TP_PROTO(int nid, int order, enum zone_type classzone_idx), + TP_PROTO(int nid, unsigned int order, enum zone_type classzone_idx), TP_ARGS(nid, order, classzone_idx), TP_STRUCT__entry( __field(int, nid) - __field(int, order) + __field(unsigned int, order) __field(enum zone_type, classzone_idx) ), @@ -330,7 +330,7 @@ 
DECLARE_EVENT_CLASS(kcompactd_wake_template, __entry->classzone_idx = classzone_idx; ), - TP_printk("nid=%d order=%d classzone_idx=%-8s", + TP_printk("nid=%d order=%u classzone_idx=%-8s", __entry->nid, __entry->order, __print_symbolic(__entry->classzone_idx, ZONE_TYPE)) @@ -338,14 +338,14 @@ DECLARE_EVENT_CLASS(kcompactd_wake_template, DEFINE_EVENT(kcompactd_wake_template, mm_compaction_wakeup_kcompactd, - TP_PROTO(int nid, int order, enum zone_type classzone_idx), + TP_PROTO(int nid, unsigned int order, enum zone_type classzone_idx), TP_ARGS(nid, order, classzone_idx) ); DEFINE_EVENT(kcompactd_wake_template, mm_compaction_kcompactd_wake, - TP_PROTO(int nid, int order, enum zone_type classzone_idx), + TP_PROTO(int nid, unsigned int order, enum zone_type classzone_idx), TP_ARGS(nid, order, classzone_idx) ); diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h index c37e2280e6dd..13c214f3750b 100644 --- a/include/trace/events/vmscan.h +++ b/include/trace/events/vmscan.h @@ -74,7 +74,7 @@ TRACE_EVENT(mm_vmscan_kswapd_wake, TRACE_EVENT(mm_vmscan_wakeup_kswapd, - TP_PROTO(int nid, int zid, int order, gfp_t gfp_flags), + TP_PROTO(int nid, int zid, unsigned int order, gfp_t gfp_flags), TP_ARGS(nid, zid, order, gfp_flags), @@ -92,7 +92,7 @@ TRACE_EVENT(mm_vmscan_wakeup_kswapd, __entry->gfp_flags = gfp_flags; ), - TP_printk("nid=%d order=%d gfp_flags=%s", + TP_printk("nid=%d order=%u gfp_flags=%s", __entry->nid, __entry->order, show_gfp_flags(__entry->gfp_flags)) diff --git a/mm/vmscan.c b/mm/vmscan.c index f4fd02ae233e..9d98a2e5f736 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3781,8 +3781,8 @@ static enum zone_type kswapd_classzone_idx(pg_data_t *pgdat, return pgdat->kswapd_classzone_idx; } -static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order, - unsigned int classzone_idx) +static void kswapd_try_to_sleep(pg_data_t *pgdat, unsigned int alloc_order, + unsigned int reclaim_order, unsigned int classzone_idx) { long 
remaining = 0; DEFINE_WAIT(wait); @@ -3956,7 +3956,7 @@ static int kswapd(void *p) * has failed or is not needed, still wake up kcompactd if only compaction is * needed. */ -void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order, +void wakeup_kswapd(st
[PATCH 08/10] mm/compaction: use unsigned int for "compact_order_failed" in struct zone
Because "compact_order_failed" will never be negative, so just make it unsigned int. And modify three related trace functions accordingly. Signed-off-by: Pengfei Li --- include/linux/compaction.h| 12 ++-- include/linux/mmzone.h| 2 +- include/trace/events/compaction.h | 14 +++--- mm/compaction.c | 8 4 files changed, 18 insertions(+), 18 deletions(-) diff --git a/include/linux/compaction.h b/include/linux/compaction.h index 0201dfa57d44..a8049d582265 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -99,11 +99,11 @@ extern void reset_isolation_suitable(pg_data_t *pgdat); extern enum compact_result compaction_suitable(struct zone *zone, unsigned int order, unsigned int alloc_flags, int classzone_idx); -extern void defer_compaction(struct zone *zone, int order); -extern bool compaction_deferred(struct zone *zone, int order); -extern void compaction_defer_reset(struct zone *zone, int order, +extern void defer_compaction(struct zone *zone, unsigned int order); +extern bool compaction_deferred(struct zone *zone, unsigned int order); +extern void compaction_defer_reset(struct zone *zone, unsigned int order, bool alloc_success); -extern bool compaction_restarting(struct zone *zone, int order); +extern bool compaction_restarting(struct zone *zone, unsigned int order); /* Compaction has made some progress and retrying makes sense */ static inline bool compaction_made_progress(enum compact_result result) @@ -188,11 +188,11 @@ static inline enum compact_result compaction_suitable(struct zone *zone, return COMPACT_SKIPPED; } -static inline void defer_compaction(struct zone *zone, int order) +static inline void defer_compaction(struct zone *zone, unsigned int order) { } -static inline bool compaction_deferred(struct zone *zone, int order) +static inline bool compaction_deferred(struct zone *zone, unsigned int order) { return true; } diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index d77d717c620c..0947e7cb4214 100644 --- 
a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -545,7 +545,7 @@ struct zone { */ unsigned intcompact_considered; unsigned intcompact_defer_shift; - int compact_order_failed; + unsigned intcompact_order_failed; #endif #if defined CONFIG_COMPACTION || defined CONFIG_CMA diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h index 1e1e74f6d128..f83ba40f9614 100644 --- a/include/trace/events/compaction.h +++ b/include/trace/events/compaction.h @@ -243,17 +243,17 @@ DEFINE_EVENT(mm_compaction_suitable_template, mm_compaction_suitable, DECLARE_EVENT_CLASS(mm_compaction_defer_template, - TP_PROTO(struct zone *zone, int order), + TP_PROTO(struct zone *zone, unsigned int order), TP_ARGS(zone, order), TP_STRUCT__entry( __field(int, nid) __field(enum zone_type, idx) - __field(int, order) + __field(unsigned int, order) __field(unsigned int, considered) __field(unsigned int, defer_shift) - __field(int, order_failed) + __field(unsigned int, order_failed) ), TP_fast_assign( @@ -265,7 +265,7 @@ DECLARE_EVENT_CLASS(mm_compaction_defer_template, __entry->order_failed = zone->compact_order_failed; ), - TP_printk("node=%d zone=%-8s order=%d order_failed=%d consider=%u limit=%lu", + TP_printk("node=%d zone=%-8s order=%u order_failed=%u consider=%u limit=%lu", __entry->nid, __print_symbolic(__entry->idx, ZONE_TYPE), __entry->order, @@ -276,21 +276,21 @@ DECLARE_EVENT_CLASS(mm_compaction_defer_template, DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_deferred, - TP_PROTO(struct zone *zone, int order), + TP_PROTO(struct zone *zone, unsigned int order), TP_ARGS(zone, order) ); DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_defer_compaction, - TP_PROTO(struct zone *zone, int order), + TP_PROTO(struct zone *zone, unsigned int order), TP_ARGS(zone, order) ); DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_defer_reset, - TP_PROTO(struct zone *zone, int order), + TP_PROTO(struct zone *zone, unsigned int order), TP_ARGS(zone, 
order) ); diff --git a/mm/compaction.c b/mm/compaction.c index ac5df82d46e0..aad638ad2cc6 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -139,7 +139,7 @@ EXPORT_SYMBOL(__ClearPageMovable); * allocation success. 1 << compact_defer_limit compactions are skipped up * to a limit of 1 << COMPACT_MAX_DEFER_SHIFT */ -void defer_compaction(struct zone *zone, int
[PATCH 06/10] mm/compaction: make "order" unsigned int in compaction.c
Since compact_control->order and compact_control->search_order have been modified to unsigned int in the previous commit, then some of the functions in compaction.c are modified accordingly. Signed-off-by: Pengfei Li --- include/linux/compaction.h | 12 ++-- mm/compaction.c| 21 ++--- 2 files changed, 16 insertions(+), 17 deletions(-) diff --git a/include/linux/compaction.h b/include/linux/compaction.h index 9569e7c786d3..0201dfa57d44 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -96,8 +96,8 @@ extern enum compact_result try_to_compact_pages(gfp_t gfp_mask, const struct alloc_context *ac, enum compact_priority prio, struct page **page); extern void reset_isolation_suitable(pg_data_t *pgdat); -extern enum compact_result compaction_suitable(struct zone *zone, int order, - unsigned int alloc_flags, int classzone_idx); +extern enum compact_result compaction_suitable(struct zone *zone, + unsigned int order, unsigned int alloc_flags, int classzone_idx); extern void defer_compaction(struct zone *zone, int order); extern bool compaction_deferred(struct zone *zone, int order); @@ -170,8 +170,8 @@ static inline bool compaction_withdrawn(enum compact_result result) } -bool compaction_zonelist_suitable(struct alloc_context *ac, int order, - int alloc_flags); +bool compaction_zonelist_suitable(struct alloc_context *ac, + unsigned int order, int alloc_flags); extern int kcompactd_run(int nid); extern void kcompactd_stop(int nid); @@ -182,8 +182,8 @@ static inline void reset_isolation_suitable(pg_data_t *pgdat) { } -static inline enum compact_result compaction_suitable(struct zone *zone, int order, - int alloc_flags, int classzone_idx) +static inline enum compact_result compaction_suitable(struct zone *zone, + unsigned int order, int alloc_flags, int classzone_idx) { return COMPACT_SKIPPED; } diff --git a/mm/compaction.c b/mm/compaction.c index e47d8fa943a6..ac5df82d46e0 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1639,7 +1639,7 @@ static 
unsigned long fast_find_migrateblock(struct compact_control *cc) unsigned long distance; unsigned long pfn = cc->migrate_pfn; unsigned long high_pfn; - int order; + unsigned int order; /* Skip hints are relied on to avoid repeats on the fast search */ if (cc->ignore_skip_hint) @@ -1958,10 +1958,9 @@ static enum compact_result compact_finished(struct compact_control *cc) * COMPACT_SUCCESS - If the allocation would succeed without compaction * COMPACT_CONTINUE - If compaction should run now */ -static enum compact_result __compaction_suitable(struct zone *zone, int order, - unsigned int alloc_flags, - int classzone_idx, - unsigned long wmark_target) +static enum compact_result __compaction_suitable(struct zone *zone, + unsigned int order, unsigned int alloc_flags, + int classzone_idx, unsigned long wmark_target) { unsigned long watermark; @@ -1998,7 +1997,7 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order, return COMPACT_CONTINUE; } -enum compact_result compaction_suitable(struct zone *zone, int order, +enum compact_result compaction_suitable(struct zone *zone, unsigned int order, unsigned int alloc_flags, int classzone_idx) { @@ -2036,7 +2035,7 @@ enum compact_result compaction_suitable(struct zone *zone, int order, return ret; } -bool compaction_zonelist_suitable(struct alloc_context *ac, int order, +bool compaction_zonelist_suitable(struct alloc_context *ac, unsigned int order, int alloc_flags) { struct zone *zone; @@ -2278,10 +2277,10 @@ compact_zone(struct compact_control *cc, struct capture_control *capc) return ret; } -static enum compact_result compact_zone_order(struct zone *zone, int order, - gfp_t gfp_mask, enum compact_priority prio, - unsigned int alloc_flags, int classzone_idx, - struct page **capture) +static enum compact_result compact_zone_order(struct zone *zone, + unsigned int order, gfp_t gfp_mask, + enum compact_priority prio, unsigned int alloc_flags, + int classzone_idx, struct page **capture) { enum 
compact_result ret; struct compact_control cc = { -- 2.21.0
[PATCH 07/10] trace/events/compaction: make "order" unsigned int
Make the same type as "compact_control->order". Signed-off-by: Pengfei Li --- include/trace/events/compaction.h | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h index e5bf6ee4e814..1e1e74f6d128 100644 --- a/include/trace/events/compaction.h +++ b/include/trace/events/compaction.h @@ -170,14 +170,14 @@ TRACE_EVENT(mm_compaction_end, TRACE_EVENT(mm_compaction_try_to_compact_pages, TP_PROTO( - int order, + unsigned int order, gfp_t gfp_mask, int prio), TP_ARGS(order, gfp_mask, prio), TP_STRUCT__entry( - __field(int, order) + __field(unsigned int, order) __field(gfp_t, gfp_mask) __field(int, prio) ), @@ -188,7 +188,7 @@ TRACE_EVENT(mm_compaction_try_to_compact_pages, __entry->prio = prio; ), - TP_printk("order=%d gfp_mask=%s priority=%d", + TP_printk("order=%u gfp_mask=%s priority=%d", __entry->order, show_gfp_flags(__entry->gfp_mask), __entry->prio) @@ -197,7 +197,7 @@ TRACE_EVENT(mm_compaction_try_to_compact_pages, DECLARE_EVENT_CLASS(mm_compaction_suitable_template, TP_PROTO(struct zone *zone, - int order, + unsigned int order, int ret), TP_ARGS(zone, order, ret), @@ -205,7 +205,7 @@ DECLARE_EVENT_CLASS(mm_compaction_suitable_template, TP_STRUCT__entry( __field(int, nid) __field(enum zone_type, idx) - __field(int, order) + __field(unsigned int, order) __field(int, ret) ), @@ -216,7 +216,7 @@ DECLARE_EVENT_CLASS(mm_compaction_suitable_template, __entry->ret = ret; ), - TP_printk("node=%d zone=%-8s order=%d ret=%s", + TP_printk("node=%d zone=%-8s order=%u ret=%s", __entry->nid, __print_symbolic(__entry->idx, ZONE_TYPE), __entry->order, @@ -226,7 +226,7 @@ DECLARE_EVENT_CLASS(mm_compaction_suitable_template, DEFINE_EVENT(mm_compaction_suitable_template, mm_compaction_finished, TP_PROTO(struct zone *zone, - int order, + unsigned int order, int ret), TP_ARGS(zone, order, ret) @@ -235,7 +235,7 @@ DEFINE_EVENT(mm_compaction_suitable_template, mm_compaction_finished, 
DEFINE_EVENT(mm_compaction_suitable_template, mm_compaction_suitable, TP_PROTO(struct zone *zone, - int order, + unsigned int order, int ret), TP_ARGS(zone, order, ret) -- 2.21.0
[PATCH 03/10] mm/page_alloc: use unsigned int for "order" in should_compact_retry()
Because "order" will never be negative in should_compact_retry(), so just make "order" unsigned int. Signed-off-by: Pengfei Li --- mm/page_alloc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1432cbcd87cd..7d47af09461f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -839,7 +839,7 @@ static inline struct capture_control *task_capc(struct zone *zone) static inline bool compaction_capture(struct capture_control *capc, struct page *page, - int order, int migratetype) + unsigned int order, int migratetype) { if (!capc || order != capc->cc->order) return false; @@ -870,7 +870,7 @@ static inline struct capture_control *task_capc(struct zone *zone) static inline bool compaction_capture(struct capture_control *capc, struct page *page, - int order, int migratetype) + unsigned int order, int migratetype) { return false; } -- 2.21.0
[PATCH 05/10] mm/compaction: make "order" and "search_order" unsigned int in struct compact_control
Objective

"order" and "search_order" are int in struct compact_control. This
commit aims to make "order" unsigned int, like in other mm subsystems.

Changes

1) Change "order" and "search_order" to unsigned int.

2) Make is_via_compact_memory() return true when "order" is equal to
   MAX_ORDER instead of -1, and rename it to is_manual_compaction()
   for a clearer meaning.

3) Modify next_search_order() to fit an unsigned order.

4) Restore fast_search_fail to unsigned int. This is ok because
   search_order is already unsigned int, and after reverting
   fast_search_fail to unsigned int, compact_control still fits within
   two cache lines.

Signed-off-by: Pengfei Li
---
 mm/compaction.c | 96 +
 mm/internal.h   |  6 ++--
 2 files changed, 53 insertions(+), 49 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 952dc2fb24e5..e47d8fa943a6 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -75,7 +75,7 @@ static void split_map_pages(struct list_head *list)
 	list_for_each_entry_safe(page, next, list, lru) {
 		list_del(&page->lru);
-		order = page_private(page);
+		order = page_order(page);
 		nr_pages = 1 << order;
 
 		post_alloc_hook(page, order, __GFP_MOVABLE);
@@ -879,7 +879,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		 * potential isolation targets.
 		 */
 		if (PageBuddy(page)) {
-			unsigned long freepage_order = page_order_unsafe(page);
+			unsigned int freepage_order = page_order_unsafe(page);
 
 			/*
 			 * Without lock, we cannot be sure that what we got is
@@ -1119,6 +1119,15 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
 #endif /* CONFIG_COMPACTION || CONFIG_CMA */
 
 #ifdef CONFIG_COMPACTION
+/*
+ * order == MAX_ORDER is expected when compacting via
+ * /proc/sys/vm/compact_memory
+ */
+static inline bool is_manual_compaction(struct compact_control *cc)
+{
+	return cc->order == MAX_ORDER;
+}
+
 static bool suitable_migration_source(struct compact_control *cc,
 							struct page *page)
 {
@@ -1167,7 +1176,7 @@ static bool suitable_migration_target(struct compact_control *cc,
 static inline unsigned int
 freelist_scan_limit(struct compact_control *cc)
 {
-	unsigned short shift = BITS_PER_LONG - 1;
+	unsigned int shift = BITS_PER_LONG - 1;
 
 	return (COMPACT_CLUSTER_MAX >> min(shift, cc->fast_search_fail)) + 1;
 }
@@ -1253,21 +1262,24 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long
 }
 
 /* Search orders in round-robin fashion */
-static int next_search_order(struct compact_control *cc, int order)
+static unsigned int
+next_search_order(struct compact_control *cc, unsigned int order)
 {
-	order--;
-	if (order < 0)
-		order = cc->order - 1;
+	unsigned int next_order = order - 1;
 
-	/* Search wrapped around? */
-	if (order == cc->search_order) {
-		cc->search_order--;
-		if (cc->search_order < 0)
+	if (order == 0)
+		next_order = cc->order - 1;
+
+	if (next_order == cc->search_order) {
+		next_order = UINT_MAX;
+
+		order = cc->search_order;
+		cc->search_order -= 1;
+		if (order == 0)
 			cc->search_order = cc->order - 1;
-		return -1;
 	}
 
-	return order;
+	return next_order;
 }
 
 static unsigned long
@@ -1280,10 +1292,10 @@ fast_isolate_freepages(struct compact_control *cc)
 	unsigned long distance;
 	struct page *page = NULL;
 	bool scan_start = false;
-	int order;
+	unsigned int order;
 
-	/* Full compaction passes in a negative order */
-	if (cc->order <= 0)
+	/* Full compaction when manual compaction */
+	if (is_manual_compaction(cc))
 		return cc->free_pfn;
 
 	/*
@@ -1310,10 +1322,10 @@ fast_isolate_freepages(struct compact_control *cc)
 	 * Search starts from the last successful isolation order or the next
 	 * order to search after a previous failure
 	 */
-	cc->search_order = min_t(unsigned int, cc->order - 1, cc->search_order);
+	cc->search_order = min(cc->order - 1, cc->search_order);
 
 	for (order = cc->search_order;
-	     !page && order >= 0;
+	     !page && order < MAX_ORDER;
 	     order = next_search_order(cc, order)) {
 		struct free_area *area = &cc->zone->free_area[order];
 		struct list_head *freelist;
@@ -1837,15 +1849,6 @@ s
[PATCH 04/10] mm/page_alloc: remove never used "order" in alloc_contig_range()
"order" is never used in alloc_contig_range(), and initializing it to a
negative number is strange. So just remove it.

Signed-off-by: Pengfei Li
---
 mm/page_alloc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7d47af09461f..6208ebfac980 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8347,7 +8347,6 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	struct compact_control cc = {
 		.nr_migratepages = 0,
-		.order = -1,
 		.zone = page_zone(pfn_to_page(start)),
 		.mode = MIGRATE_SYNC,
 		.ignore_skip_hint = true,
-- 
2.21.0
[PATCH 02/10] mm/page_alloc: use unsigned int for "order" in __rmqueue_fallback()
Because "order" will never be negative in __rmqueue_fallback(), so just make "order" unsigned int. And modify trace_mm_page_alloc_extfrag() accordingly. Signed-off-by: Pengfei Li --- include/trace/events/kmem.h | 6 +++--- mm/page_alloc.c | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h index eb57e3037deb..31f4d09aa31f 100644 --- a/include/trace/events/kmem.h +++ b/include/trace/events/kmem.h @@ -277,7 +277,7 @@ TRACE_EVENT(mm_page_pcpu_drain, TRACE_EVENT(mm_page_alloc_extfrag, TP_PROTO(struct page *page, - int alloc_order, int fallback_order, + unsigned int alloc_order, int fallback_order, int alloc_migratetype, int fallback_migratetype), TP_ARGS(page, @@ -286,7 +286,7 @@ TRACE_EVENT(mm_page_alloc_extfrag, TP_STRUCT__entry( __field(unsigned long, pfn ) - __field(int,alloc_order ) + __field(unsigned int, alloc_order ) __field(int,fallback_order ) __field(int,alloc_migratetype ) __field(int,fallback_migratetype) @@ -303,7 +303,7 @@ TRACE_EVENT(mm_page_alloc_extfrag, get_pageblock_migratetype(page)); ), - TP_printk("page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d", + TP_printk("page=%p pfn=%lu alloc_order=%u fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d", pfn_to_page(__entry->pfn), __entry->pfn, __entry->alloc_order, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 75c18f4fd66a..1432cbcd87cd 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2631,8 +2631,8 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, * condition simpler. 
*/ static __always_inline bool -__rmqueue_fallback(struct zone *zone, int order, int start_migratetype, - unsigned int alloc_flags) +__rmqueue_fallback(struct zone *zone, unsigned int order, + int start_migratetype, unsigned int alloc_flags) { struct free_area *area; int current_order; -- 2.21.0
[PATCH 00/10] make "order" unsigned int
Objective

The motivation for this series of patches is to use unsigned int for
"order" in compaction.c, just like in other memory subsystems.

In addition, the series does some cleanup of "order" in page_alloc
and vmscan.

Description

Directly changing the type of "order" to unsigned int is ok in most
places, because "order" is always non-negative.

But two places are special: one is next_search_order() and the other
is compact_node().

In next_search_order(), order may go negative. This can be avoided by
some modifications.

For compact_node(), order = -1 means performing manual compaction.
This can be avoided by specifying order = MAX_ORDER instead.

The key changes are in [PATCH 05/10] mm/compaction: make "order" and
"search_order" unsigned.

More information can be obtained from the commit messages.

Test

I have done some stress testing locally and have not found any
problems. In addition, local tests indicate no performance impact.

Pengfei Li (10):
  mm/page_alloc: use unsigned int for "order" in should_compact_retry()
  mm/page_alloc: use unsigned int for "order" in __rmqueue_fallback()
  mm/page_alloc: use unsigned int for "order" in should_compact_retry()
  mm/page_alloc: remove never used "order" in alloc_contig_range()
  mm/compaction: make "order" and "search_order" unsigned int in struct
    compact_control
  mm/compaction: make "order" unsigned int in compaction.c
  trace/events/compaction: make "order" unsigned int
  mm/compaction: use unsigned int for "compact_order_failed" in struct
    zone
  mm/compaction: use unsigned int for "kcompactd_max_order" in struct
    pglist_data
  mm/vmscan: use unsigned int for "kswapd_order" in struct pglist_data

 include/linux/compaction.h        |  30 +++
 include/linux/mmzone.h            |   8 +-
 include/trace/events/compaction.h |  40 +-
 include/trace/events/kmem.h       |   6 +-
 include/trace/events/oom.h        |   6 +-
 include/trace/events/vmscan.h     |   4 +-
 mm/compaction.c                   | 127 +++---
 mm/internal.h                     |   6 +-
 mm/page_alloc.c                   |  16 ++--
 mm/vmscan.c                       |   6 +-
 10 files changed, 126 insertions(+), 123 deletions(-)

-- 
2.21.0
[PATCH 01/10] mm/page_alloc: use unsigned int for "order" in should_compact_retry()
Like another should_compact_retry(), use unsigned int for "order". And modify trace_compact_retry() accordingly. Signed-off-by: Pengfei Li --- include/trace/events/oom.h | 6 +++--- mm/page_alloc.c| 7 +++ 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/include/trace/events/oom.h b/include/trace/events/oom.h index 26a11e4a2c36..b7fa989349c7 100644 --- a/include/trace/events/oom.h +++ b/include/trace/events/oom.h @@ -154,7 +154,7 @@ TRACE_EVENT(skip_task_reaping, #ifdef CONFIG_COMPACTION TRACE_EVENT(compact_retry, - TP_PROTO(int order, + TP_PROTO(unsigned int order, enum compact_priority priority, enum compact_result result, int retries, @@ -164,7 +164,7 @@ TRACE_EVENT(compact_retry, TP_ARGS(order, priority, result, retries, max_retries, ret), TP_STRUCT__entry( - __field(int, order) + __field(unsigned int, order) __field(int, priority) __field(int, result) __field(int, retries) @@ -181,7 +181,7 @@ TRACE_EVENT(compact_retry, __entry->ret = ret; ), - TP_printk("order=%d priority=%s compaction_result=%s retries=%d max_retries=%d should_retry=%d", + TP_printk("order=%u priority=%s compaction_result=%s retries=%d max_retries=%d should_retry=%d", __entry->order, __print_symbolic(__entry->priority, COMPACTION_PRIORITY), __print_symbolic(__entry->result, COMPACTION_FEEDBACK), diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 272c6de1bf4e..75c18f4fd66a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3940,10 +3940,9 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, } static inline bool -should_compact_retry(struct alloc_context *ac, int order, int alloc_flags, -enum compact_result compact_result, -enum compact_priority *compact_priority, -int *compaction_retries) +should_compact_retry(struct alloc_context *ac, unsigned int order, + int alloc_flags, enum compact_result compact_result, + enum compact_priority *compact_priority, int *compaction_retries) { int max_retries = MAX_COMPACT_RETRIES; int min_priority; -- 2.21.0
Re: [PATCH v6 1/2] mm/vmalloc: do not keep unpurged areas in the busy tree
Thanks. Signed-off-by: Pengfei Li On Thu, Jul 25, 2019 at 10:36 AM Andrew Morton wrote: > > On Tue, 16 Jul 2019 23:26:55 +0800 Pengfei Li wrote: > > > From: "Uladzislau Rezki (Sony)" > > > > The busy tree can be quite big, even though the area is freed > > or unmapped it still stays there until "purge" logic removes > > it. > > > > 1) Optimize and reduce the size of "busy" tree by removing a > > node from it right away as soon as user triggers free paths. > > It is possible to do so, because the allocation is done using > > another augmented tree. > > > > The vmalloc test driver shows the difference, for example the > > "fix_size_alloc_test" is ~11% better comparing with default > > configuration: > > > > sudo ./test_vmalloc.sh performance > > > > > > Summary: fix_size_alloc_test loops: 100 avg: 993985 usec > > Summary: full_fit_alloc_test loops: 100 avg: 973554 usec > > Summary: long_busy_list_alloc_test loops: 100 avg: 12617652 usec > > > > > > > > Summary: fix_size_alloc_test loops: 100 avg: 882263 usec > > Summary: full_fit_alloc_test loops: 100 avg: 973407 usec > > Summary: long_busy_list_alloc_test loops: 100 avg: 12593929 usec > > > > > > 2) Since the busy tree now contains allocated areas only and does > > not interfere with lazily free nodes, introduce the new function > > show_purge_info() that dumps "unpurged" areas that is propagated > > through "/proc/vmallocinfo". > > > > 3) Eliminate VM_LAZY_FREE flag. > > > > Signed-off-by: Uladzislau Rezki (Sony) > > This should have included your signed-off-by, since you were on the > patch delivery path. (Documentation/process/submitting-patches.rst, > section 11). > > Please send along your signed-off-by?
[PATCH v6 2/2] mm/vmalloc: modify struct vmap_area to reduce its size
Objective
---------
The current implementation of struct vmap_area wastes space.

After applying this commit, sizeof(struct vmap_area) has been
reduced from 11 words to 8 words.

Description
-----------
1) Pack "subtree_max_size", "vm" and "purge_list".
   This is no problem because
   A) "subtree_max_size" is only used when vmap_area is in
      "free" tree
   B) "vm" is only used when vmap_area is in "busy" tree
   C) "purge_list" is only used when vmap_area is in
      vmap_purge_list

2) Eliminate "flags".
   Since only one flag, VM_VM_AREA, is being used, and the same
   check can be done by testing whether "vm" is NULL, the
   "flags" member can be eliminated.

Signed-off-by: Pengfei Li
Suggested-by: Uladzislau Rezki (Sony)
---
 include/linux/vmalloc.h | 20 +---
 mm/vmalloc.c            | 24 ++--
 2 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 9b21d0047710..a1334bd18ef1 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -51,15 +51,21 @@ struct vmap_area {
 	unsigned long va_start;
 	unsigned long va_end;
 
-	/*
-	 * Largest available free size in subtree.
-	 */
-	unsigned long subtree_max_size;
-	unsigned long flags;
 	struct rb_node rb_node;		/* address sorted rbtree */
 	struct list_head list;		/* address sorted list */
-	struct llist_node purge_list;	/* "lazy purge" list */
-	struct vm_struct *vm;
+
+	/*
+	 * The following three variables can be packed, because
+	 * a vmap_area object is always one of the three states:
+	 *    1) in "free" tree (root is vmap_area_root)
+	 *    2) in "busy" tree (root is free_vmap_area_root)
+	 *    3) in purge list (head is vmap_purge_list)
+	 */
+	union {
+		unsigned long subtree_max_size; /* in "free" tree */
+		struct vm_struct *vm;           /* in "busy" tree */
+		struct llist_node purge_list;   /* in purge list */
+	};
 };
 
 /*
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 71d8040a8a0b..2f7edc0466e7 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -329,7 +329,6 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
 #define DEBUG_AUGMENT_PROPAGATE_CHECK 0
 #define DEBUG_AUGMENT_LOWEST_MATCH_CHECK 0
 
-#define VM_VM_AREA	0x04
 
 static DEFINE_SPINLOCK(vmap_area_lock);
 /* Export for kexec only */
@@ -1115,7 +1114,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 
 	va->va_start = addr;
 	va->va_end = addr + size;
-	va->flags = 0;
+	va->vm = NULL;
 	insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
 
 	spin_unlock(&vmap_area_lock);
@@ -1922,7 +1921,6 @@ void __init vmalloc_init(void)
 		if (WARN_ON_ONCE(!va))
 			continue;
 
-		va->flags = VM_VM_AREA;
 		va->va_start = (unsigned long)tmp->addr;
 		va->va_end = va->va_start + tmp->size;
 		va->vm = tmp;
@@ -2020,7 +2018,6 @@ static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va,
 	vm->size = va->va_end - va->va_start;
 	vm->caller = caller;
 	va->vm = vm;
-	va->flags |= VM_VM_AREA;
 	spin_unlock(&vmap_area_lock);
 }
 
@@ -2125,10 +2122,10 @@ struct vm_struct *find_vm_area(const void *addr)
 	struct vmap_area *va;
 
 	va = find_vmap_area((unsigned long)addr);
-	if (va && va->flags & VM_VM_AREA)
-		return va->vm;
+	if (!va)
+		return NULL;
 
-	return NULL;
+	return va->vm;
 }
 
 /**
@@ -2149,11 +2146,10 @@ struct vm_struct *remove_vm_area(const void *addr)
 	spin_lock(&vmap_area_lock);
 	va = __find_vmap_area((unsigned long)addr);
-	if (va && va->flags & VM_VM_AREA) {
+	if (va && va->vm) {
 		struct vm_struct *vm = va->vm;
 
 		va->vm = NULL;
-		va->flags &= ~VM_VM_AREA;
 		spin_unlock(&vmap_area_lock);
 
 		kasan_free_shadow(vm);
@@ -2856,7 +2852,7 @@ long vread(char *buf, char *addr, unsigned long count)
 		if (!count)
 			break;
 
-		if (!(va->flags & VM_VM_AREA))
+		if (!va->vm)
 			continue;
 
 		vm = va->vm;
@@ -2936,7 +2932,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
 		if (!count)
 			break;
 
-		if (!(va->flags & VM_VM_AREA))
+		if (!va->vm)
 			continue;
 
 		vm = va->vm;
@@ -3466,10 +3462,10 @@ static int s_show(struct
[PATCH v6 1/2] mm/vmalloc: do not keep unpurged areas in the busy tree
From: "Uladzislau Rezki (Sony)" The busy tree can be quite big, even though the area is freed or unmapped it still stays there until "purge" logic removes it. 1) Optimize and reduce the size of "busy" tree by removing a node from it right away as soon as user triggers free paths. It is possible to do so, because the allocation is done using another augmented tree. The vmalloc test driver shows the difference, for example the "fix_size_alloc_test" is ~11% better comparing with default configuration: sudo ./test_vmalloc.sh performance Summary: fix_size_alloc_test loops: 100 avg: 993985 usec Summary: full_fit_alloc_test loops: 100 avg: 973554 usec Summary: long_busy_list_alloc_test loops: 100 avg: 12617652 usec Summary: fix_size_alloc_test loops: 100 avg: 882263 usec Summary: full_fit_alloc_test loops: 100 avg: 973407 usec Summary: long_busy_list_alloc_test loops: 100 avg: 12593929 usec 2) Since the busy tree now contains allocated areas only and does not interfere with lazily free nodes, introduce the new function show_purge_info() that dumps "unpurged" areas that is propagated through "/proc/vmallocinfo". 3) Eliminate VM_LAZY_FREE flag. Signed-off-by: Uladzislau Rezki (Sony) --- mm/vmalloc.c | 52 1 file changed, 44 insertions(+), 8 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 4fa8d84599b0..71d8040a8a0b 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -329,7 +329,6 @@ EXPORT_SYMBOL(vmalloc_to_pfn); #define DEBUG_AUGMENT_PROPAGATE_CHECK 0 #define DEBUG_AUGMENT_LOWEST_MATCH_CHECK 0 -#define VM_LAZY_FREE 0x02 #define VM_VM_AREA 0x04 static DEFINE_SPINLOCK(vmap_area_lock); @@ -1276,7 +1275,14 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end) llist_for_each_entry_safe(va, n_va, valist, purge_list) { unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT; - __free_vmap_area(va); + /* +* Finally insert or merge lazily-freed area. It is +* detached and there is no need to "unlink" it from +* anything. 
+*/ + merge_or_add_vmap_area(va, + _vmap_area_root, _vmap_area_list); + atomic_long_sub(nr, _lazy_nr); if (atomic_long_read(_lazy_nr) < resched_threshold) @@ -1318,6 +1324,10 @@ static void free_vmap_area_noflush(struct vmap_area *va) { unsigned long nr_lazy; + spin_lock(_area_lock); + unlink_va(va, _area_root); + spin_unlock(_area_lock); + nr_lazy = atomic_long_add_return((va->va_end - va->va_start) >> PAGE_SHIFT, _lazy_nr); @@ -2137,14 +2147,13 @@ struct vm_struct *remove_vm_area(const void *addr) might_sleep(); - va = find_vmap_area((unsigned long)addr); + spin_lock(_area_lock); + va = __find_vmap_area((unsigned long)addr); if (va && va->flags & VM_VM_AREA) { struct vm_struct *vm = va->vm; - spin_lock(_area_lock); va->vm = NULL; va->flags &= ~VM_VM_AREA; - va->flags |= VM_LAZY_FREE; spin_unlock(_area_lock); kasan_free_shadow(vm); @@ -2152,6 +2161,8 @@ struct vm_struct *remove_vm_area(const void *addr) return vm; } + + spin_unlock(_area_lock); return NULL; } @@ -3431,6 +3442,22 @@ static void show_numa_info(struct seq_file *m, struct vm_struct *v) } } +static void show_purge_info(struct seq_file *m) +{ + struct llist_node *head; + struct vmap_area *va; + + head = READ_ONCE(vmap_purge_list.first); + if (head == NULL) + return; + + llist_for_each_entry(va, head, purge_list) { + seq_printf(m, "0x%pK-0x%pK %7ld unpurged vm_area\n", + (void *)va->va_start, (void *)va->va_end, + va->va_end - va->va_start); + } +} + static int s_show(struct seq_file *m, void *p) { struct vmap_area *va; @@ -3443,10 +3470,9 @@ static int s_show(struct seq_file *m, void *p) * behalf of vmap area is being tear down or vm_map_ram allocation. */ if (!(va->flags & VM_VM_AREA)) { - seq_printf(m, "0x%pK-0x%pK %7ld %s\n", + seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n", (void *)va->va_start, (void *)va->va_end, - va->va_end - va->va_start, - va->flags & VM_LAZY_FREE ? 
"unpurged vm_area" : "vm_map_ram"); + va->va_end - va->va_start); return 0; } @@ -3482,6 +3508,16 @@ static int s_show(struct seq_file *m, void *p) show_numa_info(m, v); seq_putc(m, '\n'); + + /* +* As a final step, dump "unpurged" areas. Note, +* that
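The control flow of the patch — detach from the busy set immediately at free time, park the area on a purge list, and merge it back into the free structures in bulk later — can be modeled with a small user-space sketch. The names below only mimic the kernel's (vmap_purge_list, vmap_lazy_nr); this is not kernel code:

```c
#include <stddef.h>

/* Toy model of the deferred-free pattern the patch adopts: freed areas
 * leave the "busy" set right away and sit on a singly linked purge list
 * until a later pass reclaims them in bulk. */
struct area {
	unsigned long start, end;
	struct area *purge_next;
};

static struct area *purge_list;     /* models vmap_purge_list */
static unsigned long lazy_nr;       /* models vmap_lazy_nr    */

static void free_area_noflush(struct area *a)
{
	/* In the patch this is the point where unlink_va() detaches the
	 * node from the busy rbtree before queueing it for purging. */
	a->purge_next = purge_list;
	purge_list = a;
	lazy_nr += a->end - a->start;
}

static unsigned long purge_lazy_areas(void)
{
	unsigned long reclaimed = 0;
	struct area *a = purge_list;

	purge_list = NULL;
	for (; a; a = a->purge_next) {
		/* kernel: merge_or_add_vmap_area() back into the free tree */
		reclaimed += a->end - a->start;
	}
	lazy_nr -= reclaimed;
	return reclaimed;
}
```

Because freed areas are detached immediately, a dump of "busy" entries no longer sees them, which is why the patch adds the separate show_purge_info() walk over the purge list for /proc/vmallocinfo.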
[PATCH v6 0/2] mm/vmalloc.c: improve readability and rewrite vmap_area
v5 -> v6:
* patch 2: keep the comment in s_show()

v4 -> v5:
* Base on next-20190716
* patch 1: From Uladzislau Rezki (Sony) (author)
  - https://lkml.org/lkml/2019/7/16/276
* patch 2: Use v3

v3 -> v4:
* Base on next-20190711
* patch 1: From: Uladzislau Rezki (Sony) (author)
  - https://lkml.org/lkml/2019/7/3/661
* patch 2: Modify the layout of struct vmap_area for readability

v2 -> v3:
* patch 1-4: Abandoned
* patch 5:
  - Eliminate "flags" (suggested by Uladzislau Rezki)
  - Base on https://lkml.org/lkml/2019/6/6/455 and
    https://lkml.org/lkml/2019/7/3/661

v1 -> v2:
* patch 3: Rename __find_vmap_area to __search_va_in_busy_tree instead
  of __search_va_from_busy_tree.
* patch 5: Add motivation and necessary test data to the commit message.
* patch 5: Let va->flags use only some low bits of va_start instead of
  completely overwriting va_start.

The current implementation of struct vmap_area wastes space. After
applying this series, sizeof(struct vmap_area) is reduced from 11 words
to 8 words.

Pengfei Li (1):
  mm/vmalloc: modify struct vmap_area to reduce its size

Uladzislau Rezki (Sony) (1):
  mm/vmalloc: do not keep unpurged areas in the busy tree

 include/linux/vmalloc.h | 20 +++
 mm/vmalloc.c            | 76 +
 2 files changed, 67 insertions(+), 29 deletions(-)

-- 
2.21.0
Re: [PATCH v5 2/2] mm/vmalloc: modify struct vmap_area to reduce its size
On Tue, Jul 16, 2019 at 10:35 PM Uladzislau Rezki wrote: > > On Tue, Jul 16, 2019 at 09:26:04PM +0800, Pengfei Li wrote: > > Objective > > - > > The current implementation of struct vmap_area wasted space. > > > > After applying this commit, sizeof(struct vmap_area) has been > > reduced from 11 words to 8 words. > > > > Description > > --- > > 1) Pack "subtree_max_size", "vm" and "purge_list". > > This is no problem because > > A) "subtree_max_size" is only used when vmap_area is in > >"free" tree > > B) "vm" is only used when vmap_area is in "busy" tree > > C) "purge_list" is only used when vmap_area is in > >vmap_purge_list > > > > 2) Eliminate "flags". > > Since only one flag VM_VM_AREA is being used, and the same > > thing can be done by judging whether "vm" is NULL, then the > > "flags" can be eliminated. > > > > Signed-off-by: Pengfei Li > > Suggested-by: Uladzislau Rezki (Sony) > > --- > > include/linux/vmalloc.h | 20 +--- > > mm/vmalloc.c| 24 ++-- > > 2 files changed, 23 insertions(+), 21 deletions(-) > > > > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h > > index 9b21d0047710..a1334bd18ef1 100644 > > --- a/include/linux/vmalloc.h > > +++ b/include/linux/vmalloc.h > > @@ -51,15 +51,21 @@ struct vmap_area { > > unsigned long va_start; > > unsigned long va_end; > > > > - /* > > - * Largest available free size in subtree. 
> > - */ > > - unsigned long subtree_max_size; > > - unsigned long flags; > > struct rb_node rb_node; /* address sorted rbtree */ > > struct list_head list; /* address sorted list */ > > - struct llist_node purge_list;/* "lazy purge" list */ > > - struct vm_struct *vm; > > + > > + /* > > + * The following three variables can be packed, because > > + * a vmap_area object is always one of the three states: > > + *1) in "free" tree (root is vmap_area_root) > > + *2) in "busy" tree (root is free_vmap_area_root) > > + *3) in purge list (head is vmap_purge_list) > > + */ > > + union { > > + unsigned long subtree_max_size; /* in "free" tree */ > > + struct vm_struct *vm; /* in "busy" tree */ > > + struct llist_node purge_list; /* in purge list */ > > + }; > > }; > > > > /* > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > > index 71d8040a8a0b..39bf9cf4175a 100644 > > --- a/mm/vmalloc.c > > +++ b/mm/vmalloc.c > > @@ -329,7 +329,6 @@ EXPORT_SYMBOL(vmalloc_to_pfn); > > #define DEBUG_AUGMENT_PROPAGATE_CHECK 0 > > #define DEBUG_AUGMENT_LOWEST_MATCH_CHECK 0 > > > > -#define VM_VM_AREA 0x04 > > > > static DEFINE_SPINLOCK(vmap_area_lock); > > /* Export for kexec only */ > > @@ -1115,7 +1114,7 @@ static struct vmap_area *alloc_vmap_area(unsigned > > long size, > > > > va->va_start = addr; > > va->va_end = addr + size; > > - va->flags = 0; > > + va->vm = NULL; > > insert_vmap_area(va, _area_root, _area_list); > > > > spin_unlock(_area_lock); > > @@ -1922,7 +1921,6 @@ void __init vmalloc_init(void) > > if (WARN_ON_ONCE(!va)) > > continue; > > > > - va->flags = VM_VM_AREA; > > va->va_start = (unsigned long)tmp->addr; > > va->va_end = va->va_start + tmp->size; > > va->vm = tmp; > > @@ -2020,7 +2018,6 @@ static void setup_vmalloc_vm(struct vm_struct *vm, > > struct vmap_area *va, > > vm->size = va->va_end - va->va_start; > > vm->caller = caller; > > va->vm = vm; > > - va->flags |= VM_VM_AREA; > > spin_unlock(_area_lock); > > } > > > > @@ -2125,10 +2122,10 @@ struct vm_struct 
*find_vm_area(const void *addr) > > struct vmap_area *va; > > > > va = find_vmap_area((unsigned long)addr); > > - if (va && va->flags & VM_VM_AREA) > > - return va->vm; > > + if (!va) > > + return NULL; > > > > - return NULL; > > + return
[PATCH v5 1/2] mm/vmalloc: do not keep unpurged areas in the busy tree
From: "Uladzislau Rezki (Sony)" The busy tree can be quite big, even though the area is freed or unmapped it still stays there until "purge" logic removes it. 1) Optimize and reduce the size of "busy" tree by removing a node from it right away as soon as user triggers free paths. It is possible to do so, because the allocation is done using another augmented tree. The vmalloc test driver shows the difference, for example the "fix_size_alloc_test" is ~11% better comparing with default configuration: sudo ./test_vmalloc.sh performance Summary: fix_size_alloc_test loops: 100 avg: 993985 usec Summary: full_fit_alloc_test loops: 100 avg: 973554 usec Summary: long_busy_list_alloc_test loops: 100 avg: 12617652 usec Summary: fix_size_alloc_test loops: 100 avg: 882263 usec Summary: full_fit_alloc_test loops: 100 avg: 973407 usec Summary: long_busy_list_alloc_test loops: 100 avg: 12593929 usec 2) Since the busy tree now contains allocated areas only and does not interfere with lazily free nodes, introduce the new function show_purge_info() that dumps "unpurged" areas that is propagated through "/proc/vmallocinfo". 3) Eliminate VM_LAZY_FREE flag. Signed-off-by: Uladzislau Rezki (Sony) --- mm/vmalloc.c | 52 1 file changed, 44 insertions(+), 8 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 4fa8d84599b0..71d8040a8a0b 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -329,7 +329,6 @@ EXPORT_SYMBOL(vmalloc_to_pfn); #define DEBUG_AUGMENT_PROPAGATE_CHECK 0 #define DEBUG_AUGMENT_LOWEST_MATCH_CHECK 0 -#define VM_LAZY_FREE 0x02 #define VM_VM_AREA 0x04 static DEFINE_SPINLOCK(vmap_area_lock); @@ -1276,7 +1275,14 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end) llist_for_each_entry_safe(va, n_va, valist, purge_list) { unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT; - __free_vmap_area(va); + /* +* Finally insert or merge lazily-freed area. It is +* detached and there is no need to "unlink" it from +* anything. 
+*/ + merge_or_add_vmap_area(va, + _vmap_area_root, _vmap_area_list); + atomic_long_sub(nr, _lazy_nr); if (atomic_long_read(_lazy_nr) < resched_threshold) @@ -1318,6 +1324,10 @@ static void free_vmap_area_noflush(struct vmap_area *va) { unsigned long nr_lazy; + spin_lock(_area_lock); + unlink_va(va, _area_root); + spin_unlock(_area_lock); + nr_lazy = atomic_long_add_return((va->va_end - va->va_start) >> PAGE_SHIFT, _lazy_nr); @@ -2137,14 +2147,13 @@ struct vm_struct *remove_vm_area(const void *addr) might_sleep(); - va = find_vmap_area((unsigned long)addr); + spin_lock(_area_lock); + va = __find_vmap_area((unsigned long)addr); if (va && va->flags & VM_VM_AREA) { struct vm_struct *vm = va->vm; - spin_lock(_area_lock); va->vm = NULL; va->flags &= ~VM_VM_AREA; - va->flags |= VM_LAZY_FREE; spin_unlock(_area_lock); kasan_free_shadow(vm); @@ -2152,6 +2161,8 @@ struct vm_struct *remove_vm_area(const void *addr) return vm; } + + spin_unlock(_area_lock); return NULL; } @@ -3431,6 +3442,22 @@ static void show_numa_info(struct seq_file *m, struct vm_struct *v) } } +static void show_purge_info(struct seq_file *m) +{ + struct llist_node *head; + struct vmap_area *va; + + head = READ_ONCE(vmap_purge_list.first); + if (head == NULL) + return; + + llist_for_each_entry(va, head, purge_list) { + seq_printf(m, "0x%pK-0x%pK %7ld unpurged vm_area\n", + (void *)va->va_start, (void *)va->va_end, + va->va_end - va->va_start); + } +} + static int s_show(struct seq_file *m, void *p) { struct vmap_area *va; @@ -3443,10 +3470,9 @@ static int s_show(struct seq_file *m, void *p) * behalf of vmap area is being tear down or vm_map_ram allocation. */ if (!(va->flags & VM_VM_AREA)) { - seq_printf(m, "0x%pK-0x%pK %7ld %s\n", + seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n", (void *)va->va_start, (void *)va->va_end, - va->va_end - va->va_start, - va->flags & VM_LAZY_FREE ? 
"unpurged vm_area" : "vm_map_ram"); + va->va_end - va->va_start); return 0; } @@ -3482,6 +3508,16 @@ static int s_show(struct seq_file *m, void *p) show_numa_info(m, v); seq_putc(m, '\n'); + + /* +* As a final step, dump "unpurged" areas. Note, +* that
[PATCH v5 2/2] mm/vmalloc: modify struct vmap_area to reduce its size
Objective - The current implementation of struct vmap_area wasted space. After applying this commit, sizeof(struct vmap_area) has been reduced from 11 words to 8 words. Description --- 1) Pack "subtree_max_size", "vm" and "purge_list". This is no problem because A) "subtree_max_size" is only used when vmap_area is in "free" tree B) "vm" is only used when vmap_area is in "busy" tree C) "purge_list" is only used when vmap_area is in vmap_purge_list 2) Eliminate "flags". Since only one flag VM_VM_AREA is being used, and the same thing can be done by judging whether "vm" is NULL, then the "flags" can be eliminated. Signed-off-by: Pengfei Li Suggested-by: Uladzislau Rezki (Sony) --- include/linux/vmalloc.h | 20 +--- mm/vmalloc.c| 24 ++-- 2 files changed, 23 insertions(+), 21 deletions(-) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 9b21d0047710..a1334bd18ef1 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -51,15 +51,21 @@ struct vmap_area { unsigned long va_start; unsigned long va_end; - /* -* Largest available free size in subtree. 
-*/ - unsigned long subtree_max_size; - unsigned long flags; struct rb_node rb_node; /* address sorted rbtree */ struct list_head list; /* address sorted list */ - struct llist_node purge_list;/* "lazy purge" list */ - struct vm_struct *vm; + + /* +* The following three variables can be packed, because +* a vmap_area object is always one of the three states: +*1) in "free" tree (root is vmap_area_root) +*2) in "busy" tree (root is free_vmap_area_root) +*3) in purge list (head is vmap_purge_list) +*/ + union { + unsigned long subtree_max_size; /* in "free" tree */ + struct vm_struct *vm; /* in "busy" tree */ + struct llist_node purge_list; /* in purge list */ + }; }; /* diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 71d8040a8a0b..39bf9cf4175a 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -329,7 +329,6 @@ EXPORT_SYMBOL(vmalloc_to_pfn); #define DEBUG_AUGMENT_PROPAGATE_CHECK 0 #define DEBUG_AUGMENT_LOWEST_MATCH_CHECK 0 -#define VM_VM_AREA 0x04 static DEFINE_SPINLOCK(vmap_area_lock); /* Export for kexec only */ @@ -1115,7 +1114,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size, va->va_start = addr; va->va_end = addr + size; - va->flags = 0; + va->vm = NULL; insert_vmap_area(va, _area_root, _area_list); spin_unlock(_area_lock); @@ -1922,7 +1921,6 @@ void __init vmalloc_init(void) if (WARN_ON_ONCE(!va)) continue; - va->flags = VM_VM_AREA; va->va_start = (unsigned long)tmp->addr; va->va_end = va->va_start + tmp->size; va->vm = tmp; @@ -2020,7 +2018,6 @@ static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va, vm->size = va->va_end - va->va_start; vm->caller = caller; va->vm = vm; - va->flags |= VM_VM_AREA; spin_unlock(_area_lock); } @@ -2125,10 +2122,10 @@ struct vm_struct *find_vm_area(const void *addr) struct vmap_area *va; va = find_vmap_area((unsigned long)addr); - if (va && va->flags & VM_VM_AREA) - return va->vm; + if (!va) + return NULL; - return NULL; + return va->vm; } /** @@ -2149,11 +2146,10 @@ struct vm_struct 
*remove_vm_area(const void *addr) spin_lock(_area_lock); va = __find_vmap_area((unsigned long)addr); - if (va && va->flags & VM_VM_AREA) { + if (va && va->vm) { struct vm_struct *vm = va->vm; va->vm = NULL; - va->flags &= ~VM_VM_AREA; spin_unlock(_area_lock); kasan_free_shadow(vm); @@ -2856,7 +2852,7 @@ long vread(char *buf, char *addr, unsigned long count) if (!count) break; - if (!(va->flags & VM_VM_AREA)) + if (!va->vm) continue; vm = va->vm; @@ -2936,7 +2932,7 @@ long vwrite(char *buf, char *addr, unsigned long count) if (!count) break; - if (!(va->flags & VM_VM_AREA)) + if (!va->vm) continue; vm = va->vm; @@ -3466,10 +3462,10 @@ static int s_show(struct
[PATCH v5 0/2] mm/vmalloc.c: improve readability and rewrite vmap_area
v4 -> v5:
* Base on next-20190716
* patch 1: From Uladzislau Rezki (Sony) (author)
  - https://lkml.org/lkml/2019/7/16/276
* patch 2: Use v3

v3 -> v4:
* Base on next-20190711
* patch 1: From: Uladzislau Rezki (Sony) (author)
  - https://lkml.org/lkml/2019/7/3/661
* patch 2: Modify the layout of struct vmap_area for readability

v2 -> v3:
* patch 1-4: Abandoned
* patch 5:
  - Eliminate "flags" (suggested by Uladzislau Rezki)
  - Base on https://lkml.org/lkml/2019/6/6/455 and
    https://lkml.org/lkml/2019/7/3/661

v1 -> v2:
* patch 3: Rename __find_vmap_area to __search_va_in_busy_tree instead
  of __search_va_from_busy_tree.
* patch 5: Add motivation and necessary test data to the commit message.
* patch 5: Let va->flags use only some low bits of va_start instead of
  completely overwriting va_start.

The current implementation of struct vmap_area wastes space. After
applying this series, sizeof(struct vmap_area) is reduced from 11 words
to 8 words.

Pengfei Li (1):
  mm/vmalloc: modify struct vmap_area to reduce its size

Uladzislau Rezki (Sony) (1):
  mm/vmalloc: do not keep unpurged areas in the busy tree

 include/linux/vmalloc.h | 20 +++
 mm/vmalloc.c            | 76 +
 2 files changed, 67 insertions(+), 29 deletions(-)

-- 
2.21.0
Re: [PATCH v4 2/2] mm/vmalloc.c: Modify struct vmap_area to reduce its size
Hi, Vlad

Thanks for the comments from you and Matthew; now I am sure v3 is enough.
I will follow the next version of your "mm/vmalloc: do not keep unpurged
areas in the busy tree".

Thanks again for your patience with me!

-- Pengfei
Re: [PATCH v4 2/2] mm/vmalloc.c: Modify struct vmap_area to reduce its size
On Fri, Jul 12, 2019 at 9:49 PM Matthew Wilcox wrote:
>
> On Fri, Jul 12, 2019 at 08:02:13PM +0800, Pengfei Li wrote:
>
> I don't think you need struct union struct union.  Because llist_node
> is just a pointer, you can get the same savings with just:
>
>	union {
>		struct llist_node purge_list;
>		struct vm_struct *vm;
>		unsigned long subtree_max_size;
>	};
>

Thanks for your comments. As you said, I did this in v3.
https://patchwork.kernel.org/patch/11031507/

The reason why I used struct-union-struct in v4 is that I want to
express that "in the tree" and "in the purge list" are two completely
isolated cases:

struct vmap_area {
	union {
		struct {	/* Case A: In the tree */
			...
		};
		struct {	/* Case B: In the purge list */
			...
		};
	};
};

The "rb_node" and "list" should also not be used when va is in the
purge list. What do you think of this idea?
[PATCH v4 2/2] mm/vmalloc.c: Modify struct vmap_area to reduce its size
Objective - The current implementation of struct vmap_area wasted space. After applying this commit, sizeof(struct vmap_area) has been reduced from 11 words to 8 words. Description --- 1) Pack "vm" and "subtree_max_size" This is no problem because A) "vm" is only used when vmap_area is in "busy" tree B) "subtree_max_size" is only used when vmap_area is in "free" tree 2) Pack "purge_list" The variable "purge_list" is only used when vmap_area is in "lazy purge" list. So it can be packed with other variables, which are only used in rbtree and list sorted by address. 3) Eliminate "flags". Since only one flag VM_VM_AREA is being used, and the same thing can be done by judging whether "vm" is NULL, then the "flags" can be eliminated. Signed-off-by: Pengfei Li Suggested-by: Uladzislau Rezki (Sony) --- include/linux/vmalloc.h | 40 +++- mm/vmalloc.c| 28 +--- 2 files changed, 44 insertions(+), 24 deletions(-) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 9b21d0047710..6fb377ca9e7a 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -51,15 +51,37 @@ struct vmap_area { unsigned long va_start; unsigned long va_end; - /* -* Largest available free size in subtree. -*/ - unsigned long subtree_max_size; - unsigned long flags; - struct rb_node rb_node; /* address sorted rbtree */ - struct list_head list; /* address sorted list */ - struct llist_node purge_list;/* "lazy purge" list */ - struct vm_struct *vm; + union { + /* In rbtree and list sorted by address */ + struct { + union { + /* +* In "busy" rbtree and list. +* rbtree root: vmap_area_root +* list head: vmap_area_list +*/ + struct vm_struct *vm; + + /* +* In "free" rbtree and list. +* rbtree root: free_vmap_area_root +* list head: free_vmap_area_list +*/ + unsigned long subtree_max_size; + }; + + struct rb_node rb_node; + struct list_head list; + }; + + /* +* In "lazy purge" list. 
+* llist head: vmap_purge_list +*/ + struct { + struct llist_node purge_list; + }; + }; }; /* diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 9eb700a2087b..1245d3285a32 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -329,7 +329,6 @@ EXPORT_SYMBOL(vmalloc_to_pfn); #define DEBUG_AUGMENT_PROPAGATE_CHECK 0 #define DEBUG_AUGMENT_LOWEST_MATCH_CHECK 0 -#define VM_VM_AREA 0x04 static DEFINE_SPINLOCK(vmap_area_lock); /* Export for kexec only */ @@ -1115,7 +1114,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size, va->va_start = addr; va->va_end = addr + size; - va->flags = 0; + va->vm = NULL; insert_vmap_area(va, _area_root, _area_list); spin_unlock(_area_lock); @@ -1279,7 +1278,9 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end) llist_for_each_entry_safe(va, n_va, valist, purge_list) { unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT; - __free_vmap_area(va); + merge_or_add_vmap_area(va, + _vmap_area_root, _vmap_area_list); + atomic_long_sub(nr, _lazy_nr); if (atomic_long_read(_lazy_nr) < resched_threshold) @@ -1919,7 +1920,6 @@ void __init vmalloc_init(void) if (WARN_ON_ONCE(!va)) continue; - va->flags = VM_VM_AREA; va->va_start = (unsigned long)tmp->addr; va->va_end = va->va_start + tmp->size; va->vm = tmp; @@ -2017,7 +2017,6 @@ static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va, vm->size = va->va_end - va->va_start; vm->caller = caller; va->vm = vm; - va->flags |= VM_VM_AREA; spin_unlock(_area_lock); } @@ -2122,10 +2121,10 @@ struct vm_struct *find_vm_area(const void *addr) struct vmap_area *va; va = find_vmap_area((unsigned long)addr); - if (va && va->flags & VM_VM_AREA) -
[PATCH v4 0/2] mm/vmalloc.c: improve readability and rewrite vmap_area
v3 -> v4:
* Base on next-20190711
* patch 1: From: Uladzislau Rezki (Sony) (author)
  - https://lkml.org/lkml/2019/7/3/661
* patch 2: Modify the layout of struct vmap_area for readability

v2 -> v3:
* patch 1-4: Abandoned
* patch 5:
  - Eliminate "flags" (suggested by Uladzislau Rezki)
  - Base on https://lkml.org/lkml/2019/6/6/455 and
    https://lkml.org/lkml/2019/7/3/661

v1 -> v2:
* patch 3: Rename __find_vmap_area to __search_va_in_busy_tree instead
  of __search_va_from_busy_tree.
* patch 5: Add motivation and necessary test data to the commit message.
* patch 5: Let va->flags use only some low bits of va_start instead of
  completely overwriting va_start.

The current implementation of struct vmap_area wastes space. After
applying this series, sizeof(struct vmap_area) is reduced from 11 words
to 8 words.

Pengfei Li (1):
  mm/vmalloc.c: Modify struct vmap_area to reduce its size

Uladzislau Rezki (Sony) (1):
  mm/vmalloc: do not keep unpurged areas in the busy tree

 include/linux/vmalloc.h | 40 -
 mm/vmalloc.c            | 79 -
 2 files changed, 86 insertions(+), 33 deletions(-)

-- 
2.21.0