[PATCH v2] ARM: mm: clean up to avoid calling kmap_high_get() twice
In kmap_atomic(), kmap_high_get() is invoked to check for an already
mapped area. In __flush_dcache_page() and dma_cache_maint_page(), we
explicitly call kmap_high_get() before kmap_atomic() when
cache_is_vipt(), so kmap_high_get() can be invoked twice. This is a
useless operation, so remove one of the calls.

v2: change cache_is_vipt() to cache_is_vipt_nonaliasing() so the code
is self-documenting.

Acked-by: Nicolas Pitre <n...@linaro.org>
Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>
---
Hello, Nicolas.
I kept your 'Acked-by' while updating this patch to v2.
Please let me know if there is a problem.
Thanks.

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index e9db6b4..ef3e0f3 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -823,16 +823,17 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
 	if (PageHighMem(page)) {
 		if (len + offset > PAGE_SIZE)
 			len = PAGE_SIZE - offset;
-		vaddr = kmap_high_get(page);
-		if (vaddr) {
-			vaddr += offset;
-			op(vaddr, len, dir);
-			kunmap_high(page);
-		} else if (cache_is_vipt()) {
-			/* unmapped pages might still be cached */
+
+		if (cache_is_vipt_nonaliasing()) {
 			vaddr = kmap_atomic(page);
 			op(vaddr + offset, len, dir);
 			kunmap_atomic(vaddr);
+		} else {
+			vaddr = kmap_high_get(page);
+			if (vaddr) {
+				op(vaddr + offset, len, dir);
+				kunmap_high(page);
+			}
 		}
 	} else {
 		vaddr = page_address(page) + offset;
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 1c8f7f5..0d473cc 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -170,15 +170,18 @@ void __flush_dcache_page(struct address_space *mapping, struct page *page)
 	if (!PageHighMem(page)) {
 		__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
 	} else {
-		void *addr = kmap_high_get(page);
-		if (addr) {
-			__cpuc_flush_dcache_area(addr, PAGE_SIZE);
-			kunmap_high(page);
-		} else if (cache_is_vipt()) {
-			/* unmapped pages might still be cached */
+		void *addr;
+
+		if (cache_is_vipt_nonaliasing()) {
			addr = kmap_atomic(page);
 			__cpuc_flush_dcache_area(addr, PAGE_SIZE);
 			kunmap_atomic(addr);
+		} else {
+			addr = kmap_high_get(page);
+			if (addr) {
+				__cpuc_flush_dcache_area(addr, PAGE_SIZE);
+				kunmap_high(page);
+			}
 		}
 	}
--
1.7.9.5
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] ARM: mm: disable kmap_high_get() for SMP
On Thu, Mar 07, 2013 at 07:35:51PM +0900, JoonSoo Kim wrote:
> 2013/3/7 Nicolas Pitre <nicolas.pi...@linaro.org>:
> > On Thu, 7 Mar 2013, Joonsoo Kim wrote:
> > > On Tue, Mar 05, 2013 at 05:36:12PM +0800, Nicolas Pitre wrote:
> > > > On Mon, 4 Mar 2013, Joonsoo Kim wrote:
> > > > > With SMP and kmap_high_get() enabled, users of kmap_atomic()
> > > > > become sequentially ordered, because kmap_high_get() uses the
> > > > > global kmap_lock. That is not a welcome situation, so turn
> > > > > off this optimization for SMP.
> > > >
> > > > I'm not sure I understand the problem. The lock taken by
> > > > kmap_high_get() is released right away before that function
> > > > returns, and therefore this is not actually serializing
> > > > anything.
> > >
> > > Yes, you understand what I want to say correctly. Sorry for the
> > > bad explanation. The following are the reasons why I sent this
> > > patch with an RFC tag.
> > >
> > > If we have more cpus, performance degradation is possible even
> > > though the lock in kmap_high_get() is held only for a very short
> > > time. And kmap has a maximum of 512 entries (512 * 4K = 2M),
> > > while some mobile devices have 2G of memory (1G of highmem), so
> > > the probability of finding a matched entry is approximately
> > > 1/512. That probability decreases further on devices with more
> > > memory. So I think the time wasted finding a matched entry
> > > exceeds the time saved. That is my humble opinion; please let me
> > > know what I am missing.
> >
> > Please look at the kmap_high_get() code again. It performs no
> > searching at all. What it does is:
> >
> > - lock the kmap array against concurrent changes
> > - if the given page is not highmem, unlock and return NULL
> > - otherwise increment that page's reference count, unlock, and
> >   return the mapped address for that page.
> >
> > There is almost zero cost to this function, independently of the
> > number of kmap entries, whereas it saves much bigger costs
> > elsewhere when it is successful.
> >
> > Nicolas
>
> If the page is not highmem, it may already have been filtered out in
> kmap_atomic(), so we only need to consider highmem pages. For a
> highmem page it does perform searching: kmap_high_get() calls
> page_address(), which hashes the PA and iterates over a list for
> that hash value.
>
> Another advantage of disabling ARCH_NEEDS_KMAP_HIGH_GET is that
> kmap()/kunmap() then work without irqs disabled.
>
> Thanks.

Hello, Nicolas.
Just to confirm: you don't agree with this, right?
Thanks.
Re: [PATCH v2 1/3] slub: correct to calculate num of acquired objects in get_partial_node()
Hello, Pekka.
Could you pick up 1/3 and 3/3? These are already acked by Christoph.
2/3 has the same effect as Glauber's "slub: correctly bootstrap boot
caches", so it should be skipped.
Thanks.

On Mon, Jan 21, 2013 at 05:01:25PM +0900, Joonsoo Kim wrote:
> There is a subtle bug when calculating the number of acquired
> objects. Currently, we calculate "available = page->objects -
> page->inuse" after acquire_slab() is called in get_partial_node().
> In acquire_slab() with mode = 1, we always set new.inuse =
> page->objects. So:
>
> 	acquire_slab(s, n, page, object == NULL);
>
> 	if (!object) {
> 		c->page = page;
> 		stat(s, ALLOC_FROM_PARTIAL);
> 		object = t;
> 		available = page->objects - page->inuse;
> 		!!! available is always 0 !!!
> 	...
>
> Therefore, "available > s->cpu_partial / 2" is always false and we
> always go to the second iteration. This patch corrects the problem.
> After that, we don't need the return value of put_cpu_partial(), so
> remove it.
>
> v2: calculate the number of objects using new.objects and new.inuse.
> It is more accurate than before.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>
>
> diff --git a/mm/slub.c b/mm/slub.c
> index ba2ca53..7204c74 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1493,7 +1493,7 @@ static inline void remove_partial(struct kmem_cache_node *n,
>   */
>  static inline void *acquire_slab(struct kmem_cache *s,
>  		struct kmem_cache_node *n, struct page *page,
> -		int mode)
> +		int mode, int *objects)
>  {
>  	void *freelist;
>  	unsigned long counters;
> @@ -1507,6 +1507,7 @@ static inline void *acquire_slab(struct kmem_cache *s,
>  	freelist = page->freelist;
>  	counters = page->counters;
>  	new.counters = counters;
> +	*objects = new.objects - new.inuse;
>  	if (mode) {
>  		new.inuse = page->objects;
>  		new.freelist = NULL;
> @@ -1528,7 +1529,7 @@ static inline void *acquire_slab(struct kmem_cache *s,
>  	return freelist;
>  }
>
> -static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
> +static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
>  static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
>
>  /*
> @@ -1539,6 +1540,8 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
>  {
>  	struct page *page, *page2;
>  	void *object = NULL;
> +	int available = 0;
> +	int objects;
>
>  	/*
>  	 * Racy check. If we mistakenly see no partial slabs then we
> @@ -1552,22 +1555,21 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
>  	spin_lock(&n->list_lock);
>  	list_for_each_entry_safe(page, page2, &n->partial, lru) {
>  		void *t;
> -		int available;
>
>  		if (!pfmemalloc_match(page, flags))
>  			continue;
>
> -		t = acquire_slab(s, n, page, object == NULL);
> +		t = acquire_slab(s, n, page, object == NULL, &objects);
>  		if (!t)
>  			break;
>
> +		available += objects;
>  		if (!object) {
>  			c->page = page;
>  			stat(s, ALLOC_FROM_PARTIAL);
>  			object = t;
> -			available = page->objects - page->inuse;
>  		} else {
> -			available = put_cpu_partial(s, page, 0);
> +			put_cpu_partial(s, page, 0);
>  			stat(s, CPU_PARTIAL_NODE);
>  		}
>  		if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
> @@ -1946,7 +1948,7 @@ static void unfreeze_partials(struct kmem_cache *s,
>   * If we did not find a slot then simply move all the partials to the
>   * per node partial list.
>   */
> -static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
> +static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
>  {
>  	struct page *oldpage;
>  	int pages;
> @@ -1984,7 +1986,6 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
>  		page->next = oldpage;
>  	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
> -	return pobjects;
>  }
>
>  static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
> --
> 1.7.9.5
Re: [PATCH 0/8] correct load_balance()
On Mon, Feb 25, 2013 at 01:56:59PM +0900, Joonsoo Kim wrote:
> On Thu, Feb 14, 2013 at 02:48:33PM +0900, Joonsoo Kim wrote:
> > Commit 88b8dac0 makes load_balance() consider other cpus in its
> > group. But there are some missing parts for this feature to work
> > properly. This patchset corrects those things and makes
> > load_balance() robust.
> >
> > Others are related to LBF_ALL_PINNED. This is fallback
> > functionality for when no task can be moved due to cpu affinity.
> > But currently, if the imbalance is not large enough relative to a
> > task's load, we leave the LBF_ALL_PINNED flag set and 'redo' is
> > triggered. This is not our intention, so correct it.
> >
> > These are based on v3.8-rc7.
> >
> > Joonsoo Kim (8):
> >   sched: change position of resched_cpu() in load_balance()
> >   sched: explicitly cpu_idle_type checking in rebalance_domains()
> >   sched: don't consider other cpus in our group in case of NEWLY_IDLE
> >   sched: clean up move_task() and move_one_task()
> >   sched: move up affinity check to mitigate useless redoing overhead
> >   sched: rename load_balance_tmpmask to load_balance_cpu_active
> >   sched: prevent to re-select dst-cpu in load_balance()
> >   sched: reset lb_env when redo in load_balance()
> >
> >  kernel/sched/core.c |    9 +++--
> >  kernel/sched/fair.c |  107 +++++++++++++++++++++++-----------------
> >  2 files changed, 67 insertions(+), 49 deletions(-)
>
> Hello, Ingo and Peter.
> Could you review this patch set?
> Please let me know what I should do to get this merged.

Hello. One more ping :)
Thanks.
[PATCH 2/3] mm, nobootmem: clean-up of free_low_memory_core_early()
Remove the unused argument and make the function static, because there
is no user outside of nobootmem.c.

Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index cdc3bab..5f0b0e1 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -44,7 +44,6 @@ extern unsigned long init_bootmem_node(pg_data_t *pgdat,
 				       unsigned long endpfn);
 extern unsigned long init_bootmem(unsigned long addr, unsigned long memend);
 
-extern unsigned long free_low_memory_core_early(int nodeid);
 extern unsigned long free_all_bootmem_node(pg_data_t *pgdat);
 extern unsigned long free_all_bootmem(void);
 
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 4711e91..589c673 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -120,7 +120,7 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
 	return end_pfn - start_pfn;
 }
 
-unsigned long __init free_low_memory_core_early(int nodeid)
+static unsigned long __init free_low_memory_core_early()
{
 	unsigned long count = 0;
 	phys_addr_t start, end, size;
@@ -170,7 +170,7 @@ unsigned long __init free_all_bootmem(void)
 	 * because in some case like Node0 doesn't have RAM installed
 	 * low ram will be on Node1
 	 */
-	return free_low_memory_core_early(MAX_NUMNODES);
+	return free_low_memory_core_early();
 }
 
 /**
--
1.7.9.5
[PATCH 1/3] mm, nobootmem: fix wrong usage of max_low_pfn
max_low_pfn reflects the number of _pages_ in the system, not the
maximum PFN. You can easily find that fact in init_bootmem(). So fix
it.

Additionally, if 'start_pfn == end_pfn', we don't need to go further,
so change the range check.

Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>

diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 5e07d36..4711e91 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -110,9 +110,9 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
 {
 	unsigned long start_pfn = PFN_UP(start);
 	unsigned long end_pfn = min_t(unsigned long,
-				      PFN_DOWN(end), max_low_pfn);
+				      PFN_DOWN(end), min_low_pfn);
 
-	if (start_pfn > end_pfn)
+	if (start_pfn >= end_pfn)
 		return 0;
 
 	__free_pages_memory(start_pfn, end_pfn);
--
1.7.9.5
[PATCH 3/3] mm, nobootmem: do memset() after memblock_reserve()
Currently, we do memset() before reserving the area. This may not
cause any problem, but it is somewhat weird. So change the execution
order.

Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>

diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 589c673..f11ec1c 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -46,8 +46,8 @@ static void * __init __alloc_memory_core_early(int nid, u64 size, u64 align,
 		return NULL;
 	ptr = phys_to_virt(addr);
-	memset(ptr, 0, size);
 	memblock_reserve(addr, size);
+	memset(ptr, 0, size);
 	/*
 	 * The min_count is set to 0 so that bootmem allocated blocks
 	 * are never reported as leaks.
--
1.7.9.5
Re: [PATCH 2/3] mm, nobootmem: clean-up of free_low_memory_core_early()
On Tue, Mar 19, 2013 at 02:16:00PM +0900, Joonsoo Kim wrote:
> Remove the unused argument and make the function static, because
> there is no user outside of nobootmem.c.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>
>
> -unsigned long __init free_low_memory_core_early(int nodeid)
> +static unsigned long __init free_low_memory_core_early()

Sorry, this patch causes a build warning. Below is the fixed version.

From 05f4a768dd5c514113916908f4710f8863704ed9 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <iamjoonsoo@lge.com>
Date: Mon, 18 Mar 2013 14:17:57 +0900
Subject: [PATCH] mm, nobootmem: clean-up of free_low_memory_core_early()

Remove the unused argument and make the function static, because there
is no user outside of nobootmem.c.

Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index cdc3bab..5f0b0e1 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -44,7 +44,6 @@ extern unsigned long init_bootmem_node(pg_data_t *pgdat,
 				       unsigned long endpfn);
 extern unsigned long init_bootmem(unsigned long addr, unsigned long memend);
 
-extern unsigned long free_low_memory_core_early(int nodeid);
 extern unsigned long free_all_bootmem_node(pg_data_t *pgdat);
 extern unsigned long free_all_bootmem(void);
 
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 4711e91..9c38698 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -120,7 +120,7 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
 	return end_pfn - start_pfn;
 }
 
-unsigned long __init free_low_memory_core_early(int nodeid)
+static unsigned long __init free_low_memory_core_early(void)
{
 	unsigned long count = 0;
 	phys_addr_t start, end, size;
@@ -170,7 +170,7 @@ unsigned long __init free_all_bootmem(void)
 	 * because in some case like Node0 doesn't have RAM installed
 	 * low ram will be on Node1
 	 */
-	return free_low_memory_core_early(MAX_NUMNODES);
+	return free_low_memory_core_early();
}
 
 /**
--
1.7.9.5
Re: [PATCH 3/3] mm, nobootmem: do memset() after memblock_reserve()
On Mon, Mar 18, 2013 at 10:53:04PM -0700, Yinghai Lu wrote:
> On Mon, Mar 18, 2013 at 10:16 PM, Joonsoo Kim <iamjoonsoo@lge.com> wrote:
> > Currently, we do memset() before reserving the area. This may not
> > cause any problem, but it is somewhat weird. So change the
> > execution order.
> >
> >  	ptr = phys_to_virt(addr);
> > -	memset(ptr, 0, size);
> >  	memblock_reserve(addr, size);
> > +	memset(ptr, 0, size);
>
> move down ptr = ... too ?

Okay. I will send v2 soon.
Re: [PATCH 2/3] mm, nobootmem: clean-up of free_low_memory_core_early()
On Mon, Mar 18, 2013 at 10:51:43PM -0700, Yinghai Lu wrote:
> On Mon, Mar 18, 2013 at 10:16 PM, Joonsoo Kim <iamjoonsoo@lge.com> wrote:
> > Remove the unused argument and make the function static, because
> > there is no user outside of nobootmem.c.
> >
> > -unsigned long __init free_low_memory_core_early(int nodeid)
> > +static unsigned long __init free_low_memory_core_early()
>
> (void) ?

Yes, a fixed version has already been sent.
Thanks.
Re: [PATCH 1/3] mm, nobootmem: fix wrong usage of max_low_pfn
On Mon, Mar 18, 2013 at 10:47:41PM -0700, Yinghai Lu wrote:
> On Mon, Mar 18, 2013 at 10:15 PM, Joonsoo Kim <iamjoonsoo@lge.com> wrote:
> > max_low_pfn reflects the number of _pages_ in the system, not the
> > maximum PFN. You can easily find that fact in init_bootmem(). So
> > fix it.
>
> I'm confused. For x86, we have max_low_pfn defined in ...

Below is a quote from Russell King in
'https://lkml.org/lkml/2013/3/13/123':

	Now, max_low_pfn is initialized this way:

	/**
	 * init_bootmem - register boot memory
	 * @start: pfn where the bitmap is to be placed
	 * @pages: number of available physical pages
	 *
	 * Returns the number of bytes needed to hold the bitmap.
	 */
	unsigned long __init init_bootmem(unsigned long start, unsigned long pages)
	{
		max_low_pfn = pages;
		min_low_pfn = start;
		return init_bootmem_core(NODE_DATA(0)->bdata, start, 0, pages);
	}

	So, min_low_pfn is the PFN offset of the start of physical memory
	(so 3GB >> PAGE_SHIFT) and max_low_pfn ends up being the number
	of pages, _not_ the maximum PFN value.

So, if the physical address doesn't start at 0, max_low_pfn doesn't
represent the maximum PFN value. This is the case for ARM.

> #ifdef CONFIG_X86_32
> 	/* max_low_pfn get updated here */
> 	find_low_pfn_range();
> #else
> 	num_physpages = max_pfn;
>
> 	check_x2apic();
>
> 	/* How many end-of-memory variables you have, grandma! */
> 	/* need this before calling reserve_initrd */
> 	if (max_pfn > (1UL << (32 - PAGE_SHIFT)))
> 		max_low_pfn = e820_end_of_low_ram_pfn();
> 	else
> 		max_low_pfn = max_pfn;
>
> and under max_low_pfn is bootmem.
>
> > Additionally, if 'start_pfn == end_pfn', we don't need to go
> > further, so change the range check.
> >
> > Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>
> >
> > 	unsigned long start_pfn = PFN_UP(start);
> > 	unsigned long end_pfn = min_t(unsigned long,
> > -				      PFN_DOWN(end), max_low_pfn);
> > +				      PFN_DOWN(end), min_low_pfn);
>
> what is min_low_pfn ?
> is it 0 for x86?

My implementation is totally wrong. :)
min_low_pfn is not a proper value for this purpose.
I will fix it.

Sorry for the noise.
Thanks.
Re: [PATCH 1/3] mm, nobootmem: fix wrong usage of max_low_pfn
On Tue, Mar 19, 2013 at 03:25:22PM +0900, Joonsoo Kim wrote:
> On Mon, Mar 18, 2013 at 10:47:41PM -0700, Yinghai Lu wrote:
> > what is min_low_pfn ?
> > is it 0 for x86?
>
> My implementation is totally wrong. :)
> min_low_pfn is not a proper value for this purpose.
> I will fix it.

How about using memblock.current_limit?

	unsigned long end_pfn = min_t(unsigned long,
				      PFN_DOWN(end), memblock.current_limit);

Thanks.
Re: [PATCH 1/3] mm, nobootmem: fix wrong usage of max_low_pfn
On Tue, Mar 19, 2013 at 12:35:45AM -0700, Yinghai Lu wrote:
> Can you check why sparc did not need to change this interface when
> it was converted from bootmem to memblock?

Sure. According to my understanding of the sparc32 code
(arch/sparc/mm/init_32.c), it already uses max_low_pfn as the maximum
PFN value, not as the number of pages.
Re: [PATCH 2/8] sched: explicitly cpu_idle_type checking in rebalance_domains()
Hello, Peter.

On Tue, Mar 19, 2013 at 03:02:21PM +0100, Peter Zijlstra wrote:
> On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote:
> > After commit 88b8dac0, dst-cpu can be changed in load_balance(),
> > so we can't know the cpu_idle_type of dst-cpu when load_balance()
> > returns a positive value. So add explicit cpu_idle_type checking.
>
> No real objection I suppose, but did you actually see this go wrong?

No, I found it while reviewing the whole scheduler code.

Thanks.
Re: [PATCH 3/8] sched: don't consider other cpus in our group in case of NEWLY_IDLE
On Tue, Mar 19, 2013 at 03:20:57PM +0100, Peter Zijlstra wrote:
> On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote:
> > Commit 88b8dac0 makes load_balance() consider other cpus in its
> > group, regardless of idle type. When we do NEWLY_IDLE balancing,
> > we should not consider them, because the motivation of NEWLY_IDLE
> > balancing is to turn this cpu into a non-idle state if needed.
> > This is not the case for other cpus. So change the code not to
> > consider other cpus for NEWLY_IDLE balancing.
> >
> > With this patch, the assignment 'if (pulled_task)
> > this_rq->idle_stamp = 0' in idle_balance() is corrected, because
> > NEWLY_IDLE balancing no longer considers other cpus. Assigning to
> > 'this_rq->idle_stamp' is now valid.
> >
> > Cc: Srivatsa Vaddagiri <va...@linux.vnet.ibm.com>
> > Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>
>
> Fair enough, good catch.
>
> Acked-by: Peter Zijlstra <a.p.zijls...@chello.nl>
>
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 0c6aaf6..97498f4 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5016,8 +5016,15 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> >  		.cpus		= cpus,
> >  	};
> >
> > +	/*
> > +	 * For NEWLY_IDLE load balancing, we don't need to consider
> > +	 * other cpus in our group
> > +	 */
> > +	if (idle == CPU_NEWLY_IDLE) {
> > +		env.dst_grpmask = NULL;
> > +		max_lb_iterations = 0;
>
> Just a small nit; I don't think we'll ever get to evaluate
> max_lb_iterations when !dst_grpmask. So strictly speaking it's
> superfluous to touch it.

Okay. In the next spin, I will remove it and add a comment here.

Thanks.

> > +	} else {
> > +		max_lb_iterations = cpumask_weight(env.dst_grpmask);
> > +	}
> >  	cpumask_copy(cpus, cpu_active_mask);
> > -	max_lb_iterations = cpumask_weight(env.dst_grpmask);
> >
> >  	schedstat_inc(sd, lb_count[idle]);
Re: [PATCH 4/8] sched: clean up move_task() and move_one_task()
On Tue, Mar 19, 2013 at 03:30:15PM +0100, Peter Zijlstra wrote:
> On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote:
> > Some validation for task moving is performed in move_tasks() and
> > move_one_task(). We can move this code to can_migrate_task(),
> > which already exists for this purpose.
> >
> > @@ -4011,18 +4027,7 @@ static int move_tasks(struct lb_env *env)
> >  			break;
> >  		}
> >
> > -		if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
> > -			goto next;
> > -
> > -		load = task_h_load(p);
> > -
> > -		if (sched_feat(LB_MIN) && load < 16 && !env->sd->nr_balance_failed)
> > -			goto next;
> > -
> > -		if ((load / 2) > env->imbalance)
> > -			goto next;
> > -
> > -		if (!can_migrate_task(p, env))
> > +		if (!can_migrate_task(p, env, false, load))
> >  			goto next;
> >
> >  		move_task(p, env);
>
> Right, so I'm not so taken with this one. The whole load stuff
> really is a balance heuristic that's part of move_tasks();
> move_one_task() really doesn't care about that.
>
> So why did you include it? Purely so you didn't have to re-order the
> tests? I don't see any reason not to flip the tests around.

I don't think I fully understand what you are concerned about, because
of my poor English. If possible, please elaborate on the problem in
more detail. First of all, I will do my best to answer your question.

Patches 4/8 and 5/8 are for mitigating the useless redo overhead
caused by LBF_ALL_PINNED. For this purpose, we should check 'cpu
affinity' before evaluating the load. Just moving can_migrate_task()
up above the load evaluation code may cause a side effect, because
can_migrate_task() has another check, for 'cache hotness'. I don't
want a side effect. So I embedded the load evaluation in
can_migrate_task(), re-ordered the checks, and disabled load
evaluation for move_one_task().

If your recommendation is to move can_migrate_task() up above the load
evaluation code, yes, I can and will do that. :)

Please let me know what I have misunderstood.

Thanks.
Re: [PATCH 6/8] sched: rename load_balance_tmpmask to load_balance_cpu_active
On Tue, Mar 19, 2013 at 04:01:01PM +0100, Peter Zijlstra wrote:
> On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote:
> > This name doesn't convey a specific meaning, so rename it to imply its
> > purpose.
> >
> > Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 26058d0..e6f8783 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -6814,7 +6814,7 @@ struct task_group root_task_group;
> >  LIST_HEAD(task_groups);
> >  #endif
> >
> > -DECLARE_PER_CPU(cpumask_var_t, load_balance_tmpmask);
> > +DECLARE_PER_CPU(cpumask_var_t, load_balance_cpu_active);
>
> That's not much better; how about we call it: load_balance_mask?

Okay, in the next spin, I will call it load_balance_mask.

Thanks.
Re: [PATCH 7/8] sched: prevent to re-select dst-cpu in load_balance()
On Tue, Mar 19, 2013 at 04:05:46PM +0100, Peter Zijlstra wrote:
> On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote:
> > Commit 88b8dac0 makes load_balance() consider other cpus in its group.
> > But there is no code there for preventing the re-selection of dst_cpu,
> > so the same dst_cpu can be selected over and over. This patch adds
> > functionality to load_balance() in order to exclude a cpu which has
> > been selected once.
>
> Oh man.. seriously? Did you see this happen?
>
> Also, can't we simply remove it from lb->cpus?

I didn't see it; I am just thinking it through logically. :)

lb->cpus is for source cpus and dst_cpu is for destination cpus, so
removing it from lb->cpus doesn't work.

Thanks.
Re: [PATCH 8/8] sched: reset lb_env when redo in load_balance()
On Tue, Mar 19, 2013 at 04:21:23PM +0100, Peter Zijlstra wrote:
> On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote:
> > Commit 88b8dac0 makes load_balance() consider other cpus in its group.
> > So now, when we redo in load_balance(), we should reset some fields of
> > lb_env to ensure that load_balance() works for the initial cpu, not
> > for the other cpus in its group. So correct it.
> >
> > Cc: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
> > Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 70631e8..25c798c 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5014,14 +5014,20 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> >
> >  	struct lb_env env = {
> >  		.sd		= sd,
> > -		.dst_cpu	= this_cpu,
> > -		.dst_rq		= this_rq,
> >  		.dst_grpmask	= dst_grp,
> >  		.idle		= idle,
> > -		.loop_break	= sched_nr_migrate_break,
> >  		.cpus		= cpus,
> >  	};
> >
> > +	schedstat_inc(sd, lb_count[idle]);
> > +	cpumask_copy(cpus, cpu_active_mask);
> > +
> > +redo:
> > +	env.dst_cpu = this_cpu;
> > +	env.dst_rq = this_rq;
> > +	env.loop = 0;
> > +	env.loop_break = sched_nr_migrate_break;
> > +
> >  	/* For NEWLY_IDLE load_balancing, we don't need to consider
> >  	 * other cpus in our group */
> >  	if (idle == CPU_NEWLY_IDLE) {
>
> OK, so this is the case where we tried to balance !this_cpu and found
> ALL_PINNED, right?
>
> You can only get here in very weird cases where people love their
> sched_setaffinity() way too much; do we care? Why not give up?

Now that you mention it, I have no strong reason for this patch. I think
that giving up is preferable. I will omit this patch in the next spin.

> Also, looking at this, shouldn't we consider env->cpus in
> can_migrate_task() where we compute new_dst_cpu?

As previously stated, env->cpus is for src cpus, so when we decide
dst_cpu, it doesn't matter.

Really, thanks for the detailed review of all of this patchset.

Thanks.
Re: [PATCH 4/8] sched: clean up move_task() and move_one_task()
2013/3/20 Peter Zijlstra pet...@infradead.org:
> On Wed, 2013-03-20 at 16:33 +0900, Joonsoo Kim wrote:
> > > Right, so I'm not so taken with this one. The whole load stuff
> > > really is a balance heuristic that's part of move_tasks();
> > > move_one_task() really doesn't care about that.
> > >
> > > So why did you include it? Purely so you didn't have to re-order the
> > > tests? I don't see any reason not to flip a few tests around.
> >
> > I don't think I fully understand what you are concerned about, because
> > of my poor English. If possible, please elaborate on the problem in
> > more detail.
>
> OK, so your initial changelog said it wanted to remove some code
> duplication between move_tasks() and move_one_task(); but then you put
> in the load heuristics and add a boolean argument to only enable those
> for move_tasks() -- so clearly that wasn't duplicated. So why move that
> code?
>
> I proposed that this was due to a reluctance to re-arrange the various
> tests that stop the migration from happening. Now you say:
>
> > ... Just moving can_migrate_task() up above the load evaluation code
> > may cause a side effect, because can_migrate_task() has another check,
> > for 'cache hotness'. I don't want a side effect. So I embedded the
> > load evaluation into can_migrate_task(), re-ordered the checks, and
> > made the load evaluation disabled for move_one_task().
>
> Which pretty much affirms this. However, I also said that I don't think
> the order really matters that much; each test will cancel the migration
> of this task; the order of these tests seems immaterial.
>
> > If your recommendation is to move can_migrate_task() up above the load
> > evaluation code, yes, I can, and will do that. :)
>
> I would actually propose moving the throttled test into
> can_migrate_task() and leave it at that.

Okay. I will do that in the next spin.

Thanks!!
Re: [PATCH 7/8] sched: prevent to re-select dst-cpu in load_balance()
2013/3/20 Peter Zijlstra pet...@infradead.org:
> On Wed, 2013-03-20 at 16:43 +0900, Joonsoo Kim wrote:
> > On Tue, Mar 19, 2013 at 04:05:46PM +0100, Peter Zijlstra wrote:
> > > On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote:
> > > > Commit 88b8dac0 makes load_balance() consider other cpus in its
> > > > group. But there is no code there for preventing the re-selection
> > > > of dst_cpu, so the same dst_cpu can be selected over and over.
> > > > This patch adds functionality to load_balance() in order to
> > > > exclude a cpu which has been selected once.
> > >
> > > Oh man.. seriously? Did you see this happen?
> > >
> > > Also, can't we simply remove it from lb->cpus?
> >
> > I didn't see it; I am just thinking it through logically. :)
> >
> > lb->cpus is for source cpus and dst_cpu is for destination cpus, so
> > removing it from lb->cpus doesn't work.
>
> How about we interpret ->cpus as the total mask to balance, so both
> source and destination? That way, clearing a cpu means we won't take
> nor put tasks on it.

On a quick thought, it may be possible. I will try to do it.

Thanks.
Re: [PATCHSET wq/for-3.10] workqueue: break up workqueue_lock into multiple locks
2013/3/19 Tejun Heo t...@kernel.org:
> On Wed, Mar 13, 2013 at 07:57:18PM -0700, Tejun Heo wrote:
> > and available in the following git branch.
> >
> >  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git review-finer-locking
>
> Applied to wq/for-3.10.

Hello, Tejun.

I know I am late, but please give me a chance to ask a question.
Is finer locking for the workqueue code really needed? Is there a
performance issue? I think that there are too many locks and locking
rules, although the description of these is very nice.

Thanks.
Re: [PATCH 01/10] workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]
2013/3/20 Tejun Heo t...@kernel.org:
> Unbound workqueues are going to be NUMA-affine. Add wq_numa_tbl_len and
> wq_numa_possible_cpumask[] in preparation. The former is the highest
> NUMA node ID + 1 and the latter is masks of possible CPUs for each NUMA
> node.

It is better to move this code to topology.c or cpumask.c; then it can be
used generally.

Thanks.

> This patch only introduces these. Future patches will make use of them.
>
> Signed-off-by: Tejun Heo t...@kernel.org
> ---
>  kernel/workqueue.c | 35 ++-
>  1 file changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 775c2f4..9b096e3 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -44,6 +44,7 @@
>  #include <linux/jhash.h>
>  #include <linux/hashtable.h>
>  #include <linux/rculist.h>
> +#include <linux/nodemask.h>
>
>  #include "workqueue_internal.h"
>
> @@ -256,6 +257,11 @@ struct workqueue_struct {
>
>  static struct kmem_cache *pwq_cache;
>
> +static int wq_numa_tbl_len;		/* highest possible NUMA node id + 1 */
> +static cpumask_var_t *wq_numa_possible_cpumask;
> +					/* possible CPUs of each node, may
> +					   be NULL if init failed */
> +
>  static DEFINE_MUTEX(wq_mutex);		/* protects workqueues and pools */
>  static DEFINE_SPINLOCK(pwq_lock);	/* protects pool_workqueues */
>  static DEFINE_SPINLOCK(wq_mayday_lock);	/* protects wq->maydays list */
>
> @@ -4416,7 +4422,7 @@ out_unlock:
>  static int __init init_workqueues(void)
>  {
>  	int std_nice[NR_STD_WORKER_POOLS] = { 0, HIGHPRI_NICE_LEVEL };
> -	int i, cpu;
> +	int i, node, cpu;
>
>  	/* make sure we have enough bits for OFFQ pool ID */
>  	BUILD_BUG_ON((1LU << (BITS_PER_LONG - WORK_OFFQ_POOL_SHIFT)) <
> @@ -4429,6 +4435,33 @@ static int __init init_workqueues(void)
>  	cpu_notifier(workqueue_cpu_up_callback, CPU_PRI_WORKQUEUE_UP);
>  	hotcpu_notifier(workqueue_cpu_down_callback, CPU_PRI_WORKQUEUE_DOWN);
>
> +	/* determine NUMA pwq table len - highest node id + 1 */
> +	for_each_node(node)
> +		wq_numa_tbl_len = max(wq_numa_tbl_len, node + 1);
> +
> +	/*
> +	 * We want masks of possible CPUs of each node which isn't readily
> +	 * available. Build one from cpu_to_node() which should have been
> +	 * fully initialized by now.
> +	 */
> +	wq_numa_possible_cpumask = kzalloc(wq_numa_tbl_len *
> +					   sizeof(wq_numa_possible_cpumask[0]),
> +					   GFP_KERNEL);
> +	BUG_ON(!wq_numa_possible_cpumask);
> +
> +	for_each_node(node)
> +		BUG_ON(!alloc_cpumask_var_node(&wq_numa_possible_cpumask[node],
> +					       GFP_KERNEL, node));
> +	for_each_possible_cpu(cpu) {
> +		node = cpu_to_node(cpu);
> +		if (WARN_ON(node == NUMA_NO_NODE)) {
> +			pr_err("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu);
> +			wq_numa_possible_cpumask = NULL;
> +			break;
> +		}
> +		cpumask_set_cpu(cpu, wq_numa_possible_cpumask[node]);
> +	}
> +
>  	/* initialize CPU pools */
>  	for_each_possible_cpu(cpu) {
>  		struct worker_pool *pool;
> --
> 1.8.1.4
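The init loop in this patch — deriving a per-node mask of possible CPUs from the cpu-to-node mapping, with the table length being the highest node id + 1 — can be modeled in plain userspace C. The names, bounds, and plain bitmasks below are illustrative stand-ins for cpumask_var_t and cpu_to_node(), not the kernel API:

```c
#define MAX_NODES 4

/* Userspace model: build one "possible CPUs" bitmask per NUMA node from
 * a cpu->node mapping.  Returns the table length (highest node id + 1),
 * or -1 if any cpu has no valid node mapping (mirroring the patch's
 * "disable NUMA support" fallback). */
static int build_node_masks(const int cpu_to_node[], int nr_cpus,
			    unsigned long node_mask[MAX_NODES])
{
	int cpu, tbl_len = 0;

	for (int n = 0; n < MAX_NODES; n++)
		node_mask[n] = 0;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		int node = cpu_to_node[cpu];

		if (node < 0 || node >= MAX_NODES)
			return -1;		/* mapping unavailable */
		node_mask[node] |= 1UL << cpu;	/* cpumask_set_cpu() */
		if (node + 1 > tbl_len)
			tbl_len = node + 1;	/* highest node id + 1 */
	}
	return tbl_len;
}
```

For a 4-cpu, 2-node layout where cpus 0-1 sit on node 0 and cpus 2-3 on node 1, this yields a table length of 2 with masks 0x3 and 0xc.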
[RFC PATCH 0/6] ARM: use NO_BOOTMEM on default configuration
Currently, ARM uses the traditional 'bootmem' allocator. It uses a bitmap
for managing memory space, so it must initialize that bitmap as a first
step. This is needless overhead if we use 'nobootmem'. 'nobootmem' uses
the memblock allocator internally, so there is no additional
initialization overhead.

In addition, if we use 'nobootmem', we can save a small amount of memory,
because 'nobootmem' manages memory space in byte units. 'bootmem' manages
it in page units, so some space is wasted, although it is very small. On
my system, 20 KB of memory is saved. :)

Using 'nobootmem' has another advantage. Before initializing the
'bootmem' allocator, we use the memblock allocator. If we use the
memblock allocator after initializing 'bootmem' by mistake, there is a
possible problem. Patch '1/6' is a good example of this case. 'nobootmem'
uses the memblock allocator internally, so this risk disappears.

There is one stopper to enabling NO_BOOTMEM: max_low_pfn. nobootmem uses
max_low_pfn for computing a boundary in free_all_bootmem(), so we need a
proper value for max_low_pfn.

But there is some difficulty related to max_low_pfn. max_low_pfn is used
with two meanings across the various architectures. One is the number of
pages in lowmem and the other is the maximum lowmem pfn. Now, in ARM, it
is used as the number of pages in lowmem. You can get more information in
the links below.

http://lwn.net/Articles/543408/
http://lwn.net/Articles/543424/

As I investigated, the architectures which use max_low_pfn as a maximum
pfn outnumber the others, so IMHO, changing the meaning of max_low_pfn to
the maximum pfn is the preferable solution to me.

This patchset changes max_low_pfn to be the maximum lowmem pfn in ARM. In
addition, min_low_pfn and max_pfn are assigned according to this
criterion. AFAIK, there is no real user of max_low_pfn except
block/blk-settings.c, and blk-settings.c assumes that max_low_pfn is the
maximum lowmem pfn, so this patch may not harm anything. But I'm not an
expert on this, so please let me know what I am missing.
I did some working tests on my android device and it worked. :)
Feel free to give me your opinions about this patchset.

This patchset is based on v3.9-rc4.

Thanks.

Joonsoo Kim (6):
  ARM, TCM: initialize TCM in paging_init(), instead of setup_arch()
  ARM, crashkernel: use ___alloc_bootmem_node_nopanic() for reserving
    memory
  ARM, crashkernel: correct total_mem size in reserve_crashkernel()
  ARM, mm: don't do arm_bootmem_init() if CONFIG_NO_BOOTMEM
  ARM, mm: change meaning of max_low_pfn to maximum pfn for nobootmem
  ARM, mm: enable NO_BOOTMEM for default ARM build

 arch/arm/Kconfig        |  1 +
 arch/arm/kernel/setup.c | 22 --
 arch/arm/kernel/tcm.c   |  1 -
 arch/arm/kernel/tcm.h   | 17 -
 arch/arm/mm/init.c      | 19 ---
 arch/arm/mm/mmu.c       |  2 ++
 arch/arm/mm/tcm.h       | 17 +
 7 files changed, 40 insertions(+), 39 deletions(-)
 delete mode 100644 arch/arm/kernel/tcm.h
 create mode 100644 arch/arm/mm/tcm.h

--
1.7.9.5
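The memory-saving argument of the cover letter — page-granular bootmem rounds every early allocation up to a whole page, while byte-granular nobootmem/memblock does not — can be illustrated with a small userspace calculation. The 4096-byte page size and the sample allocation sizes are assumptions for illustration only:

```c
#include <stddef.h>

#define PAGE_SIZE_SK 4096UL	/* assumed page size for the example */

/* Total memory consumed if every request is rounded up to a page,
 * as a page-granular early allocator would do. */
static unsigned long page_granular_total(const unsigned long sizes[],
					 size_t n)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++)
		total += (sizes[i] + PAGE_SIZE_SK - 1)
			 / PAGE_SIZE_SK * PAGE_SIZE_SK;
	return total;
}

/* Total memory consumed by a byte-granular allocator: just the sum. */
static unsigned long byte_granular_total(const unsigned long sizes[],
					 size_t n)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++)
		total += sizes[i];
	return total;
}
```

Three early allocations of 4, 100, and 5000 bytes cost 16384 bytes under page granularity but only 5104 bytes under byte granularity — the same effect, scaled down, as the ~20 KB saving reported above.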
[RFC PATCH 4/6] ARM, mm: don't do arm_bootmem_init() if CONFIG_NO_BOOTMEM
arm_bootmem_init() initializes a bitmap for bootmem, and this is not
needed for CONFIG_NO_BOOTMEM. So skip it when CONFIG_NO_BOOTMEM is set.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index ad722f1..049414a 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -149,6 +149,12 @@ static void __init find_limits(unsigned long *min, unsigned long *max_low,
 	*max_high = bank_pfn_end(&mi->bank[mi->nr_banks - 1]);
 }

+#ifdef CONFIG_NO_BOOTMEM
+static inline void __init arm_bootmem_init(unsigned long start_pfn,
+	unsigned long end_pfn)
+{
+}
+#else
 static void __init arm_bootmem_init(unsigned long start_pfn,
 	unsigned long end_pfn)
 {
@@ -169,7 +175,6 @@ static void __init arm_bootmem_init(unsigned long start_pfn,
 	 * Initialise the bootmem allocator, handing the
 	 * memory banks over to bootmem.
 	 */
-	node_set_online(0);
 	pgdat = NODE_DATA(0);
 	init_bootmem_node(pgdat, __phys_to_pfn(bitmap), start_pfn, end_pfn);

@@ -200,6 +205,7 @@ static void __init arm_bootmem_init(unsigned long start_pfn,
 				(end - start) << PAGE_SHIFT, BOOTMEM_DEFAULT);
 	}
 }
+#endif

 #ifdef CONFIG_ZONE_DMA

@@ -392,6 +398,7 @@ void __init bootmem_init(void)

 	find_limits(&min, &max_low, &max_high);

+	node_set_online(0);
 	arm_bootmem_init(min, max_low);

 	/*
--
1.7.9.5
[RFC PATCH 2/6] ARM, crashkernel: use ___alloc_bootmem_node_nopanic() for reserving memory
For crashkernel, a specific address should be reserved. This can be
achieved by reserve_bootmem(), but that function is only for bootmem. Now
we are trying to enable nobootmem, therefore change it to the more
general function ___alloc_bootmem_node_nopanic(), which can be used for
both bootmem and nobootmem.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index b3990a3..99ffe87 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -674,15 +674,20 @@ static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_size, crash_base;
 	unsigned long long total_mem;
+	unsigned long limit = 0;
 	int ret;

 	total_mem = get_total_mem();
 	ret = parse_crashkernel(boot_command_line, total_mem,
 				&crash_size, &crash_base);
-	if (ret)
+	if (ret != 0 || crash_size == 0)
 		return;

-	ret = reserve_bootmem(crash_base, crash_size, BOOTMEM_EXCLUSIVE);
+	if (crash_base != 0)
+		limit = crash_base + crash_size;
+
+	ret = ___alloc_bootmem_node_nopanic(NODE_DATA(0), crash_size,
+				PAGE_ALIGN, crash_base, limit);
 	if (ret < 0) {
 		printk(KERN_WARNING "crashkernel reservation failed - "
 		       "memory is in use (0x%lx)\n", (unsigned long)crash_base);
--
1.7.9.5
[RFC PATCH 1/6] ARM, TCM: initialize TCM in paging_init(), instead of setup_arch()
tcm_init() calls iotable_init(), which uses early_alloc variants that do
memblock allocation. Directly using memblock allocation after
initializing bootmem should not be permitted, because bootmem can't know
what has additionally been reserved. So move tcm_init() to a safe place,
before initializing bootmem.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 3f6cbb2..b3990a3 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -56,7 +56,6 @@
 #include <asm/virt.h>

 #include "atags.h"
-#include "tcm.h"

 #if defined(CONFIG_FPE_NWFPE) || defined(CONFIG_FPE_FASTFPE)
@@ -778,8 +777,6 @@ void __init setup_arch(char **cmdline_p)

 	reserve_crashkernel();

-	tcm_init();
-
 #ifdef CONFIG_MULTI_IRQ_HANDLER
 	handle_arch_irq = mdesc->handle_irq;
 #endif
diff --git a/arch/arm/kernel/tcm.c b/arch/arm/kernel/tcm.c
index 30ae6bb..f50f19e 100644
--- a/arch/arm/kernel/tcm.c
+++ b/arch/arm/kernel/tcm.c
@@ -17,7 +17,6 @@
 #include <asm/mach/map.h>
 #include <asm/memory.h>
 #include <asm/system_info.h>
-#include "tcm.h"

 static struct gen_pool *tcm_pool;
 static bool dtcm_present;
diff --git a/arch/arm/kernel/tcm.h b/arch/arm/kernel/tcm.h
deleted file mode 100644
index 8015ad4..000
--- a/arch/arm/kernel/tcm.h
+++ /dev/null
@@ -1,17 +0,0 @@
-/*
- * Copyright (C) 2008-2009 ST-Ericsson AB
- * License terms: GNU General Public License (GPL) version 2
- * TCM memory handling for ARM systems
- *
- * Author: Linus Walleij linus.wall...@stericsson.com
- * Author: Rickard Andersson rickard.anders...@stericsson.com
- */
-
-#ifdef CONFIG_HAVE_TCM
-void __init tcm_init(void);
-#else
-/* No TCM support, just blank inlines to be optimized out */
-inline void tcm_init(void)
-{
-}
-#endif
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index e95a996..a7f127d 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -34,6 +34,7 @@
 #include <asm/mach/pci.h>

 #include "mm.h"
+#include "tcm.h"

 /*
  * empty_zero_page is a special page that is used for
@@ -1256,6 +1257,7 @@ void __init paging_init(struct machine_desc *mdesc)
 	dma_contiguous_remap();
 	devicemaps_init(mdesc);
 	kmap_init();
+	tcm_init();

 	top_pmd = pmd_off_k(0x);

diff --git a/arch/arm/mm/tcm.h b/arch/arm/mm/tcm.h
new file mode 100644
index 000..8015ad4
--- /dev/null
+++ b/arch/arm/mm/tcm.h
@@ -0,0 +1,17 @@
+/*
+ * Copyright (C) 2008-2009 ST-Ericsson AB
+ * License terms: GNU General Public License (GPL) version 2
+ * TCM memory handling for ARM systems
+ *
+ * Author: Linus Walleij linus.wall...@stericsson.com
+ * Author: Rickard Andersson rickard.anders...@stericsson.com
+ */
+
+#ifdef CONFIG_HAVE_TCM
+void __init tcm_init(void);
+#else
+/* No TCM support, just blank inlines to be optimized out */
+inline void tcm_init(void)
+{
+}
+#endif
--
1.7.9.5
[RFC PATCH 3/6] ARM, crashkernel: correct total_mem size in reserve_crashkernel()
There are some platforms which have highmem, so this equation doesn't
represent the total_mem size properly. In addition, max_low_pfn's meaning
is different on other architectures and it is scheduled to be changed, so
remove the code related to max_low_pfn.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 99ffe87..1149988 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -655,14 +655,6 @@ static int __init init_machine_late(void)
 late_initcall(init_machine_late);

 #ifdef CONFIG_KEXEC
-static inline unsigned long long get_total_mem(void)
-{
-	unsigned long total;
-
-	total = max_low_pfn - min_low_pfn;
-	return total << PAGE_SHIFT;
-}
-
 /**
  * reserve_crashkernel() - reserves memory are for crash kernel
  *
@@ -677,7 +669,7 @@ static void __init reserve_crashkernel(void)
 	unsigned long limit = 0;
 	int ret;

-	total_mem = get_total_mem();
+	total_mem = memblock_phys_mem_size();
 	ret = parse_crashkernel(boot_command_line, total_mem,
 				&crash_size, &crash_base);
 	if (ret != 0 || crash_size == 0)
--
1.7.9.5
[RFC PATCH 5/6] ARM, mm: change meaning of max_low_pfn to maximum pfn for nobootmem
nobootmem uses max_low_pfn for computing a boundary in
free_all_bootmem(), so we need a proper value for max_low_pfn.

But there is some difficulty related to max_low_pfn. max_low_pfn is used
with two meanings across the various architectures. One is the number of
pages in lowmem and the other is the maximum lowmem pfn. Now, in ARM, it
is used as the number of pages in lowmem. You can get more information in
the links below.

http://lwn.net/Articles/543408/
http://lwn.net/Articles/543424/

As I investigated, the architectures which use max_low_pfn as a maximum
pfn outnumber the others, so changing the meaning of max_low_pfn to the
maximum pfn is the preferable solution to me.

This patch changes max_low_pfn to be the maximum lowmem pfn in ARM. In
addition, min_low_pfn and max_pfn are assigned according to this
criterion.

There is no real user of max_low_pfn except block/blk-settings.c, and
blk-settings.c assumes that max_low_pfn is the maximum lowmem pfn, so
this patch may not harm anything.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 049414a..873f4ca 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -423,12 +423,10 @@ void __init bootmem_init(void)
 	 * This doesn't seem to be used by the Linux memory manager any
 	 * more, but is used by ll_rw_block. If we can get rid of it, we
 	 * also get rid of some of the stuff above as well.
-	 *
-	 * Note: max_low_pfn and max_pfn reflect the number of _pages_ in
-	 * the system, not the maximum PFN.
 	 */
-	max_low_pfn = max_low - PHYS_PFN_OFFSET;
-	max_pfn = max_high - PHYS_PFN_OFFSET;
+	min_low_pfn = min;
+	max_low_pfn = max_low;
+	max_pfn = max_high;
 }

 static inline int free_area(unsigned long pfn, unsigned long end, char *s)
@@ -544,7 +542,7 @@ static void __init free_unused_memmap(struct meminfo *mi)
 static void __init free_highpages(void)
 {
 #ifdef CONFIG_HIGHMEM
-	unsigned long max_low = max_low_pfn + PHYS_PFN_OFFSET;
+	unsigned long max_low = max_low_pfn;
 	struct memblock_region *mem, *res;

 	/* set highmem page free */
--
1.7.9.5
[RFC PATCH 6/6] ARM, mm: enable NO_BOOTMEM for default ARM build
If we use NO_BOOTMEM, we don't need to initialize a bitmap or do bitmap
operations, so we can boot slightly faster.

An additional advantage of enabling NO_BOOTMEM is saving more memory. The
bootmem allocator manages memory in page units, so if we request a 4-byte
area from bootmem, it actually gives us a PAGE_SIZE area. nobootmem
manages memory in byte units, so there is no such waste.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 13b7394..8b73417 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -58,6 +58,7 @@ config ARM
 	select CLONE_BACKWARDS
 	select OLD_SIGSUSPEND3
 	select OLD_SIGACTION
+	select NO_BOOTMEM
 	help
 	  The ARM series is a line of low-power-consumption RISC chip
 	  designs licensed by ARM Ltd and targeted at embedded applications and
--
1.7.9.5
[PATCH v2 0/6] correct load_balance()
Commit 88b8dac0 makes load_balance() consider other cpus in its group.
But there are some missing parts needed for this feature to work
properly. This patchset corrects these things and makes load_balance()
robust.

Others are related to LBF_ALL_PINNED. This is fallback functionality for
when no task can be moved because of cpu affinity. But currently, if the
imbalance is not large enough compared to a task's load, we leave the
LBF_ALL_PINNED flag set and a 'redo' is triggered. This is not our
intention, so correct it.

These are based on v3.9-rc4.

Changelog
v1->v2: Changes from Peter's suggestions
[4/6]: don't include code to evaluate the load value in can_migrate_task()
[5/6]: rename load_balance_tmpmask to load_balance_mask
[6/6]: don't use one more cpumask; use env's cpus to prevent re-selection

Joonsoo Kim (6):
  sched: change position of resched_cpu() in load_balance()
  sched: explicitly cpu_idle_type checking in rebalance_domains()
  sched: don't consider other cpus in our group in case of NEWLY_IDLE
  sched: move up affinity check to mitigate useless redoing overhead
  sched: rename load_balance_tmpmask to load_balance_mask
  sched: prevent to re-select dst-cpu in load_balance()

 kernel/sched/core.c |  4 +--
 kernel/sched/fair.c | 67 +++
 2 files changed, 38 insertions(+), 33 deletions(-)

--
1.7.9.5
[PATCH v2 1/6] sched: change position of resched_cpu() in load_balance()
cur_ld_moved is reset if env.flags hits LBF_NEED_BREAK, so there is a
possibility that we miss calling resched_cpu(). Correct it by moving
resched_cpu() before the LBF_NEED_BREAK check.

Acked-by: Peter Zijlstra a.p.zijls...@chello.nl
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a33e59..f084069 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5061,17 +5061,17 @@ more_balance:
 		double_rq_unlock(env.dst_rq, busiest);
 		local_irq_restore(flags);

-		if (env.flags & LBF_NEED_BREAK) {
-			env.flags &= ~LBF_NEED_BREAK;
-			goto more_balance;
-		}
-
 		/*
 		 * some other cpu did the load balance for us.
 		 */
 		if (cur_ld_moved && env.dst_cpu != smp_processor_id())
 			resched_cpu(env.dst_cpu);

+		if (env.flags & LBF_NEED_BREAK) {
+			env.flags &= ~LBF_NEED_BREAK;
+			goto more_balance;
+		}
+
 		/*
 		 * Revisit (affine) tasks on src_cpu that couldn't be moved to
 		 * us and move them to an alternate dst_cpu in our sched_group
--
1.7.9.5
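The ordering hazard this patch fixes can be modeled in a few lines: a restart path that resets cur_ld_moved hides a pending resched if the resched check sits after the restart. This is a purely illustrative userspace model of the control flow, not the kernel code:

```c
#include <stdbool.h>

/* Model of load_balance()'s resched-vs-NEED_BREAK ordering.  Returns
 * whether a resched was ever signalled.  With the patched order (check
 * before the restart), tasks moved on the first pass are never lost;
 * with the old order the restart resets cur_ld_moved first. */
static bool balance_pass(int moved_first_pass, int moved_second_pass,
			 bool resched_before_break_check)
{
	bool resched = false;
	int cur_ld_moved = moved_first_pass;
	bool need_break = true;	/* the first pass hits LBF_NEED_BREAK */

more_balance:
	if (resched_before_break_check && cur_ld_moved)
		resched = true;		/* patched order: checked each pass */
	if (need_break) {
		need_break = false;
		cur_ld_moved = moved_second_pass;	/* reset on redo */
		goto more_balance;
	}
	if (!resched_before_break_check && cur_ld_moved)
		resched = true;		/* old order: only sees final value */
	return resched;
}
```

With tasks moved only on the first pass, the old order misses the resched while the patched order catches it.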
[PATCH v2 3/6] sched: don't consider other cpus in our group in case of NEWLY_IDLE
Commit 88b8dac0 makes load_balance() consider other cpus in its group,
regardless of idle type. When we do NEWLY_IDLE balancing, we should not
consider other cpus, because the motivation of NEWLY_IDLE balancing is to
turn this cpu into a non-idle state if needed. This is not the case for
the other cpus. So change the code not to consider other cpus for
NEWLY_IDLE balancing.

With this patch, the assignment 'if (pulled_task) this_rq->idle_stamp =
0' in idle_balance() is corrected, because NEWLY_IDLE balancing doesn't
consider other cpus. Assigning to 'this_rq->idle_stamp' is now valid.

Cc: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Acked-by: Peter Zijlstra a.p.zijls...@chello.nl
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9d693d0..3f8c4f2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5007,8 +5007,17 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 		.cpus		= cpus,
 	};

+	/* For NEWLY_IDLE load_balancing, we don't need to consider
+	 * other cpus in our group */
+	if (idle == CPU_NEWLY_IDLE) {
+		env.dst_grpmask = NULL;
+		/* we don't care max_lb_iterations in this case,
+		 * in following patch, this will be removed */
+		max_lb_iterations = 0;
+	} else {
+		max_lb_iterations = cpumask_weight(env.dst_grpmask);
+	}
 	cpumask_copy(cpus, cpu_active_mask);
-	max_lb_iterations = cpumask_weight(env.dst_grpmask);

 	schedstat_inc(sd, lb_count[idle]);
--
1.7.9.5
[PATCH v2 2/6] sched: add explicit cpu_idle_type checking in rebalance_domains()
After commit 88b8dac0, dst_cpu can be changed in load_balance(), so we can't know the cpu_idle_type of dst_cpu when load_balance() returns a positive value. So add explicit cpu_idle_type checking.

Cc: Srivatsa Vaddagiri <va...@linux.vnet.ibm.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f084069..9d693d0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5506,10 +5506,10 @@ static void rebalance_domains(int cpu, enum cpu_idle_type idle)
 		if (time_after_eq(jiffies, sd->last_balance + interval)) {
 			if (load_balance(cpu, rq, sd, idle, &balance)) {
 				/*
-				 * We've pulled tasks over so either we're no
-				 * longer idle.
+				 * We've pulled tasks over so either we may
+				 * be no longer idle.
 				 */
-				idle = CPU_NOT_IDLE;
+				idle = idle_cpu(cpu) ? CPU_IDLE : CPU_NOT_IDLE;
 			}
 			sd->last_balance = jiffies;
 		}
-- 
1.7.9.5
[PATCH v2 6/6] sched: prevent re-selecting dst_cpu in load_balance()
Commit 88b8dac0 makes load_balance() consider other cpus in its group. But there is no code to prevent re-selecting dst_cpu, so the same dst_cpu can be selected over and over. This patch adds functionality to load_balance() to exclude a cpu once it has been selected. We prevent re-selecting dst_cpu via env's cpus, so now env's cpus is a candidate mask not only for src_cpus, but also for dst_cpus.

Cc: Srivatsa Vaddagiri <va...@linux.vnet.ibm.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e3f09f4..6f238d2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3883,7 +3883,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 		return 0;
 
 	if (!cpumask_test_cpu(env->dst_cpu, tsk_cpus_allowed(p))) {
-		int new_dst_cpu;
+		int cpu;
 
 		schedstat_inc(p, se.statistics.nr_failed_migrations_affine);
 
@@ -3898,12 +3898,14 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 		if (!env->dst_grpmask || (env->flags & LBF_SOME_PINNED))
 			return 0;
 
-		new_dst_cpu = cpumask_first_and(env->dst_grpmask,
-						tsk_cpus_allowed(p));
-		if (new_dst_cpu < nr_cpu_ids) {
-			env->flags |= LBF_SOME_PINNED;
-			env->new_dst_cpu = new_dst_cpu;
-		}
+		/* Prevent to re-select dst_cpu via env's cpus */
+		for_each_cpu_and(cpu, env->dst_grpmask, env->cpus)
+			if (cpumask_test_cpu(cpu, tsk_cpus_allowed(p))) {
+				env->flags |= LBF_SOME_PINNED;
+				env->new_dst_cpu = cpu;
+				break;
+			}
+
 		return 0;
 	}
 
@@ -4989,7 +4991,6 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 			int *balance)
 {
 	int ld_moved, cur_ld_moved, active_balance = 0;
-	int lb_iterations, max_lb_iterations;
 	struct sched_group *group;
 	struct rq *busiest;
 	unsigned long flags;
@@ -5009,11 +5010,6 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 	 * other cpus in our group */
 	if (idle == CPU_NEWLY_IDLE) {
 		env.dst_grpmask = NULL;
-		/* we don't care max_lb_iterations in this case,
-		 * in following patch, this will be removed */
-		max_lb_iterations = 0;
-	} else {
-		max_lb_iterations = cpumask_weight(env.dst_grpmask);
 	}
 	cpumask_copy(cpus, cpu_active_mask);
@@ -5041,7 +5037,6 @@ redo:
 	schedstat_add(sd, lb_imbalance[idle], env.imbalance);
 
 	ld_moved = 0;
-	lb_iterations = 1;
 	if (busiest->nr_running > 1) {
 		/*
 		 * Attempt to move tasks. If find_busiest_group has found
@@ -5098,14 +5093,17 @@ more_balance:
 		 * moreover subsequent load balance cycles should correct the
 		 * excess load moved.
 		 */
-		if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0 &&
-				lb_iterations++ < max_lb_iterations) {
+		if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) {
 			env.dst_rq	 = cpu_rq(env.new_dst_cpu);
 			env.dst_cpu	 = env.new_dst_cpu;
 			env.flags	&= ~LBF_SOME_PINNED;
 			env.loop	 = 0;
 			env.loop_break	 = sched_nr_migrate_break;
+
+			/* Prevent to re-select dst_cpu via env's cpus */
+			cpumask_clear_cpu(env.dst_cpu, env.cpus);
+
 			/*
 			 * Go back to "more_balance" rather than "redo" since we
 			 * need to continue with same src_cpu.
-- 
1.7.9.5
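The mechanism in this patch reuses env.cpus as a "not yet tried" set: each time a dst_cpu is exhausted it is cleared from the mask, so the retry loop terminates after at most the group weight. A toy user-space sketch of the same idea, with a plain unsigned bitmask standing in for struct cpumask (`next_dst_cpu` is a hypothetical name; `__builtin_ctz` is a GCC/Clang builtin):

```c
#include <assert.h>

/* Pick the lowest set bit of (cpus & grpmask) as the next candidate
 * dst cpu, then clear it from *cpus -- mimicking
 * cpumask_clear_cpu(env.dst_cpu, env.cpus) on each retry. */
static int next_dst_cpu(unsigned int *cpus, unsigned int grpmask)
{
	unsigned int cand = *cpus & grpmask;
	int cpu;

	if (!cand)
		return -1;		/* every cpu in the group was tried */
	cpu = __builtin_ctz(cand);	/* lowest set bit */
	*cpus &= ~(1u << cpu);		/* prevent re-selecting this dst_cpu */
	return cpu;
}
```

Because the candidate set strictly shrinks on every call, no separate iteration counter (the removed max_lb_iterations) is needed.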
[PATCH v2 5/6] sched: rename load_balance_tmpmask to load_balance_mask
The name load_balance_tmpmask doesn't convey a specific meaning, so rename it to imply its purpose.

Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f12624..07b4178 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6865,7 +6865,7 @@ struct task_group root_task_group;
 LIST_HEAD(task_groups);
 #endif
 
-DECLARE_PER_CPU(cpumask_var_t, load_balance_tmpmask);
+DECLARE_PER_CPU(cpumask_var_t, load_balance_mask);
 
 void __init sched_init(void)
 {
@@ -6902,7 +6902,7 @@ void __init sched_init(void)
 #endif /* CONFIG_RT_GROUP_SCHED */
 #ifdef CONFIG_CPUMASK_OFFSTACK
 	for_each_possible_cpu(i) {
-		per_cpu(load_balance_tmpmask, i) = (void *)ptr;
+		per_cpu(load_balance_mask, i) = (void *)ptr;
 		ptr += cpumask_size();
 	}
 #endif /* CONFIG_CPUMASK_OFFSTACK */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d3c6011..e3f09f4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4958,7 +4958,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 #define MAX_PINNED_INTERVAL	512
 
 /* Working cpumask for load_balance and load_balance_newidle. */
-DEFINE_PER_CPU(cpumask_var_t, load_balance_tmpmask);
+DEFINE_PER_CPU(cpumask_var_t, load_balance_mask);
 
 static int need_active_balance(struct lb_env *env)
 {
@@ -4993,7 +4993,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 	struct sched_group *group;
 	struct rq *busiest;
 	unsigned long flags;
-	struct cpumask *cpus = __get_cpu_var(load_balance_tmpmask);
+	struct cpumask *cpus = __get_cpu_var(load_balance_mask);
 
 	struct lb_env env = {
 		.sd		= sd,
-- 
1.7.9.5
[PATCH v2 4/6] sched: move up affinity check to mitigate useless redoing overhead
Currently, LBF_ALL_PINNED is cleared after the affinity check is passed. So if task migration is skipped because of a small load value or a small imbalance value in move_tasks(), we don't clear LBF_ALL_PINNED and end up triggering 'redo' in load_balance(). The imbalance value is often so small that no task can be moved to other cpus, and of course this situation may continue after we change the target cpu. So this patch moves up the affinity check code and clears LBF_ALL_PINNED before evaluating the load value, in order to mitigate useless redo overhead.

In addition, re-order some comments correctly.

Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3f8c4f2..d3c6011 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3874,10 +3874,14 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	int tsk_cache_hot = 0;
 	/*
 	 * We do not migrate tasks that are:
-	 * 1) running (obviously), or
+	 * 1) throttled_lb_pair, or
 	 * 2) cannot be migrated to this CPU due to cpus_allowed, or
-	 * 3) are cache-hot on their current CPU.
+	 * 3) running (obviously), or
+	 * 4) are cache-hot on their current CPU.
 	 */
+	if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
+		return 0;
+
 	if (!cpumask_test_cpu(env->dst_cpu, tsk_cpus_allowed(p))) {
 		int new_dst_cpu;
 
@@ -3948,9 +3952,6 @@ static int move_one_task(struct lb_env *env)
 	struct task_struct *p, *n;
 
 	list_for_each_entry_safe(p, n, &env->src_rq->cfs_tasks, se.group_node) {
-		if (throttled_lb_pair(task_group(p), env->src_rq->cpu, env->dst_cpu))
-			continue;
-
 		if (!can_migrate_task(p, env))
 			continue;
 
@@ -4002,7 +4003,7 @@ static int move_tasks(struct lb_env *env)
 			break;
 		}
 
-		if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
+		if (!can_migrate_task(p, env))
 			goto next;
 
 		load = task_h_load(p);
@@ -4013,9 +4014,6 @@ static int move_tasks(struct lb_env *env)
 		if ((load / 2) > env->imbalance)
 			goto next;
 
-		if (!can_migrate_task(p, env))
-			goto next;
-
 		move_task(p, env);
 		pulled++;
 		env->imbalance -= load;
-- 
1.7.9.5
Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
On Mon, Mar 25, 2013 at 02:32:35PM -0400, Steven Rostedt wrote:
> On Mon, 2013-03-25 at 18:27 +0000, Christoph Lameter wrote:
> > On Mon, 25 Mar 2013, Steven Rostedt wrote:
> > > If this makes it more deterministic, and lower worse case latencies,
> > > then it's definitely worth the price.
> >
> > Yes that would make it more deterministic. Maybe I should add an option
> > to be able to compile the allocator without cpu partial page support?
>
> I agree that would be useful.

Hello, Steven and Christoph.

How about using spin_try_lock() in unfreeze_partials() and using spin_lock_contented() in get_partial_node() to reduce latency? IMHO, this doesn't make the code more deterministic, but it can maintain the benefit of cpu partial pages with tolerable latency.

Thanks.

> -- Steve
Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
On Tue, Mar 26, 2013 at 11:30:32PM -0400, Steven Rostedt wrote:
> On Wed, 2013-03-27 at 11:59 +0900, Joonsoo Kim wrote:
> > How about using spin_try_lock() in unfreeze_partials() and using
> > spin_lock_contented() in get_partial_node() to reduce latency?
> > IMHO, this doesn't make code more deterministic, but can maintain
> > a benefit of cpu partial page with tolerable latency.
>
> And what do you do when you fail the try lock? Try again, or just break out?

Just break out.

> We can run benchmarks, but I don't like playing games in -rt. It either
> is deterministic, or it isn't.

Okay.

> -- Steve
[PATCH 1/5] sched: remove one division operation in find_busiest_queue()
Remove one division operation in find_busiest_queue() by comparing the two load/power ratios with a cross-multiplication instead of dividing each one.

Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6f238d2..1d8774f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4911,7 +4911,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 				     struct sched_group *group)
 {
 	struct rq *busiest = NULL, *rq;
-	unsigned long max_load = 0;
+	unsigned long busiest_load = 0, busiest_power = SCHED_POWER_SCALE;
 	int i;
 
 	for_each_cpu(i, sched_group_cpus(group)) {
@@ -4942,10 +4942,9 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		 * the load can be moved away from the cpu that is potentially
 		 * running at a lower capacity.
 		 */
-		wl = (wl * SCHED_POWER_SCALE) / power;
-
-		if (wl > max_load) {
-			max_load = wl;
+		if (wl * busiest_power > busiest_load * power) {
+			busiest_load = wl;
+			busiest_power = power;
 			busiest = rq;
 		}
 	}
-- 
1.7.9.5
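The trick in this patch is that, for positive values, wl/power > busiest_load/busiest_power holds exactly when wl * busiest_power > busiest_load * power, so the per-cpu division can be dropped. A minimal user-space sketch of the selection loop, with hypothetical load/power arrays standing in for the runqueue data:

```c
#include <assert.h>

/* Return the index with the largest load/power ratio without dividing.
 * The inputs are hypothetical stand-ins for per-cpu rq data; values are
 * assumed small enough that the cross products do not overflow. */
static int pick_busiest(const unsigned long *load,
			const unsigned long *power, int n)
{
	unsigned long busiest_load = 0, busiest_power = 1;
	int busiest = -1;
	int i;

	for (i = 0; i < n; i++) {
		/* load[i]/power[i] > busiest_load/busiest_power,
		 * compared via cross-multiplication */
		if (load[i] * busiest_power > busiest_load * power[i]) {
			busiest_load = load[i];
			busiest_power = power[i];
			busiest = i;
		}
	}
	return busiest;
}
```

Note that the kernel version keeps the comparison exact; the only caveat of the transformation is the larger intermediate product, which is safe for the value ranges involved (load and power are both bounded well below 2^32).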
[PATCH 4/5] sched: don't consider upper se in sched_slice()
Following the upper se in sched_slice() should not be done, because sched_slice() is used for checking whether a resched is needed within *this* cfs_rq, and the current implementation has a problem related to this.

The problem is that if we follow the upper se in sched_slice(), it is possible to get an ideal slice which is lower than sysctl_sched_min_granularity. For example, assume we have 4 task groups attached to the root task group with the same share, each with 20 runnable tasks on cpu0, respectively. In this case, __sched_period() returns sysctl_sched_min_granularity * 20 and then we go into the loop. At the first iteration, we compute a portion of the slice for this task on this cfs_rq, so we get a slice of sysctl_sched_min_granularity. Afterwards, we enter the second iteration and get a slice which is a quarter of sysctl_sched_min_granularity, because there are 4 task groups with the same share in that cfs_rq.

Ensuring a slice larger than min_granularity is important for performance, and since there is no lower bound on this other than the timer tick, we should fix it not to consider the upper se when calculating sched_slice.

Below is my testing result on my 4-cpu machine, in the following environment:

CONFIG_HZ=1000 and CONFIG_SCHED_AUTOGROUP=y
/proc/sys/kernel/sched_min_granularity_ns is 2250000, that is, 2.25ms.

I ran the following commands. In each of 4 sessions:

for i in `seq 20`; do taskset -c 3 sh -c 'while true; do :; done'; done

./perf sched record
./perf script -C 003 | grep sched_switch | cut -b -40 | less

Result is below.

*Vanilla*
sh  2724 [003]  152.52801
sh  2779 [003]  152.52900
sh  2775 [003]  152.53000
sh  2751 [003]  152.53100
sh  2717 [003]  152.53201

*With this patch*
sh  2640 [003]  147.48700
sh  2662 [003]  147.49000
sh  2601 [003]  147.49300
sh  2633 [003]  147.49400

In the vanilla case, min_granularity is lower than 1ms, so every tick triggers a reschedule. After the patch is applied, we can see that min_granularity is ensured.

Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 204a9a9..e232421 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -631,23 +631,20 @@ static u64 __sched_period(unsigned long nr_running)
  */
 static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+	struct load_weight *load;
+	struct load_weight lw;
 	u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
 
-	for_each_sched_entity(se) {
-		struct load_weight *load;
-		struct load_weight lw;
-
-		cfs_rq = cfs_rq_of(se);
-		load = &cfs_rq->load;
+	load = &cfs_rq->load;
 
-		if (unlikely(!se->on_rq)) {
-			lw = cfs_rq->load;
+	if (unlikely(!se->on_rq)) {
+		lw = cfs_rq->load;
 
-			update_load_add(&lw, se->load.weight);
-			load = &lw;
-		}
-		slice = calc_delta_mine(slice, se->load.weight, load);
+		update_load_add(&lw, se->load.weight);
+		load = &lw;
 	}
+	slice = calc_delta_mine(slice, se->load.weight, load);
+
 	return slice;
 }
-- 
1.7.9.5
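The arithmetic in the 20-task / 4-group example can be checked directly: the period is min_granularity * nr_running, and following the parent level divides the task's slice again by the number of equal-share siblings. A sketch with hypothetical values (min_granularity = 2.25 ms in nanoseconds; `portion` and `old_slice_two_levels` are illustrative names, not kernel functions):

```c
#include <assert.h>

/* Weight-proportional share of a period, like calc_delta_mine() with
 * equal weights: slice * weight / total_weight. */
static unsigned long long portion(unsigned long long slice,
				  unsigned long weight,
				  unsigned long total_weight)
{
	return slice * weight / total_weight;
}

/* The pre-patch behavior: the slice computed at the task's own cfs_rq
 * is divided again at the parent task-group level. */
static unsigned long long old_slice_two_levels(unsigned long long min_gran,
					       unsigned long nr_tasks,
					       unsigned long nr_groups)
{
	unsigned long long period = min_gran * nr_tasks;	 /* __sched_period() */
	unsigned long long slice = portion(period, 1, nr_tasks); /* task level */
	return portion(slice, 1, nr_groups);			 /* parent tg level */
}
```

With min_gran = 2,250,000 ns, 20 tasks, and 4 equal-share groups, the two-level walk yields 562,500 ns, a quarter of min_granularity, which is exactly the sub-granularity slice the changelog describes.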
[PATCH 3/5] sched: clean-up struct sd_lb_stat
There is no reason to maintain separate variables for this_group and busiest_group in sd_lb_stat, except saving some space. But this structure is always allocated on the stack, so this saving isn't really beneficial. This patch unifies these variables, so IMO, readability may be improved.

Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 95ec757..204a9a9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4175,36 +4175,6 @@ static unsigned long task_h_load(struct task_struct *p)
 /********** Helpers for find_busiest_group ************************/
 /*
- * sd_lb_stats - Structure to store the statistics of a sched_domain
- *		during load balancing.
- */
-struct sd_lb_stats {
-	struct sched_group *busiest; /* Busiest group in this sd */
-	struct sched_group *this;  /* Local group in this sd */
-	unsigned long total_load;  /* Total load of all groups in sd */
-	unsigned long total_pwr;   /* Total power of all groups in sd */
-	unsigned long avg_load;	   /* Average load across all groups in sd */
-
-	/** Statistics of this group */
-	unsigned long this_load;
-	unsigned long this_load_per_task;
-	unsigned long this_nr_running;
-	unsigned long this_has_capacity;
-	unsigned int  this_idle_cpus;
-
-	/* Statistics of the busiest group */
-	unsigned int  busiest_idle_cpus;
-	unsigned long max_load;
-	unsigned long busiest_load_per_task;
-	unsigned long busiest_nr_running;
-	unsigned long busiest_group_capacity;
-	unsigned long busiest_has_capacity;
-	unsigned int  busiest_group_weight;
-
-	int group_imb; /* Is there imbalance in this sd */
-};
-
-/*
  * sg_lb_stats - stats of a sched_group required for load_balancing
  */
 struct sg_lb_stats {
@@ -4219,6 +4189,24 @@ struct sg_lb_stats {
 	int group_has_capacity; /* Is there extra capacity in the group? */
 };
 
+/*
+ * sd_lb_stats - Structure to store the statistics of a sched_domain
+ *		during load balancing.
+ */
+struct sd_lb_stats {
+	struct sched_group *busiest; /* Busiest group in this sd */
+	struct sched_group *this;  /* Local group in this sd */
+	unsigned long total_load;  /* Total load of all groups in sd */
+	unsigned long total_pwr;   /* Total power of all groups in sd */
+	unsigned long sd_avg_load; /* Average load across all groups in sd */
+
+	/* Statistics of this group */
+	struct sg_lb_stats this_stat;
+
+	/* Statistics of the busiest group */
+	struct sg_lb_stats busiest_stat;
+};
+
 /**
  * get_sd_load_idx - Obtain the load index for a given sched domain.
  * @sd: The sched_domain whose load_idx is to be obtained.
@@ -4499,7 +4487,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 				   struct sched_group *sg,
 				   struct sg_lb_stats *sgs)
 {
-	if (sgs->avg_load <= sds->max_load)
+	if (sgs->avg_load <= sds->busiest_stat.avg_load)
 		return false;
 
 	if (sgs->sum_nr_running > sgs->group_capacity)
@@ -4536,7 +4524,7 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 {
 	struct sched_domain *child = env->sd->child;
 	struct sched_group *sg = env->sd->groups;
-	struct sg_lb_stats sgs;
+	struct sg_lb_stats tmp_sgs;
 	int load_idx, prefer_sibling = 0;
 
 	if (child && child->flags & SD_PREFER_SIBLING)
@@ -4546,13 +4534,16 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 
 	do {
 		int local_group;
+		struct sg_lb_stats *sgs;
 
 		local_group = cpumask_test_cpu(env->dst_cpu, sched_group_cpus(sg));
-		memset(&sgs, 0, sizeof(sgs));
-		update_sg_lb_stats(env, sg, load_idx, local_group, &sgs);
-
-		sds->total_load += sgs.group_load;
-		sds->total_pwr += sg->sgp->power;
+		if (local_group)
+			sgs = &sds->this_stat;
+		else {
+			sgs = &tmp_sgs;
+			memset(sgs, 0, sizeof(*sgs));
+		}
+		update_sg_lb_stats(env, sg, load_idx, local_group, sgs);
 
 		/*
 		 * In case the child domain prefers tasks go to siblings
@@ -4564,26 +4555,19 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 		 * heaviest group when it is already under-utilized (possible
 		 * with a large weight task outweighs the tasks on the system).
 		 */
-		if (prefer_sibling && !local_group && sds->this_has_capacity)
-			sgs.group_capacity = min(sgs.group_capacity, 1UL);
+		if (prefer_sibling && !local_group &&
+				sds->this && sds->this_stat.group_has_capacity)
+			sgs->group_capacity = min(sgs->group_capacity, 1UL
[PATCH 5/5] sched: limit sched_slice if it is more than sysctl_sched_latency
sched_slice() computes the ideal runtime slice. If there are many tasks in the cfs_rq, the period for this cfs_rq is extended to guarantee that each task gets a time slice of at least sched_min_granularity, and then each task gets a portion of this period.

If there is a task which has a much larger load weight than the others, its portion of the period can exceed sysctl_sched_latency by far. For example, simply imagine one task with nice -20 and 9 tasks with nice 0 on one cfs_rq. In this case, the load weight sum for this cfs_rq is 88761 + 9 * 1024 = 97977. So the portion of the slice for the task with nice -20 is sysctl_sched_min_granularity * 10 * (88761 / 97977), that is, approximately, sysctl_sched_min_granularity * 9. This effect can be much larger if there are more tasks with nice 0. So we should limit this possible weird situation.

Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e232421..6ceffbc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -645,6 +645,9 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	}
 	slice = calc_delta_mine(slice, se->load.weight, load);
 
+	if (unlikely(slice > sysctl_sched_latency))
+		slice = sysctl_sched_latency;
+
 	return slice;
 }
-- 
1.7.9.5
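The weight arithmetic in the changelog can be verified numerically, using the weights 88761 (nice -20) and 1024 (nice 0) from the kernel's prio_to_weight table and a hypothetical min_granularity of 2.25 ms. The heavy task's share of the 10-task period lands between 9x and 10x min_granularity, i.e. well above a typical sysctl_sched_latency, which is why the clamp is needed (`slice_for` is an illustrative name for the period * weight / total_weight step):

```c
#include <assert.h>

/* One entity's share of the period: period * weight / total_weight,
 * mirroring the final calc_delta_mine() step in sched_slice(). */
static unsigned long long slice_for(unsigned long long period,
				    unsigned long weight,
				    unsigned long total_weight)
{
	return period * weight / total_weight;
}
```

With period = 10 * 2,250,000 ns and total weight 88761 + 9 * 1024 = 97977, the nice -20 task's slice is about 20.4 ms, versus roughly 0.24 ms for each nice 0 task, so the clamp to sysctl_sched_latency caps only the pathological case.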
[PATCH 2/5] sched: factor out code to should_we_balance()
Now, checking whether this cpu is appropriate to balance is embedded in update_sg_lb_stats(), and this checking has no direct relationship to that function. There is not enough reason to place this checking in update_sg_lb_stats(), except saving one iteration over sched_group_cpus. But with this change, we can save two memset costs and can expect better compiler optimization, so the clean-up may be more beneficial to us.

Below is the result of this patch.

* Vanilla *
   text	   data	    bss	    dec	    hex	filename
  34499	   1136	    116	  35751	   8ba7	kernel/sched/fair.o

* Patched *
   text	   data	    bss	    dec	    hex	filename
  34243	   1136	    116	  35495	   8aa7	kernel/sched/fair.o

In addition, rename @balance to @should_balance in order to represent its purpose more clearly.

Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1d8774f..95ec757 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4406,22 +4406,17 @@ fix_small_capacity(struct sched_domain *sd, struct sched_group *group)
  * @group: sched_group whose statistics are to be updated.
  * @load_idx: Load index of sched_domain of this_cpu for load calc.
  * @local_group: Does group contain this_cpu.
- * @balance: Should we balance.
  * @sgs: variable to hold the statistics for this group.
  */
 static inline void update_sg_lb_stats(struct lb_env *env,
 			struct sched_group *group, int load_idx,
-			int local_group, int *balance, struct sg_lb_stats *sgs)
+			int local_group, struct sg_lb_stats *sgs)
 {
 	unsigned long nr_running, max_nr_running, min_nr_running;
 	unsigned long load, max_cpu_load, min_cpu_load;
-	unsigned int balance_cpu = -1, first_idle_cpu = 0;
 	unsigned long avg_load_per_task = 0;
 	int i;
 
-	if (local_group)
-		balance_cpu = group_balance_cpu(group);
-
 	/* Tally up the load of all CPUs in the group */
 	max_cpu_load = 0;
 	min_cpu_load = ~0UL;
@@ -4434,15 +4429,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 		nr_running = rq->nr_running;
 
 		/* Bias balancing toward cpus of our domain */
-		if (local_group) {
-			if (idle_cpu(i) && !first_idle_cpu &&
-					cpumask_test_cpu(i, sched_group_mask(group))) {
-				first_idle_cpu = 1;
-				balance_cpu = i;
-			}
-
+		if (local_group)
 			load = target_load(i, load_idx);
-		} else {
+		else {
 			load = source_load(i, load_idx);
 			if (load > max_cpu_load)
 				max_cpu_load = load;
@@ -4462,22 +4451,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			sgs->idle_cpus++;
 	}
 
-	/*
-	 * First idle cpu or the first cpu(busiest) in this sched group
-	 * is eligible for doing load balancing at this and above
-	 * domains. In the newly idle case, we will allow all the cpu's
-	 * to do the newly idle load balance.
-	 */
-	if (local_group) {
-		if (env->idle != CPU_NEWLY_IDLE) {
-			if (balance_cpu != env->dst_cpu) {
-				*balance = 0;
-				return;
-			}
-			update_group_power(env->sd, env->dst_cpu);
-		} else if (time_after_eq(jiffies, group->sgp->next_update))
-			update_group_power(env->sd, env->dst_cpu);
-	}
+	if (local_group && (env->idle != CPU_NEWLY_IDLE ||
+			time_after_eq(jiffies, group->sgp->next_update)))
+		update_group_power(env->sd, env->dst_cpu);
 
 	/* Adjust by relative CPU power of the group */
 	sgs->avg_load = (sgs->group_load * SCHED_POWER_SCALE) / group->sgp->power;
@@ -4556,7 +4532,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
  * @sds: variable to hold the statistics for this sched_domain.
  */
 static inline void update_sd_lb_stats(struct lb_env *env,
-					int *balance, struct sd_lb_stats *sds)
+					struct sd_lb_stats *sds)
 {
 	struct sched_domain *child = env->sd->child;
 	struct sched_group *sg = env->sd->groups;
@@ -4573,10 +4549,7 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 		local_group = cpumask_test_cpu(env->dst_cpu, sched_group_cpus(sg));
 		memset(&sgs, 0, sizeof(sgs));
-		update_sg_lb_stats(env, sg, load_idx, local_group, balance, &sgs);
-
-		if (local_group && !(*balance))
-			return;
+		update_sg_lb_stats(env, sg, load_idx, local_group, &sgs);
 
 		sds->total_load
[PATCH 0/5] optimization, clean-up, correctness about fair.c
There is no unified subject for this patchset.

Patch 1 removes one division operation by replacing it with a multiplication.
Patches 2-3 are clean-ups related to load_balance(); there is an improvement in terms of code size and, IMO, readability may also be improved.
Patches 4-5 are correctness fixes for sched_slice().

Feel free to give a comment on this patchset. It's based on v3.9-rc4, on top of my previous patchset, but perhaps it may not really depend on that patchset. :)

https://lkml.org/lkml/2013/3/26/28 [PATCH v2 0/6] correct load_balance()

Thanks.

Joonsoo Kim (5):
  sched: remove one division operation in find_busiest_queue()
  sched: factor out code to should_we_balance()
  sched: clean-up struct sd_lb_stat
  sched: don't consider upper se in sched_slice()
  sched: limit sched_slice if it is more than sysctl_sched_latency

 kernel/sched/fair.c | 347 +--
 1 file changed, 171 insertions(+), 176 deletions(-)
-- 
1.7.9.5
[PATCH v2 7/8] mm, vmalloc: export vmap_area_list, instead of vmlist
From: Joonsoo Kim <js1...@gmail.com>

Although our intention is to unexport the internal structure entirely, there is one exception for kexec. kexec dumps the address of vmlist, and makedumpfile uses this information. We are about to remove vmlist, so another way to retrieve information about the vmalloc layer is needed for makedumpfile. For this purpose, we export vmap_area_list, instead of vmlist.

Cc: Eric Biederman <ebied...@xmission.com>
Cc: Dave Anderson <ander...@redhat.com>
Cc: Vivek Goyal <vgo...@redhat.com>
Cc: Atsushi Kumagai <kumagai-atsu...@mxc.nes.nec.co.jp>
Signed-off-by: Joonsoo Kim <js1...@gmail.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo@lge.com>

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 698b1e5..8a25f90 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -130,8 +130,7 @@ extern long vwrite(char *buf, char *addr, unsigned long count);
 /*
  *	Internals.  Dont't use..
  */
-extern rwlock_t vmlist_lock;
-extern struct vm_struct *vmlist;
+extern struct list_head vmap_area_list;
 extern __init void vm_area_add_early(struct vm_struct *vm);
 extern __init void vm_area_register_early(struct vm_struct *vm, size_t align);
 
diff --git a/kernel/kexec.c b/kernel/kexec.c
index bddd3d7..d9bfc6c 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1489,7 +1489,7 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_SYMBOL(swapper_pg_dir);
 #endif
 	VMCOREINFO_SYMBOL(_stext);
-	VMCOREINFO_SYMBOL(vmlist);
+	VMCOREINFO_SYMBOL(vmap_area_list);
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 	VMCOREINFO_SYMBOL(mem_map);
diff --git a/mm/nommu.c b/mm/nommu.c
index e193280..ed82358 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -228,8 +228,7 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
-DEFINE_RWLOCK(vmlist_lock);
-struct vm_struct *vmlist;
+LIST_HEAD(vmap_area_list);
 
 void vfree(const void *addr)
 {
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index bda6cef..7e63984 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -261,7 +261,8 @@ struct vmap_area {
 };
 
 static DEFINE_SPINLOCK(vmap_area_lock);
-static LIST_HEAD(vmap_area_list);
+/* Export for kexec only */
+LIST_HEAD(vmap_area_list);
 static struct rb_root vmap_area_root = RB_ROOT;
 
 /* The vmap cache globals are protected by vmap_area_lock */
@@ -272,6 +273,10 @@ static unsigned long cached_align;
 
 static unsigned long vmap_area_pcpu_hole;
 
+/*** Old vmalloc interfaces ***/
+static DEFINE_RWLOCK(vmlist_lock);
+static struct vm_struct *vmlist;
+
 static struct vmap_area *__find_vmap_area(unsigned long addr)
 {
 	struct rb_node *n = vmap_area_root.rb_node;
@@ -1283,10 +1288,6 @@ int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page ***pages)
 }
 EXPORT_SYMBOL_GPL(map_vm_area);
 
-/*** Old vmalloc interfaces ***/
-DEFINE_RWLOCK(vmlist_lock);
-struct vm_struct *vmlist;
-
 static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va,
 			      unsigned long flags, const void *caller)
 {
-- 
1.7.9.5
[PATCH v2 0/8] remove vm_struct list management
This patchset removes vm_struct list management after initializing vmalloc. Adding and removing an entry to vmlist takes linear time, so it is inefficient. If we maintain this list, the overall time complexity of adding and removing an area in vmalloc space is O(N), even though we use an rbtree for finding a vacant place, whose time complexity is just O(logN).

And vmlist and vmlist_lock are used in many places outside of vmalloc.c. It is preferable that we hide these raw data structures and provide well-defined functions for supporting them, because that prevents mistakes when manipulating these structures and makes the vmalloc layer easier to maintain.

For kexec and makedumpfile, I export vmap_area_list, instead of vmlist. This comes from Atsushi's recommendation. For more information, please refer to the link below.

https://lkml.org/lkml/2012/12/6/184

These are based on v3.9-rc2.

Changes from v1
5/8: skip areas for lazy_free
6/8: skip areas for lazy_free
7/8: export vmap_area_list for kexec, instead of vmlist

Joonsoo Kim (8):
  mm, vmalloc: change iterating a vmlist to find_vm_area()
  mm, vmalloc: move get_vmalloc_info() to vmalloc.c
  mm, vmalloc: protect va->vm by vmap_area_lock
  mm, vmalloc: iterate vmap_area_list, instead of vmlist in vread/vwrite()
  mm, vmalloc: iterate vmap_area_list in get_vmalloc_info()
  mm, vmalloc: iterate vmap_area_list, instead of vmlist, in vmallocinfo()
  mm, vmalloc: export vmap_area_list, instead of vmlist
  mm, vmalloc: remove list management of vmlist after initializing vmalloc

 arch/tile/mm/pgtable.c      |    7 +-
 arch/unicore32/mm/ioremap.c |   17 ++--
 arch/x86/mm/ioremap.c       |    7 +-
 fs/proc/Makefile            |    2 +-
 fs/proc/internal.h          |   18
 fs/proc/meminfo.c           |    1 +
 fs/proc/mmu.c               |   60 -
 include/linux/vmalloc.h     |   21 -
 kernel/kexec.c              |    2 +-
 mm/nommu.c                  |    3 +-
 mm/vmalloc.c                |  207 +--
 11 files changed, 170 insertions(+), 175 deletions(-)
 delete mode 100644 fs/proc/mmu.c
-- 
1.7.9.5
[PATCH v2 5/8] mm, vmalloc: iterate vmap_area_list in get_vmalloc_info()
From: Joonsoo Kim js1...@gmail.com This patch is preparing step for removing vmlist entirely. For above purpose, we change iterating a vmap_list codes to iterating a vmap_area_list. It is somewhat trivial change, but just one thing should be noticed. vmlist is lack of information about some areas in vmalloc address space. For example, vm_map_ram() allocate area in vmalloc address space, but it doesn't make a link with vmlist. To provide full information about vmalloc address space is better idea, so we don't use va-vm and use vmap_area directly. This makes get_vmalloc_info() more precise. Signed-off-by: Joonsoo Kim js1...@gmail.com Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 59aa328..aee1f61 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2671,46 +2671,50 @@ module_init(proc_vmalloc_init); void get_vmalloc_info(struct vmalloc_info *vmi) { - struct vm_struct *vma; + struct vmap_area *va; unsigned long free_area_size; unsigned long prev_end; vmi-used = 0; + vmi-largest_chunk = 0; - if (!vmlist) { - vmi-largest_chunk = VMALLOC_TOTAL; - } else { - vmi-largest_chunk = 0; + prev_end = VMALLOC_START; - prev_end = VMALLOC_START; - - read_lock(vmlist_lock); + spin_lock(vmap_area_lock); - for (vma = vmlist; vma; vma = vma-next) { - unsigned long addr = (unsigned long) vma-addr; + if (list_empty(vmap_area_list)) { + vmi-largest_chunk = VMALLOC_TOTAL; + goto out; + } - /* -* Some archs keep another range for modules in vmlist -*/ - if (addr VMALLOC_START) - continue; - if (addr = VMALLOC_END) - break; + list_for_each_entry(va, vmap_area_list, list) { + unsigned long addr = va-va_start; - vmi-used += vma-size; + /* +* Some archs keep another range for modules in vmalloc space +*/ + if (addr VMALLOC_START) + continue; + if (addr = VMALLOC_END) + break; - free_area_size = addr - prev_end; - if (vmi-largest_chunk free_area_size) - vmi-largest_chunk = free_area_size; + if (va-flags (VM_LAZY_FREE | VM_LAZY_FREEING)) + continue; - 
prev_end = vma-size + addr; - } + vmi-used += (va-va_end - va-va_start); - if (VMALLOC_END - prev_end vmi-largest_chunk) - vmi-largest_chunk = VMALLOC_END - prev_end; + free_area_size = addr - prev_end; + if (vmi-largest_chunk free_area_size) + vmi-largest_chunk = free_area_size; - read_unlock(vmlist_lock); + prev_end = va-va_end; } + + if (VMALLOC_END - prev_end vmi-largest_chunk) + vmi-largest_chunk = VMALLOC_END - prev_end; + +out: + spin_unlock(vmap_area_lock); } #endif -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
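The free-chunk accounting that the patched get_vmalloc_info() performs - sum the sizes of address-sorted areas, skip lazily-freed ones, and track the largest gap between consecutive areas plus the tail gap - can be sketched as a small userspace model (illustrative names and types; not the kernel code, which walks vmap_area_list under vmap_area_lock):

```c
#include <stddef.h>

/* Illustrative model of the patched get_vmalloc_info() walk: areas are
 * address-sorted, "used" sums their sizes, and the largest free chunk is
 * the biggest gap between consecutive areas (plus the tail gap). */
struct area { unsigned long start, end; int lazy_free; };

struct vinfo { unsigned long used, largest_chunk; };

static struct vinfo vmalloc_info(const struct area *areas, size_t n,
                                 unsigned long vstart, unsigned long vend)
{
    struct vinfo vi = { 0, 0 };
    unsigned long prev_end = vstart;
    size_t i;

    if (n == 0) {                      /* empty list: whole space is free */
        vi.largest_chunk = vend - vstart;
        return vi;
    }
    for (i = 0; i < n; i++) {
        unsigned long addr = areas[i].start;

        if (addr < vstart)             /* e.g. module ranges below VMALLOC_START */
            continue;
        if (addr >= vend)
            break;
        if (areas[i].lazy_free)        /* VM_LAZY_FREE/FREEING: treated as free */
            continue;

        vi.used += areas[i].end - areas[i].start;
        if (addr - prev_end > vi.largest_chunk)
            vi.largest_chunk = addr - prev_end;
        prev_end = areas[i].end;
    }
    if (vend - prev_end > vi.largest_chunk)
        vi.largest_chunk = vend - prev_end;
    return vi;
}
```

Note how a lazily-freed area neither contributes to `used` nor advances `prev_end`, so its range merges into the surrounding free chunk, matching the patch's VM_LAZY_FREE handling.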
[PATCH v2 4/8] mm, vmalloc: iterate vmap_area_list, instead of vmlist in vread/vwrite()
From: Joonsoo Kim js1...@gmail.com Now, when we hold a vmap_area_lock, va-vm can't be discarded. So we can safely access to va-vm when iterating a vmap_area_list with holding a vmap_area_lock. With this property, change iterating vmlist codes in vread/vwrite() to iterating vmap_area_list. There is a little difference relate to lock, because vmlist_lock is mutex, but, vmap_area_lock is spin_lock. It may introduce a spinning overhead during vread/vwrite() is executing. But, these are debug-oriented functions, so this overhead is not real problem for common case. Signed-off-by: Joonsoo Kim js1...@gmail.com Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 1bf94ad..59aa328 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2012,7 +2012,8 @@ static int aligned_vwrite(char *buf, char *addr, unsigned long count) long vread(char *buf, char *addr, unsigned long count) { - struct vm_struct *tmp; + struct vmap_area *va; + struct vm_struct *vm; char *vaddr, *buf_start = buf; unsigned long buflen = count; unsigned long n; @@ -2021,10 +2022,17 @@ long vread(char *buf, char *addr, unsigned long count) if ((unsigned long) addr + count count) count = -(unsigned long) addr; - read_lock(vmlist_lock); - for (tmp = vmlist; count tmp; tmp = tmp-next) { - vaddr = (char *) tmp-addr; - if (addr = vaddr + tmp-size - PAGE_SIZE) + spin_lock(vmap_area_lock); + list_for_each_entry(va, vmap_area_list, list) { + if (!count) + break; + + if (!(va-flags VM_VM_AREA)) + continue; + + vm = va-vm; + vaddr = (char *) vm-addr; + if (addr = vaddr + vm-size - PAGE_SIZE) continue; while (addr vaddr) { if (count == 0) @@ -2034,10 +2042,10 @@ long vread(char *buf, char *addr, unsigned long count) addr++; count--; } - n = vaddr + tmp-size - PAGE_SIZE - addr; + n = vaddr + vm-size - PAGE_SIZE - addr; if (n count) n = count; - if (!(tmp-flags VM_IOREMAP)) + if (!(vm-flags VM_IOREMAP)) aligned_vread(buf, addr, n); else /* IOREMAP area is treated as memory hole */ 
memset(buf, 0, n); @@ -2046,7 +2054,7 @@ long vread(char *buf, char *addr, unsigned long count) count -= n; } finished: - read_unlock(vmlist_lock); + spin_unlock(vmap_area_lock); if (buf == buf_start) return 0; @@ -2085,7 +2093,8 @@ finished: long vwrite(char *buf, char *addr, unsigned long count) { - struct vm_struct *tmp; + struct vmap_area *va; + struct vm_struct *vm; char *vaddr; unsigned long n, buflen; int copied = 0; @@ -2095,10 +2104,17 @@ long vwrite(char *buf, char *addr, unsigned long count) count = -(unsigned long) addr; buflen = count; - read_lock(vmlist_lock); - for (tmp = vmlist; count tmp; tmp = tmp-next) { - vaddr = (char *) tmp-addr; - if (addr = vaddr + tmp-size - PAGE_SIZE) + spin_lock(vmap_area_lock); + list_for_each_entry(va, vmap_area_list, list) { + if (!count) + break; + + if (!(va-flags VM_VM_AREA)) + continue; + + vm = va-vm; + vaddr = (char *) vm-addr; + if (addr = vaddr + vm-size - PAGE_SIZE) continue; while (addr vaddr) { if (count == 0) @@ -2107,10 +2123,10 @@ long vwrite(char *buf, char *addr, unsigned long count) addr++; count--; } - n = vaddr + tmp-size - PAGE_SIZE - addr; + n = vaddr + vm-size - PAGE_SIZE - addr; if (n count) n = count; - if (!(tmp-flags VM_IOREMAP)) { + if (!(vm-flags VM_IOREMAP)) { aligned_vwrite(buf, addr, n); copied++; } @@ -2119,7 +2135,7 @@ long vwrite(char *buf, char *addr, unsigned long count) count -= n; } finished: - read_unlock(vmlist_lock); + spin_unlock(vmap_area_lock); if (!copied) return 0; return buflen; -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
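The read semantics that vread() keeps across this locking change - bytes inside a mapped area are copied, while gaps and VM_IOREMAP ranges read back as zeros ("memory holes") - can be modelled in userspace; the region layout and names here are hypothetical:

```c
#include <stddef.h>

struct region { unsigned long start; unsigned long size; const unsigned char *data; };

/* Simplified model of vread(): for each requested byte, copy it if some
 * region covers that address, otherwise fill with zero, as the kernel does
 * for memory holes. Returns the number of bytes written to buf. */
static long model_vread(unsigned char *buf, unsigned long addr,
                        unsigned long count,
                        const struct region *regs, size_t nregs)
{
    unsigned long i;

    for (i = 0; i < count; i++) {
        unsigned long a = addr + i;
        unsigned char byte = 0;        /* holes read as zero */
        size_t r;

        for (r = 0; r < nregs; r++) {
            if (a >= regs[r].start && a < regs[r].start + regs[r].size) {
                byte = regs[r].data[a - regs[r].start];
                break;
            }
        }
        buf[i] = byte;
    }
    return (long)count;
}
```

The real function walks the address-sorted list once instead of re-searching per byte, but the observable result - data where mapped, zeros elsewhere - is the same.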
[PATCH v2 3/8] mm, vmalloc: protect va-vm by vmap_area_lock
From: Joonsoo Kim js1...@gmail.com Inserting and removing an entry to vmlist is linear time complexity, so it is inefficient. Following patches will try to remove vmlist entirely. This patch is preparing step for it. For removing vmlist, iterating vmlist codes should be changed to iterating a vmap_area_list. Before implementing that, we should make sure that when we iterate a vmap_area_list, accessing to va-vm doesn't cause a race condition. This patch ensure that when iterating a vmap_area_list, there is no race condition for accessing to vm_struct. Signed-off-by: Joonsoo Kim js1...@gmail.com Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 1d9878b..1bf94ad 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1290,12 +1290,14 @@ struct vm_struct *vmlist; static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va, unsigned long flags, const void *caller) { + spin_lock(vmap_area_lock); vm-flags = flags; vm-addr = (void *)va-va_start; vm-size = va-va_end - va-va_start; vm-caller = caller; va-vm = vm; va-flags |= VM_VM_AREA; + spin_unlock(vmap_area_lock); } static void insert_vmalloc_vmlist(struct vm_struct *vm) @@ -1447,6 +1449,11 @@ struct vm_struct *remove_vm_area(const void *addr) if (va va-flags VM_VM_AREA) { struct vm_struct *vm = va-vm; + spin_lock(vmap_area_lock); + va-vm = NULL; + va-flags = ~VM_VM_AREA; + spin_unlock(vmap_area_lock); + if (!(vm-flags VM_UNLIST)) { struct vm_struct *tmp, **p; /* -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2 8/8] mm, vmalloc: remove list management of vmlist after initializing vmalloc
From: Joonsoo Kim js1...@gmail.com Now, there is no need to maintain vmlist after initializing vmalloc. So remove related code and data structure. Signed-off-by: Joonsoo Kim js1...@gmail.com Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 7e63984..151da8a 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -273,10 +273,6 @@ static unsigned long cached_align; static unsigned long vmap_area_pcpu_hole; -/*** Old vmalloc interfaces ***/ -static DEFINE_RWLOCK(vmlist_lock); -static struct vm_struct *vmlist; - static struct vmap_area *__find_vmap_area(unsigned long addr) { struct rb_node *n = vmap_area_root.rb_node; @@ -318,7 +314,7 @@ static void __insert_vmap_area(struct vmap_area *va) rb_link_node(va-rb_node, parent, p); rb_insert_color(va-rb_node, vmap_area_root); - /* address-sort this list so it is usable like the vmlist */ + /* address-sort this list */ tmp = rb_prev(va-rb_node); if (tmp) { struct vmap_area *prev; @@ -1130,6 +1126,7 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node, pgprot_t pro } EXPORT_SYMBOL(vm_map_ram); +static struct vm_struct *vmlist __initdata; /** * vm_area_add_early - add vmap area early during boot * @vm: vm_struct to add @@ -1301,10 +1298,8 @@ static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va, spin_unlock(vmap_area_lock); } -static void insert_vmalloc_vmlist(struct vm_struct *vm) +static void clear_vm_unlist(struct vm_struct *vm) { - struct vm_struct *tmp, **p; - /* * Before removing VM_UNLIST, * we should make sure that vm has proper values. 
@@ -1312,22 +1307,13 @@ static void insert_vmalloc_vmlist(struct vm_struct *vm) */ smp_wmb(); vm-flags = ~VM_UNLIST; - - write_lock(vmlist_lock); - for (p = vmlist; (tmp = *p) != NULL; p = tmp-next) { - if (tmp-addr = vm-addr) - break; - } - vm-next = *p; - *p = vm; - write_unlock(vmlist_lock); } static void insert_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va, unsigned long flags, const void *caller) { setup_vmalloc_vm(vm, va, flags, caller); - insert_vmalloc_vmlist(vm); + clear_vm_unlist(vm); } static struct vm_struct *__get_vm_area_node(unsigned long size, @@ -1370,10 +1356,9 @@ static struct vm_struct *__get_vm_area_node(unsigned long size, /* * When this function is called from __vmalloc_node_range, -* we do not add vm_struct to vmlist here to avoid -* accessing uninitialized members of vm_struct such as -* pages and nr_pages fields. They will be set later. -* To distinguish it from others, we use a VM_UNLIST flag. +* we add VM_UNLIST flag to avoid accessing uninitialized +* members of vm_struct such as pages and nr_pages fields. +* They will be set later. */ if (flags VM_UNLIST) setup_vmalloc_vm(area, va, flags, caller); @@ -1462,20 +1447,6 @@ struct vm_struct *remove_vm_area(const void *addr) va-flags = ~VM_VM_AREA; spin_unlock(vmap_area_lock); - if (!(vm-flags VM_UNLIST)) { - struct vm_struct *tmp, **p; - /* -* remove from list and disallow access to -* this vm_struct before unmap. (address range -* confliction is maintained by vmap.) -*/ - write_lock(vmlist_lock); - for (p = vmlist; (tmp = *p) != vm; p = tmp-next) - ; - *p = tmp-next; - write_unlock(vmlist_lock); - } - vmap_debug_free_range(va-va_start, va-va_end); free_unmap_vmap_area(va); vm-size -= PAGE_SIZE; @@ -1695,10 +1666,11 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align, return NULL; /* -* In this function, newly allocated vm_struct is not added -* to vmlist at __get_vm_area_node(). so, it is added here. 
+* In this function, newly allocated vm_struct has VM_UNLIST flag. +* It means that vm_struct is not fully initialized. +* Now, it is fully initialized, so remove this flag here. */ - insert_vmalloc_vmlist(area); + clear_vm_unlist(area); /* * A ref_count = 3 is needed because the vm_struct and vmap_area @@ -2594,7 +2566,7 @@ static void show_numa_info(struct seq_file *m, struct vm_struct *v) if (!counters) return; - /* Pair with smp_wmb() in insert_vmalloc_vmlist() */ + /* Pair with smp_wmb() in clear_vm_unlist
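The VM_UNLIST publication protocol that clear_vm_unlist() relies on - fully initialize the structure, issue smp_wmb(), then clear the flag, with readers doing the mirror-image smp_rmb() check - maps onto a release/acquire pair in portable C11. This is a sketch with illustrative field names, not the kernel's plain-flags implementation:

```c
#include <stdatomic.h>

/* Writer fills in the payload, then clears the "not ready" flag with
 * release semantics (the kernel's smp_wmb() before clearing VM_UNLIST).
 * Readers load the flag with acquire semantics (the kernel's smp_rmb())
 * before trusting the payload. */
struct obj {
    int payload;
    atomic_int unlisted;   /* plays the role of the VM_UNLIST bit */
};

static void publish(struct obj *o, int value)
{
    o->payload = value;
    /* pairs with the acquire load in try_read() */
    atomic_store_explicit(&o->unlisted, 0, memory_order_release);
}

static int try_read(struct obj *o, int *out)
{
    if (atomic_load_explicit(&o->unlisted, memory_order_acquire))
        return 0;          /* still "VM_UNLIST": not fully initialized */
    *out = o->payload;     /* safe: release/acquire ordered the stores */
    return 1;
}
```

The barrier pairing is the whole point: dropping either side lets another CPU observe the cleared flag before the payload stores.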
[PATCH v2 6/8] mm, vmalloc: iterate vmap_area_list, instead of vmlist, in vmallocinfo()
From: Joonsoo Kim js1...@gmail.com This patch is preparing step for removing vmlist entirely. For above purpose, we change iterating a vmap_list codes to iterating a vmap_area_list. It is somewhat trivial change, but just one thing should be noticed. Using vmap_area_list in vmallocinfo() introduce ordering problem in SMP system. In s_show(), we retrieve some values from vm_struct. vm_struct's values is not fully setup when va-vm is assigned. Full setup is notified by removing VM_UNLIST flag without holding a lock. When we see that VM_UNLIST is removed, it is not ensured that vm_struct has proper values in view of other CPUs. So we need smp_[rw]mb for ensuring that proper values is assigned when we see that VM_UNLIST is removed. Therefore, this patch not only change a iteration list, but also add a appropriate smp_[rw]mb to right places. Signed-off-by: Joonsoo Kim js1...@gmail.com Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com diff --git a/mm/vmalloc.c b/mm/vmalloc.c index aee1f61..bda6cef 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1304,7 +1304,14 @@ static void insert_vmalloc_vmlist(struct vm_struct *vm) { struct vm_struct *tmp, **p; + /* +* Before removing VM_UNLIST, +* we should make sure that vm has proper values. +* Pair with smp_rmb() in show_numa_info(). 
+*/ + smp_wmb(); vm-flags = ~VM_UNLIST; + write_lock(vmlist_lock); for (p = vmlist; (tmp = *p) != NULL; p = tmp-next) { if (tmp-addr = vm-addr) @@ -2542,19 +2549,19 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms) #ifdef CONFIG_PROC_FS static void *s_start(struct seq_file *m, loff_t *pos) - __acquires(vmlist_lock) + __acquires(vmap_area_lock) { loff_t n = *pos; - struct vm_struct *v; + struct vmap_area *va; - read_lock(vmlist_lock); - v = vmlist; - while (n 0 v) { + spin_lock(vmap_area_lock); + va = list_entry((vmap_area_list)-next, typeof(*va), list); + while (n 0 va-list != vmap_area_list) { n--; - v = v-next; + va = list_entry(va-list.next, typeof(*va), list); } - if (!n) - return v; + if (!n va-list != vmap_area_list) + return va; return NULL; @@ -2562,16 +2569,20 @@ static void *s_start(struct seq_file *m, loff_t *pos) static void *s_next(struct seq_file *m, void *p, loff_t *pos) { - struct vm_struct *v = p; + struct vmap_area *va = p, *next; ++*pos; - return v-next; + next = list_entry(va-list.next, typeof(*va), list); + if (next-list != vmap_area_list) + return next; + + return NULL; } static void s_stop(struct seq_file *m, void *p) - __releases(vmlist_lock) + __releases(vmap_area_lock) { - read_unlock(vmlist_lock); + spin_unlock(vmap_area_lock); } static void show_numa_info(struct seq_file *m, struct vm_struct *v) @@ -2582,6 +2593,11 @@ static void show_numa_info(struct seq_file *m, struct vm_struct *v) if (!counters) return; + /* Pair with smp_wmb() in insert_vmalloc_vmlist() */ + smp_rmb(); + if (v-flags VM_UNLIST) + return; + memset(counters, 0, nr_node_ids * sizeof(unsigned int)); for (nr = 0; nr v-nr_pages; nr++) @@ -2595,7 +2611,20 @@ static void show_numa_info(struct seq_file *m, struct vm_struct *v) static int s_show(struct seq_file *m, void *p) { - struct vm_struct *v = p; + struct vmap_area *va = p; + struct vm_struct *v; + + if (va-flags (VM_LAZY_FREE | VM_LAZY_FREEING)) + return 0; + + if (!(va-flags VM_VM_AREA)) { + seq_printf(m, 
0x%pK-0x%pK %7ld vm_map_ram\n, + (void *)va-va_start, (void *)va-va_end, + va-va_end - va-va_start); + return 0; + } + + v = va-vm; seq_printf(m, 0x%pK-0x%pK %7ld, v-addr, v-addr + v-size, v-size); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
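The new s_start()/s_next() cursor logic over the circular vmap_area_list - advance *pos entries from the head sentinel and return NULL once the sentinel comes back around - can be sketched with a plain sentinel list, outside the seq_file API:

```c
#include <stddef.h>

struct node { struct node *next; int val; };

/* head is a sentinel, like vmap_area_list; returns NULL once pos walks
 * past the last element, mirroring the patched s_start(). */
static struct node *seq_start(struct node *head, long pos)
{
    struct node *n = head->next;

    while (pos > 0 && n != head) {
        pos--;
        n = n->next;
    }
    return (pos == 0 && n != head) ? n : NULL;
}

/* mirrors the patched s_next(): stop when the next entry is the sentinel */
static struct node *seq_next(struct node *head, struct node *cur)
{
    return (cur->next != head) ? cur->next : NULL;
}
```

The sentinel test replaces the old `v->next == NULL` termination of the singly-linked vmlist walk.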
[PATCH v2 1/8] mm, vmalloc: change iterating a vmlist to find_vm_area()
From: Joonsoo Kim js1...@gmail.com The purpose of iterating a vmlist is finding vm area with specific virtual address. find_vm_area() is provided for this purpose and more efficient, because it uses a rbtree. So change it. Cc: Chris Metcalf cmetc...@tilera.com Cc: Guan Xuetao g...@mprc.pku.edu.cn Cc: Thomas Gleixner t...@linutronix.de Cc: Ingo Molnar mi...@redhat.com Cc: H. Peter Anvin h...@zytor.com Acked-by: Guan Xuetao g...@mprc.pku.edu.cn Acked-by: Ingo Molnar mi...@kernel.org Acked-by: Chris Metcalf cmetc...@tilera.com Signed-off-by: Joonsoo Kim js1...@gmail.com Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c index b3b4972..dfd63ce 100644 --- a/arch/tile/mm/pgtable.c +++ b/arch/tile/mm/pgtable.c @@ -592,12 +592,7 @@ void iounmap(volatile void __iomem *addr_in) in parallel. Reuse of the virtual address is prevented by leaving it in the global lists until we're done with it. cpa takes care of the direct mappings. */ - read_lock(vmlist_lock); - for (p = vmlist; p; p = p-next) { - if (p-addr == addr) - break; - } - read_unlock(vmlist_lock); + p = find_vm_area((void *)addr); if (!p) { pr_err(iounmap: bad address %p\n, addr); diff --git a/arch/unicore32/mm/ioremap.c b/arch/unicore32/mm/ioremap.c index b7a6055..13068ee 100644 --- a/arch/unicore32/mm/ioremap.c +++ b/arch/unicore32/mm/ioremap.c @@ -235,7 +235,7 @@ EXPORT_SYMBOL(__uc32_ioremap_cached); void __uc32_iounmap(volatile void __iomem *io_addr) { void *addr = (void *)(PAGE_MASK (unsigned long)io_addr); - struct vm_struct **p, *tmp; + struct vm_struct *vm; /* * If this is a section based mapping we need to handle it @@ -244,17 +244,10 @@ void __uc32_iounmap(volatile void __iomem *io_addr) * all the mappings before the area can be reclaimed * by someone else. 
*/ - write_lock(vmlist_lock); - for (p = vmlist ; (tmp = *p) ; p = tmp-next) { - if ((tmp-flags VM_IOREMAP) (tmp-addr == addr)) { - if (tmp-flags VM_UNICORE_SECTION_MAPPING) { - unmap_area_sections((unsigned long)tmp-addr, - tmp-size); - } - break; - } - } - write_unlock(vmlist_lock); + vm = find_vm_area(addr); + if (vm (vm-flags VM_IOREMAP) + (vm-flags VM_UNICORE_SECTION_MAPPING)) + unmap_area_sections((unsigned long)vm-addr, vm-size); vunmap(addr); } diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index 78fe3f1..9a1e658 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -282,12 +282,7 @@ void iounmap(volatile void __iomem *addr) in parallel. Reuse of the virtual address is prevented by leaving it in the global lists until we're done with it. cpa takes care of the direct mappings. */ - read_lock(vmlist_lock); - for (p = vmlist; p; p = p-next) { - if (p-addr == (void __force *)addr) - break; - } - read_unlock(vmlist_lock); + p = find_vm_area((void __force *)addr); if (!p) { printk(KERN_ERR iounmap: bad address %p\n, addr); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
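find_vm_area() is more efficient because vmap areas live in an rbtree keyed by address, giving O(log n) lookup instead of the O(n) vmlist scan these call sites used. The same idea can be shown with a binary search over an address-sorted array (a sketch, not the kernel's rbtree walk):

```c
#include <stddef.h>

struct varea { unsigned long start, end; };

/* Binary search for the area containing addr (start <= addr < end) in an
 * address-sorted, non-overlapping array; NULL if addr falls in a gap. */
static const struct varea *find_area(const struct varea *sorted, size_t n,
                                     unsigned long addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (addr < sorted[mid].start)
            hi = mid;
        else if (addr >= sorted[mid].end)
            lo = mid + 1;
        else
            return &sorted[mid];
    }
    return NULL;
}
```

With hundreds of ioremap and vmalloc areas, the logarithmic lookup is the reason the patch can simply drop the `vmlist_lock`-protected loops.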
[PATCH v2 2/8] mm, vmalloc: move get_vmalloc_info() to vmalloc.c
From: Joonsoo Kim js1...@gmail.com Now get_vmalloc_info() is in fs/proc/mmu.c. There is no reason that this code must be here and it's implementation needs vmlist_lock and it iterate a vmlist which may be internal data structure for vmalloc. It is preferable that vmlist_lock and vmlist is only used in vmalloc.c for maintainability. So move the code to vmalloc.c Signed-off-by: Joonsoo Kim js1...@gmail.com Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com diff --git a/fs/proc/Makefile b/fs/proc/Makefile index 712f24d..ab30716 100644 --- a/fs/proc/Makefile +++ b/fs/proc/Makefile @@ -5,7 +5,7 @@ obj-y += proc.o proc-y := nommu.o task_nommu.o -proc-$(CONFIG_MMU) := mmu.o task_mmu.o +proc-$(CONFIG_MMU) := task_mmu.o proc-y += inode.o root.o base.o generic.o array.o \ fd.o diff --git a/fs/proc/internal.h b/fs/proc/internal.h index 85ff3a4..7571035 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -30,24 +30,6 @@ extern int proc_net_init(void); static inline int proc_net_init(void) { return 0; } #endif -struct vmalloc_info { - unsigned long used; - unsigned long largest_chunk; -}; - -#ifdef CONFIG_MMU -#define VMALLOC_TOTAL (VMALLOC_END - VMALLOC_START) -extern void get_vmalloc_info(struct vmalloc_info *vmi); -#else - -#define VMALLOC_TOTAL 0UL -#define get_vmalloc_info(vmi) \ -do { \ - (vmi)-used = 0;\ - (vmi)-largest_chunk = 0; \ -} while(0) -#endif - extern int proc_tid_stat(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task); extern int proc_tgid_stat(struct seq_file *m, struct pid_namespace *ns, diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c index 1efaaa1..5aa847a 100644 --- a/fs/proc/meminfo.c +++ b/fs/proc/meminfo.c @@ -11,6 +11,7 @@ #include linux/swap.h #include linux/vmstat.h #include linux/atomic.h +#include linux/vmalloc.h #include asm/page.h #include asm/pgtable.h #include internal.h diff --git a/fs/proc/mmu.c b/fs/proc/mmu.c deleted file mode 100644 index 8ae221d..000 --- a/fs/proc/mmu.c +++ /dev/null @@ -1,60 
+0,0 @@ -/* mmu.c: mmu memory info files - * - * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved. - * Written by David Howells (dhowe...@redhat.com) - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ -#include linux/spinlock.h -#include linux/vmalloc.h -#include linux/highmem.h -#include asm/pgtable.h -#include internal.h - -void get_vmalloc_info(struct vmalloc_info *vmi) -{ - struct vm_struct *vma; - unsigned long free_area_size; - unsigned long prev_end; - - vmi-used = 0; - - if (!vmlist) { - vmi-largest_chunk = VMALLOC_TOTAL; - } - else { - vmi-largest_chunk = 0; - - prev_end = VMALLOC_START; - - read_lock(vmlist_lock); - - for (vma = vmlist; vma; vma = vma-next) { - unsigned long addr = (unsigned long) vma-addr; - - /* -* Some archs keep another range for modules in vmlist -*/ - if (addr VMALLOC_START) - continue; - if (addr = VMALLOC_END) - break; - - vmi-used += vma-size; - - free_area_size = addr - prev_end; - if (vmi-largest_chunk free_area_size) - vmi-largest_chunk = free_area_size; - - prev_end = vma-size + addr; - } - - if (VMALLOC_END - prev_end vmi-largest_chunk) - vmi-largest_chunk = VMALLOC_END - prev_end; - - read_unlock(vmlist_lock); - } -} diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 6071e91..698b1e5 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -158,4 +158,22 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms) # endif #endif +struct vmalloc_info { + unsigned long used; + unsigned long largest_chunk; +}; + +#ifdef CONFIG_MMU +#define VMALLOC_TOTAL (VMALLOC_END - VMALLOC_START) +extern void get_vmalloc_info(struct vmalloc_info *vmi); +#else + +#define VMALLOC_TOTAL 0UL +#define get_vmalloc_info(vmi) \ +do { \ + (vmi)-used = 0;\ + (vmi)-largest_chunk = 0; \ +} while (0) +#endif + 
#endif /* _LINUX_VMALLOC_H */ diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 0f751f2..1d9878b 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c
Re: [PATCH] mm: page_alloc: remove branch operation in free_pages_prepare()
Hello, Hugh. On Thu, Mar 07, 2013 at 06:01:26PM -0800, Hugh Dickins wrote: On Fri, 8 Mar 2013, Joonsoo Kim wrote: On Thu, Mar 07, 2013 at 10:54:15AM -0800, Hugh Dickins wrote: On Thu, 7 Mar 2013, Joonsoo Kim wrote: When we found that the flag has a bit of PAGE_FLAGS_CHECK_AT_PREP, we reset the flag. If we always reset the flag, we can reduce one branch operation. So remove it. Cc: Hugh Dickins hu...@google.com Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com I don't object to this patch. But certainly I would have written it that way in order not to dirty a cacheline unnecessarily. It may be obvious to you that the cacheline in question is almost always already dirty, and the branch almost always more expensive. But I'll leave that to you, and to those who know more about these subtle costs than I do. Yes. I already think about that. I thought that even if a cacheline is not dirty at this time, we always touch the 'struct page' in set_freepage_migratetype() a little later, so dirtying is not the problem. I expect that a very high proportion of user pages have PG_uptodate to be cleared here; and there's also the recently added page_nid_reset_last(), which will dirty the flags or a nearby field when CONFIG_NUMA_BALANCING. Those argue in favour of your patch. Ah... I totally missed it. But, now, I re-think this and decide to drop this patch. The reason is that 'struct page' of 'compound pages' may not be dirty at this time and will not be dirty at later time. Actual compound pages would have PG_head or PG_tail or PG_compound to be cleared there, I believe (check if I'm right on that). The questionable case is the ordinary order0 case without __GFP_COMP (and page_nid_reset_last() is applied to each subpage of those). Yes. So this patch is bad idea. I'm not so sure. I doubt your patch will make a giant improvement in kernel performance! 
But it might make a little - maybe you just need to give some numbers from perf to justify it (but I'm easily dazzled by numbers - don't expect me to judge the result). Okay. Thanks for the enlightening comment. For now, I don't have any idea how to collect a performance result for this patch. When I have more time, I will try to think about it. Thanks. Hugh Are there any comments? Thanks. Hugh diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8fcced7..778f2a9 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -614,8 +614,7 @@ static inline int free_pages_check(struct page *page) return 1; } page_nid_reset_last(page); - if (page->flags & PAGE_FLAGS_CHECK_AT_PREP) - page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP; + page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP; return 0; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majord...@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/
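The trade-off being discussed - a conditional versus an unconditional clear of PAGE_FLAGS_CHECK_AT_PREP - comes down to two forms that compute the same value; the only difference is whether the store (and hence cacheline dirtying) happens when no bits are set. This uses a stand-in mask value; the real constant is defined in include/linux/page-flags.h:

```c
/* stand-in mask, for illustration only */
#define PAGE_FLAGS_CHECK_AT_PREP 0x0ffUL

/* original form: test first, store only when some bit is actually set */
static unsigned long clear_branchy(unsigned long flags)
{
    if (flags & PAGE_FLAGS_CHECK_AT_PREP)
        flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
    return flags;
}

/* proposed form: always store, saving the branch */
static unsigned long clear_branchless(unsigned long flags)
{
    flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
    return flags;
}
```

Hugh's point is that on the real `page->flags` the branchless store can dirty a cacheline that would otherwise stay clean (e.g. for compound pages), so equivalence of the computed value does not settle which is cheaper.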
[PATCH] ARM: mm: clean-up in order to reduce to call kmap_high_get()
In kmap_atomic(), kmap_high_get() is invoked to check for an already mapped area. In __flush_dcache_page() and dma_cache_maint_page(), we explicitly call kmap_high_get() before kmap_atomic() when cache_is_vipt(), so kmap_high_get() can be invoked twice. This is a useless operation, so remove one. Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index c7e3759..b7711be 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -822,16 +822,16 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset, if (PageHighMem(page)) { if (len + offset > PAGE_SIZE) len = PAGE_SIZE - offset; - vaddr = kmap_high_get(page); - if (vaddr) { - vaddr += offset; - op(vaddr, len, dir); - kunmap_high(page); - } else if (cache_is_vipt()) { - /* unmapped pages might still be cached */ + if (cache_is_vipt()) { vaddr = kmap_atomic(page); op(vaddr + offset, len, dir); kunmap_atomic(vaddr); + } else { + vaddr = kmap_high_get(page); + if (vaddr) { + op(vaddr + offset, len, dir); + kunmap_high(page); + } } } else { vaddr = page_address(page) + offset; diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c index 1c8f7f5..e6a03d0 100644 --- a/arch/arm/mm/flush.c +++ b/arch/arm/mm/flush.c @@ -170,15 +170,18 @@ void __flush_dcache_page(struct address_space *mapping, struct page *page) if (!PageHighMem(page)) { __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); } else { - void *addr = kmap_high_get(page); - if (addr) { - __cpuc_flush_dcache_area(addr, PAGE_SIZE); - kunmap_high(page); - } else if (cache_is_vipt()) { - /* unmapped pages might still be cached */ + void *addr; + + if (cache_is_vipt()) { addr = kmap_atomic(page); __cpuc_flush_dcache_area(addr, PAGE_SIZE); kunmap_atomic(addr); + } else { + addr = kmap_high_get(page); + if (addr) { + __cpuc_flush_dcache_area(addr, PAGE_SIZE); + kunmap_high(page); + } } } -- 1.7.9.5
[RFC PATCH] ARM: mm: disable kmap_high_get() for SMP
With SMP and kmap_high_get() enabled, users of kmap_atomic() become sequentially ordered, because kmap_high_get() uses the global kmap_lock(). This is not a welcome situation, so turn off this optimization for SMP. Cc: Nicolas Pitre n...@linaro.org Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com diff --git a/arch/arm/include/asm/highmem.h b/arch/arm/include/asm/highmem.h index 8c5e828..82fea0f 100644 --- a/arch/arm/include/asm/highmem.h +++ b/arch/arm/include/asm/highmem.h @@ -26,15 +26,13 @@ extern void kunmap_high(struct page *page); * The reason for kmap_high_get() is to ensure that the currently kmap'd * page usage count does not decrease to zero while we're using its * existing virtual mapping in an atomic context. With a VIVT cache this - * is essential to do, but with a VIPT cache this is only an optimization - * so not to pay the price of establishing a second mapping if an existing - * one can be used. However, on platforms without hardware TLB maintenance - * broadcast, we simply cannot use ARCH_NEEDS_KMAP_HIGH_GET at all since - * the locking involved must also disable IRQs which is incompatible with - * the IPI mechanism used by global TLB operations. + * is essential to do, but with a VIPT cache this is only an optimization. + * With SMP and kmap_high_get() enabled, users of kmap_atomic() become + * sequentially ordered, because kmap_high_get() uses the global kmap_lock(). + * It is not a welcome situation, so turn off this optimization for SMP. */ #define ARCH_NEEDS_KMAP_HIGH_GET -#if defined(CONFIG_SMP) && defined(CONFIG_CPU_TLB_V6) +#if defined(CONFIG_SMP) #undef ARCH_NEEDS_KMAP_HIGH_GET #if defined(CONFIG_HIGHMEM) && defined(CONFIG_CPU_CACHE_VIVT) #error The sum of features in your kernel config cannot be supported together -- 1.7.9.5
Re: [RFC PATCH] ARM: mm: disable kmap_high_get() for SMP
Hello, Nicolas. On Tue, Mar 05, 2013 at 05:36:12PM +0800, Nicolas Pitre wrote: On Mon, 4 Mar 2013, Joonsoo Kim wrote: With SMP and kmap_high_get() enabled, users of kmap_atomic() become sequentially ordered, because kmap_high_get() uses the global kmap_lock(). It is not a welcome situation, so turn off this optimization for SMP. I'm not sure I understand the problem. The lock taken by kmap_high_get() is released right away before that function returns and therefore this is not actually serializing anything. Yes, you understand what I want to say correctly. Sorry for the bad explanation. The following are the reasons why I send this patch with an RFC tag. If we have more cpus, performance degradation is possible even though the lock in kmap_high_get() is held only for a very short time. And kmap has a maximum of 512 entries (512 * 4K = 2M) while some mobile devices have 2G of memory (1G highmem), so the probability of finding a matched entry is approximately 1/512. This probability decreases further for devices with more memory. So I think the time wasted finding a matched entry is more than the time saved. The above is my humble opinion, so please let me know what I am missing. Thanks. Nicolas
[PATCH] mm: page_alloc: remove branch operation in free_pages_prepare()
When we find that the flags contain a bit of PAGE_FLAGS_CHECK_AT_PREP, we reset the flag. If we always reset the flag unconditionally, we can remove one branch operation. So remove it. Cc: Hugh Dickins hu...@google.com Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8fcced7..778f2a9 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -614,8 +614,7 @@ static inline int free_pages_check(struct page *page) return 1; } page_nid_reset_last(page); - if (page->flags & PAGE_FLAGS_CHECK_AT_PREP) - page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP; + page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP; return 0; } -- 1.7.9.5
Re: [RFC PATCH] ARM: mm: disable kmap_high_get() for SMP
2013/3/7 Nicolas Pitre nicolas.pi...@linaro.org:
On Thu, 7 Mar 2013, Joonsoo Kim wrote:

Hello, Nicolas.

On Tue, Mar 05, 2013 at 05:36:12PM +0800, Nicolas Pitre wrote:
On Mon, 4 Mar 2013, Joonsoo Kim wrote:

With SMP and kmap_high_get() enabled, users of kmap_atomic() are ordered sequentially, because kmap_high_get() uses the global kmap_lock(). This is not a welcome situation, so turn off this optimization for SMP.

I'm not sure I understand the problem. The lock taken by kmap_high_get() is released right away before that function returns and therefore this is not actually serializing anything.

Yes, you understand what I want to say correctly. Sorry for the bad explanation. The following are the reasons why I sent this patch with an RFC tag. If we have more cpus, performance degradation is possible even though the lock in kmap_high_get() is held for only a very short time. Also, kmap has a maximum of 512 entries (512 * 4K = 2M) and some mobile devices have 2G of memory (1G of highmem), so the probability of finding a matched entry is approximately 1/512. This probability decreases further for devices which have more memory. So I think the time wasted looking for a matched entry exceeds the time saved. The above is my humble opinion, so please let me know what I am missing.

Please look at the kmap_high_get() code again. It performs no searching at all. What it does is:

If the page is not highmem, it may already be filtered out in kmap_atomic(), so we only consider highmem pages. For a highmem page, it does perform searching: kmap_high_get() calls page_address(), and page_address() hashes the PA and iterates a list for that hash value. And another advantage of disabling ARCH_NEEDS_KMAP_HIGH_GET is that kmap() and kunmap() work without irqs disabled.

Thanks.

- lock the kmap array against concurrent changes
- if the given page is not highmem, unlock and return NULL
- otherwise increment that page's reference count, unlock, and return the mapped address for that page.
There is almost zero cost to this function, independently of the number of kmap entries, whereas it does save much bigger costs elsewhere when it is successful.

Nicolas
Re: [PATCH] mm: page_alloc: remove branch operation in free_pages_prepare()
Hello, Hugh.

On Thu, Mar 07, 2013 at 10:54:15AM -0800, Hugh Dickins wrote:
On Thu, 7 Mar 2013, Joonsoo Kim wrote:

When we find that page->flags has a bit of PAGE_FLAGS_CHECK_AT_PREP set, we clear it. If we always clear it, we can remove one branch operation. So remove the branch.

Cc: Hugh Dickins hu...@google.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

I don't object to this patch. But certainly I would have written it that way in order not to dirty a cacheline unnecessarily. It may be obvious to you that the cacheline in question is almost always already dirty, and the branch almost always more expensive. But I'll leave that to you, and to those who know more about these subtle costs than I do.

Yes, I already thought about that. I thought that even if the cacheline is not dirty at this time, we always touch the 'struct page' in set_freepage_migratetype() a little later, so dirtying it is not a problem. But now I have re-thought this and decided to drop this patch. The reason is that the 'struct page' of compound pages may not be dirty at this time and will not be dirtied later either. So this patch is a bad idea. Are there any other comments?

Thanks.

Hugh

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8fcced7..778f2a9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -614,8 +614,7 @@ static inline int free_pages_check(struct page *page)
 		return 1;
 	}
 	page_nid_reset_last(page);
-	if (page->flags & PAGE_FLAGS_CHECK_AT_PREP)
-		page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 	return 0;
 }
Re: [PATCH] ARM: mm: clean-up in order to reduce to call kmap_high_get()
Hello, Russell.

On Thu, Mar 07, 2013 at 01:26:23PM +, Russell King - ARM Linux wrote:
On Mon, Mar 04, 2013 at 01:50:09PM +0900, Joonsoo Kim wrote:

In kmap_atomic(), kmap_high_get() is invoked to check for an already mapped area. In __flush_dcache_page() and dma_cache_maint_page(), we explicitly call kmap_high_get() before kmap_atomic() when cache_is_vipt(), so kmap_high_get() can be invoked twice. This is a useless operation, so remove one call.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index c7e3759..b7711be 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -822,16 +822,16 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
 	if (PageHighMem(page)) {
 		if (len + offset > PAGE_SIZE)
 			len = PAGE_SIZE - offset;
-		vaddr = kmap_high_get(page);
-		if (vaddr) {
-			vaddr += offset;
-			op(vaddr, len, dir);
-			kunmap_high(page);
-		} else if (cache_is_vipt()) {
-			/* unmapped pages might still be cached */
+		if (cache_is_vipt()) {

This should be:

	if (cache_is_vipt_nonaliasing())

to make it _explicit_ that this technique is only for non-aliasing VIPT caches (this doesn't work on any other of our cache types.) Yes, I know we don't support highmem with VIPT aliasing caches - but still, we should ensure that this is self-documented in this code. Same for arch/arm/mm/flush.c.

Okay. I will re-work it and send v2 soon.
[RFC PATCH] staging, android, ashmem: invalidate pin/unpin ioctl for private map
Private mapped ashmem doesn't install a vm fault handler, so do_anonymous_page() is called to handle faults in handle_pte_fault(). This type of backing memory isn't related to asma->file, which is what the ashmem shrinker operates on. Shrinking an unpinned area of such a mapping will not actually shrink memory. Although its memory isn't actually removed, the pin status will return ASHMEM_WAS_PURGED, so the application will re-load the content. This is needless overhead, so invalidate the pin/unpin ioctl behavior for private mappings.

CC: John Stultz john.stu...@linaro.org
CC: Brian Swetland swetl...@google.com
CC: Colin Cross ccr...@android.com
CC: Arve Hjønnevåg a...@android.com
CC: Dima Zavin d...@android.com
CC: Robert Love rl...@google.com
Signed-off-by: Joonsoo Kim js1...@gmail.com

diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index 634b9ae..2fde9df 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -49,6 +49,7 @@ struct ashmem_area {
 	struct file *file;		/* the shmem-based backing file */
 	size_t size;			/* size of the mapping, in bytes */
 	unsigned long prot_mask;	/* allowed prot bits, as vm_flags */
+	bool shared_mapping;		/* mapping type */
 };

 /*
@@ -327,6 +328,7 @@ static int ashmem_mmap(struct file *file, struct vm_area_struct *vma)
 			fput(asma->file);
 			goto out;
 		}
+		asma->shared_mapping = 1;
 	}

 	if (vma->vm_file)
@@ -614,21 +616,35 @@ static int ashmem_pin_unpin(struct ashmem_area *asma, unsigned long cmd,
 	pgstart = pin.offset / PAGE_SIZE;
 	pgend = pgstart + (pin.len / PAGE_SIZE) - 1;

-	mutex_lock(&ashmem_mutex);
+	if (asma->shared_mapping) {
+		mutex_lock(&ashmem_mutex);

-	switch (cmd) {
-	case ASHMEM_PIN:
-		ret = ashmem_pin(asma, pgstart, pgend);
-		break;
-	case ASHMEM_UNPIN:
-		ret = ashmem_unpin(asma, pgstart, pgend);
-		break;
-	case ASHMEM_GET_PIN_STATUS:
-		ret = ashmem_get_pin_status(asma, pgstart, pgend);
-		break;
-	}
+		switch (cmd) {
+		case ASHMEM_PIN:
+			ret = ashmem_pin(asma, pgstart, pgend);
+			break;
+		case ASHMEM_UNPIN:
+			ret =
ashmem_unpin(asma, pgstart, pgend);
+			break;
+		case ASHMEM_GET_PIN_STATUS:
+			ret = ashmem_get_pin_status(asma, pgstart, pgend);
+			break;
+		}

-	mutex_unlock(&ashmem_mutex);
+		mutex_unlock(&ashmem_mutex);
+
+	} else {
+		switch (cmd) {
+		/* if it is a private map, pin/unpin have no impact on the vm */
+		case ASHMEM_PIN:
+		case ASHMEM_UNPIN:
+			ret = ASHMEM_NOT_PURGED;
+			break;
+		case ASHMEM_GET_PIN_STATUS:
+			ret = ASHMEM_IS_PINNED;
+			break;
+		}
+	}

 	return ret;
 }
--
1.7.9.5
Re: [RFC PATCH] staging, android, ashmem: invalidate pin/unpin ioctl for private map
Hello, Dan.

2012/12/2 Dan Carpenter dan.carpen...@oracle.com:
On Sat, Dec 01, 2012 at 02:45:57AM +0900, Joonsoo Kim wrote:

@@ -614,21 +616,35 @@ static int ashmem_pin_unpin(struct ashmem_area *asma, unsigned long cmd,
 	pgstart = pin.offset / PAGE_SIZE;
 	pgend = pgstart + (pin.len / PAGE_SIZE) - 1;

-	mutex_lock(&ashmem_mutex);
+	if (asma->shared_mapping) {
+		mutex_lock(&ashmem_mutex);

Wouldn't we need to hold the mutex while we check ->shared_mapping?

I don't fully understand ashmem's lock semantics. Code that retrieves some values from the asma instance doesn't hold the mutex now. For example, in ashmem_ioctl(), asma->size and asma->prot_mask; and in ashmem_pin_unpin(), asma->file and asma->size are retrieved without the mutex. According to this semantic, the mutex isn't needed for checking asma->shared_mapping.

Thanks for the review!
Re: [RFC PATCH] staging, android, ashmem: invalidate pin/unpin ioctl for private map
2012/12/3 Dan Carpenter dan.carpen...@oracle.com:
On Mon, Dec 03, 2012 at 09:09:59AM +0900, JoonSoo Kim wrote:

Hello, Dan.

2012/12/2 Dan Carpenter dan.carpen...@oracle.com:
On Sat, Dec 01, 2012 at 02:45:57AM +0900, Joonsoo Kim wrote:

@@ -614,21 +616,35 @@ static int ashmem_pin_unpin(struct ashmem_area *asma, unsigned long cmd,
 	pgstart = pin.offset / PAGE_SIZE;
 	pgend = pgstart + (pin.len / PAGE_SIZE) - 1;

-	mutex_lock(&ashmem_mutex);
+	if (asma->shared_mapping) {
+		mutex_lock(&ashmem_mutex);

Wouldn't we need to hold the mutex while we check ->shared_mapping?

I don't fully understand ashmem's lock semantics. Code that retrieves some values from the asma instance doesn't hold the mutex now. For example, in ashmem_ioctl(), asma->size and asma->prot_mask; and in ashmem_pin_unpin(), asma->file and asma->size are retrieved without the mutex. According to this semantic, the mutex isn't needed for checking asma->shared_mapping.

The ashmem_ioctl() is clearly racy. :P asma->size can be modified and read at the same time. It's not an example to follow.

Okay :) I will insert code to hold the mutex in the next spin.

Thanks!
[PATCH 1/2] scripts/tags.sh: Support subarch for ARM
Current tags.sh doesn't handle subarches for ARM. There are too many subarches on ARM, so it is hard to locate functions that are defined per-subarch with the tags utility family. Therefore, support subarch selection to remove this inconvenience. We can use the ARM subarch functionality like below.

make cscope O=. SRCARCH=arm SUBARCH=xxx

Signed-off-by: Joonsoo Kim js1...@gmail.com

diff --git a/scripts/tags.sh b/scripts/tags.sh
index 79fdafb..a400c88 100755
--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -48,13 +48,14 @@ find_arch_sources()
 	for i in $archincludedir; do
 		prune="$prune -wholename $i -prune -o"
 	done
-	find ${tree}arch/$1 $ignore $prune -name "$2" -print;
+	find ${tree}arch/$1 $ignore $subarchprune $prune -name "$2" -print;
 }

 # find sources in arch/$1/include
 find_arch_include_sources()
 {
-	include=$(find ${tree}arch/$1/ -name include -type d);
+	include=$(find ${tree}arch/$1/ $subarchprune \
+					-name include -type d -print);
 	if [ -n "$include" ]; then
 		archincludedir="$archincludedir $include"
 		find $include $ignore -name "$2" -print;
@@ -234,6 +235,14 @@ if [ "${ARCH}" = "um" ]; then
 	else
 		archinclude=${SUBARCH}
 	fi
+elif [ "${SRCARCH}" = "arm" -a "${SUBARCH}" != "" ]; then
+	subarchdir=$(find ${tree}arch/$SRCARCH/ -name "mach-*" -type d -o \
+							-name "plat-*" -type d);
+	for i in $subarchdir; do
+		if ! [[ "$i" =~ .*-${SUBARCH}$ ]]; then
+			subarchprune="$subarchprune -wholename $i -prune -o"
+		fi
+	done
 fi

 remove_structs=
--
1.7.9.5
[PATCH 2/2] scripts/tags.sh: Support compiled source
We usually have interest in compiled files only, because they are strongly related to an individual's work. Current tags.sh can't select compiled files only, so support it. We can use this functionality like below.

make cscope O=. SRCARCH= SUBARCH=compiled

It must be executed after building the kernel.

Signed-off-by: Joonsoo Kim js1...@gmail.com

diff --git a/scripts/tags.sh b/scripts/tags.sh
index a400c88..ef9668c 100755
--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -96,6 +96,29 @@ all_sources()
 	find_other_sources '*.[chS]'
 }

+all_compiled_sources()
+{
+	for i in $(all_sources); do
+		if [[ $i =~ .*c$ || $i =~ .*S$ ]]; then
+			j=${i/\.[cS]/\.o}
+			if [ -e $j ]; then
+				echo $i
+			fi
+		else
+			echo $i
+		fi
+	done
+}
+
+all_target_sources()
+{
+	if [ "$compiled_target" = "true" ]; then
+		all_compiled_sources
+	else
+		all_sources
+	fi
+}
+
 all_kconfigs()
 {
 	for arch in $ALLSOURCE_ARCHS; do
@@ -111,18 +134,18 @@ all_defconfigs()

 docscope()
 {
-	(echo \-k; echo \-q; all_sources) > cscope.files
+	(echo \-k; echo \-q; all_target_sources) > cscope.files
 	cscope -b -f cscope.out
 }

 dogtags()
 {
-	all_sources | gtags -i -f -
+	all_target_sources | gtags -i -f -
 }

 exuberant()
 {
-	all_sources | xargs $1 -a \
+	all_target_sources | xargs $1 -a \
 	-I __initdata,__exitdata,__acquires,__releases \
 	-I __read_mostly,cacheline_aligned \
 	-I cacheline_aligned_in_smp \
@@ -174,7 +197,7 @@ exuberant()

 emacs()
 {
-	all_sources | xargs $1 -a \
+	all_target_sources | xargs $1 -a \
 	--regex='/^(ENTRY|_GLOBAL)(\([^)]*\)).*/\2/'	\
 	--regex='/^SYSCALL_DEFINE[0-9]?(\([^,)]*\).*/sys_\1/' \
 	--regex='/^TRACE_EVENT(\([^,)]*\).*/trace_\1/'	\
@@ -221,10 +244,13 @@ xtags()
 	elif $1 --version 2>&1 | grep -iq emacs; then
 		emacs $1
 	else
-		all_sources | xargs $1 -a
+		all_target_sources | xargs $1 -a
 	fi
 }

+if [ "${SUBARCH}" = "compiled" ]; then
+	compiled_target=true
+fi
 # Support um (which uses SUBARCH)
 if [ "${ARCH}" = "um" ]; then
--
1.7.9.5
[PATCH v2 1/2] scripts/tags.sh: Support subarch for ARM
Current tags.sh doesn't handle subarches for ARM. There are too many subarches on ARM, so it is hard to locate functions that are defined per-subarch with the tags utility family. Therefore, support subarch selection to remove this inconvenience. We can use the ARM subarch functionality like below.

make cscope O=. SRCARCH=arm SUBARCH=xxx

Signed-off-by: Joonsoo Kim js1...@gmail.com
---
v2: change bash-specific '[[ ]]' to a 'case ... in' statement.

diff --git a/scripts/tags.sh b/scripts/tags.sh
index 79fdafb..38483f4 100755
--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -48,13 +48,14 @@ find_arch_sources()
 	for i in $archincludedir; do
 		prune="$prune -wholename $i -prune -o"
 	done
-	find ${tree}arch/$1 $ignore $prune -name "$2" -print;
+	find ${tree}arch/$1 $ignore $subarchprune $prune -name "$2" -print;
 }

 # find sources in arch/$1/include
 find_arch_include_sources()
 {
-	include=$(find ${tree}arch/$1/ -name include -type d);
+	include=$(find ${tree}arch/$1/ $subarchprune \
+					-name include -type d -print);
 	if [ -n "$include" ]; then
 		archincludedir="$archincludedir $include"
 		find $include $ignore -name "$2" -print;
@@ -234,6 +235,21 @@ if [ "${ARCH}" = "um" ]; then
 	else
 		archinclude=${SUBARCH}
 	fi
+elif [ "${SRCARCH}" = "arm" -a "${SUBARCH}" != "" ]; then
+	subarchdir=$(find ${tree}arch/$SRCARCH/ -name "mach-*" -type d -o \
+							-name "plat-*" -type d);
+	for i in $subarchdir; do
+		case "$i" in
+		*"mach-"${SUBARCH})
+			;;
+		*"plat-"${SUBARCH})
+			;;
+		*)
+			subarchprune="$subarchprune \
+					-wholename $i -prune -o"
+			;;
+		esac
+	done
 fi

 remove_structs=
--
1.7.9.5
[PATCH v2 2/2] scripts/tags.sh: Support compiled source
We usually have interest in compiled files only, because they are strongly related to an individual's work. Current tags.sh can't select compiled files only, so support it. We can use this functionality like below.

make cscope O=. SRCARCH= COMPILED_SOURCE=compiled

It must be executed after building the kernel.

Signed-off-by: Joonsoo Kim js1...@gmail.com
---
v2: change bash-specific '[[ ]]' to a 'case ... in' statement.
    use a COMPILED_SOURCE env var, instead of abusing SUBARCH

diff --git a/scripts/tags.sh b/scripts/tags.sh
index 38483f4..9c02921 100755
--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -96,6 +96,32 @@ all_sources()
 	find_other_sources '*.[chS]'
 }

+all_compiled_sources()
+{
+	for i in $(all_sources); do
+		case "$i" in
+		*.[cS])
+			j=${i/\.[cS]/\.o}
+			if [ -e $j ]; then
+				echo $i
+			fi
+			;;
+		*)
+			echo $i
+			;;
+		esac
+	done
+}
+
+all_target_sources()
+{
+	if [ "$COMPILED_SOURCE" = "compiled" ]; then
+		all_compiled_sources
+	else
+		all_sources
+	fi
+}
+
 all_kconfigs()
 {
 	for arch in $ALLSOURCE_ARCHS; do
@@ -111,18 +137,18 @@ all_defconfigs()

 docscope()
 {
-	(echo \-k; echo \-q; all_sources) > cscope.files
+	(echo \-k; echo \-q; all_target_sources) > cscope.files
 	cscope -b -f cscope.out
 }

 dogtags()
 {
-	all_sources | gtags -i -f -
+	all_target_sources | gtags -i -f -
 }

 exuberant()
 {
-	all_sources | xargs $1 -a \
+	all_target_sources | xargs $1 -a \
 	-I __initdata,__exitdata,__acquires,__releases \
 	-I __read_mostly,cacheline_aligned \
 	-I cacheline_aligned_in_smp \
@@ -174,7 +200,7 @@ exuberant()

 emacs()
 {
-	all_sources | xargs $1 -a \
+	all_target_sources | xargs $1 -a \
 	--regex='/^(ENTRY|_GLOBAL)(\([^)]*\)).*/\2/'	\
 	--regex='/^SYSCALL_DEFINE[0-9]?(\([^,)]*\).*/sys_\1/' \
 	--regex='/^TRACE_EVENT(\([^,)]*\).*/trace_\1/'	\
@@ -221,11 +247,10 @@ xtags()
 	elif $1 --version 2>&1 | grep -iq emacs; then
 		emacs $1
 	else
-		all_sources | xargs $1 -a
+		all_target_sources | xargs $1 -a
 	fi
 }
-
 # Support um (which uses SUBARCH)
 if [ "${ARCH}" = "um" ]; then
 	if [ "$SUBARCH" = "i386" ]; then
--
1.7.9.5
Re: [PATCH 2/2] scripts/tags.sh: Support compiled source
2012/12/4 Michal Marek mma...@suse.cz:
On 3.12.2012 17:22, Joonsoo Kim wrote:

We usually have interest in compiled files only, because they are strongly related to an individual's work. Current tags.sh can't select compiled files only, so support it. We can use this functionality like below.

make cscope O=. SRCARCH= SUBARCH=compiled

I like the idea, but please try to come up with a different syntax that does not abuse SUBARCH.

+all_compiled_sources()
+{
+	for i in $(all_sources); do
+		if [[ $i =~ .*c$ || $i =~ .*S$ ]]; then
+			j=${i/\.[cS]/\.o}

Please use 'case $i in ...' instead of the bash-specific [[ ... ]] construct.

I sent v2, which is fixed according to your comments. Thanks for the review.
Re: [PATCH RESEND] ARM: sched: correct update_sched_clock()
Hello, Russell.

On Wed, Feb 06, 2013 at 04:33:55PM +, Russell King - ARM Linux wrote:
On Wed, Feb 06, 2013 at 10:33:53AM +0100, Linus Walleij wrote:
On Wed, Feb 6, 2013 at 6:21 AM, Joonsoo Kim iamjoonsoo@lge.com wrote:

If we want to load epoch_cyc and epoch_ns atomically, we should update epoch_cyc_copy first of all. This notifies the reader that an update is in progress.

If you think the patch is finished, put it into Russell's patch tracker: http://www.arm.linux.org.uk/developer/patches/

Yea, this patch looks like the right thing... and yes, into the patch system please.

I tried to put it into the patch tracker, but failed. I used the following command:

git send-email --to patc...@arm.linux.org.uk 0001-ARM-sched-correct~

Am I wrong? Could you teach me how to put a patch into your patch tracker? I have already read the help on your website, but I can't find anything about git send-email.

Thanks.
Re: [PATCH RESEND] ARM: sched: correct update_sched_clock()
2013/2/9 Nicolas Pitre nicolas.pi...@linaro.org:
On Fri, 8 Feb 2013, Russell King - ARM Linux wrote:
On Fri, Feb 08, 2013 at 03:51:25PM +0900, Joonsoo Kim wrote:

I tried to put it into the patch tracker, but failed. I used the following command:

git send-email --to patc...@arm.linux.org.uk 0001-ARM-sched-correct~

Am I wrong? Could you teach me how to put a patch into your patch tracker? I have already read the help on your website, but I can't find anything about git send-email.

I've never used git to send email, so I'm afraid I can't help with that. Maybe someone with more experience with sending email from git can comment?

The commit text for the patch needs to contain the following 2 lines at the very bottom in order to match the patch system's expectations:

PATCH FOLLOWS
KernelVersion: v3.8

The KernelVersion value needs to be adjusted, of course. Note: I never tried it myself but it ought to work.

Yes, it works! Thanks!!!

Nicolas
[PATCH v5 1/3] ARM: vmregion: remove vmregion code entirely
Now there is no user of vmregion, so remove it.

Acked-by: Nicolas Pitre n...@linaro.org
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 8a9c4cb..4e333fa 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -6,7 +6,7 @@ obj-y				:= dma-mapping.o extable.o fault.o init.o \
 				   iomap.o

 obj-$(CONFIG_MMU)		+= fault-armv.o flush.o idmap.o ioremap.o \
-				   mmap.o pgd.o mmu.o vmregion.o
+				   mmap.o pgd.o mmu.o

 ifneq ($(CONFIG_MMU),y)
 obj-y				+= nommu.o
diff --git a/arch/arm/mm/vmregion.c b/arch/arm/mm/vmregion.c
deleted file mode 100644
index a631016..000
--- a/arch/arm/mm/vmregion.c
+++ /dev/null
@@ -1,205 +0,0 @@
-#include <linux/fs.h>
-#include <linux/spinlock.h>
-#include <linux/list.h>
-#include <linux/proc_fs.h>
-#include <linux/seq_file.h>
-#include <linux/slab.h>
-
-#include "vmregion.h"
-
-/*
- * VM region handling support.
- *
- * This should become something generic, handling VM region allocations for
- * vmalloc and similar (ioremap, module space, etc).
- *
- * I envisage vmalloc()'s supporting vm_struct becoming:
- *
- *  struct vm_struct {
- *    struct vmregion	region;
- *    unsigned long	flags;
- *    struct page	**pages;
- *    unsigned int	nr_pages;
- *    unsigned long	phys_addr;
- *  };
- *
- * get_vm_area() would then call vmregion_alloc with an appropriate
- * struct vmregion head (eg):
- *
- *  struct vmregion vmalloc_head = {
- *	.vm_list	= LIST_HEAD_INIT(vmalloc_head.vm_list),
- *	.vm_start	= VMALLOC_START,
- *	.vm_end		= VMALLOC_END,
- *  };
- *
- * However, vmalloc_head.vm_start is variable (typically, it is dependent on
- * the amount of RAM found at boot time.) I would imagine that get_vm_area()
- * would have to initialise this each time prior to calling vmregion_alloc().
- */
-
-struct arm_vmregion *
-arm_vmregion_alloc(struct arm_vmregion_head *head, size_t align,
-		   size_t size, gfp_t gfp, const void *caller)
-{
-	unsigned long start = head->vm_start, addr = head->vm_end;
-	unsigned long flags;
-	struct arm_vmregion *c, *new;
-
-	if (head->vm_end - head->vm_start < size) {
-		printk(KERN_WARNING "%s: allocation too big (requested %#x)\n",
-		       __func__, size);
-		goto out;
-	}
-
-	new = kmalloc(sizeof(struct arm_vmregion), gfp);
-	if (!new)
-		goto out;
-
-	new->caller = caller;
-
-	spin_lock_irqsave(&head->vm_lock, flags);
-
-	addr = rounddown(addr - size, align);
-	list_for_each_entry_reverse(c, &head->vm_list, vm_list) {
-		if (addr >= c->vm_end)
-			goto found;
-		addr = rounddown(c->vm_start - size, align);
-		if (addr < start)
-			goto nospc;
-	}
-
- found:
-	/*
-	 * Insert this entry after the one we found.
-	 */
-	list_add(&new->vm_list, &c->vm_list);
-	new->vm_start = addr;
-	new->vm_end = addr + size;
-	new->vm_active = 1;
-
-	spin_unlock_irqrestore(&head->vm_lock, flags);
-	return new;
-
- nospc:
-	spin_unlock_irqrestore(&head->vm_lock, flags);
-	kfree(new);
- out:
-	return NULL;
-}
-
-static struct arm_vmregion *__arm_vmregion_find(struct arm_vmregion_head *head, unsigned long addr)
-{
-	struct arm_vmregion *c;
-
-	list_for_each_entry(c, &head->vm_list, vm_list) {
-		if (c->vm_active && c->vm_start == addr)
-			goto out;
-	}
-	c = NULL;
- out:
-	return c;
-}
-
-struct arm_vmregion *arm_vmregion_find(struct arm_vmregion_head *head, unsigned long addr)
-{
-	struct arm_vmregion *c;
-	unsigned long flags;
-
-	spin_lock_irqsave(&head->vm_lock, flags);
-	c = __arm_vmregion_find(head, addr);
-	spin_unlock_irqrestore(&head->vm_lock, flags);
-	return c;
-}
-
-struct arm_vmregion *arm_vmregion_find_remove(struct arm_vmregion_head *head, unsigned long addr)
-{
-	struct arm_vmregion *c;
-	unsigned long flags;
-
-	spin_lock_irqsave(&head->vm_lock, flags);
-	c = __arm_vmregion_find(head, addr);
-	if (c)
-		c->vm_active = 0;
-	spin_unlock_irqrestore(&head->vm_lock, flags);
-	return c;
-}
-
-void
arm_vmregion_free(struct arm_vmregion_head *head, struct arm_vmregion *c)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&head->vm_lock, flags);
-	list_del(&c->vm_list);
-	spin_unlock_irqrestore(&head->vm_lock, flags);
-
-	kfree(c);
-}
-
-#ifdef CONFIG_PROC_FS
-static int arm_vmregion_show(struct seq_file *m, void *p)
-{
-	struct arm_vmregion *c = list_entry(p, struct arm_vmregion, vm_list);
-
-	seq_printf(m, "0x%08lx-0x%08lx %7lu", c->vm_start, c->vm_end,
-		c->vm_end - c
[PATCH v5 2/3] ARM: ioremap: introduce an infrastructure for static mapped area
In the current implementation, we use an ARM-specific flag, VM_ARM_STATIC_MAPPING, for distinguishing ARM-specific static mapped areas. The purpose of a static mapped area is to re-use it when the entire physical address range of an ioremap request can be covered by that area.

This implementation causes needless overhead in some cases. For example, assume that there is only one static mapped area and vmlist has 300 areas. Every time we call ioremap, we check all 300 areas to decide whether one matches. Moreover, even if there is no static mapped area at all, every time we call ioremap we still check all 300 areas. If we construct an extra list for static mapped areas, we can eliminate this overhead: with an extra list, if there is one static mapped area, we check just that one area and proceed to the next operation quickly. In fact, this is not a critical problem, because ioremap is not frequently used, but reducing overhead is still a good idea.

Another reason for doing this work is to remove the architecture dependency on the vmalloc layer. I think vmlist and vmlist_lock are internal data structures of the vmalloc layer. Some code for debugging and statistics inevitably uses vmlist and vmlist_lock, but it is preferable that they are used as little as possible outside of vmalloc.c.

Now, I introduce an ARM-specific infrastructure for static mapped areas. In the following patch, we will use this to resolve the problem mentioned above.
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index 88fd86c..904c15e 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -39,6 +39,70 @@
 #include <asm/mach/pci.h>

 #include "mm.h"
+
+LIST_HEAD(static_vmlist);
+
+static struct static_vm *find_static_vm_paddr(phys_addr_t paddr,
+			size_t size, unsigned int mtype)
+{
+	struct static_vm *svm;
+	struct vm_struct *vm;
+
+	list_for_each_entry(svm, &static_vmlist, list) {
+		vm = &svm->vm;
+		if (!(vm->flags & VM_ARM_STATIC_MAPPING))
+			continue;
+		if ((vm->flags & VM_ARM_MTYPE_MASK) != VM_ARM_MTYPE(mtype))
+			continue;
+
+		if (vm->phys_addr > paddr ||
+			paddr + size - 1 > vm->phys_addr + vm->size - 1)
+			continue;
+
+		return svm;
+	}
+
+	return NULL;
+}
+
+struct static_vm *find_static_vm_vaddr(void *vaddr)
+{
+	struct static_vm *svm;
+	struct vm_struct *vm;
+
+	list_for_each_entry(svm, &static_vmlist, list) {
+		vm = &svm->vm;
+
+		/* static_vmlist is ascending order */
+		if (vm->addr > vaddr)
+			break;
+
+		if (vm->addr <= vaddr && vm->addr + vm->size > vaddr)
+			return svm;
+	}
+
+	return NULL;
+}
+
+void __init add_static_vm_early(struct static_vm *svm)
+{
+	struct static_vm *curr_svm;
+	struct vm_struct *vm;
+	void *vaddr;
+
+	vm = &svm->vm;
+	vm_area_add_early(vm);
+	vaddr = vm->addr;
+
+	list_for_each_entry(curr_svm, &static_vmlist, list) {
+		vm = &curr_svm->vm;
+
+		if (vm->addr > vaddr)
+			break;
+	}
+	list_add_tail(&svm->list, &curr_svm->list);
+}
+
 int ioremap_page(unsigned long virt, unsigned long phys,
 		 const struct mem_type *mtype)
 {
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index a8ee92d..d5a4e9a 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -1,4 +1,6 @@
 #ifdef CONFIG_MMU
+#include <linux/list.h>
+#include <linux/vmalloc.h>

 /* the upper-most page table pointer */
 extern pmd_t *top_pmd;
@@ -65,6 +67,16 @@ extern void __flush_dcache_page(struct address_space *mapping, struct page *page
 /* consistent regions used by dma_alloc_attrs() */
 #define VM_ARM_DMA_CONSISTENT	0x2000
+
+struct
static_vm {
+	struct vm_struct vm;
+	struct list_head list;
+};
+
+extern struct list_head static_vmlist;
+extern struct static_vm *find_static_vm_vaddr(void *vaddr);
+extern __init void add_static_vm_early(struct static_vm *svm);
+
 #endif

 #ifdef CONFIG_ZONE_DMA
--
1.7.9.5
[PATCH v5 0/3] introduce static_vm for ARM-specific static mapped area
In the current implementation, we use an ARM-specific flag, VM_ARM_STATIC_MAPPING, to distinguish ARM-specific static mapped areas. The purpose of a static mapped area is to be re-used when the entire physical address range of an ioremap request can be covered by it.

This implementation causes needless overhead in some cases. For example, assume that there is only one static mapped area and vmlist has 300 areas. Every time we call ioremap, we check 300 areas to decide whether one matches. Moreover, even if there is no static mapped area at all and vmlist has 300 areas, every ioremap call still checks all 300 areas.

If we construct an extra list for static mapped areas, we can eliminate the above-mentioned overhead. With the extra list, if there is one static mapped area, we check just that one area and proceed to the next operation quickly.

In fact, this is not a critical problem, because ioremap is not frequently used. But reducing overhead is still worthwhile. Another reason for doing this work is to remove vm_struct list management entirely. For more information, see the following link.
http://lkml.org/lkml/2012/12/6/184

Changelog
v4->v5:
 [2/3]: Changes from Nicolas' suggestion
  - don't use separate flags for static_vm
  - remove a lock
  - declare add_static_vm_early() as __init
 [3/3]: Changes from Nicolas' suggestion
  - add / leave comments

v3->v4:
 [2/3]: Changes from Nicolas' suggestion
  - embed static_vm code in ioremap.c
  - simplify struct static_vm
  - remove init_static_vm; instead, add_static_vm_early() inits static_vm
  Use generic list for list management of static_vm
  Convert spin_lock to rw_lock
  Modify static_vm's flags bits
 [3/3]: Rework according to [2/3] change
 Rebased on v3.8-rc5

v2->v3:
 coverletter: refer a link related to this work
 [2/3]: drop @flags of find_static_vm_vaddr
 Rebased on v3.8-rc4

v1->v2:
 [2/3]: patch description is improved.
 Rebased on v3.7-rc7

Joonsoo Kim (3):
  ARM: vmregion: remove vmregion code entirely
  ARM: ioremap: introduce an infrastructure for static mapped area
  ARM: mm: use static_vm for managing static mapped areas

 arch/arm/mm/Makefile   |    2 +-
 arch/arm/mm/ioremap.c  |  135 +--
 arch/arm/mm/mm.h       |   12 +++
 arch/arm/mm/mmu.c      |   34
 arch/arm/mm/vmregion.c |  205
 arch/arm/mm/vmregion.h |   31
 6 files changed, 123 insertions(+), 296 deletions(-)
 delete mode 100644 arch/arm/mm/vmregion.c
 delete mode 100644 arch/arm/mm/vmregion.h
--
1.7.9.5
[PATCH v5 3/3] ARM: mm: use static_vm for managing static mapped areas
A static mapped area is ARM-specific, so it is better not to use the generic vmalloc data structures, that is, vmlist and vmlist_lock, for managing it. Doing so causes needless overhead, and reducing this overhead is worthwhile.

Now we have the newly introduced static_vm infrastructure. With it, we don't need to iterate over all mapped areas; instead, we iterate only over the static mapped areas. This reduces the overhead of finding a matching area. And the architecture dependency on the vmalloc layer is removed, which helps the maintainability of the vmalloc layer.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index 904c15e..c7fef4b 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -261,13 +261,14 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
 	const struct mem_type *type;
 	int err;
 	unsigned long addr;
-	struct vm_struct * area;
+	struct vm_struct *area;
+	phys_addr_t paddr = __pfn_to_phys(pfn);
 
 #ifndef CONFIG_ARM_LPAE
 	/*
 	 * High mappings must be supersection aligned
 	 */
-	if (pfn >= 0x100000 && (__pfn_to_phys(pfn) & ~SUPERSECTION_MASK))
+	if (pfn >= 0x100000 && (paddr & ~SUPERSECTION_MASK))
 		return NULL;
 #endif
 
@@ -283,24 +284,16 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
 	/*
 	 * Try to reuse one of the static mapping whenever possible.
 	 */
-	read_lock(&vmlist_lock);
-	for (area = vmlist; area; area = area->next) {
-		if (!size || (sizeof(phys_addr_t) == 4 && pfn >= 0x100000))
-			break;
-		if (!(area->flags & VM_ARM_STATIC_MAPPING))
-			continue;
-		if ((area->flags & VM_ARM_MTYPE_MASK) != VM_ARM_MTYPE(mtype))
-			continue;
-		if (__phys_to_pfn(area->phys_addr) > pfn ||
-		    __pfn_to_phys(pfn) + size-1 > area->phys_addr + area->size-1)
-			continue;
-		/* we can drop the lock here as we know *area is static */
-		read_unlock(&vmlist_lock);
-		addr = (unsigned long)area->addr;
-		addr += __pfn_to_phys(pfn) - area->phys_addr;
-		return (void __iomem *) (offset + addr);
+	if (size && !((sizeof(phys_addr_t) == 4 && pfn >= 0x100000))) {
+		struct static_vm *svm;
+
+		svm = find_static_vm_paddr(paddr, size, mtype);
+		if (svm) {
+			addr = (unsigned long)svm->vm.addr;
+			addr += paddr - svm->vm.phys_addr;
+			return (void __iomem *) (offset + addr);
+		}
 	}
-	read_unlock(&vmlist_lock);
 
 	/*
 	 * Don't allow RAM to be mapped - this causes problems with ARMv6+
@@ -312,21 +305,21 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
 	if (!area)
 		return NULL;
 	addr = (unsigned long)area->addr;
-	area->phys_addr = __pfn_to_phys(pfn);
+	area->phys_addr = paddr;
 
 #if !defined(CONFIG_SMP) && !defined(CONFIG_ARM_LPAE)
 	if (DOMAIN_IO == 0 &&
 	    (((cpu_architecture() >= CPU_ARCH_ARMv6) && (get_cr() & CR_XP)) ||
 	       cpu_is_xsc3()) && pfn >= 0x100000 &&
-	       !((__pfn_to_phys(pfn) | size | addr) & ~SUPERSECTION_MASK)) {
+	       !((paddr | size | addr) & ~SUPERSECTION_MASK)) {
 		area->flags |= VM_ARM_SECTION_MAPPING;
 		err = remap_area_supersections(addr, pfn, size, type);
-	} else if (!((__pfn_to_phys(pfn) | size | addr) & ~PMD_MASK)) {
+	} else if (!((paddr | size | addr) & ~PMD_MASK)) {
 		area->flags |= VM_ARM_SECTION_MAPPING;
 		err = remap_area_sections(addr, pfn, size, type);
 	} else
 #endif
-		err = ioremap_page_range(addr, addr + size, __pfn_to_phys(pfn),
+		err = ioremap_page_range(addr, addr + size, paddr,
 					 __pgprot(type->prot_pte));
 
 	if (err) {
@@ -410,34 +403,28 @@ __arm_ioremap_exec(unsigned long phys_addr, size_t size, bool cached)
 void __iounmap(volatile void __iomem *io_addr)
 {
 	void *addr = (void *)(PAGE_MASK & (unsigned long)io_addr);
-	struct vm_struct *vm;
+	struct static_vm *svm;
+
+	/* If this is a static mapping, we must leave it alone */
+	svm = find_static_vm_vaddr(addr);
+	if (svm)
+		return;
 
-	read_lock(&vmlist_lock);
-	for (vm = vmlist; vm; vm = vm->next) {
-		if (vm->addr > addr)
-			break;
-		if (!(vm->flags & VM_IOREMAP))
-			continue;
-		/* If this is a static mapping we must leave it alone */
-		if ((vm->flags & VM_ARM_STATIC_MAPPING) &&
-		    (vm->addr <= addr) && (vm->addr + vm->size > addr)) {
-			read_unlock(&vmlist_lock
Re: [PATCH v5 3/3] ARM: mm: use static_vm for managing static mapped areas
Hello, Nicolas.

On Mon, Feb 04, 2013 at 11:44:16PM -0500, Nicolas Pitre wrote:
> On Tue, 5 Feb 2013, Joonsoo Kim wrote:
> 
> > A static mapped area is ARM-specific, so it is better not to use the
> > generic vmalloc data structures, that is, vmlist and vmlist_lock, for
> > managing it. Doing so causes needless overhead, and reducing this
> > overhead is worthwhile.
> >
> > Now we have the newly introduced static_vm infrastructure. With it, we
> > don't need to iterate over all mapped areas; instead, we iterate only
> > over the static mapped areas. This reduces the overhead of finding a
> > matching area. And the architecture dependency on the vmalloc layer is
> > removed, which helps the maintainability of the vmalloc layer.
> >
> > Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
> 
> Some comments below.
> 
> > diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
> > index 904c15e..c7fef4b 100644
> > --- a/arch/arm/mm/ioremap.c
> > +++ b/arch/arm/mm/ioremap.c
> > @@ -261,13 +261,14 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
> >  	const struct mem_type *type;
> >  	int err;
> >  	unsigned long addr;
> > -	struct vm_struct * area;
> > +	struct vm_struct *area;
> > +	phys_addr_t paddr = __pfn_to_phys(pfn);
> >  
> >  #ifndef CONFIG_ARM_LPAE
> >  	/*
> >  	 * High mappings must be supersection aligned
> >  	 */
> > -	if (pfn >= 0x100000 && (__pfn_to_phys(pfn) & ~SUPERSECTION_MASK))
> > +	if (pfn >= 0x100000 && (paddr & ~SUPERSECTION_MASK))
> >  		return NULL;
> >  #endif
> >  
> > @@ -283,24 +284,16 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
> >  	/*
> >  	 * Try to reuse one of the static mapping whenever possible.
> >  	 */
> > -	read_lock(&vmlist_lock);
> > -	for (area = vmlist; area; area = area->next) {
> > -		if (!size || (sizeof(phys_addr_t) == 4 && pfn >= 0x100000))
> > -			break;
> > -		if (!(area->flags & VM_ARM_STATIC_MAPPING))
> > -			continue;
> > -		if ((area->flags & VM_ARM_MTYPE_MASK) != VM_ARM_MTYPE(mtype))
> > -			continue;
> > -		if (__phys_to_pfn(area->phys_addr) > pfn ||
> > -		    __pfn_to_phys(pfn) + size-1 > area->phys_addr + area->size-1)
> > -			continue;
> > -		/* we can drop the lock here as we know *area is static */
> > -		read_unlock(&vmlist_lock);
> > -		addr = (unsigned long)area->addr;
> > -		addr += __pfn_to_phys(pfn) - area->phys_addr;
> > -		return (void __iomem *) (offset + addr);
> > +	if (size && !((sizeof(phys_addr_t) == 4 && pfn >= 0x100000))) {
>                   ^                                                ^
> 
> You have a needless extra set of parens here.

Okay.

[...]

> > diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> > index ce328c7..b2c0356 100644
> > --- a/arch/arm/mm/mmu.c
> > +++ b/arch/arm/mm/mmu.c
> > @@ -757,21 +757,24 @@ void __init iotable_init(struct map_desc *io_desc, int nr)
> >  {
> >  	struct map_desc *md;
> >  	struct vm_struct *vm;
> > +	struct static_vm *svm;
> >  
> >  	if (!nr)
> >  		return;
> >  
> > -	vm = early_alloc_aligned(sizeof(*vm) * nr, __alignof__(*vm));
> > +	svm = early_alloc_aligned(sizeof(*svm) * nr, __alignof__(*svm));
> >  
> >  	for (md = io_desc; nr; md++, nr--) {
> >  		create_mapping(md);
> > +
> > +		vm = &svm->vm;
> >  		vm->addr = (void *)(md->virtual & PAGE_MASK);
> >  		vm->size = PAGE_ALIGN(md->length + (md->virtual & ~PAGE_MASK));
> >  		vm->phys_addr = __pfn_to_phys(md->pfn);
> >  		vm->flags = VM_IOREMAP | VM_ARM_STATIC_MAPPING;
> >  		vm->flags |= VM_ARM_MTYPE(md->type);
> >  		vm->caller = iotable_init;
> > -		vm_area_add_early(vm++);
> > +		add_static_vm_early(svm++);
> >  	}
> >  }
> >  
> > @@ -779,13 +782,16 @@ void __init vm_reserve_area_early(unsigned long addr, unsigned long size,
> >  				  void *caller)
> >  {
> >  	struct vm_struct *vm;
> > +	struct static_vm *svm;
> > +
> > +	svm = early_alloc_aligned(sizeof(*svm), __alignof__(*svm));
> >  
> > -	vm = early_alloc_aligned(sizeof(*vm), __alignof__(*vm));
> > +	vm = &svm->vm;
> >  	vm->addr = (void *)addr;
> >  	vm->size = size;
> >  	vm->flags = VM_IOREMAP | VM_ARM_EMPTY_MAPPING;
> >  	vm->caller = caller;
> > -	vm_area_add_early(vm);
> > +	add_static_vm_early(svm);
> >  }
> >  
> >  #ifndef CONFIG_ARM_LPAE
> > @@ -810,14 +816,13 @@ static void __init pmd_empty_section_gap(unsigned long addr)
> >  
> >  static void __init fill_pmd_gaps(void)
> >  {
> > +	struct static_vm *svm;
> >  	struct vm_struct *vm;
> >  	unsigned long addr, next = 0;
> >  	pmd_t *pmd;
> >  
> > -	/* we're still single threaded hence no lock needed here */
> > -	for (vm = vmlist; vm; vm = vm->next) {
> > -		if (!(vm->flags & (VM_ARM_STATIC_MAPPING | VM_ARM_EMPTY_MAPPING)))
> > -			continue;
> > +	list_for_each_entry(svm, &static_vmlist, list) {
> > +		vm = &svm->vm;
> >  		addr = (unsigned long)vm->addr;
> >  		if (addr < next)
> >  			continue;
> > @@ -859,17 +864,12 @@ static void
Re: [PATCH v5 3/3] ARM: mm: use static_vm for managing static mapped areas
Hello, Rob.

On Tue, Feb 05, 2013 at 01:12:51PM -0600, Rob Herring wrote:
> On 02/05/2013 12:13 PM, Nicolas Pitre wrote:
> > On Tue, 5 Feb 2013, Rob Herring wrote:
> > > On 02/04/2013 10:44 PM, Nicolas Pitre wrote:
> > > > On Tue, 5 Feb 2013, Joonsoo Kim wrote:
> > > > > A static mapped area is ARM-specific, so it is better not to use
> > > > > the generic vmalloc data structures, that is, vmlist and
> > > > > vmlist_lock, for managing it. Doing so causes needless overhead,
> > > > > and reducing this overhead is worthwhile.
> > > > >
> > > > > Now we have the newly introduced static_vm infrastructure. With
> > > > > it, we don't need to iterate over all mapped areas; instead, we
> > > > > iterate only over the static mapped areas. This reduces the
> > > > > overhead of finding a matching area. And the architecture
> > > > > dependency on the vmalloc layer is removed, which helps the
> > > > > maintainability of the vmalloc layer.
> > > > >
> > > > > Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

[snip]

> > > > > @@ -859,17 +864,12 @@ static void __init pci_reserve_io(void)
> > > > >  {
> > > > >  	struct vm_struct *vm;
> > > > >  	unsigned long addr;
> > > > > +	struct static_vm *svm;
> > > > >  
> > > > > -	/* we're still single threaded hence no lock needed here */
> > > > > -	for (vm = vmlist; vm; vm = vm->next) {
> > > > > -		if (!(vm->flags & VM_ARM_STATIC_MAPPING))
> > > > > -			continue;
> > > > > -		addr = (unsigned long)vm->addr;
> > > > > -		addr &= ~(SZ_2M - 1);
> > > > > -		if (addr == PCI_IO_VIRT_BASE)
> > > > > -			return;
> > > > > -	}
> > > > > +	svm = find_static_vm_vaddr((void *)PCI_IO_VIRT_BASE);
> > > > > +	if (svm)
> > > > > +		return;
> > > > >  
> > > > >  	vm_reserve_area_early(PCI_IO_VIRT_BASE, SZ_2M, pci_reserve_io);
> > > > >  }
> > > > 
> > > > The replacement code is not equivalent. I can't recall why the
> > > > original is as it is, but it doesn't look right to me. The 2MB
> > > > round down certainly looks suspicious.
> > > 
> > > The PCI mapping is at a fixed, aligned 2MB mapping. If we find any
> > > virtual address within that region already mapped, it is an error.
> > 
> > Ah, OK. This wasn't clear looking at the code. We probably should have
> > had a WARN here.
> 
> Indeed.

Okay. I should fix it to find any mapping within the PCI reserved region.
But I think that it is not an error. Now, I see your original commit
c2794437091a4fda72c4a4f3567dd728dcc0c3c9 and find the below message.

	Platforms which need early i/o mapping (e.g. for vga console) can
	call pci_map_io_early in their .map_io function.

Therefore, for some platforms, it is possible that there is a mapping
within the PCI reserved range. So I will not add a WARN here. I will fix
this and re-send v6 with your ACK. Thanks for review.

> > > > The replacement code should be better. However I'd like you to get
> > > > an ACK from Rob Herring as well for this patch.
> > > 
> > > It doesn't appear to me the above case is handled. The virt addr is
> > > checked whether it is within an existing mapping, but not whether the
> > > new mapping would overlap an existing mapping. It would be good to
> > > check for this generically rather than specifically for the PCI i/o
> > > mapping.
> > 
> > Agreed. However that is checked already in vm_area_add_early().
> > Therefore the overlap test here is redundant.
> 
> Ah, right. In that case:
> 
> Acked-by: Rob Herring rob.herr...@calxeda.com
> 
> Rob
Re: [PATCH v5 1/3] ARM: vmregion: remove vmregion code entirely
Hello, Santosh.

On Tue, Feb 05, 2013 at 02:22:39PM +0530, Santosh Shilimkar wrote:
> On Tuesday 05 February 2013 06:01 AM, Joonsoo Kim wrote:
> > Now, there is no user for vmregion. So remove it.
> > 
> > Acked-by: Nicolas Pitre n...@linaro.org
> > Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
> > 
> > diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
> > index 8a9c4cb..4e333fa 100644
> > --- a/arch/arm/mm/Makefile
> > +++ b/arch/arm/mm/Makefile
> > @@ -6,7 +6,7 @@ obj-y				:= dma-mapping.o extable.o fault.o init.o \
> >  				   iomap.o
> >  
> >  obj-$(CONFIG_MMU)		+= fault-armv.o flush.o idmap.o ioremap.o \
> > -				   mmap.o pgd.o mmu.o vmregion.o
> > +				   mmap.o pgd.o mmu.o
> >  
> >  ifneq ($(CONFIG_MMU),y)
> >  obj-y				+= nommu.o
> > 
> > diff --git a/arch/arm/mm/vmregion.c b/arch/arm/mm/vmregion.c
> > deleted file mode 100644
> > index a631016..0000000
> > --- a/arch/arm/mm/vmregion.c
> > +++ /dev/null
> > @@ -1,205 +0,0 @@
> 
> You might want to use 'git format-patch -D' which will just generate one
> line for a deleted file.

Nice tip! Thanks.

> Regards,
> Santosh
Re: [PATCH v5 0/3] introduce static_vm for ARM-specific static mapped area
Hello, Santosh.

On Tue, Feb 05, 2013 at 02:32:06PM +0530, Santosh Shilimkar wrote:
> On Tuesday 05 February 2013 06:01 AM, Joonsoo Kim wrote:
> > In the current implementation, we use an ARM-specific flag,
> > VM_ARM_STATIC_MAPPING, to distinguish ARM-specific static mapped
> > areas. The purpose of a static mapped area is to be re-used when the
> > entire physical address range of an ioremap request can be covered by
> > it.
> >
> > This implementation causes needless overhead in some cases. For
> > example, assume that there is only one static mapped area and vmlist
> > has 300 areas. Every time we call ioremap, we check 300 areas to
> > decide whether one matches. Moreover, even if there is no static
> > mapped area at all and vmlist has 300 areas, every ioremap call still
> > checks all 300 areas.
> >
> > If we construct an extra list for static mapped areas, we can
> > eliminate the above-mentioned overhead. With the extra list, if there
> > is one static mapped area, we check just that one area and proceed to
> > the next operation quickly.
> >
> > In fact, this is not a critical problem, because ioremap is not
> > frequently used. But reducing overhead is still worthwhile. Another
> > reason for doing this work is to remove vm_struct list management
> > entirely. For more information, see the following link.
> > http://lkml.org/lkml/2012/12/6/184
> [..]
> > Joonsoo Kim (3):
> >   ARM: vmregion: remove vmregion code entirely
> >   ARM: ioremap: introduce an infrastructure for static mapped area
> >   ARM: mm: use static_vm for managing static mapped areas
> >
> >  arch/arm/mm/Makefile   |    2 +-
> >  arch/arm/mm/ioremap.c  |  135 +--
> >  arch/arm/mm/mm.h       |   12 +++
> >  arch/arm/mm/mmu.c      |   34
> >  arch/arm/mm/vmregion.c |  205
> >  arch/arm/mm/vmregion.h |   31
> >  6 files changed, 123 insertions(+), 296 deletions(-)
> >  delete mode 100644 arch/arm/mm/vmregion.c
> >  delete mode 100644 arch/arm/mm/vmregion.h
> 
> Nice clean-up. I tested this series on OMAP, which uses a few static
> mappings. Feel free to add,
> 
> Tested-by: Santosh Shilimkar santosh.shilim...@ti.com

I will re-send v6 with your Tested-by. Thanks for testing this.
Re: [PATCH v5 3/3] ARM: mm: use static_vm for managing static mapped areas
On Wed, Feb 06, 2013 at 11:07:07AM +0900, Joonsoo Kim wrote:
> Hello, Rob.
> 
> On Tue, Feb 05, 2013 at 01:12:51PM -0600, Rob Herring wrote:
> > On 02/05/2013 12:13 PM, Nicolas Pitre wrote:
> > > On Tue, 5 Feb 2013, Rob Herring wrote:
> > > > On 02/04/2013 10:44 PM, Nicolas Pitre wrote:
> > > > > On Tue, 5 Feb 2013, Joonsoo Kim wrote:
> > > > > > A static mapped area is ARM-specific, so it is better not to
> > > > > > use the generic vmalloc data structures, that is, vmlist and
> > > > > > vmlist_lock, for managing it.

[snip]

> > > > > > @@ -859,17 +864,12 @@ static void __init pci_reserve_io(void)
> > > > > >  {
> > > > > >  	struct vm_struct *vm;
> > > > > >  	unsigned long addr;
> > > > > > +	struct static_vm *svm;
> > > > > >  
> > > > > > -	/* we're still single threaded hence no lock needed here */
> > > > > > -	for (vm = vmlist; vm; vm = vm->next) {
> > > > > > -		if (!(vm->flags & VM_ARM_STATIC_MAPPING))
> > > > > > -			continue;
> > > > > > -		addr = (unsigned long)vm->addr;
> > > > > > -		addr &= ~(SZ_2M - 1);
> > > > > > -		if (addr == PCI_IO_VIRT_BASE)
> > > > > > -			return;
> > > > > > -	}
> > > > > > +	svm = find_static_vm_vaddr((void *)PCI_IO_VIRT_BASE);
> > > > > > +	if (svm)
> > > > > > +		return;
> > > > > >  
> > > > > >  	vm_reserve_area_early(PCI_IO_VIRT_BASE, SZ_2M, pci_reserve_io);
> > > > > >  }
> > > > > 
> > > > > The replacement code is not equivalent. I can't recall why the
> > > > > original is as it is, but it doesn't look right to me. The 2MB
> > > > > round down certainly looks suspicious.
> > > > 
> > > > The PCI mapping is at a fixed, aligned 2MB mapping. If we find any
> > > > virtual address within that region already mapped, it is an error.
> > > 
> > > Ah, OK. This wasn't clear looking at the code. We probably should
> > > have had a WARN here.
> > 
> > Indeed.
> 
> Okay. I should fix it to find any mapping within the PCI reserved region.

Ah... The above comment is my mistake. If there is a region already mapped
within the PCI reserved region and it is not found by
find_static_vm_vaddr(), vm_area_add_early() hits BUG_ON(). So it is safe
to leave find_static_vm_vaddr() as is.

> But I think that it is not an error. Now, I see your original commit
> c2794437091a4fda72c4a4f3567dd728dcc0c3c9 and find the below message.
> 
> 	Platforms which need early i/o mapping (e.g. for vga console) can
> 	call pci_map_io_early in their .map_io function.
> 
> Therefore, for some platforms, it is possible that there is a mapping
> within the PCI reserved range. So I will not add a WARN here. I will fix
> this and re-send v6 with your ACK. Thanks for review.
> 
> > > > > The replacement code should be better. However I'd like you to
> > > > > get an ACK from Rob Herring as well for this patch.
> > > > 
> > > > It doesn't appear to me the above case is handled. The virt addr
> > > > is checked whether it is within an existing mapping, but not
> > > > whether the new mapping would overlap an existing mapping. It
> > > > would be good to check for this generically rather than
> > > > specifically for the PCI i/o mapping.
> > > 
> > > Agreed. However that is checked already in vm_area_add_early().
> > > Therefore the overlap test here is redundant.
> > 
> > Ah, right. In that case:
> > 
> > Acked-by: Rob Herring rob.herr...@calxeda.com
> > 
> > Rob
[PATCH RESEND] ARM: sched: correct update_sched_clock()
If we want to load epoch_cyc and epoch_ns atomically, we should update
epoch_cyc_copy first of all. This notifies readers that an update is in
progress. If we update epoch_cyc first, as in the current implementation,
there is a subtle error case. Look at the example below.

Initial condition:
	cyc = 9
	ns = 900
	cyc_copy = 9

== CASE 1 ==

CPU A = reader			CPU B = updater
				write cyc = 10
read cyc = 10
read ns = 900
				write ns = 1000
				write cyc_copy = 10
read cyc_copy = 10

output = (10, 900)

== CASE 2 ==

CPU A = reader			CPU B = updater
read cyc = 9
				write cyc = 10
				write ns = 1000
read ns = 1000
read cyc_copy = 9
				write cyc_copy = 10

output = (9, 1000)

If an atomic read were ensured, the output should be (9, 900) or
(10, 1000). But the outputs in the example cases are not. So change the
update sequence in order to correct this problem.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/kernel/sched_clock.c b/arch/arm/kernel/sched_clock.c
index fc6692e..bd6f56b 100644
--- a/arch/arm/kernel/sched_clock.c
+++ b/arch/arm/kernel/sched_clock.c
@@ -93,11 +93,11 @@ static void notrace update_sched_clock(void)
 	 * detectable in cyc_to_fixed_sched_clock().
 	 */
 	raw_local_irq_save(flags);
-	cd.epoch_cyc = cyc;
+	cd.epoch_cyc_copy = cyc;
 	smp_wmb();
 	cd.epoch_ns = ns;
 	smp_wmb();
-	cd.epoch_cyc_copy = cyc;
+	cd.epoch_cyc = cyc;
 	raw_local_irq_restore(flags);
 }
--
1.7.9.5
[PATCH v6 0/3] introduce static_vm for ARM-specific static mapped area
In the current implementation, we use an ARM-specific flag, VM_ARM_STATIC_MAPPING, to distinguish ARM-specific static mapped areas. The purpose of a static mapped area is to be re-used when the entire physical address range of an ioremap request can be covered by it.

This implementation causes needless overhead in some cases. For example, assume that there is only one static mapped area and vmlist has 300 areas. Every time we call ioremap, we check 300 areas to decide whether one matches. Moreover, even if there is no static mapped area at all and vmlist has 300 areas, every ioremap call still checks all 300 areas.

If we construct an extra list for static mapped areas, we can eliminate the above-mentioned overhead. With the extra list, if there is one static mapped area, we check just that one area and proceed to the next operation quickly.

In fact, this is not a critical problem, because ioremap is not frequently used. But reducing overhead is still worthwhile. Another reason for doing this work is to remove vm_struct list management entirely. For more information, see the following link.
http://lkml.org/lkml/2012/12/6/184

Changelog
v5->v6:
 Add Acked-by, Reviewed-by, Tested-by tags
 [3/3]: Change from Nicolas' suggestion
  - remove redundant parentheses

v4->v5:
 [2/3]: Changes from Nicolas' suggestion
  - don't use separate flags for static_vm
  - remove a lock
  - declare add_static_vm_early() as __init
 [3/3]: Changes from Nicolas' suggestion
  - add / leave comments

v3->v4:
 [2/3]: Changes from Nicolas' suggestion
  - embed static_vm code in ioremap.c
  - simplify struct static_vm
  - remove init_static_vm; instead, add_static_vm_early() inits static_vm
  Use generic list for list management of static_vm
  Convert spin_lock to rw_lock
  Modify static_vm's flags bits
 [3/3]: Rework according to [2/3] change
 Rebased on v3.8-rc5

v2->v3:
 coverletter: refer a link related to this work
 [2/3]: drop @flags of find_static_vm_vaddr
 Rebased on v3.8-rc4

v1->v2:
 [2/3]: patch description is improved.
 Rebased on v3.7-rc7

Joonsoo Kim (3):
  ARM: vmregion: remove vmregion code entirely
  ARM: ioremap: introduce an infrastructure for static mapped area
  ARM: mm: use static_vm for managing static mapped areas

 arch/arm/mm/Makefile   |    2 +-
 arch/arm/mm/ioremap.c  |  135 +--
 arch/arm/mm/mm.h       |   12 +++
 arch/arm/mm/mmu.c      |   34
 arch/arm/mm/vmregion.c |  205
 arch/arm/mm/vmregion.h |   31
 6 files changed, 123 insertions(+), 296 deletions(-)
 delete mode 100644 arch/arm/mm/vmregion.c
 delete mode 100644 arch/arm/mm/vmregion.h
--
1.7.9.5
[PATCH v6 2/3] ARM: ioremap: introduce an infrastructure for static mapped area
In the current implementation, we use an ARM-specific flag, VM_ARM_STATIC_MAPPING, to distinguish ARM-specific static mapped areas. The purpose of a static mapped area is to be re-used when the entire physical address range of an ioremap request can be covered by it.

This implementation causes needless overhead in some cases. For example, assume that there is only one static mapped area and vmlist has 300 areas. Every time we call ioremap, we check 300 areas to decide whether one matches. Moreover, even if there is no static mapped area at all and vmlist has 300 areas, every ioremap call still checks all 300 areas.

If we construct an extra list for static mapped areas, we can eliminate the above-mentioned overhead. With the extra list, if there is one static mapped area, we check just that one area and proceed to the next operation quickly.

In fact, this is not a critical problem, because ioremap is not frequently used. But reducing overhead is still worthwhile.

Another reason for doing this work is to remove the architecture dependency on the vmalloc layer. vmlist and vmlist_lock are internal data structures of the vmalloc layer. Some code for debugging and statistics inevitably uses them, but it is preferable that they be used as little as possible outside of vmalloc.c.

Now, I introduce an ARM-specific infrastructure for static mapped areas. The following patch will use it to resolve the above-mentioned problem.

Reviewed-by: Nicolas Pitre n...@linaro.org
Tested-by: Santosh Shilimkar santosh.shilim...@ti.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index 88fd86c..904c15e 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -39,6 +39,70 @@
 #include <asm/mach/pci.h>
 #include "mm.h"
 
+
+LIST_HEAD(static_vmlist);
+
+static struct static_vm *find_static_vm_paddr(phys_addr_t paddr,
+			size_t size, unsigned int mtype)
+{
+	struct static_vm *svm;
+	struct vm_struct *vm;
+
+	list_for_each_entry(svm, &static_vmlist, list) {
+		vm = &svm->vm;
+		if (!(vm->flags & VM_ARM_STATIC_MAPPING))
+			continue;
+		if ((vm->flags & VM_ARM_MTYPE_MASK) != VM_ARM_MTYPE(mtype))
+			continue;
+
+		if (vm->phys_addr > paddr ||
+			paddr + size - 1 > vm->phys_addr + vm->size - 1)
+			continue;
+
+		return svm;
+	}
+
+	return NULL;
+}
+
+struct static_vm *find_static_vm_vaddr(void *vaddr)
+{
+	struct static_vm *svm;
+	struct vm_struct *vm;
+
+	list_for_each_entry(svm, &static_vmlist, list) {
+		vm = &svm->vm;
+
+		/* static_vmlist is ascending order */
+		if (vm->addr > vaddr)
+			break;
+
+		if (vm->addr <= vaddr && vm->addr + vm->size > vaddr)
+			return svm;
+	}
+
+	return NULL;
+}
+
+void __init add_static_vm_early(struct static_vm *svm)
+{
+	struct static_vm *curr_svm;
+	struct vm_struct *vm;
+	void *vaddr;
+
+	vm = &svm->vm;
+	vm_area_add_early(vm);
+	vaddr = vm->addr;
+
+	list_for_each_entry(curr_svm, &static_vmlist, list) {
+		vm = &curr_svm->vm;
+
+		if (vm->addr > vaddr)
+			break;
+	}
+	list_add_tail(&svm->list, &curr_svm->list);
+}
+
 int ioremap_page(unsigned long virt, unsigned long phys,
 		 const struct mem_type *mtype)
 {
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index a8ee92d..d5a4e9a 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -1,4 +1,6 @@
 #ifdef CONFIG_MMU
+#include <linux/list.h>
+#include <linux/vmalloc.h>
 
 /* the upper-most page table pointer */
 extern pmd_t *top_pmd;
@@ -65,6 +67,16 @@ extern void __flush_dcache_page(struct address_space *mapping, struct page *page
 /* consistent regions used by dma_alloc_attrs() */
 #define VM_ARM_DMA_CONSISTENT	0x20000000
 
+
+struct static_vm {
+	struct vm_struct vm;
+	struct list_head list;
+};
+
+extern struct list_head static_vmlist;
+extern struct static_vm *find_static_vm_vaddr(void *vaddr);
+extern __init void add_static_vm_early(struct static_vm *svm);
+
 #endif
 
 #ifdef CONFIG_ZONE_DMA
--
1.7.9.5
[PATCH v6 1/3] ARM: vmregion: remove vmregion code entirely
Now, there is no user for vmregion. So remove it.

Acked-by: Nicolas Pitre n...@linaro.org
Tested-by: Santosh Shilimkar santosh.shilim...@ti.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 8a9c4cb..4e333fa 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -6,7 +6,7 @@ obj-y				:= dma-mapping.o extable.o fault.o init.o \
 				   iomap.o
 
 obj-$(CONFIG_MMU)		+= fault-armv.o flush.o idmap.o ioremap.o \
-				   mmap.o pgd.o mmu.o vmregion.o
+				   mmap.o pgd.o mmu.o
 
 ifneq ($(CONFIG_MMU),y)
 obj-y				+= nommu.o

diff --git a/arch/arm/mm/vmregion.c b/arch/arm/mm/vmregion.c
deleted file mode 100644
index a631016..0000000

diff --git a/arch/arm/mm/vmregion.h b/arch/arm/mm/vmregion.h
deleted file mode 100644
index 0f5a5f2..0000000
--
1.7.9.5
[PATCH v6 3/3] ARM: mm: use static_vm for managing static mapped areas
A static mapped area is ARM-specific, so it is better not to use generic vmalloc data structure, that is, vmlist and vmlist_lock for managing static mapped area. And it causes some needless overhead and reducing this overhead is better idea. Now, we have newly introduced static_vm infrastructure. With it, we don't need to iterate all mapped areas. Instead, we just iterate static mapped areas. It helps to reduce an overhead of finding matched area. And architecture dependency on vmalloc layer is removed, so it will help to maintainability for vmalloc layer. Reviewed-by: Nicolas Pitre n...@linaro.org Acked-by: Rob Herring rob.herr...@calxeda.com Tested-by: Santosh Shilimkar santosh.shilim...@ti.com Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c index 904c15e..04d9006 100644 --- a/arch/arm/mm/ioremap.c +++ b/arch/arm/mm/ioremap.c @@ -261,13 +261,14 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn, const struct mem_type *type; int err; unsigned long addr; - struct vm_struct * area; + struct vm_struct *area; + phys_addr_t paddr = __pfn_to_phys(pfn); #ifndef CONFIG_ARM_LPAE /* * High mappings must be supersection aligned */ - if (pfn = 0x10 (__pfn_to_phys(pfn) ~SUPERSECTION_MASK)) + if (pfn = 0x10 (paddr ~SUPERSECTION_MASK)) return NULL; #endif @@ -283,24 +284,16 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn, /* * Try to reuse one of the static mapping whenever possible. 
*/ - read_lock(vmlist_lock); - for (area = vmlist; area; area = area-next) { - if (!size || (sizeof(phys_addr_t) == 4 pfn = 0x10)) - break; - if (!(area-flags VM_ARM_STATIC_MAPPING)) - continue; - if ((area-flags VM_ARM_MTYPE_MASK) != VM_ARM_MTYPE(mtype)) - continue; - if (__phys_to_pfn(area-phys_addr) pfn || - __pfn_to_phys(pfn) + size-1 area-phys_addr + area-size-1) - continue; - /* we can drop the lock here as we know *area is static */ - read_unlock(vmlist_lock); - addr = (unsigned long)area-addr; - addr += __pfn_to_phys(pfn) - area-phys_addr; - return (void __iomem *) (offset + addr); + if (size !(sizeof(phys_addr_t) == 4 pfn = 0x10)) { + struct static_vm *svm; + + svm = find_static_vm_paddr(paddr, size, mtype); + if (svm) { + addr = (unsigned long)svm-vm.addr; + addr += paddr - svm-vm.phys_addr; + return (void __iomem *) (offset + addr); + } } - read_unlock(vmlist_lock); /* * Don't allow RAM to be mapped - this causes problems with ARMv6+ @@ -312,21 +305,21 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn, if (!area) return NULL; addr = (unsigned long)area-addr; - area-phys_addr = __pfn_to_phys(pfn); + area-phys_addr = paddr; #if !defined(CONFIG_SMP) !defined(CONFIG_ARM_LPAE) if (DOMAIN_IO == 0 (((cpu_architecture() = CPU_ARCH_ARMv6) (get_cr() CR_XP)) || cpu_is_xsc3()) pfn = 0x10 - !((__pfn_to_phys(pfn) | size | addr) ~SUPERSECTION_MASK)) { + !((paddr | size | addr) ~SUPERSECTION_MASK)) { area-flags |= VM_ARM_SECTION_MAPPING; err = remap_area_supersections(addr, pfn, size, type); - } else if (!((__pfn_to_phys(pfn) | size | addr) ~PMD_MASK)) { + } else if (!((paddr | size | addr) ~PMD_MASK)) { area-flags |= VM_ARM_SECTION_MAPPING; err = remap_area_sections(addr, pfn, size, type); } else #endif - err = ioremap_page_range(addr, addr + size, __pfn_to_phys(pfn), + err = ioremap_page_range(addr, addr + size, paddr, __pgprot(type-prot_pte)); if (err) { @@ -410,34 +403,28 @@ __arm_ioremap_exec(unsigned long phys_addr, size_t size, bool cached) void 
__iounmap(volatile void __iomem *io_addr) { void *addr = (void *)(PAGE_MASK (unsigned long)io_addr); - struct vm_struct *vm; + struct static_vm *svm; + + /* If this is a static mapping, we must leave it alone */ + svm = find_static_vm_vaddr(addr); + if (svm) + return; - read_lock(vmlist_lock); - for (vm = vmlist; vm; vm = vm-next) { - if (vm-addr addr) - break; - if (!(vm-flags VM_IOREMAP)) - continue; - /* If this is a static mapping we must leave it alone */ - if ((vm-flags VM_ARM_STATIC_MAPPING
Re: [PATCH RESEND] ARM: sched: correct update_sched_clock()
Hello, Linus.

2013/2/6 Linus Walleij linus.wall...@linaro.org:
> On Wed, Feb 6, 2013 at 6:21 AM, Joonsoo Kim iamjoonsoo@lge.com wrote:
>> If we want to load epoch_cyc and epoch_ns atomically, we should
>> update epoch_cyc_copy first of all. This notifies the reader that an
>> update is in progress.
>
> If you think the patch is finished, put it into Russell's patch tracker:
> http://www.arm.linux.org.uk/developer/patches/

Ah... Okay. Thanks for notifying me.
Re: [PATCH v3 2/3] ARM: static_vm: introduce an infrastructure for static mapped area
Hello, Nicolas. On Tue, Jan 29, 2013 at 07:05:32PM -0500, Nicolas Pitre wrote: On Thu, 24 Jan 2013, Joonsoo Kim wrote: From: Joonsoo Kim js1...@gmail.com In current implementation, we used ARM-specific flag, that is, VM_ARM_STATIC_MAPPING, for distinguishing ARM specific static mapped area. The purpose of static mapped area is to re-use static mapped area when entire physical address range of the ioremap request can be covered by this area. This implementation causes needless overhead for some cases. For example, assume that there is only one static mapped area and vmlist has 300 areas. Every time we call ioremap, we check 300 areas for deciding whether it is matched or not. Moreover, even if there is no static mapped area and vmlist has 300 areas, every time we call ioremap, we check 300 areas in now. If we construct a extra list for static mapped area, we can eliminate above mentioned overhead. With a extra list, if there is one static mapped area, we just check only one area and proceed next operation quickly. In fact, it is not a critical problem, because ioremap is not frequently used. But reducing overhead is better idea. Another reason for doing this work is for removing architecture dependency on vmalloc layer. I think that vmlist and vmlist_lock is internal data structure for vmalloc layer. Some codes for debugging and stat inevitably use vmlist and vmlist_lock. But it is preferable that they are used as least as possible in outside of vmalloc.c Now, I introduce an ARM-specific infrastructure for static mapped area. In the following patch, we will use this and resolve above mentioned problem. Signed-off-by: Joonsoo Kim js1...@gmail.com Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com First of all, I don't think you really need a new file with a global scope header file. Given that this code is meant to be used only for ioremap optimization on ARM, it is probably a better idea to simply put it all into arch/arm/mm/ioremap.c instead. 
The only function that needs to be exported out of ioremap.c is insert_static_vm(), and only for the benefit of arch/arm/mm/mmu.c, therefore this function prototype may as well just be added to arch/arm/mm/mm.h. I agree with your all opinions. I will re-work and will re-send v4 as soon as possible. Thanks for review. More comments below. diff --git a/arch/arm/include/asm/mach/static_vm.h b/arch/arm/include/asm/mach/static_vm.h new file mode 100644 index 000..72c8339 --- /dev/null +++ b/arch/arm/include/asm/mach/static_vm.h @@ -0,0 +1,45 @@ +/* + * arch/arm/include/asm/mach/static_vm.h + * + * Copyright (C) 2012 LG Electronics, Joonsoo Kim iamjoonsoo@lge.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#ifndef _ASM_MACH_STATIC_VM_H +#define _ASM_MACH_STATIC_VM_H + +#include linux/types.h +#include linux/vmalloc.h + +struct static_vm { + struct static_vm*next; + void*vaddr; + unsigned long size; + unsigned long flags; + phys_addr_t paddr; + const void *caller; +}; Here you're duplicating most of the vm_struct content for no obvious reasons. Patch #3 even allocates both a vm_struct and a static_vm instance in parallel for each mapping. Instead, you should consider something like this: struct static_vm { struct static_vm *next; struct vm_struct vm; }; This way, you only need to allocate one structure: struct static_vm *svm = early_alloc(...); ... 
svm-vm.addr = addr; ... vm_area_add_early(svm-vm); insert_static_vm(svm); Yes! It's good idea. And then, it would make sense for the insert_static_vm() to do the vm_area_add_early() call itself as well. Okay. Maybe rename insert_static_vm() to static_vm_area_add_early() to better identify its purpose as well. It shouldn't be used for any other purpose anyway. Okay. + +extern struct static_vm *static_vmlist; +extern spinlock_t static_vmlist_lock; Your patch is providing the proper accessors to manipulate those. They therefore should not be exported globally. Okay. + +extern struct static_vm *find_static_vm_paddr(phys_addr_t paddr
[PATCH v4 1/3] ARM: vmregion: remove vmregion code entirely
Now, there is no user for vmregion. So remove it. Acked-by: Nicolas Pitre n...@linaro.org Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile index 8a9c4cb..4e333fa 100644 --- a/arch/arm/mm/Makefile +++ b/arch/arm/mm/Makefile @@ -6,7 +6,7 @@ obj-y := dma-mapping.o extable.o fault.o init.o \ iomap.o obj-$(CONFIG_MMU) += fault-armv.o flush.o idmap.o ioremap.o \ - mmap.o pgd.o mmu.o vmregion.o + mmap.o pgd.o mmu.o ifneq ($(CONFIG_MMU),y) obj-y += nommu.o diff --git a/arch/arm/mm/vmregion.c b/arch/arm/mm/vmregion.c deleted file mode 100644 index a631016..000 --- a/arch/arm/mm/vmregion.c +++ /dev/null @@ -1,205 +0,0 @@ -#include linux/fs.h -#include linux/spinlock.h -#include linux/list.h -#include linux/proc_fs.h -#include linux/seq_file.h -#include linux/slab.h - -#include vmregion.h - -/* - * VM region handling support. - * - * This should become something generic, handling VM region allocations for - * vmalloc and similar (ioremap, module space, etc). - * - * I envisage vmalloc()'s supporting vm_struct becoming: - * - * struct vm_struct { - *struct vmregion region; - *unsigned longflags; - *struct page **pages; - *unsigned int nr_pages; - *unsigned longphys_addr; - * }; - * - * get_vm_area() would then call vmregion_alloc with an appropriate - * struct vmregion head (eg): - * - * struct vmregion vmalloc_head = { - * .vm_list= LIST_HEAD_INIT(vmalloc_head.vm_list), - * .vm_start = VMALLOC_START, - * .vm_end = VMALLOC_END, - * }; - * - * However, vmalloc_head.vm_start is variable (typically, it is dependent on - * the amount of RAM found at boot time.) I would imagine that get_vm_area() - * would have to initialise this each time prior to calling vmregion_alloc(). 
- */ - -struct arm_vmregion * -arm_vmregion_alloc(struct arm_vmregion_head *head, size_t align, - size_t size, gfp_t gfp, const void *caller) -{ - unsigned long start = head-vm_start, addr = head-vm_end; - unsigned long flags; - struct arm_vmregion *c, *new; - - if (head-vm_end - head-vm_start size) { - printk(KERN_WARNING %s: allocation too big (requested %#x)\n, - __func__, size); - goto out; - } - - new = kmalloc(sizeof(struct arm_vmregion), gfp); - if (!new) - goto out; - - new-caller = caller; - - spin_lock_irqsave(head-vm_lock, flags); - - addr = rounddown(addr - size, align); - list_for_each_entry_reverse(c, head-vm_list, vm_list) { - if (addr = c-vm_end) - goto found; - addr = rounddown(c-vm_start - size, align); - if (addr start) - goto nospc; - } - - found: - /* -* Insert this entry after the one we found. -*/ - list_add(new-vm_list, c-vm_list); - new-vm_start = addr; - new-vm_end = addr + size; - new-vm_active = 1; - - spin_unlock_irqrestore(head-vm_lock, flags); - return new; - - nospc: - spin_unlock_irqrestore(head-vm_lock, flags); - kfree(new); - out: - return NULL; -} - -static struct arm_vmregion *__arm_vmregion_find(struct arm_vmregion_head *head, unsigned long addr) -{ - struct arm_vmregion *c; - - list_for_each_entry(c, head-vm_list, vm_list) { - if (c-vm_active c-vm_start == addr) - goto out; - } - c = NULL; - out: - return c; -} - -struct arm_vmregion *arm_vmregion_find(struct arm_vmregion_head *head, unsigned long addr) -{ - struct arm_vmregion *c; - unsigned long flags; - - spin_lock_irqsave(head-vm_lock, flags); - c = __arm_vmregion_find(head, addr); - spin_unlock_irqrestore(head-vm_lock, flags); - return c; -} - -struct arm_vmregion *arm_vmregion_find_remove(struct arm_vmregion_head *head, unsigned long addr) -{ - struct arm_vmregion *c; - unsigned long flags; - - spin_lock_irqsave(head-vm_lock, flags); - c = __arm_vmregion_find(head, addr); - if (c) - c-vm_active = 0; - spin_unlock_irqrestore(head-vm_lock, flags); - return c; -} - -void 
arm_vmregion_free(struct arm_vmregion_head *head, struct arm_vmregion *c) -{ - unsigned long flags; - - spin_lock_irqsave(head-vm_lock, flags); - list_del(c-vm_list); - spin_unlock_irqrestore(head-vm_lock, flags); - - kfree(c); -} - -#ifdef CONFIG_PROC_FS -static int arm_vmregion_show(struct seq_file *m, void *p) -{ - struct arm_vmregion *c = list_entry(p, struct arm_vmregion, vm_list); - - seq_printf(m, 0x%08lx-0x%08lx %7lu, c-vm_start, c-vm_end, - c-vm_end - c
[PATCH v4 2/3] ARM: ioremap: introduce an infrastructure for static mapped area
In the current implementation, we use an ARM-specific flag, VM_ARM_STATIC_MAPPING, to distinguish ARM-specific static mapped areas. The purpose of a static mapped area is to re-use it when the entire physical address range of an ioremap request can be covered by that area.

This implementation causes needless overhead in some cases. For example, assume there is only one static mapped area and vmlist has 300 areas. Every time we call ioremap, we check all 300 areas to decide whether one matches. Moreover, even if there is no static mapped area at all, we still check all 300 areas on every ioremap call.

If we construct an extra list for static mapped areas, we can eliminate the overhead mentioned above. With such a list, if there is one static mapped area, we check just that one area and proceed to the next operation quickly.

In fact, this is not a critical problem, because ioremap is not frequently used; but reducing overhead is still worthwhile. Another reason for this work is to remove an architecture dependency on the vmalloc layer. I think vmlist and vmlist_lock are internal data structures of the vmalloc layer. Some code for debugging and statistics inevitably uses them, but it is preferable that they be used as little as possible outside of vmalloc.c.

Now, I introduce an ARM-specific infrastructure for static mapped areas. The following patch will use it to resolve the problem mentioned above.
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c index 88fd86c..ceb34ae 100644 --- a/arch/arm/mm/ioremap.c +++ b/arch/arm/mm/ioremap.c @@ -39,6 +39,78 @@ #include asm/mach/pci.h #include mm.h + +LIST_HEAD(static_vmlist); +static DEFINE_RWLOCK(static_vmlist_lock); + +static struct static_vm *find_static_vm_paddr(phys_addr_t paddr, + size_t size, unsigned long flags) +{ + struct static_vm *svm; + struct vm_struct *vm; + + read_lock(static_vmlist_lock); + list_for_each_entry(svm, static_vmlist, list) { + if (svm-flags != flags) + continue; + + vm = svm-vm; + if (vm-phys_addr paddr || + paddr + size - 1 vm-phys_addr + vm-size - 1) + continue; + + read_unlock(static_vmlist_lock); + return svm; + } + + return NULL; +} + +struct static_vm *find_static_vm_vaddr(void *vaddr) +{ + struct static_vm *svm; + struct vm_struct *vm; + + read_lock(static_vmlist_lock); + list_for_each_entry(svm, static_vmlist, list) { + vm = svm-vm; + + /* static_vmlist is ascending order */ + if (vm-addr vaddr) + break; + + if (vm-addr = vaddr vm-addr + vm-size vaddr) { + read_unlock(static_vmlist_lock); + return svm; + } + } + read_unlock(static_vmlist_lock); + + return NULL; +} + +void add_static_vm_early(struct static_vm *svm, unsigned long flags) +{ + struct static_vm *curr_svm; + struct vm_struct *vm; + void *vaddr; + + vm_area_add_early(svm-vm); + + vaddr = svm-vm.addr; + svm-flags = flags; + + write_lock(static_vmlist_lock); + list_for_each_entry(curr_svm, static_vmlist, list) { + vm = curr_svm-vm; + + if (vm-addr vaddr) + break; + } + list_add_tail(svm-list, curr_svm-list); + write_unlock(static_vmlist_lock); +} + int ioremap_page(unsigned long virt, unsigned long phys, const struct mem_type *mtype) { diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h index a8ee92d..fb45c79 100644 --- a/arch/arm/mm/mm.h +++ b/arch/arm/mm/mm.h @@ -1,4 +1,6 @@ #ifdef CONFIG_MMU +#include linux/list.h +#include linux/vmalloc.h /* the upper-most page table 
pointer */
 extern pmd_t *top_pmd;
@@ -65,6 +67,24 @@ extern void __flush_dcache_page(struct address_space *mapping, struct page *page
 /* consistent regions used by dma_alloc_attrs() */
 #define VM_ARM_DMA_CONSISTENT	0x2000
+
+/* ARM specific static_vm->flags bits */
+#define STATIC_VM_MEM		0x0001
+#define STATIC_VM_EMPTY		0x0002
+#define STATIC_VM_MTYPE(mtype)	((mtype) << 20)
+
+#define STATIC_VM_TYPE(type, mtype)	(type | STATIC_VM_MTYPE(mtype))
+
+struct static_vm {
+	struct vm_struct vm;
+	struct list_head list;
+	unsigned long flags;
+};
+
+extern struct list_head static_vmlist;
+extern struct static_vm *find_static_vm_vaddr(void *vaddr);
+extern void add_static_vm_early(struct static_vm *svm, unsigned long flags);
+
 #endif

 #ifdef CONFIG_ZONE_DMA
--
1.7.9.5
[PATCH v4 0/3] introduce static_vm for ARM-specific static mapped area
In the current implementation, we use an ARM-specific flag, VM_ARM_STATIC_MAPPING, to distinguish ARM-specific static mapped areas. The purpose of a static mapped area is to re-use it when the entire physical address range of an ioremap request can be covered by that area.

This implementation causes needless overhead in some cases. For example, assume there is only one static mapped area and vmlist has 300 areas. Every time we call ioremap, we check all 300 areas to decide whether one matches. Moreover, even if there is no static mapped area at all, we still check all 300 areas on every ioremap call. If we construct an extra list for static mapped areas, we can eliminate this overhead: with such a list, if there is one static mapped area, we check just that one area and proceed to the next operation quickly.

In fact, this is not a critical problem, because ioremap is not frequently used; but reducing overhead is still worthwhile. Another reason for this work is to remove vm_struct list management entirely. For more information, see the following link:
http://lkml.org/lkml/2012/12/6/184

Changelog

v3->v4:
 [2/3]: Changes from Nicolas' suggestions
        - embed the static_vm code in ioremap.c
        - simplify struct static_vm
        - remove init_static_vm; instead, add_static_vm_early() initializes static_vm
        - use a generic list for list management of static_vm
        - convert the spinlock to a rwlock
        - modify static_vm's flags bits
 [3/3]: Rework according to the [2/3] changes
 Rebased on v3.8-rc5

v2->v3:
 cover letter: refer to a link related to this work
 [2/3]: drop @flags from find_static_vm_vaddr
 Rebased on v3.8-rc4

v1->v2:
 [2/3]: patch description improved.
 Rebased on v3.7-rc7

Joonsoo Kim (3):
  ARM: vmregion: remove vmregion code entirely
  ARM: ioremap: introduce an infrastructure for static mapped area
  ARM: mm: use static_vm for managing static mapped areas

 arch/arm/mm/Makefile   |    2 +-
 arch/arm/mm/ioremap.c  |  147 +++---
 arch/arm/mm/mm.h       |   28 ---
 arch/arm/mm/mmu.c      |   47 ++-
 arch/arm/mm/vmregion.c |  205 ----
 arch/arm/mm/vmregion.h |   31 ----
 6 files changed, 147 insertions(+), 313 deletions(-)
 delete mode 100644 arch/arm/mm/vmregion.c
 delete mode 100644 arch/arm/mm/vmregion.h

--
1.7.9.5
[PATCH v4 3/3] ARM: mm: use static_vm for managing static mapped areas
A static mapped area is ARM-specific, so it is better not to use generic vmalloc data structure, that is, vmlist and vmlist_lock for managing static mapped area. And it causes some needless overhead and reducing this overhead is better idea. Now, we have newly introduced static_vm infrastructure. With it, we don't need to iterate all mapped areas. Instead, we just iterate static mapped areas. It helps to reduce an overhead of finding matched area. And architecture dependency on vmalloc layer is removed, so it will help to maintainability for vmalloc layer. Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c index ceb34ae..7fe5b48 100644 --- a/arch/arm/mm/ioremap.c +++ b/arch/arm/mm/ioremap.c @@ -269,13 +269,14 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn, const struct mem_type *type; int err; unsigned long addr; - struct vm_struct * area; + struct vm_struct *area; + phys_addr_t paddr = __pfn_to_phys(pfn); #ifndef CONFIG_ARM_LPAE /* * High mappings must be supersection aligned */ - if (pfn = 0x10 (__pfn_to_phys(pfn) ~SUPERSECTION_MASK)) + if (pfn = 0x10 (paddr ~SUPERSECTION_MASK)) return NULL; #endif @@ -291,24 +292,17 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn, /* * Try to reuse one of the static mapping whenever possible. 
*/ - read_lock(vmlist_lock); - for (area = vmlist; area; area = area-next) { - if (!size || (sizeof(phys_addr_t) == 4 pfn = 0x10)) - break; - if (!(area-flags VM_ARM_STATIC_MAPPING)) - continue; - if ((area-flags VM_ARM_MTYPE_MASK) != VM_ARM_MTYPE(mtype)) - continue; - if (__phys_to_pfn(area-phys_addr) pfn || - __pfn_to_phys(pfn) + size-1 area-phys_addr + area-size-1) - continue; - /* we can drop the lock here as we know *area is static */ - read_unlock(vmlist_lock); - addr = (unsigned long)area-addr; - addr += __pfn_to_phys(pfn) - area-phys_addr; - return (void __iomem *) (offset + addr); + if (size !((sizeof(phys_addr_t) == 4 pfn = 0x10))) { + struct static_vm *svm; + + svm = find_static_vm_paddr(paddr, size, + STATIC_VM_TYPE(STATIC_VM_MEM, mtype)); + if (svm) { + addr = (unsigned long)svm-vm.addr; + addr += paddr - svm-vm.phys_addr; + return (void __iomem *) (offset + addr); + } } - read_unlock(vmlist_lock); /* * Don't allow RAM to be mapped - this causes problems with ARMv6+ @@ -320,21 +314,21 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn, if (!area) return NULL; addr = (unsigned long)area-addr; - area-phys_addr = __pfn_to_phys(pfn); + area-phys_addr = paddr; #if !defined(CONFIG_SMP) !defined(CONFIG_ARM_LPAE) if (DOMAIN_IO == 0 (((cpu_architecture() = CPU_ARCH_ARMv6) (get_cr() CR_XP)) || cpu_is_xsc3()) pfn = 0x10 - !((__pfn_to_phys(pfn) | size | addr) ~SUPERSECTION_MASK)) { + !((paddr | size | addr) ~SUPERSECTION_MASK)) { area-flags |= VM_ARM_SECTION_MAPPING; err = remap_area_supersections(addr, pfn, size, type); - } else if (!((__pfn_to_phys(pfn) | size | addr) ~PMD_MASK)) { + } else if (!((paddr | size | addr) ~PMD_MASK)) { area-flags |= VM_ARM_SECTION_MAPPING; err = remap_area_sections(addr, pfn, size, type); } else #endif - err = ioremap_page_range(addr, addr + size, __pfn_to_phys(pfn), + err = ioremap_page_range(addr, addr + size, paddr, __pgprot(type-prot_pte)); if (err) { @@ -418,34 +412,21 @@ __arm_ioremap_exec(unsigned long phys_addr, 
size_t size, bool cached) void __iounmap(volatile void __iomem *io_addr) { void *addr = (void *)(PAGE_MASK (unsigned long)io_addr); - struct vm_struct *vm; + struct static_vm *svm; + + svm = find_static_vm_vaddr(addr); + if (svm) + return; - read_lock(vmlist_lock); - for (vm = vmlist; vm; vm = vm-next) { - if (vm-addr addr) - break; - if (!(vm-flags VM_IOREMAP)) - continue; - /* If this is a static mapping we must leave it alone */ - if ((vm-flags VM_ARM_STATIC_MAPPING) - (vm-addr = addr) (vm-addr + vm-size addr)) { - read_unlock(vmlist_lock
Re: [PATCH v4 3/3] ARM: mm: use static_vm for managing static mapped areas
2013/2/1 Nicolas Pitre nicolas.pi...@linaro.org: On Thu, 31 Jan 2013, Joonsoo Kim wrote: A static mapped area is ARM-specific, so it is better not to use generic vmalloc data structure, that is, vmlist and vmlist_lock for managing static mapped area. And it causes some needless overhead and reducing this overhead is better idea. Now, we have newly introduced static_vm infrastructure. With it, we don't need to iterate all mapped areas. Instead, we just iterate static mapped areas. It helps to reduce an overhead of finding matched area. And architecture dependency on vmalloc layer is removed, so it will help to maintainability for vmalloc layer. Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com Comments below. diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c index ceb34ae..7fe5b48 100644 --- a/arch/arm/mm/ioremap.c +++ b/arch/arm/mm/ioremap.c @@ -269,13 +269,14 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn, const struct mem_type *type; int err; unsigned long addr; - struct vm_struct * area; + struct vm_struct *area; + phys_addr_t paddr = __pfn_to_phys(pfn); #ifndef CONFIG_ARM_LPAE /* * High mappings must be supersection aligned */ - if (pfn = 0x10 (__pfn_to_phys(pfn) ~SUPERSECTION_MASK)) + if (pfn = 0x10 (paddr ~SUPERSECTION_MASK)) return NULL; #endif @@ -291,24 +292,17 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn, /* * Try to reuse one of the static mapping whenever possible. 
*/ - read_lock(vmlist_lock); - for (area = vmlist; area; area = area-next) { - if (!size || (sizeof(phys_addr_t) == 4 pfn = 0x10)) - break; - if (!(area-flags VM_ARM_STATIC_MAPPING)) - continue; - if ((area-flags VM_ARM_MTYPE_MASK) != VM_ARM_MTYPE(mtype)) - continue; - if (__phys_to_pfn(area-phys_addr) pfn || - __pfn_to_phys(pfn) + size-1 area-phys_addr + area-size-1) - continue; - /* we can drop the lock here as we know *area is static */ - read_unlock(vmlist_lock); - addr = (unsigned long)area-addr; - addr += __pfn_to_phys(pfn) - area-phys_addr; - return (void __iomem *) (offset + addr); + if (size !((sizeof(phys_addr_t) == 4 pfn = 0x10))) { + struct static_vm *svm; + + svm = find_static_vm_paddr(paddr, size, + STATIC_VM_TYPE(STATIC_VM_MEM, mtype)); + if (svm) { + addr = (unsigned long)svm-vm.addr; + addr += paddr - svm-vm.phys_addr; + return (void __iomem *) (offset + addr); + } } - read_unlock(vmlist_lock); /* * Don't allow RAM to be mapped - this causes problems with ARMv6+ @@ -320,21 +314,21 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn, if (!area) return NULL; addr = (unsigned long)area-addr; - area-phys_addr = __pfn_to_phys(pfn); + area-phys_addr = paddr; #if !defined(CONFIG_SMP) !defined(CONFIG_ARM_LPAE) if (DOMAIN_IO == 0 (((cpu_architecture() = CPU_ARCH_ARMv6) (get_cr() CR_XP)) || cpu_is_xsc3()) pfn = 0x10 -!((__pfn_to_phys(pfn) | size | addr) ~SUPERSECTION_MASK)) { +!((paddr | size | addr) ~SUPERSECTION_MASK)) { area-flags |= VM_ARM_SECTION_MAPPING; err = remap_area_supersections(addr, pfn, size, type); - } else if (!((__pfn_to_phys(pfn) | size | addr) ~PMD_MASK)) { + } else if (!((paddr | size | addr) ~PMD_MASK)) { area-flags |= VM_ARM_SECTION_MAPPING; err = remap_area_sections(addr, pfn, size, type); } else #endif - err = ioremap_page_range(addr, addr + size, __pfn_to_phys(pfn), + err = ioremap_page_range(addr, addr + size, paddr, __pgprot(type-prot_pte)); if (err) { @@ -418,34 +412,21 @@ __arm_ioremap_exec(unsigned long phys_addr, 
size_t size, bool cached) void __iounmap(volatile void __iomem *io_addr) { void *addr = (void *)(PAGE_MASK (unsigned long)io_addr); - struct vm_struct *vm; + struct static_vm *svm; + You could salvage the /* If this is a static mapping we must leave it alone */ comment here. Okay. + svm = find_static_vm_vaddr(addr); + if (svm) + return; - read_lock(vmlist_lock); - for (vm = vmlist; vm; vm = vm-next) { - if (vm-addr addr) - break; - if (!(vm-flags VM_IOREMAP)) - continue; - /* If this is a static mapping we must leave it alone */ - if ((vm-flags
Re: [PATCH v4 2/3] ARM: ioremap: introduce an infrastructure for static mapped area
Hello, Nicolas. 2013/2/1 Nicolas Pitre nicolas.pi...@linaro.org: On Thu, 31 Jan 2013, Joonsoo Kim wrote: In current implementation, we used ARM-specific flag, that is, VM_ARM_STATIC_MAPPING, for distinguishing ARM specific static mapped area. The purpose of static mapped area is to re-use static mapped area when entire physical address range of the ioremap request can be covered by this area. This implementation causes needless overhead for some cases. For example, assume that there is only one static mapped area and vmlist has 300 areas. Every time we call ioremap, we check 300 areas for deciding whether it is matched or not. Moreover, even if there is no static mapped area and vmlist has 300 areas, every time we call ioremap, we check 300 areas in now. If we construct a extra list for static mapped area, we can eliminate above mentioned overhead. With a extra list, if there is one static mapped area, we just check only one area and proceed next operation quickly. In fact, it is not a critical problem, because ioremap is not frequently used. But reducing overhead is better idea. Another reason for doing this work is for removing architecture dependency on vmalloc layer. I think that vmlist and vmlist_lock is internal data structure for vmalloc layer. Some codes for debugging and stat inevitably use vmlist and vmlist_lock. But it is preferable that they are used as least as possible in outside of vmalloc.c Now, I introduce an ARM-specific infrastructure for static mapped area. In the following patch, we will use this and resolve above mentioned problem. Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com Much better. Comments below. Thanks. diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c index 88fd86c..ceb34ae 100644 --- a/arch/arm/mm/ioremap.c +++ b/arch/arm/mm/ioremap.c @@ -39,6 +39,78 @@ #include asm/mach/pci.h #include mm.h + +LIST_HEAD(static_vmlist); +static DEFINE_RWLOCK(static_vmlist_lock); In fact you don't need a lock at all. 
The only writer is add_static_vm_early() and we know it is only used during boot when the kernel is still single-threaded. Yes! + +static struct static_vm *find_static_vm_paddr(phys_addr_t paddr, + size_t size, unsigned long flags) +{ + struct static_vm *svm; + struct vm_struct *vm; + + read_lock(static_vmlist_lock); + list_for_each_entry(svm, static_vmlist, list) { + if (svm-flags != flags) + continue; + + vm = svm-vm; + if (vm-phys_addr paddr || + paddr + size - 1 vm-phys_addr + vm-size - 1) + continue; + + read_unlock(static_vmlist_lock); + return svm; + } + + return NULL; +} + +struct static_vm *find_static_vm_vaddr(void *vaddr) +{ + struct static_vm *svm; + struct vm_struct *vm; + + read_lock(static_vmlist_lock); + list_for_each_entry(svm, static_vmlist, list) { + vm = svm-vm; + + /* static_vmlist is ascending order */ + if (vm-addr vaddr) + break; + + if (vm-addr = vaddr vm-addr + vm-size vaddr) { + read_unlock(static_vmlist_lock); + return svm; + } + } + read_unlock(static_vmlist_lock); + + return NULL; +} + +void add_static_vm_early(struct static_vm *svm, unsigned long flags) This should be marked with __init. This way, it is less likely to be used after boot, especially with no locking. And vm_area_add_early() is valid only if !vmap_initialized anyway, and also __init. Okay. 
+{
+	struct static_vm *curr_svm;
+	struct vm_struct *vm;
+	void *vaddr;
+
+	vm_area_add_early(&svm->vm);
+
+	vaddr = svm->vm.addr;
+	svm->flags = flags;
+
+	write_lock(&static_vmlist_lock);
+	list_for_each_entry(curr_svm, &static_vmlist, list) {
+		vm = &curr_svm->vm;
+
+		if (vm->addr > vaddr)
+			break;
+	}
+	list_add_tail(&svm->list, &curr_svm->list);
+	write_unlock(&static_vmlist_lock);
+}
+
 int ioremap_page(unsigned long virt, unsigned long phys,
		 const struct mem_type *mtype)
 {

diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index a8ee92d..fb45c79 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -1,4 +1,6 @@
 #ifdef CONFIG_MMU
+#include <linux/list.h>
+#include <linux/vmalloc.h>

 /* the upper-most page table pointer */
 extern pmd_t *top_pmd;
@@ -65,6 +67,24 @@ extern void __flush_dcache_page(struct address_space *mapping, struct page *page
 /* consistent regions used by dma_alloc_attrs() */
 #define VM_ARM_DMA_CONSISTENT	0x2000
+
+/* ARM specific static_vm->flags bits */
+#define STATIC_VM_MEM	0x0001
+#define STATIC_VM_EMPTY	0x0002
+#define STATIC_VM_MTYPE(mtype)	((mtype
Re: [PATCH] x86, reboot: skip reboot_fixups in early boot phase
Hello, Bjorn.

On Thu, Jan 24, 2013 at 10:45:13AM -0700, Bjorn Helgaas wrote:
On Fri, Dec 28, 2012 at 6:50 AM, Joonsoo Kim js1...@gmail.com wrote:

During the early boot phase, the PCI bus subsystem is not yet initialized. If a panic occurs in the early boot phase and panic_timeout is set, the code flow goes into emergency_restart() and hits mach_reboot_fixups(), then encounters another panic. On the second panic, we can't take the panic_lock, so the code flow goes into panic_smp_self_stop(), which prevents the system from restarting. To avoid the second panic, skip the reboot fixups in the early boot phase. This makes panic_timeout work in the early boot phase.

Cc: Thomas Gleixner t...@linutronix.de
Cc: Ingo Molnar mi...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Signed-off-by: Joonsoo Kim js1...@gmail.com

diff --git a/arch/x86/kernel/reboot_fixups_32.c b/arch/x86/kernel/reboot_fixups_32.c
index c8e41e9..b9b8ec9 100644
--- a/arch/x86/kernel/reboot_fixups_32.c
+++ b/arch/x86/kernel/reboot_fixups_32.c
@@ -89,6 +89,10 @@ void mach_reboot_fixups(void)
 	if (in_interrupt())
 		return;

+	/* During early boot phase, PCI is not yet initialized */
+	if (system_state == SYSTEM_BOOTING)
+		return;
+
 	for (i=0; i < ARRAY_SIZE(fixups_table); i++) {
 		cur = &(fixups_table[i]);
 		dev = pci_get_device(cur->vendor, cur->device, NULL);

I guess you're saying that if we call pci_get_device() too early, it panics?  Did you figure out why that happens?  If we call pci_get_device() before PCI has been initialized, it would be good if it just returned NULL, indicating that we didn't find any matching device.  I looked briefly, and I thought that's what would happen, but apparently I'm missing something.

In bus_find_device(), klist_iter_init_node() is called with @bus->p->klist_devices. Before initialization, bus->p is NULL, so a panic occurs.
Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, reboot: skip reboot_fixups in early boot phase
On Thu, Jan 24, 2013 at 08:52:32PM -0800, Greg Kroah-Hartman wrote:
On Thu, Jan 24, 2013 at 09:21:52PM -0700, Bjorn Helgaas wrote:
On Thu, Jan 24, 2013 at 9:14 PM, Greg Kroah-Hartman gre...@linuxfoundation.org wrote:
On Thu, Jan 24, 2013 at 07:59:01PM -0700, Bjorn Helgaas wrote:
[+cc Greg for driver core]
On Fri, Jan 25, 2013 at 10:13:03AM +0900, Joonsoo Kim wrote:

Hello, Bjorn.

On Thu, Jan 24, 2013 at 10:45:13AM -0700, Bjorn Helgaas wrote:
On Fri, Dec 28, 2012 at 6:50 AM, Joonsoo Kim js1...@gmail.com wrote:

During the early boot phase, the PCI bus subsystem is not yet initialized. If a panic occurs in the early boot phase and panic_timeout is set, the code flow goes into emergency_restart() and hits mach_reboot_fixups(), then encounters another panic. On the second panic, we can't take the panic_lock, so the code flow goes into panic_smp_self_stop(), which prevents the system from restarting. To avoid the second panic, skip the reboot fixups in the early boot phase. This makes panic_timeout work in the early boot phase.

Cc: Thomas Gleixner t...@linutronix.de
Cc: Ingo Molnar mi...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Signed-off-by: Joonsoo Kim js1...@gmail.com

diff --git a/arch/x86/kernel/reboot_fixups_32.c b/arch/x86/kernel/reboot_fixups_32.c
index c8e41e9..b9b8ec9 100644
--- a/arch/x86/kernel/reboot_fixups_32.c
+++ b/arch/x86/kernel/reboot_fixups_32.c
@@ -89,6 +89,10 @@ void mach_reboot_fixups(void)
 	if (in_interrupt())
 		return;

+	/* During early boot phase, PCI is not yet initialized */
+	if (system_state == SYSTEM_BOOTING)
+		return;
+
 	for (i=0; i < ARRAY_SIZE(fixups_table); i++) {
 		cur = &(fixups_table[i]);
 		dev = pci_get_device(cur->vendor, cur->device, NULL);

I guess you're saying that if we call pci_get_device() too early, it panics?  Did you figure out why that happens?  If we call pci_get_device() before PCI has been initialized, it would be good if it just returned NULL, indicating that we didn't find any matching device.
I looked briefly, and I thought that's what would happen, but apparently I'm missing something.

In bus_find_device(), klist_iter_init_node() is called with @bus->p->klist_devices. Before initialization, bus->p is NULL, so a panic occurs.

I see.  pci_bus_type.p is initialized by __bus_register() in this path:

  pci_driver_init                  # postcore_initcall
    bus_register(&pci_bus_type)
      __bus_register
        priv = kzalloc(sizeof(struct subsys_private))
        bus->p = priv
        klist_init(&priv->klist_devices, klist_devices_get, klist_devices_put)

I was hoping we could statically initialize the klist, but that doesn't seem likely.  But I wonder if we could do something like the following.  If we could, then callers wouldn't have to worry about whether or not the bus has been initialized.

[snip]

I have no objection to that patch, but really, someone wants to call pci_find_device() before PCI is initialized?  Can't that be fixed instead, as that is the root problem, not the driver core.  But, to paper over your subsystem's bugs, I guess I can take it :)

The caller is in the native_machine_emergency_restart() path.  Joonsoo's original patch does what I think you're suggesting:

+	/* During early boot phase, PCI is not yet initialized */
+	if (system_state == SYSTEM_BOOTING)
+		return;
+
 	for (i=0; i < ARRAY_SIZE(fixups_table); i++) {
 		cur = &(fixups_table[i]);
 		dev = pci_get_device(cur->vendor, cur->device, NULL);

I think it's sort of ugly to check system_state before using pci_get_device(), and there's not really an obvious connection between system_state and PCI initialization.  But if you prefer that, Joonsoo's original patch is fine with me.

Both probably would be best, as there are probably other things that you don't want to touch when you are still booting and trying to restart the machine at the same time.

I agree that my patch is ugly :) So I drop it and will wait for your patch.

Thanks.
Re: [PATCH] slub: assign refcount for kmalloc_caches
On Thu, Jan 24, 2013 at 10:32:32PM -0500, CAI Qian wrote:

----- Original Message -----
From: Greg Kroah-Hartman gre...@linuxfoundation.org
To: Joonsoo Kim iamjoonsoo@lge.com
Cc: Paul Hargrove phhargr...@lbl.gov, Pekka Enberg penb...@kernel.org, linux-kernel@vger.kernel.org, linux...@kvack.org, Christoph Lameter c...@linux.com
Sent: Tuesday, January 15, 2013 3:23:36 AM
Subject: Re: [PATCH] slub: assign refcount for kmalloc_caches

On Fri, Jan 11, 2013 at 04:52:54PM +0900, Joonsoo Kim wrote:
On Thu, Jan 10, 2013 at 08:47:39PM -0800, Paul Hargrove wrote:

I just had a look at patch-3.7.2-rc1, and this change doesn't appear to have made it in yet. Am I missing something?
-Paul

I will check it. Ccing Greg.

Hello, Pekka and Greg. v3.8-rcX is already fixed by other changes, but those are not a simple backport. So I made a new patch and sent it. How does this kind of patch (only for stable v3.7) go into the stable tree? Through Pekka's slab tree, or should I send it to Greg directly? I don't know exactly how to submit this kind of patch to the stable tree. Could anyone help me?

Please redo it, and send it to sta...@vger.kernel.org, and say exactly why it isn't in Linus's tree, and that it should only be applied to 3.7-stable.

I also hit this during testing, so I'll re-send it then.

Hello, CAI Qian. I had totally forgotten about this. Thanks for this work.
Re: [PATCH v3 0/3] introduce static_vm for ARM-specific static mapped area
On Mon, Jan 28, 2013 at 01:04:24PM -0500, Nicolas Pitre wrote:
On Mon, 28 Jan 2013, Will Deacon wrote:

Hello,

On Thu, Jan 24, 2013 at 01:28:51AM +0000, Joonsoo Kim wrote:

In the current implementation, we use an ARM-specific flag, VM_ARM_STATIC_MAPPING, to distinguish ARM-specific static mapped areas. The purpose of a static mapped area is to be re-used when the entire physical address range of an ioremap request can be covered by it.

This implementation causes needless overhead in some cases. For example, assume that there is only one static mapped area and vmlist has 300 areas. Every time we call ioremap, we check all 300 areas to decide whether one matches. Moreover, even if there is no static mapped area at all, every ioremap call still checks all 300 areas. If we construct an extra list for static mapped areas, we can eliminate this overhead: with an extra list, if there is one static mapped area, we check only that one area and proceed quickly.

In fact, it is not a critical problem, because ioremap is not frequently used. But reducing overhead is still worthwhile.

Another reason for doing this work is to remove vm_struct list management entirely. For more information, look at the following link. http://lkml.org/lkml/2012/12/6/184

First patch looks good (removing the unused vmregion stuff) but I'm not so sure about the rest of it.  If you really care about ioremap performance, perhaps it would be better to have a container struct around the vm_struct for static mappings and then stick them in an augmented rbtree so you can efficiently find the mapping encompassing a particular physical address?

How can ioremap performance be a problem is the question I had since the beginning.  Firstly, ioremap is _not_ meant to be used in performance critical paths.  Secondly, there shouldn't be _that_ many entries on the vmlist, such as 300.  That sounds a bit excessive.
So please, can we discuss the reasons that motivated those patches in the first place?  Maybe that's where the actual problem is.

Hello, Will and Nicolas.

First of all, thanks for reviewing.

There is another reason for doing this work. As mentioned above, I am trying to remove the list management for vm_struct (vmlist) entirely. For that purpose, the architecture dependency on vmlist needs to be removed. Below is a link to my RFC patch trying to remove the list management for vm_struct.

http://lkml.org/lkml/2012/12/6/184

Removing the dependency for other architectures is rather trivial, but for ARM it is not, so I prepared this patchset. My description was possibly misleading. Sorry for that.

Answers to your other questions are below. I know ioremap is _not_ meant to be used in performance critical paths, and I mentioned that earlier: it is not a critical problem, because ioremap is not frequently used, but reducing overhead is still worthwhile. And there really are many entries on the vmlist on my test device (an Android phone). I saw more than 300 entries in the past; today I re-checked and found 230~250 entries.

Thanks.
Re: [RFC PATCH 1/3] staging, zsmalloc: introduce zs_mem_[read/write]
Hello, Minchan.

On Thu, Jan 17, 2013 at 08:59:22AM +0900, Minchan Kim wrote:
Hi Joonsoo,

On Wed, Jan 16, 2013 at 05:08:55PM +0900, Joonsoo Kim wrote:

If an object is on a page boundary, zs_map_object() copies the content of the object to a pre-allocated page and returns the virtual address of

IMHO, for easier reading by reviewers, it would be better to use an explicit term instead of an abstract one. pre-allocated pages: vm_buf, which is reserved pages for zsmalloc.

this pre-allocated page. If the user informs zsmalloc of the memcpy region, we can avoid this copy overhead.

That's a good idea!

This patch implements two APIs that take information about the memcpy region. Using this information, we can do the memcpy without that overhead.

For clarification: this patch reduces copy overhead in the !USE_PGTABLE_MAPPING case. In the USE_PGTABLE_MAPPING case, these APIs avoid the cache flush and TLB overhead.

Yeb!

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
---
These are [RFC] patches, because I haven't tested them and don't yet have a test environment. Only a compile test has been done. If there are positive comments, I will set up a test environment and check correctness. These are based on v3.8-rc3. If a rebase is needed, please tell me which tree I should rebase onto.

Whenever you send zsmalloc/zram/zcache, you have to base on recent linux-next. But I hope we send the patches to akpm by promoting soon. :(

diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
index 09a9d35..e3ef5a5 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/drivers/staging/zsmalloc/zsmalloc-main.c
@@ -1045,6 +1045,118 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);

It's an exported function. Please write a description.

+void zs_mem_read(struct zs_pool *pool, unsigned long handle,
+		void *dest, unsigned long src_off, size_t n)

n is meaningless; please use a meaningful word. How about this?
void *buf, unsigned long offset, size_t count

+{
+	struct page *page;
+	unsigned long obj_idx, off;
+
+	unsigned int class_idx;
+	enum fullness_group fg;
+	struct size_class *class;
+	struct page *pages[2];
+	int sizes[2];
+	void *addr;
+
+	BUG_ON(!handle);
+
+	/*
+	 * Because we use per-cpu mapping areas shared among the
+	 * pools/users, we can't allow mapping in interrupt context
+	 * because it can corrupt another users mappings.
+	 */
+	BUG_ON(in_interrupt());
+
+	obj_handle_to_location(handle, &page, &obj_idx);
+	get_zspage_mapping(get_first_page(page), &class_idx, &fg);
+	class = &pool->size_class[class_idx];
+	off = obj_idx_to_offset(page, obj_idx, class->size);
+	off += src_off;
+
+	BUG_ON(class->size < n);
+
+	if (off + n <= PAGE_SIZE) {
+		/* this object is contained entirely within a page */
+		addr = kmap_atomic(page);
+		memcpy(dest, addr + off, n);
+		kunmap_atomic(addr);
+		return;
+	}
+
+	/* this object spans two pages */
+	pages[0] = page;
+	pages[1] = get_next_page(page);
+	BUG_ON(!pages[1]);
+
+	sizes[0] = PAGE_SIZE - off;
+	sizes[1] = n - sizes[0];
+
+	addr = kmap_atomic(pages[0]);
+	memcpy(dest, addr + off, sizes[0]);
+	kunmap_atomic(addr);
+
+	addr = kmap_atomic(pages[1]);
+	memcpy(dest + sizes[0], addr, sizes[1]);
+	kunmap_atomic(addr);
+}
+EXPORT_SYMBOL_GPL(zs_mem_read);
+

Ditto. Write a description.

+void zs_mem_write(struct zs_pool *pool, unsigned long handle,
+		const void *src, unsigned long dest_off, size_t n)
+{
+	struct page *page;
+	unsigned long obj_idx, off;
+
+	unsigned int class_idx;
+	enum fullness_group fg;
+	struct size_class *class;
+	struct page *pages[2];
+	int sizes[2];
+	void *addr;
+
+	BUG_ON(!handle);
+
+	/*
+	 * Because we use per-cpu mapping areas shared among the
+	 * pools/users, we can't allow mapping in interrupt context
+	 * because it can corrupt another users mappings.
+	 */
+	BUG_ON(in_interrupt());
+
+	obj_handle_to_location(handle, &page, &obj_idx);
+	get_zspage_mapping(get_first_page(page), &class_idx, &fg);
+	class = &pool->size_class[class_idx];
+	off = obj_idx_to_offset(page, obj_idx, class->size);
+	off += dest_off;
+
+	BUG_ON(class->size < n);
+
+	if (off + n <= PAGE_SIZE) {
+		/* this object is contained entirely within a page */
+		addr = kmap_atomic(page);
+		memcpy(addr + off, src, n);
+		kunmap_atomic(addr);
+		return;
+	}
+
+	/* this object spans two pages */
+	pages[0] = page;
+	pages[1] = get_next_page(page);
+	BUG_ON(!pages[1
[PATCH v2 2/3] slub: correct bootstrap() for kmem_cache, kmem_cache_node
The current implementation of bootstrap() is not sufficient for kmem_cache and kmem_cache_node.

First, kmem_cache: bootstrap() calls kmem_cache_zalloc() first. When kmem_cache_zalloc() is called, kmem_cache's slab is moved to the cpu slab to satisfy the kmem_cache allocation request. In the current implementation, we only consider the n->partial slabs, so we miss this cpu slab for kmem_cache.

Second, kmem_cache_node: when slab_state == PARTIAL, create_boot_cache() is called, and then kmem_cache_node's slab is moved to the cpu slab to satisfy the kmem_cache_node allocation request. So we miss this slab as well.

These did not cause any error previously, because we normally don't free objects that come from kmem_cache's first slab or kmem_cache_node's. The problem is solved if we consider the cpu slab in bootstrap(). This patch implements that.

v2: don't loop over all processors in bootstrap().

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/mm/slub.c b/mm/slub.c
index 7204c74..8b95364 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3614,10 +3614,15 @@ static int slab_memory_callback(struct notifier_block *self,
 static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache)
 {
 	int node;
+	struct kmem_cache_cpu *c;
 	struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);

 	memcpy(s, static_cache, kmem_cache->object_size);

+	c = this_cpu_ptr(s->cpu_slab);
+	if (c->page)
+		c->page->slab_cache = s;
+
 	for_each_node_state(node, N_NORMAL_MEMORY) {
 		struct kmem_cache_node *n = get_node(s, node);
 		struct page *p;
--
1.7.9.5