[PATCH v2] ARM: mm: clean-up in order to reduce to call kmap_high_get()

2013-03-18 Thread Joonsoo Kim
In kmap_atomic(), kmap_high_get() is invoked to check for an already
mapped area. In __flush_dcache_page() and dma_cache_maint_page(),
we explicitly call kmap_high_get() before kmap_atomic()
when cache_is_vipt(), so kmap_high_get() can be invoked twice.
This is a useless operation, so remove one of the calls.
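
For reference, the removed pattern looks roughly like this (a simplified
sketch of the callers changed below, not the exact kernel code; on ARM,
kmap_atomic() may internally repeat the same highmem lookup that the
explicit kmap_high_get() just did):

	vaddr = kmap_high_get(page);		/* lookup #1, takes kmap_lock */
	if (vaddr) {
		op(vaddr + offset, len, dir);
		kunmap_high(page);
	} else if (cache_is_vipt()) {
		/* lookup #2: kmap_atomic() checks the pkmap area again */
		vaddr = kmap_atomic(page);
		op(vaddr + offset, len, dir);
		kunmap_atomic(vaddr);
	}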

v2: change cache_is_vipt() to cache_is_vipt_nonaliasing() in order to
be self-documenting

Acked-by: Nicolas Pitre n...@linaro.org
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
---
Hello Nicolas.
I maintained your 'Acked-by' while updating this patch to v2.
Please let me know if there is a problem.

Thanks.

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index e9db6b4..ef3e0f3 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -823,16 +823,17 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
 	if (PageHighMem(page)) {
 		if (len + offset > PAGE_SIZE)
len = PAGE_SIZE - offset;
-   vaddr = kmap_high_get(page);
-   if (vaddr) {
-   vaddr += offset;
-   op(vaddr, len, dir);
-   kunmap_high(page);
-   } else if (cache_is_vipt()) {
-   /* unmapped pages might still be cached */
+
+   if (cache_is_vipt_nonaliasing()) {
vaddr = kmap_atomic(page);
op(vaddr + offset, len, dir);
kunmap_atomic(vaddr);
+   } else {
+   vaddr = kmap_high_get(page);
+   if (vaddr) {
+   op(vaddr + offset, len, dir);
+   kunmap_high(page);
+   }
}
} else {
vaddr = page_address(page) + offset;
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 1c8f7f5..0d473cc 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -170,15 +170,18 @@ void __flush_dcache_page(struct address_space *mapping, struct page *page)
if (!PageHighMem(page)) {
__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
} else {
-   void *addr = kmap_high_get(page);
-   if (addr) {
-   __cpuc_flush_dcache_area(addr, PAGE_SIZE);
-   kunmap_high(page);
-   } else if (cache_is_vipt()) {
-   /* unmapped pages might still be cached */
+   void *addr;
+
+   if (cache_is_vipt_nonaliasing()) {
addr = kmap_atomic(page);
__cpuc_flush_dcache_area(addr, PAGE_SIZE);
kunmap_atomic(addr);
+   } else {
+   addr = kmap_high_get(page);
+   if (addr) {
+   __cpuc_flush_dcache_area(addr, PAGE_SIZE);
+   kunmap_high(page);
+   }
}
}
 
-- 
1.7.9.5



Re: [RFC PATCH] ARM: mm: disable kmap_high_get() for SMP

2013-03-18 Thread Joonsoo Kim
On Thu, Mar 07, 2013 at 07:35:51PM +0900, JoonSoo Kim wrote:
 2013/3/7 Nicolas Pitre nicolas.pi...@linaro.org:
  On Thu, 7 Mar 2013, Joonsoo Kim wrote:
 
  Hello, Nicolas.
 
  On Tue, Mar 05, 2013 at 05:36:12PM +0800, Nicolas Pitre wrote:
   On Mon, 4 Mar 2013, Joonsoo Kim wrote:
  
With SMP and kmap_high_get() enabled, users of kmap_atomic() become
serialized, because kmap_high_get() uses the global kmap_lock().
This is not a welcome situation, so turn off this optimization for SMP.
  
   I'm not sure I understand the problem.
  
   The lock taken by kmap_high_get() is released right away before that
   function returns and therefore this is not actually serializing
   anything.
 
Yes, you understand what I want to say correctly.
Sorry for the bad explanation.
 
  Following is reasons why I send this patch with RFC tag.
 
If we have more cpus, performance degradation is possible, even though
the lock in kmap_high_get() is held only for a very short time.

And kmap has a maximum of 512 entries (512 * 4K = 2M), while some mobile
devices have 2G of memory (1G of highmem), so the probability of finding
a matching entry is approximately 1/512. This probability decreases further
for devices with more memory. So I think the time wasted looking for a
matching entry is more than the time saved.
 
  Above is my humble opinion, so please let me know what I am missing.
 
  Please look at the kmap_high_get() code again.  It performs no
  searching at all.  What it does is:
 
If the page is not highmem, it may already be filtered out in kmap_atomic().
So we only consider highmem pages.

For a highmem page, it does perform a search.
In kmap_high_get(), page_address() is called.
page_address() hashes the page and iterates over the list for that hash bucket.
 
And another advantage of disabling ARCH_NEEDS_KMAP_HIGH_GET is
that kmap() and kunmap() then work without disabling irqs.
 
 Thanks.

Hello, Nicolas.

Just to confirm, you don't agree with this, right?

Thanks.

 
  - lock the kmap array against concurrent changes
 
  - if the given page is not highmem, unlock and return NULL
 
  - otherwise increment that page reference count, unlock, and return the
mapped address for that page.
 
  There is almost zero cost to this function, independently of the number
  of kmap entries, whereas it does save much bigger costs elsewhere when
  it is successful.
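
For readers following the argument, kmap_high_get() is roughly the
following (an abridged sketch based on mm/highmem.c; details can differ
between kernel versions):

	void *kmap_high_get(struct page *page)
	{
		unsigned long vaddr, flags;

		lock_kmap_any(flags);		/* kmap_lock, irq-safe variant */
		vaddr = (unsigned long)page_address(page);
		if (vaddr) {
			BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 1);
			pkmap_count[PKMAP_NR(vaddr)]++;	/* pin the existing mapping */
		}
		unlock_kmap_any(flags);
		return (void *)vaddr;
	}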
 
 
  Nicolas


Re: [PATCH v2 1/3] slub: correct to calculate num of acquired objects in get_partial_node()

2013-03-18 Thread Joonsoo Kim
Hello, Pekka.

Could you pick up 1/3, 3/3?
These are already acked by Christoph.
2/3 has the same effect as Glauber's 'slub: correctly bootstrap boot caches',
so it should be skipped.

Thanks.

On Mon, Jan 21, 2013 at 05:01:25PM +0900, Joonsoo Kim wrote:
There is a subtle bug when calculating the number of acquired objects.

Currently, we calculate available = page->objects - page->inuse,
after acquire_slab() is called in get_partial_node().

In acquire_slab() with mode = 1, we always set new.inuse = page->objects.
So,

  acquire_slab(s, n, page, object == NULL);

  if (!object) {
  	c->page = page;
  	stat(s, ALLOC_FROM_PARTIAL);
  	object = t;
  	available = page->objects - page->inuse;

  	!!! available is always 0 !!!
  ...

Therefore, available > s->cpu_partial / 2 is always false and
we always go to the second iteration.
This patch corrects the problem.

After that, we don't need the return value of put_cpu_partial(),
so remove it.
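
A small worked example with hypothetical numbers: if a partial slab has
page->objects == 16 and page->inuse == 3, acquire_slab() in mode 1 sets
new.inuse = 16 and freezes the slab before the caller reads the fields
back, so the old calculation sees 16 - 16 = 0 available objects, whereas
computing new.objects - new.inuse inside acquire_slab(), before the
update, correctly yields 16 - 3 = 13.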
 
v2: calculate the number of objects using new.objects and new.inuse.
It is a more accurate way than before.
 
 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
 diff --git a/mm/slub.c b/mm/slub.c
 index ba2ca53..7204c74 100644
 --- a/mm/slub.c
 +++ b/mm/slub.c
@@ -1493,7 +1493,7 @@ static inline void remove_partial(struct kmem_cache_node *n,
   */
  static inline void *acquire_slab(struct kmem_cache *s,
   struct kmem_cache_node *n, struct page *page,
 - int mode)
 + int mode, int *objects)
  {
   void *freelist;
   unsigned long counters;
 @@ -1507,6 +1507,7 @@ static inline void *acquire_slab(struct kmem_cache *s,
 	freelist = page->freelist;
 	counters = page->counters;
 	new.counters = counters;
+	*objects = new.objects - new.inuse;
 	if (mode) {
 		new.inuse = page->objects;
   new.freelist = NULL;
 @@ -1528,7 +1529,7 @@ static inline void *acquire_slab(struct kmem_cache *s,
   return freelist;
  }
  
-static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
+static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
  static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
  
  /*
@@ -1539,6 +1540,8 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
  {
   struct page *page, *page2;
   void *object = NULL;
 + int available = 0;
 + int objects;
  
   /*
* Racy check. If we mistakenly see no partial slabs then we
@@ -1552,22 +1555,21 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 	spin_lock(&n->list_lock);
 	list_for_each_entry_safe(page, page2, &n->partial, lru) {
   void *t;
 - int available;
  
   if (!pfmemalloc_match(page, flags))
   continue;
  
 - t = acquire_slab(s, n, page, object == NULL);
 + t = acquire_slab(s, n, page, object == NULL, objects);
   if (!t)
   break;
  
 + available += objects;
   if (!object) {
 			c->page = page;
   stat(s, ALLOC_FROM_PARTIAL);
   object = t;
-			available = page->objects - page->inuse;
   } else {
 - available = put_cpu_partial(s, page, 0);
 + put_cpu_partial(s, page, 0);
   stat(s, CPU_PARTIAL_NODE);
   }
 		if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
 @@ -1946,7 +1948,7 @@ static void unfreeze_partials(struct kmem_cache *s,
   * If we did not find a slot then simply move all the partials to the
   * per node partial list.
   */
-static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
+static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
  {
   struct page *oldpage;
   int pages;
@@ -1984,7 +1986,6 @@ static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 		page->next = oldpage;

 	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
 - return pobjects;
  }
  
  static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 -- 
 1.7.9.5
 


Re: [PATCH 0/8] correct load_balance()

2013-03-18 Thread Joonsoo Kim
On Mon, Feb 25, 2013 at 01:56:59PM +0900, Joonsoo Kim wrote:
 On Thu, Feb 14, 2013 at 02:48:33PM +0900, Joonsoo Kim wrote:
  Commit 88b8dac0 makes load_balance() consider other cpus in its group.
  But, there are some missing parts for this feature to work properly.
  This patchset correct these things and make load_balance() robust.
  
Others are related to LBF_ALL_PINNED. This is fallback functionality
for when no task can be moved because of cpu affinity. But, currently,
if the imbalance is not large enough compared to a task's load, we leave the
LBF_ALL_PINNED flag set and a 'redo' is triggered. This is not our intention, so correct it.
  
  These are based on v3.8-rc7.
  
  Joonsoo Kim (8):
sched: change position of resched_cpu() in load_balance()
sched: explicitly cpu_idle_type checking in rebalance_domains()
sched: don't consider other cpus in our group in case of NEWLY_IDLE
sched: clean up move_task() and move_one_task()
sched: move up affinity check to mitigate useless redoing overhead
sched: rename load_balance_tmpmask to load_balance_cpu_active
sched: prevent to re-select dst-cpu in load_balance()
sched: reset lb_env when redo in load_balance()
  
   kernel/sched/core.c |9 +++--
  kernel/sched/fair.c |  107 +--
   2 files changed, 67 insertions(+), 49 deletions(-)
 
 Hello, Ingo and Peter.
 
Could you review this patch set?
Please let me know what I should do to get this merged.

Hello.
One more ping :)

 
 Thanks.
 
  -- 
  1.7.9.5
  


[PATCH 2/3] mm, nobootmem: clean-up of free_low_memory_core_early()

2013-03-18 Thread Joonsoo Kim
Remove unused argument and make function static,
because there is no user outside of nobootmem.c

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index cdc3bab..5f0b0e1 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -44,7 +44,6 @@ extern unsigned long init_bootmem_node(pg_data_t *pgdat,
   unsigned long endpfn);
 extern unsigned long init_bootmem(unsigned long addr, unsigned long memend);
 
-extern unsigned long free_low_memory_core_early(int nodeid);
 extern unsigned long free_all_bootmem_node(pg_data_t *pgdat);
 extern unsigned long free_all_bootmem(void);
 
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 4711e91..589c673 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -120,7 +120,7 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
return end_pfn - start_pfn;
 }
 
-unsigned long __init free_low_memory_core_early(int nodeid)
+static unsigned long __init free_low_memory_core_early()
 {
unsigned long count = 0;
phys_addr_t start, end, size;
@@ -170,7 +170,7 @@ unsigned long __init free_all_bootmem(void)
 *  because in some case like Node0 doesn't have RAM installed
 *  low ram will be on Node1
 */
-   return free_low_memory_core_early(MAX_NUMNODES);
+   return free_low_memory_core_early();
 }
 
 /**
-- 
1.7.9.5



[PATCH 1/3] mm, nobootmem: fix wrong usage of max_low_pfn

2013-03-18 Thread Joonsoo Kim
max_low_pfn reflect the number of _pages_ in the system,
not the maximum PFN. You can easily find that fact in init_bootmem().
So fix it.

Additionally, if 'start_pfn == end_pfn', we don't need to go futher,
so change range check.
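
As a concrete illustration with hypothetical numbers: if RAM starts at
physical address 0xC0000000 (PFN 0xC0000) and there are 0x10000 lowmem
pages, init_bootmem() leaves max_low_pfn == 0x10000 while the real maximum
lowmem PFN is 0xD0000, so clamping end_pfn to max_low_pfn would wrongly
exclude the whole range.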

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 5e07d36..4711e91 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -110,9 +110,9 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
 {
unsigned long start_pfn = PFN_UP(start);
unsigned long end_pfn = min_t(unsigned long,
- PFN_DOWN(end), max_low_pfn);
+ PFN_DOWN(end), min_low_pfn);
 
-	if (start_pfn > end_pfn)
+	if (start_pfn >= end_pfn)
return 0;
 
__free_pages_memory(start_pfn, end_pfn);
-- 
1.7.9.5



[PATCH 3/3] mm, nobootmem: do memset() after memblock_reserve()

2013-03-18 Thread Joonsoo Kim
Currently, we do memset() before reserving the area.
This may not cause any problem, but it is somewhat weird.
So change execution order.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 589c673..f11ec1c 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -46,8 +46,8 @@ static void * __init __alloc_memory_core_early(int nid, u64 size, u64 align,
return NULL;
 
ptr = phys_to_virt(addr);
-   memset(ptr, 0, size);
memblock_reserve(addr, size);
+   memset(ptr, 0, size);
/*
 * The min_count is set to 0 so that bootmem allocated blocks
 * are never reported as leaks.
-- 
1.7.9.5



Re: [PATCH 2/3] mm, nobootmem: clean-up of free_low_memory_core_early()

2013-03-18 Thread Joonsoo Kim
On Tue, Mar 19, 2013 at 02:16:00PM +0900, Joonsoo Kim wrote:
 Remove unused argument and make function static,
 because there is no user outside of nobootmem.c
 
 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
 diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
 index cdc3bab..5f0b0e1 100644
 --- a/include/linux/bootmem.h
 +++ b/include/linux/bootmem.h
 @@ -44,7 +44,6 @@ extern unsigned long init_bootmem_node(pg_data_t *pgdat,
  unsigned long endpfn);
  extern unsigned long init_bootmem(unsigned long addr, unsigned long memend);
  
 -extern unsigned long free_low_memory_core_early(int nodeid);
  extern unsigned long free_all_bootmem_node(pg_data_t *pgdat);
  extern unsigned long free_all_bootmem(void);
  
 diff --git a/mm/nobootmem.c b/mm/nobootmem.c
 index 4711e91..589c673 100644
 --- a/mm/nobootmem.c
 +++ b/mm/nobootmem.c
 @@ -120,7 +120,7 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
   return end_pfn - start_pfn;
  }
  
 -unsigned long __init free_low_memory_core_early(int nodeid)
 +static unsigned long __init free_low_memory_core_early()
  {
   unsigned long count = 0;
   phys_addr_t start, end, size;
 @@ -170,7 +170,7 @@ unsigned long __init free_all_bootmem(void)
*  because in some case like Node0 doesn't have RAM installed
*  low ram will be on Node1
*/
 - return free_low_memory_core_early(MAX_NUMNODES);
 + return free_low_memory_core_early();
  }
  
  /**
 -- 
 1.7.9.5
 

Sorry, this patch causes a build warning.
Below is the fixed version.


From 05f4a768dd5c514113916908f4710f8863704ed9 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim iamjoonsoo@lge.com
Date: Mon, 18 Mar 2013 14:17:57 +0900
Subject: [PATCH] mm, nobootmem: clean-up of free_low_memory_core_early()

Remove unused argument and make function static,
because there is no user outside of nobootmem.c

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index cdc3bab..5f0b0e1 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -44,7 +44,6 @@ extern unsigned long init_bootmem_node(pg_data_t *pgdat,
   unsigned long endpfn);
 extern unsigned long init_bootmem(unsigned long addr, unsigned long memend);
 
-extern unsigned long free_low_memory_core_early(int nodeid);
 extern unsigned long free_all_bootmem_node(pg_data_t *pgdat);
 extern unsigned long free_all_bootmem(void);
 
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 4711e91..9c38698 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -120,7 +120,7 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
return end_pfn - start_pfn;
 }
 
-unsigned long __init free_low_memory_core_early(int nodeid)
+static unsigned long __init free_low_memory_core_early(void)
 {
unsigned long count = 0;
phys_addr_t start, end, size;
@@ -170,7 +170,7 @@ unsigned long __init free_all_bootmem(void)
 *  because in some case like Node0 doesn't have RAM installed
 *  low ram will be on Node1
 */
-   return free_low_memory_core_early(MAX_NUMNODES);
+   return free_low_memory_core_early();
 }
 
 /**
-- 
1.7.9.5



Re: [PATCH 3/3] mm, nobootmem: do memset() after memblock_reserve()

2013-03-18 Thread Joonsoo Kim
On Mon, Mar 18, 2013 at 10:53:04PM -0700, Yinghai Lu wrote:
 On Mon, Mar 18, 2013 at 10:16 PM, Joonsoo Kim iamjoonsoo@lge.com wrote:
  Currently, we do memset() before reserving the area.
  This may not cause any problem, but it is somewhat weird.
  So change execution order.
 
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
  diff --git a/mm/nobootmem.c b/mm/nobootmem.c
  index 589c673..f11ec1c 100644
  --- a/mm/nobootmem.c
  +++ b/mm/nobootmem.c
  @@ -46,8 +46,8 @@ static void * __init __alloc_memory_core_early(int nid, u64 size, u64 align,
  return NULL;
 
  ptr = phys_to_virt(addr);
  -   memset(ptr, 0, size);
  memblock_reserve(addr, size);
  +   memset(ptr, 0, size);
 
 move down ptr = ... too ?
Okay.
I will send v2 soon.

 
  /*
   * The min_count is set to 0 so that bootmem allocated blocks
   * are never reported as leaks.
  --
  1.7.9.5
 
 


Re: [PATCH 2/3] mm, nobootmem: clean-up of free_low_memory_core_early()

2013-03-18 Thread Joonsoo Kim
On Mon, Mar 18, 2013 at 10:51:43PM -0700, Yinghai Lu wrote:
 On Mon, Mar 18, 2013 at 10:16 PM, Joonsoo Kim iamjoonsoo@lge.com wrote:
  Remove unused argument and make function static,
  because there is no user outside of nobootmem.c
 
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
  diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
  index cdc3bab..5f0b0e1 100644
  --- a/include/linux/bootmem.h
  +++ b/include/linux/bootmem.h
  @@ -44,7 +44,6 @@ extern unsigned long init_bootmem_node(pg_data_t *pgdat,
 unsigned long endpfn);
   extern unsigned long init_bootmem(unsigned long addr, unsigned long 
  memend);
 
  -extern unsigned long free_low_memory_core_early(int nodeid);
   extern unsigned long free_all_bootmem_node(pg_data_t *pgdat);
   extern unsigned long free_all_bootmem(void);
 
  diff --git a/mm/nobootmem.c b/mm/nobootmem.c
  index 4711e91..589c673 100644
  --- a/mm/nobootmem.c
  +++ b/mm/nobootmem.c
  @@ -120,7 +120,7 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
  return end_pfn - start_pfn;
   }
 
  -unsigned long __init free_low_memory_core_early(int nodeid)
  +static unsigned long __init free_low_memory_core_early()
 
 (void) ?

Yes, a fixed version has already been sent.
Thanks.

 


Re: [PATCH 1/3] mm, nobootmem: fix wrong usage of max_low_pfn

2013-03-19 Thread Joonsoo Kim
On Mon, Mar 18, 2013 at 10:47:41PM -0700, Yinghai Lu wrote:
 On Mon, Mar 18, 2013 at 10:15 PM, Joonsoo Kim iamjoonsoo@lge.com wrote:
  max_low_pfn reflect the number of _pages_ in the system,
  not the maximum PFN. You can easily find that fact in init_bootmem().
  So fix it.
 
 I'm confused. for x86, we have max_low_pfn defined in ...

Below is a quote from Russell King in 'https://lkml.org/lkml/2013/3/13/123':


 Now, max_low_pfn is initialized this way:
 
 /**
  * init_bootmem - register boot memory
  * @start: pfn where the bitmap is to be placed
  * @pages: number of available physical pages
  *
  * Returns the number of bytes needed to hold the bitmap.
  */
 unsigned long __init init_bootmem(unsigned long start, unsigned long pages)
 {
max_low_pfn = pages;
min_low_pfn = start;
	return init_bootmem_core(NODE_DATA(0)->bdata, start, 0, pages);
 }
  So, min_low_pfn is the PFN offset of the start of physical memory (so
  3GB >> PAGE_SHIFT) and max_low_pfn ends up being the number of pages,
  _not_ the maximum PFN value

So, if the physical address doesn't start at 0, max_low_pfn doesn't represent
the maximum PFN value. This is the case for ARM.

 
 #ifdef CONFIG_X86_32
 /* max_low_pfn get updated here */
 find_low_pfn_range();
 #else
 num_physpages = max_pfn;
 
 check_x2apic();
 
 /* How many end-of-memory variables you have, grandma! */
 /* need this before calling reserve_initrd */
  if (max_pfn > (1UL << (32 - PAGE_SHIFT)))
 max_low_pfn = e820_end_of_low_ram_pfn();
 else
 max_low_pfn = max_pfn;
 
 and under max_low_pfn is bootmem.
 
 
  Additionally, if 'start_pfn == end_pfn', we don't need to go futher,
  so change range check.
 
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
  diff --git a/mm/nobootmem.c b/mm/nobootmem.c
  index 5e07d36..4711e91 100644
  --- a/mm/nobootmem.c
  +++ b/mm/nobootmem.c
  @@ -110,9 +110,9 @@ static unsigned long __init 
  __free_memory_core(phys_addr_t start,
   {
  unsigned long start_pfn = PFN_UP(start);
  unsigned long end_pfn = min_t(unsigned long,
  - PFN_DOWN(end), max_low_pfn);
  + PFN_DOWN(end), min_low_pfn);
 
 what is min_low_pfn ?  is it 0 for x86?

My implementation is totally wrong. :)
min_low_pfn is not proper value for this purpose.

I will fix it.
Sorry for noise.

Thanks.

 
 Thanks
 
 Yinghai
 


Re: [PATCH 1/3] mm, nobootmem: fix wrong usage of max_low_pfn

2013-03-19 Thread Joonsoo Kim
On Tue, Mar 19, 2013 at 03:25:22PM +0900, Joonsoo Kim wrote:
 On Mon, Mar 18, 2013 at 10:47:41PM -0700, Yinghai Lu wrote:
  On Mon, Mar 18, 2013 at 10:15 PM, Joonsoo Kim iamjoonsoo@lge.com 
  wrote:
   max_low_pfn reflect the number of _pages_ in the system,
   not the maximum PFN. You can easily find that fact in init_bootmem().
   So fix it.
  
  I'm confused. for x86, we have max_low_pfn defined in ...
 
 Below is queote from Russell King in 'https://lkml.org/lkml/2013/3/13/123'
 
 
  Now, max_low_pfn is initialized this way:
  
  /**
   * init_bootmem - register boot memory
   * @start: pfn where the bitmap is to be placed
   * @pages: number of available physical pages
   *
   * Returns the number of bytes needed to hold the bitmap.
   */
  unsigned long __init init_bootmem(unsigned long start, unsigned long pages)
  {
 max_low_pfn = pages;
 min_low_pfn = start;
 	return init_bootmem_core(NODE_DATA(0)->bdata, start, 0, pages);
  }
   So, min_low_pfn is the PFN offset of the start of physical memory (so
   3GB >> PAGE_SHIFT) and max_low_pfn ends up being the number of pages,
   _not_ the maximum PFN value
 
 So, if the physical address doesn't start at 0, max_low_pfn doesn't represent
 the maximum PFN value. This is the case for ARM.
 
  
  #ifdef CONFIG_X86_32
  /* max_low_pfn get updated here */
  find_low_pfn_range();
  #else
  num_physpages = max_pfn;
  
  check_x2apic();
  
  /* How many end-of-memory variables you have, grandma! */
  /* need this before calling reserve_initrd */
   if (max_pfn > (1UL << (32 - PAGE_SHIFT)))
  max_low_pfn = e820_end_of_low_ram_pfn();
  else
  max_low_pfn = max_pfn;
  
  and under max_low_pfn is bootmem.
  
  
   Additionally, if 'start_pfn == end_pfn', we don't need to go futher,
   so change range check.
  
   Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
  
   diff --git a/mm/nobootmem.c b/mm/nobootmem.c
   index 5e07d36..4711e91 100644
   --- a/mm/nobootmem.c
   +++ b/mm/nobootmem.c
   @@ -110,9 +110,9 @@ static unsigned long __init 
   __free_memory_core(phys_addr_t start,
{
   unsigned long start_pfn = PFN_UP(start);
   unsigned long end_pfn = min_t(unsigned long,
   - PFN_DOWN(end), max_low_pfn);
   + PFN_DOWN(end), min_low_pfn);
  
  what is min_low_pfn ?  is it 0 for x86?
 
 My implementation is totally wrong. :)
 min_low_pfn is not proper value for this purpose.
 
 I will fix it.
 Sorry for noise.
 
 Thanks.

How about using memblock.current_limit?

unsigned long end_pfn = min_t(unsigned long, PFN_DOWN(end),
memblock.current_limit);

Thanks.

 
  
  Thanks
  
  Yinghai
  


Re: [PATCH 1/3] mm, nobootmem: fix wrong usage of max_low_pfn

2013-03-19 Thread Joonsoo Kim
On Tue, Mar 19, 2013 at 12:35:45AM -0700, Yinghai Lu wrote:
 Can you check why sparc do not need to change interface during converting
 to use memblock to replace bootmem?

Sure.
According to my understanding of the sparc32 code (arch/sparc/mm/init_32.c),
it already uses max_low_pfn as the maximum PFN value,
not as the number of pages.



Re: [PATCH 2/8] sched: explicitly cpu_idle_type checking in rebalance_domains()

2013-03-20 Thread Joonsoo Kim
Hello, Peter.

On Tue, Mar 19, 2013 at 03:02:21PM +0100, Peter Zijlstra wrote:
 On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote:
  After commit 88b8dac0, dst-cpu can be changed in load_balance(),
  then we can't know cpu_idle_type of dst-cpu when load_balance()
  return positive. So, add explicit cpu_idle_type checking.
 
 No real objection I suppose, but did you actually see this go wrong?

No, I found it while reviewing the whole scheduler code.
Thanks.

 


Re: [PATCH 3/8] sched: don't consider other cpus in our group in case of NEWLY_IDLE

2013-03-20 Thread Joonsoo Kim
On Tue, Mar 19, 2013 at 03:20:57PM +0100, Peter Zijlstra wrote:
 On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote:
  Commit 88b8dac0 makes load_balance() consider other cpus in its group,
  regardless of idle type. When we do NEWLY_IDLE balancing, we should not
  consider it, because a motivation of NEWLY_IDLE balancing is to turn
  this cpu to non idle state if needed. This is not the case of other cpus.
  So, change code not to consider other cpus for NEWLY_IDLE balancing.
  
   With this patch, assign 'if (pulled_task) this_rq->idle_stamp = 0'
   in idle_balance() is corrected, because NEWLY_IDLE balancing doesn't
   consider other cpus. Assigning to 'this_rq->idle_stamp' is now valid.
  
  Cc: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
 Fair enough, good catch.
 
 Acked-by: Peter Zijlstra a.p.zijls...@chello.nl
 
  diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
  index 0c6aaf6..97498f4 100644
  --- a/kernel/sched/fair.c
  +++ b/kernel/sched/fair.c
   @@ -5016,8 +5016,15 @@ static int load_balance(int this_cpu, struct rq *this_rq,
  .cpus   = cpus,
  };
   
  +   /* For NEWLY_IDLE load_balancing, we don't need to consider
  +* other cpus in our group */
  +   if (idle == CPU_NEWLY_IDLE) {
  +   env.dst_grpmask = NULL;
  +   max_lb_iterations = 0;
 
 Just a small nit; I don't think we'll ever get to evaluate
 max_lb_iterations when !dst_grpmask. So strictly speaking its
 superfluous to touch it.

Okay. In next spin, I will remove it and add a comment here.

Thanks.

 
  +   } else {
  +   max_lb_iterations = cpumask_weight(env.dst_grpmask);
  +   }
  cpumask_copy(cpus, cpu_active_mask);
  -   max_lb_iterations = cpumask_weight(env.dst_grpmask);
   
  schedstat_inc(sd, lb_count[idle]);
   
 
 


Re: [PATCH 4/8] sched: clean up move_task() and move_one_task()

2013-03-20 Thread Joonsoo Kim
On Tue, Mar 19, 2013 at 03:30:15PM +0100, Peter Zijlstra wrote:
 On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote:
   Some validation for task moving is performed in move_tasks() and
   move_one_task(). We can move this code to can_migrate_task(),
   which already exists for this purpose.
 
  @@ -4011,18 +4027,7 @@ static int move_tasks(struct lb_env *env)
  break;
  }
   
   -	if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
   -		goto next;
   -
   -	load = task_h_load(p);
   -
   -	if (sched_feat(LB_MIN) && load < 16 && !env->sd->nr_balance_failed)
   -		goto next;
   -
   -	if ((load / 2) > env->imbalance)
  -   goto next;
  -
  -   if (!can_migrate_task(p, env))
  +   if (!can_migrate_task(p, env, false, load))
  goto next;
   
  move_task(p, env);
 
 Right, so I'm not so taken with this one. The whole load stuff really
 is a balance heuristic that's part of move_tasks(), move_one_task()
 really doesn't care about that.
 
 So why did you include it? Purely so you didn't have to re-order the
 tests? I don't see any reason not to flip a tests around.

I think I don't fully understand your concern, because of
my poor English. If possible, please elaborate on the problem in more detail.

First of all, I will do my best to answer your question.

Patches 4/8 and 5/8 are for mitigating the useless redo overhead caused
by LBF_ALL_PINNED. For this purpose, we should check cpu affinity
before evaluating the load. Just moving can_migrate_task() above the
load evaluation code may have a side effect, because can_migrate_task() does
other checking, namely 'cache hotness'. I don't want that side effect, so I
embedded the load evaluation into can_migrate_task(), re-ordered the checks,
and disabled the load evaluation for move_one_task().

If your recommendation is to move can_migrate_task() above the
load evaluation code, yes, I can and will do that. :)

Please let me know what I am misunderstanding.

Thanks.



Re: [PATCH 6/8] sched: rename load_balance_tmpmask to load_balance_cpu_active

2013-03-20 Thread Joonsoo Kim
On Tue, Mar 19, 2013 at 04:01:01PM +0100, Peter Zijlstra wrote:
 On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote:
   This name doesn't convey any specific meaning.
   So rename it to imply its purpose.
  
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
  
  diff --git a/kernel/sched/core.c b/kernel/sched/core.c
  index 26058d0..e6f8783 100644
  --- a/kernel/sched/core.c
  +++ b/kernel/sched/core.c
  @@ -6814,7 +6814,7 @@ struct task_group root_task_group;
   LIST_HEAD(task_groups);
   #endif
   
  -DECLARE_PER_CPU(cpumask_var_t, load_balance_tmpmask);
  +DECLARE_PER_CPU(cpumask_var_t, load_balance_cpu_active);
 
 That's not much better; how about we call it: load_balance_mask.

Okay, in next spin, I will call it as load_balance_mask.

Thanks.

 


Re: [PATCH 7/8] sched: prevent to re-select dst-cpu in load_balance()

2013-03-20 Thread Joonsoo Kim
On Tue, Mar 19, 2013 at 04:05:46PM +0100, Peter Zijlstra wrote:
 On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote:
  Commit 88b8dac0 makes load_balance() consider other cpus in its group.
  But, in that, there is no code for preventing to re-select dst-cpu.
  So, same dst-cpu can be selected over and over.
  
  This patch add functionality to load_balance() in order to exclude
  cpu which is selected once.
 
  Oh man.. seriously? Did you see this happen? Also, can't we simply
  remove it from lb->cpus?

I didn't see it; this is just from logical thinking. :)
lb->cpus is for source cpus and dst_cpu is the destination cpu, so it doesn't
work to remove it from lb->cpus.

Thanks.

 


Re: [PATCH 8/8] sched: reset lb_env when redo in load_balance()

2013-03-20 Thread Joonsoo Kim
On Tue, Mar 19, 2013 at 04:21:23PM +0100, Peter Zijlstra wrote:
 On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote:
  Commit 88b8dac0 makes load_balance() consider other cpus in its group.
  So, now, When we redo in load_balance(), we should reset some fields of
  lb_env to ensure that load_balance() works for initial cpu, not for other
  cpus in its group. So correct it.
  
  Cc: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
  
  diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
  index 70631e8..25c798c 100644
  --- a/kernel/sched/fair.c
  +++ b/kernel/sched/fair.c
   @@ -5014,14 +5014,20 @@ static int load_balance(int this_cpu, struct rq *this_rq,
   
  struct lb_env env = {
  .sd = sd,
  -   .dst_cpu= this_cpu,
  -   .dst_rq = this_rq,
  .dst_grpmask= dst_grp,
  .idle   = idle,
  -   .loop_break = sched_nr_migrate_break,
  .cpus   = cpus,
  };
   
  +   schedstat_inc(sd, lb_count[idle]);
  +   cpumask_copy(cpus, cpu_active_mask);
  +
  +redo:
  +   env.dst_cpu = this_cpu;
  +   env.dst_rq = this_rq;
  +   env.loop = 0;
  +   env.loop_break = sched_nr_migrate_break;
  +
  /* For NEWLY_IDLE load_balancing, we don't need to consider
   * other cpus in our group */
  if (idle == CPU_NEWLY_IDLE) {
 
 OK, so this is the case where we tried to balance !this_cpu and found
 ALL_PINNED, right?
 
 You can only get here in very weird cases where people love their
 sched_setaffinity() way too much, do we care? Why not give up?

Now that you mention it, I don't have enough reason for this patch.
I think giving up is preferable.
I will omit this patch in the next spin.

 
  Also, looking at this, shouldn't we consider env->cpus in
  can_migrate_task() where we compute new_dst_cpu?

As previously stated, env->cpus is for src cpus, so when we decide dst_cpu,
it doesn't matter.

Really thanks for detailed review to all this patchset.
Thanks.

 


Re: [PATCH 4/8] sched: clean up move_task() and move_one_task()

2013-03-20 Thread JoonSoo Kim
2013/3/20 Peter Zijlstra pet...@infradead.org:
 On Wed, 2013-03-20 at 16:33 +0900, Joonsoo Kim wrote:

  Right, so I'm not so taken with this one. The whole load stuff really
  is a balance heuristic that's part of move_tasks(), move_one_task()
  really doesn't care about that.
 
  So why did you include it? Purely so you didn't have to re-order the
  tests? I don't see any reason not to flip a tests around.

 I think I don't fully understand your concern, because of
 my poor English. If possible, please elaborate on the problem in more detail.

 OK, so your initial Changelog said it wanted to remove some code
 duplication between move_tasks() and move_one_task(); but then you put
 in the load heuristics and add a boolean argument to only enable those
 for move_tasks() -- so clearly that wasn't duplicated.

 So why move that code.. I proposed that this was due a reluctance to
 re-arrange the various tests that stop the migration from happening.

 Now you say:

 ... Just moving can_migrate_task() above the
 load evaluation code may have a side effect, because can_migrate_task() does
 other checking, namely 'cache hotness'. I don't want that side effect, so I
 embedded the load evaluation into can_migrate_task(), re-ordered the checks,
 and disabled the load evaluation for move_one_task().

 Which pretty much affirms this. However I also said that I don't think
 the order really matters that much; each test will cancel the migration
 of this task; the order of these tests seem immaterial.

 If your recommendation is to move can_migrate_task() above the
 load evaluation code, yes, I can and will do that. :)

 I would actually propose moving the throttled test into
 can_migrate_task() and leave it at that.

Okay. I will do that in next spin.

Thanks!!




Re: [PATCH 7/8] sched: prevent to re-select dst-cpu in load_balance()

2013-03-20 Thread JoonSoo Kim
2013/3/20 Peter Zijlstra pet...@infradead.org:
 On Wed, 2013-03-20 at 16:43 +0900, Joonsoo Kim wrote:
 On Tue, Mar 19, 2013 at 04:05:46PM +0100, Peter Zijlstra wrote:
  On Thu, 2013-02-14 at 14:48 +0900, Joonsoo Kim wrote:
   Commit 88b8dac0 makes load_balance() consider other cpus in its group.
   But, in that, there is no code for preventing to re-select dst-cpu.
   So, same dst-cpu can be selected over and over.
  
   This patch add functionality to load_balance() in order to exclude
   cpu which is selected once.
 
   Oh man.. seriously? Did you see this happen? Also, can't we simply
   remove it from lb->cpus?

 I didn't see it; this is just from logical thinking. :)
 lb->cpus is for source cpus and dst_cpu is the destination cpu, so it doesn't
 work to remove it from lb->cpus.

 How about we interpret ->cpus as the total mask to balance; so both
 source and destination. That way clearing a cpu means we won't take nor
 put tasks on it.

At first thought, it may be possible.
I will try to do it.

Thanks.



Re: [PATCHSET wq/for-3.10] workqueue: break up workqueue_lock into multiple locks

2013-03-20 Thread JoonSoo Kim
2013/3/19 Tejun Heo t...@kernel.org:
 On Wed, Mar 13, 2013 at 07:57:18PM -0700, Tejun Heo wrote:
 and available in the following git branch.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git review-finer-locking

 Applied to wq/for-3.10.

Hello, Tejun.

I know I am late, but please give me a chance to ask a question.

Is finer locking for the workqueue code really needed?
Is there a performance issue?
I think that there are too many locks and locking rules,
although the description of them is very nice.

Thanks.

 Thanks.

 --
 tejun


Re: [PATCH 01/10] workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]

2013-03-20 Thread JoonSoo Kim
2013/3/20 Tejun Heo t...@kernel.org:
 Unbound workqueues are going to be NUMA-affine.  Add wq_numa_tbl_len
 and wq_numa_possible_cpumask[] in preparation.  The former is the
 highest NUMA node ID + 1 and the latter is masks of possibles CPUs for
 each NUMA node.

It is better to move this code to topology.c or cpumask.c,
so that it can be used generally.

Thanks.
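
To make the suggestion concrete, something like the following generic
helper is what I have in mind (a purely hypothetical sketch; the name is
invented for illustration and it only uses APIs the quoted patch already
relies on):

	/* hypothetical: build a possible-CPU mask for each NUMA node */
	static cpumask_var_t *alloc_node_to_possible_cpumask(void)
	{
		cpumask_var_t *masks;
		int node, cpu;

		masks = kcalloc(nr_node_ids, sizeof(masks[0]), GFP_KERNEL);
		if (!masks)
			return NULL;

		for_each_node(node)
			if (!zalloc_cpumask_var_node(&masks[node], GFP_KERNEL, node))
				return NULL;	/* sketch: cleanup omitted for brevity */

		for_each_possible_cpu(cpu) {
			node = cpu_to_node(cpu);
			if (node != NUMA_NO_NODE)
				cpumask_set_cpu(cpu, masks[node]);
		}
		return masks;
	}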

 This patch only introduces these.  Future patches will make use of
 them.

 Signed-off-by: Tejun Heo t...@kernel.org
 ---
  kernel/workqueue.c | 35 ++-
  1 file changed, 34 insertions(+), 1 deletion(-)

 diff --git a/kernel/workqueue.c b/kernel/workqueue.c
 index 775c2f4..9b096e3 100644
 --- a/kernel/workqueue.c
 +++ b/kernel/workqueue.c
 @@ -44,6 +44,7 @@
  #include <linux/jhash.h>
  #include <linux/hashtable.h>
  #include <linux/rculist.h>
 +#include <linux/nodemask.h>

  #include "workqueue_internal.h"

 @@ -256,6 +257,11 @@ struct workqueue_struct {

  static struct kmem_cache *pwq_cache;

 +static int wq_numa_tbl_len;		/* highest possible NUMA node id + 1 */
 +static cpumask_var_t *wq_numa_possible_cpumask;
 +   /* possible CPUs of each node, may
 +  be NULL if init failed */
 +
  static DEFINE_MUTEX(wq_mutex); /* protects workqueues and pools */
  static DEFINE_SPINLOCK(pwq_lock);  /* protects pool_workqueues */
  static DEFINE_SPINLOCK(wq_mayday_lock);	/* protects wq->maydays list */
 @@ -4416,7 +4422,7 @@ out_unlock:
  static int __init init_workqueues(void)
  {
 int std_nice[NR_STD_WORKER_POOLS] = { 0, HIGHPRI_NICE_LEVEL };
 -   int i, cpu;
 +   int i, node, cpu;

 /* make sure we have enough bits for OFFQ pool ID */
 	BUILD_BUG_ON((1LU << (BITS_PER_LONG - WORK_OFFQ_POOL_SHIFT))
 @@ -4429,6 +4435,33 @@ static int __init init_workqueues(void)
 cpu_notifier(workqueue_cpu_up_callback, CPU_PRI_WORKQUEUE_UP);
 hotcpu_notifier(workqueue_cpu_down_callback, CPU_PRI_WORKQUEUE_DOWN);

 +   /* determine NUMA pwq table len - highest node id + 1 */
 +   for_each_node(node)
 +   wq_numa_tbl_len = max(wq_numa_tbl_len, node + 1);
 +
 +   /*
 +* We want masks of possible CPUs of each node which isn't readily
 +* available.  Build one from cpu_to_node() which should have been
 +* fully initialized by now.
 +*/
 +	wq_numa_possible_cpumask = kzalloc(wq_numa_tbl_len *
 +					   sizeof(wq_numa_possible_cpumask[0]),
 +					   GFP_KERNEL);
 +   BUG_ON(!wq_numa_possible_cpumask);
 +
 +	for_each_node(node)
 +		BUG_ON(!alloc_cpumask_var_node(&wq_numa_possible_cpumask[node],
 +					       GFP_KERNEL, node));
 +   for_each_possible_cpu(cpu) {
 +   node = cpu_to_node(cpu);
 +   if (WARN_ON(node == NUMA_NO_NODE)) {
 +			pr_err("workqueue: NUMA node mapping not available "
 +			       "for cpu%d, disabling NUMA support\n", cpu);
 +   wq_numa_possible_cpumask = NULL;
 +   break;
 +   }
 +   cpumask_set_cpu(cpu, wq_numa_possible_cpumask[node]);
 +   }
 +
 /* initialize CPU pools */
 for_each_possible_cpu(cpu) {
 struct worker_pool *pool;
 --
 1.8.1.4



[RFC PATCH 0/6] ARM: use NO_BOOTMEM on default configuration

2013-03-24 Thread Joonsoo Kim
Currently, ARM uses the traditional 'bootmem' allocator. It uses a bitmap
for managing memory space, so it initializes that bitmap as a first step.
This is needless overhead if we use 'nobootmem'. 'nobootmem' uses the memblock
allocator internally, so there is no additional initialization overhead.
In addition, if we use 'nobootmem', we can save a small amount of memory,
because 'nobootmem' manages memory space at byte granularity, whereas
'bootmem' manages it at page granularity, so some space is wasted,
although it is very small. On my system, 20 KB of memory can be saved. :)
Using 'nobootmem' has another advantage. Before initializing the 'bootmem'
allocator, we use the memblock allocator. If we use the memblock allocator
after initializing 'bootmem' by mistake, problems are possible.
Patch 1/6 is a good example of this case. 'nobootmem' uses the memblock
allocator internally, so this risk disappears.

There is one stopper to enabling NO_BOOTMEM: max_low_pfn.
nobootmem uses max_low_pfn for computing a boundary in free_all_bootmem(),
so we need a proper value in max_low_pfn.
But there is some difficulty related to max_low_pfn. max_low_pfn is used
with two different meanings across architectures. One is the number of pages
in lowmem and the other is the maximum lowmem pfn. Now, in ARM, it is used
as the number of pages in lowmem. You can get more information at the links below.
http://lwn.net/Articles/543408/
http://lwn.net/Articles/543424/

As I investigated, more architectures use max_low_pfn as a maximum pfn than
as a page count, so IMHO changing the meaning of max_low_pfn to the maximum pfn
is the preferable solution to me. This patchset changes max_low_pfn to the
maximum lowmem pfn in ARM. In addition, min_low_pfn and max_pfn are assigned
according to the same criteria.
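
Concretely, the intent is that after find_limits() the ARM code ends up
with roughly the following assignments (a sketch of the direction, not the
exact final patch):

	/* find_limits() fills min, max_low and max_high as PFNs */
	min_low_pfn = min;	/* first usable pfn */
	max_low_pfn = max_low;	/* maximum lowmem pfn, not a page count */
	max_pfn = max_high;	/* maximum pfn, including highmem */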

AFAIK, there is no real user of max_low_pfn except block/blk-settings.c,
and blk-settings.c assumes that max_low_pfn is the maximum lowmem pfn,
so this patch should not harm anything. But I'm no expert on this,
so please let me know what I am missing.

I did some testing on my Android device and it worked. :)
Feel free to give me your opinion about this patchset.
This patchset is based on v3.9-rc4.

Thanks.

Joonsoo Kim (6):
  ARM, TCM: initialize TCM in paging_init(), instead of setup_arch()
  ARM, crashkernel: use ___alloc_bootmem_node_nopanic() for reserving
memory
  ARM, crashkernel: correct total_mem size in reserve_crashkernel()
  ARM, mm: don't do arm_bootmem_init() if CONFIG_NO_BOOTMEM
  ARM, mm: change meaning of max_low_pfn to maximum pfn for nobootmem
  ARM, mm: enable NO_BOOTMEM for default ARM build

 arch/arm/Kconfig|1 +
 arch/arm/kernel/setup.c |   22 --
 arch/arm/kernel/tcm.c   |1 -
 arch/arm/kernel/tcm.h   |   17 -
 arch/arm/mm/init.c  |   19 ---
 arch/arm/mm/mmu.c   |2 ++
 arch/arm/mm/tcm.h   |   17 +
 7 files changed, 40 insertions(+), 39 deletions(-)
 delete mode 100644 arch/arm/kernel/tcm.h
 create mode 100644 arch/arm/mm/tcm.h

-- 
1.7.9.5



[RFC PATCH 4/6] ARM, mm: don't do arm_bootmem_init() if CONFIG_NO_BOOTMEM

2013-03-24 Thread Joonsoo Kim
arm_bootmem_init() initializes a bitmap for bootmem, which is not needed
with CONFIG_NO_BOOTMEM.
So skip it when CONFIG_NO_BOOTMEM is enabled.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index ad722f1..049414a 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -149,6 +149,12 @@ static void __init find_limits(unsigned long *min, unsigned long *max_low,
 	*max_high = bank_pfn_end(&mi->bank[mi->nr_banks - 1]);
 }
 
+#ifdef CONFIG_NO_BOOTMEM
+static inline void __init arm_bootmem_init(unsigned long start_pfn,
+   unsigned long end_pfn)
+{
+}
+#else
 static void __init arm_bootmem_init(unsigned long start_pfn,
unsigned long end_pfn)
 {
@@ -169,7 +175,6 @@ static void __init arm_bootmem_init(unsigned long start_pfn,
 * Initialise the bootmem allocator, handing the
 * memory banks over to bootmem.
 */
-   node_set_online(0);
pgdat = NODE_DATA(0);
init_bootmem_node(pgdat, __phys_to_pfn(bitmap), start_pfn, end_pfn);
 
@@ -200,6 +205,7 @@ static void __init arm_bootmem_init(unsigned long start_pfn,
 				(end - start) >> PAGE_SHIFT, BOOTMEM_DEFAULT);
}
 }
+#endif
 
 #ifdef CONFIG_ZONE_DMA
 
@@ -392,6 +398,7 @@ void __init bootmem_init(void)
 
find_limits(min, max_low, max_high);
 
+   node_set_online(0);
arm_bootmem_init(min, max_low);
 
/*
-- 
1.7.9.5



[RFC PATCH 2/6] ARM, crashkernel: use ___alloc_bootmem_node_nopanic() for reserving memory

2013-03-24 Thread Joonsoo Kim
For crashkernel, a specific address should be reserved.
This can be achieved by reserve_bootmem(), but that function is
only for bootmem.

Now we try to enable nobootmem, therefore change it to the more general
function ___alloc_bootmem_node_nopanic(), which can be used for both
bootmem and nobootmem.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index b3990a3..99ffe87 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -674,15 +674,20 @@ static void __init reserve_crashkernel(void)
 {
unsigned long long crash_size, crash_base;
unsigned long long total_mem;
+   unsigned long limit = 0;
int ret;
 
total_mem = get_total_mem();
ret = parse_crashkernel(boot_command_line, total_mem,
crash_size, crash_base);
-   if (ret)
+   if (ret != 0 || crash_size == 0)
return;
 
-   ret = reserve_bootmem(crash_base, crash_size, BOOTMEM_EXCLUSIVE);
+   if (crash_base != 0)
+   limit = crash_base + crash_size;
+
+   ret = ___alloc_bootmem_node_nopanic(NODE_DATA(0), crash_size,
+   PAGE_ALIGN, crash_base, limit);
if (ret  0) {
 		printk(KERN_WARNING "crashkernel reservation failed - "
 		       "memory is in use (0x%lx)\n", (unsigned long)crash_base);
-- 
1.7.9.5



[RFC PATCH 1/6] ARM, TCM: initialize TCM in paging_init(), instead of setup_arch()

2013-03-24 Thread Joonsoo Kim
tcm_init() calls iotable_init(), which uses early_alloc variants that
do memblock allocation. Directly using memblock allocation after
initializing bootmem should not be permitted, because bootmem can't know
what has additionally been reserved.
So move tcm_init() to a safe place before initializing bootmem.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 3f6cbb2..b3990a3 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -56,7 +56,6 @@
 #include asm/virt.h
 
 #include atags.h
-#include tcm.h
 
 
 #if defined(CONFIG_FPE_NWFPE) || defined(CONFIG_FPE_FASTFPE)
@@ -778,8 +777,6 @@ void __init setup_arch(char **cmdline_p)
 
reserve_crashkernel();
 
-   tcm_init();
-
 #ifdef CONFIG_MULTI_IRQ_HANDLER
handle_arch_irq = mdesc-handle_irq;
 #endif
diff --git a/arch/arm/kernel/tcm.c b/arch/arm/kernel/tcm.c
index 30ae6bb..f50f19e 100644
--- a/arch/arm/kernel/tcm.c
+++ b/arch/arm/kernel/tcm.c
@@ -17,7 +17,6 @@
 #include asm/mach/map.h
 #include asm/memory.h
 #include asm/system_info.h
-#include tcm.h
 
 static struct gen_pool *tcm_pool;
 static bool dtcm_present;
diff --git a/arch/arm/kernel/tcm.h b/arch/arm/kernel/tcm.h
deleted file mode 100644
index 8015ad4..000
--- a/arch/arm/kernel/tcm.h
+++ /dev/null
@@ -1,17 +0,0 @@
-/*
- * Copyright (C) 2008-2009 ST-Ericsson AB
- * License terms: GNU General Public License (GPL) version 2
- * TCM memory handling for ARM systems
- *
- * Author: Linus Walleij linus.wall...@stericsson.com
- * Author: Rickard Andersson rickard.anders...@stericsson.com
- */
-
-#ifdef CONFIG_HAVE_TCM
-void __init tcm_init(void);
-#else
-/* No TCM support, just blank inlines to be optimized out */
-inline void tcm_init(void)
-{
-}
-#endif
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index e95a996..a7f127d 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -34,6 +34,7 @@
 #include asm/mach/pci.h
 
 #include mm.h
+#include tcm.h
 
 /*
  * empty_zero_page is a special page that is used for
@@ -1256,6 +1257,7 @@ void __init paging_init(struct machine_desc *mdesc)
dma_contiguous_remap();
devicemaps_init(mdesc);
kmap_init();
+   tcm_init();
 
top_pmd = pmd_off_k(0x);
 
diff --git a/arch/arm/mm/tcm.h b/arch/arm/mm/tcm.h
new file mode 100644
index 000..8015ad4
--- /dev/null
+++ b/arch/arm/mm/tcm.h
@@ -0,0 +1,17 @@
+/*
+ * Copyright (C) 2008-2009 ST-Ericsson AB
+ * License terms: GNU General Public License (GPL) version 2
+ * TCM memory handling for ARM systems
+ *
+ * Author: Linus Walleij linus.wall...@stericsson.com
+ * Author: Rickard Andersson rickard.anders...@stericsson.com
+ */
+
+#ifdef CONFIG_HAVE_TCM
+void __init tcm_init(void);
+#else
+/* No TCM support, just blank inlines to be optimized out */
+inline void tcm_init(void)
+{
+}
+#endif
-- 
1.7.9.5



[RFC PATCH 3/6] ARM, crashkernel: correct total_mem size in reserve_crashkernel()

2013-03-24 Thread Joonsoo Kim
There are platforms which have highmem, so this equation
does not represent the total_mem size properly.
In addition, max_low_pfn means something different on other architectures
and its meaning is scheduled to change, so remove the code that depends
on max_low_pfn.
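
To illustrate the difference, here is a small userspace sketch (the
lowmem/highmem split below is made up for the example; it is not taken
from any real board):

/* Hedged sketch: pfn-range arithmetic vs. summing all memory.
 * The lowmem/highmem sizes below are hypothetical.
 */
#include <stdio.h>

#define PAGE_SHIFT 12

int main(void)
{
        unsigned long min_low_pfn = 0x60000;            /* hypothetical RAM start */
        unsigned long max_low_pfn = 0x60000 + 0x2F800;  /* ~760MB of lowmem pages */
        unsigned long highmem_pages = 0x10800;          /* ~264MB of highmem */

        unsigned long old = (max_low_pfn - min_low_pfn) << PAGE_SHIFT;
        unsigned long all = ((max_low_pfn - min_low_pfn) + highmem_pages)
                                << PAGE_SHIFT;

        printf("old total_mem (lowmem only): %lu MB\n", old >> 20);
        printf("memblock-style total       : %lu MB\n", all >> 20);
        return 0;
}

With highmem present, the old pfn-difference formula reports only the
lowmem portion, while summing all banks (as memblock_phys_mem_size()
does) reports the whole 1GB.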

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 99ffe87..1149988 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -655,14 +655,6 @@ static int __init init_machine_late(void)
 late_initcall(init_machine_late);
 
 #ifdef CONFIG_KEXEC
-static inline unsigned long long get_total_mem(void)
-{
-   unsigned long total;
-
-   total = max_low_pfn - min_low_pfn;
-   return total  PAGE_SHIFT;
-}
-
 /**
  * reserve_crashkernel() - reserves memory are for crash kernel
  *
@@ -677,7 +669,7 @@ static void __init reserve_crashkernel(void)
unsigned long limit = 0;
int ret;
 
-   total_mem = get_total_mem();
+   total_mem = memblock_phys_mem_size();
ret = parse_crashkernel(boot_command_line, total_mem,
crash_size, crash_base);
if (ret != 0 || crash_size == 0)
-- 
1.7.9.5



[RFC PATCH 5/6] ARM, mm: change meaning of max_low_pfn to maximum pfn for nobootmem

2013-03-24 Thread Joonsoo Kim
nobootmem uses max_low_pfn for computing the boundary in free_all_bootmem(),
so we need a proper value in max_low_pfn.

But there is some difficulty related to max_low_pfn. It is used with two
different meanings across architectures: either the number of pages in
lowmem, or the maximum lowmem pfn. In ARM it is currently used as the
number of pages in lowmem. You can find more information at the links below.
http://lwn.net/Articles/543408/
http://lwn.net/Articles/543424/

As far as I investigated, more architectures use max_low_pfn as a maximum
pfn than as a page count, so changing its meaning to the maximum pfn looks
like the preferable solution to me. This patch makes max_low_pfn the
maximum lowmem pfn on ARM. In addition, min_low_pfn and max_pfn are
assigned according to the same criteria.

There is no real user of max_low_pfn except block/blk-settings.c, and
blk-settings.c assumes that max_low_pfn is the maximum lowmem pfn,
so this patch should not harm anything.
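
For illustration, a tiny userspace sketch of the two conventions
(PHYS_PFN_OFFSET and the bank end below are hypothetical values, not from
a real platform):

/* Hedged illustration of the two meanings of max_low_pfn. */
#include <stdio.h>

int main(void)
{
        unsigned long phys_pfn_offset = 0x20000;  /* RAM starts at 512MB (assumed) */
        unsigned long max_low = 0x50000;          /* end pfn of lowmem (assumed) */

        printf("old ARM meaning (lowmem page count): 0x%lx\n",
               max_low - phys_pfn_offset);
        printf("new meaning (maximum lowmem pfn)   : 0x%lx\n", max_low);
        return 0;
}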

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 049414a..873f4ca 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -423,12 +423,10 @@ void __init bootmem_init(void)
 * This doesn't seem to be used by the Linux memory manager any
 * more, but is used by ll_rw_block.  If we can get rid of it, we
 * also get rid of some of the stuff above as well.
-*
-* Note: max_low_pfn and max_pfn reflect the number of _pages_ in
-* the system, not the maximum PFN.
 */
-   max_low_pfn = max_low - PHYS_PFN_OFFSET;
-   max_pfn = max_high - PHYS_PFN_OFFSET;
+   min_low_pfn = min;
+   max_low_pfn = max_low;
+   max_pfn = max_high;
 }
 
 static inline int free_area(unsigned long pfn, unsigned long end, char *s)
@@ -544,7 +542,7 @@ static void __init free_unused_memmap(struct meminfo *mi)
 static void __init free_highpages(void)
 {
 #ifdef CONFIG_HIGHMEM
-   unsigned long max_low = max_low_pfn + PHYS_PFN_OFFSET;
+   unsigned long max_low = max_low_pfn;
struct memblock_region *mem, *res;
 
/* set highmem page free */
-- 
1.7.9.5



[RFC PATCH 6/6] ARM, mm: enable NO_BOOTMEM for default ARM build

2013-03-24 Thread Joonsoo Kim
If we use NO_BOOTMEM, we don't need to initialize a bitmap or perform
bitmap operations, so we can boot slightly faster.

An additional advantage of enabling NO_BOOTMEM is that it saves memory.
The bootmem allocator manages memory in page units, so if we request a
4 byte area from bootmem, it actually hands us a PAGE_SIZE area.
nobootmem manages memory at byte granularity, so there is no such waste.
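
As a rough userspace illustration of the waste (not the kernel allocators
themselves; PAGE_SIZE and the request size are just example values):

/* Hedged illustration: page-granular vs byte-granular accounting. */
#include <stdio.h>

#define PAGE_SIZE 4096UL

static unsigned long page_granular(unsigned long bytes)
{
        /* bootmem-style: round every request up to whole pages */
        return ((bytes + PAGE_SIZE - 1) / PAGE_SIZE) * PAGE_SIZE;
}

int main(void)
{
        unsigned long request = 4;      /* a tiny 4-byte reservation */

        printf("page-granular reservation: %lu bytes (waste %lu)\n",
               page_granular(request), page_granular(request) - request);
        printf("byte-granular reservation: %lu bytes (waste 0)\n", request);
        return 0;
}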

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 13b7394..8b73417 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -58,6 +58,7 @@ config ARM
select CLONE_BACKWARDS
select OLD_SIGSUSPEND3
select OLD_SIGACTION
+   select NO_BOOTMEM
help
  The ARM series is a line of low-power-consumption RISC chip designs
  licensed by ARM Ltd and targeted at embedded applications and
-- 
1.7.9.5



[PATCH v2 0/6] correct load_balance()

2013-03-26 Thread Joonsoo Kim
Commit 88b8dac0 makes load_balance() consider other cpus in its group.
But there are some missing parts for this feature to work properly.
This patchset corrects these things and makes load_balance() robust.

Other changes are related to LBF_ALL_PINNED. This is the fallback used
when no task can be moved because of cpu affinity. But currently, if the
imbalance is not large enough compared to a task's load, we leave the
LBF_ALL_PINNED flag set and 'redo' is triggered. This is not our
intention, so correct it.

These are based on v3.9-rc4.

Changelog
v1-v2: Changes from Peter's suggestion
 [4/6]: don't include a code to evaluate load value in can_migrate_task()
 [5/6]: rename load_balance_tmpmask to load_balance_mask
 [6/6]: not use one more cpumasks, use env's cpus for prevent to re-select

Joonsoo Kim (6):
  sched: change position of resched_cpu() in load_balance()
  sched: explicitly cpu_idle_type checking in rebalance_domains()
  sched: don't consider other cpus in our group in case of NEWLY_IDLE
  sched: move up affinity check to mitigate useless redoing overhead
  sched: rename load_balance_tmpmask to load_balance_mask
  sched: prevent to re-select dst-cpu in load_balance()

 kernel/sched/core.c |4 +--
 kernel/sched/fair.c |   67 +++
 2 files changed, 38 insertions(+), 33 deletions(-)

-- 
1.7.9.5



[PATCH v2 1/6] sched: change position of resched_cpu() in load_balance()

2013-03-26 Thread Joonsoo Kim
cur_ld_moved is reset when env.flags hits LBF_NEED_BREAK.
So there is a possibility that we miss calling resched_cpu().
Correct it by moving the resched_cpu() call
before the LBF_NEED_BREAK check.

Acked-by: Peter Zijlstra a.p.zijls...@chello.nl
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a33e59..f084069 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5061,17 +5061,17 @@ more_balance:
double_rq_unlock(env.dst_rq, busiest);
local_irq_restore(flags);
 
-   if (env.flags  LBF_NEED_BREAK) {
-   env.flags = ~LBF_NEED_BREAK;
-   goto more_balance;
-   }
-
/*
 * some other cpu did the load balance for us.
 */
if (cur_ld_moved  env.dst_cpu != smp_processor_id())
resched_cpu(env.dst_cpu);
 
+   if (env.flags  LBF_NEED_BREAK) {
+   env.flags = ~LBF_NEED_BREAK;
+   goto more_balance;
+   }
+
/*
 * Revisit (affine) tasks on src_cpu that couldn't be moved to
 * us and move them to an alternate dst_cpu in our sched_group
-- 
1.7.9.5



[PATCH v2 3/6] sched: don't consider other cpus in our group in case of NEWLY_IDLE

2013-03-26 Thread Joonsoo Kim
Commit 88b8dac0 makes load_balance() consider other cpus in its group,
regardless of idle type. When we do NEWLY_IDLE balancing we should not
do this, because the motivation of NEWLY_IDLE balancing is to turn this
cpu into a non-idle state if needed. That is not the case for other cpus.
So change the code not to consider other cpus for NEWLY_IDLE balancing.

With this patch, the assignment 'if (pulled_task) this_rq->idle_stamp = 0'
in idle_balance() is corrected, because NEWLY_IDLE balancing no longer
considers other cpus. Assigning to 'this_rq->idle_stamp' is now valid.

Cc: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Acked-by: Peter Zijlstra a.p.zijls...@chello.nl
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9d693d0..3f8c4f2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5007,8 +5007,17 @@ static int load_balance(int this_cpu, struct rq *this_rq,
.cpus   = cpus,
};
 
+   /* For NEWLY_IDLE load_balancing, we don't need to consider
+* other cpus in our group */
+   if (idle == CPU_NEWLY_IDLE) {
+   env.dst_grpmask = NULL;
+   /* we don't care max_lb_iterations in this case,
+* in following patch, this will be removed */
+   max_lb_iterations = 0;
+   } else {
+   max_lb_iterations = cpumask_weight(env.dst_grpmask);
+   }
cpumask_copy(cpus, cpu_active_mask);
-   max_lb_iterations = cpumask_weight(env.dst_grpmask);
 
schedstat_inc(sd, lb_count[idle]);
 
-- 
1.7.9.5



[PATCH v2 2/6] sched: explicitly cpu_idle_type checking in rebalance_domains()

2013-03-26 Thread Joonsoo Kim
After commit 88b8dac0, dst-cpu can be changed in load_balance(), so we
cannot know the cpu_idle_type of dst-cpu when load_balance() returns a
positive value. So add an explicit cpu_idle_type check.

Cc: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f084069..9d693d0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5506,10 +5506,10 @@ static void rebalance_domains(int cpu, enum 
cpu_idle_type idle)
if (time_after_eq(jiffies, sd-last_balance + interval)) {
if (load_balance(cpu, rq, sd, idle, balance)) {
/*
-* We've pulled tasks over so either we're no
-* longer idle.
+* We've pulled tasks over so either we may
+* be no longer idle.
 */
-   idle = CPU_NOT_IDLE;
+   idle = idle_cpu(cpu) ? CPU_IDLE : CPU_NOT_IDLE;
}
sd-last_balance = jiffies;
}
-- 
1.7.9.5



[PATCH v2 6/6] sched: prevent to re-select dst-cpu in load_balance()

2013-03-26 Thread Joonsoo Kim
Commit 88b8dac0 makes load_balance() consider other cpus in its group.
But it contains no code to prevent re-selecting dst-cpu,
so the same dst-cpu can be selected over and over.

This patch adds functionality to load_balance() to exclude a cpu once it
has been selected. We prevent re-selecting dst_cpu via env's cpus, so
env's cpus is now a candidate set not only for src cpus,
but also for dst cpus.
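
As a rough illustration of the idea, here is a userspace toy using a
plain bitmask instead of the kernel cpumask API (the group mask is an
assumption for the example):

/* Hedged toy: once a destination cpu has been tried, clear it from the
 * candidate mask so the retry loop cannot pick it again.  This mimics
 * the cpumask_clear_cpu(env.dst_cpu, env.cpus) step with a bitmask.
 * Uses the GCC/Clang __builtin_ctz builtin.
 */
#include <stdio.h>

int main(void)
{
        unsigned int group_mask = 0x0F; /* cpus 0-3 in the dst group (assumed) */
        unsigned int candidates = group_mask;
        int dst_cpu = 0;

        while (candidates) {
                printf("trying dst_cpu %d\n", dst_cpu);
                candidates &= ~(1u << dst_cpu);  /* never re-select this one */
                if (!candidates)
                        break;
                dst_cpu = __builtin_ctz(candidates); /* next remaining candidate */
        }
        return 0;
}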

Cc: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e3f09f4..6f238d2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3883,7 +3883,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env 
*env)
return 0;
 
if (!cpumask_test_cpu(env-dst_cpu, tsk_cpus_allowed(p))) {
-   int new_dst_cpu;
+   int cpu;
 
schedstat_inc(p, se.statistics.nr_failed_migrations_affine);
 
@@ -3898,12 +3898,14 @@ int can_migrate_task(struct task_struct *p, struct 
lb_env *env)
if (!env-dst_grpmask || (env-flags  LBF_SOME_PINNED))
return 0;
 
-   new_dst_cpu = cpumask_first_and(env-dst_grpmask,
-   tsk_cpus_allowed(p));
-   if (new_dst_cpu  nr_cpu_ids) {
-   env-flags |= LBF_SOME_PINNED;
-   env-new_dst_cpu = new_dst_cpu;
-   }
+   /* Prevent to re-select dst_cpu via env's cpus */
+   for_each_cpu_and(cpu, env-dst_grpmask, env-cpus)
+   if (cpumask_test_cpu(cpu, tsk_cpus_allowed(p))) {
+   env-flags |= LBF_SOME_PINNED;
+   env-new_dst_cpu = cpu;
+   break;
+   }
+
return 0;
}
 
@@ -4989,7 +4991,6 @@ static int load_balance(int this_cpu, struct rq *this_rq,
int *balance)
 {
int ld_moved, cur_ld_moved, active_balance = 0;
-   int lb_iterations, max_lb_iterations;
struct sched_group *group;
struct rq *busiest;
unsigned long flags;
@@ -5009,11 +5010,6 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 * other cpus in our group */
if (idle == CPU_NEWLY_IDLE) {
env.dst_grpmask = NULL;
-   /* we don't care max_lb_iterations in this case,
-* in following patch, this will be removed */
-   max_lb_iterations = 0;
-   } else {
-   max_lb_iterations = cpumask_weight(env.dst_grpmask);
}
cpumask_copy(cpus, cpu_active_mask);
 
@@ -5041,7 +5037,6 @@ redo:
schedstat_add(sd, lb_imbalance[idle], env.imbalance);
 
ld_moved = 0;
-   lb_iterations = 1;
if (busiest-nr_running  1) {
/*
 * Attempt to move tasks. If find_busiest_group has found
@@ -5098,14 +5093,17 @@ more_balance:
 * moreover subsequent load balance cycles should correct the
 * excess load moved.
 */
-   if ((env.flags  LBF_SOME_PINNED)  env.imbalance  0 
-   lb_iterations++  max_lb_iterations) {
+   if ((env.flags  LBF_SOME_PINNED)  env.imbalance  0) {
 
env.dst_rq   = cpu_rq(env.new_dst_cpu);
env.dst_cpu  = env.new_dst_cpu;
env.flags   = ~LBF_SOME_PINNED;
env.loop = 0;
env.loop_break   = sched_nr_migrate_break;
+
+   /* Prevent to re-select dst_cpu via env's cpus */
+   cpumask_clear_cpu(env.dst_cpu, env.cpus);
+
/*
 * Go back to more_balance rather than redo since we
 * need to continue with same src_cpu.
-- 
1.7.9.5



[PATCH v2 5/6] sched: rename load_balance_tmpmask to load_balance_mask

2013-03-26 Thread Joonsoo Kim
This name doesn't convey any specific meaning.
So rename it to reflect its purpose.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f12624..07b4178 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6865,7 +6865,7 @@ struct task_group root_task_group;
 LIST_HEAD(task_groups);
 #endif
 
-DECLARE_PER_CPU(cpumask_var_t, load_balance_tmpmask);
+DECLARE_PER_CPU(cpumask_var_t, load_balance_mask);
 
 void __init sched_init(void)
 {
@@ -6902,7 +6902,7 @@ void __init sched_init(void)
 #endif /* CONFIG_RT_GROUP_SCHED */
 #ifdef CONFIG_CPUMASK_OFFSTACK
for_each_possible_cpu(i) {
-   per_cpu(load_balance_tmpmask, i) = (void *)ptr;
+   per_cpu(load_balance_mask, i) = (void *)ptr;
ptr += cpumask_size();
}
 #endif /* CONFIG_CPUMASK_OFFSTACK */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d3c6011..e3f09f4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4958,7 +4958,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 #define MAX_PINNED_INTERVAL512
 
 /* Working cpumask for load_balance and load_balance_newidle. */
-DEFINE_PER_CPU(cpumask_var_t, load_balance_tmpmask);
+DEFINE_PER_CPU(cpumask_var_t, load_balance_mask);
 
 static int need_active_balance(struct lb_env *env)
 {
@@ -4993,7 +4993,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
struct sched_group *group;
struct rq *busiest;
unsigned long flags;
-   struct cpumask *cpus = __get_cpu_var(load_balance_tmpmask);
+   struct cpumask *cpus = __get_cpu_var(load_balance_mask);
 
struct lb_env env = {
.sd = sd,
-- 
1.7.9.5



[PATCH v2 4/6] sched: move up affinity check to mitigate useless redoing overhead

2013-03-26 Thread Joonsoo Kim
Currently, LBF_ALL_PINNED is cleared only after the affinity check has
passed. So if task migration is skipped in move_tasks() because of a
small load value or a small imbalance value, we never clear
LBF_ALL_PINNED and end up triggering 'redo' in load_balance().

The imbalance value is often so small that no task can be moved to
another cpu and, of course, this situation may persist after we change
the target cpu. So this patch moves the affinity check up and clears
LBF_ALL_PINNED before evaluating the load value, in order to mitigate
the overhead of useless redoing.

In addition, re-order some comments correctly.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3f8c4f2..d3c6011 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3874,10 +3874,14 @@ int can_migrate_task(struct task_struct *p, struct 
lb_env *env)
int tsk_cache_hot = 0;
/*
 * We do not migrate tasks that are:
-* 1) running (obviously), or
+* 1) throttled_lb_pair, or
 * 2) cannot be migrated to this CPU due to cpus_allowed, or
-* 3) are cache-hot on their current CPU.
+* 3) running (obviously), or
+* 4) are cache-hot on their current CPU.
 */
+   if (throttled_lb_pair(task_group(p), env-src_cpu, env-dst_cpu))
+   return 0;
+
if (!cpumask_test_cpu(env-dst_cpu, tsk_cpus_allowed(p))) {
int new_dst_cpu;
 
@@ -3948,9 +3952,6 @@ static int move_one_task(struct lb_env *env)
struct task_struct *p, *n;
 
list_for_each_entry_safe(p, n, env-src_rq-cfs_tasks, se.group_node) {
-   if (throttled_lb_pair(task_group(p), env-src_rq-cpu, 
env-dst_cpu))
-   continue;
-
if (!can_migrate_task(p, env))
continue;
 
@@ -4002,7 +4003,7 @@ static int move_tasks(struct lb_env *env)
break;
}
 
-   if (throttled_lb_pair(task_group(p), env-src_cpu, 
env-dst_cpu))
+   if (!can_migrate_task(p, env))
goto next;
 
load = task_h_load(p);
@@ -4013,9 +4014,6 @@ static int move_tasks(struct lb_env *env)
if ((load / 2)  env-imbalance)
goto next;
 
-   if (!can_migrate_task(p, env))
-   goto next;
-
move_task(p, env);
pulled++;
env-imbalance -= load;
-- 
1.7.9.5



Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.

2013-03-26 Thread Joonsoo Kim
On Mon, Mar 25, 2013 at 02:32:35PM -0400, Steven Rostedt wrote:
 On Mon, 2013-03-25 at 18:27 +, Christoph Lameter wrote:
  On Mon, 25 Mar 2013, Steven Rostedt wrote:
  
   If this makes it more deterministic, and lower worse case latencies,
   then it's definitely worth the price.
  
  Yes that would make it more deterministic. Maybe I should add an option
  to be able to compile the allocator without cpu partial page support?
 
 I agree that would be useful.

Hello, Steven and Christoph.

How about using spin_trylock() in unfreeze_partials() and
spin_lock_contented() in get_partial_node() to reduce latency?
IMHO, this doesn't make the code more deterministic, but it can keep
the benefit of cpu partial pages with a tolerable latency.
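
Something like the following userspace sketch of the try-lock-and-bail
pattern is what I have in mind (a POSIX spinlock stands in for the node's
list_lock; this only shows the shape of the idea, not the actual slub
code):

/* Hedged sketch: try the lock and just skip the work if it is
 * contended, instead of spinning and adding latency.
 * Compile with -lpthread.
 */
#include <pthread.h>

static pthread_spinlock_t node_lock;

static void drain_partials_if_uncontended(void)
{
        if (pthread_spin_trylock(&node_lock) != 0) {
                /* someone else holds the lock: skip draining this time */
                return;
        }
        /* ... drain the per-cpu partial pages here ... */
        pthread_spin_unlock(&node_lock);
}

int main(void)
{
        pthread_spin_init(&node_lock, PTHREAD_PROCESS_PRIVATE);
        drain_partials_if_uncontended();
        pthread_spin_destroy(&node_lock);
        return 0;
}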

Thanks.

 
 -- Steve
 
 


Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.

2013-03-27 Thread Joonsoo Kim
On Tue, Mar 26, 2013 at 11:30:32PM -0400, Steven Rostedt wrote:
 On Wed, 2013-03-27 at 11:59 +0900, Joonsoo Kim wrote:
 
  How about using spin_try_lock() in unfreeze_partials() and
  using spin_lock_contented() in get_partial_node() to reduce latency?
  IMHO, this doesn't make code more deterministic, but can maintain
  a benefit of cpu partial page with tolerable latency.
 
 And what do you do when you fail the try lock? Try again, or just break
 out?

Just break out.

 
 We can run benchmarks, but I don't like playing games in -rt. It either
 is deterministic, or it isn't.

Okay.

 
 -- Steve
 
 


[PATCH 1/5] sched: remove one division operation in find_busiest_queue()

2013-03-28 Thread Joonsoo Kim
Remove one division operation in find_busiest_queue().
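
The replacement compares cross products instead of dividing by the cpu
power on every iteration. Below is a quick userspace check that both
comparisons pick the same runqueue (the load and power values are made up
for the example):

/* Hedged check: the cross-multiplied comparison picks the same
 * "busiest" cpu as the division-based one.
 */
#include <stdio.h>

#define SCHED_POWER_SCALE 1024UL

int main(void)
{
        unsigned long wl[]    = { 3000, 5000, 4000 };  /* weighted loads (assumed) */
        unsigned long power[] = { 1024, 2048, 1024 };  /* cpu power (assumed) */

        unsigned long max_load = 0, busiest_div = 0;
        unsigned long busiest_load = 0, busiest_power = SCHED_POWER_SCALE;
        unsigned long busiest_mul = 0;

        for (int i = 0; i < 3; i++) {
                /* old way: one division per cpu */
                unsigned long scaled = wl[i] * SCHED_POWER_SCALE / power[i];
                if (scaled > max_load) {
                        max_load = scaled;
                        busiest_div = i;
                }
                /* new way: compare cross products instead */
                if (wl[i] * busiest_power > busiest_load * power[i]) {
                        busiest_load = wl[i];
                        busiest_power = power[i];
                        busiest_mul = i;
                }
        }
        printf("division picks cpu %lu, multiplication picks cpu %lu\n",
               busiest_div, busiest_mul);
        return 0;
}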

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6f238d2..1d8774f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4911,7 +4911,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 struct sched_group *group)
 {
struct rq *busiest = NULL, *rq;
-   unsigned long max_load = 0;
+   unsigned long busiest_load = 0, busiest_power = SCHED_POWER_SCALE;
int i;
 
for_each_cpu(i, sched_group_cpus(group)) {
@@ -4942,10 +4942,9 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 * the load can be moved away from the cpu that is potentially
 * running at a lower capacity.
 */
-   wl = (wl * SCHED_POWER_SCALE) / power;
-
-   if (wl  max_load) {
-   max_load = wl;
+   if (wl * busiest_power  busiest_load * power) {
+   busiest_load = wl;
+   busiest_power = power;
busiest = rq;
}
}
-- 
1.7.9.5



[PATCH 4/5] sched: don't consider upper se in sched_slice()

2013-03-28 Thread Joonsoo Kim
Walking up to upper scheduling entities in sched_slice() should not be
done, because sched_slice() is used for checking whether a resched is
needed within *this* cfs_rq, and there is one problem related to this in
the current implementation.

The problem is that if we walk up to upper se in sched_slice(), it is
possible to get an ideal slice which is lower than
sysctl_sched_min_granularity.

For example, assume we have 4 task groups attached to the root tg with
the same share, and each one has 20 runnable tasks on cpu0. In this case,
__sched_period() returns sysctl_sched_min_granularity * 20 and then we go
into the loop. At the first iteration we compute this task's portion of
the slice on its own cfs_rq, so we get a slice of
sysctl_sched_min_granularity. Then we enter the second iteration and get
a slice which is a quarter of sysctl_sched_min_granularity, because there
are 4 tgs with the same share in that cfs_rq.

Ensuring a slice larger than min_granularity is important for performance,
and since there is no lower bound on this except the timer tick, we should
fix it so that sched_slice() does not consider upper se.
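
The example above works out as follows; a rough walk-through assuming the
2.25ms minimum granularity used in the test setup below:

/* Hedged walk-through: 4 task groups with equal share, each with 20
 * nice-0 tasks on one cpu, 2.25ms minimum granularity.
 */
#include <stdio.h>

int main(void)
{
        double min_gran_ms = 2.25;
        int tasks_per_tg = 20, nr_tg = 4;

        /* __sched_period() stretches the period to nr_running * min_gran */
        double period = tasks_per_tg * min_gran_ms;

        /* first iteration: the task's share of its own cfs_rq (20 equal tasks) */
        double slice = period / tasks_per_tg;   /* = 2.25ms, fine */

        /* second iteration: the tg entity's share of the parent cfs_rq
         * (4 equal tg entities), which shrinks the slice below min_gran */
        slice = slice / nr_tg;                  /* = 0.5625ms */

        printf("final slice = %.4f ms (< %.2f ms min granularity)\n",
               slice, min_gran_ms);
        return 0;
}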

Below is my test result on my 4-cpu machine.

I did a test to verify this effect in the following environment.

CONFIG_HZ=1000 and CONFIG_SCHED_AUTOGROUP=y
/proc/sys/kernel/sched_min_granularity_ns is 2250000, that is, 2.25ms.

I ran the following command.

In each of 4 sessions:
for i in `seq 20`; do taskset -c 3 sh -c 'while true; do :; done' & done

./perf sched record
./perf script -C 003 | grep sched_switch | cut -b -40 | less

Result is below.

*Vanilla*
  sh  2724 [003]   152.52801
  sh  2779 [003]   152.52900
  sh  2775 [003]   152.53000
  sh  2751 [003]   152.53100
  sh  2717 [003]   152.53201

*With this patch*
  sh  2640 [003]   147.48700
  sh  2662 [003]   147.49000
  sh  2601 [003]   147.49300
  sh  2633 [003]   147.49400

In the vanilla case the resulting slice is lower than 1ms, so every tick
triggers a reschedule. After the patch is applied, we can see that
min_granularity is respected.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 204a9a9..e232421 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -631,23 +631,20 @@ static u64 __sched_period(unsigned long nr_running)
  */
 static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+   struct load_weight *load;
+   struct load_weight lw;
u64 slice = __sched_period(cfs_rq-nr_running + !se-on_rq);
 
-   for_each_sched_entity(se) {
-   struct load_weight *load;
-   struct load_weight lw;
-
-   cfs_rq = cfs_rq_of(se);
-   load = cfs_rq-load;
+   load = cfs_rq-load;
 
-   if (unlikely(!se-on_rq)) {
-   lw = cfs_rq-load;
+   if (unlikely(!se-on_rq)) {
+   lw = cfs_rq-load;
 
-   update_load_add(lw, se-load.weight);
-   load = lw;
-   }
-   slice = calc_delta_mine(slice, se-load.weight, load);
+   update_load_add(lw, se-load.weight);
+   load = lw;
}
+   slice = calc_delta_mine(slice, se-load.weight, load);
+
return slice;
 }
 
-- 
1.7.9.5



[PATCH 3/5] sched: clean-up struct sd_lb_stat

2013-03-28 Thread Joonsoo Kim
There is no reason to maintain separate sets of variables for this_group
and busiest_group in sd_lb_stat, except to save some space.
But this structure is always allocated on the stack, so the saving
isn't really beneficial.

This patch unifies these variables, so IMO readability is improved.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 95ec757..204a9a9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4175,36 +4175,6 @@ static unsigned long task_h_load(struct task_struct *p)
 
 /** Helpers for find_busiest_group /
 /*
- * sd_lb_stats - Structure to store the statistics of a sched_domain
- * during load balancing.
- */
-struct sd_lb_stats {
-   struct sched_group *busiest; /* Busiest group in this sd */
-   struct sched_group *this;  /* Local group in this sd */
-   unsigned long total_load;  /* Total load of all groups in sd */
-   unsigned long total_pwr;   /*   Total power of all groups in sd */
-   unsigned long avg_load;/* Average load across all groups in sd */
-
-   /** Statistics of this group */
-   unsigned long this_load;
-   unsigned long this_load_per_task;
-   unsigned long this_nr_running;
-   unsigned long this_has_capacity;
-   unsigned int  this_idle_cpus;
-
-   /* Statistics of the busiest group */
-   unsigned int  busiest_idle_cpus;
-   unsigned long max_load;
-   unsigned long busiest_load_per_task;
-   unsigned long busiest_nr_running;
-   unsigned long busiest_group_capacity;
-   unsigned long busiest_has_capacity;
-   unsigned int  busiest_group_weight;
-
-   int group_imb; /* Is there imbalance in this sd */
-};
-
-/*
  * sg_lb_stats - stats of a sched_group required for load_balancing
  */
 struct sg_lb_stats {
@@ -4219,6 +4189,24 @@ struct sg_lb_stats {
int group_has_capacity; /* Is there extra capacity in the group? */
 };
 
+/*
+ * sd_lb_stats - Structure to store the statistics of a sched_domain
+ * during load balancing.
+ */
+struct sd_lb_stats {
+   struct sched_group *busiest; /* Busiest group in this sd */
+   struct sched_group *this;  /* Local group in this sd */
+   unsigned long total_load;  /* Total load of all groups in sd */
+   unsigned long total_pwr;   /*   Total power of all groups in sd */
+   unsigned long sd_avg_load; /* Average load across all groups in sd */
+
+   /* Statistics of this group */
+   struct sg_lb_stats this_stat;
+
+   /* Statistics of the busiest group */
+   struct sg_lb_stats busiest_stat;
+};
+
 /**
  * get_sd_load_idx - Obtain the load index for a given sched domain.
  * @sd: The sched_domain whose load_idx is to be obtained.
@@ -4499,7 +4487,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
   struct sched_group *sg,
   struct sg_lb_stats *sgs)
 {
-   if (sgs-avg_load = sds-max_load)
+   if (sgs-avg_load = sds-busiest_stat.avg_load)
return false;
 
if (sgs-sum_nr_running  sgs-group_capacity)
@@ -4536,7 +4524,7 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 {
struct sched_domain *child = env-sd-child;
struct sched_group *sg = env-sd-groups;
-   struct sg_lb_stats sgs;
+   struct sg_lb_stats tmp_sgs;
int load_idx, prefer_sibling = 0;
 
if (child  child-flags  SD_PREFER_SIBLING)
@@ -4546,13 +4534,16 @@ static inline void update_sd_lb_stats(struct lb_env 
*env,
 
do {
int local_group;
+   struct sg_lb_stats *sgs;
 
local_group = cpumask_test_cpu(env-dst_cpu, 
sched_group_cpus(sg));
-   memset(sgs, 0, sizeof(sgs));
-   update_sg_lb_stats(env, sg, load_idx, local_group, sgs);
-
-   sds-total_load += sgs.group_load;
-   sds-total_pwr += sg-sgp-power;
+   if (local_group)
+   sgs = sds-this_stat;
+   else {
+   sgs = tmp_sgs;
+   memset(sgs, 0, sizeof(*sgs));
+   }
+   update_sg_lb_stats(env, sg, load_idx, local_group, sgs);
 
/*
 * In case the child domain prefers tasks go to siblings
@@ -4564,26 +4555,19 @@ static inline void update_sd_lb_stats(struct lb_env 
*env,
 * heaviest group when it is already under-utilized (possible
 * with a large weight task outweighs the tasks on the system).
 */
-   if (prefer_sibling  !local_group  sds-this_has_capacity)
-   sgs.group_capacity = min(sgs.group_capacity, 1UL);
+   if (prefer_sibling  !local_group 
+   sds-this  sds-this_stat.group_has_capacity)
+   sgs-group_capacity = min(sgs-group_capacity, 1UL

[PATCH 5/5] sched: limit sched_slice if it is more than sysctl_sched_latency

2013-03-28 Thread Joonsoo Kim
sched_slice() computes the ideal runtime slice. If there are many tasks
on a cfs_rq, the period for this cfs_rq is extended to guarantee that
each task gets a time slice of at least sched_min_granularity, and then
each task receives its portion of this period. If there is a task with a
much larger load weight than the others, its portion of the period can
exceed sysctl_sched_latency by far.

For example, imagine one task with nice -20 and 9 tasks with nice 0 on
one cfs_rq. In this case, the load weight sum for this cfs_rq is
88761 + 9 * 1024 = 97977. So the slice portion for the task with nice -20
is sysctl_sched_min_granularity * 10 * (88761 / 97977), that is,
approximately sysctl_sched_min_granularity * 9. This effect can be much
larger if there are more tasks with nice 0.

So we should limit this possibly weird situation.
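
The arithmetic of the example above, for reference (the weights are the
default nice-to-weight values):

/* Hedged arithmetic: one nice -20 task (weight 88761) plus 9 nice-0
 * tasks (weight 1024 each) on one cfs_rq.
 */
#include <stdio.h>

int main(void)
{
        unsigned long w_nice_m20 = 88761, w_nice_0 = 1024;
        unsigned long total = w_nice_m20 + 9 * w_nice_0;        /* 97977 */

        /* period is stretched to 10 * sched_min_granularity for 10 tasks,
         * and the heavy task gets a share proportional to its weight */
        double slice_in_min_grans = 10.0 * w_nice_m20 / total;  /* ~9.06 */

        printf("total weight = %lu\n", total);
        printf("heavy task's slice ~= %.2f * sched_min_granularity\n",
               slice_in_min_grans);
        return 0;
}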

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e232421..6ceffbc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -645,6 +645,9 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct 
sched_entity *se)
}
slice = calc_delta_mine(slice, se-load.weight, load);
 
+   if (unlikely(slice  sysctl_sched_latency))
+   slice = sysctl_sched_latency;
+
return slice;
 }
 
-- 
1.7.9.5



[PATCH 2/5] sched: factor out code to should_we_balance()

2013-03-28 Thread Joonsoo Kim
Currently, the check that this cpu is appropriate for balancing is
embedded in update_sg_lb_stats(), although it has no direct relationship
to that function.

There is not enough reason to keep this check in update_sg_lb_stats(),
except that it saves one iteration over sched_group_cpus. But with this
change we save the cost of two memsets and can expect better compiler
optimization, so the clean-up is likely the better trade-off.

Below is the size result of this patch.

* Vanilla *
   text    data   bss   dec     hex    filename
   34499   1136   116   35751   8ba7   kernel/sched/fair.o

* Patched *
   text    data   bss   dec     hex    filename
   34243   1136   116   35495   8aa7   kernel/sched/fair.o

In addition, rename @balance to @should_balance in order to represent
its purpose more clearly.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1d8774f..95ec757 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4406,22 +4406,17 @@ fix_small_capacity(struct sched_domain *sd, struct 
sched_group *group)
  * @group: sched_group whose statistics are to be updated.
  * @load_idx: Load index of sched_domain of this_cpu for load calc.
  * @local_group: Does group contain this_cpu.
- * @balance: Should we balance.
  * @sgs: variable to hold the statistics for this group.
  */
 static inline void update_sg_lb_stats(struct lb_env *env,
struct sched_group *group, int load_idx,
-   int local_group, int *balance, struct sg_lb_stats *sgs)
+   int local_group, struct sg_lb_stats *sgs)
 {
unsigned long nr_running, max_nr_running, min_nr_running;
unsigned long load, max_cpu_load, min_cpu_load;
-   unsigned int balance_cpu = -1, first_idle_cpu = 0;
unsigned long avg_load_per_task = 0;
int i;
 
-   if (local_group)
-   balance_cpu = group_balance_cpu(group);
-
/* Tally up the load of all CPUs in the group */
max_cpu_load = 0;
min_cpu_load = ~0UL;
@@ -4434,15 +4429,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
nr_running = rq-nr_running;
 
/* Bias balancing toward cpus of our domain */
-   if (local_group) {
-   if (idle_cpu(i)  !first_idle_cpu 
-   cpumask_test_cpu(i, 
sched_group_mask(group))) {
-   first_idle_cpu = 1;
-   balance_cpu = i;
-   }
-
+   if (local_group)
load = target_load(i, load_idx);
-   } else {
+   else {
load = source_load(i, load_idx);
if (load  max_cpu_load)
max_cpu_load = load;
@@ -4462,22 +4451,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
sgs-idle_cpus++;
}
 
-   /*
-* First idle cpu or the first cpu(busiest) in this sched group
-* is eligible for doing load balancing at this and above
-* domains. In the newly idle case, we will allow all the cpu's
-* to do the newly idle load balance.
-*/
-   if (local_group) {
-   if (env-idle != CPU_NEWLY_IDLE) {
-   if (balance_cpu != env-dst_cpu) {
-   *balance = 0;
-   return;
-   }
-   update_group_power(env-sd, env-dst_cpu);
-   } else if (time_after_eq(jiffies, group-sgp-next_update))
-   update_group_power(env-sd, env-dst_cpu);
-   }
+   if (local_group  (env-idle != CPU_NEWLY_IDLE ||
+   time_after_eq(jiffies, group-sgp-next_update)))
+   update_group_power(env-sd, env-dst_cpu);
 
/* Adjust by relative CPU power of the group */
sgs-avg_load = (sgs-group_load*SCHED_POWER_SCALE) / group-sgp-power;
@@ -4556,7 +4532,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
  * @sds: variable to hold the statistics for this sched_domain.
  */
 static inline void update_sd_lb_stats(struct lb_env *env,
-   int *balance, struct sd_lb_stats *sds)
+   struct sd_lb_stats *sds)
 {
struct sched_domain *child = env-sd-child;
struct sched_group *sg = env-sd-groups;
@@ -4573,10 +4549,7 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 
local_group = cpumask_test_cpu(env-dst_cpu, 
sched_group_cpus(sg));
memset(sgs, 0, sizeof(sgs));
-   update_sg_lb_stats(env, sg, load_idx, local_group, balance, 
sgs);
-
-   if (local_group  !(*balance))
-   return;
+   update_sg_lb_stats(env, sg, load_idx, local_group, sgs);
 
sds-total_load

[PATCH 0/5] optimization, clean-up, correctness about fair.c

2013-03-28 Thread Joonsoo Kim
There is no unified subject for this patchset.

Patch 1 replaces one division operation with a multiplication.
Patches 2 and 3 are clean-ups related to load_balance(); they improve
code size and, IMO, readability as well.
Patches 4 and 5 are correctness fixes for sched_slice().

Feel free to comment on this patchset.

It's based on v3.9-rc4, on top of my previous patchset. But perhaps
it doesn't really depend on my previous patchset. :)

https://lkml.org/lkml/2013/3/26/28
[PATCH v2 0/6] correct load_balance()

Thanks.

Joonsoo Kim (5):
  sched: remove one division operation in find_busiest_queue()
  sched: factor out code to should_we_balance()
  sched: clean-up struct sd_lb_stat
  sched: don't consider upper se in sched_slice()
  sched: limit sched_slice if it is more than sysctl_sched_latency

 kernel/sched/fair.c |  347 +--
 1 file changed, 171 insertions(+), 176 deletions(-)

-- 
1.7.9.5



[PATCH v2 7/8] mm, vmalloc: export vmap_area_list, instead of vmlist

2013-03-13 Thread Joonsoo Kim
From: Joonsoo Kim js1...@gmail.com

Although our intention is to unexport the internal structures entirely,
there is one exception, for kexec. kexec dumps the address of vmlist,
and makedumpfile uses this information.

Since we are about to remove vmlist, makedumpfile needs another way to
retrieve information about the vmalloc layer. For this purpose,
we export vmap_area_list instead of vmlist.

Cc: Eric Biederman ebied...@xmission.com
Cc: Dave Anderson ander...@redhat.com
Cc: Vivek Goyal vgo...@redhat.com
Cc: Atsushi Kumagai kumagai-atsu...@mxc.nes.nec.co.jp
Signed-off-by: Joonsoo Kim js1...@gmail.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 698b1e5..8a25f90 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -130,8 +130,7 @@ extern long vwrite(char *buf, char *addr, unsigned long 
count);
 /*
  * Internals.  Dont't use..
  */
-extern rwlock_t vmlist_lock;
-extern struct vm_struct *vmlist;
+extern struct list_head vmap_area_list;
 extern __init void vm_area_add_early(struct vm_struct *vm);
 extern __init void vm_area_register_early(struct vm_struct *vm, size_t align);
 
diff --git a/kernel/kexec.c b/kernel/kexec.c
index bddd3d7..d9bfc6c 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1489,7 +1489,7 @@ static int __init crash_save_vmcoreinfo_init(void)
VMCOREINFO_SYMBOL(swapper_pg_dir);
 #endif
VMCOREINFO_SYMBOL(_stext);
-   VMCOREINFO_SYMBOL(vmlist);
+   VMCOREINFO_SYMBOL(vmap_area_list);
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
VMCOREINFO_SYMBOL(mem_map);
diff --git a/mm/nommu.c b/mm/nommu.c
index e193280..ed82358 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -228,8 +228,7 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long 
address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
-DEFINE_RWLOCK(vmlist_lock);
-struct vm_struct *vmlist;
+LIST_HEAD(vmap_area_list);
 
 void vfree(const void *addr)
 {
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index bda6cef..7e63984 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -261,7 +261,8 @@ struct vmap_area {
 };
 
 static DEFINE_SPINLOCK(vmap_area_lock);
-static LIST_HEAD(vmap_area_list);
+/* Export for kexec only */
+LIST_HEAD(vmap_area_list);
 static struct rb_root vmap_area_root = RB_ROOT;
 
 /* The vmap cache globals are protected by vmap_area_lock */
@@ -272,6 +273,10 @@ static unsigned long cached_align;
 
 static unsigned long vmap_area_pcpu_hole;
 
+/*** Old vmalloc interfaces ***/
+static DEFINE_RWLOCK(vmlist_lock);
+static struct vm_struct *vmlist;
+
 static struct vmap_area *__find_vmap_area(unsigned long addr)
 {
struct rb_node *n = vmap_area_root.rb_node;
@@ -1283,10 +1288,6 @@ int map_vm_area(struct vm_struct *area, pgprot_t prot, 
struct page ***pages)
 }
 EXPORT_SYMBOL_GPL(map_vm_area);
 
-/*** Old vmalloc interfaces ***/
-DEFINE_RWLOCK(vmlist_lock);
-struct vm_struct *vmlist;
-
 static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va,
  unsigned long flags, const void *caller)
 {
-- 
1.7.9.5



[PATCH v2 0/8] remove vm_struct list management

2013-03-13 Thread Joonsoo Kim
This patchset removes the vm_struct list management after vmalloc is
initialized. Adding and removing an entry to vmlist has linear time
complexity, which is inefficient. If we keep maintaining this list, the
overall time complexity of adding and removing an area in vmalloc space
is O(N), even though we use an rbtree for finding a vacant place, whose
time complexity is just O(logN).

vmlist and vmlist_lock are also used in many places outside of vmalloc.c.
It is preferable to hide these raw data structures and provide
well-defined functions for their users, because this prevents mistakes
when manipulating these structures and makes the vmalloc layer easier to
maintain.

For kexec and makedumpfile, I export vmap_area_list instead of vmlist.
This follows Atsushi's recommendation.
For more information, please refer to the link below.
https://lkml.org/lkml/2012/12/6/184

These are based on v3.9-rc2.

Changes from v1
5/8: skip areas for lazy_free
6/8: skip areas for lazy_free
7/8: export vmap_area_list for kexec, instead of vmlist

Joonsoo Kim (8):
  mm, vmalloc: change iterating a vmlist to find_vm_area()
  mm, vmalloc: move get_vmalloc_info() to vmalloc.c
  mm, vmalloc: protect va-vm by vmap_area_lock
  mm, vmalloc: iterate vmap_area_list, instead of vmlist in
vread/vwrite()
  mm, vmalloc: iterate vmap_area_list in get_vmalloc_info()
  mm, vmalloc: iterate vmap_area_list, instead of vmlist, in
vmallocinfo()
  mm, vmalloc: export vmap_area_list, instead of vmlist
  mm, vmalloc: remove list management of vmlist after initializing
vmalloc

 arch/tile/mm/pgtable.c  |7 +-
 arch/unicore32/mm/ioremap.c |   17 ++--
 arch/x86/mm/ioremap.c   |7 +-
 fs/proc/Makefile|2 +-
 fs/proc/internal.h  |   18 
 fs/proc/meminfo.c   |1 +
 fs/proc/mmu.c   |   60 -
 include/linux/vmalloc.h |   21 -
 kernel/kexec.c  |2 +-
 mm/nommu.c  |3 +-
 mm/vmalloc.c|  207 +--
 11 files changed, 170 insertions(+), 175 deletions(-)
 delete mode 100644 fs/proc/mmu.c

-- 
1.7.9.5



[PATCH v2 5/8] mm, vmalloc: iterate vmap_area_list in get_vmalloc_info()

2013-03-13 Thread Joonsoo Kim
From: Joonsoo Kim js1...@gmail.com

This patch is a preparation step for removing vmlist entirely.
For that purpose, we change the code that iterates over vmlist to iterate
over vmap_area_list instead. It is a somewhat trivial change, but one
thing should be noted.

vmlist lacks information about some areas in the vmalloc address space.
For example, vm_map_ram() allocates an area in vmalloc address space,
but it does not link it into vmlist. Providing full information about the
vmalloc address space is the better idea, so we don't use va->vm and use
the vmap_area directly.
This makes get_vmalloc_info() more precise.

Signed-off-by: Joonsoo Kim js1...@gmail.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 59aa328..aee1f61 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2671,46 +2671,50 @@ module_init(proc_vmalloc_init);
 
 void get_vmalloc_info(struct vmalloc_info *vmi)
 {
-   struct vm_struct *vma;
+   struct vmap_area *va;
unsigned long free_area_size;
unsigned long prev_end;
 
vmi-used = 0;
+   vmi-largest_chunk = 0;
 
-   if (!vmlist) {
-   vmi-largest_chunk = VMALLOC_TOTAL;
-   } else {
-   vmi-largest_chunk = 0;
+   prev_end = VMALLOC_START;
 
-   prev_end = VMALLOC_START;
-
-   read_lock(vmlist_lock);
+   spin_lock(vmap_area_lock);
 
-   for (vma = vmlist; vma; vma = vma-next) {
-   unsigned long addr = (unsigned long) vma-addr;
+   if (list_empty(vmap_area_list)) {
+   vmi-largest_chunk = VMALLOC_TOTAL;
+   goto out;
+   }
 
-   /*
-* Some archs keep another range for modules in vmlist
-*/
-   if (addr  VMALLOC_START)
-   continue;
-   if (addr = VMALLOC_END)
-   break;
+   list_for_each_entry(va, vmap_area_list, list) {
+   unsigned long addr = va-va_start;
 
-   vmi-used += vma-size;
+   /*
+* Some archs keep another range for modules in vmalloc space
+*/
+   if (addr  VMALLOC_START)
+   continue;
+   if (addr = VMALLOC_END)
+   break;
 
-   free_area_size = addr - prev_end;
-   if (vmi-largest_chunk  free_area_size)
-   vmi-largest_chunk = free_area_size;
+   if (va-flags  (VM_LAZY_FREE | VM_LAZY_FREEING))
+   continue;
 
-   prev_end = vma-size + addr;
-   }
+   vmi-used += (va-va_end - va-va_start);
 
-   if (VMALLOC_END - prev_end  vmi-largest_chunk)
-   vmi-largest_chunk = VMALLOC_END - prev_end;
+   free_area_size = addr - prev_end;
+   if (vmi-largest_chunk  free_area_size)
+   vmi-largest_chunk = free_area_size;
 
-   read_unlock(vmlist_lock);
+   prev_end = va-va_end;
}
+
+   if (VMALLOC_END - prev_end  vmi-largest_chunk)
+   vmi-largest_chunk = VMALLOC_END - prev_end;
+
+out:
+   spin_unlock(vmap_area_lock);
 }
 #endif
 
-- 
1.7.9.5



[PATCH v2 4/8] mm, vmalloc: iterate vmap_area_list, instead of vmlist in vread/vwrite()

2013-03-13 Thread Joonsoo Kim
From: Joonsoo Kim js1...@gmail.com

Now, while we hold vmap_area_lock, va->vm cannot be discarded. So we can
safely access va->vm when iterating over vmap_area_list while holding
vmap_area_lock. With this property, change the vmlist iteration in
vread/vwrite() to iterate over vmap_area_list.

There is a small difference related to locking: vmlist_lock is a rwlock
(readers could run concurrently), while vmap_area_lock is a plain
spinlock, so this may introduce some spinning overhead while
vread/vwrite() is executing. But these are debug-oriented functions, so
this overhead is not a real problem for the common case.

Signed-off-by: Joonsoo Kim js1...@gmail.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 1bf94ad..59aa328 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2012,7 +2012,8 @@ static int aligned_vwrite(char *buf, char *addr, unsigned 
long count)
 
 long vread(char *buf, char *addr, unsigned long count)
 {
-   struct vm_struct *tmp;
+   struct vmap_area *va;
+   struct vm_struct *vm;
char *vaddr, *buf_start = buf;
unsigned long buflen = count;
unsigned long n;
@@ -2021,10 +2022,17 @@ long vread(char *buf, char *addr, unsigned long count)
if ((unsigned long) addr + count  count)
count = -(unsigned long) addr;
 
-   read_lock(vmlist_lock);
-   for (tmp = vmlist; count  tmp; tmp = tmp-next) {
-   vaddr = (char *) tmp-addr;
-   if (addr = vaddr + tmp-size - PAGE_SIZE)
+   spin_lock(vmap_area_lock);
+   list_for_each_entry(va, vmap_area_list, list) {
+   if (!count)
+   break;
+
+   if (!(va-flags  VM_VM_AREA))
+   continue;
+
+   vm = va-vm;
+   vaddr = (char *) vm-addr;
+   if (addr = vaddr + vm-size - PAGE_SIZE)
continue;
while (addr  vaddr) {
if (count == 0)
@@ -2034,10 +2042,10 @@ long vread(char *buf, char *addr, unsigned long count)
addr++;
count--;
}
-   n = vaddr + tmp-size - PAGE_SIZE - addr;
+   n = vaddr + vm-size - PAGE_SIZE - addr;
if (n  count)
n = count;
-   if (!(tmp-flags  VM_IOREMAP))
+   if (!(vm-flags  VM_IOREMAP))
aligned_vread(buf, addr, n);
else /* IOREMAP area is treated as memory hole */
memset(buf, 0, n);
@@ -2046,7 +2054,7 @@ long vread(char *buf, char *addr, unsigned long count)
count -= n;
}
 finished:
-   read_unlock(vmlist_lock);
+   spin_unlock(vmap_area_lock);
 
if (buf == buf_start)
return 0;
@@ -2085,7 +2093,8 @@ finished:
 
 long vwrite(char *buf, char *addr, unsigned long count)
 {
-   struct vm_struct *tmp;
+   struct vmap_area *va;
+   struct vm_struct *vm;
char *vaddr;
unsigned long n, buflen;
int copied = 0;
@@ -2095,10 +2104,17 @@ long vwrite(char *buf, char *addr, unsigned long count)
count = -(unsigned long) addr;
buflen = count;
 
-   read_lock(vmlist_lock);
-   for (tmp = vmlist; count  tmp; tmp = tmp-next) {
-   vaddr = (char *) tmp-addr;
-   if (addr = vaddr + tmp-size - PAGE_SIZE)
+   spin_lock(vmap_area_lock);
+   list_for_each_entry(va, vmap_area_list, list) {
+   if (!count)
+   break;
+
+   if (!(va-flags  VM_VM_AREA))
+   continue;
+
+   vm = va-vm;
+   vaddr = (char *) vm-addr;
+   if (addr = vaddr + vm-size - PAGE_SIZE)
continue;
while (addr  vaddr) {
if (count == 0)
@@ -2107,10 +2123,10 @@ long vwrite(char *buf, char *addr, unsigned long count)
addr++;
count--;
}
-   n = vaddr + tmp-size - PAGE_SIZE - addr;
+   n = vaddr + vm-size - PAGE_SIZE - addr;
if (n  count)
n = count;
-   if (!(tmp-flags  VM_IOREMAP)) {
+   if (!(vm-flags  VM_IOREMAP)) {
aligned_vwrite(buf, addr, n);
copied++;
}
@@ -2119,7 +2135,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
count -= n;
}
 finished:
-   read_unlock(vmlist_lock);
+   spin_unlock(vmap_area_lock);
if (!copied)
return 0;
return buflen;
-- 
1.7.9.5



[PATCH v2 3/8] mm, vmalloc: protect va-vm by vmap_area_lock

2013-03-13 Thread Joonsoo Kim
From: Joonsoo Kim js1...@gmail.com

Inserting and removing an entry in vmlist has linear time complexity, so
it is inefficient. The following patches will remove vmlist entirely.
This patch is a preparation step for that.

To remove vmlist, the code that iterates over vmlist has to be changed to
iterate over vmap_area_list. Before implementing that, we should make
sure that accessing va->vm while iterating over vmap_area_list does not
cause a race condition. This patch ensures that when iterating over
vmap_area_list, there is no race condition when accessing the vm_struct.

Signed-off-by: Joonsoo Kim js1...@gmail.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 1d9878b..1bf94ad 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1290,12 +1290,14 @@ struct vm_struct *vmlist;
 static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va,
  unsigned long flags, const void *caller)
 {
+   spin_lock(vmap_area_lock);
vm-flags = flags;
vm-addr = (void *)va-va_start;
vm-size = va-va_end - va-va_start;
vm-caller = caller;
va-vm = vm;
va-flags |= VM_VM_AREA;
+   spin_unlock(vmap_area_lock);
 }
 
 static void insert_vmalloc_vmlist(struct vm_struct *vm)
@@ -1447,6 +1449,11 @@ struct vm_struct *remove_vm_area(const void *addr)
if (va  va-flags  VM_VM_AREA) {
struct vm_struct *vm = va-vm;
 
+   spin_lock(vmap_area_lock);
+   va-vm = NULL;
+   va-flags = ~VM_VM_AREA;
+   spin_unlock(vmap_area_lock);
+
if (!(vm-flags  VM_UNLIST)) {
struct vm_struct *tmp, **p;
/*
-- 
1.7.9.5



[PATCH v2 8/8] mm, vmalloc: remove list management of vmlist after initializing vmalloc

2013-03-13 Thread Joonsoo Kim
From: Joonsoo Kim js1...@gmail.com

Now there is no need to maintain vmlist after vmalloc is initialized.
So remove the related code and data structures.

Signed-off-by: Joonsoo Kim js1...@gmail.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7e63984..151da8a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -273,10 +273,6 @@ static unsigned long cached_align;
 
 static unsigned long vmap_area_pcpu_hole;
 
-/*** Old vmalloc interfaces ***/
-static DEFINE_RWLOCK(vmlist_lock);
-static struct vm_struct *vmlist;
-
 static struct vmap_area *__find_vmap_area(unsigned long addr)
 {
struct rb_node *n = vmap_area_root.rb_node;
@@ -318,7 +314,7 @@ static void __insert_vmap_area(struct vmap_area *va)
rb_link_node(va-rb_node, parent, p);
rb_insert_color(va-rb_node, vmap_area_root);
 
-   /* address-sort this list so it is usable like the vmlist */
+   /* address-sort this list */
tmp = rb_prev(va-rb_node);
if (tmp) {
struct vmap_area *prev;
@@ -1130,6 +1126,7 @@ void *vm_map_ram(struct page **pages, unsigned int count, 
int node, pgprot_t pro
 }
 EXPORT_SYMBOL(vm_map_ram);
 
+static struct vm_struct *vmlist __initdata;
 /**
  * vm_area_add_early - add vmap area early during boot
  * @vm: vm_struct to add
@@ -1301,10 +1298,8 @@ static void setup_vmalloc_vm(struct vm_struct *vm, 
struct vmap_area *va,
spin_unlock(vmap_area_lock);
 }
 
-static void insert_vmalloc_vmlist(struct vm_struct *vm)
+static void clear_vm_unlist(struct vm_struct *vm)
 {
-   struct vm_struct *tmp, **p;
-
/*
 * Before removing VM_UNLIST,
 * we should make sure that vm has proper values.
@@ -1312,22 +1307,13 @@ static void insert_vmalloc_vmlist(struct vm_struct *vm)
 */
smp_wmb();
vm-flags = ~VM_UNLIST;
-
-   write_lock(vmlist_lock);
-   for (p = vmlist; (tmp = *p) != NULL; p = tmp-next) {
-   if (tmp-addr = vm-addr)
-   break;
-   }
-   vm-next = *p;
-   *p = vm;
-   write_unlock(vmlist_lock);
 }
 
 static void insert_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va,
  unsigned long flags, const void *caller)
 {
setup_vmalloc_vm(vm, va, flags, caller);
-   insert_vmalloc_vmlist(vm);
+   clear_vm_unlist(vm);
 }
 
 static struct vm_struct *__get_vm_area_node(unsigned long size,
@@ -1370,10 +1356,9 @@ static struct vm_struct *__get_vm_area_node(unsigned 
long size,
 
/*
 * When this function is called from __vmalloc_node_range,
-* we do not add vm_struct to vmlist here to avoid
-* accessing uninitialized members of vm_struct such as
-* pages and nr_pages fields. They will be set later.
-* To distinguish it from others, we use a VM_UNLIST flag.
+* we add VM_UNLIST flag to avoid accessing uninitialized
+* members of vm_struct such as pages and nr_pages fields.
+* They will be set later.
 */
if (flags  VM_UNLIST)
setup_vmalloc_vm(area, va, flags, caller);
@@ -1462,20 +1447,6 @@ struct vm_struct *remove_vm_area(const void *addr)
va-flags = ~VM_VM_AREA;
spin_unlock(vmap_area_lock);
 
-   if (!(vm-flags  VM_UNLIST)) {
-   struct vm_struct *tmp, **p;
-   /*
-* remove from list and disallow access to
-* this vm_struct before unmap. (address range
-* confliction is maintained by vmap.)
-*/
-   write_lock(vmlist_lock);
-   for (p = vmlist; (tmp = *p) != vm; p = tmp-next)
-   ;
-   *p = tmp-next;
-   write_unlock(vmlist_lock);
-   }
-
vmap_debug_free_range(va-va_start, va-va_end);
free_unmap_vmap_area(va);
vm-size -= PAGE_SIZE;
@@ -1695,10 +1666,11 @@ void *__vmalloc_node_range(unsigned long size, unsigned 
long align,
return NULL;
 
/*
-* In this function, newly allocated vm_struct is not added
-* to vmlist at __get_vm_area_node(). so, it is added here.
+* In this function, newly allocated vm_struct has VM_UNLIST flag.
+* It means that vm_struct is not fully initialized.
+* Now, it is fully initialized, so remove this flag here.
 */
-   insert_vmalloc_vmlist(area);
+   clear_vm_unlist(area);
 
/*
 * A ref_count = 3 is needed because the vm_struct and vmap_area
@@ -2594,7 +2566,7 @@ static void show_numa_info(struct seq_file *m, struct 
vm_struct *v)
if (!counters)
return;
 
-   /* Pair with smp_wmb() in insert_vmalloc_vmlist() */
+   /* Pair with smp_wmb() in clear_vm_unlist

[PATCH v2 6/8] mm, vmalloc: iterate vmap_area_list, instead of vmlist, in vmallocinfo()

2013-03-13 Thread Joonsoo Kim
From: Joonsoo Kim js1...@gmail.com

This patch is a preparatory step for removing vmlist entirely.
For that purpose, we change the code that iterates vmlist to iterate
vmap_area_list instead. It is a fairly trivial change, but one thing
should be noted.

Using vmap_area_list in vmallocinfo() introduces an ordering problem on
SMP systems. In s_show(), we retrieve some values from the vm_struct.
The vm_struct's fields are not fully set up when va->vm is assigned;
completion of the setup is signalled by clearing the VM_UNLIST flag
without holding a lock. Merely seeing VM_UNLIST cleared does not
guarantee that the vm_struct's fields are visible to other CPUs, so we
need smp_[rw]mb to ensure that the proper values are observed once
VM_UNLIST is seen to be cleared.

Therefore, this patch not only changes the iteration list, but also adds
the appropriate smp_[rw]mb in the right places.

Signed-off-by: Joonsoo Kim js1...@gmail.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
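
A minimal sketch of the publish/observe pattern described above
(illustrative only; publish_vm()/vm_published() are hypothetical helper
names, not the functions touched by this patch):

/* Writer: set every field first, then publish by clearing VM_UNLIST. */
static void publish_vm(struct vm_struct *vm, struct page **pages,
			unsigned int nr_pages)
{
	vm->pages = pages;
	vm->nr_pages = nr_pages;
	smp_wmb();		/* order the stores above before the flag clear */
	vm->flags &= ~VM_UNLIST;
}

/* Reader (e.g. s_show()): only trust the fields once the flag is clear. */
static bool vm_published(const struct vm_struct *vm)
{
	if (vm->flags & VM_UNLIST)
		return false;
	smp_rmb();		/* pairs with the writer's smp_wmb() */
	return true;		/* vm->pages / vm->nr_pages now safe to read */
}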

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index aee1f61..bda6cef 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1304,7 +1304,14 @@ static void insert_vmalloc_vmlist(struct vm_struct *vm)
 {
struct vm_struct *tmp, **p;
 
+   /*
+* Before removing VM_UNLIST,
+* we should make sure that vm has proper values.
+* Pair with smp_rmb() in show_numa_info().
+*/
+   smp_wmb();
vm-flags = ~VM_UNLIST;
+
write_lock(vmlist_lock);
for (p = vmlist; (tmp = *p) != NULL; p = tmp-next) {
if (tmp-addr = vm-addr)
@@ -2542,19 +2549,19 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int 
nr_vms)
 
 #ifdef CONFIG_PROC_FS
 static void *s_start(struct seq_file *m, loff_t *pos)
-   __acquires(vmlist_lock)
+   __acquires(vmap_area_lock)
 {
loff_t n = *pos;
-   struct vm_struct *v;
+   struct vmap_area *va;
 
-   read_lock(vmlist_lock);
-   v = vmlist;
-   while (n  0  v) {
+   spin_lock(vmap_area_lock);
+   va = list_entry((vmap_area_list)-next, typeof(*va), list);
+   while (n  0  va-list != vmap_area_list) {
n--;
-   v = v-next;
+   va = list_entry(va-list.next, typeof(*va), list);
}
-   if (!n)
-   return v;
+   if (!n  va-list != vmap_area_list)
+   return va;
 
return NULL;
 
@@ -2562,16 +2569,20 @@ static void *s_start(struct seq_file *m, loff_t *pos)
 
 static void *s_next(struct seq_file *m, void *p, loff_t *pos)
 {
-   struct vm_struct *v = p;
+   struct vmap_area *va = p, *next;
 
++*pos;
-   return v-next;
+   next = list_entry(va-list.next, typeof(*va), list);
+   if (next-list != vmap_area_list)
+   return next;
+
+   return NULL;
 }
 
 static void s_stop(struct seq_file *m, void *p)
-   __releases(vmlist_lock)
+   __releases(vmap_area_lock)
 {
-   read_unlock(vmlist_lock);
+   spin_unlock(vmap_area_lock);
 }
 
 static void show_numa_info(struct seq_file *m, struct vm_struct *v)
@@ -2582,6 +2593,11 @@ static void show_numa_info(struct seq_file *m, struct 
vm_struct *v)
if (!counters)
return;
 
+   /* Pair with smp_wmb() in insert_vmalloc_vmlist() */
+   smp_rmb();
+   if (v-flags  VM_UNLIST)
+   return;
+
memset(counters, 0, nr_node_ids * sizeof(unsigned int));
 
for (nr = 0; nr  v-nr_pages; nr++)
@@ -2595,7 +2611,20 @@ static void show_numa_info(struct seq_file *m, struct 
vm_struct *v)
 
 static int s_show(struct seq_file *m, void *p)
 {
-   struct vm_struct *v = p;
+   struct vmap_area *va = p;
+   struct vm_struct *v;
+
+   if (va-flags  (VM_LAZY_FREE | VM_LAZY_FREEING))
+   return 0;
+
+   if (!(va-flags  VM_VM_AREA)) {
+   seq_printf(m, 0x%pK-0x%pK %7ld vm_map_ram\n,
+   (void *)va-va_start, (void *)va-va_end,
+   va-va_end - va-va_start);
+   return 0;
+   }
+
+   v = va-vm;
 
seq_printf(m, 0x%pK-0x%pK %7ld,
v-addr, v-addr + v-size, v-size);
-- 
1.7.9.5



[PATCH v2 1/8] mm, vmalloc: change iterating a vmlist to find_vm_area()

2013-03-13 Thread Joonsoo Kim
From: Joonsoo Kim js1...@gmail.com

The purpose of iterating vmlist here is to find the vm area with a
specific virtual address. find_vm_area() is provided for exactly this
purpose and is more efficient, because it uses an rbtree.
So switch to it.

Cc: Chris Metcalf cmetc...@tilera.com
Cc: Guan Xuetao g...@mprc.pku.edu.cn
Cc: Thomas Gleixner t...@linutronix.de
Cc: Ingo Molnar mi...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Acked-by: Guan Xuetao g...@mprc.pku.edu.cn
Acked-by: Ingo Molnar mi...@kernel.org
Acked-by: Chris Metcalf cmetc...@tilera.com
Signed-off-by: Joonsoo Kim js1...@gmail.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
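
For reference, a rough sketch of what find_vm_area() does internally
(simplified from the helpers in mm/vmalloc.c; find_vm_area_sketch is just
an illustrative name):

struct vm_struct *find_vm_area_sketch(const void *addr)
{
	struct vmap_area *va;

	va = find_vmap_area((unsigned long)addr);	/* O(log n) rbtree lookup */
	if (va && va->flags & VM_VM_AREA)
		return va->vm;

	return NULL;
}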

diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
index b3b4972..dfd63ce 100644
--- a/arch/tile/mm/pgtable.c
+++ b/arch/tile/mm/pgtable.c
@@ -592,12 +592,7 @@ void iounmap(volatile void __iomem *addr_in)
   in parallel. Reuse of the virtual address is prevented by
   leaving it in the global lists until we're done with it.
   cpa takes care of the direct mappings. */
-   read_lock(vmlist_lock);
-   for (p = vmlist; p; p = p-next) {
-   if (p-addr == addr)
-   break;
-   }
-   read_unlock(vmlist_lock);
+   p = find_vm_area((void *)addr);
 
if (!p) {
pr_err(iounmap: bad address %p\n, addr);
diff --git a/arch/unicore32/mm/ioremap.c b/arch/unicore32/mm/ioremap.c
index b7a6055..13068ee 100644
--- a/arch/unicore32/mm/ioremap.c
+++ b/arch/unicore32/mm/ioremap.c
@@ -235,7 +235,7 @@ EXPORT_SYMBOL(__uc32_ioremap_cached);
 void __uc32_iounmap(volatile void __iomem *io_addr)
 {
void *addr = (void *)(PAGE_MASK  (unsigned long)io_addr);
-   struct vm_struct **p, *tmp;
+   struct vm_struct *vm;
 
/*
 * If this is a section based mapping we need to handle it
@@ -244,17 +244,10 @@ void __uc32_iounmap(volatile void __iomem *io_addr)
 * all the mappings before the area can be reclaimed
 * by someone else.
 */
-   write_lock(vmlist_lock);
-   for (p = vmlist ; (tmp = *p) ; p = tmp-next) {
-   if ((tmp-flags  VM_IOREMAP)  (tmp-addr == addr)) {
-   if (tmp-flags  VM_UNICORE_SECTION_MAPPING) {
-   unmap_area_sections((unsigned long)tmp-addr,
-   tmp-size);
-   }
-   break;
-   }
-   }
-   write_unlock(vmlist_lock);
+   vm = find_vm_area(addr);
+   if (vm  (vm-flags  VM_IOREMAP) 
+   (vm-flags  VM_UNICORE_SECTION_MAPPING))
+   unmap_area_sections((unsigned long)vm-addr, vm-size);
 
vunmap(addr);
 }
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 78fe3f1..9a1e658 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -282,12 +282,7 @@ void iounmap(volatile void __iomem *addr)
   in parallel. Reuse of the virtual address is prevented by
   leaving it in the global lists until we're done with it.
   cpa takes care of the direct mappings. */
-   read_lock(vmlist_lock);
-   for (p = vmlist; p; p = p-next) {
-   if (p-addr == (void __force *)addr)
-   break;
-   }
-   read_unlock(vmlist_lock);
+   p = find_vm_area((void __force *)addr);
 
if (!p) {
printk(KERN_ERR iounmap: bad address %p\n, addr);
-- 
1.7.9.5



[PATCH v2 2/8] mm, vmalloc: move get_vmalloc_info() to vmalloc.c

2013-03-13 Thread Joonsoo Kim
From: Joonsoo Kim js1...@gmail.com

Currently get_vmalloc_info() lives in fs/proc/mmu.c. There is no reason
this code must be there: its implementation needs vmlist_lock and
iterates vmlist, which are internal data structures of vmalloc.

For maintainability, it is preferable that vmlist_lock and vmlist are
used only in vmalloc.c, so move the code there.

Signed-off-by: Joonsoo Kim js1...@gmail.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
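
For context, a minimal sketch of how a caller consumes this interface,
roughly what /proc/meminfo does (the output format here is illustrative
only):

#include <linux/vmalloc.h>
#include <linux/seq_file.h>

static void report_vmalloc(struct seq_file *m)
{
	struct vmalloc_info vmi;

	get_vmalloc_info(&vmi);
	seq_printf(m, "VmallocTotal: %8lu kB\n", (unsigned long)VMALLOC_TOTAL >> 10);
	seq_printf(m, "VmallocUsed:  %8lu kB\n", vmi.used >> 10);
	seq_printf(m, "VmallocChunk: %8lu kB\n", vmi.largest_chunk >> 10);
}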

diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index 712f24d..ab30716 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -5,7 +5,7 @@
 obj-y   += proc.o
 
 proc-y := nommu.o task_nommu.o
-proc-$(CONFIG_MMU) := mmu.o task_mmu.o
+proc-$(CONFIG_MMU) := task_mmu.o
 
 proc-y   += inode.o root.o base.o generic.o array.o \
fd.o
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 85ff3a4..7571035 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -30,24 +30,6 @@ extern int proc_net_init(void);
 static inline int proc_net_init(void) { return 0; }
 #endif
 
-struct vmalloc_info {
-   unsigned long   used;
-   unsigned long   largest_chunk;
-};
-
-#ifdef CONFIG_MMU
-#define VMALLOC_TOTAL (VMALLOC_END - VMALLOC_START)
-extern void get_vmalloc_info(struct vmalloc_info *vmi);
-#else
-
-#define VMALLOC_TOTAL 0UL
-#define get_vmalloc_info(vmi)  \
-do {   \
-   (vmi)-used = 0;\
-   (vmi)-largest_chunk = 0;   \
-} while(0)
-#endif
-
 extern int proc_tid_stat(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task);
 extern int proc_tgid_stat(struct seq_file *m, struct pid_namespace *ns,
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 1efaaa1..5aa847a 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -11,6 +11,7 @@
 #include linux/swap.h
 #include linux/vmstat.h
 #include linux/atomic.h
+#include linux/vmalloc.h
 #include asm/page.h
 #include asm/pgtable.h
 #include internal.h
diff --git a/fs/proc/mmu.c b/fs/proc/mmu.c
deleted file mode 100644
index 8ae221d..000
--- a/fs/proc/mmu.c
+++ /dev/null
@@ -1,60 +0,0 @@
-/* mmu.c: mmu memory info files
- *
- * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowe...@redhat.com)
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-#include linux/spinlock.h
-#include linux/vmalloc.h
-#include linux/highmem.h
-#include asm/pgtable.h
-#include internal.h
-
-void get_vmalloc_info(struct vmalloc_info *vmi)
-{
-   struct vm_struct *vma;
-   unsigned long free_area_size;
-   unsigned long prev_end;
-
-   vmi-used = 0;
-
-   if (!vmlist) {
-   vmi-largest_chunk = VMALLOC_TOTAL;
-   }
-   else {
-   vmi-largest_chunk = 0;
-
-   prev_end = VMALLOC_START;
-
-   read_lock(vmlist_lock);
-
-   for (vma = vmlist; vma; vma = vma-next) {
-   unsigned long addr = (unsigned long) vma-addr;
-
-   /*
-* Some archs keep another range for modules in vmlist
-*/
-   if (addr  VMALLOC_START)
-   continue;
-   if (addr = VMALLOC_END)
-   break;
-
-   vmi-used += vma-size;
-
-   free_area_size = addr - prev_end;
-   if (vmi-largest_chunk  free_area_size)
-   vmi-largest_chunk = free_area_size;
-
-   prev_end = vma-size + addr;
-   }
-
-   if (VMALLOC_END - prev_end  vmi-largest_chunk)
-   vmi-largest_chunk = VMALLOC_END - prev_end;
-
-   read_unlock(vmlist_lock);
-   }
-}
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 6071e91..698b1e5 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -158,4 +158,22 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 # endif
 #endif
 
+struct vmalloc_info {
+   unsigned long   used;
+   unsigned long   largest_chunk;
+};
+
+#ifdef CONFIG_MMU
+#define VMALLOC_TOTAL (VMALLOC_END - VMALLOC_START)
+extern void get_vmalloc_info(struct vmalloc_info *vmi);
+#else
+
+#define VMALLOC_TOTAL 0UL
+#define get_vmalloc_info(vmi)  \
+do {   \
+   (vmi)-used = 0;\
+   (vmi)-largest_chunk = 0;   \
+} while (0)
+#endif
+
 #endif /* _LINUX_VMALLOC_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0f751f2..1d9878b 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c

Re: [PATCH] mm: page_alloc: remove branch operation in free_pages_prepare()

2013-03-13 Thread Joonsoo Kim
Hello, Hugh.

On Thu, Mar 07, 2013 at 06:01:26PM -0800, Hugh Dickins wrote:
 On Fri, 8 Mar 2013, Joonsoo Kim wrote:
  On Thu, Mar 07, 2013 at 10:54:15AM -0800, Hugh Dickins wrote:
   On Thu, 7 Mar 2013, Joonsoo Kim wrote:
   
When we found that the flag has a bit of PAGE_FLAGS_CHECK_AT_PREP,
we reset the flag. If we always reset the flag, we can reduce one
branch operation. So remove it.

Cc: Hugh Dickins hu...@google.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
   
   I don't object to this patch.  But certainly I would have written it
   that way in order not to dirty a cacheline unnecessarily.  It may be
   obvious to you that the cacheline in question is almost always already
   dirty, and the branch almost always more expensive.  But I'll leave that
   to you, and to those who know more about these subtle costs than I do.
  
  Yes. I already think about that. I thought that even if a cacheline is
  not dirty at this time, we always touch the 'struct page' in
  set_freepage_migratetype() a little later, so dirtying is not the problem.
 
 I expect that a very high proportion of user pages have
 PG_uptodate to be cleared here; and there's also the recently added
 page_nid_reset_last(), which will dirty the flags or a nearby field
 when CONFIG_NUMA_BALANCING.  Those argue in favour of your patch.
 

Ah... I totally missed it.

  
  But, now, I re-think this and decide to drop this patch.
  The reason is that 'struct page' of 'compound pages' may not be dirty
  at this time and will not be dirty at later time.
 
 Actual compound pages would have PG_head or PG_tail or PG_compound
 to be cleared there, I believe (check if I'm right on that).  The
 questionable case is the ordinary order0 case without __GFP_COMP
 (and page_nid_reset_last() is applied to each subpage of those).


Yes.

  So this patch is bad idea.
 
 I'm not so sure.  I doubt your patch will make a giant improvement
 in kernel performance!  But it might make a little - maybe you just
 need to give some numbers from perf to justify it (but I'm easily
 dazzled by numbers - don't expect me to judge the result).

Okay.
Thanks for the enlightening comment.
Right now, I don't have any idea how to collect performance results for this patch.
When I have more time, I will try to think about it.

Thanks.

 
 Hugh
 
  
  Is there any comments?
  
  Thanks.
  
   Hugh
   

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8fcced7..778f2a9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -614,8 +614,7 @@ static inline int free_pages_check(struct page 
*page)
return 1;
}
page_nid_reset_last(page);
-   if (page-flags  PAGE_FLAGS_CHECK_AT_PREP)
-   page-flags = ~PAGE_FLAGS_CHECK_AT_PREP;
+   page-flags = ~PAGE_FLAGS_CHECK_AT_PREP;
return 0;
 }
 
   


[PATCH] ARM: mm: clean-up in order to reduce to call kmap_high_get()

2013-03-03 Thread Joonsoo Kim
In kmap_atomic(), kmap_high_get() is invoked for checking already
mapped area. In __flush_dcache_page() and dma_cache_maint_page(),
we explicitly call kmap_high_get() before kmap_atomic()
when cache_is_vipt(), so kmap_high_get() can be invoked twice.
This is useless operation, so remove one.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index c7e3759..b7711be 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -822,16 +822,16 @@ static void dma_cache_maint_page(struct page *page, 
unsigned long offset,
if (PageHighMem(page)) {
if (len + offset  PAGE_SIZE)
len = PAGE_SIZE - offset;
-   vaddr = kmap_high_get(page);
-   if (vaddr) {
-   vaddr += offset;
-   op(vaddr, len, dir);
-   kunmap_high(page);
-   } else if (cache_is_vipt()) {
-   /* unmapped pages might still be cached */
+   if (cache_is_vipt()) {
vaddr = kmap_atomic(page);
op(vaddr + offset, len, dir);
kunmap_atomic(vaddr);
+   } else {
+   vaddr = kmap_high_get(page);
+   if (vaddr) {
+   op(vaddr + offset, len, dir);
+   kunmap_high(page);
+   }
}
} else {
vaddr = page_address(page) + offset;
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 1c8f7f5..e6a03d0 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -170,15 +170,18 @@ void __flush_dcache_page(struct address_space *mapping, 
struct page *page)
if (!PageHighMem(page)) {
__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
} else {
-   void *addr = kmap_high_get(page);
-   if (addr) {
-   __cpuc_flush_dcache_area(addr, PAGE_SIZE);
-   kunmap_high(page);
-   } else if (cache_is_vipt()) {
-   /* unmapped pages might still be cached */
+   void *addr;
+
+   if (cache_is_vipt()) {
addr = kmap_atomic(page);
__cpuc_flush_dcache_area(addr, PAGE_SIZE);
kunmap_atomic(addr);
+   } else {
+   addr = kmap_high_get(page);
+   if (addr) {
+   __cpuc_flush_dcache_area(addr, PAGE_SIZE);
+   kunmap_high(page);
+   }
}
}
 
-- 
1.7.9.5



[RFC PATCH] ARM: mm: disable kmap_high_get() for SMP

2013-03-03 Thread Joonsoo Kim
With SMP and enabling kmap_high_get(), it makes users of kmap_atomic()
sequential ordered, because kmap_high_get() use global kmap_lock().
It is not welcome situation, so turn off this optimization for SMP.

Cc: Nicolas Pitre n...@linaro.org
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/include/asm/highmem.h b/arch/arm/include/asm/highmem.h
index 8c5e828..82fea0f 100644
--- a/arch/arm/include/asm/highmem.h
+++ b/arch/arm/include/asm/highmem.h
@@ -26,15 +26,13 @@ extern void kunmap_high(struct page *page);
  * The reason for kmap_high_get() is to ensure that the currently kmap'd
  * page usage count does not decrease to zero while we're using its
  * existing virtual mapping in an atomic context.  With a VIVT cache this
- * is essential to do, but with a VIPT cache this is only an optimization
- * so not to pay the price of establishing a second mapping if an existing
- * one can be used.  However, on platforms without hardware TLB maintenance
- * broadcast, we simply cannot use ARCH_NEEDS_KMAP_HIGH_GET at all since
- * the locking involved must also disable IRQs which is incompatible with
- * the IPI mechanism used by global TLB operations.
+ * is essential to do, but with a VIPT cache this is only an optimization.
+ * With SMP and enabling kmap_high_get(), it makes users of kmap_atomic()
+ * sequential ordered, because kmap_high_get() use global kmap_lock().
+ * It is not welcome situation, so turn off this optimization for SMP.
  */
 #define ARCH_NEEDS_KMAP_HIGH_GET
-#if defined(CONFIG_SMP)  defined(CONFIG_CPU_TLB_V6)
+#if defined(CONFIG_SMP)
 #undef ARCH_NEEDS_KMAP_HIGH_GET
 #if defined(CONFIG_HIGHMEM)  defined(CONFIG_CPU_CACHE_VIVT)
 #error The sum of features in your kernel config cannot be supported together
-- 
1.7.9.5



Re: [RFC PATCH] ARM: mm: disable kmap_high_get() for SMP

2013-03-07 Thread Joonsoo Kim
Hello, Nicolas.

On Tue, Mar 05, 2013 at 05:36:12PM +0800, Nicolas Pitre wrote:
 On Mon, 4 Mar 2013, Joonsoo Kim wrote:
 
  With SMP and enabling kmap_high_get(), it makes users of kmap_atomic()
  sequential ordered, because kmap_high_get() use global kmap_lock().
  It is not welcome situation, so turn off this optimization for SMP.
 
 I'm not sure I understand the problem.
 
 The lock taken by kmap_high_get() is released right away before that 
 function returns and therefore this is not actually serializing 
 anything.

Yes, you understand what I want to say correctly.
Sorry for bad explanation.

Following is reasons why I send this patch with RFC tag.

If we have more cpus, performance degration is possible although
it is very short time to holding the lock in kmap_high_get().

And kmap has maximum 512 entries(512 * 4K = 2M) and some mobile devices 
has 2G memory(highmem 1G), so probability for finding matched entry
is approximately  1/512. This probability can be more decreasing
for device which have more memory. So I think that waste time to find
matched entry is more than saved time.

Above is my humble opinion, so please let me know what I am missing.

Thanks.

 
 
 Nicolas


[PATCH] mm: page_alloc: remove branch operation in free_pages_prepare()

2013-03-07 Thread Joonsoo Kim
When we find that the flags have a bit of PAGE_FLAGS_CHECK_AT_PREP set,
we clear those bits. If we always clear them, we can avoid one branch
operation. So remove the check.

Cc: Hugh Dickins hu...@google.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8fcced7..778f2a9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -614,8 +614,7 @@ static inline int free_pages_check(struct page *page)
return 1;
}
page_nid_reset_last(page);
-   if (page-flags  PAGE_FLAGS_CHECK_AT_PREP)
-   page-flags = ~PAGE_FLAGS_CHECK_AT_PREP;
+   page-flags = ~PAGE_FLAGS_CHECK_AT_PREP;
return 0;
 }
 
-- 
1.7.9.5



Re: [RFC PATCH] ARM: mm: disable kmap_high_get() for SMP

2013-03-07 Thread JoonSoo Kim
2013/3/7 Nicolas Pitre nicolas.pi...@linaro.org:
 On Thu, 7 Mar 2013, Joonsoo Kim wrote:

 Hello, Nicolas.

 On Tue, Mar 05, 2013 at 05:36:12PM +0800, Nicolas Pitre wrote:
  On Mon, 4 Mar 2013, Joonsoo Kim wrote:
 
   With SMP and enabling kmap_high_get(), it makes users of kmap_atomic()
   sequential ordered, because kmap_high_get() use global kmap_lock().
   It is not welcome situation, so turn off this optimization for SMP.
 
  I'm not sure I understand the problem.
 
  The lock taken by kmap_high_get() is released right away before that
  function returns and therefore this is not actually serializing
  anything.

 Yes, you understand what I want to say correctly.
 Sorry for bad explanation.

 Following is reasons why I send this patch with RFC tag.

 If we have more cpus, performance degration is possible although
 it is very short time to holding the lock in kmap_high_get().

 And kmap has maximum 512 entries(512 * 4K = 2M) and some mobile devices
 has 2G memory(highmem 1G), so probability for finding matched entry
 is approximately  1/512. This probability can be more decreasing
 for device which have more memory. So I think that waste time to find
 matched entry is more than saved time.

 Above is my humble opinion, so please let me know what I am missing.

 Please look at the kmap_high_get() code again.  It performs no
 searching at all.  What it does is:

If the page is not highmem, it is already filtered out in kmap_atomic(),
so we only need to consider highmem pages.

For a highmem page it does perform a search:
kmap_high_get() calls page_address(), and page_address() hashes the page
and iterates the list for that hash bucket.

Another advantage of disabling ARCH_NEEDS_KMAP_HIGH_GET is
that kmap() and kunmap() then work without IRQs disabled.

Thanks.
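
For reference, a simplified sketch of the path discussed above, condensed
from mm/highmem.c (details vary by kernel version):

void *kmap_high_get(struct page *page)
{
	unsigned long vaddr, flags;

	lock_kmap_any(flags);			/* global kmap lock, IRQs off */
	vaddr = (unsigned long)page_address(page);
	if (vaddr)				/* already mapped: pin it */
		pkmap_count[PKMAP_NR(vaddr)]++;
	unlock_kmap_any(flags);
	return (void *)vaddr;
}

/* page_address() for a highmem page: hash the page, walk one bucket. */
void *page_address(const struct page *page)
{
	struct page_address_slot *pas;
	struct page_address_map *pam;
	unsigned long flags;
	void *ret = NULL;

	/* lowmem fast path omitted */
	pas = page_slot(page);			/* hash_ptr() picks a bucket */
	spin_lock_irqsave(&pas->lock, flags);
	list_for_each_entry(pam, &pas->lh, list) {
		if (pam->page == page) {
			ret = pam->virtual;
			break;
		}
	}
	spin_unlock_irqrestore(&pas->lock, flags);
	return ret;
}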

 - lock the kmap array against concurrent changes

 - if the given page is not highmem, unlock and return NULL

 - otherwise increment that page reference count, unlock, and return the
   mapped address for that page.

 There is almost zero cost to this function, independently of the number
 of kmap entries, whereas it does save much bigger costs elsewhere when
 it is successful.


 Nicolas


Re: [PATCH] mm: page_alloc: remove branch operation in free_pages_prepare()

2013-03-07 Thread Joonsoo Kim
Hello, Hugh.

On Thu, Mar 07, 2013 at 10:54:15AM -0800, Hugh Dickins wrote:
 On Thu, 7 Mar 2013, Joonsoo Kim wrote:
 
  When we found that the flag has a bit of PAGE_FLAGS_CHECK_AT_PREP,
  we reset the flag. If we always reset the flag, we can reduce one
  branch operation. So remove it.
  
  Cc: Hugh Dickins hu...@google.com
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
 I don't object to this patch.  But certainly I would have written it
 that way in order not to dirty a cacheline unnecessarily.  It may be
 obvious to you that the cacheline in question is almost always already
 dirty, and the branch almost always more expensive.  But I'll leave that
 to you, and to those who know more about these subtle costs than I do.

Yes, I already thought about that. I assumed that even if the cacheline is
not dirty at this point, we always touch the 'struct page' in
set_freepage_migratetype() a little later, so dirtying is not a problem.

But now I have re-thought this and decided to drop this patch.
The reason is that the 'struct page' of compound pages may not be dirty
at this point and will not be dirtied later.
So this patch is a bad idea.

Are there any comments?

Thanks.

 Hugh
 
  
  diff --git a/mm/page_alloc.c b/mm/page_alloc.c
  index 8fcced7..778f2a9 100644
  --- a/mm/page_alloc.c
  +++ b/mm/page_alloc.c
  @@ -614,8 +614,7 @@ static inline int free_pages_check(struct page *page)
  return 1;
  }
  page_nid_reset_last(page);
  -   if (page-flags  PAGE_FLAGS_CHECK_AT_PREP)
  -   page-flags = ~PAGE_FLAGS_CHECK_AT_PREP;
  +   page-flags = ~PAGE_FLAGS_CHECK_AT_PREP;
  return 0;
   }
   
 


Re: [PATCH] ARM: mm: clean-up in order to reduce to call kmap_high_get()

2013-03-07 Thread Joonsoo Kim
Hello, Russell.

On Thu, Mar 07, 2013 at 01:26:23PM +, Russell King - ARM Linux wrote:
 On Mon, Mar 04, 2013 at 01:50:09PM +0900, Joonsoo Kim wrote:
  In kmap_atomic(), kmap_high_get() is invoked for checking already
  mapped area. In __flush_dcache_page() and dma_cache_maint_page(),
  we explicitly call kmap_high_get() before kmap_atomic()
  when cache_is_vipt(), so kmap_high_get() can be invoked twice.
  This is useless operation, so remove one.
  
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
  
  diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
  index c7e3759..b7711be 100644
  --- a/arch/arm/mm/dma-mapping.c
  +++ b/arch/arm/mm/dma-mapping.c
  @@ -822,16 +822,16 @@ static void dma_cache_maint_page(struct page *page, 
  unsigned long offset,
  if (PageHighMem(page)) {
  if (len + offset  PAGE_SIZE)
  len = PAGE_SIZE - offset;
  -   vaddr = kmap_high_get(page);
  -   if (vaddr) {
  -   vaddr += offset;
  -   op(vaddr, len, dir);
  -   kunmap_high(page);
  -   } else if (cache_is_vipt()) {
  -   /* unmapped pages might still be cached */
  +   if (cache_is_vipt()) {
 
 This should be:
   if (cache_is_vipt_nonaliasing())
 
 to make it _explicit_ that this technique is only for non-aliasing VIPT
 caches (this doesn't work on any other of our cache types.)  Yes, I
 know we don't support highmem with VIPT aliasing caches - but still,
 we should ensure that this is self-documented in this code.
 
 Same for arch/arm/mm/flush.c

Okay. I will rework it and send v2 soon.



[RFC PATCH] staging, android, ashmem: invalidate pin/unpin ioctl for private map

2012-11-30 Thread Joonsoo Kim
Privately mapped ashmem doesn't install a vm fault handler, so
do_anonymous_page() is called to handle faults in handle_pte_fault().
This kind of backing memory is unrelated to asma->file, which is what
the ashmem shrinker operates on, so shrinking an unpinned area of such
a mapping will not actually free any memory.
Even though its memory is never actually removed, the pin status will
report ASHMEM_WAS_PURGED, so the application will re-load the content.
This is needless overhead, so invalidate the pin/unpin ioctl behaviour
for private mappings.

CC: John Stultz john.stu...@linaro.org
CC: Brian Swetland swetl...@google.com
CC: Colin Cross ccr...@android.com
CC: Arve Hjønnevåg a...@android.com
CC: Dima Zavin d...@android.com
CC: Robert Love rl...@google.com
Signed-off-by: Joonsoo Kim js1...@gmail.com
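
For context, a minimal userspace sketch of the pin/unpin interface this
reasoning is about (header location and error handling are illustrative;
struct ashmem_pin and the ASHMEM_* ioctls follow the ashmem UAPI):

#include <sys/ioctl.h>
#include <linux/ashmem.h>

/* Pin a range of an ashmem fd and report whether it was purged. */
static int repin_range(int fd, __u32 offset, __u32 len)
{
	struct ashmem_pin pin = { .offset = offset, .len = len };
	int ret = ioctl(fd, ASHMEM_PIN, &pin);

	/* With a MAP_SHARED mapping the pages may really have been purged and
	 * must be regenerated; with MAP_PRIVATE (this patch) nothing is ever
	 * purged, so reporting ASHMEM_WAS_PURGED only causes wasted reloads. */
	return ret == ASHMEM_WAS_PURGED;
}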

diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index 634b9ae..2fde9df 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -49,6 +49,7 @@ struct ashmem_area {
struct file *file;   /* the shmem-based backing file */
size_t size; /* size of the mapping, in bytes */
unsigned long prot_mask; /* allowed prot bits, as vm_flags */
+   bool shared_mapping; /* mapping type */
 };
 
 /*
@@ -327,6 +328,7 @@ static int ashmem_mmap(struct file *file, struct 
vm_area_struct *vma)
fput(asma-file);
goto out;
}
+   asma-shared_mapping = 1;
}
 
if (vma-vm_file)
@@ -614,21 +616,35 @@ static int ashmem_pin_unpin(struct ashmem_area *asma, 
unsigned long cmd,
pgstart = pin.offset / PAGE_SIZE;
pgend = pgstart + (pin.len / PAGE_SIZE) - 1;
 
-   mutex_lock(ashmem_mutex);
+   if (asma-shared_mapping) {
+   mutex_lock(ashmem_mutex);
 
-   switch (cmd) {
-   case ASHMEM_PIN:
-   ret = ashmem_pin(asma, pgstart, pgend);
-   break;
-   case ASHMEM_UNPIN:
-   ret = ashmem_unpin(asma, pgstart, pgend);
-   break;
-   case ASHMEM_GET_PIN_STATUS:
-   ret = ashmem_get_pin_status(asma, pgstart, pgend);
-   break;
-   }
+   switch (cmd) {
+   case ASHMEM_PIN:
+   ret = ashmem_pin(asma, pgstart, pgend);
+   break;
+   case ASHMEM_UNPIN:
+   ret = ashmem_unpin(asma, pgstart, pgend);
+   break;
+   case ASHMEM_GET_PIN_STATUS:
+   ret = ashmem_get_pin_status(asma, pgstart, pgend);
+   break;
+   }
 
-   mutex_unlock(ashmem_mutex);
+   mutex_unlock(ashmem_mutex);
+
+   } else {
+   switch (cmd) {
+   /* if it is private map, pin/unpin have no impact to vm */
+   case ASHMEM_PIN:
+   case ASHMEM_UNPIN:
+   ret = ASHMEM_NOT_PURGED;
+   break;
+   case ASHMEM_GET_PIN_STATUS:
+   ret = ASHMEM_IS_PINNED;
+   break;
+   }
+   }
 
return ret;
 }
-- 
1.7.9.5



Re: [RFC PATCH] staging, android, ashmem: invalidate pin/unpin ioctl for private map

2012-12-02 Thread JoonSoo Kim
Hello, Dan.

2012/12/2 Dan Carpenter dan.carpen...@oracle.com:
 On Sat, Dec 01, 2012 at 02:45:57AM +0900, Joonsoo Kim wrote:
 @@ -614,21 +616,35 @@ static int ashmem_pin_unpin(struct ashmem_area *asma, 
 unsigned long cmd,
   pgstart = pin.offset / PAGE_SIZE;
   pgend = pgstart + (pin.len / PAGE_SIZE) - 1;

 - mutex_lock(ashmem_mutex);
 + if (asma-shared_mapping) {
 + mutex_lock(ashmem_mutex);

 Wouldn't we need to hold the mutex while we check the
 -shared_mapping?

I don't fully understand ashmem's locking semantics.
Code that reads fields of the asma instance doesn't hold the mutex today.
For example, ashmem_ioctl() reads asma->size and asma->prot_mask, and
ashmem_pin_unpin() reads asma->file and asma->size without the mutex.
Following those semantics, the mutex shouldn't be needed for checking
asma->shared_mapping.

Thanks for review!


Re: [RFC PATCH] staging, android, ashmem: invalidate pin/unpin ioctl for private map

2012-12-03 Thread JoonSoo Kim
2012/12/3 Dan Carpenter dan.carpen...@oracle.com:
 On Mon, Dec 03, 2012 at 09:09:59AM +0900, JoonSoo Kim wrote:
 Hello, Dan.

 2012/12/2 Dan Carpenter dan.carpen...@oracle.com:
  On Sat, Dec 01, 2012 at 02:45:57AM +0900, Joonsoo Kim wrote:
  @@ -614,21 +616,35 @@ static int ashmem_pin_unpin(struct ashmem_area 
  *asma, unsigned long cmd,
pgstart = pin.offset / PAGE_SIZE;
pgend = pgstart + (pin.len / PAGE_SIZE) - 1;
 
  - mutex_lock(ashmem_mutex);
  + if (asma-shared_mapping) {
  + mutex_lock(ashmem_mutex);
 
  Wouldn't we need to hold the mutex while we check the
  -shared_mapping?

 I doesn't fully understand ashmem's lock semantic.
 Code for retrieving some value of asma instance doesn't hold the mutex, now.
 For example, in ashmem_ioctl(), asma-size, asma-prot_mask.
 And in ashmem_pin_unpin(), there is asma-file, asma-size which is
 retrieved without the mutex.
 According to this semantic, the mutex doesn't need for checking
 asma-shared_mapping.

 The ashmem_ioctl() is clearly racy.  :P  asma-size can be modified
 and read at the same time.  It's not an example to follow.

Okay :)
I will add code to hold the mutex in the next spin.

Thanks!


[PATCH 1/2] scripts/tags.sh: Support subarch for ARM

2012-12-03 Thread Joonsoo Kim
Current tags.sh doesn't handle subarches for ARM.
There are many subarches on ARM, so it is hard to locate functions
that are defined per subarch with the tags utility family.
Therefore, support selecting a subarch to remove this inconvenience.

We can use ARM subarch functionality like below.
make cscope O=. SRCARCH=arm SUBARCH=xxx

Signed-off-by: Joonsoo Kim js1...@gmail.com

diff --git a/scripts/tags.sh b/scripts/tags.sh
index 79fdafb..a400c88 100755
--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -48,13 +48,14 @@ find_arch_sources()
for i in $archincludedir; do
prune=$prune -wholename $i -prune -o
done
-   find ${tree}arch/$1 $ignore $prune -name $2 -print;
+   find ${tree}arch/$1 $ignore $subarchprune $prune -name $2 -print;
 }
 
 # find sources in arch/$1/include
 find_arch_include_sources()
 {
-   include=$(find ${tree}arch/$1/ -name include -type d);
+   include=$(find ${tree}arch/$1/ $subarchprune \
+   -name include -type d -print);
if [ -n $include ]; then
archincludedir=$archincludedir $include
find $include $ignore -name $2 -print;
@@ -234,6 +235,14 @@ if [ ${ARCH} = um ]; then
else
archinclude=${SUBARCH}
fi
+elif [ ${SRCARCH} = arm -a ${SUBARCH} !=  ]; then
+   subarchdir=$(find ${tree}arch/$SRCARCH/ -name mach-* -type d -o \
+   -name plat-* -type d);
+   for i in $subarchdir; do
+   if ! [[ $i =~ .*-${SUBARCH}$ ]]; then
+   subarchprune=$subarchprune -wholename $i -prune -o
+   fi
+   done
 fi
 
 remove_structs=
-- 
1.7.9.5



[PATCH 2/2] scripts/tags.sh: Support compiled source

2012-12-03 Thread Joonsoo Kim
We are usually interested in compiled files only,
because they are strongly related to an individual's work.
Current tags.sh can't select only compiled files, so add support for it.

We can use this functionality like below.
make cscope O=. SRCARCH= SUBARCH=compiled

It must be executed after building the kernel.

Signed-off-by: Joonsoo Kim js1...@gmail.com

diff --git a/scripts/tags.sh b/scripts/tags.sh
index a400c88..ef9668c 100755
--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -96,6 +96,29 @@ all_sources()
find_other_sources '*.[chS]'
 }
 
+all_compiled_sources()
+{
+   for i in $(all_sources); do
+   if [[ $i =~ .*c$ || $i =~ .*S$ ]]; then
+   j=${i/\.[cS]/\.o}
+   if [ -e $j ]; then
+   echo $i
+   fi
+   else
+   echo $i
+   fi
+   done
+}
+
+all_target_sources()
+{
+   if [ $compiled_target = true ]; then
+   all_compiled_sources
+   else
+   all_sources
+   fi
+}
+
 all_kconfigs()
 {
for arch in $ALLSOURCE_ARCHS; do
@@ -111,18 +134,18 @@ all_defconfigs()
 
 docscope()
 {
-   (echo \-k; echo \-q; all_sources)  cscope.files
+   (echo \-k; echo \-q; all_target_sources)  cscope.files
cscope -b -f cscope.out
 }
 
 dogtags()
 {
-   all_sources | gtags -i -f -
+   all_target_sources | gtags -i -f -
 }
 
 exuberant()
 {
-   all_sources | xargs $1 -a   \
+   all_target_sources | xargs $1 -a\
-I __initdata,__exitdata,__acquires,__releases  \
-I __read_mostly,cacheline_aligned  \
-I cacheline_aligned_in_smp \
@@ -174,7 +197,7 @@ exuberant()
 
 emacs()
 {
-   all_sources | xargs $1 -a   \
+   all_target_sources | xargs $1 -a\
--regex='/^(ENTRY|_GLOBAL)(\([^)]*\)).*/\2/'\
--regex='/^SYSCALL_DEFINE[0-9]?(\([^,)]*\).*/sys_\1/'   \
--regex='/^TRACE_EVENT(\([^,)]*\).*/trace_\1/'  \
@@ -221,10 +244,13 @@ xtags()
elif $1 --version 21 | grep -iq emacs; then
emacs $1
else
-   all_sources | xargs $1 -a
+   all_target_sources | xargs $1 -a
 fi
 }
 
+if [ ${SUBARCH} = compiled ]; then
+   compiled_target=true
+fi
 
 # Support um (which uses SUBARCH)
 if [ ${ARCH} = um ]; then
-- 
1.7.9.5



[PATCH v2 1/2] scripts/tags.sh: Support subarch for ARM

2012-12-03 Thread Joonsoo Kim
Current tags.sh doesn't handle subarches for ARM.
There are many subarches on ARM, so it is hard to locate functions
that are defined per subarch with the tags utility family.
Therefore, support selecting a subarch to remove this inconvenience.

We can use ARM subarch functionality like below.
make cscope O=. SRCARCH=arm SUBARCH=xxx

Signed-off-by: Joonsoo Kim js1...@gmail.com
---
v2: change bash specific '[[]]' to 'case in' statement.

diff --git a/scripts/tags.sh b/scripts/tags.sh
index 79fdafb..38483f4 100755
--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -48,13 +48,14 @@ find_arch_sources()
for i in $archincludedir; do
prune=$prune -wholename $i -prune -o
done
-   find ${tree}arch/$1 $ignore $prune -name $2 -print;
+   find ${tree}arch/$1 $ignore $subarchprune $prune -name $2 -print;
 }
 
 # find sources in arch/$1/include
 find_arch_include_sources()
 {
-   include=$(find ${tree}arch/$1/ -name include -type d);
+   include=$(find ${tree}arch/$1/ $subarchprune \
+   -name include -type d -print);
if [ -n $include ]; then
archincludedir=$archincludedir $include
find $include $ignore -name $2 -print;
@@ -234,6 +235,21 @@ if [ ${ARCH} = um ]; then
else
archinclude=${SUBARCH}
fi
+elif [ ${SRCARCH} = arm -a ${SUBARCH} !=  ]; then
+   subarchdir=$(find ${tree}arch/$SRCARCH/ -name mach-* -type d -o \
+   -name plat-* -type d);
+   for i in $subarchdir; do
+   case $i in
+   *mach-${SUBARCH})
+   ;;
+   *plat-${SUBARCH})
+   ;;
+   *)
+   subarchprune=$subarchprune \
+   -wholename $i -prune -o
+   ;;
+   esac
+   done
 fi
 
 remove_structs=
-- 
1.7.9.5



[PATCH v2 2/2] scripts/tags.sh: Support compiled source

2012-12-03 Thread Joonsoo Kim
We are usually interested in compiled files only,
because they are strongly related to an individual's work.
Current tags.sh can't select only compiled files, so add support for it.

We can use this functionality like below.
make cscope O=. SRCARCH= COMPILED_SOURCE=compiled

It must be executed after building the kernel.

Signed-off-by: Joonsoo Kim js1...@gmail.com
---
v2: change bash specific '[[]]' to 'case in' statement.
use COMPILED_SOURCE env var, instead of abusing SUBARCH

diff --git a/scripts/tags.sh b/scripts/tags.sh
index 38483f4..9c02921 100755
--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -96,6 +96,32 @@ all_sources()
find_other_sources '*.[chS]'
 }
 
+all_compiled_sources()
+{
+   for i in $(all_sources); do
+   case $i in
+   *.[cS])
+   j=${i/\.[cS]/\.o}
+   if [ -e $j ]; then
+   echo $i
+   fi
+   ;;
+   *)
+   echo $i
+   ;;
+   esac
+   done
+}
+
+all_target_sources()
+{
+   if [ $COMPILED_SOURCE = compiled ]; then
+   all_compiled_sources
+   else
+   all_sources
+   fi
+}
+
 all_kconfigs()
 {
for arch in $ALLSOURCE_ARCHS; do
@@ -111,18 +137,18 @@ all_defconfigs()
 
 docscope()
 {
-   (echo \-k; echo \-q; all_sources)  cscope.files
+   (echo \-k; echo \-q; all_target_sources)  cscope.files
cscope -b -f cscope.out
 }
 
 dogtags()
 {
-   all_sources | gtags -i -f -
+   all_target_sources | gtags -i -f -
 }
 
 exuberant()
 {
-   all_sources | xargs $1 -a   \
+   all_target_sources | xargs $1 -a\
-I __initdata,__exitdata,__acquires,__releases  \
-I __read_mostly,cacheline_aligned  \
-I cacheline_aligned_in_smp \
@@ -174,7 +200,7 @@ exuberant()
 
 emacs()
 {
-   all_sources | xargs $1 -a   \
+   all_target_sources | xargs $1 -a\
--regex='/^(ENTRY|_GLOBAL)(\([^)]*\)).*/\2/'\
--regex='/^SYSCALL_DEFINE[0-9]?(\([^,)]*\).*/sys_\1/'   \
--regex='/^TRACE_EVENT(\([^,)]*\).*/trace_\1/'  \
@@ -221,11 +247,10 @@ xtags()
elif $1 --version 21 | grep -iq emacs; then
emacs $1
else
-   all_sources | xargs $1 -a
+   all_target_sources | xargs $1 -a
 fi
 }
 
-
 # Support um (which uses SUBARCH)
 if [ ${ARCH} = um ]; then
if [ $SUBARCH = i386 ]; then
-- 
1.7.9.5



Re: [PATCH 2/2] scripts/tags.sh: Support compiled source

2012-12-03 Thread JoonSoo Kim
2012/12/4 Michal Marek mma...@suse.cz:
 On 3.12.2012 17:22, Joonsoo Kim wrote:
 We usually have interst in compiled files only,
 because they are strongly related to individual's work.
 Current tags.sh can't select compiled files, so support it.

 We can use this functionality like below.
 make cscope O=. SRCARCH= SUBARCH=compiled

 I like the idea, but please try to come up with a different syntax that
 does not abuse SUBARCH.


 +all_compiled_sources()
 +{
 + for i in $(all_sources); do
 + if [[ $i =~ .*c$ || $i =~ .*S$ ]]; then
 + j=${i/\.[cS]/\.o}

 Please use 'case $i in ...' instead of the bash-specific [[ ... ]]
 construct.

I sent v2, which is fixed according to your comment.
Thanks for the review.


Re: [PATCH RESEND] ARM: sched: correct update_sched_clock()

2013-02-07 Thread Joonsoo Kim
Hello, Russell.

On Wed, Feb 06, 2013 at 04:33:55PM +, Russell King - ARM Linux wrote:
 On Wed, Feb 06, 2013 at 10:33:53AM +0100, Linus Walleij wrote:
  On Wed, Feb 6, 2013 at 6:21 AM, Joonsoo Kim iamjoonsoo@lge.com wrote:
  
   If we want load epoch_cyc and epoch_ns atomically,
   we should update epoch_cyc_copy first of all.
   This notify reader that updating is in progress.
  
  If you think the patch is finished, put it into Russell's patch tracker:
  http://www.arm.linux.org.uk/developer/patches/
 
 Yea, this patch looks like the right thing... and yes, into the patch
 system please.

I tried to put it into the patch tracker, but failed.
I used the following command.

git send-email --to patc...@arm.linux.org.uk 0001-ARM-sched-correct~

Am I doing something wrong?
Could you tell me how to put a patch into your patch tracker?
I already read the help on your website, but I can't find anything
about git send-email.

Thanks.



Re: [PATCH RESEND] ARM: sched: correct update_sched_clock()

2013-02-08 Thread JoonSoo Kim
2013/2/9 Nicolas Pitre nicolas.pi...@linaro.org:
 On Fri, 8 Feb 2013, Russell King - ARM Linux wrote:

 On Fri, Feb 08, 2013 at 03:51:25PM +0900, Joonsoo Kim wrote:
  I try to put it into patch tracker, but I fail to put it.
  I use following command.
 
  git send-email --to patc...@arm.linux.org.uk 0001-ARM-sched-correct~
 
  Am I wrong?
  Could you teach me how to put patch into your patch tracker?
  I already read help on your website, but I can't find any stuff for
  git send-email.

 I've never used git to send email, so I'm afraid I can't help with that.
 Maybe someone with more experience with sending email from git can
 comment?

 The commit text for the patch needs to contain the following 2 lines at
 the very bottom in order to match the patch system's expectations:

 PATCH FOLLOWS
 KernelVersion: v3.8

 The KernelVersion value needs to be adjusted of course.

 Note: I never tried it myself but it ought to work.

Yes, it works!
Thanks!!!

 Nicolas


[PATCH v5 1/3] ARM: vmregion: remove vmregion code entirely

2013-02-04 Thread Joonsoo Kim
Now there are no users of vmregion,
so remove it.

Acked-by: Nicolas Pitre n...@linaro.org
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 8a9c4cb..4e333fa 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -6,7 +6,7 @@ obj-y   := dma-mapping.o extable.o 
fault.o init.o \
   iomap.o
 
 obj-$(CONFIG_MMU)  += fault-armv.o flush.o idmap.o ioremap.o \
-  mmap.o pgd.o mmu.o vmregion.o
+  mmap.o pgd.o mmu.o
 
 ifneq ($(CONFIG_MMU),y)
 obj-y  += nommu.o
diff --git a/arch/arm/mm/vmregion.c b/arch/arm/mm/vmregion.c
deleted file mode 100644
index a631016..000
--- a/arch/arm/mm/vmregion.c
+++ /dev/null
@@ -1,205 +0,0 @@
-#include linux/fs.h
-#include linux/spinlock.h
-#include linux/list.h
-#include linux/proc_fs.h
-#include linux/seq_file.h
-#include linux/slab.h
-
-#include vmregion.h
-
-/*
- * VM region handling support.
- *
- * This should become something generic, handling VM region allocations for
- * vmalloc and similar (ioremap, module space, etc).
- *
- * I envisage vmalloc()'s supporting vm_struct becoming:
- *
- *  struct vm_struct {
- *struct vmregion  region;
- *unsigned longflags;
- *struct page  **pages;
- *unsigned int nr_pages;
- *unsigned longphys_addr;
- *  };
- *
- * get_vm_area() would then call vmregion_alloc with an appropriate
- * struct vmregion head (eg):
- *
- *  struct vmregion vmalloc_head = {
- * .vm_list= LIST_HEAD_INIT(vmalloc_head.vm_list),
- * .vm_start   = VMALLOC_START,
- * .vm_end = VMALLOC_END,
- *  };
- *
- * However, vmalloc_head.vm_start is variable (typically, it is dependent on
- * the amount of RAM found at boot time.)  I would imagine that get_vm_area()
- * would have to initialise this each time prior to calling vmregion_alloc().
- */
-
-struct arm_vmregion *
-arm_vmregion_alloc(struct arm_vmregion_head *head, size_t align,
-  size_t size, gfp_t gfp, const void *caller)
-{
-   unsigned long start = head-vm_start, addr = head-vm_end;
-   unsigned long flags;
-   struct arm_vmregion *c, *new;
-
-   if (head-vm_end - head-vm_start  size) {
-   printk(KERN_WARNING %s: allocation too big (requested %#x)\n,
-   __func__, size);
-   goto out;
-   }
-
-   new = kmalloc(sizeof(struct arm_vmregion), gfp);
-   if (!new)
-   goto out;
-
-   new-caller = caller;
-
-   spin_lock_irqsave(head-vm_lock, flags);
-
-   addr = rounddown(addr - size, align);
-   list_for_each_entry_reverse(c, head-vm_list, vm_list) {
-   if (addr = c-vm_end)
-   goto found;
-   addr = rounddown(c-vm_start - size, align);
-   if (addr  start)
-   goto nospc;
-   }
-
- found:
-   /*
-* Insert this entry after the one we found.
-*/
-   list_add(new-vm_list, c-vm_list);
-   new-vm_start = addr;
-   new-vm_end = addr + size;
-   new-vm_active = 1;
-
-   spin_unlock_irqrestore(head-vm_lock, flags);
-   return new;
-
- nospc:
-   spin_unlock_irqrestore(head-vm_lock, flags);
-   kfree(new);
- out:
-   return NULL;
-}
-
-static struct arm_vmregion *__arm_vmregion_find(struct arm_vmregion_head 
*head, unsigned long addr)
-{
-   struct arm_vmregion *c;
-
-   list_for_each_entry(c, head-vm_list, vm_list) {
-   if (c-vm_active  c-vm_start == addr)
-   goto out;
-   }
-   c = NULL;
- out:
-   return c;
-}
-
-struct arm_vmregion *arm_vmregion_find(struct arm_vmregion_head *head, 
unsigned long addr)
-{
-   struct arm_vmregion *c;
-   unsigned long flags;
-
-   spin_lock_irqsave(head-vm_lock, flags);
-   c = __arm_vmregion_find(head, addr);
-   spin_unlock_irqrestore(head-vm_lock, flags);
-   return c;
-}
-
-struct arm_vmregion *arm_vmregion_find_remove(struct arm_vmregion_head *head, 
unsigned long addr)
-{
-   struct arm_vmregion *c;
-   unsigned long flags;
-
-   spin_lock_irqsave(head-vm_lock, flags);
-   c = __arm_vmregion_find(head, addr);
-   if (c)
-   c-vm_active = 0;
-   spin_unlock_irqrestore(head-vm_lock, flags);
-   return c;
-}
-
-void arm_vmregion_free(struct arm_vmregion_head *head, struct arm_vmregion *c)
-{
-   unsigned long flags;
-
-   spin_lock_irqsave(head-vm_lock, flags);
-   list_del(c-vm_list);
-   spin_unlock_irqrestore(head-vm_lock, flags);
-
-   kfree(c);
-}
-
-#ifdef CONFIG_PROC_FS
-static int arm_vmregion_show(struct seq_file *m, void *p)
-{
-   struct arm_vmregion *c = list_entry(p, struct arm_vmregion, vm_list);
-
-   seq_printf(m, 0x%08lx-0x%08lx %7lu, c-vm_start, c-vm_end,
-   c-vm_end - c

[PATCH v5 2/3] ARM: ioremap: introduce an infrastructure for static mapped area

2013-02-04 Thread Joonsoo Kim
In the current implementation, we use an ARM-specific flag,
VM_ARM_STATIC_MAPPING, to distinguish ARM-specific statically mapped areas.
The purpose of a static mapped area is to be re-used when the entire
physical address range of an ioremap request can be covered by it.

This implementation causes needless overhead in some cases.
For example, assume that there is only one static mapped area and
vmlist has 300 areas. Every time we call ioremap, we check all 300 areas
to decide whether one matches. Moreover, even if there is no static
mapped area at all and vmlist has 300 areas, every call to ioremap still
checks all 300 areas today.

If we keep a separate list for static mapped areas, we can eliminate
the overhead mentioned above.
With such a list, if there is one static mapped area,
we check just that one area and proceed to the next operation quickly.

In fact, this is not a critical problem, because ioremap is not called
frequently. But reducing the overhead is still a good idea.

Another reason for doing this work is to remove the architecture's
dependency on vmalloc-layer internals. vmlist and vmlist_lock are
internal data structures of the vmalloc layer. Some code for debugging
and statistics inevitably uses them, but it is preferable that they are
used as little as possible outside of vmalloc.c.

This patch introduces an ARM-specific infrastructure for static mapped
areas. The following patch will use it to resolve the problem described
above.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
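
A rough sketch of how the next patch in the series is expected to consult
this list from the ioremap path (illustrative only; the real integration
is in patch 3/3, which is not shown here):

static void __iomem *ioremap_try_static(phys_addr_t paddr, size_t size,
					unsigned int mtype)
{
	struct static_vm *svm;
	unsigned long addr;

	/* reuse a covering static mapping instead of building a new one */
	svm = find_static_vm_paddr(paddr, size, mtype);
	if (!svm)
		return NULL;	/* caller falls back to a fresh mapping */

	addr = (unsigned long)svm->vm.addr;
	addr += paddr - svm->vm.phys_addr;	/* offset within the static area */
	return (void __iomem *)addr;
}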

diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index 88fd86c..904c15e 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -39,6 +39,70 @@
 #include asm/mach/pci.h
 #include mm.h
 
+
+LIST_HEAD(static_vmlist);
+
+static struct static_vm *find_static_vm_paddr(phys_addr_t paddr,
+   size_t size, unsigned int mtype)
+{
+   struct static_vm *svm;
+   struct vm_struct *vm;
+
+   list_for_each_entry(svm, static_vmlist, list) {
+   vm = svm-vm;
+   if (!(vm-flags  VM_ARM_STATIC_MAPPING))
+   continue;
+   if ((vm-flags  VM_ARM_MTYPE_MASK) != VM_ARM_MTYPE(mtype))
+   continue;
+
+   if (vm-phys_addr  paddr ||
+   paddr + size - 1  vm-phys_addr + vm-size - 1)
+   continue;
+
+   return svm;
+   }
+
+   return NULL;
+}
+
+struct static_vm *find_static_vm_vaddr(void *vaddr)
+{
+   struct static_vm *svm;
+   struct vm_struct *vm;
+
+   list_for_each_entry(svm, static_vmlist, list) {
+   vm = svm-vm;
+
+   /* static_vmlist is ascending order */
+   if (vm-addr  vaddr)
+   break;
+
+   if (vm-addr = vaddr  vm-addr + vm-size  vaddr)
+   return svm;
+   }
+
+   return NULL;
+}
+
+void __init add_static_vm_early(struct static_vm *svm)
+{
+   struct static_vm *curr_svm;
+   struct vm_struct *vm;
+   void *vaddr;
+
+   vm = svm-vm;
+   vm_area_add_early(vm);
+   vaddr = vm-addr;
+
+   list_for_each_entry(curr_svm, static_vmlist, list) {
+   vm = curr_svm-vm;
+
+   if (vm-addr  vaddr)
+   break;
+   }
+   list_add_tail(svm-list, curr_svm-list);
+}
+
 int ioremap_page(unsigned long virt, unsigned long phys,
 const struct mem_type *mtype)
 {
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index a8ee92d..d5a4e9a 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -1,4 +1,6 @@
 #ifdef CONFIG_MMU
+#include <linux/list.h>
+#include <linux/vmalloc.h>
 
 /* the upper-most page table pointer */
 extern pmd_t *top_pmd;
@@ -65,6 +67,16 @@ extern void __flush_dcache_page(struct address_space *mapping, struct page *page
 /* consistent regions used by dma_alloc_attrs() */
 #define VM_ARM_DMA_CONSISTENT	0x20000000
 
+
+struct static_vm {
+   struct vm_struct vm;
+   struct list_head list;
+};
+
+extern struct list_head static_vmlist;
+extern struct static_vm *find_static_vm_vaddr(void *vaddr);
+extern __init void add_static_vm_early(struct static_vm *svm);
+
 #endif
 
 #ifdef CONFIG_ZONE_DMA
-- 
1.7.9.5



[PATCH v5 0/3] introduce static_vm for ARM-specific static mapped area

2013-02-04 Thread Joonsoo Kim
In the current implementation, we use an ARM-specific flag, namely
VM_ARM_STATIC_MAPPING, for distinguishing ARM-specific static mapped areas.
The purpose of a static mapped area is to re-use it when the entire
physical address range of an ioremap request can be covered by that area.

This implementation causes needless overhead in some cases.
For example, assume that there is only one static mapped area and
vmlist has 300 areas. Every time we call ioremap, we check 300 areas to
decide whether one of them matches. Moreover, even if there is
no static mapped area at all and vmlist has 300 areas, every ioremap
call still checks all 300 areas.

If we construct an extra list for static mapped areas, we can eliminate
the above mentioned overhead.
With an extra list, if there is one static mapped area,
we check only that one area and proceed to the next operation quickly.

In fact, it is not a critical problem, because ioremap is not frequently
used. But reducing the overhead is still worthwhile.

Another reason for doing this work is to remove vm_struct list management
entirely. For more information, look at the following link.
http://lkml.org/lkml/2012/12/6/184

Changelog
v4->v5:
  [2/3]: Changes from Nicolas' suggestion
 - don't use separate flags for static_vm
 - remove a lock
 - declare add_static_vm_early() as __init
  [3/3]: Changes from Nicolas' suggestion
 - add / leave comments

v3->v4:
  [2/3]: Changes from Nicolas' suggestion
 - embed static_vm code in ioremap.c
 - simplify struct static_vm
 - remove init_static_vm, instead, add_static_vm_early() init static_vm
 Use generic list for list management of static_vm
 Convert spin_lock to rw_lock
 Modify static_vm's flags bits
  [3/3]: Rework according to [2/3] change
  Rebased on v3.8-rc5

v2->v3:
  coverletter: refer a link related to this work
  [2/3]: drop @flags of find_static_vm_vaddr
  Rebased on v3.8-rc4

v1->v2:
  [2/3]: patch description is improved.
  Rebased on v3.7-rc7

Joonsoo Kim (3):
  ARM: vmregion: remove vmregion code entirely
  ARM: ioremap: introduce an infrastructure for static mapped area
  ARM: mm: use static_vm for managing static mapped areas

 arch/arm/mm/Makefile   |2 +-
 arch/arm/mm/ioremap.c  |  135 +--
 arch/arm/mm/mm.h   |   12 +++
 arch/arm/mm/mmu.c  |   34 
 arch/arm/mm/vmregion.c |  205 
 arch/arm/mm/vmregion.h |   31 
 6 files changed, 123 insertions(+), 296 deletions(-)
 delete mode 100644 arch/arm/mm/vmregion.c
 delete mode 100644 arch/arm/mm/vmregion.h

-- 
1.7.9.5



[PATCH v5 3/3] ARM: mm: use static_vm for managing static mapped areas

2013-02-04 Thread Joonsoo Kim
A static mapped area is ARM-specific, so it is better not to use the
generic vmalloc data structures, that is, vmlist and vmlist_lock,
for managing static mapped areas. Doing so also causes some needless
overhead, and reducing this overhead is worthwhile.

Now we have the newly introduced static_vm infrastructure.
With it, we don't need to iterate over all mapped areas. Instead, we just
iterate over the static mapped areas. This helps to reduce the overhead of
finding a matched area. And the architecture dependency on the vmalloc
layer is removed, which will help the maintainability of the vmalloc layer.
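As a concrete, illustrative example (the numbers are assumptions, not taken
from any real board): if a static mapping registered at boot covers physical
0x40000000-0x400fffff at virtual 0xf8000000, then an ioremap request for
physical 0x40003000 can be answered with 0xf8003000 -- the static virtual
base plus the 0x3000 offset -- after scanning only the entries on
static_vmlist, instead of walking the whole vmlist and creating a new
mapping.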

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index 904c15e..c7fef4b 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -261,13 +261,14 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
const struct mem_type *type;
int err;
unsigned long addr;
-	struct vm_struct * area;
+	struct vm_struct *area;
+	phys_addr_t paddr = __pfn_to_phys(pfn);
 
 #ifndef CONFIG_ARM_LPAE
 	/*
 	 * High mappings must be supersection aligned
 	 */
-	if (pfn >= 0x100000 && (__pfn_to_phys(pfn) & ~SUPERSECTION_MASK))
+	if (pfn >= 0x100000 && (paddr & ~SUPERSECTION_MASK))
 		return NULL;
 #endif
 
@@ -283,24 +284,16 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
/*
 * Try to reuse one of the static mapping whenever possible.
 */
-	read_lock(&vmlist_lock);
-	for (area = vmlist; area; area = area->next) {
-		if (!size || (sizeof(phys_addr_t) == 4 && pfn >= 0x100000))
-			break;
-		if (!(area->flags & VM_ARM_STATIC_MAPPING))
-			continue;
-		if ((area->flags & VM_ARM_MTYPE_MASK) != VM_ARM_MTYPE(mtype))
-			continue;
-		if (__phys_to_pfn(area->phys_addr) > pfn ||
-		    __pfn_to_phys(pfn) + size-1 > area->phys_addr + area->size-1)
-			continue;
-		/* we can drop the lock here as we know *area is static */
-		read_unlock(&vmlist_lock);
-		addr = (unsigned long)area->addr;
-		addr += __pfn_to_phys(pfn) - area->phys_addr;
-		return (void __iomem *) (offset + addr);
+	if (size && !((sizeof(phys_addr_t) == 4 && pfn >= 0x100000))) {
+		struct static_vm *svm;
+
+		svm = find_static_vm_paddr(paddr, size, mtype);
+		if (svm) {
+			addr = (unsigned long)svm->vm.addr;
+			addr += paddr - svm->vm.phys_addr;
+			return (void __iomem *) (offset + addr);
+		}
 	}
-	read_unlock(&vmlist_lock);
 
/*
 * Don't allow RAM to be mapped - this causes problems with ARMv6+
@@ -312,21 +305,21 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
if (!area)
return NULL;
 	addr = (unsigned long)area->addr;
-	area->phys_addr = __pfn_to_phys(pfn);
+	area->phys_addr = paddr;
 
 #if !defined(CONFIG_SMP) && !defined(CONFIG_ARM_LPAE)
 	if (DOMAIN_IO == 0 &&
 	    (((cpu_architecture() >= CPU_ARCH_ARMv6) && (get_cr() & CR_XP)) ||
 	       cpu_is_xsc3()) && pfn >= 0x100000 &&
-	   !((__pfn_to_phys(pfn) | size | addr) & ~SUPERSECTION_MASK)) {
+	   !((paddr | size | addr) & ~SUPERSECTION_MASK)) {
 		area->flags |= VM_ARM_SECTION_MAPPING;
 		err = remap_area_supersections(addr, pfn, size, type);
-	} else if (!((__pfn_to_phys(pfn) | size | addr) & ~PMD_MASK)) {
+	} else if (!((paddr | size | addr) & ~PMD_MASK)) {
 		area->flags |= VM_ARM_SECTION_MAPPING;
 		err = remap_area_sections(addr, pfn, size, type);
 	} else
 #endif
-	err = ioremap_page_range(addr, addr + size, __pfn_to_phys(pfn),
+	err = ioremap_page_range(addr, addr + size, paddr,
 				 __pgprot(type->prot_pte));
 
if (err) {
@@ -410,34 +403,28 @@ __arm_ioremap_exec(unsigned long phys_addr, size_t size, bool cached)
 void __iounmap(volatile void __iomem *io_addr)
 {
 	void *addr = (void *)(PAGE_MASK & (unsigned long)io_addr);
-	struct vm_struct *vm;
+	struct static_vm *svm;
+
+	/* If this is a static mapping, we must leave it alone */
+	svm = find_static_vm_vaddr(addr);
+	if (svm)
+		return;
 
-	read_lock(&vmlist_lock);
-	for (vm = vmlist; vm; vm = vm->next) {
-		if (vm->addr > addr)
-			break;
-		if (!(vm->flags & VM_IOREMAP))
-			continue;
-		/* If this is a static mapping we must leave it alone */
-		if ((vm->flags & VM_ARM_STATIC_MAPPING) &&
-		    (vm->addr <= addr) && (vm->addr + vm->size > addr)) {
-			read_unlock(&vmlist_lock

Re: [PATCH v5 3/3] ARM: mm: use static_vm for managing static mapped areas

2013-02-05 Thread Joonsoo Kim
Hello, Nicolas.

On Mon, Feb 04, 2013 at 11:44:16PM -0500, Nicolas Pitre wrote:
 On Tue, 5 Feb 2013, Joonsoo Kim wrote:
 
  A static mapped area is ARM-specific, so it is better not to use
  generic vmalloc data structure, that is, vmlist and vmlist_lock
  for managing static mapped area. And it causes some needless overhead and
  reducing this overhead is better idea.
  
  Now, we have newly introduced static_vm infrastructure.
  With it, we don't need to iterate all mapped areas. Instead, we just
  iterate static mapped areas. It helps to reduce an overhead of finding
  matched area. And architecture dependency on vmalloc layer is removed,
  so it will help to maintainability for vmalloc layer.
  
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
 Some comments below.
 
  diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
  index 904c15e..c7fef4b 100644
  --- a/arch/arm/mm/ioremap.c
  +++ b/arch/arm/mm/ioremap.c
  @@ -261,13 +261,14 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long 
  pfn,
  const struct mem_type *type;
  int err;
  unsigned long addr;
  -   struct vm_struct * area;
  +   struct vm_struct *area;
  +   phys_addr_t paddr = __pfn_to_phys(pfn);
   
   #ifndef CONFIG_ARM_LPAE
  /*
   * High mappings must be supersection aligned
   */
  -   if (pfn = 0x10  (__pfn_to_phys(pfn)  ~SUPERSECTION_MASK))
  +   if (pfn = 0x10  (paddr  ~SUPERSECTION_MASK))
  return NULL;
   #endif
   
  @@ -283,24 +284,16 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long 
  pfn,
  /*
   * Try to reuse one of the static mapping whenever possible.
   */
  -   read_lock(vmlist_lock);
  -   for (area = vmlist; area; area = area-next) {
  -   if (!size || (sizeof(phys_addr_t) == 4  pfn = 0x10))
  -   break;
  -   if (!(area-flags  VM_ARM_STATIC_MAPPING))
  -   continue;
  -   if ((area-flags  VM_ARM_MTYPE_MASK) != VM_ARM_MTYPE(mtype))
  -   continue;
  -   if (__phys_to_pfn(area-phys_addr)  pfn ||
  -   __pfn_to_phys(pfn) + size-1  area-phys_addr + 
  area-size-1)
  -   continue;
  -   /* we can drop the lock here as we know *area is static */
  -   read_unlock(vmlist_lock);
  -   addr = (unsigned long)area-addr;
  -   addr += __pfn_to_phys(pfn) - area-phys_addr;
  -   return (void __iomem *) (offset + addr);
  +   if (size  !((sizeof(phys_addr_t) == 4  pfn = 0x10))) {
  ^ ^
 You have a needless extra set of parens here.
 
Okay.

 [...]
 
  diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
  index ce328c7..b2c0356 100644
  --- a/arch/arm/mm/mmu.c
  +++ b/arch/arm/mm/mmu.c
  @@ -757,21 +757,24 @@ void __init iotable_init(struct map_desc *io_desc, 
  int nr)
   {
  struct map_desc *md;
  struct vm_struct *vm;
  +   struct static_vm *svm;
   
  if (!nr)
  return;
   
  -   vm = early_alloc_aligned(sizeof(*vm) * nr, __alignof__(*vm));
  +   svm = early_alloc_aligned(sizeof(*svm) * nr, __alignof__(*svm));
   
  for (md = io_desc; nr; md++, nr--) {
  create_mapping(md);
  +
  +   vm = svm-vm;
  vm-addr = (void *)(md-virtual  PAGE_MASK);
  vm-size = PAGE_ALIGN(md-length + (md-virtual  ~PAGE_MASK));
  vm-phys_addr = __pfn_to_phys(md-pfn);
  vm-flags = VM_IOREMAP | VM_ARM_STATIC_MAPPING;
  vm-flags |= VM_ARM_MTYPE(md-type);
  vm-caller = iotable_init;
  -   vm_area_add_early(vm++);
  +   add_static_vm_early(svm++);
  }
   }
   
  @@ -779,13 +782,16 @@ void __init vm_reserve_area_early(unsigned long addr, 
  unsigned long size,
void *caller)
   {
  struct vm_struct *vm;
  +   struct static_vm *svm;
  +
  +   svm = early_alloc_aligned(sizeof(*svm), __alignof__(*svm));
   
  -   vm = early_alloc_aligned(sizeof(*vm), __alignof__(*vm));
  +   vm = svm-vm;
  vm-addr = (void *)addr;
  vm-size = size;
  vm-flags = VM_IOREMAP | VM_ARM_EMPTY_MAPPING;
  vm-caller = caller;
  -   vm_area_add_early(vm);
  +   add_static_vm_early(svm);
   }
   
   #ifndef CONFIG_ARM_LPAE
  @@ -810,14 +816,13 @@ static void __init pmd_empty_section_gap(unsigned 
  long addr)
   
   static void __init fill_pmd_gaps(void)
   {
  +   struct static_vm *svm;
  struct vm_struct *vm;
  unsigned long addr, next = 0;
  pmd_t *pmd;
   
  -   /* we're still single threaded hence no lock needed here */
  -   for (vm = vmlist; vm; vm = vm-next) {
  -   if (!(vm-flags  (VM_ARM_STATIC_MAPPING | 
  VM_ARM_EMPTY_MAPPING)))
  -   continue;
  +   list_for_each_entry(svm, static_vmlist, list) {
  +   vm = svm-vm;
  addr = (unsigned long)vm-addr;
  if (addr  next)
  continue;
  @@ -859,17 +864,12 @@ static void

Re: [PATCH v5 3/3] ARM: mm: use static_vm for managing static mapped areas

2013-02-05 Thread Joonsoo Kim
Hello, Rob.

On Tue, Feb 05, 2013 at 01:12:51PM -0600, Rob Herring wrote:
 On 02/05/2013 12:13 PM, Nicolas Pitre wrote:
  On Tue, 5 Feb 2013, Rob Herring wrote:
  
  On 02/04/2013 10:44 PM, Nicolas Pitre wrote:
  On Tue, 5 Feb 2013, Joonsoo Kim wrote:
 
  A static mapped area is ARM-specific, so it is better not to use
  generic vmalloc data structure, that is, vmlist and vmlist_lock
  for managing static mapped area. And it causes some needless overhead and
  reducing this overhead is better idea.
 
  Now, we have newly introduced static_vm infrastructure.
  With it, we don't need to iterate all mapped areas. Instead, we just
  iterate static mapped areas. It helps to reduce an overhead of finding
  matched area. And architecture dependency on vmalloc layer is removed,
  so it will help to maintainability for vmalloc layer.
 
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
  [snip]
 
  @@ -859,17 +864,12 @@ static void __init pci_reserve_io(void)
   {
   struct vm_struct *vm;
   unsigned long addr;
  +struct static_vm *svm;
   
  -/* we're still single threaded hence no lock needed here */
  -for (vm = vmlist; vm; vm = vm-next) {
  -if (!(vm-flags  VM_ARM_STATIC_MAPPING))
  -continue;
  -addr = (unsigned long)vm-addr;
  -addr = ~(SZ_2M - 1);
  -if (addr == PCI_IO_VIRT_BASE)
  -return;
  +svm = find_static_vm_vaddr((void *)PCI_IO_VIRT_BASE);
  +if (svm)
  +return;
 
  -}
   
   vm_reserve_area_early(PCI_IO_VIRT_BASE, SZ_2M, pci_reserve_io);
   }
 
  The replacement code is not equivalent.  I can't recall why the original 
  is as it is, but it doesn't look right to me.  The 2MB round down 
  certainly looks suspicious.
 
  The PCI mapping is at a fixed, aligned 2MB mapping. If we find any
  virtual address within that region already mapped, it is an error.
  Ah, OK.  This wasn't clear looking at the code.
  We probably should have had a WARN here.
  
  Indeed.
  

Okay.
I should fix it to find any mapping within PCI reserved region.
But, I think that it is not an error.
Now, I see your original commit 'c2794437091a4fda72c4a4f3567dd728dcc0c3c9'
and find below message.

Platforms which need early i/o mapping (e.g. for vga console) can call
pci_map_io_early in their .map_io function.

Therefore, for some platforms, it is possible that there is a mapping within
the PCI reserved range.

So, I will not add WARN here.

I will fix and re-send v6 with your ACK.

Thanks for review.

 
  The replacement code should be better.  However I'd like you to get an 
  ACK from Rob Herring as well for this patch.
 
  It doesn't appear to me the above case is handled. The virt addr is
  checked whether it is within an existing mapping, but not whether the
  new mapping would overlap an existing mapping. It would be good to check
  for this generically rather than specifically for the PCI i/o mapping.
  
  Agreed.  However that is checked already in vm_area_add_early().  
  Therefore the overlap test here is redundant.
 
 Ah, right. In that case:
 
 Acked-by: Rob Herring rob.herr...@calxeda.com
 
 Rob
 


Re: [PATCH v5 1/3] ARM: vmregion: remove vmregion code entirely

2013-02-05 Thread Joonsoo Kim
Hello, Santosh.

On Tue, Feb 05, 2013 at 02:22:39PM +0530, Santosh Shilimkar wrote:
 On Tuesday 05 February 2013 06:01 AM, Joonsoo Kim wrote:
 Now, there is no user for vmregion.
 So remove it.
 
 Acked-by: Nicolas Pitre n...@linaro.org
 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
 diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
 index 8a9c4cb..4e333fa 100644
 --- a/arch/arm/mm/Makefile
 +++ b/arch/arm/mm/Makefile
 @@ -6,7 +6,7 @@ obj-y:= dma-mapping.o 
 extable.o fault.o init.o \
 iomap.o
 
   obj-$(CONFIG_MMU)  += fault-armv.o flush.o idmap.o ioremap.o \
 -   mmap.o pgd.o mmu.o vmregion.o
 +   mmap.o pgd.o mmu.o
 
   ifneq ($(CONFIG_MMU),y)
   obj-y  += nommu.o
 diff --git a/arch/arm/mm/vmregion.c b/arch/arm/mm/vmregion.c
 deleted file mode 100644
 index a631016..000
 --- a/arch/arm/mm/vmregion.c
 +++ /dev/null
 @@ -1,205 +0,0 @@
 You might want to use 'git format-patch -D'
 which will just generate one line for a deleted file.

Nice tip!
Thanks.

 Regards,
 Santosh
 
 


Re: [PATCH v5 0/3] introduce static_vm for ARM-specific static mapped area

2013-02-05 Thread Joonsoo Kim
Hello, Santosh.

On Tue, Feb 05, 2013 at 02:32:06PM +0530, Santosh Shilimkar wrote:
 On Tuesday 05 February 2013 06:01 AM, Joonsoo Kim wrote:
 In current implementation, we used ARM-specific flag, that is,
 VM_ARM_STATIC_MAPPING, for distinguishing ARM specific static mapped area.
 The purpose of static mapped area is to re-use static mapped area when
 entire physical address range of the ioremap request can be covered
 by this area.
 
 This implementation causes needless overhead for some cases.
 For example, assume that there is only one static mapped area and
 vmlist has 300 areas. Every time we call ioremap, we check 300 areas for
 deciding whether it is matched or not. Moreover, even if there is
 no static mapped area and vmlist has 300 areas, every time we call
 ioremap, we check 300 areas in now.
 
 If we construct a extra list for static mapped area, we can eliminate
 above mentioned overhead.
 With a extra list, if there is one static mapped area,
 we just check only one area and proceed next operation quickly.
 
 In fact, it is not a critical problem, because ioremap is not frequently
 used. But reducing overhead is better idea.
 
 Another reason for doing this work is for removing vm_struct list management,
 entirely. For more information, look at the following link.
 http://lkml.org/lkml/2012/12/6/184
 
 
 [..]
 
 
 Joonsoo Kim (3):
ARM: vmregion: remove vmregion code entirely
ARM: ioremap: introduce an infrastructure for static mapped area
ARM: mm: use static_vm for managing static mapped areas
 
   arch/arm/mm/Makefile   |2 +-
   arch/arm/mm/ioremap.c  |  135 +--
   arch/arm/mm/mm.h   |   12 +++
   arch/arm/mm/mmu.c  |   34 
   arch/arm/mm/vmregion.c |  205 
  
   arch/arm/mm/vmregion.h |   31 
   6 files changed, 123 insertions(+), 296 deletions(-)
   delete mode 100644 arch/arm/mm/vmregion.c
   delete mode 100644 arch/arm/mm/vmregion.h
 
 Nice Clean-up. I tested this series on OMAP which uses few static
 mappings. Feel free to add,
 
 Tested-by: Santosh Shilimkar santosh.shilim...@ti.com

I will re-send v6 with your Tested-by.

Thanks for testing this.



Re: [PATCH v5 3/3] ARM: mm: use static_vm for managing static mapped areas

2013-02-05 Thread Joonsoo Kim
On Wed, Feb 06, 2013 at 11:07:07AM +0900, Joonsoo Kim wrote:
 Hello, Rob.
 
 On Tue, Feb 05, 2013 at 01:12:51PM -0600, Rob Herring wrote:
  On 02/05/2013 12:13 PM, Nicolas Pitre wrote:
   On Tue, 5 Feb 2013, Rob Herring wrote:
   
   On 02/04/2013 10:44 PM, Nicolas Pitre wrote:
   On Tue, 5 Feb 2013, Joonsoo Kim wrote:
  
   A static mapped area is ARM-specific, so it is better not to use
   generic vmalloc data structure, that is, vmlist and vmlist_lock
   for managing static mapped area. And it causes some needless overhead 
   and
   reducing this overhead is better idea.
  
   Now, we have newly introduced static_vm infrastructure.
   With it, we don't need to iterate all mapped areas. Instead, we just
   iterate static mapped areas. It helps to reduce an overhead of finding
   matched area. And architecture dependency on vmalloc layer is removed,
   so it will help to maintainability for vmalloc layer.
  
   Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
  
   [snip]
  
   @@ -859,17 +864,12 @@ static void __init pci_reserve_io(void)
{
  struct vm_struct *vm;
  unsigned long addr;
   +  struct static_vm *svm;

   -  /* we're still single threaded hence no lock needed here */
   -  for (vm = vmlist; vm; vm = vm-next) {
   -  if (!(vm-flags  VM_ARM_STATIC_MAPPING))
   -  continue;
   -  addr = (unsigned long)vm-addr;
   -  addr = ~(SZ_2M - 1);
   -  if (addr == PCI_IO_VIRT_BASE)
   -  return;
   +  svm = find_static_vm_vaddr((void *)PCI_IO_VIRT_BASE);
   +  if (svm)
   +  return;
  
   -  }

  vm_reserve_area_early(PCI_IO_VIRT_BASE, SZ_2M, pci_reserve_io);
}
  
   The replacement code is not equivalent.  I can't recall why the 
   original 
   is as it is, but it doesn't look right to me.  The 2MB round down 
   certainly looks suspicious.
  
   The PCI mapping is at a fixed, aligned 2MB mapping. If we find any
   virtual address within that region already mapped, it is an error.
   Ah, OK.  This wasn't clear looking at the code.
   We probably should have had a WARN here.
   
   Indeed.
   
 
 Okay.
 I should fix it to find any mapping within PCI reserved region.

Ah...
Above comment is my mistake.

If there is a region already mapped within the PCI reserved region and
it is not found by find_static_vm_vaddr(), vm_area_add_early() hits BUG_ON().
So leaving just the find_static_vm_vaddr() check is safe.
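For context, this is roughly the overlap check that vm_area_add_early()
already performs (a simplified sketch paraphrased from mm/vmalloc.c of this
era, not the exact code), which is why no extra overlap test is needed here:

	/* Simplified sketch of vm_area_add_early() (paraphrased, not verbatim). */
	void __init vm_area_add_early(struct vm_struct *vm)
	{
		struct vm_struct *tmp, **p;

		for (p = &vmlist; (tmp = *p) != NULL; p = &tmp->next) {
			if (tmp->addr >= vm->addr) {
				/* new area must end before the next existing one */
				BUG_ON(tmp->addr < vm->addr + vm->size);
				break;
			}
			/* existing area must end before the new one starts */
			BUG_ON(tmp->addr + tmp->size > vm->addr);
		}
		/* keep the early list sorted by address */
		vm->next = *p;
		*p = vm;
	}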

 But, I think that it is not an error.
 Now, I see your original commit 'c2794437091a4fda72c4a4f3567dd728dcc0c3c9'
 and find below message.
 
 Platforms which need early i/o mapping (e.g. for vga console) can call
 pci_map_io_early in their .map_io function.
 
 Therfore, for some platform, it is possible that there is a mapping within
 PCI reserved range.
 
 So, I will not add WARN here.
 
 I will fix and re-send v6 with your ACK.
 
 Thanks for review.
 
  
   The replacement code should be better.  However I'd like you to get an 
   ACK from Rob Herring as well for this patch.
  
   It doesn't appear to me the above case is handled. The virt addr is
   checked whether it is within an existing mapping, but not whether the
   new mapping would overlap an existing mapping. It would be good to check
   for this generically rather than specifically for the PCI i/o mapping.
   
   Agreed.  However that is checked already in vm_area_add_early().  
   Therefore the overlap test here is redundant.
  
  Ah, right. In that case:
  
  Acked-by: Rob Herring rob.herr...@calxeda.com
  
  Rob
  


[PATCH RESEND] ARM: sched: correct update_sched_clock()

2013-02-05 Thread Joonsoo Kim
If we want to load epoch_cyc and epoch_ns atomically,
we should update epoch_cyc_copy first of all.
This notifies the reader that an update is in progress.

If we update epoch_cyc first, as in the current implementation,
there is a subtle error case.
Look at the example below.

Initial Condition
cyc = 9
ns = 900
cyc_copy = 9

== CASE 1 ==
CPU A = reader   CPU B = updater
   write cyc = 10
read cyc = 10
read ns = 900
   write ns = 1000
   write cyc_copy = 10
read cyc_copy = 10

output = (10, 900)

== CASE 2 ==
CPU A = reader   CPU B = updater
read cyc = 9
   write cyc = 10
   write ns = 1000
read ns = 1000
read cyc_copy = 9
   write cyc_copy = 10
output = (9, 1000)

If an atomic read were ensured, the output would be (9, 900) or (10, 1000).
But the outputs in the example cases above are not.

So, change the update sequence in order to correct this problem.
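For reference, the reader side depends on this ordering. A simplified
sketch (paraphrased from the reader in arch/arm/kernel/sched_clock.c of
this era; not the exact code) is shown below: the loop only accepts the
pair once epoch_cyc and epoch_cyc_copy agree, so the updater must make
them differ for the whole duration of the update.

	/*
	 * Simplified reader sketch (paraphrased): cd holds epoch_cyc,
	 * epoch_ns and epoch_cyc_copy.  The loop retries while an update
	 * is in flight.
	 */
	static void sketch_read_epoch(u32 *cyc, u64 *ns)
	{
		u32 epoch_cyc;
		u64 epoch_ns;

		do {
			epoch_cyc = cd.epoch_cyc;
			smp_rmb();
			epoch_ns = cd.epoch_ns;
			smp_rmb();
		} while (epoch_cyc != cd.epoch_cyc_copy);

		*cyc = epoch_cyc;	/* (epoch_cyc, epoch_ns) is a consistent pair */
		*ns = epoch_ns;
	}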

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/kernel/sched_clock.c b/arch/arm/kernel/sched_clock.c
index fc6692e..bd6f56b 100644
--- a/arch/arm/kernel/sched_clock.c
+++ b/arch/arm/kernel/sched_clock.c
@@ -93,11 +93,11 @@ static void notrace update_sched_clock(void)
 * detectable in cyc_to_fixed_sched_clock().
 */
raw_local_irq_save(flags);
-   cd.epoch_cyc = cyc;
+   cd.epoch_cyc_copy = cyc;
smp_wmb();
cd.epoch_ns = ns;
smp_wmb();
-   cd.epoch_cyc_copy = cyc;
+   cd.epoch_cyc = cyc;
raw_local_irq_restore(flags);
 }
 
-- 
1.7.9.5



[PATCH v6 0/3] introduce static_vm for ARM-specific static mapped area

2013-02-06 Thread Joonsoo Kim
In the current implementation, we use an ARM-specific flag, namely
VM_ARM_STATIC_MAPPING, for distinguishing ARM-specific static mapped areas.
The purpose of a static mapped area is to re-use it when the entire
physical address range of an ioremap request can be covered by that area.

This implementation causes needless overhead in some cases.
For example, assume that there is only one static mapped area and
vmlist has 300 areas. Every time we call ioremap, we check 300 areas to
decide whether one of them matches. Moreover, even if there is
no static mapped area at all and vmlist has 300 areas, every ioremap
call still checks all 300 areas.

If we construct an extra list for static mapped areas, we can eliminate
the above mentioned overhead.
With an extra list, if there is one static mapped area,
we check only that one area and proceed to the next operation quickly.

In fact, it is not a critical problem, because ioremap is not frequently
used. But reducing the overhead is still worthwhile.

Another reason for doing this work is to remove vm_struct list management
entirely. For more information, look at the following link.
http://lkml.org/lkml/2012/12/6/184

Changelog
v5->v6:
  Add Ack-by, Reviewed-by, Tested-by tags
  [3/3]: Change from Nicolas' suggestion
 - remove redundant parenthesis

v4->v5:
  [2/3]: Changes from Nicolas' suggestion
 - don't use separate flags for static_vm
 - remove a lock
 - declare add_static_vm_early() as __init
  [3/3]: Changes from Nicolas' suggestion
 - add / leave comments

v3->v4:
  [2/3]: Changes from Nicolas' suggestion
 - embed static_vm code in ioremap.c
 - simplify struct static_vm
 - remove init_static_vm, instead, add_static_vm_early() init static_vm
 Use generic list for list management of static_vm
 Convert spin_lock to rw_lock
 Modify static_vm's flags bits
  [3/3]: Rework according to [2/3] change
  Rebased on v3.8-rc5

v2->v3:
  coverletter: refer a link related to this work
  [2/3]: drop @flags of find_static_vm_vaddr
  Rebased on v3.8-rc4

v1->v2:
  [2/3]: patch description is improved.
  Rebased on v3.7-rc7

Joonsoo Kim (3):
  ARM: vmregion: remove vmregion code entirely
  ARM: ioremap: introduce an infrastructure for static mapped area
  ARM: mm: use static_vm for managing static mapped areas

 arch/arm/mm/Makefile   |2 +-
 arch/arm/mm/ioremap.c  |  135 +--
 arch/arm/mm/mm.h   |   12 +++
 arch/arm/mm/mmu.c  |   34 
 arch/arm/mm/vmregion.c |  205 
 arch/arm/mm/vmregion.h |   31 
 6 files changed, 123 insertions(+), 296 deletions(-)
 delete mode 100644 arch/arm/mm/vmregion.c
 delete mode 100644 arch/arm/mm/vmregion.h

-- 
1.7.9.5



[PATCH v6 2/3] ARM: ioremap: introduce an infrastructure for static mapped area

2013-02-06 Thread Joonsoo Kim
In the current implementation, we use an ARM-specific flag, namely
VM_ARM_STATIC_MAPPING, for distinguishing ARM-specific static mapped areas.
The purpose of a static mapped area is to re-use it when the entire
physical address range of an ioremap request can be covered by that area.

This implementation causes needless overhead in some cases.
For example, assume that there is only one static mapped area and
vmlist has 300 areas. Every time we call ioremap, we check 300 areas to
decide whether one of them matches. Moreover, even if there is
no static mapped area at all and vmlist has 300 areas, every ioremap
call still checks all 300 areas.

If we construct an extra list for static mapped areas, we can eliminate
the above mentioned overhead.
With an extra list, if there is one static mapped area,
we check only that one area and proceed to the next operation quickly.

In fact, it is not a critical problem, because ioremap is not frequently
used. But reducing the overhead is still worthwhile.

Another reason for doing this work is to remove the architecture dependency
on the vmalloc layer. I think that vmlist and vmlist_lock are internal data
structures of the vmalloc layer. Some code for debugging and statistics
inevitably uses vmlist and vmlist_lock, but it is preferable that they are
used as little as possible outside of vmalloc.c.

Now, I introduce an ARM-specific infrastructure for static mapped areas. In
the following patch, we will use it to resolve the above mentioned problem.

Reviewed-by: Nicolas Pitre n...@linaro.org
Tested-by: Santosh Shilimkar santosh.shilim...@ti.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index 88fd86c..904c15e 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -39,6 +39,70 @@
 #include <asm/mach/pci.h>
 #include "mm.h"
 
+
+LIST_HEAD(static_vmlist);
+
+static struct static_vm *find_static_vm_paddr(phys_addr_t paddr,
+			size_t size, unsigned int mtype)
+{
+	struct static_vm *svm;
+	struct vm_struct *vm;
+
+	list_for_each_entry(svm, &static_vmlist, list) {
+		vm = &svm->vm;
+		if (!(vm->flags & VM_ARM_STATIC_MAPPING))
+			continue;
+		if ((vm->flags & VM_ARM_MTYPE_MASK) != VM_ARM_MTYPE(mtype))
+			continue;
+
+		if (vm->phys_addr > paddr ||
+			paddr + size - 1 > vm->phys_addr + vm->size - 1)
+			continue;
+
+		return svm;
+	}
+
+	return NULL;
+}
+
+struct static_vm *find_static_vm_vaddr(void *vaddr)
+{
+	struct static_vm *svm;
+	struct vm_struct *vm;
+
+	list_for_each_entry(svm, &static_vmlist, list) {
+		vm = &svm->vm;
+
+		/* static_vmlist is ascending order */
+		if (vm->addr > vaddr)
+			break;
+
+		if (vm->addr <= vaddr && vm->addr + vm->size > vaddr)
+			return svm;
+	}
+
+	return NULL;
+}
+
+void __init add_static_vm_early(struct static_vm *svm)
+{
+	struct static_vm *curr_svm;
+	struct vm_struct *vm;
+	void *vaddr;
+
+	vm = &svm->vm;
+	vm_area_add_early(vm);
+	vaddr = vm->addr;
+
+	list_for_each_entry(curr_svm, &static_vmlist, list) {
+		vm = &curr_svm->vm;
+
+		if (vm->addr > vaddr)
+			break;
+	}
+	list_add_tail(&svm->list, &curr_svm->list);
+}
+
 int ioremap_page(unsigned long virt, unsigned long phys,
 const struct mem_type *mtype)
 {
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index a8ee92d..d5a4e9a 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -1,4 +1,6 @@
 #ifdef CONFIG_MMU
+#include <linux/list.h>
+#include <linux/vmalloc.h>
 
 /* the upper-most page table pointer */
 extern pmd_t *top_pmd;
@@ -65,6 +67,16 @@ extern void __flush_dcache_page(struct address_space *mapping, struct page *page
 /* consistent regions used by dma_alloc_attrs() */
 #define VM_ARM_DMA_CONSISTENT	0x20000000
 
+
+struct static_vm {
+   struct vm_struct vm;
+   struct list_head list;
+};
+
+extern struct list_head static_vmlist;
+extern struct static_vm *find_static_vm_vaddr(void *vaddr);
+extern __init void add_static_vm_early(struct static_vm *svm);
+
 #endif
 
 #ifdef CONFIG_ZONE_DMA
-- 
1.7.9.5



[PATCH v6 1/3] ARM: vmregion: remove vmregion code entirely

2013-02-06 Thread Joonsoo Kim
Now, there is no user for vmregion.
So remove it.

Acked-by: Nicolas Pitre n...@linaro.org
Tested-by: Santosh Shilimkar santosh.shilim...@ti.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 8a9c4cb..4e333fa 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -6,7 +6,7 @@ obj-y				:= dma-mapping.o extable.o fault.o init.o \
 				   iomap.o
 
 obj-$(CONFIG_MMU)  += fault-armv.o flush.o idmap.o ioremap.o \
-  mmap.o pgd.o mmu.o vmregion.o
+  mmap.o pgd.o mmu.o
 
 ifneq ($(CONFIG_MMU),y)
 obj-y  += nommu.o
diff --git a/arch/arm/mm/vmregion.c b/arch/arm/mm/vmregion.c
deleted file mode 100644
index a631016..000
diff --git a/arch/arm/mm/vmregion.h b/arch/arm/mm/vmregion.h
deleted file mode 100644
index 0f5a5f2..000
-- 
1.7.9.5



[PATCH v6 3/3] ARM: mm: use static_vm for managing static mapped areas

2013-02-06 Thread Joonsoo Kim
A static mapped area is ARM-specific, so it is better not to use the
generic vmalloc data structures, that is, vmlist and vmlist_lock,
for managing static mapped areas. Doing so also causes some needless
overhead, and reducing this overhead is worthwhile.

Now we have the newly introduced static_vm infrastructure.
With it, we don't need to iterate over all mapped areas. Instead, we just
iterate over the static mapped areas. This helps to reduce the overhead of
finding a matched area. And the architecture dependency on the vmalloc
layer is removed, which will help the maintainability of the vmalloc layer.

Reviewed-by: Nicolas Pitre n...@linaro.org
Acked-by: Rob Herring rob.herr...@calxeda.com
Tested-by: Santosh Shilimkar santosh.shilim...@ti.com
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index 904c15e..04d9006 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -261,13 +261,14 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
const struct mem_type *type;
int err;
unsigned long addr;
-	struct vm_struct * area;
+	struct vm_struct *area;
+	phys_addr_t paddr = __pfn_to_phys(pfn);
 
 #ifndef CONFIG_ARM_LPAE
 	/*
 	 * High mappings must be supersection aligned
 	 */
-	if (pfn >= 0x100000 && (__pfn_to_phys(pfn) & ~SUPERSECTION_MASK))
+	if (pfn >= 0x100000 && (paddr & ~SUPERSECTION_MASK))
 		return NULL;
 #endif
 
@@ -283,24 +284,16 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
/*
 * Try to reuse one of the static mapping whenever possible.
 */
-	read_lock(&vmlist_lock);
-	for (area = vmlist; area; area = area->next) {
-		if (!size || (sizeof(phys_addr_t) == 4 && pfn >= 0x100000))
-			break;
-		if (!(area->flags & VM_ARM_STATIC_MAPPING))
-			continue;
-		if ((area->flags & VM_ARM_MTYPE_MASK) != VM_ARM_MTYPE(mtype))
-			continue;
-		if (__phys_to_pfn(area->phys_addr) > pfn ||
-		    __pfn_to_phys(pfn) + size-1 > area->phys_addr + area->size-1)
-			continue;
-		/* we can drop the lock here as we know *area is static */
-		read_unlock(&vmlist_lock);
-		addr = (unsigned long)area->addr;
-		addr += __pfn_to_phys(pfn) - area->phys_addr;
-		return (void __iomem *) (offset + addr);
+	if (size && !(sizeof(phys_addr_t) == 4 && pfn >= 0x100000)) {
+		struct static_vm *svm;
+
+		svm = find_static_vm_paddr(paddr, size, mtype);
+		if (svm) {
+			addr = (unsigned long)svm->vm.addr;
+			addr += paddr - svm->vm.phys_addr;
+			return (void __iomem *) (offset + addr);
+		}
 	}
-	read_unlock(&vmlist_lock);
 
/*
 * Don't allow RAM to be mapped - this causes problems with ARMv6+
@@ -312,21 +305,21 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
if (!area)
return NULL;
 	addr = (unsigned long)area->addr;
-	area->phys_addr = __pfn_to_phys(pfn);
+	area->phys_addr = paddr;
 
 #if !defined(CONFIG_SMP) && !defined(CONFIG_ARM_LPAE)
 	if (DOMAIN_IO == 0 &&
 	    (((cpu_architecture() >= CPU_ARCH_ARMv6) && (get_cr() & CR_XP)) ||
 	       cpu_is_xsc3()) && pfn >= 0x100000 &&
-	   !((__pfn_to_phys(pfn) | size | addr) & ~SUPERSECTION_MASK)) {
+	   !((paddr | size | addr) & ~SUPERSECTION_MASK)) {
 		area->flags |= VM_ARM_SECTION_MAPPING;
 		err = remap_area_supersections(addr, pfn, size, type);
-	} else if (!((__pfn_to_phys(pfn) | size | addr) & ~PMD_MASK)) {
+	} else if (!((paddr | size | addr) & ~PMD_MASK)) {
 		area->flags |= VM_ARM_SECTION_MAPPING;
 		err = remap_area_sections(addr, pfn, size, type);
 	} else
 #endif
-	err = ioremap_page_range(addr, addr + size, __pfn_to_phys(pfn),
+	err = ioremap_page_range(addr, addr + size, paddr,
 				 __pgprot(type->prot_pte));
 
if (err) {
@@ -410,34 +403,28 @@ __arm_ioremap_exec(unsigned long phys_addr, size_t size, bool cached)
 void __iounmap(volatile void __iomem *io_addr)
 {
 	void *addr = (void *)(PAGE_MASK & (unsigned long)io_addr);
-	struct vm_struct *vm;
+	struct static_vm *svm;
+
+	/* If this is a static mapping, we must leave it alone */
+	svm = find_static_vm_vaddr(addr);
+	if (svm)
+		return;
 
-	read_lock(&vmlist_lock);
-	for (vm = vmlist; vm; vm = vm->next) {
-		if (vm->addr > addr)
-			break;
-		if (!(vm->flags & VM_IOREMAP))
-			continue;
-		/* If this is a static mapping we must leave it alone */
-		if ((vm->flags & VM_ARM_STATIC_MAPPING

Re: [PATCH RESEND] ARM: sched: correct update_sched_clock()

2013-02-06 Thread JoonSoo Kim
Hello, Linus.

2013/2/6 Linus Walleij linus.wall...@linaro.org:
 On Wed, Feb 6, 2013 at 6:21 AM, Joonsoo Kim iamjoonsoo@lge.com wrote:

 If we want load epoch_cyc and epoch_ns atomically,
 we should update epoch_cyc_copy first of all.
 This notify reader that updating is in progress.

 If you think the patch is finished, put it into Russell's patch tracker:
 http://www.arm.linux.org.uk/developer/patches/

Ah... Okay.
Thanks for notifying me.


Re: [PATCH v3 2/3] ARM: static_vm: introduce an infrastructure for static mapped area

2013-01-29 Thread Joonsoo Kim
Hello, Nicolas.

On Tue, Jan 29, 2013 at 07:05:32PM -0500, Nicolas Pitre wrote:
 On Thu, 24 Jan 2013, Joonsoo Kim wrote:
 
  From: Joonsoo Kim js1...@gmail.com
  
  In current implementation, we used ARM-specific flag, that is,
  VM_ARM_STATIC_MAPPING, for distinguishing ARM specific static mapped area.
  The purpose of static mapped area is to re-use static mapped area when
  entire physical address range of the ioremap request can be covered
  by this area.
  
  This implementation causes needless overhead for some cases.
  For example, assume that there is only one static mapped area and
  vmlist has 300 areas. Every time we call ioremap, we check 300 areas for
  deciding whether it is matched or not. Moreover, even if there is
  no static mapped area and vmlist has 300 areas, every time we call
  ioremap, we check 300 areas in now.
  
  If we construct a extra list for static mapped area, we can eliminate
  above mentioned overhead.
  With a extra list, if there is one static mapped area,
  we just check only one area and proceed next operation quickly.
  
  In fact, it is not a critical problem, because ioremap is not frequently
  used. But reducing overhead is better idea.
  
  Another reason for doing this work is for removing architecture dependency
  on vmalloc layer. I think that vmlist and vmlist_lock is internal data
  structure for vmalloc layer. Some codes for debugging and stat inevitably
  use vmlist and vmlist_lock. But it is preferable that they are used
  as least as possible in outside of vmalloc.c
  
  Now, I introduce an ARM-specific infrastructure for static mapped area. In
  the following patch, we will use this and resolve above mentioned problem.
  
  Signed-off-by: Joonsoo Kim js1...@gmail.com
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
 
 First of all, I don't think you really need a new file with a global 
 scope header file.  Given that this code is meant to be used only for 
 ioremap optimization  on ARM, it is probably a better idea to simply put 
 it all into arch/arm/mm/ioremap.c instead.  The only function that needs 
 to be exported out of ioremap.c is insert_static_vm(), and only for the 
 benefit of arch/arm/mm/mmu.c, therefore this function prototype may as 
 well just be added to arch/arm/mm/mm.h.

I agree with your all opinions.
I will re-work and will re-send v4 as soon as possible.
Thanks for review.

 More comments below.
 
  diff --git a/arch/arm/include/asm/mach/static_vm.h 
  b/arch/arm/include/asm/mach/static_vm.h
  new file mode 100644
  index 000..72c8339
  --- /dev/null
  +++ b/arch/arm/include/asm/mach/static_vm.h
  @@ -0,0 +1,45 @@
  +/*
  + * arch/arm/include/asm/mach/static_vm.h
  + *
  + * Copyright (C) 2012 LG Electronics, Joonsoo Kim iamjoonsoo@lge.com
  + *
  + * This program is free software; you can redistribute it and/or modify
  + * it under the terms of the GNU General Public License version 2 as
  + * published by the Free Software Foundation.
  + *
  + * This program is distributed in the hope that it will be useful,
  + * but WITHOUT ANY WARRANTY; without even the implied warranty of
  + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  + * GNU General Public License for more details.
  + *
  + * You should have received a copy of the GNU General Public License
  + * along with this program; if not, write to the Free Software
  + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
  + */
  +
  +#ifndef _ASM_MACH_STATIC_VM_H
  +#define _ASM_MACH_STATIC_VM_H
  +
  +#include linux/types.h
  +#include linux/vmalloc.h
  +
  +struct static_vm {
  +   struct static_vm*next;
  +   void*vaddr;
  +   unsigned long   size;
  +   unsigned long   flags;
  +   phys_addr_t paddr;
  +   const void  *caller;
  +};
 
 Here you're duplicating most of the vm_struct content for no obvious 
 reasons.  Patch #3 even allocates both a vm_struct and a static_vm 
 instance in parallel for each mapping.  Instead, you should consider 
 something like this:
 
 struct static_vm {
   struct static_vm *next;
   struct vm_struct vm;
 };
 
 This way, you only need to allocate one structure:
 
   struct static_vm *svm = early_alloc(...);
   ...
   svm->vm.addr = addr;
   ...
   vm_area_add_early(&svm->vm);
   insert_static_vm(svm);

Yes!
It's good idea.

 And then, it would make sense for the insert_static_vm() to do the 
 vm_area_add_early() call itself as well.

Okay.

 Maybe rename insert_static_vm() to static_vm_area_add_early() to better 
 identify its purpose as well.  It shouldn't be used for any other 
 purpose anyway.

Okay.

  +
  +extern struct static_vm *static_vmlist;
  +extern spinlock_t static_vmlist_lock;
 
 Your patch is providing the proper accessors to manipulate those.  They 
 therefore should not be exported globally.

Okay.

  +
  +extern struct static_vm *find_static_vm_paddr(phys_addr_t paddr

[PATCH v4 1/3] ARM: vmregion: remove vmregion code entirely

2013-01-30 Thread Joonsoo Kim
Now, there is no user for vmregion.
So remove it.

Acked-by: Nicolas Pitre n...@linaro.org
Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 8a9c4cb..4e333fa 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -6,7 +6,7 @@ obj-y				:= dma-mapping.o extable.o fault.o init.o \
 				   iomap.o
 
 obj-$(CONFIG_MMU)  += fault-armv.o flush.o idmap.o ioremap.o \
-  mmap.o pgd.o mmu.o vmregion.o
+  mmap.o pgd.o mmu.o
 
 ifneq ($(CONFIG_MMU),y)
 obj-y  += nommu.o
diff --git a/arch/arm/mm/vmregion.c b/arch/arm/mm/vmregion.c
deleted file mode 100644
index a631016..000
--- a/arch/arm/mm/vmregion.c
+++ /dev/null
@@ -1,205 +0,0 @@
-#include <linux/fs.h>
-#include <linux/spinlock.h>
-#include <linux/list.h>
-#include <linux/proc_fs.h>
-#include <linux/seq_file.h>
-#include <linux/slab.h>
-
-#include "vmregion.h"
-
-/*
- * VM region handling support.
- *
- * This should become something generic, handling VM region allocations for
- * vmalloc and similar (ioremap, module space, etc).
- *
- * I envisage vmalloc()'s supporting vm_struct becoming:
- *
- *  struct vm_struct {
- *    struct vmregion	region;
- *    unsigned long	flags;
- *    struct page	**pages;
- *    unsigned int	nr_pages;
- *    unsigned long	phys_addr;
- *  };
- *
- * get_vm_area() would then call vmregion_alloc with an appropriate
- * struct vmregion head (eg):
- *
- *  struct vmregion vmalloc_head = {
- *	.vm_list	= LIST_HEAD_INIT(vmalloc_head.vm_list),
- *	.vm_start	= VMALLOC_START,
- *	.vm_end		= VMALLOC_END,
- *  };
- *
- * However, vmalloc_head.vm_start is variable (typically, it is dependent on
- * the amount of RAM found at boot time.)  I would imagine that get_vm_area()
- * would have to initialise this each time prior to calling vmregion_alloc().
- */
-
-struct arm_vmregion *
-arm_vmregion_alloc(struct arm_vmregion_head *head, size_t align,
-		   size_t size, gfp_t gfp, const void *caller)
-{
-	unsigned long start = head->vm_start, addr = head->vm_end;
-	unsigned long flags;
-	struct arm_vmregion *c, *new;
-
-	if (head->vm_end - head->vm_start < size) {
-		printk(KERN_WARNING "%s: allocation too big (requested %#x)\n",
-			__func__, size);
-		goto out;
-	}
-
-	new = kmalloc(sizeof(struct arm_vmregion), gfp);
-	if (!new)
-		goto out;
-
-	new->caller = caller;
-
-	spin_lock_irqsave(&head->vm_lock, flags);
-
-	addr = rounddown(addr - size, align);
-	list_for_each_entry_reverse(c, &head->vm_list, vm_list) {
-		if (addr >= c->vm_end)
-			goto found;
-		addr = rounddown(c->vm_start - size, align);
-		if (addr < start)
-			goto nospc;
-	}
-
- found:
-	/*
-	 * Insert this entry after the one we found.
-	 */
-	list_add(&new->vm_list, &c->vm_list);
-	new->vm_start = addr;
-	new->vm_end = addr + size;
-	new->vm_active = 1;
-
-	spin_unlock_irqrestore(&head->vm_lock, flags);
-	return new;
-
- nospc:
-	spin_unlock_irqrestore(&head->vm_lock, flags);
-	kfree(new);
- out:
-	return NULL;
-}
-
-static struct arm_vmregion *__arm_vmregion_find(struct arm_vmregion_head *head, unsigned long addr)
-{
-	struct arm_vmregion *c;
-
-	list_for_each_entry(c, &head->vm_list, vm_list) {
-		if (c->vm_active && c->vm_start == addr)
-			goto out;
-	}
-	c = NULL;
- out:
-	return c;
-}
-
-struct arm_vmregion *arm_vmregion_find(struct arm_vmregion_head *head, unsigned long addr)
-{
-	struct arm_vmregion *c;
-	unsigned long flags;
-
-	spin_lock_irqsave(&head->vm_lock, flags);
-	c = __arm_vmregion_find(head, addr);
-	spin_unlock_irqrestore(&head->vm_lock, flags);
-	return c;
-}
-
-struct arm_vmregion *arm_vmregion_find_remove(struct arm_vmregion_head *head, unsigned long addr)
-{
-	struct arm_vmregion *c;
-	unsigned long flags;
-
-	spin_lock_irqsave(&head->vm_lock, flags);
-	c = __arm_vmregion_find(head, addr);
-	if (c)
-		c->vm_active = 0;
-	spin_unlock_irqrestore(&head->vm_lock, flags);
-	return c;
-}
-
-void arm_vmregion_free(struct arm_vmregion_head *head, struct arm_vmregion *c)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&head->vm_lock, flags);
-	list_del(&c->vm_list);
-	spin_unlock_irqrestore(&head->vm_lock, flags);
-
-	kfree(c);
-}
-
-#ifdef CONFIG_PROC_FS
-static int arm_vmregion_show(struct seq_file *m, void *p)
-{
-	struct arm_vmregion *c = list_entry(p, struct arm_vmregion, vm_list);
-
-	seq_printf(m, "0x%08lx-0x%08lx %7lu", c->vm_start, c->vm_end,
-		c->vm_end - c

[PATCH v4 2/3] ARM: ioremap: introduce an infrastructure for static mapped area

2013-01-30 Thread Joonsoo Kim
In the current implementation, we use an ARM-specific flag, namely
VM_ARM_STATIC_MAPPING, for distinguishing ARM-specific static mapped areas.
The purpose of a static mapped area is to re-use it when the entire
physical address range of an ioremap request can be covered by that area.

This implementation causes needless overhead in some cases.
For example, assume that there is only one static mapped area and
vmlist has 300 areas. Every time we call ioremap, we check 300 areas to
decide whether one of them matches. Moreover, even if there is
no static mapped area at all and vmlist has 300 areas, every ioremap
call still checks all 300 areas.

If we construct an extra list for static mapped areas, we can eliminate
the above mentioned overhead.
With an extra list, if there is one static mapped area,
we check only that one area and proceed to the next operation quickly.

In fact, it is not a critical problem, because ioremap is not frequently
used. But reducing the overhead is still worthwhile.

Another reason for doing this work is to remove the architecture dependency
on the vmalloc layer. I think that vmlist and vmlist_lock are internal data
structures of the vmalloc layer. Some code for debugging and statistics
inevitably uses vmlist and vmlist_lock, but it is preferable that they are
used as little as possible outside of vmalloc.c.

Now, I introduce an ARM-specific infrastructure for static mapped areas. In
the following patch, we will use it to resolve the above mentioned problem.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index 88fd86c..ceb34ae 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -39,6 +39,78 @@
 #include <asm/mach/pci.h>
 #include "mm.h"
 
+
+LIST_HEAD(static_vmlist);
+static DEFINE_RWLOCK(static_vmlist_lock);
+
+static struct static_vm *find_static_vm_paddr(phys_addr_t paddr,
+			size_t size, unsigned long flags)
+{
+	struct static_vm *svm;
+	struct vm_struct *vm;
+
+	read_lock(&static_vmlist_lock);
+	list_for_each_entry(svm, &static_vmlist, list) {
+		if (svm->flags != flags)
+			continue;
+
+		vm = &svm->vm;
+		if (vm->phys_addr > paddr ||
+			paddr + size - 1 > vm->phys_addr + vm->size - 1)
+			continue;
+
+		read_unlock(&static_vmlist_lock);
+		return svm;
+	}
+
+	return NULL;
+}
+
+struct static_vm *find_static_vm_vaddr(void *vaddr)
+{
+	struct static_vm *svm;
+	struct vm_struct *vm;
+
+	read_lock(&static_vmlist_lock);
+	list_for_each_entry(svm, &static_vmlist, list) {
+		vm = &svm->vm;
+
+		/* static_vmlist is ascending order */
+		if (vm->addr > vaddr)
+			break;
+
+		if (vm->addr <= vaddr && vm->addr + vm->size > vaddr) {
+			read_unlock(&static_vmlist_lock);
+			return svm;
+		}
+	}
+	read_unlock(&static_vmlist_lock);
+
+	return NULL;
+}
+
+void add_static_vm_early(struct static_vm *svm, unsigned long flags)
+{
+	struct static_vm *curr_svm;
+	struct vm_struct *vm;
+	void *vaddr;
+
+	vm_area_add_early(&svm->vm);
+
+	vaddr = svm->vm.addr;
+	svm->flags = flags;
+
+	write_lock(&static_vmlist_lock);
+	list_for_each_entry(curr_svm, &static_vmlist, list) {
+		vm = &curr_svm->vm;
+
+		if (vm->addr > vaddr)
+			break;
+	}
+	list_add_tail(&svm->list, &curr_svm->list);
+	write_unlock(&static_vmlist_lock);
+}
+
 int ioremap_page(unsigned long virt, unsigned long phys,
 const struct mem_type *mtype)
 {
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index a8ee92d..fb45c79 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -1,4 +1,6 @@
 #ifdef CONFIG_MMU
+#include <linux/list.h>
+#include <linux/vmalloc.h>
 
 /* the upper-most page table pointer */
 extern pmd_t *top_pmd;
@@ -65,6 +67,24 @@ extern void __flush_dcache_page(struct address_space *mapping, struct page *page
 /* consistent regions used by dma_alloc_attrs() */
 #define VM_ARM_DMA_CONSISTENT	0x20000000
 
+
+/* ARM specific static_vm->flags bits */
+#define STATIC_VM_MEM	0x0001
+#define STATIC_VM_EMPTY	0x0002
+#define STATIC_VM_MTYPE(mtype)	((mtype) << 20)
+
+#define STATIC_VM_TYPE(type, mtype) (type | STATIC_VM_MTYPE(mtype))
+
+struct static_vm {
+   struct vm_struct vm;
+   struct list_head list;
+   unsigned long flags;
+};
+
+extern struct list_head static_vmlist;
+extern struct static_vm *find_static_vm_vaddr(void *vaddr);
+extern void add_static_vm_early(struct static_vm *svm, unsigned long flags);
+
 #endif
 
 #ifdef CONFIG_ZONE_DMA
-- 
1.7.9.5


[PATCH v4 0/3] introduce static_vm for ARM-specific static mapped area

2013-01-30 Thread Joonsoo Kim
In the current implementation, we use an ARM-specific flag, namely
VM_ARM_STATIC_MAPPING, for distinguishing ARM-specific static mapped areas.
The purpose of a static mapped area is to re-use it when the entire
physical address range of an ioremap request can be covered by that area.

This implementation causes needless overhead in some cases.
For example, assume that there is only one static mapped area and
vmlist has 300 areas. Every time we call ioremap, we check 300 areas to
decide whether one of them matches. Moreover, even if there is
no static mapped area at all and vmlist has 300 areas, every ioremap
call still checks all 300 areas.

If we construct an extra list for static mapped areas, we can eliminate
the above mentioned overhead.
With an extra list, if there is one static mapped area,
we check only that one area and proceed to the next operation quickly.

In fact, it is not a critical problem, because ioremap is not frequently
used. But reducing the overhead is still worthwhile.

Another reason for doing this work is to remove vm_struct list management
entirely. For more information, look at the following link.
http://lkml.org/lkml/2012/12/6/184

Changelog
v3->v4:
  [2/3]: Changes from Nicolas' suggestion
 - embed static_vm code in ioremap.c
 - simplify struct static_vm
 - remove init_static_vm, instead, add_static_vm_early() init static_vm
 Use generic list for list management of static_vm
 Convert spin_lock to rw_lock
 Modify static_vm's flags bits
  [3/3]: Rework according to [2/3] change
  Rebased on v3.8-rc5

v2->v3:
  coverletter: refer a link related to this work
  [2/3]: drop @flags of find_static_vm_vaddr
  Rebased on v3.8-rc4

v1->v2:
  [2/3]: patch description is improved.
  Rebased on v3.7-rc7

Joonsoo Kim (3):
  ARM: vmregion: remove vmregion code entirely
  ARM: ioremap: introduce an infrastructure for static mapped area
  ARM: mm: use static_vm for managing static mapped areas

 arch/arm/mm/Makefile   |2 +-
 arch/arm/mm/ioremap.c  |  147 +++---
 arch/arm/mm/mm.h   |   28 ---
 arch/arm/mm/mmu.c  |   47 ++-
 arch/arm/mm/vmregion.c |  205 
 arch/arm/mm/vmregion.h |   31 
 6 files changed, 147 insertions(+), 313 deletions(-)
 delete mode 100644 arch/arm/mm/vmregion.c
 delete mode 100644 arch/arm/mm/vmregion.h

-- 
1.7.9.5



[PATCH v4 3/3] ARM: mm: use static_vm for managing static mapped areas

2013-01-30 Thread Joonsoo Kim
A static mapped area is ARM-specific, so it is better not to use the
generic vmalloc data structures, that is, vmlist and vmlist_lock,
for managing static mapped areas. Doing so also causes some needless
overhead, and reducing this overhead is worthwhile.

Now we have the newly introduced static_vm infrastructure.
With it, we don't need to iterate over all mapped areas. Instead, we just
iterate over the static mapped areas. This helps to reduce the overhead of
finding a matched area. And the architecture dependency on the vmalloc
layer is removed, which will help the maintainability of the vmalloc layer.

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index ceb34ae..7fe5b48 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -269,13 +269,14 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
const struct mem_type *type;
int err;
unsigned long addr;
-   struct vm_struct * area;
+   struct vm_struct *area;
+   phys_addr_t paddr = __pfn_to_phys(pfn);
 
 #ifndef CONFIG_ARM_LPAE
/*
 * High mappings must be supersection aligned
 */
-	if (pfn >= 0x100000 && (__pfn_to_phys(pfn) & ~SUPERSECTION_MASK))
+	if (pfn >= 0x100000 && (paddr & ~SUPERSECTION_MASK))
return NULL;
 #endif
 
@@ -291,24 +292,17 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
/*
 * Try to reuse one of the static mapping whenever possible.
 */
-	read_lock(&vmlist_lock);
-	for (area = vmlist; area; area = area->next) {
-		if (!size || (sizeof(phys_addr_t) == 4 && pfn >= 0x100000))
-			break;
-		if (!(area->flags & VM_ARM_STATIC_MAPPING))
-			continue;
-		if ((area->flags & VM_ARM_MTYPE_MASK) != VM_ARM_MTYPE(mtype))
-			continue;
-		if (__phys_to_pfn(area->phys_addr) > pfn ||
-		    __pfn_to_phys(pfn) + size-1 > area->phys_addr + area->size-1)
-			continue;
-		/* we can drop the lock here as we know *area is static */
-		read_unlock(&vmlist_lock);
-		addr = (unsigned long)area->addr;
-		addr += __pfn_to_phys(pfn) - area->phys_addr;
-		return (void __iomem *) (offset + addr);
+	if (size && !((sizeof(phys_addr_t) == 4 && pfn >= 0x100000))) {
+		struct static_vm *svm;
+
+		svm = find_static_vm_paddr(paddr, size,
+				STATIC_VM_TYPE(STATIC_VM_MEM, mtype));
+		if (svm) {
+			addr = (unsigned long)svm->vm.addr;
+			addr += paddr - svm->vm.phys_addr;
+			return (void __iomem *) (offset + addr);
+		}
 	}
-	read_unlock(&vmlist_lock);
 
/*
 * Don't allow RAM to be mapped - this causes problems with ARMv6+
@@ -320,21 +314,21 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
if (!area)
return NULL;
 	addr = (unsigned long)area->addr;
-	area->phys_addr = __pfn_to_phys(pfn);
+	area->phys_addr = paddr;
 
 #if !defined(CONFIG_SMP) && !defined(CONFIG_ARM_LPAE)
 	if (DOMAIN_IO == 0 &&
 	    (((cpu_architecture() >= CPU_ARCH_ARMv6) && (get_cr() & CR_XP)) ||
 	       cpu_is_xsc3()) && pfn >= 0x100000 &&
-	       !((__pfn_to_phys(pfn) | size | addr) & ~SUPERSECTION_MASK)) {
+	       !((paddr | size | addr) & ~SUPERSECTION_MASK)) {
 		area->flags |= VM_ARM_SECTION_MAPPING;
 		err = remap_area_supersections(addr, pfn, size, type);
-	} else if (!((__pfn_to_phys(pfn) | size | addr) & ~PMD_MASK)) {
+	} else if (!((paddr | size | addr) & ~PMD_MASK)) {
 		area->flags |= VM_ARM_SECTION_MAPPING;
 		err = remap_area_sections(addr, pfn, size, type);
 	} else
 #endif
-	err = ioremap_page_range(addr, addr + size, __pfn_to_phys(pfn),
+	err = ioremap_page_range(addr, addr + size, paddr,
 				 __pgprot(type->prot_pte));
 
if (err) {
@@ -418,34 +412,21 @@ __arm_ioremap_exec(unsigned long phys_addr, size_t size, 
bool cached)
 void __iounmap(volatile void __iomem *io_addr)
 {
 	void *addr = (void *)(PAGE_MASK & (unsigned long)io_addr);
-	struct vm_struct *vm;
+	struct static_vm *svm;
+
+	svm = find_static_vm_vaddr(addr);
+	if (svm)
+		return;
 
-	read_lock(&vmlist_lock);
-	for (vm = vmlist; vm; vm = vm->next) {
-		if (vm->addr > addr)
-			break;
-		if (!(vm->flags & VM_IOREMAP))
-			continue;
-		/* If this is a static mapping we must leave it alone */
-		if ((vm->flags & VM_ARM_STATIC_MAPPING) &&
-		    (vm->addr <= addr) && (vm->addr + vm->size > addr)) {
-   read_unlock(vmlist_lock

Re: [PATCH v4 3/3] ARM: mm: use static_vm for managing static mapped areas

2013-02-01 Thread JoonSoo Kim
2013/2/1 Nicolas Pitre nicolas.pi...@linaro.org:
 On Thu, 31 Jan 2013, Joonsoo Kim wrote:

 A static mapped area is ARM-specific, so it is better not to use
 generic vmalloc data structure, that is, vmlist and vmlist_lock
 for managing static mapped area. And it causes some needless overhead and
 reducing this overhead is better idea.

 Now, we have newly introduced static_vm infrastructure.
 With it, we don't need to iterate all mapped areas. Instead, we just
 iterate static mapped areas. It helps to reduce an overhead of finding
 matched area. And architecture dependency on vmalloc layer is removed,
 so it will help to maintainability for vmalloc layer.

 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

 Comments below.


 diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
 index ceb34ae..7fe5b48 100644
 --- a/arch/arm/mm/ioremap.c
 +++ b/arch/arm/mm/ioremap.c
 @@ -269,13 +269,14 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long 
 pfn,
   const struct mem_type *type;
   int err;
   unsigned long addr;
 - struct vm_struct * area;
 + struct vm_struct *area;
 + phys_addr_t paddr = __pfn_to_phys(pfn);

  #ifndef CONFIG_ARM_LPAE
   /*
* High mappings must be supersection aligned
*/
 - if (pfn = 0x10  (__pfn_to_phys(pfn)  ~SUPERSECTION_MASK))
 + if (pfn = 0x10  (paddr  ~SUPERSECTION_MASK))
   return NULL;
  #endif

 @@ -291,24 +292,17 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long 
 pfn,
   /*
* Try to reuse one of the static mapping whenever possible.
*/
 - read_lock(vmlist_lock);
 - for (area = vmlist; area; area = area-next) {
 - if (!size || (sizeof(phys_addr_t) == 4  pfn = 0x10))
 - break;
 - if (!(area-flags  VM_ARM_STATIC_MAPPING))
 - continue;
 - if ((area-flags  VM_ARM_MTYPE_MASK) != VM_ARM_MTYPE(mtype))
 - continue;
 - if (__phys_to_pfn(area-phys_addr)  pfn ||
 - __pfn_to_phys(pfn) + size-1  area-phys_addr + 
 area-size-1)
 - continue;
 - /* we can drop the lock here as we know *area is static */
 - read_unlock(vmlist_lock);
 - addr = (unsigned long)area-addr;
 - addr += __pfn_to_phys(pfn) - area-phys_addr;
 - return (void __iomem *) (offset + addr);
 + if (size  !((sizeof(phys_addr_t) == 4  pfn = 0x10))) {
 + struct static_vm *svm;
 +
 + svm = find_static_vm_paddr(paddr, size,
 + STATIC_VM_TYPE(STATIC_VM_MEM, mtype));
 + if (svm) {
 + addr = (unsigned long)svm-vm.addr;
 + addr += paddr - svm-vm.phys_addr;
 + return (void __iomem *) (offset + addr);
 + }
   }
 - read_unlock(vmlist_lock);

   /*
* Don't allow RAM to be mapped - this causes problems with ARMv6+
 @@ -320,21 +314,21 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long 
 pfn,
   if (!area)
   return NULL;
   addr = (unsigned long)area-addr;
 - area-phys_addr = __pfn_to_phys(pfn);
 + area-phys_addr = paddr;

  #if !defined(CONFIG_SMP)  !defined(CONFIG_ARM_LPAE)
   if (DOMAIN_IO == 0 
   (((cpu_architecture() = CPU_ARCH_ARMv6)  (get_cr()  CR_XP)) ||
  cpu_is_xsc3())  pfn = 0x10 
 -!((__pfn_to_phys(pfn) | size | addr)  ~SUPERSECTION_MASK)) {
 +!((paddr | size | addr)  ~SUPERSECTION_MASK)) {
   area-flags |= VM_ARM_SECTION_MAPPING;
   err = remap_area_supersections(addr, pfn, size, type);
 - } else if (!((__pfn_to_phys(pfn) | size | addr)  ~PMD_MASK)) {
 + } else if (!((paddr | size | addr)  ~PMD_MASK)) {
   area-flags |= VM_ARM_SECTION_MAPPING;
   err = remap_area_sections(addr, pfn, size, type);
   } else
  #endif
 - err = ioremap_page_range(addr, addr + size, __pfn_to_phys(pfn),
 + err = ioremap_page_range(addr, addr + size, paddr,
__pgprot(type-prot_pte));

   if (err) {
 @@ -418,34 +412,21 @@ __arm_ioremap_exec(unsigned long phys_addr, size_t 
 size, bool cached)
  void __iounmap(volatile void __iomem *io_addr)
  {
   void *addr = (void *)(PAGE_MASK  (unsigned long)io_addr);
 - struct vm_struct *vm;
 + struct static_vm *svm;
 +

 You could salvage the /* If this is a static mapping we must leave it
 alone */ comment here.

Okay.

 + svm = find_static_vm_vaddr(addr);
 + if (svm)
 + return;

 - read_lock(vmlist_lock);
 - for (vm = vmlist; vm; vm = vm-next) {
 - if (vm-addr  addr)
 - break;
 - if (!(vm-flags  VM_IOREMAP))
 - continue;
 - /* If this is a static mapping we must leave it alone */
 - if ((vm-flags

Re: [PATCH v4 2/3] ARM: ioremap: introduce an infrastructure for static mapped area

2013-02-01 Thread JoonSoo Kim
Hello, Nicolas.

2013/2/1 Nicolas Pitre nicolas.pi...@linaro.org:
 On Thu, 31 Jan 2013, Joonsoo Kim wrote:

 In current implementation, we used ARM-specific flag, that is,
 VM_ARM_STATIC_MAPPING, for distinguishing ARM specific static mapped area.
 The purpose of static mapped area is to re-use static mapped area when
 entire physical address range of the ioremap request can be covered
 by this area.

 This implementation causes needless overhead for some cases.
 For example, assume that there is only one static mapped area and
 vmlist has 300 areas. Every time we call ioremap, we check 300 areas for
 deciding whether it is matched or not. Moreover, even if there is
 no static mapped area and vmlist has 300 areas, every time we call
 ioremap, we check 300 areas in now.

 If we construct a extra list for static mapped area, we can eliminate
 above mentioned overhead.
 With a extra list, if there is one static mapped area,
 we just check only one area and proceed next operation quickly.

 In fact, it is not a critical problem, because ioremap is not frequently
 used. But reducing overhead is better idea.

 Another reason for doing this work is for removing architecture dependency
 on vmalloc layer. I think that vmlist and vmlist_lock is internal data
 structure for vmalloc layer. Some codes for debugging and stat inevitably
 use vmlist and vmlist_lock. But it is preferable that they are used
 as least as possible in outside of vmalloc.c

 Now, I introduce an ARM-specific infrastructure for static mapped area. In
 the following patch, we will use this and resolve above mentioned problem.

 Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

 Much better.  Comments below.

Thanks.

 diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
 index 88fd86c..ceb34ae 100644
 --- a/arch/arm/mm/ioremap.c
 +++ b/arch/arm/mm/ioremap.c
 @@ -39,6 +39,78 @@
  #include asm/mach/pci.h
  #include mm.h

 +
 +LIST_HEAD(static_vmlist);
 +static DEFINE_RWLOCK(static_vmlist_lock);

 In fact you don't need a lock at all.  The only writer is
 add_static_vm_early() and we know it is only used during boot when the
 kernel is still single-threaded.

Yes!

 +
 +static struct static_vm *find_static_vm_paddr(phys_addr_t paddr,
 + size_t size, unsigned long flags)
 +{
 + struct static_vm *svm;
 + struct vm_struct *vm;
 +
 + read_lock(&static_vmlist_lock);
 + list_for_each_entry(svm, &static_vmlist, list) {
 + if (svm->flags != flags)
 + continue;
 +
 + vm = &svm->vm;
 + if (vm->phys_addr > paddr ||
 + paddr + size - 1 > vm->phys_addr + vm->size - 1)
 + continue;
 +
 + read_unlock(&static_vmlist_lock);
 + return svm;
 + }
 +
 + return NULL;
 +}
 +
 +struct static_vm *find_static_vm_vaddr(void *vaddr)
 +{
 + struct static_vm *svm;
 + struct vm_struct *vm;
 +
 + read_lock(&static_vmlist_lock);
 + list_for_each_entry(svm, &static_vmlist, list) {
 + vm = &svm->vm;
 +
 + /* static_vmlist is ascending order */
 + if (vm->addr > vaddr)
 + break;
 +
 + if (vm->addr <= vaddr && vm->addr + vm->size > vaddr) {
 + read_unlock(&static_vmlist_lock);
 + return svm;
 + }
 + }
 + read_unlock(&static_vmlist_lock);
 +
 + return NULL;
 +}
 +
 +void add_static_vm_early(struct static_vm *svm, unsigned long flags)

 This should be marked with __init.  This way, it is less likely to be
 used after boot, especially with no locking.  And vm_area_add_early() is
 valid only if !vmap_initialized anyway, and also __init.

Okay.

 +{
 + struct static_vm *curr_svm;
 + struct vm_struct *vm;
 + void *vaddr;
 +
 + vm_area_add_early(&svm->vm);
 +
 + vaddr = svm->vm.addr;
 + svm->flags = flags;
 +
 + write_lock(&static_vmlist_lock);
 + list_for_each_entry(curr_svm, &static_vmlist, list) {
 + vm = &curr_svm->vm;
 +
 + if (vm->addr > vaddr)
 + break;
 + }
 + list_add_tail(&svm->list, &curr_svm->list);
 + write_unlock(&static_vmlist_lock);
 +}
 +
  int ioremap_page(unsigned long virt, unsigned long phys,
const struct mem_type *mtype)
  {
 diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
 index a8ee92d..fb45c79 100644
 --- a/arch/arm/mm/mm.h
 +++ b/arch/arm/mm/mm.h
 @@ -1,4 +1,6 @@
  #ifdef CONFIG_MMU
 +#include linux/list.h
 +#include linux/vmalloc.h

  /* the upper-most page table pointer */
  extern pmd_t *top_pmd;
 @@ -65,6 +67,24 @@ extern void __flush_dcache_page(struct address_space 
 *mapping, struct page *page
  /* consistent regions used by dma_alloc_attrs() */
  #define VM_ARM_DMA_CONSISTENT0x2000

 +
 +/* ARM specific static_vm->flags bits */
 +#define STATIC_VM_MEM	0x0001
 +#define STATIC_VM_EMPTY  0x0002
 +#define STATIC_VM_MTYPE(mtype)   ((mtype

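For reference, folding both review suggestions above (drop the rwlock and mark
the function __init) into the helper would give roughly the following sketch;
this is an illustration, not the final patch:

void __init add_static_vm_early(struct static_vm *svm, unsigned long flags)
{
	struct static_vm *curr_svm;
	struct vm_struct *vm;
	void *vaddr;

	/* register the embedded vm_struct with the early vmalloc code */
	vm_area_add_early(&svm->vm);

	vaddr = svm->vm.addr;
	svm->flags = flags;

	/*
	 * Keep static_vmlist sorted by virtual address.  No locking is
	 * needed: this runs only during early, single-threaded boot.
	 */
	list_for_each_entry(curr_svm, &static_vmlist, list) {
		vm = &curr_svm->vm;
		if (vm->addr > vaddr)
			break;
	}
	list_add_tail(&svm->list, &curr_svm->list);
}
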
Re: [PATCH] x86, reboot: skip reboot_fixups in early boot phase

2013-01-24 Thread Joonsoo Kim
Hello, Bjorn.

On Thu, Jan 24, 2013 at 10:45:13AM -0700, Bjorn Helgaas wrote:
 On Fri, Dec 28, 2012 at 6:50 AM, Joonsoo Kim js1...@gmail.com wrote:
  During the early boot phase, the PCI bus subsystem is not yet initialized.
  If a panic occurs in the early boot phase and panic_timeout is set, the
  code flow goes into emergency_restart() and hits mach_reboot_fixups(), then
  encounters another panic. On the second panic, we can't take the panic_lock,
  so the code flow goes into panic_smp_self_stop(), which prevents the system
  from restarting.
 
  To avoid the second panic, skip the reboot_fixups in the early boot phase.
  This makes panic_timeout work in the early boot phase.
 
  Cc: Thomas Gleixner t...@linutronix.de
  Cc: Ingo Molnar mi...@redhat.com
  Cc: H. Peter Anvin h...@zytor.com
  Signed-off-by: Joonsoo Kim js1...@gmail.com
 
  diff --git a/arch/x86/kernel/reboot_fixups_32.c 
  b/arch/x86/kernel/reboot_fixups_32.c
  index c8e41e9..b9b8ec9 100644
  --- a/arch/x86/kernel/reboot_fixups_32.c
  +++ b/arch/x86/kernel/reboot_fixups_32.c
  @@ -89,6 +89,10 @@ void mach_reboot_fixups(void)
  if (in_interrupt())
  return;
 
  +   /* During early boot phase, PCI is not yet initialized */
  +   if (system_state == SYSTEM_BOOTING)
  +   return;
  +
   for (i=0; i < ARRAY_SIZE(fixups_table); i++) {
   cur = &(fixups_table[i]);
   dev = pci_get_device(cur->vendor, cur->device, NULL);
 
 I guess you're saying that if we call pci_get_device() too early, it
 panics?  Did you figure out why that happens?
 
 If we call pci_get_device() before PCI has been initialized, it would
 be good if it just returned NULL, indicating that we didn't find any
 matching device.  I looked briefly, and I thought that's what would
 happen, but apparently I'm missing something.

In bus_find_device(), klist_iter_init_node() is called with
&bus->p->klist_devices. Before initialization, bus->p is NULL,
so a panic occurs.
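
One way to make such early callers harmless would be a guard in the driver
core along these lines (a hypothetical sketch, not the actual patch that was
snipped above):

struct device *bus_find_device(struct bus_type *bus, struct device *start,
			       void *data,
			       int (*match)(struct device *dev, void *data))
{
	struct klist_iter i;
	struct device *dev;

	/* hypothetical guard: the bus may not be registered yet during
	 * early boot, in which case bus->p is still NULL */
	if (!bus || !bus->p)
		return NULL;

	klist_iter_init_node(&bus->p->klist_devices, &i,
			     (start ? &start->p->knode_bus : NULL));
	/* next_device() here stands for the existing iterator helper
	 * in drivers/base/bus.c */
	while ((dev = next_device(&i)))
		if (match(dev, data) && get_device(dev))
			break;
	klist_iter_exit(&i);
	return dev;
}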

 Bjorn


Re: [PATCH] x86, reboot: skip reboot_fixups in early boot phase

2013-01-28 Thread Joonsoo Kim
On Thu, Jan 24, 2013 at 08:52:32PM -0800, Greg Kroah-Hartman wrote:
 On Thu, Jan 24, 2013 at 09:21:52PM -0700, Bjorn Helgaas wrote:
  On Thu, Jan 24, 2013 at 9:14 PM, Greg Kroah-Hartman
  gre...@linuxfoundation.org wrote:
   On Thu, Jan 24, 2013 at 07:59:01PM -0700, Bjorn Helgaas wrote:
   [+cc Greg for driver core]
  
   On Fri, Jan 25, 2013 at 10:13:03AM +0900, Joonsoo Kim wrote:
Hello, Bjorn.
   
On Thu, Jan 24, 2013 at 10:45:13AM -0700, Bjorn Helgaas wrote:
 On Fri, Dec 28, 2012 at 6:50 AM, Joonsoo Kim js1...@gmail.com 
 wrote:
  During early boot phase, PCI bus subsystem is not yet initialized.
  If panic is occured in early boot phase and panic_timeout is set,
  code flow go into emergency_restart() and hit 
  mach_reboot_fixups(), then
  encounter another panic. When second panic, we can't hold a 
  panic_lock, so
  code flow go into panic_smp_self_stop() which prevent system to 
  restart.
 
  For avoid second panic, skip reboot_fixups in early boot phase.
  It makes panic_timeout works in early boot phase.
 
  Cc: Thomas Gleixner t...@linutronix.de
  Cc: Ingo Molnar mi...@redhat.com
  Cc: H. Peter Anvin h...@zytor.com
  Signed-off-by: Joonsoo Kim js1...@gmail.com
 
  diff --git a/arch/x86/kernel/reboot_fixups_32.c 
  b/arch/x86/kernel/reboot_fixups_32.c
  index c8e41e9..b9b8ec9 100644
  --- a/arch/x86/kernel/reboot_fixups_32.c
  +++ b/arch/x86/kernel/reboot_fixups_32.c
  @@ -89,6 +89,10 @@ void mach_reboot_fixups(void)
  if (in_interrupt())
  return;
 
  +   /* During early boot phase, PCI is not yet initialized */
  +   if (system_state == SYSTEM_BOOTING)
  +   return;
  +
  for (i=0; i  ARRAY_SIZE(fixups_table); i++) {
  cur = (fixups_table[i]);
  dev = pci_get_device(cur-vendor, cur-device, 
  NULL);

 I guess you're saying that if we call pci_get_device() too early, it
 panics?  Did you figure out why that happens?

 If we call pci_get_device() before PCI has been initialized, it would
 be good if it just returned NULL, indicating that we didn't find any
 matching device.  I looked briefly, and I thought that's what would
 happen, but apparently I'm missing something.
   
In bus_find_device(), klist_iter_init_node() is called with
@bus-p-klist_device. Before initialization, bus-p is NULL,
so panic is occured.
  
   I see.  pci_bus_type.p is initialized by __bus_register() in this path:
  
 pci_driver_init# postcore_initcall
   bus_register(pci_bus_type)
 __bus_register
   priv = kzalloc(sizeof(struct subsys_private))
 bus-p = priv
 klist_init(priv-klist_devices, klist_devices_get, 
   klist_devices_put)
  
   I was hoping we could statically initialize the klist, but that doesn't
   seem likely.
  
   But I wonder if we could do something like the following.  If we could,
   then callers wouldn't have to worry about whether or not the bus has been
   initialized.
  
   snip
  
   I have no objection to that patch, but really, someone wants to call
   pci_find_device() before PCI is initialized?  Can't that be fixed
   instead, as that is the root problem, not the driver core.
  
   But, to paper over your subsystem's bugs, I guess I can take it :)
  
  The caller is in the native_machine_emergency_restart() path.
  Joonsoo's original patch does what I think you're suggesting:
  
  +   /* During early boot phase, PCI is not yet initialized */
  +   if (system_state == SYSTEM_BOOTING)
  +   return;
  +
  for (i=0; i  ARRAY_SIZE(fixups_table); i++) {
  cur = (fixups_table[i]);
  dev = pci_get_device(cur-vendor, cur-device, 
  NULL);
  
  I think it's sort of ugly to check system_state before using
  pci_get_device(), and there's not really an obvious connection between
  system_state and PCI initialization.
  
  But if you prefer that, Joonsoo's original patch is fine with me.
 
 Both probably would be best, as there are probably other things that you
 don't want to touch when you are still booting and trying to restart the
 machine at the same time.

I agree that my patch is ugly :)
So I'll drop it and wait for your patch.

Thanks.



Re: [PATCH] slub: assign refcount for kmalloc_caches

2013-01-28 Thread Joonsoo Kim
On Thu, Jan 24, 2013 at 10:32:32PM -0500, CAI Qian wrote:
 
 
 - Original Message -
  From: Greg Kroah-Hartman gre...@linuxfoundation.org
  To: Joonsoo Kim iamjoonsoo@lge.com
  Cc: Paul Hargrove phhargr...@lbl.gov, Pekka Enberg 
  penb...@kernel.org, linux-kernel@vger.kernel.org,
  linux...@kvack.org, Christoph Lameter c...@linux.com
  Sent: Tuesday, January 15, 2013 3:23:36 AM
  Subject: Re: [PATCH] slub: assign refcount for kmalloc_caches
  
  On Fri, Jan 11, 2013 at 04:52:54PM +0900, Joonsoo Kim wrote:
   On Thu, Jan 10, 2013 at 08:47:39PM -0800, Paul Hargrove wrote:
I just had a look at patch-3.7.2-rc1, and this change doesn't
appear to
have made it in yet.
Am I missing something?

-Paul
   
   I try to check it.
   Ccing to Greg.
   
   Hello, Pekka and Greg.
   
   v3.8-rcX has already fixed by another stuff, but it is not simple
   change.
   So I made a new patch and sent it.
   
   How this kind of patch (only for stable v3.7) go into stable tree?
   through Pekka's slab tree? or send it to Greg, directly?
   
   I don't know how to submit this kind of patch to stable tree
   exactly.
   Could anyone help me?
  
  Please redo it, and send it to sta...@vger.kernel.org, and say
  exactly
  why it isn't in Linus's tree, and that it should only be applied to
  3.7-stable.
 I also met this during the testing, so I'll re-send it then.

Hello, CAI Qian.

I had totally forgotten about this.
Thanks for this work.


Re: [PATCH v3 0/3] introduce static_vm for ARM-specific static mapped area

2013-01-28 Thread Joonsoo Kim
On Mon, Jan 28, 2013 at 01:04:24PM -0500, Nicolas Pitre wrote:
 On Mon, 28 Jan 2013, Will Deacon wrote:
 
  Hello,
  
  On Thu, Jan 24, 2013 at 01:28:51AM +, Joonsoo Kim wrote:
   In current implementation, we used ARM-specific flag, that is,
   VM_ARM_STATIC_MAPPING, for distinguishing ARM specific static mapped area.
   The purpose of static mapped area is to re-use static mapped area when
   entire physical address range of the ioremap request can be covered
   by this area.
   
   This implementation causes needless overhead for some cases.
   For example, assume that there is only one static mapped area and
   vmlist has 300 areas. Every time we call ioremap, we check 300 areas for
   deciding whether it is matched or not. Moreover, even if there is
   no static mapped area and vmlist has 300 areas, every time we call
   ioremap, we check 300 areas in now.
   
   If we construct a extra list for static mapped area, we can eliminate
   above mentioned overhead.
   With a extra list, if there is one static mapped area,
   we just check only one area and proceed next operation quickly.
   
   In fact, it is not a critical problem, because ioremap is not frequently
   used. But reducing overhead is better idea.
   
   Another reason for doing this work is for removing vm_struct list 
   management,
   entirely. For more information, look at the following link.
   http://lkml.org/lkml/2012/12/6/184
  
  First patch looks good (removing the unused vmregion stuff) but I'm not so
  sure about the rest of it. If you really care about ioremap performance,
  perhaps it would be better to have a container struct around the vm_struct
  for static mappings and then stick them in an augmented rbtree so you can
  efficiently find the mapping encompassing a particular physical address?
 
 How can ioremap performance be a problem is the question I had since the 
 beginning.
 
 Firstly, ioremap is _not_ meant to be used in performance critical 
 paths.
 
 Secondly, there shouldn't be _that_ many entries on the vmlist such as 
 300.  That sounds a bit excessive.
 
 So please, can we discuss the reasons that motivated those patches in 
 the first place?  Maybe that's where the actual problem is.

Hello, Will and Nicolas.
First of all, thanks for reviewing.

There is another reason for doing this work.
As mentioned above, I am trying to remove list management for vm_struct
(vmlist) entirely. For that purpose, the architecture dependency on vmlist
needs to be removed. The link below is my RFC patch attempting to remove
list management for vm_struct.

http://lkml.org/lkml/2012/12/6/184

Removing the dependency is rather trivial for the other architectures, but
for ARM it is not. So I prepared this patchset.
My earlier description was possibly misleading.
Sorry for that.

Answer for your other questions is below.

I know ioremap is _not_ meant to be used in performance-critical paths, and
I mentioned that earlier.
In fact, it is not a critical problem, because ioremap is not frequently used.
But reducing the overhead is still worthwhile.

And there are many entries on the vmlist for my test device (an Android phone).
I used to see more than 300 entries; today I re-checked and found 230~250
entries.

Thanks.


Re: [RFC PATCH 1/3] staging, zsmalloc: introduce zs_mem_[read/write]

2013-01-20 Thread Joonsoo Kim
Hello, Minchan.

On Thu, Jan 17, 2013 at 08:59:22AM +0900, Minchan Kim wrote:
 Hi Joonsoo,
 
 On Wed, Jan 16, 2013 at 05:08:55PM +0900, Joonsoo Kim wrote:
  If object is on boundary of page, zs_map_object() copy content of object
  to pre-allocated page and return virtual address of
 
 IMHO, for reviewer reading easily, it would be better to specify explict
 word instead of abstract.
 
 pre-allocated pages : vm_buf which is reserved pages for zsmalloc
 
  this pre-allocated page. If user inform zsmalloc of memcpy region,
  we can avoid this copy overhead.
 
 That's a good idea!
 
  This patch implement two API and these get information of memcpy region.
  Using this information, we can do memcpy without overhead.
 
 For the clarification,
 
   we can reduce copy overhead with this patch
   in !USE_PGTABLE_MAPPING case.
 
  
  For USE_PGTABLE_MAPPING case, we can avoid flush cache and tlb overhead
  via these API.
 
 Yeb!
 
  
  Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com
  ---
  These are [RFC] patches, because I don't test and
  I don't have test environment, yet. Just compile test done.
  If there is positive comment, I will setup test env and check correctness.
  These are based on v3.8-rc3.
  If rebase is needed, please notify me what tree I should rebase.
 
 Whenever you send zsmalloc/zram/zcache, you have to based on recent 
 linux-next.
 But I hope we send the patches to akpm by promoting soon. :(
 
  
  diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c 
  b/drivers/staging/zsmalloc/zsmalloc-main.c
  index 09a9d35..e3ef5a5 100644
  --- a/drivers/staging/zsmalloc/zsmalloc-main.c
  +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
  @@ -1045,6 +1045,118 @@ void zs_unmap_object(struct zs_pool *pool, unsigned 
  long handle)
   }
   EXPORT_SYMBOL_GPL(zs_unmap_object);
   
 
 It's exported function. Please write description.
 
  +void zs_mem_read(struct zs_pool *pool, unsigned long handle,
  +   void *dest, unsigned long src_off, size_t n)
 
 n is meaningless, please use meaningful word.
 How about this?
 void *buf, unsigned long offset, size_t count
 
  +{
  +   struct page *page;
  +   unsigned long obj_idx, off;
  +
  +   unsigned int class_idx;
  +   enum fullness_group fg;
  +   struct size_class *class;
  +   struct page *pages[2];
  +   int sizes[2];
  +   void *addr;
  +
  +   BUG_ON(!handle);
  +
  +   /*
  +* Because we use per-cpu mapping areas shared among the
  +* pools/users, we can't allow mapping in interrupt context
  +* because it can corrupt another users mappings.
  +*/
  +   BUG_ON(in_interrupt());
  +
  +   obj_handle_to_location(handle, &page, &obj_idx);
  +   get_zspage_mapping(get_first_page(page), &class_idx, &fg);
  +   class = &pool->size_class[class_idx];
  +   off = obj_idx_to_offset(page, obj_idx, class->size);
  +   off += src_off;
  +
  +   BUG_ON(class->size < n);
  +
  +   if (off + n <= PAGE_SIZE) {
  +   /* this object is contained entirely within a page */
  +   addr = kmap_atomic(page);
  +   memcpy(dest, addr + off, n);
  +   kunmap_atomic(addr);
  +   return;
  +   }
  +
  +   /* this object spans two pages */
  +   pages[0] = page;
  +   pages[1] = get_next_page(page);
  +   BUG_ON(!pages[1]);
  +
  +   sizes[0] = PAGE_SIZE - off;
  +   sizes[1] = n - sizes[0];
  +
  +   addr = kmap_atomic(pages[0]);
  +   memcpy(dest, addr + off, sizes[0]);
  +   kunmap_atomic(addr);
  +
  +   addr = kmap_atomic(pages[1]);
  +   memcpy(dest + sizes[0], addr, sizes[1]);
  +   kunmap_atomic(addr);
  +}
  +EXPORT_SYMBOL_GPL(zs_mem_read);
  +
 
 Ditto. Write descriptoin.
 
  +void zs_mem_write(struct zs_pool *pool, unsigned long handle,
  +   const void *src, unsigned long dest_off, size_t n)
  +{
  +   struct page *page;
  +   unsigned long obj_idx, off;
  +
  +   unsigned int class_idx;
  +   enum fullness_group fg;
  +   struct size_class *class;
  +   struct page *pages[2];
  +   int sizes[2];
  +   void *addr;
  +
  +   BUG_ON(!handle);
  +
  +   /*
  +* Because we use per-cpu mapping areas shared among the
  +* pools/users, we can't allow mapping in interrupt context
  +* because it can corrupt another users mappings.
  +*/
  +   BUG_ON(in_interrupt());
  +
  +   obj_handle_to_location(handle, &page, &obj_idx);
  +   get_zspage_mapping(get_first_page(page), &class_idx, &fg);
  +   class = &pool->size_class[class_idx];
  +   off = obj_idx_to_offset(page, obj_idx, class->size);
  +   off += dest_off;
  +
  +   BUG_ON(class->size < n);
  +
  +   if (off + n <= PAGE_SIZE) {
  +   /* this object is contained entirely within a page */
  +   addr = kmap_atomic(page);
  +   memcpy(addr + off, src, n);
  +   kunmap_atomic(addr);
  +   return;
  +   }
  +
  +   /* this object spans two pages */
  +   pages[0] = page;
  +   pages[1] = get_next_page(page);
  +   BUG_ON(!pages[1

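For reference, a minimal usage sketch of the proposed API; the wrapper below
is made up for illustration, only the zs_mem_read()/zs_mem_write() signatures
come from the patch above:

/* Illustration only: read an object out of zsmalloc, modify it, and write it
 * back, without going through zs_map_object()/zs_unmap_object(). */
static void example_rw(struct zs_pool *pool, unsigned long handle,
		       void *buf, size_t len)
{
	/* copy 'len' bytes of the object, starting at offset 0, into buf */
	zs_mem_read(pool, handle, buf, 0, len);

	/* ... modify buf ... */

	/* copy the bytes back into the object at offset 0 */
	zs_mem_write(pool, handle, buf, 0, len);
}
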
[PATCH v2 2/3] slub: correct bootstrap() for kmem_cache, kmem_cache_node

2013-01-21 Thread Joonsoo Kim
The current implementation of bootstrap() is not sufficient for kmem_cache
and kmem_cache_node.

First, for kmem_cache:
bootstrap() calls kmem_cache_zalloc() first. When kmem_cache_zalloc()
is called, kmem_cache's slab is moved to the cpu slab to satisfy the
kmem_cache allocation request. In the current implementation, we only
consider the n->partial slabs, so we miss this cpu slab for kmem_cache.

Second, for kmem_cache_node:
When slab_state is PARTIAL, create_boot_cache() is called. And then,
kmem_cache_node's slab is moved to the cpu slab to satisfy the
kmem_cache_node allocation request. So we also miss this slab.

These omissions didn't cause any error previously, because we normally
don't free objects that come from kmem_cache's first slab or
kmem_cache_node's.

The problem is solved if we also consider the cpu slab in bootstrap().
This patch implements that.

v2: don't loop over all processors in bootstrap().

Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com

diff --git a/mm/slub.c b/mm/slub.c
index 7204c74..8b95364 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3614,10 +3614,15 @@ static int slab_memory_callback(struct notifier_block 
*self,
 static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache)
 {
int node;
+   struct kmem_cache_cpu *c;
struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);
 
 	memcpy(s, static_cache, kmem_cache->object_size);
 
+	c = this_cpu_ptr(s->cpu_slab);
+	if (c->page)
+		c->page->slab_cache = s;
+
for_each_node_state(node, N_NORMAL_MEMORY) {
struct kmem_cache_node *n = get_node(s, node);
struct page *p;
-- 
1.7.9.5


