date:20130309

Re: [PATCH v2, part1 22/29] mm/SPARC: use common help functions to free reserved pages

2013-03-09 Thread Sam Ravnborg

On Sun, Mar 10, 2013 at 02:27:05PM +0800, Jiang Liu wrote:
> Use common help functions to free reserved pages.
> 
> Signed-off-by: Jiang Liu 
> Acked-by: David S. Miller 
> Cc: Sam Ravnborg 
Acked-by: Sam Ravnborg 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 07/20] vmcore: copy non page-size aligned head and tail pages in 2nd kernel

2013-03-09 Thread Zhang Yanfei

On 2013-03-02 16:36, HATAYAMA Daisuke wrote:
> Due to mmap() requirement, we need to copy pages not starting or
> ending with page-size aligned address in 2nd kernel and to map them to
> user-space.
> 
> For example, see the map below:
> 
> -0001 : reserved
> 00010000-0009f7ff : System RAM
> 0009f800-0009ffff : reserved
> 
> where the System RAM ends with 0x9f800 that is not page-size
> aligned. This map is divided into two parts:
> 
> 00010000-0009dfff

00010000-0009efff

> 0009f000-0009f7ff
> 
> and the first one is kept in old memory and the 2nd one is copied into
> buffer on 2nd kernel.
> 
> This kind of non-page-size-aligned area can always occur since any
> part of System RAM can be converted into reserved area at runtime.
> 
> If not doing copying like this and if remapping non page-size aligned
> pages on old memory directly, mmap() had to export memory which is not
> dump target to user-space. In the above example this is reserved
> 0x9f800-0xa0000.
> 
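The head/middle/tail split described above can be sketched in user space as follows. This is only an illustration under stated assumptions, not the code from the patch: `split_range`, `struct chunk`, `struct split` and the 4 KiB `PAGE_SZ` are hypothetical names chosen here, and the range is treated as [start, end) with an exclusive end.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SZ 4096ULL /* assumption: 4 KiB pages */

struct chunk { uint64_t start, len; };
struct split { struct chunk head, middle, tail; };

static uint64_t round_down_page(uint64_t x) { return x & ~(PAGE_SZ - 1); }
static uint64_t round_up_page(uint64_t x) { return round_down_page(x + PAGE_SZ - 1); }

/*
 * Split [start, end) into an unaligned head, a page-aligned middle and an
 * unaligned tail. In the scheme described above, head and tail would be
 * copied into zeroed pages, while the middle stays in old memory.
 */
struct split split_range(uint64_t start, uint64_t end)
{
    struct split s = { {0, 0}, {0, 0}, {0, 0} };
    uint64_t rest;

    assert(start <= end);
    rest = end - start;

    if (start & (PAGE_SZ - 1)) {
        /* head: from start up to the next page boundary (or end) */
        uint64_t up = round_up_page(start);

        s.head.start = start;
        s.head.len = (up < end ? up : end) - start;
        rest -= s.head.len;
    }
    if (rest > 0 && round_up_page(start) < round_down_page(end)) {
        /* middle: whole pages only */
        s.middle.start = round_up_page(start);
        s.middle.len = round_down_page(end) - s.middle.start;
        rest -= s.middle.len;
    }
    if (rest > 0) {
        /* tail: the partial page at the end */
        s.tail.start = round_down_page(end);
        s.tail.len = end - s.tail.start;
    }
    return s;
}
```

For a range that starts at an assumed 0x10000 and ends at 0x9f800, this gives no head, a middle of 0x10000-0x9efff and a tail of 0x9f000-0x9f7ff.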
> Signed-off-by: HATAYAMA Daisuke 
> ---
> 
>  fs/proc/vmcore.c |  192 
> --
>  1 files changed, 172 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index c511cf4..6b071b4 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -474,11 +474,10 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
>   size_t elfsz,
>   struct list_head *vc_list)
>  {
> - int i;
> + int i, rc;
>   Elf64_Ehdr *ehdr_ptr;
>   Elf64_Phdr *phdr_ptr;
>   loff_t vmcore_off;
> - struct vmcore *new;
>  
>   ehdr_ptr = (Elf64_Ehdr *)elfptr;
>   phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff); /* PT_NOTE hdr */
> @@ -488,20 +487,97 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
> PAGE_SIZE);
>  
>   for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> + u64 start, end, rest;
> +
>   if (phdr_ptr->p_type != PT_LOAD)
>   continue;
>  
> - /* Add this contiguous chunk of memory to vmcore list.*/
> - new = get_new_element();
> - if (!new)
> - return -ENOMEM;
> - new->paddr = phdr_ptr->p_offset;
> - new->size = phdr_ptr->p_memsz;
> - list_add_tail(&new->list, vc_list);
> + start = phdr_ptr->p_offset;
> + end = phdr_ptr->p_offset + phdr_ptr->p_memsz;
> + rest = phdr_ptr->p_memsz;
> +
> + if (start & ~PAGE_MASK) {
> + u64 paddr, len;
> + char *buf;
> + struct vmcore *new;
> +
> + paddr = start;
> + len = min(roundup(start,PAGE_SIZE), end) - start;
> +
> + buf = (char *)get_zeroed_page(GFP_KERNEL);
> + if (!buf)
> + return -ENOMEM;
> + rc = read_from_oldmem(buf + (start & ~PAGE_MASK), len,
> +   &paddr, 0);
> + if (rc < 0) {
> + free_pages((unsigned long)buf, 0);
> + return rc;
> + }
> +
> + new = get_new_element();
> + if (!new) {
> + free_pages((unsigned long)buf, 0);
> + return -ENOMEM;
> + }
> + new->flag |= MEM_TYPE_CURRENT_KERNEL;
> + new->size = PAGE_SIZE;
> + new->buf = buf;
> + list_add_tail(&new->list, vc_list);
> +
> + rest -= len;
> + }
> +
> + if (rest > 0 &&
> + roundup(start, PAGE_SIZE) < rounddown(end, PAGE_SIZE)) {
> + u64 paddr, len;
> + struct vmcore *new;
> +
> + paddr = roundup(start, PAGE_SIZE);
> + len =rounddown(end,PAGE_SIZE)-roundup(start,PAGE_SIZE);
> +
> + new = get_new_element();
> + if (!new)
> + return -ENOMEM;
> + new->paddr = paddr;
> + new->size = len;
> + list_add_tail(&new->list, vc_list);
> +
> + rest -= len;
> + }
> +
> + if (rest > 0) {
> + u64 paddr, len;
> + char *buf;
> + struct vmcore *new;
> +
> + paddr = rounddown(end, PAGE_SIZE);
> + len = end - rounddown(end, PAGE_SIZE);
> +
> + buf = (char *)get_zeroed_page(GFP_KERNEL);
> + if (!buf)
> + return

[PATCH v2 00/20] x86, ACPI, numa: Parse numa info early

2013-03-09 Thread Yinghai Lu

One commit that tried to parse SRAT early got reverted before v3.9-rc1.

| commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f
| Author: Tang Chen 
| Date:   Fri Feb 22 16:33:44 2013 -0800
|
|acpi, memory-hotplug: parse SRAT before memblock is ready

It broke several things, such as the acpi override and the fallback path.

This patchset is a clean implementation that parses numa info early.
1. Keep the acpi table initrd override working by splitting finding from copying.
   Finding is done at the head_32.S and head64.c stage:
   in head_32.S, initrd is accessed in 32bit flat mode with phys addr;
   in head64.c, initrd is accessed via the kernel low mapping address
   with the help of the page table set up by the #PF handler.
   Copying is done with early_ioremap just after memblock is set up.
2. Keep the fallback path working: numaq, ACPI, amd_numa and dummy.
   Separate initmem_init into two stages.
   early_initmem_init will only extract numa info early into numa_meminfo.
   initmem_init will keep slit and emulation handling.
3. Keep the other old code flow untouched, like relocate_initrd and initmem_init.
   early_initmem_init will take the old init_mem_mapping position.
   It calls early_x86_numa_init and init_mem_mapping for every node.
   For 64bit, we avoid having a size limit on initrd, as relocate_initrd
   is still after init_mem_mapping for all memory.
4. The last patch will try to put the page table on the local node, so that memory
   hotplug will be happy.

In short, early_initmem_init will parse numa info early and call
init_mem_mapping to set up the page table for every node's mem.

It can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-mm

and it is based on today's Linus tree.

-v2: Address tj's review and split patches to small ones.

Thanks

Yinghai

Yinghai Lu (20):
  x86: Change get_ramdisk_image() to global
  x86, microcode: Use common get_ramdisk_image()
  x86, ACPI, mm: Kill max_low_pfn_mapped
  x86, ACPI: Increase override tables number limit
  x86, ACPI: Split acpi_initrd_override to find/copy two functions
  x86, ACPI: Store override acpi tables phys addr in cpio files info array
  x86, ACPI: Make acpi_initrd_override_find work with 32bit flat mode
  x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c
  x86, mm, numa: Move two functions calling on successful path later
  x86, mm, numa: Call numa_meminfo_cover_memory() checking early
  x86, mm, numa: Move node_map_pfn alignment() to x86
  x86, mm, numa: Use numa_meminfo to check node_map_pfn alignment
  x86, mm, numa: Set memblock nid later
  x86, mm, numa: Move node_possible_map setting later
  x86, mm, numa: Move emulation handling down.
  x86, ACPI, numa, ia64: split SLIT handling out
  x86, mm, numa: Add early_initmem_init() stub
  x86, mm: Parse numa info early
  x86, mm: Make init_mem_mapping be able to be called several times
  x86, mm, numa: Put pagetable on local node ram for 64bit

 arch/ia64/kernel/setup.c|4 +-
 arch/x86/include/asm/acpi.h |3 +-
 arch/x86/include/asm/page_types.h   |2 +-
 arch/x86/include/asm/pgtable.h  |2 +-
 arch/x86/include/asm/setup.h|9 ++
 arch/x86/kernel/head64.c|2 +
 arch/x86/kernel/head_32.S   |4 +
 arch/x86/kernel/microcode_intel_early.c |8 +-
 arch/x86/kernel/setup.c |   86 ++-
 arch/x86/mm/init.c  |   88 +++-
 arch/x86/mm/numa.c  |  240 ---
 arch/x86/mm/numa_emulation.c|2 +-
 arch/x86/mm/numa_internal.h |2 +
 arch/x86/mm/srat.c  |   11 +-
 drivers/acpi/numa.c |   13 +-
 drivers/acpi/osl.c  |  134 +++--
 include/linux/acpi.h|   20 +--
 include/linux/mm.h  |3 -
 mm/page_alloc.c |   52 +--
 19 files changed, 445 insertions(+), 240 deletions(-)

-- 
1.7.10.4


[PATCH v2 04/20] x86, ACPI: Increase override tables number limit

2013-03-09 Thread Yinghai Lu

The number of acpi tables in initrd is currently limited to 10, which is too small.
64 should be good enough, as we have 35 sigs and could have several
SSDT.

Two problems in the current code prevent us from increasing the limit:
1. The cpio file info array is put on the stack. As every element is 32
   bytes, we could run out of stack if we raise that array size to 64.
   We can move it off the stack, make it global and put it in the
   __initdata section.
2. early_ioremap can only remap 256k at one time. Current code maps
   10 tables at one time. If we increase that limit, the whole size could be
   more than 256k, and early_ioremap will fail.
   We can map the tables one by one during copying, instead of mapping
   all of them at one time.
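
To illustrate point 2, here is a minimal user-space sketch of copying one table at a time. It is not the kernel code: `MAP_WINDOW` merely stands in for the 256k early_ioremap limit mentioned above, and `struct blob` and `copy_tables_one_by_one` are hypothetical names. The memcpy marks the place where the kernel would map, copy and unmap a single table. As for point 1, 64 elements of 32 bytes each come to 2048 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* assumption: stands in for the 256k per-mapping limit cited above */
#define MAP_WINDOW (256 * 1024)

struct blob {
    const void *data;
    size_t size;
};

/*
 * Copy every table into dst back to back, handling one table per step.
 * Only a single table has to fit into the mapping window; the sum of
 * all tables may be larger. Returns bytes copied, or -1 on failure.
 */
long copy_tables_one_by_one(unsigned char *dst, size_t dst_size,
                            const struct blob *tables, size_t n)
{
    size_t total = 0;

    assert(dst && tables);
    for (size_t i = 0; i < n; i++) {
        if (tables[i].size > MAP_WINDOW)
            return -1; /* a single table exceeds one mapping */
        if (tables[i].size > dst_size - total)
            return -1; /* destination area too small */
        /* the kernel would map, copy and unmap this one table here */
        memcpy(dst + total, tables[i].data, tables[i].size);
        total += tables[i].size;
    }
    return (long)total;
}
```

Two tables of 200k each would exceed one 256k window if mapped together, but both fit when handled one by one.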

-v2: According to tj, split it out to a separate patch, also
 rename the array to acpi_initrd_files.

Signed-off-by: Yinghai Lu 
Cc: Rafael J. Wysocki 
Cc: linux-a...@vger.kernel.org
---
 drivers/acpi/osl.c |   21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index c08cdb6..8aaf721 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -569,8 +569,8 @@ static const char * const table_sigs[] = {
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
-/* Must not increase 10 or needs code modification below */
-#define ACPI_OVERRIDE_TABLES 10
+#define ACPI_OVERRIDE_TABLES 64
+static struct cpio_data __initdata acpi_initrd_files[ACPI_OVERRIDE_TABLES];
 
 void __init acpi_initrd_override(void *data, size_t size)
 {
@@ -579,7 +579,6 @@ void __init acpi_initrd_override(void *data, size_t size)
struct acpi_table_header *table;
char cpio_path[32] = "kernel/firmware/acpi/";
struct cpio_data file;
-   struct cpio_data early_initrd_files[ACPI_OVERRIDE_TABLES];
char *p;
 
if (data == NULL || size == 0)
@@ -617,8 +616,8 @@ void __init acpi_initrd_override(void *data, size_t size)
table->signature, cpio_path, file.name, table->length);
 
all_tables_size += table->length;
-   early_initrd_files[table_nr].data = file.data;
-   early_initrd_files[table_nr].size = file.size;
+   acpi_initrd_files[table_nr].data = file.data;
+   acpi_initrd_files[table_nr].size = file.size;
table_nr++;
}
if (table_nr == 0)
@@ -648,14 +647,14 @@ void __init acpi_initrd_override(void *data, size_t size)
memblock_reserve(acpi_tables_addr, acpi_tables_addr + all_tables_size);
arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
 
-   p = early_ioremap(acpi_tables_addr, all_tables_size);
-
for (no = 0; no < table_nr; no++) {
-   memcpy(p + total_offset, early_initrd_files[no].data,
-  early_initrd_files[no].size);
-   total_offset += early_initrd_files[no].size;
+   phys_addr_t size = acpi_initrd_files[no].size;
+
+   p = early_ioremap(acpi_tables_addr + total_offset, size);
+   memcpy(p, acpi_initrd_files[no].data, size);
+   early_iounmap(p, size);
+   total_offset += size;
}
-   early_iounmap(p, all_tables_size);
 }
 #endif /* CONFIG_ACPI_INITRD_TABLE_OVERRIDE */
 
-- 
1.7.10.4


[PATCH v2 03/20] x86, ACPI, mm: Kill max_low_pfn_mapped

2013-03-09 Thread Yinghai Lu

Now that we have the arch_pfn_mapped array, max_low_pfn_mapped should not
be used anymore.

Users should use arch_pfn_mapped or just 1UL<<(32-PAGE_SHIFT) instead.

The only user is ACPI_INITRD_TABLE_OVERRIDE, and it should not use it,
as the later access goes through early_ioremap(). Change it to try below 4G
first and then above 4G.
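
The "below 4G first, then above 4G" fallback can be sketched in user space like this. It is an illustration only: `find_in_range` is a hypothetical first-fit stand-in and does not try to mimic the search order of memblock_find_in_range(), and `struct range` and `find_table_area` are names made up here. As in the patch, a return value of 0 is treated as failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range {
    uint64_t start, end; /* free region, end exclusive */
};

/* hypothetical stand-in for a range allocator: first fit, 0 means failure */
uint64_t find_in_range(const struct range *free_list, size_t n,
                       uint64_t lo, uint64_t hi,
                       uint64_t size, uint64_t align)
{
    assert(align && (align & (align - 1)) == 0); /* power of two */
    for (size_t i = 0; i < n; i++) {
        uint64_t s = free_list[i].start > lo ? free_list[i].start : lo;
        uint64_t e = free_list[i].end < hi ? free_list[i].end : hi;

        s = (s + align - 1) & ~(align - 1);
        if (s < e && e - s >= size)
            return s;
    }
    return 0;
}

/* Try below 4G first; only if that fails, search the whole address space. */
uint64_t find_table_area(const struct range *free_list, size_t n, uint64_t size)
{
    uint64_t addr = find_in_range(free_list, n, 0, (1ULL << 32) - 1, size, 4096);

    if (!addr)
        addr = find_in_range(free_list, n, 0, UINT64_MAX, size, 4096);
    return addr;
}
```

With free memory both below and above 4G the low region wins; the high region is used only when nothing suitable exists below 4G.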

-v2: Leave alone max_low_pfn_mapped in i915 code according to tj.

Suggested-by: H. Peter Anvin 
Signed-off-by: Yinghai Lu 
Cc: "Rafael J. Wysocki" 
Cc: Daniel Vetter 
Cc: David Airlie 
Cc: Jacob Shin 
Cc: linux-a...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
---
 arch/x86/include/asm/page_types.h |1 -
 arch/x86/kernel/setup.c   |4 +---
 arch/x86/mm/init.c|4 
 drivers/acpi/osl.c|   10 +++---
 4 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 54c9787..b012b82 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -43,7 +43,6 @@
 
 extern int devmem_is_allowed(unsigned long pagenr);
 
-extern unsigned long max_low_pfn_mapped;
 extern unsigned long max_pfn_mapped;
 
 static inline phys_addr_t get_max_mapped(void)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1629577..e75c6e6 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -113,13 +113,11 @@
 #include 
 
 /*
- * max_low_pfn_mapped: highest direct mapped pfn under 4GB
- * max_pfn_mapped: highest direct mapped pfn over 4GB
+ * max_pfn_mapped: highest direct mapped pfn
  *
  * The direct mapping only covers E820_RAM regions, so the ranges and gaps are
  * represented by pfn_mapped
  */
-unsigned long max_low_pfn_mapped;
 unsigned long max_pfn_mapped;
 
 #ifdef CONFIG_DMI
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 59b7fc4..abcc241 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -313,10 +313,6 @@ static void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn)
nr_pfn_mapped = clean_sort_range(pfn_mapped, E820_X_MAX);
 
max_pfn_mapped = max(max_pfn_mapped, end_pfn);
-
-   if (start_pfn < (1UL<<(32-PAGE_SHIFT)))
-   max_low_pfn_mapped = max(max_low_pfn_mapped,
-min(end_pfn, 1UL<<(32-PAGE_SHIFT)));
 }
 
 bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 586e7e9..c08cdb6 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -624,9 +624,13 @@ void __init acpi_initrd_override(void *data, size_t size)
if (table_nr == 0)
return;
 
-   acpi_tables_addr =
-   memblock_find_in_range(0, max_low_pfn_mapped << PAGE_SHIFT,
-  all_tables_size, PAGE_SIZE);
+   /* under 4G at first, then above 4G */
+   acpi_tables_addr = memblock_find_in_range(0, (1ULL<<32) - 1,
+   all_tables_size, PAGE_SIZE);
+   if (!acpi_tables_addr)
+   acpi_tables_addr = memblock_find_in_range(0,
+   ~(phys_addr_t)0,
+   all_tables_size, PAGE_SIZE);
if (!acpi_tables_addr) {
WARN_ON(1);
return;
-- 
1.7.10.4


[PATCH v2 08/20] x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c

2013-03-09 Thread Yinghai Lu

head64.c can use the page table set up by the #PF handler to access initrd
before init mem mapping and initrd relocating.

head_32.S can use 32bit flat mode to access initrd before init mem
mapping and initrd relocating.

That makes 32bit and 64bit more consistent.

-v2: use an inline function in the header file instead, according to tj.
 also still need to keep the #ifdef in head_32.S to avoid a compile error.

Signed-off-by: Yinghai Lu 
Cc: Pekka Enberg 
Cc: Jacob Shin 
Cc: Rafael J. Wysocki 
Cc: linux-a...@vger.kernel.org
---
 arch/x86/include/asm/setup.h |6 ++
 arch/x86/kernel/head64.c |2 ++
 arch/x86/kernel/head_32.S|4 
 arch/x86/kernel/setup.c  |   30 --
 4 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 4f71d48..6f885b7 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -42,6 +42,12 @@ extern void visws_early_detect(void);
 static inline void visws_early_detect(void) { }
 #endif
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+void x86_acpi_override_find(void);
+#else
+static inline void x86_acpi_override_find(void) { }
+#endif
+
 extern unsigned long saved_video_mode;
 
 extern void reserve_standard_io_resources(void);
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index c5e403f..a31bc63 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -174,6 +174,8 @@ void __init x86_64_start_kernel(char * real_mode_data)
if (console_loglevel == 10)
early_printk("Kernel alive\n");
 
+   x86_acpi_override_find();
+
clear_page(init_level4_pgt);
/* set init_level4_pgt kernel high mapping*/
init_level4_pgt[511] = early_level4_pgt[511];
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 73afd11..ca08f0e 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -149,6 +149,10 @@ ENTRY(startup_32)
call load_ucode_bsp
 #endif
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+   call x86_acpi_override_find
+#endif
+
 /*
  * Initialize page tables.  This creates a PDE and a set of page
  * tables, which are located immediately beyond __brk_base.  The variable
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 16a703f..b067663 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -424,6 +424,34 @@ static void __init reserve_initrd(void)
 }
 #endif /* CONFIG_BLK_DEV_INITRD */
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+void __init x86_acpi_override_find(void)
+{
+   unsigned long ramdisk_image, ramdisk_size;
+   unsigned char *p = NULL;
+
+#ifdef CONFIG_X86_32
+   struct boot_params *boot_params_p;
+
+   /*
+* 32bit is from head_32.S, and it is 32bit flat mode.
+* So need to use phys address to access global variables.
+*/
+   boot_params_p = (struct boot_params *)__pa_symbol(&boot_params);
+   ramdisk_image = get_ramdisk_image(boot_params_p);
+   ramdisk_size  = get_ramdisk_size(boot_params_p);
+   p = (unsigned char *)ramdisk_image;
+   acpi_initrd_override_find(p, ramdisk_size, true);
+#else
+   ramdisk_image = get_ramdisk_image(&boot_params);
+   ramdisk_size  = get_ramdisk_size(&boot_params);
+   if (ramdisk_image)
+   p = __va(ramdisk_image);
+   acpi_initrd_override_find(p, ramdisk_size, false);
+#endif
+}
+#endif
+
 static void __init parse_setup_data(void)
 {
struct setup_data *data;
@@ -1092,8 +1120,6 @@ void __init setup_arch(char **cmdline_p)
 
reserve_initrd();
 
-   acpi_initrd_override_find((void *)initrd_start,
-   initrd_end - initrd_start, false);
acpi_initrd_override_copy();
 
reserve_crashkernel();
-- 
1.7.10.4


[PATCH v2 11/20] x86, mm, numa: Move node_map_pfn alignment() to x86

2013-03-09 Thread Yinghai Lu

Move node_map_pfn_alignment() to arch/x86/mm, as there is no other user of it.

Will update it to use numa_meminfo instead of memblock.
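
For reference, the mask computation in the function being moved can be tried out in user space with the sketch below. It follows the loop in the diff, but it is an illustration under assumptions: the ranges come from a plain sorted array instead of for_each_mem_pfn_range(), `struct pfn_range`, `pfn_alignment` and `lowest_set_bit` are hypothetical names, and the example values assume 4 KiB pages (1GiB = 1UL << 18 pfns).

```c
#include <assert.h>
#include <stddef.h>

struct pfn_range {
    unsigned long start, end; /* in pfns, end exclusive, sorted by start */
    int nid;
};

/* index of the lowest set bit; plays the role of __ffs() */
static unsigned int lowest_set_bit(unsigned long x)
{
    unsigned int n = 0;

    assert(x != 0);
    while (!(x & 1UL)) {
        x >>= 1;
        n++;
    }
    return n;
}

unsigned long pfn_alignment(const struct pfn_range *r, size_t n)
{
    unsigned long accl_mask = 0, last_end = 0, mask;
    int last_nid = -1;

    for (size_t i = 0; i < n; i++) {
        unsigned long start = r[i].start;

        if (!start || last_nid < 0 || last_nid == r[i].nid) {
            last_nid = r[i].nid;
            last_end = r[i].end;
            continue;
        }
        /*
         * Start with a mask fine enough to pin-point the start pfn,
         * then coarsen it bit by bit while it still separates the
         * current range from the remembered one.
         */
        mask = ~((1UL << lowest_set_bit(start)) - 1);
        while (mask && last_end <= (start & (mask << 1)))
            mask <<= 1;

        /* accumulate all internode masks */
        accl_mask |= mask;
    }
    /* convert mask to number of pages */
    return ~accl_mask + 1;
}
```

Two nodes of 1GiB each aligned to 1GiB yield 1UL << 18, a node boundary shifted by 256MiB yields 1UL << 16, and a single node yields 0, matching the kernel-doc comment.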

Signed-off-by: Yinghai Lu 
---
 arch/x86/mm/numa.c |   50 ++
 include/linux/mm.h |1 -
 mm/page_alloc.c|   50 --
 3 files changed, 50 insertions(+), 51 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index b7173f6..24155b2 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -477,6 +477,56 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
return true;
 }
 
+/**
+ * node_map_pfn_alignment - determine the maximum internode alignment
+ *
+ * This function should be called after node map is populated and sorted.
+ * It calculates the maximum power of two alignment which can distinguish
+ * all the nodes.
+ *
+ * For example, if all nodes are 1GiB and aligned to 1GiB, the return value
+ * would indicate 1GiB alignment with (1 << (30 - PAGE_SHIFT)).  If the
+ * nodes are shifted by 256MiB, 256MiB.  Note that if only the last node is
+ * shifted, 1GiB is enough and this function will indicate so.
+ *
+ * This is used to test whether pfn -> nid mapping of the chosen memory
+ * model has fine enough granularity to avoid incorrect mapping for the
+ * populated node map.
+ *
+ * Returns the determined alignment in pfn's.  0 if there is no alignment
+ * requirement (single node).
+ */
+unsigned long __init node_map_pfn_alignment(void)
+{
+   unsigned long accl_mask = 0, last_end = 0;
+   unsigned long start, end, mask;
+   int last_nid = -1;
+   int i, nid;
+
+   for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid) {
+   if (!start || last_nid < 0 || last_nid == nid) {
+   last_nid = nid;
+   last_end = end;
+   continue;
+   }
+
+   /*
+* Start with a mask granular enough to pin-point to the
+* start pfn and tick off bits one-by-one until it becomes
+* too coarse to separate the current node from the last.
+*/
+   mask = ~((1 << __ffs(start)) - 1);
+   while (mask && last_end <= (start & (mask << 1)))
+   mask <<= 1;
+
+   /* accumulate all internode masks */
+   accl_mask |= mask;
+   }
+
+   /* convert mask to number of pages */
+   return ~accl_mask + 1;
+}
+
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
unsigned long uninitialized_var(pfn_align);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2ae2050..1c79b10 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1323,7 +1323,6 @@ extern void free_initmem(void);
  * CONFIG_HAVE_MEMBLOCK_NODE_MAP.
  */
 extern void free_area_init_nodes(unsigned long *max_zone_pfn);
-unsigned long node_map_pfn_alignment(void);
 extern unsigned long absent_pages_in_range(unsigned long start_pfn,
unsigned long end_pfn);
 extern void get_pfn_range_for_nid(unsigned int nid,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 580d919..f368db4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4725,56 +4725,6 @@ static inline void setup_nr_node_ids(void)
 }
 #endif
 
-/**
- * node_map_pfn_alignment - determine the maximum internode alignment
- *
- * This function should be called after node map is populated and sorted.
- * It calculates the maximum power of two alignment which can distinguish
- * all the nodes.
- *
- * For example, if all nodes are 1GiB and aligned to 1GiB, the return value
- * would indicate 1GiB alignment with (1 << (30 - PAGE_SHIFT)).  If the
- * nodes are shifted by 256MiB, 256MiB.  Note that if only the last node is
- * shifted, 1GiB is enough and this function will indicate so.
- *
- * This is used to test whether pfn -> nid mapping of the chosen memory
- * model has fine enough granularity to avoid incorrect mapping for the
- * populated node map.
- *
- * Returns the determined alignment in pfn's.  0 if there is no alignment
- * requirement (single node).
- */
-unsigned long __init node_map_pfn_alignment(void)
-{
-   unsigned long accl_mask = 0, last_end = 0;
-   unsigned long start, end, mask;
-   int last_nid = -1;
-   int i, nid;
-
-   for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid) {
-   if (!start || last_nid < 0 || last_nid == nid) {
-   last_nid = nid;
-   last_end = end;
-   continue;
-   }
-
-   /*
-* Start with a mask granular enough to pin-point to the
-* start pfn and tick off bits one-by-one until it becomes
-* too coarse to separate the current node from the last.
-*/
-   mask = ~((1 << __ffs(start)) - 1);
-

Re: [PATCH v2 01/20] vmcore: refer to e_phoff member explicitly

2013-03-09 Thread Zhang Yanfei

On 2013-03-05 15:35, Zhang Yanfei wrote:
> On 2013-03-02 16:35, HATAYAMA Daisuke wrote:
>> Code around /proc/vmcore currently assumes program header table is
>> next to ELF header. But future change can break the assumption on
>> kexec-tools and the 1st kernel. To avoid worst case, now refer to
>> e_phoff member that indicates position of program header table in
>> file-offset.
> 
> Reviewed-by: Zhang Yanfei 
> 
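As a small illustration of the point of this patch, the sketch below locates the program header table through e_phoff instead of assuming it directly follows the ELF header. It is a user-space sketch using the glibc <elf.h> types, not code from the patch; `phdr_table` and `headers_size` are hypothetical helper names, and the buffer is assumed to be suitably aligned for Elf64_Ehdr.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Program header table as located by e_phoff. */
Elf64_Phdr *phdr_table(char *elfptr)
{
    Elf64_Ehdr *ehdr = (Elf64_Ehdr *)elfptr;

    assert(elfptr);
    return (Elf64_Phdr *)(elfptr + ehdr->e_phoff);
}

/* Bytes from the start of the file to the end of the program header table. */
size_t headers_size(const char *elfptr)
{
    const Elf64_Ehdr *ehdr = (const Elf64_Ehdr *)elfptr;

    assert(elfptr);
    return (size_t)ehdr->e_phoff + (size_t)ehdr->e_phnum * sizeof(Elf64_Phdr);
}
```

When e_phoff equals sizeof(Elf64_Ehdr), both approaches agree; they differ as soon as something is placed between the ELF header and the program header table.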
>>
>> Signed-off-by: HATAYAMA Daisuke 
>> ---
>>
>>  fs/proc/vmcore.c |   40 
>>  1 files changed, 20 insertions(+), 20 deletions(-)
>>
>> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
>> index b870f74..abf4f01 100644
>> --- a/fs/proc/vmcore.c
>> +++ b/fs/proc/vmcore.c
>> @@ -221,8 +221,8 @@ static u64 __init get_vmcore_size_elf64(char *elfptr)
>>  Elf64_Phdr *phdr_ptr;
>>  
>>  ehdr_ptr = (Elf64_Ehdr *)elfptr;
>> -phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
>> -size = sizeof(Elf64_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
>> +phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff);
>> +size = ehdr_ptr->e_phoff + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
>>  for (i = 0; i < ehdr_ptr->e_phnum; i++) {
>>  size += phdr_ptr->p_memsz;
>>  phdr_ptr++;
>> @@ -238,8 +238,8 @@ static u64 __init get_vmcore_size_elf32(char *elfptr)
>>  Elf32_Phdr *phdr_ptr;
>>  
>>  ehdr_ptr = (Elf32_Ehdr *)elfptr;
>> -phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
>> -size = sizeof(Elf32_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr));
>> +phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff);
>> +size = ehdr_ptr->e_phoff + ((ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr));
>>  for (i = 0; i < ehdr_ptr->e_phnum; i++) {
>>  size += phdr_ptr->p_memsz;
>>  phdr_ptr++;
>> @@ -259,7 +259,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>>  u64 phdr_sz = 0, note_off;
>>  
>>  ehdr_ptr = (Elf64_Ehdr *)elfptr;
>> -phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
>> +phdr_ptr = (Elf64_Phdr*)(elfptr + ehdr_ptr->e_phoff);
>>  for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
>>  int j;
>>  void *notes_section;
>> @@ -305,7 +305,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>>  /* Prepare merged PT_NOTE program header. */
>>  phdr.p_type= PT_NOTE;
>>  phdr.p_flags   = 0;
>> -note_off = sizeof(Elf64_Ehdr) +
>> +note_off = ehdr_ptr->e_phoff +
>>  (ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
>>  phdr.p_offset  = note_off;
>>  phdr.p_vaddr   = phdr.p_paddr = 0;
>> @@ -313,14 +313,14 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>>  phdr.p_align   = 0;
>>  
>>  /* Add merged PT_NOTE program header*/
>> -tmp = elfptr + sizeof(Elf64_Ehdr);
>> +tmp = elfptr + ehdr_ptr->e_phoff;
>>  memcpy(tmp, &phdr, sizeof(phdr));
>>  tmp += sizeof(phdr);
>>  
>>  /* Remove unwanted PT_NOTE program headers. */
>>  i = (nr_ptnote - 1) * sizeof(Elf64_Phdr);
>>  *elfsz = *elfsz - i;
>> -memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf64_Ehdr)-sizeof(Elf64_Phdr)));
>> +memmove(tmp, tmp+i, ((*elfsz)-ehdr_ptr->e_phoff-sizeof(Elf64_Phdr)));
>>  
>>  /* Modify e_phnum to reflect merged headers. */
>>  ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
>> @@ -340,7 +340,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
>>  u64 phdr_sz = 0, note_off;
>>  
>>  ehdr_ptr = (Elf32_Ehdr *)elfptr;
>> -phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
>> +phdr_ptr = (Elf32_Phdr*)(elfptr + ehdr_ptr->e_phoff);
>>  for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
>>  int j;
>>  void *notes_section;
>> @@ -386,7 +386,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
>>  /* Prepare merged PT_NOTE program header. */
>>  phdr.p_type= PT_NOTE;
>>  phdr.p_flags   = 0;
>> -note_off = sizeof(Elf32_Ehdr) +
>> +note_off = ehdr_ptr->e_phoff +
>>  (ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf32_Phdr);
>>  phdr.p_offset  = note_off;
>>  phdr.p_vaddr   = phdr.p_paddr = 0;
>> @@ -394,14 +394,14 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
>>  phdr.p_align   = 0;
>>  
>>  /* Add merged PT_NOTE program header*/
>> -tmp = elfptr + sizeof(Elf32_Ehdr);
>> +tmp = elfptr + ehdr_ptr->e_phoff;
>>  memcpy(tmp, &phdr, sizeof(phdr));
>>  tmp += sizeof(phdr);
>>  
>>  /* Remove unwanted PT_NOTE program headers. */
>>  i = (nr_ptnote - 1) * sizeof(Elf32_Phdr);
>>  *elfsz = *elfsz - i;
>> -memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf32_Ehdr)-sizeof(Elf32_Phdr)));
>> +memmove(tmp, tmp+i, ((*elfsz)-ehdr_ptr->e_phoff-sizeof(Elf32_Phdr)));
>>  
>>  /*

[PATCH v2 19/20] x86, mm: Make init_mem_mapping be able to be called several times

2013-03-09 Thread Yinghai Lu

Prepare to put page table on local nodes.

Move calling of init_mem_mapping to early_initmem_init.

Rework alloc_low_pages to allocate page tables in the following order:
BRK, local node, low range

Still only load_cr3 one time, otherwise we would break xen 64bit again.
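
The allocation order can be summarized with the following user-space sketch. It only models the decision made in alloc_low_pages and is not the kernel code: `struct pgt_state`, `enum pgt_source` and `pick_pgt_source` are hypothetical names, and the fields mirror the variables used in the diff.

```c
#include <assert.h>
#include <stdbool.h>

enum pgt_source {
    PGT_FROM_BRK,
    PGT_FROM_LOCAL_NODE,
    PGT_FROM_LOW_RANGE,
    PGT_NONE /* the kernel code panics in this case */
};

struct pgt_state {
    unsigned long pgt_buf_end, pgt_buf_top; /* BRK buffer cursor and limit */
    bool can_use_brk_pgt;
    unsigned long local_min_pfn_mapped, local_max_pfn_mapped;
    unsigned long low_min_pfn_mapped, low_max_pfn_mapped;
};

/* Decide where num page-table pages would come from. */
enum pgt_source pick_pgt_source(const struct pgt_state *s, unsigned int num)
{
    assert(s);
    /* 1. BRK buffer, as long as it is allowed and has room */
    if (s->can_use_brk_pgt && s->pgt_buf_end + num <= s->pgt_buf_top)
        return PGT_FROM_BRK;
    /* 2. memory already mapped on the local node */
    if (s->local_min_pfn_mapped < s->local_max_pfn_mapped)
        return PGT_FROM_LOCAL_NODE;
    /* 3. the mapped low range as the last resort */
    if (s->low_min_pfn_mapped < s->low_max_pfn_mapped)
        return PGT_FROM_LOW_RANGE;
    return PGT_NONE;
}
```

The local node range is empty until some memory of that node has been mapped, which is why the low range remains as the fallback.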

Signed-off-by: Yinghai Lu 
Cc: Pekka Enberg 
Cc: Jacob Shin 
Cc: Konrad Rzeszutek Wilk 
---
 arch/x86/include/asm/pgtable.h |2 +-
 arch/x86/kernel/setup.c|1 -
 arch/x86/mm/init.c |   88 
 arch/x86/mm/numa.c |   24 +++
 4 files changed, 79 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 1e67223..868687c 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -621,7 +621,7 @@ static inline int pgd_none(pgd_t pgd)
 #ifndef __ASSEMBLY__
 
 extern int direct_gbpages;
-void init_mem_mapping(void);
+void init_mem_mapping(unsigned long begin, unsigned long end);
 void early_alloc_pgt_buf(void);
 
 /* local pte updates need not use xchg for locking */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 86e1ec0..1cdc1a7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1105,7 +1105,6 @@ void __init setup_arch(char **cmdline_p)
acpi_boot_table_init();
early_acpi_boot_init();
early_initmem_init();
-   init_mem_mapping();
memblock.current_limit = get_max_mapped();
early_trap_pf_init();
 
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 28b294f..8d0007a 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -24,7 +24,10 @@ static unsigned long __initdata pgt_buf_start;
 static unsigned long __initdata pgt_buf_end;
 static unsigned long __initdata pgt_buf_top;
 
-static unsigned long min_pfn_mapped;
+static unsigned long low_min_pfn_mapped;
+static unsigned long low_max_pfn_mapped;
+static unsigned long local_min_pfn_mapped;
+static unsigned long local_max_pfn_mapped;
 
 static bool __initdata can_use_brk_pgt = true;
 
@@ -52,10 +55,17 @@ __ref void *alloc_low_pages(unsigned int num)
 
if ((pgt_buf_end + num) > pgt_buf_top || !can_use_brk_pgt) {
unsigned long ret;
-   if (min_pfn_mapped >= max_pfn_mapped)
-   panic("alloc_low_page: ran out of memory");
-   ret = memblock_find_in_range(min_pfn_mapped << PAGE_SHIFT,
-   max_pfn_mapped << PAGE_SHIFT,
+   if (local_min_pfn_mapped >= local_max_pfn_mapped) {
+   if (low_min_pfn_mapped >= low_max_pfn_mapped)
+   panic("alloc_low_page: ran out of memory");
+   ret = memblock_find_in_range(
+   low_min_pfn_mapped << PAGE_SHIFT,
+   low_max_pfn_mapped << PAGE_SHIFT,
+   PAGE_SIZE * num , PAGE_SIZE);
+   } else
+   ret = memblock_find_in_range(
+   local_min_pfn_mapped << PAGE_SHIFT,
+   local_max_pfn_mapped << PAGE_SHIFT,
PAGE_SIZE * num , PAGE_SIZE);
if (!ret)
panic("alloc_low_page: can not alloc memory");
@@ -387,60 +397,75 @@ static unsigned long __init init_range_memory_mapping(
 
 /* (PUD_SHIFT-PMD_SHIFT)/2 */
 #define STEP_SIZE_SHIFT 5
-void __init init_mem_mapping(void)
+void __init init_mem_mapping(unsigned long begin, unsigned long end)
 {
-   unsigned long end, real_end, start, last_start;
+   unsigned long real_end, start, last_start;
unsigned long step_size;
unsigned long addr;
unsigned long mapped_ram_size = 0;
unsigned long new_mapped_ram_size;
+   bool is_low = false;
+
+   if (!begin) {
+   probe_page_size_mask();
+   /* the ISA range is always mapped regardless of memory holes */
+   init_memory_mapping(0, ISA_END_ADDRESS);
+   begin = ISA_END_ADDRESS;
+   is_low = true;
+   }
 
-   probe_page_size_mask();
-
-#ifdef CONFIG_X86_64
-   end = max_pfn << PAGE_SHIFT;
-#else
-   end = max_low_pfn << PAGE_SHIFT;
-#endif
-
-   /* the ISA range is always mapped regardless of memory holes */
-   init_memory_mapping(0, ISA_END_ADDRESS);
+   if (begin >= end)
+   return;
 
/* xen has big range in reserved near end of ram, skip it at first.*/
-   addr = memblock_find_in_range(ISA_END_ADDRESS, end, PMD_SIZE, PMD_SIZE);
+   addr = memblock_find_in_range(begin, end, PMD_SIZE, PMD_SIZE);
real_end = addr + PMD_SIZE;
 
/* step_size need to be small so pgt_buf from BRK could cover it */
step_size = PMD_SIZE;
-   max_pfn_mapped = 0; /* will get exact value next */
-   min_pfn_mapped

[PATCH v2 20/20] x86, mm, numa: Put pagetable on local node ram for 64bit

2013-03-09 Thread Yinghai Lu

If a node with RAM is hotpluggable, the memory used for its page tables and
vmemmap should come from that node's own RAM.

This patch is a refresh of
| commit 1411e0ec3123ae4c4ead6bfc9fe3ee5a3ae5c327
| Date:   Mon Dec 27 16:48:17 2010 -0800
|
|x86-64, numa: Put pgtable to local node memory
which was reverted earlier.

We now have a reason to reintroduce it: making memory hotplug work.

Call init_mem_mapping() from early_initmem_init() for every node.
alloc_low_pages() will allocate page table pages in the following order:
BRK, local node, low range.
So the page tables end up either in the low range or on the local node.
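
The allocation order described above can be sketched in plain userspace C.
This is only an illustration, not the kernel code: struct page_pool,
pool_take() and alloc_pages_in_order() are hypothetical names, and the
sketch falls back to the next pool whenever one is exhausted, whereas the
patch itself only switches to the low range when no local range has been
mapped yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a source of free pages (BRK, node range, low range). */
struct page_pool {
	const char *name;
	unsigned long next_pfn;	/* next free page frame number */
	unsigned long end_pfn;	/* one past the last usable pfn */
};

/* Take num contiguous pages from pool; returns false if it cannot satisfy that. */
static bool pool_take(struct page_pool *pool, unsigned long num,
		      unsigned long *pfn)
{
	assert(pool && pfn);
	if (pool->next_pfn >= pool->end_pfn ||
	    pool->end_pfn - pool->next_pfn < num)
		return false;
	*pfn = pool->next_pfn;
	pool->next_pfn += num;
	return true;
}

/*
 * Try the pools in priority order: BRK first, then the local node, then
 * the low range. Returns the pool that satisfied the request, or NULL if
 * all three are exhausted (the kernel would panic at that point).
 */
static struct page_pool *alloc_pages_in_order(struct page_pool *brk_pool,
					      struct page_pool *local,
					      struct page_pool *low,
					      unsigned long num,
					      unsigned long *pfn)
{
	struct page_pool *order[] = { brk_pool, local, low };
	size_t i;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		if (pool_take(order[i], num, pfn))
			return order[i];
	}
	return NULL;
}
```

Keeping the priority list in one array makes the order easy to read and to
change; the real alloc_low_pages() encodes it in nested conditionals.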

Signed-off-by: Yinghai Lu 
Cc: Pekka Enberg 
Cc: Jacob Shin 
Cc: Konrad Rzeszutek Wilk 
---
 arch/x86/mm/numa.c |   34 +-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index d3eb0c9..11acdf6 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -673,7 +673,39 @@ static void __init early_x86_numa_init(void)
 #ifdef CONFIG_X86_64
 static void __init early_x86_numa_init_mapping(void)
 {
-   init_mem_mapping(0, max_pfn << PAGE_SHIFT);
+   unsigned long last_start = 0, last_end = 0;
+   struct numa_meminfo *mi = &numa_meminfo;
+   unsigned long start, end;
+   int last_nid = -1;
+   int i, nid;
+
+   for (i = 0; i < mi->nr_blks; i++) {
+   nid   = mi->blk[i].nid;
+   start = mi->blk[i].start;
+   end   = mi->blk[i].end;
+
+   if (last_nid == nid) {
+   last_end = end;
+   continue;
+   }
+
+   /* other nid now */
+   if (last_nid >= 0) {
+   printk(KERN_DEBUG "Node %d: [mem %#016lx-%#016lx]\n",
+   last_nid, last_start, last_end - 1);
+   init_mem_mapping(last_start, last_end);
+   }
+
+   /* for next nid */
+   last_nid   = nid;
+   last_start = start;
+   last_end   = end;
+   }
+   /* last one */
+   printk(KERN_DEBUG "Node %d: [mem %#016lx-%#016lx]\n",
+   last_nid, last_start, last_end - 1);
+   init_mem_mapping(last_start, last_end);
+
if (max_pfn > max_low_pfn)
max_low_pfn = max_pfn;
 }
-- 
1.7.10.4


[PATCH v2 17/20] x86, mm, numa: Add early_initmem_init() stub

2013-03-09 Thread Yinghai Lu

early_initmem_init() calls early_x86_numa_init() to parse NUMA info early.

A later patch will call init_mem_mapping() for each node from it.

Signed-off-by: Yinghai Lu 
Cc: Pekka Enberg 
Cc: Jacob Shin 
---
 arch/x86/include/asm/page_types.h |1 +
 arch/x86/kernel/setup.c   |1 +
 arch/x86/mm/init.c|6 ++
 arch/x86/mm/numa.c|7 +--
 4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index b012b82..d04dd8c 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -55,6 +55,7 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn);
 extern unsigned long init_memory_mapping(unsigned long start,
 unsigned long end);
 
+void early_initmem_init(void);
 extern void initmem_init(void);
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b067663..626bc9f 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1135,6 +1135,7 @@ void __init setup_arch(char **cmdline_p)
 
early_acpi_boot_init();
 
+   early_initmem_init();
initmem_init();
memblock_find_dma_reserve();
 
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index abcc241..28b294f 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -450,6 +450,12 @@ void __init init_mem_mapping(void)
early_memtest(0, max_pfn_mapped << PAGE_SHIFT);
 }
 
+#ifndef CONFIG_NUMA
+void __init early_initmem_init(void)
+{
+}
+#endif
+
 /*
  * devmem_is_allowed() checks to see if /dev/mem access to a certain address
  * is valid. The argument is a physical page number.
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 182e085..c2d4653 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -668,13 +668,16 @@ static void __init early_x86_numa_init(void)
numa_init(dummy_numa_init);
 }
 
+void __init early_initmem_init(void)
+{
+   early_x86_numa_init();
+}
+
 void __init x86_numa_init(void)
 {
int i, nid;
struct numa_meminfo *mi = &numa_meminfo;
 
-   early_x86_numa_init();
-
 #ifdef CONFIG_ACPI_NUMA
if (srat_used)
x86_acpi_numa_init_slit();
-- 
1.7.10.4


[PATCH v2 18/20] x86, mm: Parse numa info early

2013-03-09 Thread Yinghai Lu

Parsing of NUMA info is now split into two functions.

early_initmem_init() only parses the info into numa_meminfo and
nodes_parsed. The numaq, acpi_numa, amd_numa, dummy fallback sequence
still works.

SLIT and NUMA emulation handling are still left in initmem_init().

Call early_initmem_init() before init_mem_mapping() so that
init_mem_mapping() can use the NUMA info.

Signed-off-by: Yinghai Lu 
Cc: Pekka Enberg 
Cc: Jacob Shin 
---
 arch/x86/kernel/setup.c |   24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 626bc9f..86e1ec0 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1098,13 +1098,21 @@ void __init setup_arch(char **cmdline_p)
trim_platform_memory_ranges();
trim_low_memory_range();
 
+   /*
+* Parse the ACPI tables for possible boot-time SMP configuration.
+*/
+   acpi_initrd_override_copy();
+   acpi_boot_table_init();
+   early_acpi_boot_init();
+   early_initmem_init();
init_mem_mapping();
-
+   memblock.current_limit = get_max_mapped();
early_trap_pf_init();
 
+   reserve_initrd();
+
setup_real_mode();
 
-   memblock.current_limit = get_max_mapped();
dma_contiguous_reserve(0);
 
/*
@@ -1118,24 +1126,12 @@ void __init setup_arch(char **cmdline_p)
/* Allocate bigger log buffer */
setup_log_buf(1);
 
-   reserve_initrd();
-
-   acpi_initrd_override_copy();
-
reserve_crashkernel();
 
vsmp_init();
 
io_delay_init();
 
-   /*
-* Parse the ACPI tables for possible boot-time SMP configuration.
-*/
-   acpi_boot_table_init();
-
-   early_acpi_boot_init();
-
-   early_initmem_init();
initmem_init();
memblock_find_dma_reserve();
 
-- 
1.7.10.4


[PATCH v2 15/20] x86, mm, numa: Move emulation handling down.

2013-03-09 Thread Yinghai Lu

Emulation handling needs to allocate buffers for the new numa_meminfo and
the distance matrix, so move it down.

This also changes the behavior:
before this patch, if the user passes wrong data on the command line, we
fall back to the next NUMA probing method or disable NUMA.
After this patch, if the user passes wrong data on the command line, we
stay with the NUMA info from the earlier probing, such as ACPI SRAT or
amd_numa.

We need to call numa_check_memblks() to reject wrong user input early,
so that the original numa_meminfo is left unchanged.
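
The "check first, commit only on success" behavior described above can be
sketched in userspace C. The types and helpers below (struct meminfo,
meminfo_valid(), apply_if_valid()) are hypothetical simplifications for
illustration, not the kernel's numa_meminfo or numa_check_memblks(); the
validity rule used here (non-empty, non-overlapping blocks) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLKS 8

/* Simplified stand-in for numa_meminfo: address ranges tagged with a node id. */
struct blk { unsigned long start, end; int nid; };
struct meminfo { int nr_blks; struct blk blk[MAX_BLKS]; };

/* Assumed validity rule: every block is non-empty and no two blocks overlap. */
static bool meminfo_valid(const struct meminfo *mi)
{
	int i, j;

	if (mi->nr_blks < 0 || mi->nr_blks > MAX_BLKS)
		return false;
	for (i = 0; i < mi->nr_blks; i++) {
		if (mi->blk[i].start >= mi->blk[i].end)
			return false;
		for (j = i + 1; j < mi->nr_blks; j++)
			if (mi->blk[i].start < mi->blk[j].end &&
			    mi->blk[j].start < mi->blk[i].end)
				return false;
	}
	return true;
}

/*
 * Apply a candidate layout (for example one built from command line input)
 * only if it passes the check. On failure *live is left untouched, so the
 * result of the earlier probing survives.
 */
static bool apply_if_valid(struct meminfo *live, const struct meminfo *candidate)
{
	assert(live && candidate);
	if (!meminfo_valid(candidate))
		return false;
	*live = *candidate;
	return true;
}
```

Because the candidate is validated as a separate copy, a bad command line
cannot leave the live table half-modified.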

Signed-off-by: Yinghai Lu 
Cc: David Rientjes 
---
 arch/x86/mm/numa.c   |6 +++---
 arch/x86/mm/numa_emulation.c |2 +-
 arch/x86/mm/numa_internal.h  |2 ++
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 1d5fa08..90fd123 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -537,7 +537,7 @@ static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
 }
 #endif
 
-static int __init numa_check_memblks(struct numa_meminfo *mi)
+int __init numa_check_memblks(struct numa_meminfo *mi)
 {
nodemask_t nodes_parsed;
unsigned long pfn_align;
@@ -607,8 +607,6 @@ static int __init numa_init(int (*init_func)(void))
if (ret < 0)
return ret;
 
-   numa_emulation(&numa_meminfo, numa_distance_cnt);
-
ret = numa_check_memblks(&numa_meminfo);
if (ret < 0)
return ret;
@@ -672,6 +670,8 @@ void __init x86_numa_init(void)
 
early_x86_numa_init();
 
+   numa_emulation(&numa_meminfo, numa_distance_cnt);
+
node_possible_map = numa_nodes_parsed;
numa_nodemask_from_meminfo(&node_possible_map, mi);
 
diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index d47..5a0433d 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -348,7 +348,7 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
if (ret < 0)
goto no_emu;
 
-   if (numa_cleanup_meminfo(&ei) < 0) {
+   if (numa_cleanup_meminfo(&ei) < 0 || numa_check_memblks(&ei) < 0) {
pr_warning("NUMA: Warning: constructed meminfo invalid, disabling emulation\n");
goto no_emu;
}
diff --git a/arch/x86/mm/numa_internal.h b/arch/x86/mm/numa_internal.h
index ad86ec9..bb2fbcc 100644
--- a/arch/x86/mm/numa_internal.h
+++ b/arch/x86/mm/numa_internal.h
@@ -21,6 +21,8 @@ void __init numa_reset_distance(void);
 
 void __init x86_numa_init(void);
 
+int __init numa_check_memblks(struct numa_meminfo *mi);
+
 #ifdef CONFIG_NUMA_EMU
 void __init numa_emulation(struct numa_meminfo *numa_meminfo,
   int numa_dist_cnt);
-- 
1.7.10.4


[PATCH v2 16/20] x86, ACPI, numa, ia64: split SLIT handling out

2013-03-09 Thread Yinghai Lu

We need to handle SLIT later, as it needs to allocate a buffer for the
distance matrix. We also do not need SLIT info before init_mem_mapping().

So move SLIT parsing later.

x86_acpi_numa_init() becomes x86_acpi_numa_init_srat() and
x86_acpi_numa_init_slit().

This should not break ia64, where acpi_numa_init() is replaced with
acpi_numa_init_srat(), acpi_numa_init_slit() and acpi_numa_arch_fixup().

-v2: Change names to acpi_numa_init_srat/acpi_numa_init_slit as suggested
 by tj.
 Remove numa_reset_distance() from numa_init(), as we only set the
 distance in SLIT handling.

Signed-off-by: Yinghai Lu 
Cc: Rafael J. Wysocki 
Cc: linux-a...@vger.kernel.org
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: linux-i...@vger.kernel.org
---
 arch/ia64/kernel/setup.c|4 +++-
 arch/x86/include/asm/acpi.h |3 ++-
 arch/x86/mm/numa.c  |   14 --
 arch/x86/mm/srat.c  |   11 +++
 drivers/acpi/numa.c |   13 +++--
 include/linux/acpi.h|3 ++-
 6 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c
index 2029cc0..6a2efb5 100644
--- a/arch/ia64/kernel/setup.c
+++ b/arch/ia64/kernel/setup.c
@@ -558,7 +558,9 @@ setup_arch (char **cmdline_p)
acpi_table_init();
early_acpi_boot_init();
 # ifdef CONFIG_ACPI_NUMA
-   acpi_numa_init();
+   acpi_numa_init_srat();
+   acpi_numa_init_slit();
+   acpi_numa_arch_fixup();
 #  ifdef CONFIG_ACPI_HOTPLUG_CPU
prefill_possible_map();
 #  endif
diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index b31bf97..651db0b 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -178,7 +178,8 @@ static inline void disable_acpi(void) { }
 
 #ifdef CONFIG_ACPI_NUMA
 extern int acpi_numa;
-extern int x86_acpi_numa_init(void);
+int x86_acpi_numa_init_srat(void);
+void x86_acpi_numa_init_slit(void);
 #endif /* CONFIG_ACPI_NUMA */
 
 #define acpi_unlazy_tlb(x) leave_mm(x)
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 90fd123..182e085 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -598,7 +598,6 @@ static int __init numa_init(int (*init_func)(void))
 
nodes_clear(numa_nodes_parsed);
memset(&numa_meminfo, 0, sizeof(numa_meminfo));
-   numa_reset_distance();
 
ret = init_func();
if (ret < 0)
@@ -636,6 +635,10 @@ static int __init dummy_numa_init(void)
return 0;
 }
 
+#ifdef CONFIG_ACPI_NUMA
+static bool srat_used __initdata;
+#endif
+
 /**
  * x86_numa_init - Initialize NUMA
  *
@@ -651,8 +654,10 @@ static void __init early_x86_numa_init(void)
return;
 #endif
 #ifdef CONFIG_ACPI_NUMA
-   if (!numa_init(x86_acpi_numa_init))
+   if (!numa_init(x86_acpi_numa_init_srat)) {
+   srat_used = true;
return;
+   }
 #endif
 #ifdef CONFIG_AMD_NUMA
if (!numa_init(amd_numa_init))
@@ -670,6 +675,11 @@ void __init x86_numa_init(void)
 
early_x86_numa_init();
 
+#ifdef CONFIG_ACPI_NUMA
+   if (srat_used)
+   x86_acpi_numa_init_slit();
+#endif
+
numa_emulation(&numa_meminfo, numa_distance_cnt);
 
node_possible_map = numa_nodes_parsed;
diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
index cdd0da9..443f9ef 100644
--- a/arch/x86/mm/srat.c
+++ b/arch/x86/mm/srat.c
@@ -185,14 +185,17 @@ out_err:
return -1;
 }
 
-void __init acpi_numa_arch_fixup(void) {}
-
-int __init x86_acpi_numa_init(void)
+int __init x86_acpi_numa_init_srat(void)
 {
int ret;
 
-   ret = acpi_numa_init();
+   ret = acpi_numa_init_srat();
if (ret < 0)
return ret;
return srat_disabled() ? -EINVAL : 0;
 }
+
+void __init x86_acpi_numa_init_slit(void)
+{
+   acpi_numa_init_slit();
+}
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index 33e609f..6460db4 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -282,7 +282,7 @@ acpi_table_parse_srat(enum acpi_srat_type id,
handler, max_entries);
 }
 
-int __init acpi_numa_init(void)
+int __init acpi_numa_init_srat(void)
 {
int cnt = 0;
 
@@ -303,11 +303,6 @@ int __init acpi_numa_init(void)
NR_NODE_MEMBLKS);
}
 
-   /* SLIT: System Locality Information Table */
-   acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
-
-   acpi_numa_arch_fixup();
-
if (cnt < 0)
return cnt;
else if (!parsed_numa_memblks)
@@ -315,6 +310,12 @@ int __init acpi_numa_init(void)
return 0;
 }
 
+void __init acpi_numa_init_slit(void)
+{
+   /* SLIT: System Locality Information Table */
+   acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
+}
+
 int acpi_get_pxm(acpi_handle h)
 {
unsigned long long pxm;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 4b943e6..4a78235 100644
--- a/include/linux/acpi.h
+++

[PATCH v2 13/20] x86, mm, numa: Set memblock nid later

2013-03-09 Thread Yinghai Lu

For the separation, we need to set the memblock nid later, as doing so
could change the memblock array and possibly double the memblock.memory
array, which would need to allocate a buffer.

Only set the memblock nid once, on the successful path.

Also rename numa_register_memblks() to numa_check_memblks()
after moving out the code that sets the memblock nid.

Signed-off-by: Yinghai Lu 
---
 arch/x86/mm/numa.c |   16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index fcaeba9..e2ddcbd 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -537,10 +537,9 @@ static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
 }
 #endif
 
-static int __init numa_register_memblks(struct numa_meminfo *mi)
+static int __init numa_check_memblks(struct numa_meminfo *mi)
 {
unsigned long pfn_align;
-   int i;
 
/* Account for nodes with cpus and no memory */
node_possible_map = numa_nodes_parsed;
@@ -563,11 +562,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
return -EINVAL;
}
 
-   for (i = 0; i < mi->nr_blks; i++) {
-   struct numa_memblk *mb = &mi->blk[i];
-   memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
-   }
-
return 0;
 }
 
@@ -604,7 +598,6 @@ static int __init numa_init(int (*init_func)(void))
nodes_clear(numa_nodes_parsed);
nodes_clear(node_possible_map);
memset(&numa_meminfo, 0, sizeof(numa_meminfo));
-   WARN_ON(memblock_set_node(0, ULLONG_MAX, MAX_NUMNODES));
numa_reset_distance();
 
ret = init_func();
@@ -616,7 +609,7 @@ static int __init numa_init(int (*init_func)(void))
 
numa_emulation(&numa_meminfo, numa_distance_cnt);
 
-   ret = numa_register_memblks(&numa_meminfo);
+   ret = numa_check_memblks(&numa_meminfo);
if (ret < 0)
return ret;
 
@@ -679,6 +672,11 @@ void __init x86_numa_init(void)
 
early_x86_numa_init();
 
+   for (i = 0; i < mi->nr_blks; i++) {
+   struct numa_memblk *mb = &mi->blk[i];
+   memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
+   }
+
/* Finally register nodes. */
for_each_node_mask(nid, node_possible_map) {
u64 start = PFN_PHYS(max_pfn);
-- 
1.7.10.4


[PATCH v2 14/20] x86, mm, numa: Move node_possible_map setting later

2013-03-09 Thread Yinghai Lu

Move node_possible_map handling out of numa_check_memblks() to avoid side
effects in numa_check_memblks().

Only set it once, on the successful path, instead of resetting it in
numa_init() every time.

Suggested-by: Tejun Heo 
Signed-off-by: Yinghai Lu 
---
 arch/x86/mm/numa.c |   11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index e2ddcbd..1d5fa08 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -539,12 +539,13 @@ static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
 
 static int __init numa_check_memblks(struct numa_meminfo *mi)
 {
+   nodemask_t nodes_parsed;
unsigned long pfn_align;
 
/* Account for nodes with cpus and no memory */
-   node_possible_map = numa_nodes_parsed;
-   numa_nodemask_from_meminfo(&node_possible_map, mi);
-   if (WARN_ON(nodes_empty(node_possible_map)))
+   nodes_parsed = numa_nodes_parsed;
+   numa_nodemask_from_meminfo(&nodes_parsed, mi);
+   if (WARN_ON(nodes_empty(nodes_parsed)))
return -EINVAL;
 
if (!numa_meminfo_cover_memory(mi))
@@ -596,7 +597,6 @@ static int __init numa_init(int (*init_func)(void))
set_apicid_to_node(i, NUMA_NO_NODE);
 
nodes_clear(numa_nodes_parsed);
-   nodes_clear(node_possible_map);
memset(&numa_meminfo, 0, sizeof(numa_meminfo));
numa_reset_distance();
 
@@ -672,6 +672,9 @@ void __init x86_numa_init(void)
 
early_x86_numa_init();
 
+   node_possible_map = numa_nodes_parsed;
+   numa_nodemask_from_meminfo(&node_possible_map, mi);
+
for (i = 0; i < mi->nr_blks; i++) {
struct numa_memblk *mb = &mi->blk[i];
memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
-- 
1.7.10.4


[PATCH v2 12/20] x86, mm, numa: Use numa_meminfo to check node_map_pfn alignment

2013-03-09 Thread Yinghai Lu

We can use numa_meminfo directly instead of the memblock nid.

That lets us move the memblock nid setting down and do it only once,
on the successful path.

-v2: as suggested by tj, split the move into a separate patch.
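
As an illustration of computing the node alignment from a block table
rather than from memblock, here is a standalone userspace sketch of the
mask-accumulation idea. It is not the kernel function: struct pfn_blk and
blk_pfn_alignment() are hypothetical names, the blocks are assumed to be
sorted by start pfn, and __builtin_ctzl() is a GCC/Clang builtin used here
in place of the kernel's __ffs().

```c
#include <assert.h>

/* Simplified stand-in for one numa_meminfo block, already converted to pfns. */
struct pfn_blk { unsigned long start_pfn, end_pfn; int nid; };

/*
 * Return the power-of-two pfn granularity needed to tell adjacent nodes
 * apart, or 0 if there is no requirement (single node).
 */
static unsigned long blk_pfn_alignment(const struct pfn_blk *blk, int nr_blks)
{
	unsigned long accl_mask = 0, last_end = 0;
	int last_nid = -1;
	int i;

	assert(blk || nr_blks == 0);
	for (i = 0; i < nr_blks; i++) {
		unsigned long start = blk[i].start_pfn;
		unsigned long mask;

		if (start && last_nid >= 0 && last_nid != blk[i].nid) {
			/* Start with a mask just fine enough to pinpoint start ... */
			mask = ~((1UL << __builtin_ctzl(start)) - 1);
			/* ... and coarsen it while it still separates the two nodes. */
			while (mask && last_end <= (start & (mask << 1)))
				mask <<= 1;
			/* accumulate all internode masks */
			accl_mask |= mask;
		}
		last_nid = blk[i].nid;
		last_end = blk[i].end_pfn;
	}

	/* convert mask to number of pages */
	return ~accl_mask + 1;
}
```

A caller would compare the result against the section size and reject the
NUMA configuration when the nodes are interleaved more finely than one
section, as the check in the diff below does with PAGES_PER_SECTION.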

Signed-off-by: Yinghai Lu 
---
 arch/x86/mm/numa.c |   30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 24155b2..fcaeba9 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -496,14 +496,18 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
  * Returns the determined alignment in pfn's.  0 if there is no alignment
  * requirement (single node).
  */
-unsigned long __init node_map_pfn_alignment(void)
+#ifdef NODE_NOT_IN_PAGE_FLAGS
+static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
 {
unsigned long accl_mask = 0, last_end = 0;
unsigned long start, end, mask;
int last_nid = -1;
int i, nid;
 
-   for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid) {
+   for (i = 0; i < mi->nr_blks; i++) {
+   start = mi->blk[i].start >> PAGE_SHIFT;
+   end = mi->blk[i].end >> PAGE_SHIFT;
+   nid = mi->blk[i].nid;
if (!start || last_nid < 0 || last_nid == nid) {
last_nid = nid;
last_end = end;
@@ -526,10 +530,16 @@ unsigned long __init node_map_pfn_alignment(void)
/* convert mask to number of pages */
return ~accl_mask + 1;
 }
+#else
+static unsigned long __init node_map_pfn_alignment(struct numa_meminfo *mi)
+{
+   return 0;
+}
+#endif
 
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
-   unsigned long uninitialized_var(pfn_align);
+   unsigned long pfn_align;
int i;
 
/* Account for nodes with cpus and no memory */
@@ -541,24 +551,22 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
if (!numa_meminfo_cover_memory(mi))
return -EINVAL;
 
-   for (i = 0; i < mi->nr_blks; i++) {
-   struct numa_memblk *mb = &mi->blk[i];
-   memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
-   }
-
/*
 * If sections array is gonna be used for pfn -> nid mapping, check
 * whether its granularity is fine enough.
 */
-#ifdef NODE_NOT_IN_PAGE_FLAGS
-   pfn_align = node_map_pfn_alignment();
+   pfn_align = node_map_pfn_alignment(mi);
if (pfn_align && pfn_align < PAGES_PER_SECTION) {
printk(KERN_WARNING "Node alignment %LuMB < min %LuMB, rejecting NUMA config\n",
   PFN_PHYS(pfn_align) >> 20,
   PFN_PHYS(PAGES_PER_SECTION) >> 20);
return -EINVAL;
}
-#endif
+
+   for (i = 0; i < mi->nr_blks; i++) {
+   struct numa_memblk *mb = &mi->blk[i];
+   memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
+   }
 
return 0;
 }
-- 
1.7.10.4


[PATCH v2 10/20] x86, mm, numa: Call numa_meminfo_cover_memory() checking early

2013-03-09 Thread Yinghai Lu

For the separation, we need to set the memblock nid later, as doing so
could change the memblock array and possibly double the memblock.memory
array, which would need to allocate a buffer.

We do not need the nid in memblock to find absent pages, so we can move
the numa_meminfo_cover_memory() call earlier.

We can also make __absent_pages_in_range() static and use
absent_pages_in_range() directly.

Later we can set the memblock nid only once, on the successful path.

Signed-off-by: Yinghai Lu 
---
 arch/x86/mm/numa.c |7 ---
 include/linux/mm.h |2 --
 mm/page_alloc.c|2 +-
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index d545638..b7173f6 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -460,7 +460,7 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
u64 s = mi->blk[i].start >> PAGE_SHIFT;
u64 e = mi->blk[i].end >> PAGE_SHIFT;
numaram += e - s;
-   numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
+   numaram -= absent_pages_in_range(s, e);
if ((s64)numaram < 0)
numaram = 0;
}
@@ -488,6 +488,9 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
if (WARN_ON(nodes_empty(node_possible_map)))
return -EINVAL;
 
+   if (!numa_meminfo_cover_memory(mi))
+   return -EINVAL;
+
for (i = 0; i < mi->nr_blks; i++) {
struct numa_memblk *mb = &mi->blk[i];
memblock_set_node(mb->start, mb->end - mb->start, mb->nid);
@@ -506,8 +509,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
return -EINVAL;
}
 #endif
-   if (!numa_meminfo_cover_memory(mi))
-   return -EINVAL;
 
return 0;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7acc9dc..2ae2050 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1324,8 +1324,6 @@ extern void free_initmem(void);
  */
 extern void free_area_init_nodes(unsigned long *max_zone_pfn);
 unsigned long node_map_pfn_alignment(void);
-unsigned long __absent_pages_in_range(int nid, unsigned long start_pfn,
-   unsigned long end_pfn);
 extern unsigned long absent_pages_in_range(unsigned long start_pfn,
unsigned long end_pfn);
 extern void get_pfn_range_for_nid(unsigned int nid,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8fcced7..580d919 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4356,7 +4356,7 @@ static unsigned long __meminit zone_spanned_pages_in_node(int nid,
  * Return the number of holes in a range on a node. If nid is MAX_NUMNODES,
  * then all holes in the requested range will be accounted for.
  */
-unsigned long __meminit __absent_pages_in_range(int nid,
+static unsigned long __meminit __absent_pages_in_range(int nid,
unsigned long range_start_pfn,
unsigned long range_end_pfn)
 {
-- 
1.7.10.4


[PATCH v2 07/20] x86, ACPI: Make acpi_initrd_override_find work with 32bit flat mode

2013-03-09 Thread Yinghai Lu

On 32bit it is easy to access the initrd in 32bit flat mode to do the
finding, as we don't need to set up page tables.

That is done from head_32.S, and microcode updating already uses this trick.

acpi_initrd_override_find() needs to be changed to use physical addresses
to access global variables.

Pass is_phys to the function, as on 32bit we cannot use the address to
decide whether it is physical or virtual. The boot loader could load the
initrd above max_low_pfn.

Don't call printk() as it uses global variables; delay the printing until
the copying stage.

Change table_sigs to live on the stack instead; otherwise it is too messy
to convert the string array to physical addresses and still keep the
offset calculation correct.
Its size is about 36x4 bytes, which is small enough to fit on the stack.

Also remove "continue" from the macro to make the code more readable.
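
To illustrate the last point, here is a small userspace sketch of a
statement-like macro with the "continue" written out at the call site.
REPORT_INVALID() and count_valid() are hypothetical names chosen for this
example, and fprintf() stands in for the kernel's pr_err().

```c
#include <assert.h>
#include <stdio.h>

/*
 * Statement-like macro: the do { } while (0) wrapper makes it expand to a
 * single statement and keeps control flow out of the macro body.
 */
#define REPORT_INVALID(reason, name) \
	do { fprintf(stderr, "skipping %s: %s\n", (name), (reason)); } while (0)

/* Count the entries that pass validation; the skip is spelled out by the caller. */
static int count_valid(const int *sizes, const char *const *names, int n,
		       int min_size)
{
	int i, valid = 0;

	assert(sizes && names);
	for (i = 0; i < n; i++) {
		if (sizes[i] < min_size) {
			REPORT_INVALID("too small", names[i]);
			continue;	/* visible here, not buried in the macro */
		}
		valid++;
	}
	return valid;
}
```

With the jump hidden inside the macro, a reader of the loop cannot see that
an iteration may be skipped; writing it at the call site costs one line.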

Signed-off-by: Yinghai Lu 
Cc: Pekka Enberg 
Cc: Jacob Shin 
Cc: Rafael J. Wysocki 
Cc: linux-a...@vger.kernel.org
---
 arch/x86/kernel/setup.c |2 +-
 drivers/acpi/osl.c  |   85 ---
 include/linux/acpi.h|5 +--
 3 files changed, 63 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d0cc176..16a703f 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1093,7 +1093,7 @@ void __init setup_arch(char **cmdline_p)
reserve_initrd();
 
acpi_initrd_override_find((void *)initrd_start,
-   initrd_end - initrd_start);
+   initrd_end - initrd_start, false);
acpi_initrd_override_copy();
 
reserve_crashkernel();
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 54bcc37..611ca9b 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -551,38 +551,54 @@ u8 __init acpi_table_checksum(u8 *buffer, u32 length)
return sum;
 }
 
-/* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
-static const char * const table_sigs[] = {
-   ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
-   ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
-   ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
-   ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
-   ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
-   ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
-   ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
-   ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
-   ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
-
 /* Non-fatal errors: Affected tables/files are ignored */
 #define INVALID_TABLE(x, path, name)   \
-   { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); continue; }
+   do { pr_err("ACPI OVERRIDE: " x " [%s%s]\n", path, name); } while (0)
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
 #define ACPI_OVERRIDE_TABLES 64
 static struct cpio_data __initdata acpi_initrd_files[ACPI_OVERRIDE_TABLES];
 
-void __init acpi_initrd_override_find(void *data, size_t size)
+/*
+ * acpi_initrd_override_find() is called from head_32.S and head64.c.
+ * head_32.S calling path is with 32bit flat mode, so we can access
+ * initrd early without setting pagetable or relocating initrd. For
+ * global variables accessing, we need to use phys address instead of
+ * kernel virtual address, try to put table_sigs string array in stack,
+ * so avoid switching for it.
+ * Also don't call printk as it uses global variables.
+ */
+void __init acpi_initrd_override_find(void *data, size_t size, bool is_phys)
 {
int sig, no, table_nr = 0;
long offset = 0;
struct acpi_table_header *table;
char cpio_path[32] = "kernel/firmware/acpi/";
struct cpio_data file;
+   struct cpio_data *files = acpi_initrd_files;
+   int *all_tables_size_p = &all_tables_size;
+
+   /* All but ACPI_SIG_RSDP and ACPI_SIG_FACS: */
+   char *table_sigs[] = {
+   ACPI_SIG_BERT, ACPI_SIG_CPEP, ACPI_SIG_ECDT, ACPI_SIG_EINJ,
+   ACPI_SIG_ERST, ACPI_SIG_HEST, ACPI_SIG_MADT, ACPI_SIG_MSCT,
+   ACPI_SIG_SBST, ACPI_SIG_SLIT, ACPI_SIG_SRAT, ACPI_SIG_ASF,
+   ACPI_SIG_BOOT, ACPI_SIG_DBGP, ACPI_SIG_DMAR, ACPI_SIG_HPET,
+   ACPI_SIG_IBFT, ACPI_SIG_IVRS, ACPI_SIG_MCFG, ACPI_SIG_MCHI,
+   ACPI_SIG_SLIC, ACPI_SIG_SPCR, ACPI_SIG_SPMI, ACPI_SIG_TCPA,
+   ACPI_SIG_UEFI, ACPI_SIG_WAET, ACPI_SIG_WDAT, ACPI_SIG_WDDT,
+   ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT,
+   ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, NULL };
 
if (data == NULL || size == 0)
return;
 
+   if (is_phys) {
+   files = (struct cpio_data *)__pa_symbol(acpi_initrd_files);
+   all_tables_size_p = (int *)__pa_symbol(&all_tables_size);
+   }
+
for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
file =

[PATCH v2 09/20] x86, mm, numa: Move two functions calling on successful path later

2013-03-09 Thread Yinghai Lu

We need to have NUMA info ready before init_mem_mapping(), so that we
can call init_mem_mapping() per node and also trim node memory ranges to
a big alignment.

The current NUMA parsing needs to allocate some buffers and so has to be
called after init_mem_mapping().

So split NUMA info parsing into two stages. The early one runs before
init_mem_mapping() and should not need to allocate buffers.

In the end we will have early_initmem_init() and initmem_init().

This is the first patch of that separation.

setup_node_data() and numa_init_array() are only called on the successful
path, so we can move those calls to x86_numa_init(). That also makes
numa_init() smaller and more readable.

-v2: remove the node_online_map clearing in numa_init(), as it is only
 set in setup_node_data(), at the end of the successful path.

Signed-off-by: Yinghai Lu 
---
 arch/x86/mm/numa.c |   69 +---
 1 file changed, 39 insertions(+), 30 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 72fe01e..d545638 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -480,7 +480,7 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
unsigned long uninitialized_var(pfn_align);
-   int i, nid;
+   int i;
 
/* Account for nodes with cpus and no memory */
node_possible_map = numa_nodes_parsed;
@@ -509,24 +509,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
if (!numa_meminfo_cover_memory(mi))
return -EINVAL;
 
-   /* Finally register nodes. */
-   for_each_node_mask(nid, node_possible_map) {
-   u64 start = PFN_PHYS(max_pfn);
-   u64 end = 0;
-
-   for (i = 0; i < mi->nr_blks; i++) {
-   if (nid != mi->blk[i].nid)
-   continue;
-   start = min(mi->blk[i].start, start);
-   end = max(mi->blk[i].end, end);
-   }
-
-   if (start < end)
-   setup_node_data(nid, start, end);
-   }
-
-   /* Dump memblock with node info and return. */
-   memblock_dump_all();
return 0;
 }
 
@@ -562,7 +544,6 @@ static int __init numa_init(int (*init_func)(void))
 
nodes_clear(numa_nodes_parsed);
nodes_clear(node_possible_map);
-   nodes_clear(node_online_map);
memset(&numa_meminfo, 0, sizeof(numa_meminfo));
WARN_ON(memblock_set_node(0, ULLONG_MAX, MAX_NUMNODES));
numa_reset_distance();
@@ -580,15 +561,6 @@ static int __init numa_init(int (*init_func)(void))
if (ret < 0)
return ret;
 
-   for (i = 0; i < nr_cpu_ids; i++) {
-   int nid = early_cpu_to_node(i);
-
-   if (nid == NUMA_NO_NODE)
-   continue;
-   if (!node_online(nid))
-   numa_clear_node(i);
-   }
-   numa_init_array();
return 0;
 }
 
@@ -621,7 +593,7 @@ static int __init dummy_numa_init(void)
  * last fallback is dummy single node config encomapssing whole memory and
  * never fails.
  */
-void __init x86_numa_init(void)
+static void __init early_x86_numa_init(void)
 {
if (!numa_off) {
 #ifdef CONFIG_X86_NUMAQ
@@ -641,6 +613,43 @@ void __init x86_numa_init(void)
numa_init(dummy_numa_init);
 }
 
+void __init x86_numa_init(void)
+{
+   int i, nid;
+   struct numa_meminfo *mi = &numa_meminfo;
+
+   early_x86_numa_init();
+
+   /* Finally register nodes. */
+   for_each_node_mask(nid, node_possible_map) {
+   u64 start = PFN_PHYS(max_pfn);
+   u64 end = 0;
+
+   for (i = 0; i < mi->nr_blks; i++) {
+   if (nid != mi->blk[i].nid)
+   continue;
+   start = min(mi->blk[i].start, start);
+   end = max(mi->blk[i].end, end);
+   }
+
+   if (start < end)
+   setup_node_data(nid, start, end); /* online is set */
+   }
+
+   /* Dump memblock with node info */
+   memblock_dump_all();
+
+   for (i = 0; i < nr_cpu_ids; i++) {
+   int nid = early_cpu_to_node(i);
+
+   if (nid == NUMA_NO_NODE)
+   continue;
+   if (!node_online(nid))
+   numa_clear_node(i);
+   }
+   numa_init_array();
+}
+
 static __init int find_near_online_node(int node)
 {
int n, val;
-- 
1.7.10.4

[PATCH v2 06/20] x86, ACPI: Store override acpi tables phys addr in cpio files info array

2013-03-09 Thread Yinghai Lu

On 32-bit, the tables are found by physical address while head_32.S is still
running in 32-bit flat mode, because at that point no page table needs to be
set up to access the initrd.

For copying, we can use early_ioremap() on the physical address directly,
before the memory mapping is set up.

To keep 32-bit and 64-bit consistent, use the physical address for both.
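
As a rough userspace illustration of the idea (not the kernel code), the sketch below keeps a physical address in the data field and maps both source and target only for the duration of the copy. fake_ram, fake_map(), fake_unmap(), struct file_info and copy_file() are all hypothetical names made up for this example; fake_map() and fake_unmap() merely stand in for early_ioremap() and early_iounmap().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for physical memory; not a kernel structure. */
static unsigned char fake_ram[256];

/* Hypothetical stand-in for early_ioremap(): "map" a physical range. */
static void *fake_map(uintptr_t phys, size_t size)
{
	assert(phys <= sizeof(fake_ram) && size <= sizeof(fake_ram) - phys);
	return &fake_ram[phys];
}

/* Hypothetical stand-in for early_iounmap(): nothing to undo here. */
static void fake_unmap(void *virt, size_t size)
{
	(void)virt;
	(void)size;
}

struct file_info {
	void *data;	/* holds a physical address, must not be dereferenced */
	size_t size;
};

/* Copy one recorded file to dst_phys, mapping both sides temporarily. */
static void copy_file(const struct file_info *f, uintptr_t dst_phys)
{
	uintptr_t src_phys = (uintptr_t)f->data;
	void *q = fake_map(src_phys, f->size);
	void *p = fake_map(dst_phys, f->size);

	memcpy(p, q, f->size);
	fake_unmap(q, f->size);
	fake_unmap(p, f->size);
}
```

The point of the sketch is that the stored value is only ever treated as an address to be mapped, which is why the same field can serve both the 32-bit and the 64-bit path.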

Signed-off-by: Yinghai Lu 
Cc: Rafael J. Wysocki 
Cc: linux-a...@vger.kernel.org
---
 drivers/acpi/osl.c |   14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index d66ae0e..54bcc37 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -615,7 +615,7 @@ void __init acpi_initrd_override_find(void *data, size_t size)
table->signature, cpio_path, file.name, table->length);
 
all_tables_size += table->length;
-   acpi_initrd_files[table_nr].data = file.data;
+   acpi_initrd_files[table_nr].data = (void *)__pa(file.data);
acpi_initrd_files[table_nr].size = file.size;
table_nr++;
}
@@ -624,7 +624,7 @@ void __init acpi_initrd_override_find(void *data, size_t size)
 void __init acpi_initrd_override_copy(void)
 {
int no, total_offset = 0;
-   char *p;
+   char *p, *q;
 
if (!all_tables_size)
return;
@@ -654,12 +654,20 @@ void __init acpi_initrd_override_copy(void)
arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
 
for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
+   /*
+* Use unsigned long here, otherwise 32-bit builds emit a
+* warning. unsigned long is wide enough, as the bootloader
+* does not load the initrd above 4G for a 32-bit kernel.
+*/
+   unsigned long addr = (unsigned long)acpi_initrd_files[no].data;
phys_addr_t size = acpi_initrd_files[no].size;
 
if (!size)
break;
+   q = early_ioremap(addr, size);
p = early_ioremap(acpi_tables_addr + total_offset, size);
-   memcpy(p, acpi_initrd_files[no].data, size);
+   memcpy(p, q, size);
+   early_iounmap(q, size);
early_iounmap(p, size);
total_offset += size;
}
-- 
1.7.10.4

[PATCH v2 02/20] x86, microcode: Use common get_ramdisk_image()

2013-03-09 Thread Yinghai Lu

Use the common get_ramdisk_image() to get the ramdisk start physical address.

We need this to get the correct ramdisk address for a 64-bit bzImage, whose
initrd can be loaded above 4G by kexec-tools.

Signed-off-by: Yinghai Lu 
Cc: Fenghua Yu 
---
 arch/x86/kernel/microcode_intel_early.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/microcode_intel_early.c b/arch/x86/kernel/microcode_intel_early.c
index 7890bc8..a8df75f 100644
--- a/arch/x86/kernel/microcode_intel_early.c
+++ b/arch/x86/kernel/microcode_intel_early.c
@@ -742,8 +742,8 @@ load_ucode_intel_bsp(void)
struct boot_params *boot_params_p;
 
boot_params_p = (struct boot_params *)__pa_symbol(&boot_params);
-   ramdisk_image = boot_params_p->hdr.ramdisk_image;
-   ramdisk_size  = boot_params_p->hdr.ramdisk_size;
+   ramdisk_image = get_ramdisk_image(boot_params_p);
+   ramdisk_size  = get_ramdisk_size(boot_params_p);
initrd_start_early = ramdisk_image;
initrd_end_early = initrd_start_early + ramdisk_size;
 
@@ -752,8 +752,8 @@ load_ucode_intel_bsp(void)
(unsigned long *)__pa_symbol(&mc_saved_in_initrd),
initrd_start_early, initrd_end_early, &uci);
 #else
-   ramdisk_image = boot_params.hdr.ramdisk_image;
-   ramdisk_size  = boot_params.hdr.ramdisk_size;
+   ramdisk_image = get_ramdisk_image(&boot_params);
+   ramdisk_size  = get_ramdisk_size(&boot_params);
initrd_start_early = ramdisk_image + PAGE_OFFSET;
initrd_end_early = initrd_start_early + ramdisk_size;
 
-- 
1.7.10.4

[PATCH v2 05/20] x86, ACPI: Split acpi_initrd_override to find/copy two functions

2013-03-09 Thread Yinghai Lu

To parse the SRAT early, we need to move ACPI table probing earlier.
acpi_initrd_table_override runs before ACPI table probing, so it needs to
move earlier too.

In the current code, acpi_initrd_table_override runs after init_mem_mapping()
and relocate_initrd(), so it can scan the initrd and copy ACPI tables using
the kernel virtual address of the initrd.
Copying needs to happen after memblock is ready, because it has to allocate
a buffer for the new ACPI tables.

So we have to split that function into two: find and copy.
Find should run as early as possible. Copy should run after memblock is ready.

Finding could be done in head_32.S and head64.c, just like the early microcode
scanning. In head_32.S we are in 32-bit flat mode, so we don't need to set up
a page table to access the initrd. In head64.c, the #PF handler that sets up
page tables lets us access the initrd through the kernel low mapping address.

Copying could be done just after memblock is ready and before probing the
ACPI tables. We need early_ioremap() to access the source and target
ranges, as init_mem_mapping() has not been called yet.

Also move the two function declarations down to avoid an #ifdef in setup.c.

ACPI_INITRD_TABLE_OVERRIDE depends on ACPI and BLK_DEV_INITRD,
so the declarations can move out from under the #ifdef CONFIG_ACPI protection.

-v2: Split one patch out, as suggested by tj.
 Also don't pass table_nr around.
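
To illustrate the find/copy split and why table_nr no longer has to be passed around, here is a small userspace sketch. It assumes a zero-initialized fixed-size array in which a size of 0 marks the first unused slot, mirroring the "if (!size) break;" check in the diff. MAX_TABLES, struct entry, files, all_size, find_one() and copy_all() are hypothetical names for this example only.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TABLES 4	/* hypothetical; the patch uses ACPI_OVERRIDE_TABLES */

struct entry {
	const void *data;
	size_t size;
};

static struct entry files[MAX_TABLES];	/* zeroed: size 0 marks the end */
static size_t all_size;

/* Phase 1 (find): record a table. Needs no allocator. -1 if the array is full. */
static int find_one(const void *data, size_t size)
{
	int i;

	assert(size > 0);
	for (i = 0; i < MAX_TABLES; i++) {
		if (files[i].size == 0) {
			files[i].data = data;
			files[i].size = size;
			all_size += size;
			return 0;
		}
	}
	return -1;
}

/* Phase 2 (copy): walk the array up to the first empty slot. */
static int copy_all(void)
{
	int no, copied = 0;

	if (!all_size)
		return 0;	/* nothing was found, nothing to copy */
	for (no = 0; no < MAX_TABLES; no++) {
		if (!files[no].size)
			break;
		/* a real copy phase would allocate a buffer and memcpy here */
		copied++;
	}
	return copied;
}
```

Because the second phase derives the count from the array itself, the two phases can run at very different points during boot and share nothing but static data.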

Signed-off-by: Yinghai Lu 
Cc: Pekka Enberg 
Cc: Jacob Shin 
Cc: Rafael J. Wysocki 
Cc: linux-a...@vger.kernel.org
---
 arch/x86/kernel/setup.c |6 +++---
 drivers/acpi/osl.c  |   18 +-
 include/linux/acpi.h|   16 
 3 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index e75c6e6..d0cc176 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1092,9 +1092,9 @@ void __init setup_arch(char **cmdline_p)
 
reserve_initrd();
 
-#if defined(CONFIG_ACPI) && defined(CONFIG_BLK_DEV_INITRD)
-   acpi_initrd_override((void *)initrd_start, initrd_end - initrd_start);
-#endif
+   acpi_initrd_override_find((void *)initrd_start,
+   initrd_end - initrd_start);
+   acpi_initrd_override_copy();
 
reserve_crashkernel();
 
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 8aaf721..d66ae0e 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -572,14 +572,13 @@ static const char * const table_sigs[] = {
 #define ACPI_OVERRIDE_TABLES 64
 static struct cpio_data __initdata acpi_initrd_files[ACPI_OVERRIDE_TABLES];
 
-void __init acpi_initrd_override(void *data, size_t size)
+void __init acpi_initrd_override_find(void *data, size_t size)
 {
-   int sig, no, table_nr = 0, total_offset = 0;
+   int sig, no, table_nr = 0;
long offset = 0;
struct acpi_table_header *table;
char cpio_path[32] = "kernel/firmware/acpi/";
struct cpio_data file;
-   char *p;
 
if (data == NULL || size == 0)
return;
@@ -620,7 +619,14 @@ void __init acpi_initrd_override(void *data, size_t size)
acpi_initrd_files[table_nr].size = file.size;
table_nr++;
}
-   if (table_nr == 0)
+}
+
+void __init acpi_initrd_override_copy(void)
+{
+   int no, total_offset = 0;
+   char *p;
+
+   if (!all_tables_size)
return;
 
/* under 4G at first, then above 4G */
@@ -647,9 +653,11 @@ void __init acpi_initrd_override(void *data, size_t size)
memblock_reserve(acpi_tables_addr, acpi_tables_addr + all_tables_size);
arch_reserve_mem_area(acpi_tables_addr, all_tables_size);
 
-   for (no = 0; no < table_nr; no++) {
+   for (no = 0; no < ACPI_OVERRIDE_TABLES; no++) {
phys_addr_t size = acpi_initrd_files[no].size;
 
+   if (!size)
+   break;
p = early_ioremap(acpi_tables_addr + total_offset, size);
memcpy(p, acpi_initrd_files[no].data, size);
early_iounmap(p, size);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index bcbdd74..1654a241 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -79,14 +79,6 @@ typedef int (*acpi_tbl_table_handler)(struct acpi_table_header *table);
 typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
  const unsigned long end);
 
-#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
-void acpi_initrd_override(void *data, size_t size);
-#else
-static inline void acpi_initrd_override(void *data, size_t size)
-{
-}
-#endif
-
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
 void __acpi_unmap_table(char *map, unsigned long size);
 int early_acpi_boot_init(void);
@@ -485,6 +477,14 @@ static inline bool acpi_driver_match_device(struct device *dev,
 
 #endif /* !CONFIG_ACPI */
 
+#ifdef CONFIG_ACPI_INITRD_TABLE_OVERRIDE
+void acpi_initrd_override_find(void *data, size_t size);
+void acpi_initrd_override_copy(void);
+#else
+static inline void

[PATCH v2 01/20] x86: Change get_ramdisk_image() to global

2013-03-09 Thread Yinghai Lu

get_ramdisk_image() is needed by the early microcode updating code in another
file, so make it global.

Also make it take a boot_params pointer, as head_32.S needs to access it via
its physical address during 32-bit flat mode.
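
A minimal userspace sketch of what the two helpers compute once they take a pointer: the 64-bit value is assembled from a low and a high 32-bit field. struct params is a hypothetical, trimmed-down stand-in for struct boot_params, and the helper names are chosen for this example rather than taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct boot_params. */
struct params {
	uint32_t ramdisk_image;		/* low 32 bits of the start address */
	uint32_t ramdisk_size;		/* low 32 bits of the size */
	uint32_t ext_ramdisk_image;	/* high 32 bits of the start address */
	uint32_t ext_ramdisk_size;	/* high 32 bits of the size */
};

/* Combine the two halves of the start address into one 64-bit value. */
static uint64_t sketch_ramdisk_image(const struct params *bp)
{
	assert(bp != NULL);
	return (uint64_t)bp->ramdisk_image |
	       ((uint64_t)bp->ext_ramdisk_image << 32);
}

/* Combine the two halves of the size into one 64-bit value. */
static uint64_t sketch_ramdisk_size(const struct params *bp)
{
	assert(bp != NULL);
	return (uint64_t)bp->ramdisk_size |
	       ((uint64_t)bp->ext_ramdisk_size << 32);
}
```

Taking the structure by pointer lets a caller pass whichever address is valid at that stage of boot, for example a physical one, instead of relying on the global symbol.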

Signed-off-by: Yinghai Lu 
---
 arch/x86/include/asm/setup.h |3 +++
 arch/x86/kernel/setup.c  |   28 ++--
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index b7bf350..4f71d48 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -106,6 +106,9 @@ void *extend_brk(size_t size, size_t align);
RESERVE_BRK(name, sizeof(type) * entries)
 
 extern void probe_roms(void);
+u64 get_ramdisk_image(struct boot_params *bp);
+u64 get_ramdisk_size(struct boot_params *bp);
+
 #ifdef __i386__
 
 void __init i386_start_kernel(void);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 90d8cc9..1629577 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -300,19 +300,19 @@ static void __init reserve_brk(void)
 
 #ifdef CONFIG_BLK_DEV_INITRD
 
-static u64 __init get_ramdisk_image(void)
+u64 __init get_ramdisk_image(struct boot_params *bp)
 {
-   u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+   u64 ramdisk_image = bp->hdr.ramdisk_image;
 
-   ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
+   ramdisk_image |= (u64)bp->ext_ramdisk_image << 32;
 
return ramdisk_image;
 }
-static u64 __init get_ramdisk_size(void)
+u64 __init get_ramdisk_size(struct boot_params *bp)
 {
-   u64 ramdisk_size = boot_params.hdr.ramdisk_size;
+   u64 ramdisk_size = bp->hdr.ramdisk_size;
 
-   ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
+   ramdisk_size |= (u64)bp->ext_ramdisk_size << 32;
 
return ramdisk_size;
 }
@@ -321,8 +321,8 @@ static u64 __init get_ramdisk_size(void)
 static void __init relocate_initrd(void)
 {
/* Assume only end is not page aligned */
-   u64 ramdisk_image = get_ramdisk_image();
-   u64 ramdisk_size  = get_ramdisk_size();
+   u64 ramdisk_image = get_ramdisk_image(&boot_params);
+   u64 ramdisk_size  = get_ramdisk_size(&boot_params);
u64 area_size = PAGE_ALIGN(ramdisk_size);
u64 ramdisk_here;
unsigned long slop, clen, mapaddr;
@@ -361,8 +361,8 @@ static void __init relocate_initrd(void)
ramdisk_size  -= clen;
}
 
-   ramdisk_image = get_ramdisk_image();
-   ramdisk_size  = get_ramdisk_size();
+   ramdisk_image = get_ramdisk_image(&boot_params);
+   ramdisk_size  = get_ramdisk_size(&boot_params);
printk(KERN_INFO "Move RAMDISK from [mem %#010llx-%#010llx] to"
" [mem %#010llx-%#010llx]\n",
ramdisk_image, ramdisk_image + ramdisk_size - 1,
@@ -372,8 +372,8 @@ static void __init relocate_initrd(void)
 static void __init early_reserve_initrd(void)
 {
/* Assume only end is not page aligned */
-   u64 ramdisk_image = get_ramdisk_image();
-   u64 ramdisk_size  = get_ramdisk_size();
+   u64 ramdisk_image = get_ramdisk_image(&boot_params);
+   u64 ramdisk_size  = get_ramdisk_size(&boot_params);
u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
 
if (!boot_params.hdr.type_of_loader ||
@@ -385,8 +385,8 @@ static void __init early_reserve_initrd(void)
 static void __init reserve_initrd(void)
 {
/* Assume only end is not page aligned */
-   u64 ramdisk_image = get_ramdisk_image();
-   u64 ramdisk_size  = get_ramdisk_size();
+   u64 ramdisk_image = get_ramdisk_image(&boot_params);
+   u64 ramdisk_size  = get_ramdisk_size(&boot_params);
u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
u64 mapped_size;
 
-- 
1.7.10.4

[PATCH v2, part1 03/29] mm/ARM: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.
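
As a rough model of what the per-arch loops removed in this series were doing, and of what a common range helper has to cover, here is a userspace sketch. It assumes that only whole pages inside the range are released (start rounded up, end rounded down) and that a non-zero poison value is written to each page first. model_free_area(), MODEL_PAGE_SIZE and MODEL_PAGE_MASK are hypothetical names; this is not the kernel's free_reserved_area() implementation.

```c
#include <assert.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size for this sketch */
#define MODEL_PAGE_MASK (~(MODEL_PAGE_SIZE - 1))

/*
 * Walk the whole pages inside [start, end), given as offsets into base.
 * Each page is optionally poisoned; the return value is the page count.
 */
static unsigned long model_free_area(unsigned char *base, unsigned long start,
				     unsigned long end, int poison)
{
	unsigned long pos, pages = 0;

	assert(start <= end);
	start = (start + MODEL_PAGE_SIZE - 1) & MODEL_PAGE_MASK;
	end &= MODEL_PAGE_MASK;
	for (pos = start; pos < end; pos += MODEL_PAGE_SIZE, pages++) {
		if (poison)
			memset(base + pos, poison, MODEL_PAGE_SIZE);
		/* the kernel would hand the page back to the allocator here */
	}
	return pages;
}
```

Returning the page count is what lets a caller keep its own accounting, as the SPARC patch in this series does with num_physpages.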

Signed-off-by: Jiang Liu 
Cc: Russell King 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 arch/arm/mm/init.c   |   48 
 arch/arm64/mm/init.c |   26 ++
 2 files changed, 18 insertions(+), 56 deletions(-)

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index ad722f1..40a5bc2 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -424,24 +424,6 @@ void __init bootmem_init(void)
max_pfn = max_high - PHYS_PFN_OFFSET;
 }
 
-static inline int free_area(unsigned long pfn, unsigned long end, char *s)
-{
-   unsigned int pages = 0, size = (end - pfn) << (PAGE_SHIFT - 10);
-
-   for (; pfn < end; pfn++) {
-   struct page *page = pfn_to_page(pfn);
-   ClearPageReserved(page);
-   init_page_count(page);
-   __free_page(page);
-   pages++;
-   }
-
-   if (size && s)
-   printk(KERN_INFO "Freeing %s memory: %dK\n", s, size);
-
-   return pages;
-}
-
 /*
  * Poison init memory with an undefined instruction (ARM) or a branch to an
  * undefined instruction (Thumb).
@@ -534,6 +516,16 @@ static void __init free_unused_memmap(struct meminfo *mi)
 #endif
 }
 
+#ifdef CONFIG_HIGHMEM
+static inline void free_area_high(unsigned long pfn, unsigned long end)
+{
+   for (; pfn < end; pfn++) {
+   __free_reserved_page(pfn_to_page(pfn));
+   totalhigh_pages++;
+   }
+}
+#endif
+
 static void __init free_highpages(void)
 {
 #ifdef CONFIG_HIGHMEM
@@ -569,8 +561,7 @@ static void __init free_highpages(void)
if (res_end > end)
res_end = end;
if (res_start != start)
-   totalhigh_pages += free_area(start, res_start,
-NULL);
+   free_area_high(start, res_start);
start = res_end;
if (start == end)
break;
@@ -578,7 +569,7 @@ static void __init free_highpages(void)
 
/* And now free anything which remains */
if (start < end)
-   totalhigh_pages += free_area(start, end, NULL);
+   free_area_high(start, end);
}
totalram_pages += totalhigh_pages;
 #endif
@@ -609,8 +600,7 @@ void __init mem_init(void)
 
 #ifdef CONFIG_SA1111
/* now that our DMA memory is actually so designated, we can free it */
-   totalram_pages += free_area(PHYS_PFN_OFFSET,
-   __phys_to_pfn(__pa(swapper_pg_dir)), NULL);
+   free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL);
 #endif
 
free_highpages();
@@ -738,16 +728,12 @@ void free_initmem(void)
extern char __tcm_start, __tcm_end;
 
poison_init_mem(&__tcm_start, &__tcm_end - &__tcm_start);
-   totalram_pages += free_area(__phys_to_pfn(__pa(&__tcm_start)),
-   __phys_to_pfn(__pa(&__tcm_end)),
-   "TCM link");
+   free_reserved_area(&__tcm_start, &__tcm_end, 0, "TCM link");
 #endif
 
poison_init_mem(__init_begin, __init_end - __init_begin);
if (!machine_is_integrator() && !machine_is_cintegrator())
-   totalram_pages += free_area(__phys_to_pfn(__pa(__init_begin)),
-   __phys_to_pfn(__pa(__init_end)),
-   "init");
+   free_initmem_default(0);
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
@@ -758,9 +744,7 @@ void free_initrd_mem(unsigned long start, unsigned long end)
 {
if (!keep_initrd) {
poison_init_mem((void *)start, PAGE_ALIGN(end) - start);
-   totalram_pages += free_area(__phys_to_pfn(__pa(start)),
-   __phys_to_pfn(__pa(end)),
-   "initrd");
+   free_reserved_area(start, end, 0, "initrd");
}
 }
 
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 800aac3..f497ca7 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -197,24 +197,6 @@ void __init bootmem_init(void)
max_pfn = max_low_pfn = max;
 }
 
-static inline int free_area(unsigned long pfn, unsigned long end, char *s)
-{
-   unsigned int pages = 0, size = (end - pfn) << (PAGE_SHIFT - 10);
-
-   for (; pfn < end; pfn++) {
-   struct page *page = pfn_to_page(pfn);
-   ClearPageReserved(page);
-   init_page_count(page);
-   __free_page(page);
-   pages++;
-   }
-
-   if (size && s)
-   pr_info("Freeing %s memory: %dK\n", s, size);
-
-   return pages;
-}
-
 /*
  *

[PATCH v2, part1 06/29] mm/c6x: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: Mark Salter 
Cc: Aurelien Jacquiot 
---
 arch/c6x/mm/init.c |   30 ++
 1 file changed, 2 insertions(+), 28 deletions(-)

diff --git a/arch/c6x/mm/init.c b/arch/c6x/mm/init.c
index 89395f0..a9fcd89 100644
--- a/arch/c6x/mm/init.c
+++ b/arch/c6x/mm/init.c
@@ -77,37 +77,11 @@ void __init mem_init(void)
 #ifdef CONFIG_BLK_DEV_INITRD
 void __init free_initrd_mem(unsigned long start, unsigned long end)
 {
-   int pages = 0;
-   for (; start < end; start += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(start));
-   init_page_count(virt_to_page(start));
-   free_page(start);
-   totalram_pages++;
-   pages++;
-   }
-   printk(KERN_INFO "Freeing initrd memory: %luk freed\n",
-  (pages * PAGE_SIZE) >> 10);
+   free_reserved_area(start, end, 0, "initrd");
 }
 #endif
 
 void __init free_initmem(void)
 {
-   unsigned long addr;
-
-   /*
-* The following code should be cool even if these sections
-* are not page aligned.
-*/
-   addr = PAGE_ALIGN((unsigned long)(__init_begin));
-
-   /* next to check that the page we free is not a partial page */
-   for (; addr + PAGE_SIZE < (unsigned long)(__init_end);
-addr += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(addr));
-   init_page_count(virt_to_page(addr));
-   free_page(addr);
-   totalram_pages++;
-   }
-   printk(KERN_INFO "Freeing unused kernel memory: %dK freed\n",
-  (int) ((addr - PAGE_ALIGN((long) &__init_begin)) >> 10));
+   free_initmem_default(0);
 }
-- 
1.7.9.5

[PATCH v2, part1 04/29] mm/avr32: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Acked-by: Hans-Christian Egtvedt 
Cc: Haavard Skinnemoen 
---
 arch/avr32/mm/init.c |   24 ++--
 1 file changed, 2 insertions(+), 22 deletions(-)

diff --git a/arch/avr32/mm/init.c b/arch/avr32/mm/init.c
index 2798c2d..e66e840 100644
--- a/arch/avr32/mm/init.c
+++ b/arch/avr32/mm/init.c
@@ -146,34 +146,14 @@ void __init mem_init(void)
initsize >> 10);
 }
 
-static inline void free_area(unsigned long addr, unsigned long end, char *s)
-{
-   unsigned int size = (end - addr) >> 10;
-
-   for (; addr < end; addr += PAGE_SIZE) {
-   struct page *page = virt_to_page(addr);
-   ClearPageReserved(page);
-   init_page_count(page);
-   free_page(addr);
-   totalram_pages++;
-   }
-
-   if (size && s)
-   printk(KERN_INFO "Freeing %s memory: %dK (%lx - %lx)\n",
-  s, size, end - (size << 10), end);
-}
-
 void free_initmem(void)
 {
-   free_area((unsigned long)__init_begin, (unsigned long)__init_end,
- "init");
+   free_initmem_default(0);
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
-
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-   free_area(start, end, "initrd");
+   free_reserved_area(start, end, 0, "initrd");
 }
-
 #endif
-- 
1.7.9.5

[PATCH v2, part1 28/29] mm/metag: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: James Hogan 
Cc: linux-kernel@vger.kernel.org
---
 arch/metag/mm/init.c |   21 ++---
 1 file changed, 2 insertions(+), 19 deletions(-)

diff --git a/arch/metag/mm/init.c b/arch/metag/mm/init.c
index 504a398..c6784fb 100644
--- a/arch/metag/mm/init.c
+++ b/arch/metag/mm/init.c
@@ -412,32 +412,15 @@ void __init mem_init(void)
return;
 }
 
-static void free_init_pages(char *what, unsigned long begin, unsigned long end)
-{
-   unsigned long addr;
-
-   for (addr = begin; addr < end; addr += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(addr));
-   init_page_count(virt_to_page(addr));
-   memset((void *)addr, POISON_FREE_INITMEM, PAGE_SIZE);
-   free_page(addr);
-   totalram_pages++;
-   }
-   pr_info("Freeing %s: %luk freed\n", what, (end - begin) >> 10);
-}
-
 void free_initmem(void)
 {
-   free_init_pages("unused kernel memory",
-   (unsigned long)(&__init_begin),
-   (unsigned long)(&__init_end));
+   free_initmem_default(POISON_FREE_INITMEM);
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-   end = end & PAGE_MASK;
-   free_init_pages("initrd memory", start, end);
+   free_reserved_area(start, end, POISON_FREE_INITMEM, "initrd");
 }
 #endif
 
-- 
1.7.9.5

[PATCH v2, part1 29/29] mm,kexec: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: Eric Biederman 
---
 kernel/kexec.c |8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index bddd3d7..be95397 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1118,12 +1118,8 @@ void __weak crash_free_reserved_phys_range(unsigned long begin,
 {
unsigned long addr;
 
-   for (addr = begin; addr < end; addr += PAGE_SIZE) {
-   ClearPageReserved(pfn_to_page(addr >> PAGE_SHIFT));
-   init_page_count(pfn_to_page(addr >> PAGE_SHIFT));
-   free_page((unsigned long)__va(addr));
-   totalram_pages++;
-   }
+   for (addr = begin; addr < end; addr += PAGE_SIZE)
+   free_reserved_page(pfn_to_page(addr >> PAGE_SHIFT));
 }
 
 int crash_shrink_memory(unsigned long new_size)
-- 
1.7.9.5

[PATCH v2, part1 26/29] mm/xtensa: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: Chris Zankel 
Cc: Max Filippov 
---
 arch/xtensa/mm/init.c |   21 +++--
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/arch/xtensa/mm/init.c b/arch/xtensa/mm/init.c
index 7a5156f..bba125b 100644
--- a/arch/xtensa/mm/init.c
+++ b/arch/xtensa/mm/init.c
@@ -208,32 +208,17 @@ void __init mem_init(void)
   highmemsize >> 10);
 }
 
-void
-free_reserved_mem(void *start, void *end)
-{
-   for (; start < end; start += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(start));
-   init_page_count(virt_to_page(start));
-   free_page((unsigned long)start);
-   totalram_pages++;
-   }
-}
-
 #ifdef CONFIG_BLK_DEV_INITRD
 extern int initrd_is_mapped;
 
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-   if (initrd_is_mapped) {
-   free_reserved_mem((void*)start, (void*)end);
-   printk ("Freeing initrd memory: %ldk freed\n",(end-start)>>10);
-   }
+   if (initrd_is_mapped)
+   free_reserved_area(start, end, 0, "initrd");
 }
 #endif
 
 void free_initmem(void)
 {
-   free_reserved_mem(__init_begin, __init_end);
-   printk("Freeing unused kernel memory: %zuk freed\n",
-  (__init_end - __init_begin) >> 10);
+   free_initmem_default(0);
 }
-- 
1.7.9.5

[PATCH v2, part1 27/29] mm/arc: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Acked-by: Vineet Gupta 
Cc: linux-snps-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org (open list)
---
 arch/arc/mm/init.c |   23 ++-
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/arch/arc/mm/init.c b/arch/arc/mm/init.c
index caf797d..727d479 100644
--- a/arch/arc/mm/init.c
+++ b/arch/arc/mm/init.c
@@ -144,37 +144,18 @@ void __init mem_init(void)
PAGES_TO_KB(reserved_pages));
 }
 
-static void __init free_init_pages(const char *what, unsigned long begin,
-  unsigned long end)
-{
-   unsigned long addr;
-
-   pr_info("Freeing %s: %ldk [%lx] to [%lx]\n",
-   what, TO_KB(end - begin), begin, end);
-
-   /* need to check that the page we free is not a partial page */
-   for (addr = begin; addr + PAGE_SIZE <= end; addr += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(addr));
-   init_page_count(virt_to_page(addr));
-   free_page(addr);
-   totalram_pages++;
-   }
-}
-
 /*
  * free_initmem: Free all the __init memory.
  */
 void __init_refok free_initmem(void)
 {
-   free_init_pages("unused kernel memory",
-   (unsigned long)__init_begin,
-   (unsigned long)__init_end);
+   free_initmem_default(0);
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
 void __init free_initrd_mem(unsigned long start, unsigned long end)
 {
-   free_init_pages("initrd memory", start, end);
+   free_reserved_area(start, end, 0, "initrd");
 }
 #endif
 
-- 
1.7.9.5

[PATCH v2, part1 25/29] mm/x86: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
---
 arch/x86/mm/init.c|5 +
 arch/x86/mm/init_64.c |5 ++---
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 4903a03..4a705e6 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -516,11 +516,8 @@ void free_init_pages(char *what, unsigned long begin, unsigned long end)
printk(KERN_INFO "Freeing %s: %luk freed\n", what, (end - begin) >> 10);
 
for (; addr < end; addr += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(addr));
-   init_page_count(virt_to_page(addr));
memset((void *)addr, POISON_FREE_INITMEM, PAGE_SIZE);
-   free_page(addr);
-   totalram_pages++;
+   free_reserved_page(virt_to_page(addr));
}
 #endif
 }
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 474e28f..2ef81f1 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1067,10 +1067,9 @@ void __init mem_init(void)
 
/* clear_bss() already clear the empty_zero_page */
 
-   reservedpages = 0;
-
-   /* this will put all low memory onto the freelists */
register_page_bootmem_info();
+
+   /* this will put all memory onto the freelists */
totalram_pages = free_all_bootmem();
 
absent_pages = absent_pages_in_range(0, max_pfn);
-- 
1.7.9.5

[PATCH v2, part1 24/29] mm/unicore32: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: Guan Xuetao 
---
 arch/unicore32/mm/init.c |   28 +++-
 1 file changed, 3 insertions(+), 25 deletions(-)

diff --git a/arch/unicore32/mm/init.c b/arch/unicore32/mm/init.c
index de186bd..c5817b0 100644
--- a/arch/unicore32/mm/init.c
+++ b/arch/unicore32/mm/init.c
@@ -313,24 +313,6 @@ void __init bootmem_init(void)
max_pfn = max_high - PHYS_PFN_OFFSET;
 }
 
-static inline int free_area(unsigned long pfn, unsigned long end, char *s)
-{
-   unsigned int pages = 0, size = (end - pfn) << (PAGE_SHIFT - 10);
-
-   for (; pfn < end; pfn++) {
-   struct page *page = pfn_to_page(pfn);
-   ClearPageReserved(page);
-   init_page_count(page);
-   __free_page(page);
-   pages++;
-   }
-
-   if (size && s)
-   printk(KERN_INFO "Freeing %s memory: %dK\n", s, size);
-
-   return pages;
-}
-
 static inline void
 free_memmap(unsigned long start_pfn, unsigned long end_pfn)
 {
@@ -404,9 +386,9 @@ void __init mem_init(void)
 
max_mapnr   = pfn_to_page(max_pfn + PHYS_PFN_OFFSET) - mem_map;
 
-   /* this will put all unused low memory onto the freelists */
free_unused_memmap();
 
+   /* this will put all unused low memory onto the freelists */
totalram_pages += free_all_bootmem();
 
reserved_pages = free_pages = 0;
@@ -491,9 +473,7 @@ void __init mem_init(void)
 
 void free_initmem(void)
 {
-   totalram_pages += free_area(__phys_to_pfn(__pa(__init_begin)),
-   __phys_to_pfn(__pa(__init_end)),
-   "init");
+   free_initmem_default(0);
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
@@ -503,9 +483,7 @@ static int keep_initrd;
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
if (!keep_initrd)
-   totalram_pages += free_area(__phys_to_pfn(__pa(start)),
-   __phys_to_pfn(__pa(end)),
-   "initrd");
+   free_reserved_area(start, end, 0, "initrd");
 }
 
 static int __init keepinitrd_setup(char *__unused)
-- 
1.7.9.5

[PATCH v2, part1 22/29] mm/SPARC: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Acked-by: David S. Miller 
Cc: Sam Ravnborg 
---
 arch/sparc/kernel/leon_smp.c |   15 +++
 arch/sparc/mm/init_32.c  |   37 +++--
 arch/sparc/mm/init_64.c  |   26 +-
 3 files changed, 11 insertions(+), 67 deletions(-)

diff --git a/arch/sparc/kernel/leon_smp.c b/arch/sparc/kernel/leon_smp.c
index 9b40c9c..6cfc1b0 100644
--- a/arch/sparc/kernel/leon_smp.c
+++ b/arch/sparc/kernel/leon_smp.c
@@ -253,24 +253,15 @@ void __init leon_smp_done(void)
 
/* Free unneeded trap tables */
if (!cpu_present(1)) {
-   ClearPageReserved(virt_to_page(_cpu1));
-   init_page_count(virt_to_page(_cpu1));
-   free_page((unsigned long)_cpu1);
-   totalram_pages++;
+   free_reserved_page(virt_to_page(_cpu1));
num_physpages++;
}
if (!cpu_present(2)) {
-   ClearPageReserved(virt_to_page(_cpu2));
-   init_page_count(virt_to_page(_cpu2));
-   free_page((unsigned long)_cpu2);
-   totalram_pages++;
+   free_reserved_page(virt_to_page(_cpu2));
num_physpages++;
}
if (!cpu_present(3)) {
-   ClearPageReserved(virt_to_page(_cpu3));
-   init_page_count(virt_to_page(_cpu3));
-   free_page((unsigned long)_cpu3);
-   totalram_pages++;
+   free_reserved_page(virt_to_page(_cpu3));
num_physpages++;
}
/* Ok, they are spinning and ready to go. */
diff --git a/arch/sparc/mm/init_32.c b/arch/sparc/mm/init_32.c
index 48e0c03..13d6fee 100644
--- a/arch/sparc/mm/init_32.c
+++ b/arch/sparc/mm/init_32.c
@@ -374,45 +374,14 @@ void __init mem_init(void)
 
 void free_initmem (void)
 {
-   unsigned long addr;
-   unsigned long freed;
-
-   addr = (unsigned long)(&__init_begin);
-   freed = (unsigned long)(&__init_end) - addr;
-   for (; addr < (unsigned long)(&__init_end); addr += PAGE_SIZE) {
-   struct page *p;
-
-   memset((void *)addr, POISON_FREE_INITMEM, PAGE_SIZE);
-   p = virt_to_page(addr);
-
-   ClearPageReserved(p);
-   init_page_count(p);
-   __free_page(p);
-   totalram_pages++;
-   num_physpages++;
-   }
-   printk(KERN_INFO "Freeing unused kernel memory: %ldk freed\n",
-   freed >> 10);
+   num_physpages += free_initmem_default(POISON_FREE_INITMEM);
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-   if (start < end)
-   printk(KERN_INFO "Freeing initrd memory: %ldk freed\n",
-   (end - start) >> 10);
-   for (; start < end; start += PAGE_SIZE) {
-   struct page *p;
-
-   memset((void *)start, POISON_FREE_INITMEM, PAGE_SIZE);
-   p = virt_to_page(start);
-
-   ClearPageReserved(p);
-   init_page_count(p);
-   __free_page(p);
-   totalram_pages++;
-   num_physpages++;
-   }
+   num_physpages += free_reserved_area(start, end, POISON_FREE_INITMEM,
+   "initrd");
 }
 #endif
 
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 1588d33..3f559d1 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2060,8 +2060,7 @@ void __init mem_init(void)
/* We subtract one to account for the mem_map_zero page
 * allocated below.
 */
-   totalram_pages -= 1;
-   num_physpages = totalram_pages;
+   num_physpages = totalram_pages - 1;
 
/*
 * Set up the zero page, mark it reserved, so that page count
@@ -2072,7 +2071,7 @@ void __init mem_init(void)
prom_printf("paging_init: Cannot alloc zero page.\n");
prom_halt();
}
-   SetPageReserved(mem_map_zero);
+   mark_page_reserved(mem_map_zero);
 
codepages = (((unsigned long) _etext) - ((unsigned long) _start));
codepages = PAGE_ALIGN(codepages) >> PAGE_SHIFT;
@@ -2112,7 +2111,6 @@ void free_initmem(void)
initend = (unsigned long)(__init_end) & PAGE_MASK;
for (; addr < initend; addr += PAGE_SIZE) {
unsigned long page;
-   struct page *p;
 
page = (addr +
((unsigned long) __va(kern_base)) -
@@ -2120,13 +2118,8 @@ void free_initmem(void)
memset((void *)addr, POISON_FREE_INITMEM, PAGE_SIZE);
 
if (do_free) {
-   p = virt_to_page(page);
-
-   ClearPageReserved(p);
-   init_page_count(p);
-   __free_page(p);
+   free_reserved_page(virt_to_page(page));

[PATCH v2, part1 23/29] mm/um: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: Jeff Dike 
---
 arch/um/kernel/mem.c |   10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index 5abcbfb..d5ac802 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -254,15 +254,7 @@ void free_initmem(void)
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-   if (start < end)
-   printk(KERN_INFO "Freeing initrd memory: %ldk freed\n",
-  (end - start) >> 10);
-   for (; start < end; start += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(start));
-   init_page_count(virt_to_page(start));
-   free_page(start);
-   totalram_pages++;
-   }
+   free_reserved_area(start, end, 0, "initrd");
 }
 #endif
 
-- 
1.7.9.5
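
[Editor's sketch] The conversions in this series all replace the same open-coded loop (clear the reserved flag, reset the page count, free the page, bump totalram_pages) with a single call. The helper itself is defined in an earlier patch that is not shown here, so the following is only a userspace model inferred from the call sites: a poison value of 0 appears to mean "do not poison", and the return value appears to be the number of pages freed, since several callers add it to num_physpages. The model_* names, the 4 KiB page size, and the rounding of start and end to page boundaries are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SHIFT 12			/* assumed 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define NPAGES     8

/* Hypothetical stand-ins for kernel state; not the kernel's real types. */
struct model_page { int reserved; int freed; };

static unsigned char model_mem[NPAGES * PAGE_SIZE];
static struct model_page model_map[NPAGES];
static unsigned long model_totalram_pages;

/* Model of free_reserved_page(): un-reserve one page and account for it. */
static void model_free_reserved_page(struct model_page *page)
{
	page->reserved = 0;		/* stands in for ClearPageReserved() */
	page->freed = 1;		/* stands in for init_page_count() + __free_page() */
	model_totalram_pages++;		/* accounting moves into the helper */
}

/*
 * Model of free_reserved_area(): free every whole page in [start, end),
 * where start and end are byte offsets into model_mem.
 * Returns the number of pages freed.
 */
static unsigned long model_free_reserved_area(unsigned long start,
					      unsigned long end,
					      int poison, const char *name)
{
	unsigned long pos, pages = 0;

	assert(end <= sizeof(model_mem));

	start = (start + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);	/* round up (assumed) */
	end &= ~(PAGE_SIZE - 1);				/* round down (assumed) */

	for (pos = start; pos < end; pos += PAGE_SIZE, pages++) {
		if (poison)
			memset(&model_mem[pos], poison, PAGE_SIZE);
		model_free_reserved_page(&model_map[pos >> PAGE_SHIFT]);
	}

	if (pages && name)
		printf("Freeing %s memory: %luK\n", name,
		       pages << (PAGE_SHIFT - 10));
	return pages;
}
```

With this shape, a caller such as free_initrd_mem() reduces to one line, and architectures that also track num_physpages can simply add the return value.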


[PATCH v2, part1 21/29] mm/SH: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Acked-by: Paul Mundt 
---
 arch/sh/mm/init.c |   26 +++---
 1 file changed, 3 insertions(+), 23 deletions(-)

diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 1057940..20f9ead 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -417,15 +417,13 @@ void __init mem_init(void)
 
for_each_online_node(nid) {
pg_data_t *pgdat = NODE_DATA(nid);
-   unsigned long node_pages = 0;
void *node_high_memory;
 
num_physpages += pgdat->node_present_pages;
 
if (pgdat->node_spanned_pages)
-   node_pages = free_all_bootmem_node(pgdat);
+   totalram_pages += free_all_bootmem_node(pgdat);
 
-   totalram_pages += node_pages;
 
node_high_memory = (void *)__va((pgdat->node_start_pfn +
 pgdat->node_spanned_pages) <<
@@ -501,31 +499,13 @@ void __init mem_init(void)
 
 void free_initmem(void)
 {
-   unsigned long addr;
-
-   addr = (unsigned long)(&__init_begin);
-   for (; addr < (unsigned long)(&__init_end); addr += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(addr));
-   init_page_count(virt_to_page(addr));
-   free_page(addr);
-   totalram_pages++;
-   }
-   printk("Freeing unused kernel memory: %ldk freed\n",
-  ((unsigned long)&__init_end -
-   (unsigned long)&__init_begin) >> 10);
+   free_initmem_default(0);
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-   unsigned long p;
-   for (p = start; p < end; p += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(p));
-   init_page_count(virt_to_page(p));
-   free_page(p);
-   totalram_pages++;
-   }
-   printk("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
+   free_reserved_area(start, end, 0, "initrd");
 }
 #endif
 
-- 
1.7.9.5


[PATCH v2, part1 19/29] mm/s390: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
---
 arch/s390/mm/init.c |   35 ++-
 1 file changed, 6 insertions(+), 29 deletions(-)

diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 49ce6bb..70bda9e 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -42,11 +42,10 @@ pgd_t swapper_pg_dir[PTRS_PER_PGD] __attribute__((__aligned__(PAGE_SIZE)));
 unsigned long empty_zero_page, zero_page_mask;
 EXPORT_SYMBOL(empty_zero_page);
 
-static unsigned long __init setup_zero_pages(void)
+static void __init setup_zero_pages(void)
 {
struct cpuid cpu_id;
unsigned int order;
-   unsigned long size;
struct page *page;
int i;
 
@@ -75,14 +74,11 @@ static unsigned long __init setup_zero_pages(void)
page = virt_to_page((void *) empty_zero_page);
split_page(page, order);
for (i = 1 << order; i > 0; i--) {
-   SetPageReserved(page);
+   mark_page_reserved(page);
page++;
}
 
-   size = PAGE_SIZE << order;
-   zero_page_mask = (size - 1) & PAGE_MASK;
-
-   return 1UL << order;
+   zero_page_mask = ((PAGE_SIZE << order) - 1) & PAGE_MASK;
 }
 
 /*
@@ -139,7 +135,7 @@ void __init mem_init(void)
 
/* this will put all low memory onto the freelists */
totalram_pages += free_all_bootmem();
-   totalram_pages -= setup_zero_pages();   /* Setup zeroed pages. */
+   setup_zero_pages(); /* Setup zeroed pages. */
 
reservedpages = 0;
 
@@ -158,34 +154,15 @@ void __init mem_init(void)
   PFN_ALIGN((unsigned long)&_eshared) - 1);
 }
 
-void free_init_pages(char *what, unsigned long begin, unsigned long end)
-{
-   unsigned long addr = begin;
-
-   if (begin >= end)
-   return;
-   for (; addr < end; addr += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(addr));
-   init_page_count(virt_to_page(addr));
-   memset((void *)(addr & PAGE_MASK), POISON_FREE_INITMEM,
-  PAGE_SIZE);
-   free_page(addr);
-   totalram_pages++;
-   }
-   printk(KERN_INFO "Freeing %s: %luk freed\n", what, (end - begin) >> 10);
-}
-
 void free_initmem(void)
 {
-   free_init_pages("unused kernel memory",
-   (unsigned long)&__init_begin,
-   (unsigned long)&__init_end);
+   free_initmem_default(0);
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
 void __init free_initrd_mem(unsigned long start, unsigned long end)
 {
-   free_init_pages("initrd memory", start, end);
+   free_reserved_area(start, end, POISON_FREE_INITMEM, "initrd");
 }
 #endif
 
-- 
1.7.9.5
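
[Editor's sketch] The setup_zero_pages() change above drops the "totalram_pages -= setup_zero_pages()" idiom, which suggests that mark_page_reserved() adjusts totalram_pages itself as well as setting the reserved flag. That helper comes from an earlier patch not shown here, so the model below is an assumption; all model_* names and the MODEL_MAX_ORDER bound are hypothetical. It only illustrates why the two styles should leave the counter at the same value.

```c
#include <assert.h>

#define MODEL_MAX_ORDER 4

/* Hypothetical stand-ins for kernel state; not the kernel's real types. */
struct model_page { int reserved; };

static struct model_page model_pages[1 << MODEL_MAX_ORDER];
static long model_totalram_pages;

/* Assumed behaviour of mark_page_reserved(): set the flag, fix accounting. */
static void model_mark_page_reserved(struct model_page *page)
{
	page->reserved = 1;
	model_totalram_pages--;
}

/* Old style: only set the flag and let the caller subtract the page count. */
static unsigned long old_setup_zero_pages(unsigned int order)
{
	unsigned int i;

	assert(order <= MODEL_MAX_ORDER);
	for (i = 0; i < (1U << order); i++)
		model_pages[i].reserved = 1;
	return 1UL << order;
}

/* New style: each call to the helper performs the accounting itself. */
static void new_setup_zero_pages(unsigned int order)
{
	unsigned int i;

	assert(order <= MODEL_MAX_ORDER);
	for (i = 0; i < (1U << order); i++)
		model_mark_page_reserved(&model_pages[i]);
}
```

Under that assumption, mem_init() no longer needs a return value from setup_zero_pages(), which is why its return type can become void.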


[PATCH v2, part1 20/29] mm/score: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: Chen Liqin 
Cc: Lennox Wu 
---
 arch/score/mm/init.c |   33 +
 1 file changed, 5 insertions(+), 28 deletions(-)

diff --git a/arch/score/mm/init.c b/arch/score/mm/init.c
index cee6bce..1592aad 100644
--- a/arch/score/mm/init.c
+++ b/arch/score/mm/init.c
@@ -43,7 +43,7 @@ EXPORT_SYMBOL_GPL(empty_zero_page);
 
 static struct kcore_list kcore_mem, kcore_vmalloc;
 
-static unsigned long setup_zero_page(void)
+static void setup_zero_page(void)
 {
struct page *page;
 
@@ -52,9 +52,7 @@ static unsigned long setup_zero_page(void)
panic("Oh boy, that early out of memory?");
 
page = virt_to_page((void *) empty_zero_page);
-   SetPageReserved(page);
-
-   return 1UL;
+   mark_page_reserved(page);
 }
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
@@ -84,7 +82,7 @@ void __init mem_init(void)
 
high_memory = (void *) __va(max_low_pfn << PAGE_SHIFT);
totalram_pages += free_all_bootmem();
-   totalram_pages -= setup_zero_page();/* Setup zeroed pages. */
+   setup_zero_page();  /* Setup zeroed pages. */
reservedpages = 0;
 
for (tmp = 0; tmp < max_low_pfn; tmp++)
@@ -109,37 +107,16 @@ void __init mem_init(void)
 }
 #endif /* !CONFIG_NEED_MULTIPLE_NODES */
 
-static void free_init_pages(const char *what, unsigned long begin, unsigned long end)
-{
-   unsigned long pfn;
-
-   for (pfn = PFN_UP(begin); pfn < PFN_DOWN(end); pfn++) {
-   struct page *page = pfn_to_page(pfn);
-   void *addr = phys_to_virt(PFN_PHYS(pfn));
-
-   ClearPageReserved(page);
-   init_page_count(page);
-   memset(addr, POISON_FREE_INITMEM, PAGE_SIZE);
-   __free_page(page);
-   totalram_pages++;
-   }
-   printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
-}
-
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-   free_init_pages("initrd memory",
-   virt_to_phys((void *) start),
-   virt_to_phys((void *) end));
+   free_reserved_area(start, end, POISON_FREE_INITMEM, "initrd");
 }
 #endif
 
 void __init_refok free_initmem(void)
 {
-   free_init_pages("unused kernel memory",
-   __pa(&__init_begin),
-   __pa(&__init_end));
+   free_initmem_default(POISON_FREE_INITMEM);
 }
 
 unsigned long pgd_current;
-- 
1.7.9.5


[PATCH v2, part1 17/29] mm/parisc: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: "James E.J. Bottomley" 
Cc: Helge Deller 
---
 arch/parisc/mm/init.c |   23 ++-
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
index 3ac462d..de2159a 100644
--- a/arch/parisc/mm/init.c
+++ b/arch/parisc/mm/init.c
@@ -505,7 +505,6 @@ static void __init map_pages(unsigned long start_vaddr,
 
 void free_initmem(void)
 {
-   unsigned long addr;
unsigned long init_begin = (unsigned long)__init_begin;
unsigned long init_end = (unsigned long)__init_end;
 
@@ -533,19 +532,10 @@ void free_initmem(void)
 * pages are no-longer executable */
flush_icache_range(init_begin, init_end);

-   for (addr = init_begin; addr < init_end; addr += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(addr));
-   init_page_count(virt_to_page(addr));
-   free_page(addr);
-   num_physpages++;
-   totalram_pages++;
-   }
+   num_physpages += free_initmem_default(0);
 
/* set up a new led state on systems shipped LED State panel */
pdc_chassis_send_status(PDC_CHASSIS_DIRECT_BCOMPLETE);
-   
-   printk(KERN_INFO "Freeing unused kernel memory: %luk freed\n",
-   (init_end - init_begin) >> 10);
 }
 
 
@@ -1107,15 +1097,6 @@ void flush_tlb_all(void)
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-   if (start >= end)
-   return;
-   printk(KERN_INFO "Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
-   for (; start < end; start += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(start));
-   init_page_count(virt_to_page(start));
-   free_page(start);
-   num_physpages++;
-   totalram_pages++;
-   }
+   num_physpages += free_reserved_area(start, end, 0, "initrd");
 }
 #endif
-- 
1.7.9.5


[PATCH v2, part1 18/29] mm/ppc: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anatolij Gustschin 
---
 arch/powerpc/kernel/crash_dump.c |5 +
 arch/powerpc/kernel/fadump.c |5 +
 arch/powerpc/kernel/kvm.c|7 +--
 arch/powerpc/mm/mem.c|   29 ++
 arch/powerpc/platforms/512x/mpc512x_shared.c |5 +
 5 files changed, 6 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/kernel/crash_dump.c b/arch/powerpc/kernel/crash_dump.c
index b3ba516..9ec3fe1 100644
--- a/arch/powerpc/kernel/crash_dump.c
+++ b/arch/powerpc/kernel/crash_dump.c
@@ -150,10 +150,7 @@ void crash_free_reserved_phys_range(unsigned long begin, unsigned long end)
if (addr <= rtas_end && ((addr + PAGE_SIZE) > rtas_start))
continue;
 
-   ClearPageReserved(pfn_to_page(addr >> PAGE_SHIFT));
-   init_page_count(pfn_to_page(addr >> PAGE_SHIFT));
-   free_page((unsigned long)__va(addr));
-   totalram_pages++;
+   free_reserved_page(pfn_to_page(addr >> PAGE_SHIFT));
}
 }
 #endif
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 06c8202..2230fd0 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1045,10 +1045,7 @@ static void fadump_release_memory(unsigned long begin, unsigned long end)
if (addr <= ra_end && ((addr + PAGE_SIZE) > ra_start))
continue;
 
-   ClearPageReserved(pfn_to_page(addr >> PAGE_SHIFT));
-   init_page_count(pfn_to_page(addr >> PAGE_SHIFT));
-   free_page((unsigned long)__va(addr));
-   totalram_pages++;
+   free_reserved_page(pfn_to_page(addr >> PAGE_SHIFT));
}
 }
 
diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index a61b133..6782221 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -756,12 +756,7 @@ static __init void kvm_free_tmp(void)
end = (ulong)&kvm_tmp[ARRAY_SIZE(kvm_tmp)] & PAGE_MASK;
 
/* Free the tmp space we don't need */
-   for (; start < end; start += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(start));
-   init_page_count(virt_to_page(start));
-   free_page(start);
-   totalram_pages++;
-   }
+   free_reserved_area(start, end, 0, NULL);
 }
 
 static int __init kvm_guest_init(void)
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index f1f7409..c756713 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -405,39 +405,14 @@ void __init mem_init(void)
 
 void free_initmem(void)
 {
-   unsigned long addr;
-
ppc_md.progress = ppc_printk_progress;
-
-   addr = (unsigned long)__init_begin;
-   for (; addr < (unsigned long)__init_end; addr += PAGE_SIZE) {
-   memset((void *)addr, POISON_FREE_INITMEM, PAGE_SIZE);
-   ClearPageReserved(virt_to_page(addr));
-   init_page_count(virt_to_page(addr));
-   free_page(addr);
-   totalram_pages++;
-   }
-   pr_info("Freeing unused kernel memory: %luk freed\n",
-   ((unsigned long)__init_end -
-   (unsigned long)__init_begin) >> 10);
+   free_initmem_default(POISON_FREE_INITMEM);
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
 void __init free_initrd_mem(unsigned long start, unsigned long end)
 {
-   if (start >= end)
-   return;
-
-   start = _ALIGN_DOWN(start, PAGE_SIZE);
-   end = _ALIGN_UP(end, PAGE_SIZE);
-   pr_info("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
-
-   for (; start < end; start += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(start));
-   init_page_count(virt_to_page(start));
-   free_page(start);
-   totalram_pages++;
-   }
+   free_reserved_area(start, end, 0, "initrd");
 }
 #endif
 
diff --git a/arch/powerpc/platforms/512x/mpc512x_shared.c b/arch/powerpc/platforms/512x/mpc512x_shared.c
index d30235b..db6ac38 100644
--- a/arch/powerpc/platforms/512x/mpc512x_shared.c
+++ b/arch/powerpc/platforms/512x/mpc512x_shared.c
@@ -172,12 +172,9 @@ static struct fsl_diu_shared_fb __attribute__ ((__aligned__(8))) diu_shared_fb;
 
 static inline void mpc512x_free_bootmem(struct page *page)
 {
-   __ClearPageReserved(page);
BUG_ON(PageTail(page));
BUG_ON(atomic_read(&page->_count) > 1);
-   atomic_set(&page->_count, 1);
-   __free_page(page);
-   totalram_pages++;
+   free_reserved_page(page);
 }
 
 void mpc512x_release_bootmem(void)
-- 
1.7.9.5


[PATCH v2, part1 16/29] mm/openrisc: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.
Also include  to avoid local declarations.

Signed-off-by: Jiang Liu 
Cc: Jonas Bonn 
---
 arch/openrisc/mm/init.c |   27 +++
 1 file changed, 3 insertions(+), 24 deletions(-)

diff --git a/arch/openrisc/mm/init.c b/arch/openrisc/mm/init.c
index e7fdc50..b3cbc67 100644
--- a/arch/openrisc/mm/init.c
+++ b/arch/openrisc/mm/init.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 
 int mem_init_done;
 
@@ -201,9 +202,6 @@ void __init paging_init(void)
 
 /* References to section boundaries */
 
-extern char _stext, _etext, _edata, __bss_start, _end;
-extern char __init_begin, __init_end;
-
 static int __init free_pages_init(void)
 {
int reservedpages, pfn;
@@ -263,30 +261,11 @@ void __init mem_init(void)
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-   printk(KERN_INFO "Freeing initrd memory: %ldk freed\n",
-  (end - start) >> 10);
-
-   for (; start < end; start += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(start));
-   init_page_count(virt_to_page(start));
-   free_page(start);
-   totalram_pages++;
-   }
+   free_reserved_area(start, end, 0, "initrd");
 }
 #endif
 
 void free_initmem(void)
 {
-   unsigned long addr;
-
-   addr = (unsigned long)(&__init_begin);
-   for (; addr < (unsigned long)(&__init_end); addr += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(addr));
-   init_page_count(virt_to_page(addr));
-   free_page(addr);
-   totalram_pages++;
-   }
-   printk(KERN_INFO "Freeing unused kernel memory: %luk freed\n",
-  ((unsigned long)&__init_end -
-   (unsigned long)&__init_begin) >> 10);
+   free_initmem_default(0);
 }
-- 
1.7.9.5


[PATCH v2, part1 14/29] mm/MIPS: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: Ralf Baechle 
---
 arch/mips/mm/init.c  |   31 +--
 arch/mips/sgi-ip27/ip27-memory.c |4 ++--
 2 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index 6792925..60f7c61 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -77,10 +77,9 @@ EXPORT_SYMBOL_GPL(empty_zero_page);
 /*
  * Not static inline because used by IP27 special magic initialization code
  */
-unsigned long setup_zero_pages(void)
+void setup_zero_pages(void)
 {
-   unsigned int order;
-   unsigned long size;
+   unsigned int order, i;
struct page *page;
 
if (cpu_has_vce)
@@ -94,15 +93,10 @@ unsigned long setup_zero_pages(void)
 
page = virt_to_page((void *)empty_zero_page);
split_page(page, order);
-   while (page < virt_to_page((void *)(empty_zero_page + (PAGE_SIZE << order)))) {
-   SetPageReserved(page);
-   page++;
-   }
-
-   size = PAGE_SIZE << order;
-   zero_page_mask = (size - 1) & PAGE_MASK;
+   for (i = 0; i < (1 << order); i++, page++)
+   mark_page_reserved(page);
 
-   return 1UL << order;
+   zero_page_mask = ((PAGE_SIZE << order) - 1) & PAGE_MASK;
 }
 
 #ifdef CONFIG_MIPS_MT_SMTC
@@ -380,7 +374,7 @@ void __init mem_init(void)
high_memory = (void *) __va(max_low_pfn << PAGE_SHIFT);
 
totalram_pages += free_all_bootmem();
-   totalram_pages -= setup_zero_pages();   /* Setup zeroed pages.  */
+   setup_zero_pages(); /* Setup zeroed pages.  */
 
reservedpages = ram = 0;
for (tmp = 0; tmp < max_low_pfn; tmp++)
@@ -440,11 +434,8 @@ void free_init_pages(const char *what, unsigned long begin, unsigned long end)
struct page *page = pfn_to_page(pfn);
void *addr = phys_to_virt(PFN_PHYS(pfn));
 
-   ClearPageReserved(page);
-   init_page_count(page);
memset(addr, POISON_FREE_INITMEM, PAGE_SIZE);
-   __free_page(page);
-   totalram_pages++;
+   free_reserved_page(page);
}
printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
 }
@@ -452,18 +443,14 @@ void free_init_pages(const char *what, unsigned long begin, unsigned long end)
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-   free_init_pages("initrd memory",
-   virt_to_phys((void *)start),
-   virt_to_phys((void *)end));
+   free_reserved_area(start, end, POISON_FREE_INITMEM, "initrd");
 }
 #endif
 
 void __init_refok free_initmem(void)
 {
prom_free_prom_memory();
-   free_init_pages("unused kernel memory",
-   __pa_symbol(&__init_begin),
-   __pa_symbol(&__init_end));
+   free_initmem_default(POISON_FREE_INITMEM);
 }
 
 #ifndef CONFIG_MIPS_PGD_C0_CONTEXT
diff --git a/arch/mips/sgi-ip27/ip27-memory.c b/arch/mips/sgi-ip27/ip27-memory.c
index 3505d08..5f2bddb 100644
--- a/arch/mips/sgi-ip27/ip27-memory.c
+++ b/arch/mips/sgi-ip27/ip27-memory.c
@@ -457,7 +457,7 @@ void __init prom_free_prom_memory(void)
/* We got nothing to free here ...  */
 }
 
-extern unsigned long setup_zero_pages(void);
+extern void setup_zero_pages(void);
 
 void __init paging_init(void)
 {
@@ -492,7 +492,7 @@ void __init mem_init(void)
totalram_pages += free_all_bootmem_node(NODE_DATA(node));
}
 
-   totalram_pages -= setup_zero_pages();   /* This comes from node 0 */
+   setup_zero_pages(); /* This comes from node 0 */
 
codesize =  (unsigned long) &_etext - (unsigned long) &_text;
datasize =  (unsigned long) &_edata - (unsigned long) &_etext;
-- 
1.7.9.5
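
[Editor's sketch] The rewritten setup_zero_pages() folds the old two-step computation (size = PAGE_SIZE << order; mask = (size - 1) & PAGE_MASK) into one expression. The small userspace check below works the arithmetic through for a few orders. The 4 KiB page size and the zero_page_mask_for() wrapper are assumptions for illustration; the real PAGE_SHIFT depends on the kernel configuration.

```c
#include <assert.h>

#define PAGE_SHIFT 12				/* assumed 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/*
 * Hypothetical wrapper around the expression used in the patch:
 * keep the address bits that lie above the in-page offset and below
 * the size of the whole 2^order-page block.
 */
static unsigned long zero_page_mask_for(unsigned int order)
{
	assert(order < 8 * sizeof(unsigned long) - PAGE_SHIFT);
	return ((PAGE_SIZE << order) - 1) & PAGE_MASK;
}
```

For order 0 the block is a single page, so no bits remain and the mask is 0. For order 3 the block is 32 KiB: 0x7fff with the low 12 bits cleared gives 0x7000.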


[PATCH v2, part1 15/29] mm/mn10300: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: Koichi Yasutake 
---
 arch/mn10300/mm/init.c |   23 ++-
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/arch/mn10300/mm/init.c b/arch/mn10300/mm/init.c
index e57e5bc..5a8ace6 100644
--- a/arch/mn10300/mm/init.c
+++ b/arch/mn10300/mm/init.c
@@ -139,30 +139,11 @@ void __init mem_init(void)
 }
 
 /*
- *
- */
-void free_init_pages(char *what, unsigned long begin, unsigned long end)
-{
-   unsigned long addr;
-
-   for (addr = begin; addr < end; addr += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(addr));
-   init_page_count(virt_to_page(addr));
-   memset((void *) addr, 0xcc, PAGE_SIZE);
-   free_page(addr);
-   totalram_pages++;
-   }
-   printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
-}
-
-/*
  * recycle memory containing stuff only required for initialisation
  */
 void free_initmem(void)
 {
-   free_init_pages("unused kernel memory",
-   (unsigned long) &__init_begin,
-   (unsigned long) &__init_end);
+   free_initmem_default(POISON_FREE_INITMEM);
 }
 
 /*
@@ -171,6 +152,6 @@ void free_initmem(void)
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-   free_init_pages("initrd memory", start, end);
+   free_reserved_area(start, end, POISON_FREE_INITMEM, "initrd");
 }
 #endif
-- 
1.7.9.5


[PATCH v2, part1 12/29] mm/m68k: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: Geert Uytterhoeven 
---
 arch/m68k/mm/init.c |   24 ++--
 1 file changed, 2 insertions(+), 22 deletions(-)

diff --git a/arch/m68k/mm/init.c b/arch/m68k/mm/init.c
index afd8106f..b5c1ab1 100644
--- a/arch/m68k/mm/init.c
+++ b/arch/m68k/mm/init.c
@@ -110,18 +110,7 @@ void __init paging_init(void)
 void free_initmem(void)
 {
 #ifndef CONFIG_MMU_SUN3
-   unsigned long addr;
-
-   addr = (unsigned long) __init_begin;
-   for (; addr < ((unsigned long) __init_end); addr += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(addr));
-   init_page_count(virt_to_page(addr));
-   free_page(addr);
-   totalram_pages++;
-   }
-   pr_notice("Freeing unused kernel memory: %luk freed (0x%x - 0x%x)\n",
-   (addr - (unsigned long) __init_begin) >> 10,
-   (unsigned int) __init_begin, (unsigned int) __init_end);
+   free_initmem_default(0);
 #endif /* CONFIG_MMU_SUN3 */
 }
 
@@ -213,15 +202,6 @@ void __init mem_init(void)
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-   int pages = 0;
-   for (; start < end; start += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(start));
-   init_page_count(virt_to_page(start));
-   free_page(start);
-   totalram_pages++;
-   pages++;
-   }
-   pr_notice("Freeing initrd memory: %dk freed\n",
-   pages << (PAGE_SHIFT - 10));
+   free_reserved_area(start, end, 0, "initrd");
 }
 #endif
-- 
1.7.9.5


[PATCH v2, part1 13/29] mm/microblaze: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: Michal Simek 
---
 arch/microblaze/include/asm/setup.h |1 -
 arch/microblaze/mm/init.c   |   28 ++--
 2 files changed, 2 insertions(+), 27 deletions(-)

diff --git a/arch/microblaze/include/asm/setup.h b/arch/microblaze/include/asm/setup.h
index 0e0b0a5..f05df56 100644
--- a/arch/microblaze/include/asm/setup.h
+++ b/arch/microblaze/include/asm/setup.h
@@ -46,7 +46,6 @@ void machine_shutdown(void);
 void machine_halt(void);
 void machine_power_off(void);
 
-void free_init_pages(char *what, unsigned long begin, unsigned long end);
 extern void *alloc_maybe_bootmem(size_t size, gfp_t mask);
 extern void *zalloc_maybe_bootmem(size_t size, gfp_t mask);
 
diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c
index 8f8b367..9be5302 100644
--- a/arch/microblaze/mm/init.c
+++ b/arch/microblaze/mm/init.c
@@ -236,40 +236,16 @@ void __init setup_memory(void)
paging_init();
 }
 
-void free_init_pages(char *what, unsigned long begin, unsigned long end)
-{
-   unsigned long addr;
-
-   for (addr = begin; addr < end; addr += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(addr));
-   init_page_count(virt_to_page(addr));
-   free_page(addr);
-   totalram_pages++;
-   }
-   pr_info("Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
-}
-
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-   int pages = 0;
-   for (; start < end; start += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(start));
-   init_page_count(virt_to_page(start));
-   free_page(start);
-   totalram_pages++;
-   pages++;
-   }
-   pr_notice("Freeing initrd memory: %dk freed\n",
-   (int)(pages * (PAGE_SIZE / 1024)));
+   free_reserved_area(start, end, 0, "initrd");
 }
 #endif
 
 void free_initmem(void)
 {
-   free_init_pages("unused kernel memory",
-   (unsigned long)(&__init_begin),
-   (unsigned long)(&__init_end));
+   free_initmem_default(0);
 }
 
 void __init mem_init(void)
-- 
1.7.9.5


[PATCH v2, part1 10/29] mm/IA64: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: Tony Luck 
Cc: Fenghua Yu 
---
 arch/ia64/mm/init.c |   23 ---
 1 file changed, 4 insertions(+), 19 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 20bc967..d1fe4b4 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -154,25 +154,14 @@ ia64_init_addr_space (void)
 void
 free_initmem (void)
 {
-   unsigned long addr, eaddr;
-
-   addr = (unsigned long) ia64_imva(__init_begin);
-   eaddr = (unsigned long) ia64_imva(__init_end);
-   while (addr < eaddr) {
-   ClearPageReserved(virt_to_page(addr));
-   init_page_count(virt_to_page(addr));
-   free_page(addr);
-   ++totalram_pages;
-   addr += PAGE_SIZE;
-   }
-   printk(KERN_INFO "Freeing unused kernel memory: %ldkB freed\n",
-  (__init_end - __init_begin) >> 10);
+   free_reserved_area((unsigned long)ia64_imva(__init_begin),
+  (unsigned long)ia64_imva(__init_end),
+  0, "unused kernel");
 }
 
 void __init
 free_initrd_mem (unsigned long start, unsigned long end)
 {
-   struct page *page;
/*
 * EFI uses 4KB pages while the kernel can use 4KB or bigger.
 * Thus EFI and the kernel may have different page sizes. It is
@@ -213,11 +202,7 @@ free_initrd_mem (unsigned long start, unsigned long end)
for (; start < end; start += PAGE_SIZE) {
if (!virt_addr_valid(start))
continue;
-   page = virt_to_page(start);
-   ClearPageReserved(page);
-   init_page_count(page);
-   free_page(start);
-   ++totalram_pages;
+   free_reserved_page(virt_to_page(start));
}
 }
 
-- 
1.7.9.5


[PATCH v2, part1 11/29] mm/m32r: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.
Also include  to avoid local declarations.

Signed-off-by: Jiang Liu 
Cc: Hirokazu Takata 
---
 arch/m32r/mm/init.c |   26 +++---
 1 file changed, 3 insertions(+), 23 deletions(-)

diff --git a/arch/m32r/mm/init.c b/arch/m32r/mm/init.c
index 78b660e..ab4cbce 100644
--- a/arch/m32r/mm/init.c
+++ b/arch/m32r/mm/init.c
@@ -28,10 +28,7 @@
 #include 
 #include 
 #include 
-
-/* References to section boundaries */
-extern char _text, _etext, _edata;
-extern char __init_begin, __init_end;
+#include 
 
 pgd_t swapper_pg_dir[1024];
 
@@ -184,17 +181,7 @@ void __init mem_init(void)
  *==*/
 void free_initmem(void)
 {
-   unsigned long addr;
-
-   addr = (unsigned long)(&__init_begin);
-   for (; addr < (unsigned long)(&__init_end); addr += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(addr));
-   init_page_count(virt_to_page(addr));
-   free_page(addr);
-   totalram_pages++;
-   }
-   printk (KERN_INFO "Freeing unused kernel memory: %dk freed\n", \
- (int)(&__init_end - &__init_begin) >> 10);
+   free_initmem_default(0);
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
@@ -204,13 +191,6 @@ void free_initmem(void)
  *==*/
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-   unsigned long p;
-   for (p = start; p < end; p += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(p));
-   init_page_count(virt_to_page(p));
-   free_page(p);
-   totalram_pages++;
-   }
-   printk (KERN_INFO "Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
+   free_reserved_area(start, end, 0, "initrd");
 }
 #endif
-- 
1.7.9.5


[PATCH v2, part1 09/29] mm/h8300: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: Yoshinori Sato 
---
 arch/h8300/mm/init.c |   30 +++---
 1 file changed, 3 insertions(+), 27 deletions(-)

diff --git a/arch/h8300/mm/init.c b/arch/h8300/mm/init.c
index 981e250..ff349d7 100644
--- a/arch/h8300/mm/init.c
+++ b/arch/h8300/mm/init.c
@@ -139,7 +139,7 @@ void __init mem_init(void)
start_mem = PAGE_ALIGN(start_mem);
max_mapnr = num_physpages = MAP_NR(high_memory);
 
-   /* this will put all memory onto the freelists */
+   /* this will put all low memory onto the freelists */
totalram_pages = free_all_bootmem();
 
codek = (_etext - _stext) >> 10;
@@ -161,15 +161,7 @@ void __init mem_init(void)
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-   int pages = 0;
-   for (; start < end; start += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(start));
-   init_page_count(virt_to_page(start));
-   free_page(start);
-   totalram_pages++;
-   pages++;
-   }
-   printk ("Freeing initrd memory: %dk freed\n", pages);
+   free_reserved_area(start, end, 0, "initrd");
 }
 #endif
 
@@ -177,23 +169,7 @@ void
 free_initmem(void)
 {
 #ifdef CONFIG_RAMKERNEL
-   unsigned long addr;
-/*
- * the following code should be cool even if these sections
- * are not page aligned.
- */
-   addr = PAGE_ALIGN((unsigned long)(__init_begin));
-   /* next to check that the page we free is not a partial page */
-   for (; addr + PAGE_SIZE < (unsigned long)__init_end; addr +=PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(addr));
-   init_page_count(virt_to_page(addr));
-   free_page(addr);
-   totalram_pages++;
-   }
-   printk(KERN_INFO "Freeing unused kernel memory: %ldk freed (0x%x - 0x%x)\n",
-   (addr - PAGE_ALIGN((long) __init_begin)) >> 10,
-   (int)(PAGE_ALIGN((unsigned long)__init_begin)),
-   (int)(addr - PAGE_SIZE));
+   free_initmem_default(0);
 #endif
 }
 
-- 
1.7.9.5
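
The removed h8300 loop in free_initmem() runs while addr + PAGE_SIZE < __init_end. A minimal user-space sketch of how that strict bound compares with a <= bound, assuming 4 KiB pages; count_strict() and count_inclusive() are hypothetical names used only for illustration, not kernel functions:

```c
#include <assert.h>

#define PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Loop bound as in the removed h8300 code: addr + PAGE_SIZE < end. */
unsigned long count_strict(unsigned long begin, unsigned long end)
{
	unsigned long addr, pages = 0;

	assert(begin % PAGE_SIZE == 0);	/* caller passes an aligned start */
	for (addr = begin; addr + PAGE_SIZE < end; addr += PAGE_SIZE)
		pages++;
	return pages;
}

/* Same walk with a <= bound: a page is counted if it fits entirely. */
unsigned long count_inclusive(unsigned long begin, unsigned long end)
{
	unsigned long addr, pages = 0;

	assert(begin % PAGE_SIZE == 0);
	for (addr = begin; addr + PAGE_SIZE <= end; addr += PAGE_SIZE)
		pages++;
	return pages;
}
```

With a page-aligned end the strict bound skips the last whole page, so the two helpers differ by one page; with an unaligned end they agree.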


[PATCH v2, part1 08/29] mm/FRV: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: David Howells 
---
 arch/frv/mm/init.c |   34 --
 1 file changed, 4 insertions(+), 30 deletions(-)

diff --git a/arch/frv/mm/init.c b/arch/frv/mm/init.c
index 92e97b0..21b9290 100644
--- a/arch/frv/mm/init.c
+++ b/arch/frv/mm/init.c
@@ -122,7 +122,7 @@ void __init mem_init(void)
 #endif
int codek = 0, datak = 0;
 
-   /* this will put all memory onto the freelists */
+   /* this will put all low memory onto the freelists */
totalram_pages = free_all_bootmem();
 
 #ifdef CONFIG_MMU
@@ -132,11 +132,7 @@ void __init mem_init(void)
 
 #ifdef CONFIG_HIGHMEM
for (pfn = num_physpages - 1; pfn >= num_mappedpages; pfn--) {
-   struct page *page = _map[pfn];
-
-   ClearPageReserved(page);
-   init_page_count(page);
-   __free_page(page);
+   __free_reserved_page(_map[pfn]);
totalram_pages++;
}
 #endif
@@ -168,21 +164,7 @@ void __init mem_init(void)
 void free_initmem(void)
 {
 #if defined(CONFIG_RAMKERNEL) && !defined(CONFIG_PROTECT_KERNEL)
-   unsigned long start, end, addr;
-
-   start = PAGE_ALIGN((unsigned long) &__init_begin);  /* round up */
-   end   = ((unsigned long) &__init_end) & PAGE_MASK;  /* round down */
-
-   /* next to check that the page we free is not a partial page */
-   for (addr = start; addr < end; addr += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(addr));
-   init_page_count(virt_to_page(addr));
-   free_page(addr);
-   totalram_pages++;
-   }
-
-   printk("Freeing unused kernel memory: %ldKiB freed (0x%lx - 0x%lx)\n",
-  (end - start) >> 10, start, end);
+   free_initmem_default(0);
 #endif
 } /* end free_initmem() */
 
@@ -193,14 +175,6 @@ void free_initmem(void)
 #ifdef CONFIG_BLK_DEV_INITRD
 void __init free_initrd_mem(unsigned long start, unsigned long end)
 {
-   int pages = 0;
-   for (; start < end; start += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(start));
-   init_page_count(virt_to_page(start));
-   free_page(start);
-   totalram_pages++;
-   pages++;
-   }
-   printk("Freeing initrd memory: %dKiB freed\n", (pages * PAGE_SIZE) >> 10);
+   free_reserved_area(start, end, 0, "initrd");
 } /* end free_initrd_mem() */
 #endif
-- 
1.7.9.5
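
The removed FRV free_initmem() rounds the start of the range up and the end down so that only whole pages are freed. A rough user-space model of that rounding, assuming 4 KiB pages; whole_pages_in_range() and pages_to_kib() are hypothetical helpers that only count pages and free nothing, and they are not the kernel implementation:

```c
#include <assert.h>

#define PAGE_SHIFT	12UL			/* assumption: 4 KiB pages */
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PAGE_MASK	(~(PAGE_SIZE - 1))
#define PAGE_ALIGN(a)	(((a) + PAGE_SIZE - 1) & PAGE_MASK)

/* Count the whole pages that lie entirely inside [start, end). */
unsigned long whole_pages_in_range(unsigned long start, unsigned long end)
{
	unsigned long pos, pages = 0;

	assert(start <= end);
	start = PAGE_ALIGN(start);	/* round up */
	end &= PAGE_MASK;		/* round down */
	for (pos = start; pos < end; pos += PAGE_SIZE)
		pages++;
	return pages;
}

/* Convert a page count to KiB, as the "freed" messages report it. */
unsigned long pages_to_kib(unsigned long pages)
{
	return pages << (PAGE_SHIFT - 10);
}
```

A range that starts and ends in the middle of the same page yields zero pages, because the rounded start ends up above the rounded end and the loop never runs.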


[PATCH v2, part1 05/29] mm/blackfin: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu 
Cc: Mike Frysinger 
---
 arch/blackfin/mm/init.c |   22 +++---
 1 file changed, 3 insertions(+), 19 deletions(-)

diff --git a/arch/blackfin/mm/init.c b/arch/blackfin/mm/init.c
index 9cb8553..82d01a7 100644
--- a/arch/blackfin/mm/init.c
+++ b/arch/blackfin/mm/init.c
@@ -103,7 +103,7 @@ void __init mem_init(void)
max_mapnr = num_physpages = MAP_NR(high_memory);
printk(KERN_DEBUG "Kernel managed physical pages: %lu\n", num_physpages);
 
-   /* This will put all memory onto the freelists. */
+   /* This will put all low memory onto the freelists. */
totalram_pages = free_all_bootmem();
 
reservedpages = 0;
@@ -129,24 +129,11 @@ void __init mem_init(void)
initk, codek, datak, DMA_UNCACHED_REGION >> 10, (reservedpages << (PAGE_SHIFT-10)));
 }
 
-static void __init free_init_pages(const char *what, unsigned long begin, unsigned long end)
-{
-   unsigned long addr;
-   /* next to check that the page we free is not a partial page */
-   for (addr = begin; addr + PAGE_SIZE <= end; addr += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(addr));
-   init_page_count(virt_to_page(addr));
-   free_page(addr);
-   totalram_pages++;
-   }
-   printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
-}
-
 #ifdef CONFIG_BLK_DEV_INITRD
 void __init free_initrd_mem(unsigned long start, unsigned long end)
 {
 #ifndef CONFIG_MPU
-   free_init_pages("initrd memory", start, end);
+   free_reserved_area(start, end, 0, "initrd");
 #endif
 }
 #endif
@@ -154,10 +141,7 @@ void __init free_initrd_mem(unsigned long start, unsigned long end)
 void __init_refok free_initmem(void)
 {
 #if defined CONFIG_RAMKERNEL && !defined CONFIG_MPU
-   free_init_pages("unused kernel memory",
-   (unsigned long)(&__init_begin),
-   (unsigned long)(&__init_end));
-
+   free_initmem_default(0);
if (memory_start == (unsigned long)(&__init_end))
memory_start = (unsigned long)(&__init_begin);
 #endif
-- 
1.7.9.5


[PATCH v2, part1 07/29] mm/cris: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.
Also include  to avoid local declaration.

Signed-off-by: Jiang Liu 
Cc: Mikael Starvik 
---
 arch/cris/mm/init.c |   16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/arch/cris/mm/init.c b/arch/cris/mm/init.c
index d72ab58..9ac8094 100644
--- a/arch/cris/mm/init.c
+++ b/arch/cris/mm/init.c
@@ -12,12 +12,10 @@
 #include 
 #include 
 #include 
+#include 
 
 unsigned long empty_zero_page;
 
-extern char _stext, _edata, _etext; /* From linkerscript */
-extern char __init_begin, __init_end;
-
 void __init
 mem_init(void)
 {
@@ -67,15 +65,5 @@ mem_init(void)
 void 
 free_initmem(void)
 {
-unsigned long addr;
-
-addr = (unsigned long)(&__init_begin);
-for (; addr < (unsigned long)(&__init_end); addr += PAGE_SIZE) {
-ClearPageReserved(virt_to_page(addr));
-init_page_count(virt_to_page(addr));
-free_page(addr);
-totalram_pages++;
-}
-printk (KERN_INFO "Freeing unused kernel memory: %luk freed\n",
-   (unsigned long)((&__init_end - &__init_begin) >> 10));
+   free_initmem_default(0);
 }
-- 
1.7.9.5


[PATCH v2, part1 02/29] mm/alpha: use common help functions to free reserved pages

2013-03-09 Thread Jiang Liu

Use common help functions to free reserved pages.
Also include  to avoid local declarations.

Signed-off-by: Jiang Liu 
Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
---
 arch/alpha/kernel/sys_nautilus.c |5 ++---
 arch/alpha/mm/init.c |   24 +++-
 arch/alpha/mm/numa.c |3 +--
 3 files changed, 6 insertions(+), 26 deletions(-)

diff --git a/arch/alpha/kernel/sys_nautilus.c b/arch/alpha/kernel/sys_nautilus.c
index 4d4c046..a8b9d66 100644
--- a/arch/alpha/kernel/sys_nautilus.c
+++ b/arch/alpha/kernel/sys_nautilus.c
@@ -185,7 +185,6 @@ nautilus_machine_check(unsigned long vector, unsigned long la_ptr)
mb();
 }
 
-extern void free_reserved_mem(void *, void *);
 extern void pcibios_claim_one_bus(struct pci_bus *);
 
 static struct resource irongate_mem = {
@@ -234,8 +233,8 @@ nautilus_init_pci(void)
if (pci_mem < memtop)
memtop = pci_mem;
if (memtop > alpha_mv.min_mem_address) {
-   free_reserved_mem(__va(alpha_mv.min_mem_address),
- __va(memtop));
+   free_reserved_area((unsigned long)__va(alpha_mv.min_mem_address),
+  (unsigned long)__va(memtop), 0, NULL);
printk("nautilus_init_pci: %ldk freed\n",
(memtop - alpha_mv.min_mem_address) >> 10);
}
diff --git a/arch/alpha/mm/init.c b/arch/alpha/mm/init.c
index 1ad6ca7..0ba85ee 100644
--- a/arch/alpha/mm/init.c
+++ b/arch/alpha/mm/init.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 extern void die_if_kernel(char *,struct pt_regs *,long);
 
@@ -281,8 +282,6 @@ printk_memory_info(void)
 {
unsigned long codesize, reservedpages, datasize, initsize, tmp;
extern int page_is_ram(unsigned long) __init;
-   extern char _text, _etext, _data, _edata;
-   extern char __init_begin, __init_end;
 
/* printk all informations */
reservedpages = 0;
@@ -318,32 +317,15 @@ mem_init(void)
 #endif /* CONFIG_DISCONTIGMEM */
 
 void
-free_reserved_mem(void *start, void *end)
-{
-   void *__start = start;
-   for (; __start < end; __start += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(__start));
-   init_page_count(virt_to_page(__start));
-   free_page((long)__start);
-   totalram_pages++;
-   }
-}
-
-void
 free_initmem(void)
 {
-   extern char __init_begin, __init_end;
-
-   free_reserved_mem(&__init_begin, &__init_end);
-   printk ("Freeing unused kernel memory: %ldk freed\n",
-   (&__init_end - &__init_begin) >> 10);
+   free_initmem_default(0);
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
 void
 free_initrd_mem(unsigned long start, unsigned long end)
 {
-   free_reserved_mem((void *)start, (void *)end);
-   printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
+   free_reserved_area(start, end, 0, "initrd");
 }
 #endif
diff --git a/arch/alpha/mm/numa.c b/arch/alpha/mm/numa.c
index 3973ae3..3388504 100644
--- a/arch/alpha/mm/numa.c
+++ b/arch/alpha/mm/numa.c
@@ -17,6 +17,7 @@
 
 #include 
 #include 
+#include 
 
 pg_data_t node_data[MAX_NUMNODES];
 EXPORT_SYMBOL(node_data);
@@ -325,8 +326,6 @@ void __init mem_init(void)
 {
unsigned long codesize, reservedpages, datasize, initsize, pfn;
extern int page_is_ram(unsigned long) __init;
-   extern char _text, _etext, _data, _edata;
-   extern char __init_begin, __init_end;
unsigned long nid, i;
high_memory = (void *) __va(max_low_pfn << PAGE_SHIFT);
 
-- 
1.7.9.5


Re: [PATCH] mm, x86: no zeroing of hugetlbfs pages at boot

2013-03-09 Thread Hillf Danton

On Thu, Mar 7, 2013 at 5:50 AM, Cliff Wickman  wrote:
> From: Cliff Wickman 
>
> Allocating a large number of 1GB hugetlbfs pages at boot takes a
> very long time.
>
> Large system sites would at times like to allocate a very large amount of
> memory as 1GB pages.  They would put this on the kernel boot line:
>default_hugepagesz=1G hugepagesz=1G hugepages=4096
> [Dynamic allocation of 1G pages is not an option, as zone pages only go
>  up to MAX_ORDER, and MAX_ORDER cannot exceed the section size.]
>
> Each page is zeroed as it is allocated, and all allocation is done by
> cpu 0, as this path is early in boot:
>   start_kernel
> kernel_init
>   do_pre_smp_initcalls
> hugetlb_init
>   hugetlb_init_hstates
> hugetlb_hstate_alloc_pages
>
> Zeroing remote (offnode) memory takes ~1GB/sec (and most memory is offnode
> on large numa systems).
> This estimate is approximate (it depends on core frequency & number of hops
> to remote memory) but should be within a factor of 2 on most systems.
> A benchmark attempting to reserve a TB for 1GB pages would thus require
> ~1000 seconds of boot time just for this allocating.  32TB would take 8 hours.
>
> I propose passing a flag to the early allocator to indicate that no zeroing
> of a page should be done.  The 'no zeroing' flag would have to be passed
> down this code path:
>

FYI: huge pages are cleared just after they are allocated; see, for instance,
clear_huge_page() in hugetlb_no_page().

Hillf
>   hugetlb_hstate_alloc_pages
> alloc_bootmem_huge_page
>   __alloc_bootmem_node_nopanic NO_ZERO  (nobootmem.c)
> __alloc_memory_core_early  NO_ZERO
>   if (!(flags & NO_ZERO))
> memset(ptr, 0, size);
>
> Or this path if CONFIG_NO_BOOTMEM is not set:
>
>   hugetlb_hstate_alloc_pages
> alloc_bootmem_huge_page
>   __alloc_bootmem_node_nopanic  NO_ZERO  (bootmem.c)
> alloc_bootmem_core  NO_ZERO
>   if (!(flags & NO_ZERO))
> memset(region, 0, size);
> __alloc_bootmem_nopanic NO_ZERO
>   ___alloc_bootmem_nopanic  NO_ZERO
> alloc_bootmem_core  NO_ZERO
>   if (!(flags & NO_ZERO))
> memset(region, 0, size);
>
> Signed-off-by: Cliff Wickman 
>
> ---
>  arch/x86/kernel/setup_percpu.c |4 ++--
>  include/linux/bootmem.h|   23 ---
>  mm/bootmem.c   |   12 +++-
>  mm/hugetlb.c   |3 ++-
>  mm/nobootmem.c |   41 +++--
>  mm/page_cgroup.c   |2 +-
>  mm/sparse.c|2 +-
>  7 files changed, 52 insertions(+), 35 deletions(-)
>
> Index: linux/include/linux/bootmem.h
> ===
> --- linux.orig/include/linux/bootmem.h
> +++ linux/include/linux/bootmem.h
> @@ -8,6 +8,11 @@
>  #include 
>
>  /*
> + * allocation flags
> + */
> +#define NO_ZERO0x0001
> +
> +/*
>   *  simple boot-time physical memory area allocator.
>   */
>
> @@ -79,7 +84,8 @@ extern void *__alloc_bootmem(unsigned lo
>  unsigned long goal);
>  extern void *__alloc_bootmem_nopanic(unsigned long size,
>  unsigned long align,
> -unsigned long goal);
> +unsigned long goal,
> +u32 flags);
>  extern void *__alloc_bootmem_node(pg_data_t *pgdat,
>   unsigned long size,
>   unsigned long align,
> @@ -91,12 +97,14 @@ void *__alloc_bootmem_node_high(pg_data_
>  extern void *__alloc_bootmem_node_nopanic(pg_data_t *pgdat,
>   unsigned long size,
>   unsigned long align,
> - unsigned long goal);
> + unsigned long goal,
> + u32 flags);
>  void *___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
>   unsigned long size,
>   unsigned long align,
>   unsigned long goal,
> - unsigned long limit);
> + unsigned long limit,
> + u32 flags);
>  extern void *__alloc_bootmem_low(unsigned long size,
>  unsigned long align,
>  unsigned long goal);
> @@ -120,19 +128,20 @@ extern void *__alloc_bootmem_low_node(pg
>  #define alloc_bootmem_align(x, align) \
> __alloc_bootmem(x, align, BOOTMEM_LOW_LIMIT)
>  #define alloc_bootmem_nopanic(x) \
> -   __alloc_bootmem_nopanic(x, SMP_CACHE_BYTES, BOOTMEM_LOW_LIMIT)
> +   __alloc_bootmem_nopanic(x, SMP_CACHE_BYTES, BOOTMEM_LOW_LIMIT, 0)
>
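
The flag handling proposed in the quoted patch ("if (!(flags & NO_ZERO)) memset(...)") can be sketched in user space roughly as follows. This is a toy bump allocator over a static arena, not the bootmem code: arena, arena_reset() and early_alloc() are hypothetical names, and only the NO_ZERO flag name and value come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NO_ZERO 0x0001u		/* flag name and value as in the quoted patch */

static unsigned char arena[4096];	/* hypothetical stand-in for early memory */
static size_t arena_used;

/* Fill the arena with a known pattern so that zeroing is observable. */
void arena_reset(unsigned char fill)
{
	memset(arena, fill, sizeof(arena));
	arena_used = 0;
}

/* Hand out the next 'size' bytes; skip the memset when NO_ZERO is set. */
void *early_alloc(size_t size, unsigned int flags)
{
	void *ptr;

	assert(size > 0);
	if (size > sizeof(arena) - arena_used)
		return NULL;		/* arena exhausted */
	ptr = arena + arena_used;
	arena_used += size;
	if (!(flags & NO_ZERO))
		memset(ptr, 0, size);
	return ptr;
}
```

Callers that pass 0 keep the existing zeroing behaviour, which is why the patch adds a trailing 0 argument to the unchanged call sites.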

[PATCH] vmscan: minor cleanup for kswapd

2013-03-09 Thread Hillf Danton

The local variable, total_scanned, is no longer used, so clean up now.

Signed-off-by: Hillf Danton 
---

--- a/mm/vmscan.c   Thu Feb 21 20:01:02 2013
+++ b/mm/vmscan.c   Sun Mar 10 12:52:10 2013
@@ -2619,7 +2619,6 @@ static unsigned long balance_pgdat(pg_da
bool pgdat_is_balanced = false;
int i;
int end_zone = 0;   /* Inclusive.  0 = ZONE_DMA */
-   unsigned long total_scanned;
struct reclaim_state *reclaim_state = current->reclaim_state;
unsigned long nr_soft_reclaimed;
unsigned long nr_soft_scanned;
@@ -2639,7 +2638,6 @@ static unsigned long balance_pgdat(pg_da
.gfp_mask = sc.gfp_mask,
};
 loop_again:
-   total_scanned = 0;
sc.priority = DEF_PRIORITY;
sc.nr_reclaimed = 0;
sc.may_writepage = !laptop_mode;
@@ -2730,7 +2728,6 @@ loop_again:
order, sc.gfp_mask,
&nr_soft_scanned);
sc.nr_reclaimed += nr_soft_reclaimed;
-   total_scanned += nr_soft_scanned;

/*
 * We put equal pressure on every zone, unless
@@ -2765,7 +2762,6 @@ loop_again:
reclaim_state->reclaimed_slab = 0;
nr_slab = shrink_slab(, sc.nr_scanned, lru_pages);
sc.nr_reclaimed += reclaim_state->reclaimed_slab;
-   total_scanned += sc.nr_scanned;

if (nr_slab == 0 && !zone_reclaimable(zone))
zone->all_unreclaimable = 1;
--

Re: epoll: possible bug from wakeup_source activation

2013-03-09 Thread Eric Dumazet

On Sun, 2013-03-10 at 01:11 +, Eric Wong wrote:
>  
>  static void ep_destroy_wakeup_source(struct epitem *epi)
>  {
> - wakeup_source_unregister(epi->ws);
> - epi->ws = NULL;
> + struct wakeup_source *ws = epi->ws;
> +
> + rcu_assign_pointer(epi->ws, NULL);

There is no need for a memory barrier before setting epi->ws to NULL.

You should use RCU_INIT_POINTER().

> + synchronize_rcu(); /* wait for ep_pm_stay_awake to finish */

Won't this add a significant slowdown?

> + wakeup_source_unregister(ws);
>  }

Please add the following in your .config

CONFIG_SPARSE_RCU_POINTER=y

and try :

make C=2 fs/eventpoll.o

And fix all warnings ;)
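
The point about not needing a barrier when storing NULL can be illustrated in user space with C11 atomics. This is only an analogy under stated assumptions, not kernel RCU: struct wakeup_src, publish_ws(), retract_ws() and ws_is_active() are hypothetical stand-ins, and the acquire load merely plays the role of a reader-side dereference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct wakeup_src {		/* hypothetical stand-in object */
	int active;
};

static _Atomic(struct wakeup_src *) shared_ws;

/*
 * Publishing a non-NULL pointer: release ordering makes the initialised
 * fields visible to readers before the pointer itself.
 */
void publish_ws(struct wakeup_src *ws)
{
	assert(ws != NULL);
	atomic_store_explicit(&shared_ws, ws, memory_order_release);
}

/*
 * Retracting with NULL: there is no payload for readers to observe,
 * so a relaxed store is enough.
 */
void retract_ws(void)
{
	atomic_store_explicit(&shared_ws, (struct wakeup_src *)NULL,
			      memory_order_relaxed);
}

/* Reader side: load the pointer and only then look at the object. */
int ws_is_active(void)
{
	struct wakeup_src *ws =
		atomic_load_explicit(&shared_ws, memory_order_acquire);

	return ws ? ws->active : 0;
}
```

Freeing the object after retract_ws() would still require waiting for readers, which is the cost questioned in the review.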




Fwd: [Nepomuk] Better support for (desktop) file search / indexing applications

2013-03-09 Thread Simeon Bird

Sent again in plain-text - apologies.

-- Forwarded message --
From: Simeon Bird 
Date: 9 March 2013 23:49
Subject: Re: [Nepomuk] Better support for (desktop) file search /
indexing applications
To: Tvrtko Ursulin 
Cc: Martin Steigerwald , Jan Kara ,
Robert Love , linux-kernel@vger.kernel.org, Eric
Paris , Nepomuk Mailing List ,
Linux Filesystem Development Mailinglist
, Eric Paris ,
John McCutchan 

On 12 November 2012 04:10, Tvrtko Ursulin  wrote:
>
> On Saturday 10 November 2012 17:53:45 Martin Steigerwald wrote:
> > Still fanotify needs root access and thus this would need a daemon running
> > as root and some policy kit stuff to access it and in case of mount point
> > watches robust and secure code so that each user may only see his/her own
> > results.
>
> Perhaps then also extend fanotify to support user watches, from the top of my
> head I can't think of a reason it would be very difficult to implement. But it
> has been a few years since I actively worked with that code.
>
> Since you are not the only group having issues with fanotify feature set I can
> see this mini-project (together with extensions from my previous reply) being
> useful. It is also better to evolve it than neglect it due to a few shortcomings and
> then in a few years someone will come up with something completely new and
> then we will have yet another notification system.
>
> Tvrtko

Hi,

We (nepomuk) recently looked at using fanotify, and indeed we would
need user watches, support for moves and recursive directory watches
(we need to support the case where /home is not a separate filesystem)
before it would be useful to us. If you are interested in adding
these, we would be delighted to use nepomuk as a test-case for them.

We were wondering also if it would be possible to extend inotify a
little? Our wishlist is:

1) Recursive folder watches
2) When a file moves, some way to get the destination without watching
the directory it moved to, so moves can be tracked without watching
every file on the system.

I understand that there are reasons of security and performance why
you cannot implement 1), but is 2) possible? Maybe by extending
IN_MOVED_TO, or adding a new event type?

2) is actually in some ways the more severe problem for us. As well as
being an indexer, nepomuk is a system that allows you to store file
metadata such as ratings. When users move the files, they want the
metadata to move too, so we need to track where the file moved, and
thus at the moment we recursively watch everything. This is
particularly problematic with removable media; because a lot of people
will plug in an external drive and then move files onto it, we have to
watch every drive as soon as it is plugged in. If we were able to get
the destination of move events without watching the destination
directory, we could watch only those directories with interesting
metadata in, which would make things a lot easier.

inotify move tracking would also be useful for other things - eg, a
text editor could use inotify to see if a file it has open has moved
and offer to re-open the file in its new location, which is impossible
at the moment.

Since the lack of recursive watches is really a problem because we
have a tendency to run out of watches, it would also really help if
the default limit was a bit higher -  most people seem to have > 8000
folders, but I suspect far fewer have > 32000 (probably excepting
those who are indexing kernel source trees: I have 21000, and half of
that is KDE source).

Would any of this be possible? If you happen to know of a better way
to track moves using existing tools, that would be even better.

Thanks,
Simeon
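
For reference, a minimal sketch of how moves are tracked with inotify today: a rename produces an IN_MOVED_FROM and an IN_MOVED_TO event that share a cookie, but each is only delivered for a directory that is being watched. The function name moved_pair_shares_cookie() and the /tmp path are assumptions for illustration; the sketch is Linux-only and renames within a single watched directory.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Returns 1 if a rename inside one watched directory yields an
 * IN_MOVED_FROM/IN_MOVED_TO pair with the same non-zero cookie,
 * 0 if it does not, and -1 if the setup fails.
 */
int moved_pair_shares_cookie(void)
{
	char dir[] = "/tmp/inotify-move-XXXXXX";
	char from[PATH_MAX], to[PATH_MAX];
	char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
	uint32_t from_cookie = 0, to_cookie = 0;
	ssize_t len;
	char *p;
	int fd, file, result = -1;

	/* One read must be able to hold at least one full event. */
	assert(sizeof(buf) >= sizeof(struct inotify_event) + NAME_MAX + 1);

	if (mkdtemp(dir) == NULL)
		return -1;
	snprintf(from, sizeof(from), "%s/old", dir);
	snprintf(to, sizeof(to), "%s/new", dir);

	fd = inotify_init1(IN_CLOEXEC);
	file = open(from, O_CREAT | O_WRONLY, 0600);
	if (fd < 0 || file < 0)
		goto out;
	if (inotify_add_watch(fd, dir, IN_MOVED_FROM | IN_MOVED_TO) < 0)
		goto out;
	if (rename(from, to) != 0)
		goto out;

	/* Both events are queued by the time rename() has returned. */
	len = read(fd, buf, sizeof(buf));
	if (len <= 0)
		goto out;
	for (p = buf; p < buf + len; ) {
		const struct inotify_event *ev = (const struct inotify_event *)p;

		if (ev->mask & IN_MOVED_FROM)
			from_cookie = ev->cookie;
		if (ev->mask & IN_MOVED_TO)
			to_cookie = ev->cookie;
		p += sizeof(*ev) + ev->len;
	}
	result = (from_cookie != 0 && from_cookie == to_cookie);
out:
	if (file >= 0)
		close(file);
	if (fd >= 0)
		close(fd);
	unlink(from);
	unlink(to);
	rmdir(dir);
	return result;
}
```

If the destination directory were not watched, only the IN_MOVED_FROM half would arrive, which is the limitation described in item 2) of the wishlist.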

Re: xhci module fails when booting in UEFI mode

2013-03-09 Thread Marco

David Härdeman  hardeman.nu> writes:

> 
> On Thu, Jan 10, 2013 at 11:15:56AM +, Frederik Himpe wrote:
> >I've got a HP EliteBook 8470p on which I installed Debian Wheezy in UEFI 
> >mode. With both the 3.2 kernel from Wheezy, as the 3.7.1 kernel from 
> >experimental, xhci fails to initialize and my USB mouse connected to one 
> >of these ports is not recognized at all. The USB3 ports work fine in 
> >Windows.
> >
> >[1.316248] xhci_hcd :00:14.0: can't derive routing for PCI INT A
> >[1.316251] xhci_hcd :00:14.0: PCI INT A: no GSI
> >[1.316253] 
> >[1.316277] xhci_hcd :00:14.0: setting latency timer to 64
> >[1.316281] xhci_hcd :00:14.0: xHCI Host Controller
> >[1.316287] xhci_hcd :00:14.0: new USB bus registered, assigned 
> >bus number 1
> >[1.316393] xhci_hcd :00:14.0: cache line size of 64 is not 
> >supported
> >[1.316395] xhci_hcd :00:14.0: request interrupt 255 failed
> >[1.316447] xhci_hcd :00:14.0: USB bus 1 deregistered
> >[1.316466] xhci_hcd :00:14.0: can't derive routing for PCI INT A
> >[1.316467] xhci_hcd :00:14.0: init :00:14.0 fail, -22
> >[1.316505] xhci_hcd: probe of :00:14.0 failed with error -22
> >
> >Full dmesg, lspci, lsusb and lsmod can be found here:
> >http://artipc10.vub.ac.be/~frederik/uefi-xhci/
> >
> >I found this report about the same problem on a HP Probook system: 
> >https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1072918 
> 
> I have the same problem (HP Elitebook 8570p in my case).
> 
> This thread seems related:
> https://lkml.org/lkml/2012/2/13/453
> 
> No idea what happened to the patches in that thread though...
> 


Workaround on the HP 8570: boot with the kernel parameter

acpi=noirq

and USB 3 will work.


[ 0.00] Linux version 3.5.0-25-generic (buildd@komainu) (gcc version 4.7.2 (Ubuntu/Linaro 4.7.2-2ubuntu1) ) #39-Ubuntu SMP Mon Feb 25 18:26:58 UTC 2013 (Ubuntu 3.5.0-25.39-generic 3.5.7.4)
[ 0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-3.5.0-25-generic.efi.signed root=UUID=f8ccd8b4-9642-48c1-b57d-258ebbb6ee14 ro quiet splash acpi=noirq vt.handoff=7

$ dmesg | grep xhci
[ 1.141906] xhci_hcd :00:14.0: can't find IRQ for PCI INT A; please try using pci=biosirq
[ 1.141922] xhci_hcd :00:14.0: setting latency timer to 64
[ 1.141924] xhci_hcd :00:14.0: xHCI Host Controller
[ 1.141927] xhci_hcd :00:14.0: new USB bus registered, assigned bus number 3
[ 1.142008] xhci_hcd :00:14.0: cache line size of 64 is not supported
[ 1.142048] xhci_hcd :00:14.0: irq 23 for MSI/MSI-X
[ 1.142087] usb usb3: Manufacturer: Linux 3.5.0-25-generic xhci_hcd
[ 1.142137] xHCI xhci_add_endpoint called for root hub
[ 1.142139] xHCI xhci_check_bandwidth called for root hub
[ 1.142196] xhci_hcd :00:14.0: xHCI Host Controller
[ 1.142198] xhci_hcd :00:14.0: new USB bus registered, assigned bus number 4
[ 1.142218] usb usb4: Manufacturer: Linux 3.5.0-25-generic xhci_hcd
[ 1.142260] xHCI xhci_add_endpoint called for root hub
[ 1.142261] xHCI xhci_check_bandwidth called for root hub
[ 15.089213] genirq: Flags mismatch irq 23. 2001 (lis3lv02d) vs. (xhci_hcd)


If you would rather recompile your own kernel, someone has reported that this patch works.
I can wait until it is included in the mainline kernel :D

https://lkml.org/lkml/2013/2/18/115





[PATCH] include/linux/res_counter.h: fixup commit 9259826ccb8165f797e4c2c9d17925b41af5f6ae

2013-03-09 Thread Chen Gang


  res_counter.h still needs to include linux/errno.h for EBUSY.

additional info:
  make command:
make EXTRA_CFLAGS=-W V=1 ARCH=arm allmodconfig
make EXTRA_CFLAGS=-W V=1 ARCH=arm
  error report:
In file included from mm/memcontrol.c:28:0:
include/linux/res_counter.h: in function ‘res_counter_set_limit’:
include/linux/res_counter.h:204:13: error: ‘EBUSY’ is not declared.


Signed-off-by: Chen Gang 
---
 include/linux/res_counter.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index a83a849..f9822dc 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -13,6 +13,7 @@
  * info about what this counter is.
  */
 
+#include <linux/errno.h>
 #include 
 
 /*
-- 
1.7.7.6

[PATCH] ARM: fix the magic numbers used for checking the existence of saved caller registers

2013-03-09 Thread Tao Hou

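When a backtraced function has saved the caller register r10
(e.g., show_mem), c_backtrace does not dump any of that function's
saved caller registers. The instruction check shifts right by 10 bits,
which leaves the r10 bit of the register list in the compared value;
shifting by 11 excludes it.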

Signed-off-by: Tao Hou 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 arch/arm/lib/backtrace.S |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm/lib/backtrace.S b/arch/arm/lib/backtrace.S
index cd07b58..b4ecdd5 100644
--- a/arch/arm/lib/backtrace.S
+++ b/arch/arm/lib/backtrace.S
@@ -69,7 +69,7 @@ for_each_frame:   tst frame, mask @ Check for address exceptions
 
 1003:  ldr r2, [sv_pc, #-4]@ if stmfd sp!, {args} exists,
ldr r3, .Ldsi+4 @ adjust saved 'pc' back one
-   teq r3, r2, lsr #10 @ instruction
+   teq r3, r2, lsr #11 @ instruction
subne   r0, sv_pc, #4   @ allow for mov
subeq   r0, sv_pc, #8   @ allow for mov + stmia
 
@@ -80,14 +80,14 @@ for_each_frame: tst frame, mask @ Check for address exceptions
 
ldr r1, [sv_pc, #-4]@ if stmfd sp!, {args} exists,
ldr r3, .Ldsi+4
-   teq r3, r1, lsr #10
+   teq r3, r1, lsr #11
ldreq   r0, [frame, #-8]@ get sp
subeq   r0, r0, #4  @ point at the last arg
bleq.Ldumpstm   @ dump saved registers
 
 1004:  ldr r1, [sv_pc, #0] @ if stmfd sp!, {..., fp, ip, lr, pc}
ldr r3, .Ldsi   @ instruction exists,
-   teq r3, r1, lsr #10
+   teq r3, r1, lsr #11
subeq   r0, frame, #16
bleq.Ldumpstm   @ dump saved registers
 
@@ -146,7 +146,7 @@ ENDPROC(c_backtrace)
 .Lcr:  .asciz  "\n"
 .Lbad: .asciz  "Backtrace aborted due to bad frame pointer <%p>\n"
.align
-.Ldsi: .word   0xe92dd800 >> 10@ stmfd sp!, {... fp, ip, lr, pc}
-   .word   0xe92d >> 10@ stmfd sp!, {}
+.Ldsi: .word   0xe92dd800 >> 11@ stmfd sp!, {... fp, ip, lr, pc}
+   .word   0xe92d >> 11@ stmfd sp!, {}
 
 #endif
-- 
1.7.9.5
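
The effect of changing the shift from 10 to 11 can be checked with a small user-space model of the comparison. The sketch assumes the usual ARM encoding in which the low 16 bits of a store-multiple instruction are the register list, so bit 10 stands for r10; matches_frame_push() is a hypothetical helper, and only the constant 0xe92dd800 is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* stmfd sp!, {..., fp, ip, lr, pc}: bits 11, 12, 14 and 15 of the list set */
#define STMFD_FP_IP_LR_PC 0xe92dd800u

/*
 * Model of "teq r3, rX, lsr #shift": the instruction matches when its
 * upper bits equal the template shifted by the same amount.
 */
int matches_frame_push(uint32_t insn, unsigned int shift)
{
	assert(shift < 32);
	return (insn >> shift) == (STMFD_FP_IP_LR_PC >> shift);
}
```

An instruction that also saves r10 (0xe92ddc00) fails the comparison with a shift of 10 but passes with a shift of 11, while one that only adds lower registers such as r4 and r5 (0xe92dd830) passes either way.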


Re: [RFC PATCH 0/5] tasklist_lock fairness issues

2013-03-09 Thread Michel Lespinasse

On Sat, Mar 9, 2013 at 10:26 AM, Oleg Nesterov  wrote:
> Hi Michel,
>
> Well. I can't say I really like this. 4/5 itself looks fine, but other
> complications do not look nice, at least in the long term. Imho, imho,
> I can be wrong.
>
> Everyone seems to agree that tasklist should die, this series doesn't
> even try to solve the fundamental problems with this global lock.

I admit that this series does not make any progress towards removing
tasklist_lock call sites, which has been the direction so far.

However, I think that patches 1-3 are not very complex (they add
inline wrapper functions around the tasklist_lock locking calls, but
the compiled code ends up being the same). And by showing us which
tasklist_lock call sites are taken from interrupt contexts, they show
up which places are imposing the current tasklist_lock semantics on
us:

- sending signals, with the send_sigio() / send_sigurg() / kill_pgrp()
call sites;
- posix_cpu_timer_schedule()
- sysrq debugging features

These are only 9 call sites, so they should be more easily attackable
than the full tasklist_lock removal problem. And if we could get these
few eliminated, then it would become trivial to make the remaining
tasklist_lock fair (and maybe even remove the unfair rwlock_t
implementation), which would IMO be a significant step forward.

In other words, patch 5 is, in a way, cheating by trying to get some
tasklist_lock fairness benefits right now. There is a bit of
complexity to it since it replaces tasklist_lock with a pair of locks,
one fair and one unfair. But if we don't like patch 5, patches 1-3 can
still give us a closer intermediate goal of removing much of the
tasklist_lock nastiness by eliminating only 9 of the existing locking
sites.

> However, I agree. tasklist rework can take ages, so probably we need
> to solve at least the write-starvation problem right now. Probably this
> series is correct (I didn't read it except 4/5), but too complex.

Regarding the complexity - do you mean the complexity of implementing
a fair rwlock (I would guess not, since you said 4/5 looks ok) or do
you mean that you don't like having inline wrapper functions around
tasklist_lock locking sites (which is what 1-3 do) ?

> Can't we do some simple hack to avoid the user-triggered livelocks?
> Say, can't we add
>
> // Just for illustration, at least we need CONTENDED/etc
>
> void read_lock_fair(rwlock_t *lock)
> {
> while (arch_writer_active(lock))
> cpu_relax();
> read_lock(lock);
> }
>
> and then turn some of read_lock's (do_wait, setpriority, more) into
> read_lock_fair?
>
> I have to admit, I do not know how rwlock_t actually works, and it seems
> that arch_writer_active() is not trivial even on x86. __write_lock_failed()
> changes the counter back, and afaics you can't detect the writer in the
> window between addl and subl. Perhaps we can change __write_lock_failed()
> to use RW_LOCK_BIAS + HUGE_NUMBER while it spins, but I am not sure...

IMO having a separate fair rw spinlock is less trouble than trying to
build it on top of rwlock_t - especially if we're planning to remove
the tasklist_lock use of rwlock_t, which is currently the main
justification for rwlock_t's existence. I'm hoping we can eventually
get rid of rwlock_t, so I prefer not to build on top of it when we can
easily avoid doing so.
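
To make the fairness idea concrete, here is a minimal userspace model using
C11 atomics. It is neither the fair rw spinlock from patch 4/5 nor the
read_lock_fair() sketched above, and every name in it is made up for
illustration. The only point it shows is that readers look for an announced
writer before (and after) taking the lock, so a steady stream of readers
cannot starve the writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model, not the kernel's rwlock_t. */
struct fair_rwlock {
	atomic_int readers;   /* readers currently inside */
	atomic_bool writer;   /* a writer is waiting for or holding the lock */
};

/* A reader backs off as soon as a writer has announced itself. */
static bool fair_read_trylock(struct fair_rwlock *l)
{
	if (atomic_load(&l->writer))
		return false;
	atomic_fetch_add(&l->readers, 1);
	if (atomic_load(&l->writer)) {
		/* a writer arrived in the window: undo and let it go first */
		atomic_fetch_sub(&l->readers, 1);
		return false;
	}
	return true;
}

static void fair_read_unlock(struct fair_rwlock *l)
{
	int prev = atomic_fetch_sub(&l->readers, 1);

	assert(prev > 0);
	(void)prev;
}

/* Announce the writer; from now on new readers are refused. */
static bool fair_write_announce(struct fair_rwlock *l)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&l->writer, &expected, true);
}

/* The announced writer may proceed once the existing readers have left. */
static bool fair_write_ready(struct fair_rwlock *l)
{
	return atomic_load(&l->readers) == 0;
}

static void fair_write_unlock(struct fair_rwlock *l)
{
	atomic_store(&l->writer, false);
}
```

A real writer would spin on fair_write_ready() with cpu_relax() or
similar; the model leaves that loop out so that the ordering between
readers and writers stays visible.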

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

Re: [PATCH v2 15/20] kexec: fill note buffers by NT_VMCORE_PAD notes in page-size boundary

2013-03-09 Thread Zhang Yanfei

On 2013-03-09 11:46, HATAYAMA Daisuke wrote:
> From: Yanfei Zhang 
> Subject: Re: [PATCH v2 15/20] kexec: fill note buffers by NT_VMCORE_PAD notes in page-size boundary
> Date: Fri, 8 Mar 2013 21:02:50 +0800
> 
>> 2013/3/8 HATAYAMA Daisuke :
>>> From: Zhang Yanfei 
>>> Subject: Re: [PATCH v2 15/20] kexec: fill note buffers by NT_VMCORE_PAD notes in page-size boundary
>>> Date: Thu, 7 Mar 2013 18:11:30 +0800
>>>
>>>> On 2013-03-02 16:37, HATAYAMA Daisuke wrote:
>>>>> Fill both crash_notes and vmcoreinfo_note buffers by NT_VMCORE_PAD
>>>>> note type to make them satisfy mmap()'s page-size boundary
>>>>> requirement.
>>>>>
>>>>> So far, end of note segments has been marked by zero-filled elf
>>>>> header. Instead, this patch writes NT_VMCORE_PAD note in the end of
>>>>> note segments until the offset on page-size boundary.
>>>>
>>>>
>>>> In the code below, it seems that you assign the name "VMCOREINFO" to
>>>> note type NT_VMCORE_PAD, right? This is kind of weird, I think. This
>>>> name has been used for the NT_VMCORE_DEBUGINFO note already. Why not
>>>> something like "VMCOREPAD" or "PAD"?
>>>>
>>>
>>> It looks like you are confusing name and type, or don't know the
>>> difference. The name is a namespace, and within the namespace there are
>>> multiple note types, each of which has its corresponding data. In other
>>> words, the data corresponding to types differ if they belong to different
>>> namespaces, even if the integer values of the types coincide.
>>
>> Yes, I knew this. Just as the spec says, "a program must recognize both
>> the name and the type to recognize a descriptor." But I cannot understand
>> where your word "namespace" came from. I think you complicate simple
>> things here.
>>
>> With only a type, we cannot recognize a descriptor, because "multiple
>> interpretations of a single type value may exist". So we should combine
>> the name and the type together. If both the name and the type of two
>> descriptors are the same, we can say we have two identical descriptors.
>> If either of them (type or name) is different, we say the two
>> descriptors are different and the two notes have different data.
>>
>> If I am wrong, please correct me.
> 
> ??? I think you're saying here the same thing as my explanation above.
> 
> Although the term ''name space'' never occurs in ELF, it seems to me
> standard to represent the same values as different ones by combining
> additional elements as names to them.
> 
> Well, formally, it is represented simply as tuples or a vector
> space. For example, suppose sets S and S' and define a new set S x S' by
> 
>   S x S' := { (s, s') | s in S, s' in S' }
> 
> and equality on S x S' is defined as usual:
> 
>   (s1, s1') == (s2, s2') iff s1 == s2 and s1' == s2'.
> 
> In ELF, S is names and S' is types. There's no other formal meaning
> there.
> 
>>>
>>> The "VMCOREINFO" name represents information exported from
>>> /proc/vmcore that is used in kdump framework. In this sense,
>>> NT_VMCORE_PAD that is specific for /proc/vmcore and kdump framework,
>>> should belong to the "VMCOREINFO" name.
>>
>> I cannot understand the name explanation totally. Does the name really
>> have this meaning? Is there any authentic document? I was always thinking we
>> could feel free to name a name by ourselves!
> 
> Of course, it's optional for you to decide how to name notes within
> the mechanism. But it's important to treat naming for ease of managing
> note types. In addition to the above formal definition, it's important
> to consider what name gives us. It's readability, telling us that note
> types that belong to unique name are treated in common in the sense of
> the name. This is apart from the formal definition above.
> 
> It's certainly possible to distinguish notes by giving names only and
> not giving types. For example, imagine there are new 27 notes and they
> have different names but have 0 as type.
> 
> name          type
> "SOME_NOTE_A" 0
> "SOME_NOTE_B" 0
> ...
> "SOME_NOTE_Z" 0
> 
> Also, for example,
> 
> name        type
> "SOME_NOTE" 0  => NT_SOME_NOTE_A
> "SOME_NOTE" 1  => NT_SOME_NOTE_B
> ...
> "SOME_NOTE" 26 => NT_SOME_NOTE_Z
> 
> For the former case, it *looks to me* that space of type is not used
> effectively and it *looks to me* that space of name is not consumed
> efficiently.
> 
> After all, it amounts to individual preference about naming. I cannot
> say anything more.
> 

I see. I know what you mean now. I was just surprised and puzzled about your
"namespace" concept. 

Other than the name of NT_VMCORE_PAD, no complaints about the code.

Reviewed-by: Zhang Yanfei 
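
The tuple view discussed above, where a note is identified by the pair
(name, type) and two notes match only if both parts match, can be written
down as a small C sketch. The struct and the helper are hypothetical and
are not taken from the patch; the type values used with them are
placeholders, not the real NT_* numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of a note's identity: owner name plus type. */
struct note_id {
	const char *name;
	uint32_t type;
};

/* (s1, s1') == (s2, s2') iff s1 == s2 and s1' == s2' */
static bool note_id_equal(const struct note_id *a, const struct note_id *b)
{
	assert(a->name && b->name);
	return strcmp(a->name, b->name) == 0 && a->type == b->type;
}
```

With this definition, ("VMCOREINFO", 0) and ("SOME_NOTE", 0) are different
notes even though their type integers coincide, and ("VMCOREINFO", 0) and
("VMCOREINFO", 1) are different notes within the same name.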

[PATCH] ARM:net: an issue for k which is u32, never < 0

2013-03-09 Thread Chen Gang


  k is a u32, so "k < 0" is never true; it needs a cast to int for the
  check to catch negative offsets.

Signed-off-by: Chen Gang 
---
 arch/arm/net/bpf_jit_32.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 6828ef6..a0bd8a7 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -576,7 +576,7 @@ load_ind:
/* x = ((*(frame + k)) & 0xf) << 2; */
ctx->seen |= SEEN_X | SEEN_DATA | SEEN_CALL;
/* the interpreter should deal with the negative K */
-   if (k < 0)
+   if ((int)k < 0)
return -1;
/* offset in r1: we might have to take the slow path */
emit_mov_i(r_off, k, ctx);
-- 
1.7.7.6
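
A small standalone sketch of the same check, outside the JIT. The helper
name is made up for illustration. Note that converting an out-of-range
unsigned value to a signed type is implementation-defined in C; the sketch
assumes the usual two's complement reinterpretation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(sizeof(int32_t) == sizeof(uint32_t), "same width expected");

/*
 * Hypothetical helper mirroring the patched test "(int)k < 0".
 * Without the cast, an unsigned k can never compare below zero, so
 * the early return in the JIT would be dead code.
 */
static bool k_is_negative(uint32_t k)
{
	return (int32_t)k < 0;
}
```

An offset stored as (uint32_t)-4 is reported as negative by this helper,
while a small positive offset such as 16 is not.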

[PATCH] ARM:kernel: beautify code, using 'static const' instead of 'const static'

2013-03-09 Thread Chen Gang


  better using 'static const' instead of 'const static'

Signed-off-by: Chen Gang 
---
 arch/arm/kernel/smp_twd.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/arm/kernel/smp_twd.c b/arch/arm/kernel/smp_twd.c
index 3f25650..ef3e499 100644
--- a/arch/arm/kernel/smp_twd.c
+++ b/arch/arm/kernel/smp_twd.c
@@ -362,7 +362,7 @@ int __init twd_local_timer_register(struct twd_local_timer *tlt)
 }
 
 #ifdef CONFIG_OF
-const static struct of_device_id twd_of_match[] __initconst = {
+static const struct of_device_id twd_of_match[] __initconst = {
{ .compatible = "arm,cortex-a9-twd-timer",  },
{ .compatible = "arm,cortex-a5-twd-timer",  },
{ .compatible = "arm,arm11mp-twd-timer",},
-- 
1.7.7.6

[PATCH] ARM:kernel: beautify code, rel->r_offset is __u32 which never < 0

2013-03-09 Thread Chen Gang


  rel->r_offset is __u32 which never < 0.
  it is defined in uapi/linux/elf.h

Signed-off-by: Chen Gang 
---
 arch/arm/kernel/module.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index 1e9be5d..386086e 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -74,7 +74,7 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
sym = ((Elf32_Sym *)symsec->sh_addr) + offset;
symname = strtab + sym->st_name;
 
-   if (rel->r_offset < 0 || rel->r_offset > dstsec->sh_size - sizeof(u32)) {
+   if (rel->r_offset > dstsec->sh_size - sizeof(u32)) {
pr_err("%s: section %u reloc %u sym '%s': out of bounds relocation, offset %d size %u\n",
   module->name, relindex, i, symname,
   rel->r_offset, dstsec->sh_size);
-- 
1.7.7.6
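
As a side observation that goes beyond this patch: the remaining test
subtracts sizeof(u32) from an unsigned size, which would wrap around if a
section were ever shorter than four bytes. The sketch below shows one way
such a bounds check can be written so that it cannot wrap. The helper name
and its parameters are hypothetical; this is not what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if a 32-bit word at 'offset' fits inside a
 * section of 'size' bytes. Checking size first keeps the unsigned
 * subtraction from wrapping when the section is shorter than 4 bytes.
 */
static bool reloc_in_bounds(uint32_t offset, uint32_t size)
{
	static_assert(sizeof(uint32_t) == 4, "u32 is four bytes");

	if (size < sizeof(uint32_t))
		return false;
	return offset <= size - sizeof(uint32_t);
}
```

For a 16-byte section, offset 12 is the last one accepted; for a 2-byte
section every offset is rejected.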

Re: epoll: possible bug from wakeup_source activation

2013-03-09 Thread Eric Wong

Eric Wong  wrote:
> Arve Hjønnevåg  wrote:
> > On Fri, Mar 8, 2013 at 12:49 PM, Eric Wong  wrote:
> > > What happens if ep_modify calls ep_destroy_wakeup_source
> > > while __pm_stay_awake is running on the same epi->ws?
> > 
> > Yes, that looks like a problem. I think calling
> > ep_destroy_wakeup_source with ep->lock held should fix that. It is not
> > clear how useful changing EPOLLWAKEUP in ep_modify is, so
> > alternatively we could remove that feature and instead only allow it
> > to be set in ep_insert.
> 
> ep->lock would work, but ep->lock is already a source of heavy
> contention in my multithreaded+epoll webservers.
> 
> Perhaps RCU can be used?  I've no experience with RCU, but I've been
> meaning to get acquainted with RCU.

The following is lightly tested, at least I haven't gotten lockdep
to complain.

- 8< 
From 2bcd549893aa204bd858cc1500a70f20b28e47c1 Mon Sep 17 00:00:00 2001
From: Eric Wong 
Date: Sun, 10 Mar 2013 01:06:50 +
Subject: [PATCH] epoll: RCU protect wakeup_source in epitem

This prevents wakeup_source destruction when a user hits the
item with EPOLL_CTL_MOD while ep_poll_callback is running.

Signed-off-by: Eric Wong 
---
 fs/eventpoll.c | 31 ---
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 9fec183..e008d54 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -158,7 +158,7 @@ struct epitem {
struct list_head fllink;
 
/* wakeup_source used when EPOLLWAKEUP is set */
-   struct wakeup_source *ws;
+   struct wakeup_source __rcu *ws;
 
/* The structure that describe the interested events and the source fd */
struct epoll_event event;
@@ -536,6 +536,17 @@ static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi)
}
 }
 
+static inline void ep_pm_stay_awake(struct epitem *epi)
+{
+   struct wakeup_source *ws;
+
+   rcu_read_lock();
+   ws = rcu_dereference(epi->ws);
+   if (ws)
+   __pm_stay_awake(ws);
+   rcu_read_unlock();
+}
+
 /**
  * ep_scan_ready_list - Scans the ready list in a way that makes possible for
  *  the scan code, to call f_op->poll(). Also allows for
@@ -984,7 +995,7 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
/* If this file is already in the ready list we exit soon */
if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
-   __pm_stay_awake(epi->ws);
+   ep_pm_stay_awake(epi);
}
 
/*
@@ -1146,6 +1157,7 @@ static int reverse_path_check(void)
 static int ep_create_wakeup_source(struct epitem *epi)
 {
const char *name;
+   struct wakeup_source *ws;
 
if (!epi->ep->ws) {
epi->ep->ws = wakeup_source_register("eventpoll");
@@ -1154,17 +1166,22 @@ static int ep_create_wakeup_source(struct epitem *epi)
}
 
name = epi->ffd.file->f_path.dentry->d_name.name;
-   epi->ws = wakeup_source_register(name);
-   if (!epi->ws)
+   ws = wakeup_source_register(name);
+
+   if (!ws)
return -ENOMEM;
+   rcu_assign_pointer(epi->ws, ws);
 
return 0;
 }
 
 static void ep_destroy_wakeup_source(struct epitem *epi)
 {
-   wakeup_source_unregister(epi->ws);
-   epi->ws = NULL;
+   struct wakeup_source *ws = epi->ws;
+
+   rcu_assign_pointer(epi->ws, NULL);
+   synchronize_rcu(); /* wait for ep_pm_stay_awake to finish */
+   wakeup_source_unregister(ws);
 }
 
 /*
@@ -1199,7 +1216,7 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
if (error)
goto error_create_wakeup_source;
} else {
-   epi->ws = NULL;
+   RCU_INIT_POINTER(epi->ws, NULL);
}
 
/* Initialize the poll table using the queue callback */
-- 
Eric Wong


Re: [PATCH] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-03-09 Thread Prarit Bhargava

On 03/04/2013 08:24 AM, Don Dutile wrote:
> On 03/02/2013 10:59 AM, Andreas Mohr wrote:
>> Hi,
>>
>>> if ((revision == 0x13) && irq_remapping_enabled) {
>>> +pr_warn("WARNING WARNING WARNING WARNING WARNING WARNING\n"
>>> +"This system BIOS has enabled interrupt remapping\n"
>>> +"on a chipset that contains an errata making that\n"
>>> +"feature unstable.  Please reboot with nointremap\n"
>>> +"added to the kernel command line and contact\n"
>>> +"your BIOS vendor for an update");
>>> +}
>>
>> Forgive me, but ISTR that there's a special BIOS firmware quirk bug
>> annotating logger warning message mechanism (have I managed to hit all
>> keywords yet? ;) in the kernel which might be useful in this case.
>>
>>
>> OK, found something (but I don't think it was the mechanism
>> that ISTR - perhaps it got modernized?):
>>
>>
>> include/linux/printk.h:
>>
>> /*
>>   * FW_BUG
>>   * Add this to a message where you are sure the firmware is buggy or behaves
>>   * really stupid or out of spec. Be aware that the responsible BIOS developer
>>   * should be able to fix this issue or at least get a concrete idea of the
>>   * problem by reading your message without the need of looking at the kernel
>>   * code.
>>   *
>>   * Use it for definite and high priority BIOS bugs.
>>   *
>>   * FW_WARN
>>   * Use it for not that clear (e.g. could the kernel messed up things already?)
>>   * and medium priority BIOS bugs.
>>   *
>>   * FW_INFO
>>   * Use this one if you want to tell the user or vendor about something
>>   * suspicious, but generally harmless related to the firmware.
>>   *
>>   * Use it for information or very low priority BIOS bugs.
>>   */
>>
> 
> It is not a firmware/BIOS bug. 

Correct.  This is a hardware bug that *may be* resolved through a BIOS update.
But there is no guarantee that a BIOS update is available.  Labelling it a FW
bug would be a mistake.

> Prarit's comment to annotate it as
> a HW_ERR is more accurate.  A software patch is being tested now
> to see if it can do set-affinity in a manner that avoids this race
> and enables IR to stay on for all these systems.  It requires
> more testing to ensure the logic is valid.  This patch was
> recommended as a necessary short-term fix, and to highlight to
> others this possible state -- which Gerry mentioned he had.

Yup -- as mstowe asked ... should we even consider this patch then, or should we
wait for the possible real fix?

Having said that ... I'm nervous about playing around with the set-affinity path
for this HW problem.  We're basically changing good/reliable code for broken-ass
hardware.  :/  That doesn't seem like a good choice to me.

I can understand if we all feel that the code is broken, or it can be made
better -- but to change it because of bad HW  just doesn't seem like the right
thing to do.

IMO.

P.

Re: memory leak and other oddness in pinctrl-mvebu.c

2013-03-09 Thread Jason Cooper

Added LinusW, Gregory and Ezequiel to the email.  Guys, can you give
this a Tested-by before I apply (or Ack for LinusW)?

thx,

Jason.

On Sat, Mar 09, 2013 at 11:39:31PM +, David Woodhouse wrote:
> On Sat, 2013-03-09 at 17:53 -0500, Jason Cooper wrote:
> > > + if (!nr_funcs)
> > 
> > shouldn't this be:
> > 
> > if (nr_funcs <= 0)
> 
> Hm, no. But the loop should terminate if nr_funcs ever reaches zero,
> otherwise funcs->num_groups will be off the end of the original array:
> 
> diff --git a/drivers/pinctrl/mvebu/pinctrl-mvebu.c b/drivers/pinctrl/mvebu/pinctrl-mvebu.c
> index c689c04..8bbc607 100644
> --- a/drivers/pinctrl/mvebu/pinctrl-mvebu.c
> +++ b/drivers/pinctrl/mvebu/pinctrl-mvebu.c
> @@ -478,16 +478,21 @@ static struct pinctrl_ops mvebu_pinctrl_ops = {
>   .dt_free_map = mvebu_pinctrl_dt_free_map,
>  };
>  
> -static int _add_function(struct mvebu_pinctrl_function *funcs, const char *name)
> +static int _add_function(struct mvebu_pinctrl_function *funcs, int nr_funcs,
> +  const char *name)
>  {
> - while (funcs->num_groups) {
> + while (nr_funcs && funcs->num_groups) {
>   /* function already there */
>   if (strcmp(funcs->name, name) == 0) {
>   funcs->num_groups++;
>   return -EEXIST;
>   }
>   funcs++;
> + nr_funcs--;
>   }
> + if (!nr_funcs)
> + return -EOVERFLOW;
> +
>   funcs->name = name;
>   funcs->num_groups = 1;
>   return 0;
> @@ -501,7 +506,7 @@ static int mvebu_pinctrl_build_functions(struct platform_device *pdev,
>   int n, s;
>  
>   /* we allocate functions for number of pins and hope
> -  * there are less unique functions than pins available */
> +  * there are fewer unique functions than pins available */
>   funcs = devm_kzalloc(&pdev->dev, pctl->desc.npins *
>sizeof(struct mvebu_pinctrl_function), GFP_KERNEL);
>   if (!funcs)
> @@ -510,26 +515,27 @@ static int mvebu_pinctrl_build_functions(struct platform_device *pdev,
>   for (n = 0; n < pctl->num_groups; n++) {
>   struct mvebu_pinctrl_group *grp = &pctl->groups[n];
>   for (s = 0; s < grp->num_settings; s++) {
> + int ret;
> +
>   /* skip unsupported settings on this variant */
>   if (pctl->variant &&
>   !(pctl->variant & grp->settings[s].variant))
>   continue;
>  
>   /* check for unique functions and count groups */
> - if (_add_function(funcs, grp->settings[s].name))
> + ret = _add_function(funcs, pctl->desc.npins,
> + grp->settings[s].name);
> + if (ret == -EOVERFLOW)
> + dev_err(&pdev->dev,
> + "More functions than pins(%d)\n",
> + pctl->desc.npins);
> + if (ret)
>   continue;
>  
>   num++;
>   }
>   }
>  
> - /* with the number of unique functions and it's groups known,
> -reallocate functions and assign group names */
> - funcs = krealloc(funcs, num * sizeof(struct mvebu_pinctrl_function),
> -  GFP_KERNEL);
> - if (!funcs)
> - return -ENOMEM;
> -
>   pctl->num_functions = num;
>   pctl->functions = funcs;
>  
> 
> -- 
> dwmw2
> 



Re: [PATCH v2 6/7] Split remaining calls to call_usermodehelper_fns()

2013-03-09 Thread Lucas De Marchi

On Sat, Mar 9, 2013 at 5:42 PM, Oleg Nesterov  wrote:
> On 03/08, Lucas De Marchi wrote:
>>
>> @@ -571,9 +572,17 @@ void do_coredump(siginfo_t *siginfo)
>>   goto fail_dropcount;
>>   }
>>
>> - retval = call_usermodehelper_fns(helper_argv[0], helper_argv,
>> - NULL, UMH_WAIT_EXEC, umh_pipe_setup,
>> - NULL, );
>> + sub_info = call_usermodehelper_setup(helper_argv[0],
>> + helper_argv, NULL, GFP_KERNEL,
>> + umh_pipe_setup, NULL, );
>> + if (!sub_info) {
>> + printk(KERN_WARNING "%s failed to allocate memory\n",
>> +__func__);
>
> Why?
>
>> + argv_free(helper_argv);
>> + goto fail_dropcount;
>> + }
>> +
>> + retval = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC);
>
> I do not really like another argv_free() here... How about
>
> retval = -ENOMEM;
> info = call_usermodehelper_setup(...);
> if (info)
> retval = call_usermodehelper_fns(...);
> argv_free();
>
> ?

Looks good. I'll prepare a v3

Thanks
Lucas De Marchi
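
The shape suggested above, with one cleanup call shared by the failure and
success paths, can be sketched in plain C. Everything here is a userspace
stand-in: helper_setup(), helper_exec() and free_argv() are hypothetical
placeholders for the usermodehelper and argv_free() calls, not their real
signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct helper_info { int unused; };  /* stand-in for struct subprocess_info */

static int argv_frees;  /* counts cleanup calls so they can be checked */

/* Stand-in for the setup step; 'fail' simulates an allocation failure. */
static struct helper_info *helper_setup(int fail)
{
	return fail ? NULL : malloc(sizeof(struct helper_info));
}

/* Stand-in for the exec step; in this sketch it takes ownership of info. */
static int helper_exec(struct helper_info *info)
{
	assert(info != NULL);
	free(info);
	return 0;
}

static void free_argv(void)
{
	argv_frees++;
}

static int run_helper(int fail_setup)
{
	int retval = -ENOMEM;
	struct helper_info *info = helper_setup(fail_setup);

	if (info)
		retval = helper_exec(info);
	free_argv();  /* single cleanup point for both outcomes */
	return retval;
}
```

Because retval starts out as -ENOMEM, the failure case needs no separate
branch with its own cleanup call.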

Re: [PATCH v2 4/7] KEYS: split call to call_usermodehelper_fns()

2013-03-09 Thread Lucas De Marchi

On Sat, Mar 9, 2013 at 5:25 PM, Oleg Nesterov  wrote:
> On 03/08, Lucas De Marchi wrote:
>>
>>  static int call_usermodehelper_keys(char *path, char **argv, char **envp,
>>   struct key *session_keyring, int wait)
>>  {
>> - return call_usermodehelper_fns(path, argv, envp, wait,
>> -umh_keys_init, umh_keys_cleanup,
>> -key_get(session_keyring));
>> + struct subprocess_info *info;
>> +
>> + info = call_usermodehelper_setup(path, argv, envp, GFP_KERNEL,
>> +   umh_keys_init, umh_keys_cleanup,
>> +   key_get(session_keyring));
>> + if (!info) {
>> + key_put(session_keyring);
>> + return -ENOMEM;
>> + }
>> +
>> + return call_usermodehelper_exec(info, wait);
>
> Looks correct, but can't we simplify it a bit?
>
> info = call_usermodehelper_setup(session_keyring);
> if (!info)
> return -ENOMEM;
>
> key_get(session_keyring);
> return call_usermodehelper_exec(info);

Yep, looks better this way.

Lucas De Marchi

Re: [PATCH v2 3/7] kmod: split call to call_usermodehelper_fns()

2013-03-09 Thread Lucas De Marchi

On Sat, Mar 9, 2013 at 5:23 PM, Oleg Nesterov  wrote:
> On 03/08, Lucas De Marchi wrote:
>>
>> Use call_usermodehelper_setup() + call_usermodehelper_exec() instead of
>> calling call_usermodehelper_fns(). In case the latter returns -ENOMEM
>> the cleanup function may not have been called - in this case we would
>> not free argv and module_name.
>>
>> Signed-off-by: Lucas De Marchi 
>
> Thanks!
>
> looks correct, but...
>
>> @@ -98,8 +100,17 @@ static int call_modprobe(char *module_name, int wait)
>>   argv[3] = module_name;  /* check free_modprobe_argv() */
>>   argv[4] = NULL;
>>
>> - return call_usermodehelper_fns(modprobe_path, argv, envp,
>> - wait | UMH_KILLABLE, NULL, free_modprobe_argv, NULL);
>> + gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL;
>
> Why? it is never called with UMH_NO_WAIT,
>
>> + info = call_usermodehelper_setup(modprobe_path, argv, envp,
>> +  gfp_mask, NULL, free_modprobe_argv,
>
> can't we simply use GFP_KERNEL?

True... I was preserving the previous behavior and didn't check the
callers of this function.
I'm going to send a v3.

Thanks
Lucas De Marchi

Re: [PATCH 17/25] wm97xx: don't use [delayed_]work_pending()

2013-03-09 Thread Dmitry Torokhov

On Mon, Dec 24, 2012 at 04:18:27PM +, Mark Brown wrote:
> On Sun, Dec 23, 2012 at 01:54:50AM -0800, Dmitry Torokhov wrote:
> 
> > This is not 100% equivalent transformation as now we schedule first and
> > disable IRQ later... Anyway, I think the driver should be converted to
> > threaded IRQ instead. Mark, does the patch below make any sense to you?
> 
> I'm a bit nervous about the fact that currently both the pen down IRQ
> and the coordinate read are pushed through a single workqueue so are
> serialised but after your patch they'll be split into the IRQ thread and
> the workqueue.  It *should* be fine but I'd have to sit there and study
> it to convince myself that it's safe.

Mark,

Did you have a chance to review the patch?

Thanks!

-- 
Dmitry

Re: memory leak and other oddness in pinctrl-mvebu.c

2013-03-09 Thread David Woodhouse

On Sat, 2013-03-09 at 17:53 -0500, Jason Cooper wrote:
> > + if (!nr_funcs)
> 
> shouldn't this be:
> 
> if (nr_funcs <= 0)

Hm, no. But the loop should terminate if nr_funcs ever reaches zero,
otherwise funcs->num_groups will be off the end of the original array:

diff --git a/drivers/pinctrl/mvebu/pinctrl-mvebu.c b/drivers/pinctrl/mvebu/pinctrl-mvebu.c
index c689c04..8bbc607 100644
--- a/drivers/pinctrl/mvebu/pinctrl-mvebu.c
+++ b/drivers/pinctrl/mvebu/pinctrl-mvebu.c
@@ -478,16 +478,21 @@ static struct pinctrl_ops mvebu_pinctrl_ops = {
.dt_free_map = mvebu_pinctrl_dt_free_map,
 };
 
-static int _add_function(struct mvebu_pinctrl_function *funcs, const char *name)
+static int _add_function(struct mvebu_pinctrl_function *funcs, int nr_funcs,
+const char *name)
 {
-   while (funcs->num_groups) {
+   while (nr_funcs && funcs->num_groups) {
/* function already there */
if (strcmp(funcs->name, name) == 0) {
funcs->num_groups++;
return -EEXIST;
}
funcs++;
+   nr_funcs--;
}
+   if (!nr_funcs)
+   return -EOVERFLOW;
+
funcs->name = name;
funcs->num_groups = 1;
return 0;
@@ -501,7 +506,7 @@ static int mvebu_pinctrl_build_functions(struct platform_device *pdev,
int n, s;
 
/* we allocate functions for number of pins and hope
-* there are less unique functions than pins available */
+* there are fewer unique functions than pins available */
funcs = devm_kzalloc(&pdev->dev, pctl->desc.npins *
 sizeof(struct mvebu_pinctrl_function), GFP_KERNEL);
if (!funcs)
@@ -510,26 +515,27 @@ static int mvebu_pinctrl_build_functions(struct platform_device *pdev,
for (n = 0; n < pctl->num_groups; n++) {
struct mvebu_pinctrl_group *grp = &pctl->groups[n];
for (s = 0; s < grp->num_settings; s++) {
+   int ret;
+
/* skip unsupported settings on this variant */
if (pctl->variant &&
!(pctl->variant & grp->settings[s].variant))
continue;
 
/* check for unique functions and count groups */
-   if (_add_function(funcs, grp->settings[s].name))
+   ret = _add_function(funcs, pctl->desc.npins,
+   grp->settings[s].name);
+   if (ret == -EOVERFLOW)
+   dev_err(&pdev->dev,
+   "More functions than pins(%d)\n",
+   pctl->desc.npins);
+   if (ret)
continue;
 
num++;
}
}
 
-   /* with the number of unique functions and it's groups known,
-  reallocate functions and assign group names */
-   funcs = krealloc(funcs, num * sizeof(struct mvebu_pinctrl_function),
-GFP_KERNEL);
-   if (!funcs)
-   return -ENOMEM;
-
pctl->num_functions = num;
pctl->functions = funcs;
 

-- 
dwmw2
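
For readers who want to try the bounded lookup outside the driver, here is
a standalone userspace sketch of the patched _add_function() logic. The
struct is a cut-down stand-in for struct mvebu_pinctrl_function and the
names are chosen for illustration only; the loop and the return codes
follow the diff above.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Cut-down stand-in for struct mvebu_pinctrl_function. */
struct func_entry {
	const char *name;
	unsigned int num_groups;  /* 0 marks an unused slot */
};

static int add_function(struct func_entry *funcs, int nr_funcs,
			const char *name)
{
	assert(funcs && name && nr_funcs >= 0);

	/* stop at the first free slot or at the end of the array */
	while (nr_funcs && funcs->num_groups) {
		/* function already there */
		if (strcmp(funcs->name, name) == 0) {
			funcs->num_groups++;
			return -EEXIST;
		}
		funcs++;
		nr_funcs--;
	}
	if (!nr_funcs)
		return -EOVERFLOW;

	funcs->name = name;
	funcs->num_groups = 1;
	return 0;
}
```

With a two-entry array, adding "gpio" twice returns 0 and then -EEXIST,
adding "uart" returns 0, and a third distinct name returns -EOVERFLOW
instead of walking off the end of the array.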




Re: [PATCH 2/2] PCI: fix system hang issue of Marvell SATA host controller

2013-03-09 Thread Myron Stowe

On Sat, Mar 9, 2013 at 7:49 AM, Xiangliang Yu  wrote:
> Hi, Bjorn
>
>>> >> > Fix system hang issue: if first accessed resource file of BAR0 ~
>>> >> > BAR4, system will hang after executing lspci command
>>> >>
>>> >> This needs more explanation.  We've already read the BARs by the
>>> >> time header quirks are run, so apparently it's not just the mere
>>> >> act of accessing a BAR that causes a hang.
>>> >>
>>> >> We need to know exactly what's going on here.  For example, do
>>> >> BARs
>>> >> 0-4 exist?  Does the device decode accesses to the regions
>>> >> described by the BARs?  The PCI core has to know what resources
>>> >> the device uses, so if the device decodes accesses, we can't just
>>> >> throw away the start/end information.
>>> > BARs 0-4 exist and the PCI device has IO space enabled, but when the
>>> > user accesses the region files by udevadm command with info parameter,
>>> > the system will hang.
>>> > Like this: udevadm info --attribute-walk
>>> > --path=/sys/device/pci-device/000:*.
>>> > Because the device is just an AHCI host controller, it doesn't need the
>>> > BAR0 ~ 4 region files.
>>> > Is my explanation ok for the patch?
>>>
>>> No, I still don't know what causes the hang; I only know that udevadm
>>> can trigger it.  I don't want to just paper over the problem until we
>>> know what the root cause is.
>>>
>>> Does "lspci -H1 -vv" also cause a hang?  What about "setpci -s
>>> BASE_ADDRESS_0"?  "setpci -H1 -s BASE_ADDRESS_0"?
>> The commands are ok because the commands can't find the device after 
>> accessing IO port.
>> The root cause is that accessing of IO port will make the chip go bad. So, 
>> the point of the patch is don't export capability of the IO accessing.
>
>>Ah, so the problem is not with accessing the BAR in config space.  The 
>>problem is with accessing the I/O port space mapped by the BAR.  Is that 
>>right?
>
> Yes...
>
>>Does "udevadm info --attribute-walk" really access the device address space 
>>mapped by the BARs?
>
> The older version maybe will access the space, I just got the info from HP. 
> And I simplify the issue by executing following command:
> Cat /sys/devices/pci-device/**/resourceX
>
> I want to set the resources of BAR0 ~ 4 to 0 to avoid the IO accessing by 
> user.

I tried to explain earlier the possible issues with the approach that
is currently being put forth.  Please review that and if you have any
questions ask.

>
> Any question? Thanks!

Googling and looking at the PCI IDs database, I see that the Marvell
9125 device has been around since sometime around 2010 and that there
even seem to be a number of follow-on iterations of the chip (e.g.
9128, 9120, ...).  It seems incredibly unlikely that Marvell made a
device that has been shipping for 2+ years with five I/O BARs that do
not work and that we are only now finding this out.

Am I missing something relevant here?  Can you verify that this device
is indeed not new and has been successfully used in recent
platforms?

You just recently responded with  "... I just got the info from HP.
..." so I'm assuming this is an issue that has just been encountered
on some type of HP system - is this correct?  If so, do you have
access to the system to provide the logs I asked for earlier?  Also,
is there anything special or completely new about this platform that
would explain away the arguments for why this is probably not a
Marvell device issue?

At this point it seems more likely that there is an issue with the
BIOS of the HP system, perhaps a resource duplication/overlap issue
much like I talked about earlier.

To understand the root cause and not just band-aid over a symptom we
need to get the logs asked for from the system.  HP likely needs to
get involved and start participating and providing such at this point.

Again, the logs that would be helpful currently are: A 'dmesg' log
from the system which was booted using both the "debug" and
"ignore_loglevel" boot parameters, a 'lspci -xxx -s' capture
targeting the Marvell 9125 device, and a 'lspci -vv' capture of the
system's entire PCI hierarchy.


Re: Suggestion for fixing the variable length array used in the kernel.

2013-03-09 Thread Christopher Li

On Sat, Mar 9, 2013 at 2:34 PM, Dan Carpenter  wrote:
> The problem is if we go over the 8k stack.  So big arrays are bad.
> Also if the dynamically sized array is inside a loop then normally
> GCC frees it after each iteration, but on some arches it didn't free
> it until after the last iteration.

So it seems you agree that those variable-length array uses would be
better changed to use kmalloc or something similar.

> Btw, Smatch has cross function analysis, and I'd like to use
> it here to figure out the max size for dynamically sized arrays.
> I ran into a problem:
>
> The code looks like this:
> char buf[a];
> The size expression should be an EXPR_SYMBOL, but smatch gets:
> char buf[*a];

Sparse does not currently deal with dynamic array sizes.
It only wants to get a constant value from the array size.

The part that evaluates the array size is actually correct. Remember
that the EXPR_SYMBOL actually contains the *address* of symbol "a", so
the proper sizeof(buf) is actually the content of "*a". That part is fine.

The more complicated case of dynamic array size is using the dynamic
array in a struct:

struct {
char descriptor1[length+1];
char descriptor2[length+1];
} *d;

Then sizeof(*d) needs to be ((*length) + 1 + (*length) + 1), assuming
"length" is a symbol address. The sizeof(struct foo) can be a pretty
complicated expression.
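
To make the runtime-sized sizeof concrete, here is a minimal userspace
sketch (not kernel or sparse code). The helper names vla_sizeof() and
descriptor_pair_size() are made up for illustration. The second one only
mirrors the arithmetic above for a struct holding two char arrays of
length + 1 bytes each, on the assumption that no padding is inserted
between char arrays.

```c
#include <assert.h>
#include <stddef.h>

/*
 * sizeof applied to a variable length array is evaluated at run time:
 * the result is the current value of the size expression, not a
 * compile-time constant.  n must be non-zero, because a zero-length
 * VLA is undefined behaviour.
 */
size_t vla_sizeof(size_t n)
{
	assert(n > 0);
	char buf[n];

	return sizeof(buf);
}

/*
 * Hypothetical helper: the value sizeof(*d) has to evaluate to for a
 * struct holding two char arrays of length + 1 bytes each.
 */
size_t descriptor_pair_size(size_t length)
{
	return (length + 1) + (length + 1);
}
```

A checker that only accepts constant array sizes cannot fold either of
these values, which is why the size expression has to stay a dereference
of the symbol.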

Some USB code uses this kind of dynamic array. However, it does not
allocate the struct on the stack; the struct is allocated via kmalloc
and accessed through a pointer. Sparse still complains about the
variable length array, though.

Let me see if I can make sparse handle dynamic arrays better.

Chris

Re: memory leak and other oddness in pinctrl-mvebu.c

2013-03-09 Thread Jason Cooper

On Sat, Mar 09, 2013 at 07:02:05PM +, David Woodhouse wrote:
> On Sat, 2013-03-09 at 09:49 +0100, Sebastian Hesselbarth wrote:
> > I don't have a strong opinion on that, but I prefer not to have the list
> > statically in the SoC specific drivers. I think counting the number of
> > unique functions for each SoC specific driver once and verify the above
> > heuristic (fewer unique functions than pins) is still valid. Then drop
> > the krealloc and leave the array the way it is allocated on devm_kzalloc.
> 
> Yeah. If you stick a check in the loop and make it warn if it *would*
> have run over the end of the array, that sounds like it ought to be
> fine. Something like this, perhaps? Still untested but otherwise
> Signed-off-by: David Woodhouse 
> 
> diff --git a/drivers/pinctrl/mvebu/pinctrl-mvebu.c 
> b/drivers/pinctrl/mvebu/pinctrl-mvebu.c
> index c689c04..55d55d5 100644
> --- a/drivers/pinctrl/mvebu/pinctrl-mvebu.c
> +++ b/drivers/pinctrl/mvebu/pinctrl-mvebu.c
> @@ -478,7 +478,8 @@ static struct pinctrl_ops mvebu_pinctrl_ops = {
>   .dt_free_map = mvebu_pinctrl_dt_free_map,
>  };
>  
> -static int _add_function(struct mvebu_pinctrl_function *funcs, const char 
> *name)
> +static int _add_function(struct mvebu_pinctrl_function *funcs, int nr_funcs,
> +  const char *name)
>  {
>   while (funcs->num_groups) {
>   /* function already there */
> @@ -487,7 +488,11 @@ static int _add_function(struct mvebu_pinctrl_function 
> *funcs, const char *name)
>   return -EEXIST;
>   }
>   funcs++;
> + nr_funcs--;
>   }
> + if (!nr_funcs)

shouldn't this be:

if (nr_funcs <= 0)

thx,

Jason.

> + return -EOVERFLOW;
> +
>   funcs->name = name;
>   funcs->num_groups = 1;
>   return 0;
> @@ -501,7 +506,7 @@ static int mvebu_pinctrl_build_functions(struct 
> platform_device *pdev,
>   int n, s;
>  
>   /* we allocate functions for number of pins and hope
> -  * there are less unique functions than pins available */
> +  * there are fewer unique functions than pins available */
>   funcs = devm_kzalloc(&pdev->dev, pctl->desc.npins *
>sizeof(struct mvebu_pinctrl_function), GFP_KERNEL);
>   if (!funcs)
> @@ -510,26 +515,27 @@ static int mvebu_pinctrl_build_functions(struct 
> platform_device *pdev,
>   for (n = 0; n < pctl->num_groups; n++) {
>   struct mvebu_pinctrl_group *grp = &pctl->groups[n];
>   for (s = 0; s < grp->num_settings; s++) {
> + int ret;
> +
>   /* skip unsupported settings on this variant */
>   if (pctl->variant &&
>   !(pctl->variant & grp->settings[s].variant))
>   continue;
>  
>   /* check for unique functions and count groups */
> - if (_add_function(funcs, grp->settings[s].name))
> + ret = _add_function(funcs, pctl->desc.npins,
> + grp->settings[s].name);
> + if (ret == -EOVERFLOW)
> + dev_err(&pdev->dev,
> + "More functions than pins(%d)\n",
> + pctl->desc.npins);
> + if (ret)
>   continue;
>  
>   num++;
>   }
>   }
>  
> - /* with the number of unique functions and it's groups known,
> -reallocate functions and assign group names */
> - funcs = krealloc(funcs, num * sizeof(struct mvebu_pinctrl_function),
> -  GFP_KERNEL);
> - if (!funcs)
> - return -ENOMEM;
> -
>   pctl->num_functions = num;
>   pctl->functions = funcs;
>  
> 
> 
> -- 
> dwmw2
> 
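
The bound question above can be looked at in a small userspace sketch.
This is not the driver code: struct func_entry is an assumed stand-in for
struct mvebu_pinctrl_function, add_function_bounded() is a made-up name,
and the "already there" branch is written from what the quoted context
suggests. The difference from the quoted patch is that the index is
compared against nr_funcs before an entry is read, so a full table is
never walked past its last slot and the counter cannot go negative.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Userspace stand-in for struct mvebu_pinctrl_function (assumed layout). */
struct func_entry {
	const char *name;
	unsigned int num_groups;
};

/*
 * Look up name in funcs[0..nr_funcs-1], appending it to the first free
 * slot if it is not there yet.  A slot is free while num_groups is 0.
 * Returns 0 when a new entry was added, -EEXIST when the name was
 * already present (its group count is bumped), and -EOVERFLOW when the
 * table is full.
 */
int add_function_bounded(struct func_entry *funcs, int nr_funcs,
			 const char *name)
{
	int i;

	assert(funcs && name);

	for (i = 0; i < nr_funcs; i++) {
		if (!funcs[i].num_groups) {
			/* first free slot: record the new function */
			funcs[i].name = name;
			funcs[i].num_groups = 1;
			return 0;
		}
		if (!strcmp(funcs[i].name, name)) {
			/* function already there: count one more group */
			funcs[i].num_groups++;
			return -EEXIST;
		}
	}

	return -EOVERFLOW;
}
```

With the bound in the loop condition, the "!nr_funcs" versus
"nr_funcs <= 0" question does not come up, at the cost of indexing instead
of advancing the pointer.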



Re: Suggestion for fixing the variable length array used in the kernel.

2013-03-09 Thread Dan Carpenter

On Sat, Mar 09, 2013 at 10:10:08AM -0800, Christopher Li wrote:
> On Fri, Mar 8, 2013 at 9:39 PM, Dan Carpenter  
> wrote:
> > On Fri, Mar 08, 2013 at 04:29:22PM -0800, Andrew Morton wrote:
> >> Roughly how many instances of this are there kernel-wide?
> >>
> >
> > Around 150 on x86 allmodconfig.  They are pretty well audited.
> 
> I saw 207 on x86-64 allmodconfig. See the list that I attached.
> 

Ah.  Sorry, I'm on my laptop and my sparse output was old.

> Can you elaborate the well audited part? How it was audited?
> 

The problem is if we go over the 8k stack.  So big arrays are bad.
Also if the dynamically sized array is inside a loop then normally
GCC frees it after each iteration, but on some arches it didn't free
it until after the last iteration.

Btw, Smatch has cross function analysis, and I'd like to use
it here to figure out the max size for dynamically sized arrays.
I ran into a problem:

The code looks like this:
char buf[a];
The size expression should be an EXPR_SYMBOL, but smatch gets:
char buf[*a];
Where the size expression is an EXPR_PREOP.

In smatch, how I use sparse is that I call sparse_keep_tokens() and
then I parse the resulting symbol list myself.  The problem is in
examine_array_type() we call get_expression_value() which changes
the symbols from normal symbols to dereferences. The call tree is:
examine_array_type()
  -> get_expression_value()
 -> __get_expression_value()
-> evaluate_expression()
   -> evaluate_symbol_expression() <- change happens here.

I'm not sure what to do.

regards,
dan carpenter


Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-03-09 Thread Myron Stowe

On Sat, Mar 9, 2013 at 1:49 PM, Neil Horman  wrote:
> On Mon, Mar 04, 2013 at 02:04:19PM -0500, Neil Horman wrote:
>> A few years back intel published a spec update:
>> http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf
>>
>> For the 5520 and 5500 chipsets, which contained an erratum (specifically
>> erratum 53) noting that these chipsets can't properly do interrupt
>> remapping, and as a result they recommend that interrupt remapping be
>> disabled in the BIOS.  While many vendors have a BIOS update to do exactly
>> that, not all do, and of course not all users update their BIOS to a level
>> that corrects the problem.  As a result, occasionally interrupts can arrive
>> at a CPU even after affinity for that interrupt has been moved, leading to
>> lost or spurious interrupts, usually characterized by the message:
>> kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)
>>
>> There have been several incidents recently of people seeing this error, and
>> investigation has shown that they have systems whose BIOS level is such
>> that this feature was not properly turned off.  As such, it would be good to
>> give them a reminder that their systems are vulnerable to this problem.
>>
>> Signed-off-by: Neil Horman 
>> CC: Prarit Bhargava 
>> CC: Don Zickus 
>> CC: Don Dutile 
>> CC: Bjorn Helgaas 
>> CC: Asit Mallick 
>> CC: linux-...@vger.kernel.org
>>
> Ping, anyone want to Ack/Nack this?

Don's comment earlier seems to imply that this is a short term fix and
that a more long term fix may be coming soon.  If that is the case
wouldn't we want to wait for the long term fix and just pull that in?

Myron

> Neil

[PATCH] tty: serial: fix typo "ARCH_S5P6450"

2013-03-09 Thread Paul Bolle

This could have been either ARCH_S5P64X0 or CPU_S5P6450. Looking at
commit 2555e663b367b8d555e76023f4de3f6338c28d6c ("ARM: S5P64X0: Add UART
serial support for S5P6450") - which added this typo - makes clear this
should be CPU_S5P6450.

Signed-off-by: Paul Bolle 
---
Bravely untested.

 drivers/tty/serial/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/serial/Kconfig b/drivers/tty/serial/Kconfig
index cf9210d..40ddbe4 100644
--- a/drivers/tty/serial/Kconfig
+++ b/drivers/tty/serial/Kconfig
@@ -218,7 +218,7 @@ config SERIAL_SAMSUNG_UARTS_4
 config SERIAL_SAMSUNG_UARTS
int
depends on PLAT_SAMSUNG
-   default 6 if ARCH_S5P6450
+   default 6 if CPU_S5P6450
default 4 if SERIAL_SAMSUNG_UARTS_4 || CPU_S3C2416
default 3
help
-- 
1.7.11.7


Re: [ibm-acpi-devel] x230: unhandled HKEY event 0x6050

2013-03-09 Thread Borislav Petkov

On Sat, Mar 09, 2013 at 05:49:13PM -0300, Henrique de Moraes Holschuh wrote:
> Hmm, I don't follow.  thinkpad-acpi does run on ring 0, so was that aimed at
> me?

Huh, is "thinkpad-acpi" the name of a part of a house? :-)

> Or are you complaining about the Win-8 bug-to-bug compatibility madness
> Lenovo added to the ACPI firmware?

Of course, I was simply ranting at the fact that hw vendors like lenovo
need to do all kinds of dancing in the BIOS just so they can get their
windoze certification. I know, I know, they don't have a choice and we
have more of those windoze workarounds in the kernel but we shouldn't.

Btw, what your signature says is very fitting to the occasion. :)

> -- 
>   "One disk to rule them all, One disk to find them. One disk to bring
>   them all and in the darkness grind them. In the Land of Redmond
>   where the shadows lie." -- The Silicon Valley Tarot
>   Henrique Holschuh

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH] power: make goldfish option have a dependency on goldfish

2013-03-09 Thread Anton Vorontsov

On Sat, Mar 09, 2013 at 10:49:08AM -0500, Paul Gortmaker wrote:
[...]
> >> I didn't send the patch to akpm, but I did have a chance to ask akpm how
> >> dependencies should be used, and you can see his answer here:
> >>
> >>   https://lkml.org/lkml/2013/3/7/456
> >
> > Thanks for asking! FWIW, I won't be against CONFIG_AKPM. ;-) Something
> > like that will work:
> >
> > depends on GENERIC_HARDIRQS
> > depends on RESTRICT_PLATFORM && GOLDFISH
> >
> > But not that I think we really need this option, though. Whoever wants to
> 
> Of course, it was only meant sarcastically, but the CONFIG_AKPM
> joke wasn't the important part of the email discussion though.
> 
> Above,  you asked "If Andrew agrees [that dependencies should describe
> the hardware/platform] ... I will start applying such patches in future."
> 
> The important bit is Andrew's answer to your question:
> 
>   "...offering useless stuff to non-kernel-developers has downsides
>   with no balancing benefit, and we really should optimise things
>   for our users because there are so many more of them than there
>   are of us."

To me, the important bit was "drives me bats when I merge a patch but have
to jump through a series of hoops ..."

And "we really should optimise things" does not mean that your patch is an
ideal solution. You make users' life a bit easier (maybe), but miserable
for me. As I read Andrew, a better solution has to be implemented (e.g.
CONFIG_AKPM, which I didn't find too sarcastic :-), one which would suit
both users and maintainers.

Btw, I am a Linux user too. And the amount of Kconfig symbols never ever
bothered me personally. Why? Look:

~$ grep CONFIG /boot/config-3.8.0-28-desktop  | wc -l
5274

Do you think that even if you divide this by two it would make any
difference? Not to me. When dealing with these large sets of options, my
strategy of making configs is different (as I explained in previous emails,
it is either '{old,allmod}config'+tuning or 'allnoconfig'+deliberately
enabling options for specific hardware/needs).

Don't take it personally, but for me the patch does not do anything good
at all. And again, feel free to send it to akpm with my nack and a link,
since I might be still wrong.

Cheers,

Anton

Re: Regression: Screen turns off when booting in EFI mode

2013-03-09 Thread Mantas Mikulėnas

On 2013-02-22 03:03, Mantas Mikulėnas wrote:
> On 2013-02-22 01:54, Dave Airlie wrote:
>>>
>>> | radeon :01:00.0: No connectors reported connected with modes
>>> | [drm] Cannot find any crtc or sizes - going 1024x768
>>>
>>> The connector is definitely connected, since this is a laptop with a
>>> built-in screen...
>>>
>>
>> Can you get the log with drm.debug=6 from both boots as well?
> 
> Attached.

The log is also at http://nullroute.eu.org/tmp/2013/dmesg-drm-debug.txt

Not to be annoying, but I hope this can be fixed by 3.9...

(I just tested v3.9-rc1-278-g8343bce, and it still does not detect any
displays. And if I understood it correctly, "nomodeset" is going to go
away?)

-- 
Mantas Mikulėnas 

Re: ACPI undocking on 3.8-rc5 no longer works with Lenovo T61

2013-03-09 Thread Henrique de Moraes Holschuh

On Thu, 07 Mar 2013, Toshi Kani wrote:
> dock.0 on your Lenovo likely shows as "battery_bay", which I think you
> cannot undock regardless of this patch.  Can you try to undock the one

You're supposed to be able to undock any thinkpad battery just fine, and it
used to work on older firmware such as the one in the T61.  If this is
broken, some other bugs may be at play.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

Re: [PATCH 1/1] thinkpad-acpi: kill hotkey_thread_mutex

2013-03-09 Thread Henrique de Moraes Holschuh

On Thu, 07 Mar 2013, Mandeep Singh Baines wrote:
> On Thu, Mar 7, 2013 at 9:53 AM, Oleg Nesterov  wrote:
> > hotkey_kthread() does try_to_freeze() under hotkey_thread_mutex.
> >
> > We can simply kill this mutex, hotkey_poll_stop_sync() does not need
> > to serialize with hotkey_kthread(). When kthread_stop() returns the
> > thread is already dead, it called do_exit()->complete_vfork_done().
> >
> > Reported-by: Artem Savkov 
> > Reported-by: Maciej Rutecki 
> > Signed-off-by: Oleg Nesterov 
> >
> 
> Reviewed-by: Mandeep Singh Baines 

Acked-by: Henrique de Moraes Holschuh 

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

[PATCH] zcache: fix typo "64_BIT"

2013-03-09 Thread Paul Bolle

Signed-off-by: Paul Bolle 
---
Untested! This specifically needs testing on (some) 64 bit platforms,
because this means HAVE_ALIGNED_STRUCT_PAGE will not be selected anymore
on those platforms.

 drivers/staging/zcache/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/zcache/Kconfig b/drivers/staging/zcache/Kconfig
index 7358270..5c37145 100644
--- a/drivers/staging/zcache/Kconfig
+++ b/drivers/staging/zcache/Kconfig
@@ -15,7 +15,7 @@ config RAMSTER
depends on CONFIGFS_FS=y && SYSFS=y && !HIGHMEM && ZCACHE=y
depends on NET
# must ensure struct page is 8-byte aligned
-   select HAVE_ALIGNED_STRUCT_PAGE if !64_BIT
+   select HAVE_ALIGNED_STRUCT_PAGE if !64BIT
default n
help
  RAMster allows RAM on other machines in a cluster to be utilized
-- 
1.7.11.7


Re: [PATCH net-next] drivers:net: Remove unnecessary OOM messages after netdev_alloc_skb

2013-03-09 Thread David Miller

From: Joe Perches 
Date: Fri,  8 Mar 2013 17:03:25 -0800

> Emitting netdev_alloc_skb and netdev_alloc_skb_ip_align OOM
> messages is unnecessary as there is already a dump_stack
> after allocation failures.
> 
> Other trivial changes around these removals:
> 
> Convert a few comparisons of pointer to 0 to !pointer.
> Change flow to remove unnecessary label.
> Remove now unused variable.
> Hoist assignment from if.
> 
> Signed-off-by: Joe Perches 

Applied.

Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-03-09 Thread Neil Horman

On Mon, Mar 04, 2013 at 02:04:19PM -0500, Neil Horman wrote:
> A few years back intel published a spec update:
> http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf
> 
> For the 5520 and 5500 chipsets, which contained an erratum (specifically
> erratum 53) noting that these chipsets can't properly do interrupt
> remapping, and as a result they recommend that interrupt remapping be
> disabled in the BIOS.  While many vendors have a BIOS update to do exactly
> that, not all do, and of course not all users update their BIOS to a level
> that corrects the problem.  As a result, occasionally interrupts can arrive
> at a CPU even after affinity for that interrupt has been moved, leading to
> lost or spurious interrupts, usually characterized by the message:
> kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)
> 
> There have been several incidents recently of people seeing this error, and
> investigation has shown that they have systems whose BIOS level is such
> that this feature was not properly turned off.  As such, it would be good to
> give them a reminder that their systems are vulnerable to this problem.
> 
> Signed-off-by: Neil Horman 
> CC: Prarit Bhargava 
> CC: Don Zickus 
> CC: Don Dutile 
> CC: Bjorn Helgaas 
> CC: Asit Mallick 
> CC: linux-...@vger.kernel.org
> 
Ping, anyone want to Ack/Nack this?
Neil


Re: [ibm-acpi-devel] x230: unhandled HKEY event 0x6050

2013-03-09 Thread Henrique de Moraes Holschuh

On Sat, 09 Mar 2013, Borislav Petkov wrote:
> On Sat, Mar 09, 2013 at 09:06:46AM +0100, Yves-Alexis Perez wrote:
> > Also see https://bugzilla.kernel.org/show_bug.cgi?id=51231 and
> > https://patchwork.kernel.org/patch/2124861/
> 
> Great, another fit-the-hardware-to-the-software idiocy because those
> dimwits which call their pile of stinking software running in ring 0 the
> parts of a house want to control every-f*cking-thing. Sigh :(.
> 
> Thanks for the links Yves-Alexis, certainly a sad read.

Hmm, I don't follow.  thinkpad-acpi does run on ring 0, so was that aimed at
me?  Or are you complaining about the Win-8 bug-to-bug compatibility madness
Lenovo added to the ACPI firmware?

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

Re: [PATCH v2 6/7] Split remaining calls to call_usermodehelper_fns()

2013-03-09 Thread Oleg Nesterov

On 03/08, Lucas De Marchi wrote:
>
> @@ -571,9 +572,17 @@ void do_coredump(siginfo_t *siginfo)
>   goto fail_dropcount;
>   }
>
> - retval = call_usermodehelper_fns(helper_argv[0], helper_argv,
> - NULL, UMH_WAIT_EXEC, umh_pipe_setup,
> - NULL, &cprm);
> + sub_info = call_usermodehelper_setup(helper_argv[0],
> + helper_argv, NULL, GFP_KERNEL,
> + umh_pipe_setup, NULL, &cprm);
> + if (!sub_info) {
> + printk(KERN_WARNING "%s failed to allocate memory\n",
> +__func__);

Why?

> + argv_free(helper_argv);
> + goto fail_dropcount;
> + }
> +
> + retval = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC);

I do not really like another argv_free() here... How about

retval = -ENOMEM;
info = call_usermodehelper_setup(...);
if (info)
retval = call_usermodehelper_fns(...);
argv_free();

?

Oleg.
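
The single-exit shape suggested above can be sketched in userspace C.
Everything here is a hypothetical stand-in: struct helper_info plays the
role of struct subprocess_info, helper_setup() and helper_exec() stand in
for the setup and exec calls, and the simulate_oom and argv_frees
parameters exist only so that the failure path can be exercised and the
number of frees observed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct subprocess_info. */
struct helper_info {
	char **argv;
};

/* Hypothetical setup step: returns NULL on (simulated) allocation failure. */
struct helper_info *helper_setup(char **argv, int simulate_oom)
{
	struct helper_info *info;

	if (simulate_oom)
		return NULL;

	info = calloc(1, sizeof(*info));
	if (info)
		info->argv = argv;
	return info;
}

/* Hypothetical exec step: consumes info and frees it before returning. */
int helper_exec(struct helper_info *info)
{
	free(info);
	return 0;
}

/*
 * Default to -ENOMEM, run the exec step only if setup succeeded, and
 * free argv on one path regardless of which branch was taken.
 */
int run_helper(char **argv, int simulate_oom, int *argv_frees)
{
	struct helper_info *info;
	int retval = -ENOMEM;

	assert(argv && argv_frees);

	info = helper_setup(argv, simulate_oom);
	if (info)
		retval = helper_exec(info);

	free(argv);
	(*argv_frees)++;
	return retval;
}
```

Because the error code is preset, the failure branch needs neither its own
free nor its own goto, which is the point of the suggestion.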


Re: [PATCH v2 4/7] KEYS: split call to call_usermodehelper_fns()

2013-03-09 Thread Oleg Nesterov

On 03/08, Lucas De Marchi wrote:
>
>  static int call_usermodehelper_keys(char *path, char **argv, char **envp,
>   struct key *session_keyring, int wait)
>  {
> - return call_usermodehelper_fns(path, argv, envp, wait,
> -umh_keys_init, umh_keys_cleanup,
> -key_get(session_keyring));
> + struct subprocess_info *info;
> +
> + info = call_usermodehelper_setup(path, argv, envp, GFP_KERNEL,
> +   umh_keys_init, umh_keys_cleanup,
> +   key_get(session_keyring));
> + if (!info) {
> + key_put(session_keyring);
> + return -ENOMEM;
> + }
> +
> + return call_usermodehelper_exec(info, wait);

Looks correct, but can't we simplify it a bit?

info = call_usermodehelper_setup(session_keyring);
if (!info)
	return -ENOMEM;

key_get(session_keyring);
return call_usermodehelper_exec(info);

Oleg.


Re: [PATCH v2 3/7] kmod: split call to call_usermodehelper_fns()

2013-03-09 Thread Oleg Nesterov

On 03/08, Lucas De Marchi wrote:
>
> Use call_usermodehelper_setup() + call_usermodehelper_exec() instead of
> calling call_usermodehelper_fns(). In case the latter returns -ENOMEM
> the cleanup function may not have been called - in this case we would
> not free argv and module_name.
>
> Signed-off-by: Lucas De Marchi 

Thanks!

looks correct, but...

> @@ -98,8 +100,17 @@ static int call_modprobe(char *module_name, int wait)
>   argv[3] = module_name;  /* check free_modprobe_argv() */
>   argv[4] = NULL;
>
> - return call_usermodehelper_fns(modprobe_path, argv, envp,
> - wait | UMH_KILLABLE, NULL, free_modprobe_argv, NULL);
> + gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL;

Why? it is never called with UMH_NO_WAIT,

> + info = call_usermodehelper_setup(modprobe_path, argv, envp,
> +  gfp_mask, NULL, free_modprobe_argv,

can't we simply use GFP_KERNEL?

Oleg.


Re: [PATCH v2 2/7] usermodehelper: Export _exec() and _setup() functions

2013-03-09 Thread Oleg Nesterov

On 03/08, Lucas De Marchi wrote:
>
> call_usermodehelper_setup() + call_usermodehelper_exec() need to be
> called instead of call_usermodehelper_fns() when the cleanup function
> needs to be called even when an ENOMEM error occurs. In this case using
> call_usermodehelper_fns() the user can't distinguish if the cleanup
> function was called or not.
>
> Signed-off-by: Lucas De Marchi 

Reviewed-by: Oleg Nesterov 

> ---
>  include/linux/kmod.h |  8
>  kernel/kmod.c        | 56
>  2 files changed, 31 insertions(+), 33 deletions(-)
> 
> diff --git a/include/linux/kmod.h b/include/linux/kmod.h
> index 5398d58..7eebcf5 100644
> --- a/include/linux/kmod.h
> +++ b/include/linux/kmod.h
> @@ -71,6 +71,14 @@ call_usermodehelper_fns(char *path, char **argv, char 
> **envp, int wait,
>   int (*init)(struct subprocess_info *info, struct cred 
> *new),
>   void (*cleanup)(struct subprocess_info *), void *data);
>  
> +extern struct subprocess_info *
> +call_usermodehelper_setup(char *path, char **argv, char **envp, gfp_t 
> gfp_mask,
> +   int (*init)(struct subprocess_info *info, struct cred 
> *new),
> +   void (*cleanup)(struct subprocess_info *), void 
> *data);
> +
> +extern int
> +call_usermodehelper_exec(struct subprocess_info *info, int wait);
> +
>  static inline int
>  call_usermodehelper(char *path, char **argv, char **envp, int wait)
>  {
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index 56dd349..b39f240 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -502,14 +502,28 @@ static void helper_unlock(void)
>   * @argv: arg vector for process
>   * @envp: environment for process
>   * @gfp_mask: gfp mask for memory allocation
> + * @cleanup: a cleanup function
> + * @init: an init function
> + * @data: arbitrary context sensitive data
>   *
>   * Returns either %NULL on allocation failure, or a subprocess_info
>   * structure.  This should be passed to call_usermodehelper_exec to
>   * exec the process and free the structure.
> + *
> + * The init function is used to customize the helper process prior to
> + * exec.  A non-zero return code causes the process to error out, exit,
> + * and return the failure to the calling process
> + *
> + * The cleanup function is just before ethe subprocess_info is about to
> + * be freed.  This can be used for freeing the argv and envp.  The
> + * Function must be runnable in either a process context or the
> + * context in which call_usermodehelper_exec is called.
>   */
> -static
>  struct subprocess_info *call_usermodehelper_setup(char *path, char **argv,
> -   char **envp, gfp_t gfp_mask)
> + char **envp, gfp_t gfp_mask,
> + int (*init)(struct subprocess_info *info, struct cred *new),
> + void (*cleanup)(struct subprocess_info *info),
> + void *data)
>  {
>   struct subprocess_info *sub_info;
>   sub_info = kzalloc(sizeof(struct subprocess_info), gfp_mask);
> @@ -520,38 +534,15 @@ struct subprocess_info *call_usermodehelper_setup(char 
> *path, char **argv,
>   sub_info->path = path;
>   sub_info->argv = argv;
>   sub_info->envp = envp;
> +
> + sub_info->cleanup = cleanup;
> + sub_info->init = init;
> + sub_info->data = data;
>out:
>   return sub_info;
>  }
>  
>  /**
> - * call_usermodehelper_setfns - set a cleanup/init function
> - * @info: a subprocess_info returned by call_usermodehelper_setup
> - * @cleanup: a cleanup function
> - * @init: an init function
> - * @data: arbitrary context sensitive data
> - *
> - * The init function is used to customize the helper process prior to
> - * exec.  A non-zero return code causes the process to error out, exit,
> - * and return the failure to the calling process
> - *
> - * The cleanup function is just before ethe subprocess_info is about to
> - * be freed.  This can be used for freeing the argv and envp.  The
> - * Function must be runnable in either a process context or the
> - * context in which call_usermodehelper_exec is called.
> - */
> -static
> -void call_usermodehelper_setfns(struct subprocess_info *info,
> - int (*init)(struct subprocess_info *info, struct cred *new),
> - void (*cleanup)(struct subprocess_info *info),
> - void *data)
> -{
> - info->cleanup = cleanup;
> - info->init = init;
> - info->data = data;
> -}
> -
> -/**
>   * call_usermodehelper_exec - start a usermode application
>   * @sub_info: information about the subprocessa
>   * @wait: wait for the application to finish and return status.
> @@ -563,7 +554,6 @@ void call_usermodehelper_setfns(struct subprocess_info *info,
>   * asynchronously if wait is not set, and runs as a child of keventd.
>   * (ie. it runs with full root capabilities).
>   */
> -static
>  int call_usermodehelper_exec(struct
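
For readers following the API change, here is a rough userspace mock of the
two-step flow the kernel-doc comment above describes: callbacks are handed to
setup, and exec runs init, then cleanup, then frees the structure. Everything
prefixed mock_ is invented for illustration, the gfp_mask argument is dropped,
and nothing is actually exec'ed; this is a sketch of the calling pattern only,
not the kernel implementation.

```c
#include <stdlib.h>

struct cred;	/* opaque here; never dereferenced in this mock */

struct mock_subprocess_info {
	char *path;
	char **argv;
	char **envp;
	int (*init)(struct mock_subprocess_info *info, struct cred *new);
	void (*cleanup)(struct mock_subprocess_info *info);
	void *data;
};

/* Mock of the combined setup call: the callbacks are passed up front. */
static struct mock_subprocess_info *
mock_usermodehelper_setup(char *path, char **argv, char **envp,
		int (*init)(struct mock_subprocess_info *info, struct cred *new),
		void (*cleanup)(struct mock_subprocess_info *info),
		void *data)
{
	struct mock_subprocess_info *info = calloc(1, sizeof(*info));

	if (!info)
		return NULL;
	info->path = path;
	info->argv = argv;
	info->envp = envp;
	info->init = init;
	info->cleanup = cleanup;
	info->data = data;
	return info;
}

/* Runs init, would exec on success, then always runs cleanup and frees info. */
static int mock_usermodehelper_exec(struct mock_subprocess_info *info)
{
	int ret = 0;

	if (info->init)
		ret = info->init(info, NULL);
	/* a real helper would exec info->path here when ret == 0 */
	if (info->cleanup)
		info->cleanup(info);
	free(info);
	return ret;
}

/* Example callbacks: count invocations through info->data. */
struct call_counts {
	int init_calls;
	int cleanup_calls;
};

static int count_init(struct mock_subprocess_info *info, struct cred *new)
{
	(void)new;
	((struct call_counts *)info->data)->init_calls++;
	return 0;
}

static void count_cleanup(struct mock_subprocess_info *info)
{
	((struct call_counts *)info->data)->cleanup_calls++;
}

/* Caller side: one setup call, one exec call, no separate setfns step. */
static int run_helper_once(struct call_counts *counts)
{
	static char *argv[] = { "/bin/true", NULL };
	static char *envp[] = { "HOME=/", NULL };
	struct mock_subprocess_info *info;

	info = mock_usermodehelper_setup(argv[0], argv, envp,
					 count_init, count_cleanup, counts);
	if (!info)
		return -1;
	return mock_usermodehelper_exec(info);
}
```

In this mock, a non-zero return from the init callback is simply passed back
to the caller, and cleanup still runs before the structure is freed.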

Re: [PATCH v2 1/7] kernel/sys.c: Use the simpler call_usermodehelper()

2013-03-09 Thread Oleg Nesterov

On 03/08, Lucas De Marchi wrote:
>
> Commit "7ff6764 usermodehelper: cleanup/fix __orderly_poweroff() &&
> argv_free()" simplified __orderly_poweroff() removing the need to use
> call_usermodehelper_fns().
>
> Since we are not passing any callback, it's simpler to use
> call_usermodehelper().
>
> Signed-off-by: Lucas De Marchi 

Reviewed-by: Oleg Nesterov 

> ---
>  kernel/sys.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 81f5644..bd15276 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2203,8 +2203,7 @@ static int __orderly_poweroff(void)
>   return -ENOMEM;
>   }
>  
> - ret = call_usermodehelper_fns(argv[0], argv, envp, UMH_WAIT_EXEC,
> -   NULL, NULL, NULL);
> + ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);
>   argv_free(argv);
>  
>   return ret;
> -- 
> 1.8.1.5
> 


Re: [PATCH 1/1] do not abuse ->cred_guard_mutex in threadgroup_lock()

2013-03-09 Thread Tejun Heo

On Sat, Mar 9, 2013 at 12:01 PM, Oleg Nesterov  wrote:
> threadgroup_lock() takes signal->cred_guard_mutex to ensure that
> thread_group_leader() is stable. This doesn't look nice, the scope
> of this lock in do_execve() is huge.
>
> And as Dave pointed out this can lead to deadlock, we have the
> following dependencies:
>
> do_execve:  cred_guard_mutex -> i_mutex
> cgroup_mount:   i_mutex -> cgroup_mutex
> attach_task_by_pid: cgroup_mutex -> cred_guard_mutex
>
> Change de_thread() to take threadgroup_change_begin() around the
> switch-the-leader code and change threadgroup_lock() to avoid
> ->cred_guard_mutex.
>
> Note that de_thread() can't sleep with ->group_rwsem held, this
> can obviously deadlock with the exiting leader if the writer is
> active, so it does threadgroup_change_end() before schedule().
>
> Reported-by: Dave Jones 
> Signed-off-by: Oleg Nesterov 

Acked-by: Tejun Heo 

Thanks.

-- 
tejun

[PATCH 1/1] do not abuse ->cred_guard_mutex in threadgroup_lock()

2013-03-09 Thread Oleg Nesterov

threadgroup_lock() takes signal->cred_guard_mutex to ensure that
thread_group_leader() is stable. This doesn't look nice, the scope
of this lock in do_execve() is huge.

And as Dave pointed out this can lead to deadlock, we have the
following dependencies:

do_execve:  cred_guard_mutex -> i_mutex
cgroup_mount:   i_mutex -> cgroup_mutex
attach_task_by_pid: cgroup_mutex -> cred_guard_mutex

Change de_thread() to take threadgroup_change_begin() around the
switch-the-leader code and change threadgroup_lock() to avoid
->cred_guard_mutex.

Note that de_thread() can't sleep with ->group_rwsem held, this
can obviously deadlock with the exiting leader if the writer is
active, so it does threadgroup_change_end() before schedule().

Reported-by: Dave Jones 
Signed-off-by: Oleg Nesterov 
---
 fs/exec.c |3 +++
 include/linux/sched.h |   18 --
 2 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 20df02c..bea2f7d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -898,11 +898,13 @@ static int de_thread(struct task_struct *tsk)
 
sig->notify_count = -1; /* for exit_notify() */
for (;;) {
+   threadgroup_change_begin(tsk);
write_lock_irq(&tasklist_lock);
if (likely(leader->exit_state))
break;
__set_current_state(TASK_KILLABLE);
write_unlock_irq(&tasklist_lock);
+   threadgroup_change_end(tsk);
schedule();
if (unlikely(__fatal_signal_pending(tsk)))
goto killed;
@@ -960,6 +962,7 @@ static int de_thread(struct task_struct *tsk)
if (unlikely(leader->ptrace))
__wake_up_parent(leader, leader->parent);
write_unlock_irq(&tasklist_lock);
+   threadgroup_change_end(tsk);
 
release_task(leader);
}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 932a90c..67cfdb5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2486,27 +2486,18 @@ static inline void threadgroup_change_end(struct task_struct *tsk)
  *
  * Lock the threadgroup @tsk belongs to.  No new task is allowed to enter
  * and member tasks aren't allowed to exit (as indicated by PF_EXITING) or
- * perform exec.  This is useful for cases where the threadgroup needs to
- * stay stable across blockable operations.
+ * change ->group_leader/pid.  This is useful for cases where the threadgroup
+ * needs to stay stable across blockable operations.
  *
  * fork and exit paths explicitly call threadgroup_change_{begin|end}() for
  * synchronization.  While held, no new task will be added to threadgroup
  * and no existing live task will have its PF_EXITING set.
  *
- * During exec, a task goes and puts its thread group through unusual
- * changes.  After de-threading, exclusive access is assumed to resources
- * which are usually shared by tasks in the same group - e.g. sighand may
- * be replaced with a new one.  Also, the exec'ing task takes over group
- * leader role including its pid.  Exclude these changes while locked by
- * grabbing cred_guard_mutex which is used to synchronize exec path.
+ * de_thread() does threadgroup_change_{begin|end}() when a non-leader
+ * sub-thread becomes a new leader.
  */
 static inline void threadgroup_lock(struct task_struct *tsk)
 {
-   /*
-* exec uses exit for de-threading nesting group_rwsem inside
-* cred_guard_mutex. Grab cred_guard_mutex first.
-*/
-   mutex_lock(&tsk->signal->cred_guard_mutex);
down_write(&tsk->signal->group_rwsem);
 }
 
@@ -2519,7 +2510,6 @@ static inline void threadgroup_lock(struct task_struct *tsk)
 static inline void threadgroup_unlock(struct task_struct *tsk)
 {
up_write(&tsk->signal->group_rwsem);
-   mutex_unlock(&tsk->signal->cred_guard_mutex);
 }
 #else
 static inline void threadgroup_change_begin(struct task_struct *tsk) {}
-- 
1.5.5.1
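
The three dependencies listed in the changelog above form a cycle, which is
what makes the deadlock possible. A small userspace sketch (not kernel code;
the enum values and helper names are invented for illustration) that checks
the lock-order graph for a cycle, with and without the
cgroup_mutex -> cred_guard_mutex edge that threadgroup_lock() used to create:

```c
#include <stdbool.h>

enum { CRED_GUARD_MUTEX, I_MUTEX, CGROUP_MUTEX, NR_LOCKS };

/* edge[a][b] is true when some path takes lock b while holding lock a. */
struct lock_graph {
	bool edge[NR_LOCKS][NR_LOCKS];
};

/* Depth-first search; state: 0 = unvisited, 1 = on current path, 2 = done. */
static bool visit(const struct lock_graph *g, int node, int *state)
{
	state[node] = 1;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (!g->edge[node][next])
			continue;
		if (state[next] == 1)
			return true;	/* back edge: the ordering is circular */
		if (state[next] == 0 && visit(g, next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

static bool has_cycle(const struct lock_graph *g)
{
	int state[NR_LOCKS] = { 0 };

	for (int n = 0; n < NR_LOCKS; n++)
		if (state[n] == 0 && visit(g, n, state))
			return true;
	return false;
}

/* The three dependencies from the changelog. */
static struct lock_graph graph_before_patch(void)
{
	struct lock_graph g = { 0 };

	g.edge[CRED_GUARD_MUTEX][I_MUTEX] = true;	/* do_execve */
	g.edge[I_MUTEX][CGROUP_MUTEX] = true;		/* cgroup_mount */
	g.edge[CGROUP_MUTEX][CRED_GUARD_MUTEX] = true;	/* attach_task_by_pid */
	return g;
}

/* Same graph once threadgroup_lock() no longer takes cred_guard_mutex. */
static struct lock_graph graph_after_patch(void)
{
	struct lock_graph g = graph_before_patch();

	g.edge[CGROUP_MUTEX][CRED_GUARD_MUTEX] = false;
	return g;
}
```

With all three edges present has_cycle() reports a cycle; dropping the
attach_task_by_pid edge leaves a plain chain. The sketch models only the
three locks named in the changelog and ignores ->group_rwsem.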



[PATCH 0/1] do not abuse ->cred_guard_mutex in threadgroup_lock()

2013-03-09 Thread Oleg Nesterov

On 03/09, Li Zefan wrote:
>
> We don't need both patches for 3.9, so we'll queue Oleg's fix for 3.9 and
> yours for 3.10?

Well. OK, please see 1/1 (compile tested only).

But I still like the patch from Tejun more... Except _perhaps_ my
patch is better for 3.9 just because it is simpler.

And. I still think that probably we can avoid thread_group_leader()
recheck-and-restart logic in attach_task_by_pid(). But even if this
is true (and thus we can revert the changes in de_thread), we should
do this on top of Tejun's patch.

Oleg.


Re: [PATCH] ARM: OMAP: drop "select MACH_NOKIA_RM696"

2013-03-09 Thread Paul Bolle

On Sat, 2013-03-09 at 00:01 +, Russell King - ARM Linux wrote:
> It's actually quite clever.  There's two levels to it.
> 
> The first is that CONFIG_MACH_xxx result in their machine_is_xxx() macros
> being defined to constant zero if the CONFIG option is not enabled.  That
> allows the compiler to throw away code for disabled platforms because
> the expression is always false.
> 
> Otherwise, they end up as (machine_arch_type == MACH_TYPE_xxx).
> 
> The second is the magic which happens when two CONFIG_MACH_xxx are
> selected.  If only one is selected, then machine_arch_type is defined
> to the appropriate MACH_TYPE_xxx.  This means that the above expression
> becomes constant-true, and the conditional is eliminated.
> 
> If more than one is selected, then machine_arch_type is defined to a
> variable which is appropriately set to one of the MACH_TYPE_xxx values.

At boot?

> So, the result is that:
> - de-selected platforms have their if (machine_is_xxx()) { } optimised
>   out of the kernel.
> - for a kernel built targeting one platform, all the
>   if (machine_is_xxx()) tests are optimised away, leaving only the
>   relevant code behind.
> - otherwise, we get the _appropriate_ conditional code for the
>   configuration generated.

Thanks for clarifying this. Quite clever indeed. 
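
For the archive, a userspace sketch of the mechanism as described above. The
board names foo, bar and baz, their MACH_TYPE_ numbers and the CONFIG_ defines
are made up for illustration; only the macro layering follows the description.
Here two boards are "selected", so machine_arch_type ends up as a variable.

```c
/* Hypothetical boards; the IDs are invented for this sketch. */
#define MACH_TYPE_FOO 1
#define MACH_TYPE_BAR 2
#define MACH_TYPE_BAZ 3

/* Pretend the .config enabled FOO and BAR but not BAZ. */
#define CONFIG_MACH_FOO 1
#define CONFIG_MACH_BAR 1

/* Runtime value, consulted only when more than one board is enabled. */
static unsigned int __machine_arch_type = MACH_TYPE_BAR;

#ifdef CONFIG_MACH_FOO
# ifdef machine_arch_type
#  undef machine_arch_type
#  define machine_arch_type __machine_arch_type
# else
#  define machine_arch_type MACH_TYPE_FOO
# endif
# define machine_is_foo() (machine_arch_type == MACH_TYPE_FOO)
#else
# define machine_is_foo() (0)
#endif

#ifdef CONFIG_MACH_BAR
# ifdef machine_arch_type
#  undef machine_arch_type
#  define machine_arch_type __machine_arch_type
# else
#  define machine_arch_type MACH_TYPE_BAR
# endif
# define machine_is_bar() (machine_arch_type == MACH_TYPE_BAR)
#else
# define machine_is_bar() (0)
#endif

#ifdef CONFIG_MACH_BAZ
# ifdef machine_arch_type
#  undef machine_arch_type
#  define machine_arch_type __machine_arch_type
# else
#  define machine_arch_type MACH_TYPE_BAZ
# endif
# define machine_is_baz() (machine_arch_type == MACH_TYPE_BAZ)
#else
# define machine_is_baz() (0)
#endif

static const char *board_name(void)
{
	if (machine_is_foo())
		return "foo";
	if (machine_is_bar())
		return "bar";
	if (machine_is_baz())	/* constant 0: the compiler can drop this branch */
		return "baz";
	return "unknown";
}
```

Removing one of the two CONFIG_ defines turns machine_arch_type into a
constant, so the remaining machine_is_xxx() test folds to constant-true.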

> However, going back to that MACH_NOKIA_RM696.  If there exists only a
> select of this symbol and no "config MACH_NOKIA_RM696" entry, then the
> symbol will never be generated in the output .config file.
> 
>[...]
> 
> My conclusion is... it's a mess.

That mess can only be fully cleaned up if the code for the RM-696, which
is now maintained in some repository unknown to me, gets merged into
mainline, can't it?

In the meantime, how would you prefer I solve the (trivial) issue of a
useless select for MACH_NOKIA_RM696? Drop that select or add an (equally
useless) config entry for MACH_NOKIA_RM696? Or should I try to ignore it
for the time being?


Paul Bolle


Re: memory leak and other oddness in pinctrl-mvebu.c

2013-03-09 Thread Sebastian Hesselbarth

David,

I will not be able to test before mid-week at the earliest. I added Andrew Lunn to
the list. He and Thomas can test your patch for Kirkwood and Armada XP/370
respectively. I will test on Dove asap.

Sebastian

On Sat, Mar 9, 2013 at 8:02 PM, David Woodhouse  wrote:
> On Sat, 2013-03-09 at 09:49 +0100, Sebastian Hesselbarth wrote:
>> I don't have a strong opinion on that, but I prefer not to have the list
>> statically in the SoC specific drivers. I think counting the number of
>> unique functions for each SoC specific driver once and verify the above
>> heuristic (fewer unique functions than pins) is still valid. Then drop
>> the krealloc and leave the array the way it is allocated on devm_kzalloc.
>
> Yeah. If you stick a check in the loop and make it warn if it *would*
> have run over the end of the array, that sounds like it ought to be
> fine. Something like this, perhaps? Still untested but otherwise
> Signed-off-by: David Woodhouse 
>
> diff --git a/drivers/pinctrl/mvebu/pinctrl-mvebu.c b/drivers/pinctrl/mvebu/pinctrl-mvebu.c
> index c689c04..55d55d5 100644
> --- a/drivers/pinctrl/mvebu/pinctrl-mvebu.c
> +++ b/drivers/pinctrl/mvebu/pinctrl-mvebu.c
> @@ -478,7 +478,8 @@ static struct pinctrl_ops mvebu_pinctrl_ops = {
> .dt_free_map = mvebu_pinctrl_dt_free_map,
>  };
>
> -static int _add_function(struct mvebu_pinctrl_function *funcs, const char *name)
> +static int _add_function(struct mvebu_pinctrl_function *funcs, int nr_funcs,
> +const char *name)
>  {
> while (funcs->num_groups) {
> /* function already there */
> @@ -487,7 +488,11 @@ static int _add_function(struct mvebu_pinctrl_function *funcs, const char *name)
> return -EEXIST;
> }
> funcs++;
> +   nr_funcs--;
> }
> +   if (!nr_funcs)
> +   return -EOVERFLOW;
> +
> funcs->name = name;
> funcs->num_groups = 1;
> return 0;
> @@ -501,7 +506,7 @@ static int mvebu_pinctrl_build_functions(struct platform_device *pdev,
> int n, s;
>
> /* we allocate functions for number of pins and hope
> -* there are less unique functions than pins available */
> +* there are fewer unique functions than pins available */
> funcs = devm_kzalloc(&pdev->dev, pctl->desc.npins *
>  sizeof(struct mvebu_pinctrl_function), GFP_KERNEL);
> if (!funcs)
> @@ -510,26 +515,27 @@ static int mvebu_pinctrl_build_functions(struct platform_device *pdev,
> for (n = 0; n < pctl->num_groups; n++) {
> struct mvebu_pinctrl_group *grp = &pctl->groups[n];
> for (s = 0; s < grp->num_settings; s++) {
> +   int ret;
> +
> /* skip unsupported settings on this variant */
> if (pctl->variant &&
> !(pctl->variant & grp->settings[s].variant))
> continue;
>
> /* check for unique functions and count groups */
> -   if (_add_function(funcs, grp->settings[s].name))
> +   ret = _add_function(funcs, pctl->desc.npins,
> +   grp->settings[s].name);
> +   if (ret == -EOVERFLOW)
> +   dev_err(&pdev->dev,
> +   "More functions than pins(%d)\n",
> +   pctl->desc.npins);
> +   if (ret)
> continue;
>
> num++;
> }
> }
>
> -   /* with the number of unique functions and it's groups known,
> -  reallocate functions and assign group names */
> -   funcs = krealloc(funcs, num * sizeof(struct mvebu_pinctrl_function),
> -GFP_KERNEL);
> -   if (!funcs)
> -   return -ENOMEM;
> -
> pctl->num_functions = num;
> pctl->functions = funcs;
>
>
>
> --
> dwmw2
>
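
A standalone userspace sketch of the bounded lookup discussed above, for
anyone who wants to try the overflow case without hardware. The struct and
function names are simplified stand-ins, not the driver's. Note one deliberate
difference from the quoted patch: this variant tests the remaining slot count
before reading the slot, so a completely full table is never read past its end.

```c
#include <errno.h>
#include <string.h>

struct pin_function {
	const char *name;
	int num_groups;	/* 0 marks an unused slot */
};

/*
 * Returns 0 when a new function was added, -EEXIST when the name was
 * already present (its group count is bumped), and -EOVERFLOW when all
 * nr_funcs slots are taken by other names.
 */
static int add_function(struct pin_function *funcs, int nr_funcs,
			const char *name)
{
	/* Check the remaining count first, then look at the slot. */
	while (nr_funcs && funcs->num_groups) {
		if (strcmp(funcs->name, name) == 0) {
			funcs->num_groups++;
			return -EEXIST;
		}
		funcs++;
		nr_funcs--;
	}
	if (!nr_funcs)
		return -EOVERFLOW;

	funcs->name = name;
	funcs->num_groups = 1;
	return 0;
}
```

With a two-entry table, adding a third distinct name returns -EOVERFLOW, while
repeating an existing name still returns -EEXIST and bumps its group count.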

linux-3.9.0-rc1+: Output from "make kernelrelease" contains incorrect data

2013-03-09 Thread Chris Clayton

In Linus' current tree, the first time the command "make kernelrelease" 
is run after building a kernel, the output contains some unwanted text. 
Subsequent uses of the command produce the expected output. This appears 
to be a regression - 3.8.2 does not have this problem.


This is easily demonstrated from the command line by the following:


...
System is 2311 kB
CRC a4e38b86
Kernel: arch/x86/boot/bzImage is ready  (#186)
$ make kernelrelease
scripts/kconfig/conf --silentoldconfig Kconfig 3.9.0-rc1+
$ make kernelrelease
3.9.0-rc1+

Happy to test the fix.

Chris




Re: [PATCH 1/3] coredump: introduce dump_interrupted()

2013-03-09 Thread Oleg Nesterov

On 03/08, Andrew Morton wrote:
>
> On Fri, 8 Mar 2013 18:59:15 +0100 Oleg Nesterov  wrote:
>
> > Change dump_write(), dump_seek() and do_coredump() to check
> > signal_pending() and abort if it is true.
>
> hm, why.

Firstly, we need these changes to ensure that the coredump won't delay
suspend, and to ensure it reacts to SIGKILL "quickly enough". A core
dump can take a lot of time.

> I think we're missing some context here - this is to support freezing,
> yes?

No. This is to document that

- currently we do not support freezing

- why we do not support it, and what we should do to support it
  (the comments in dump_interrupted/wait_for_dump_helpers)

If do_coredump() "races" with suspend/etc we simply abort, hopefully
this is fine in practice. And even if we decide to change this later,
I hope this series can be counted as a preparation.

> An example of why this is needed: the dump_interrupted() check which
> was added to dump_seek() is just weird.  An lseek is instantaneous,
   ^
Oh, I simply do not know, this can depend on the filesystem?

> And if the file doesn't support lseek (do such files exist?  should we
> be returning 0 instead of -ENOMEM?),

(can't comment, I do not know)

> we just sit there in a loop
> extending the file with write().  This can take *ages*, but this part
> of dump_seek() *didn't* get the signal check!

The loop does dump_write() which checks dump_interrupted() at the start.
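
To illustrate, a userspace model of that fallback path (not the kernel code;
the model_ names, the dump_model struct and the countdown that simulates a
pending signal are all invented here). Because every chunk goes through the
write helper, and the helper checks for interruption first, the loop stops
at the first chunk after the signal arrives:

```c
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_SIZE 4096	/* stands in for PAGE_SIZE */

struct dump_model {
	size_t bytes_written;
	int writes_until_signal;	/* writes allowed before a signal "arrives"; negative = never */
};

static bool model_dump_interrupted(const struct dump_model *d)
{
	return d->writes_until_signal == 0;
}

/* Model convention: returns 1 on success, 0 on failure. */
static int model_dump_write(struct dump_model *d, size_t nr)
{
	if (model_dump_interrupted(d))
		return 0;	/* checked at the start of every write */
	d->bytes_written += nr;
	if (d->writes_until_signal > 0)
		d->writes_until_signal--;
	return 1;
}

/* "Seek" forward by writing zero-filled chunks, one chunk at a time. */
static int model_dump_seek_by_writing(struct dump_model *d, size_t off)
{
	while (off > 0) {
		size_t n = off > CHUNK_SIZE ? CHUNK_SIZE : off;

		if (!model_dump_write(d, n))
			return 0;
		off -= n;
	}
	return 1;
}
```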

> > Ideally it should do try_to_freeze() but then we need the unpleasant
> > changes in dump_write() and wait_for_dump_helpers(). So far we simply
> > accept the fact that the freezer can truncate a core-dump but at least
> > you can reliably suspend.
>
> OK, so there is some connection between this and suspending.  Details,
> please...

It is not trivial to change dump_write() to restart if f_op->write()
fails because of freezing(). We need to handle the short writes, we need
to clear TIF_SIGPENDING (and we can't rely on recalc_sigpending() unless
we change it to check PF_DUMPCORE), and somehow we need to avoid the
races with freeze_task + __thaw_task.

Everything looks possible but imho it isn't worth the trouble; a coredump
truncated by the freezer is tolerable, I hope. And again, even if we decide
to "fix" this problem we can do this on top of these changes.

Oleg.


Re: BUG: soft lockup on all kernels after 2.6.3x

2013-03-09 Thread Alexey Vlasov

On Thu, Mar 07, 2013 at 08:57:28AM -0800, Eric Dumazet wrote:
> 
> Well, remove all alien patches and try to reproduce the bug with a
> pristine linux kernel.

I wrote to Spender (the grsec developer) and he confirmed that the
problem may be in the grsec patch.

Thank you greatly for your answers!

-- 
BRGDS. Alexey Vlasov.
