Re: [PATCH v2] memory-hotplug: Fix kernel warning during memory hotplug on ppc64
On Tue, 3 Nov 2015 11:21:59 -0600 John Allen <jal...@linux.vnet.ibm.com> wrote: > This patch fixes a bug where a kernel warning is triggered when performing > a memory hotplug on ppc64. This warning may also occur on any architecture > that has multiple sections per memory block. > > [ 78.300767] [ cut here ] > [ 78.300768] WARNING: at ../drivers/base/memory.c:210 > [ 78.300769] Modules linked in: rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag > inet_diag unix_diag af_packet_diag netlink_diag af_packet xfs libcrc32c > ibmveth(X) rtc_generic btrfs xor raid6_pq xts gf128mul dm_crypt sd_mod sr_mod > cdrom crc_t10dif ibmvscsi(X) scsi_transport_srp scsi_tgt dm_mod sg scsi_mod > autofs4 > [ 78.300789] Supported: Yes, External > [ 78.300791] CPU: 1 PID: 3090 Comm: systemd-udevd Tainted: G X > 3.12.45-1-default #1 > [ 78.300793] task: c004d7d1d970 ti: c004d7b9 task.ti: > c004d7b9 > [ 78.300794] NIP: c04fcff8 LR: c04fda84 CTR: > > [ 78.300795] REGS: c004d7b93930 TRAP: 0700 Tainted: G X > (3.12.45-1-default) > [ 78.300796] MSR: 80029033 <SF,EE,ME,IR,DR,RI,LE> CR: 24088848 > XER: > [ 78.300800] CFAR: c04fcf98 SOFTE: 1 > GPR00: 0537 c004d7b93bb0 c0e7f200 00053000 > GPR04: 1000 0001 c0e0f200 > GPR08: 0001 0537 014dc000 > GPR12: 00054000 ce7f0900 10041040 > GPR16: 0100206f0010 1003ff78 1006c824 100410b0 > GPR20: 1003ff90 1006c00c 01002073cd20 0100206f0760 > GPR24: 0100206f85a0 c076d950 c004ef7c95e0 c004d7b93e00 > GPR28: c004de601738 0001 c1218f80 003f > [ 78.300818] NIP [c04fcff8] memory_block_action+0x258/0x2e0 > [ 78.300820] LR [c04fda84] memory_subsys_online+0x54/0x100 > [ 78.300821] Call Trace: > [ 78.300822] [c004d7b93bb0] [c9071ce0] 0xc9071ce0 > (unreliable) > [ 78.300824] [c004d7b93c40] [c04fda84] > memory_subsys_online+0x54/0x100 > [ 78.300826] [c004d7b93c70] [c04df784] device_online+0xb4/0x120 > [ 78.300828] [c004d7b93cb0] [c04fd738] > store_mem_state+0x88/0x220 > [ 78.300830] [c004d7b93cf0] [c04db448] dev_attr_store+0x68/0xa0 > [ 78.300833] [c004d7b93d30] [c031f938] > 
sysfs_write_file+0xf8/0x1d0 > [ 78.300835] [c004d7b93d90] [c027d29c] vfs_write+0xec/0x250 > [ 78.300837] [c004d7b93de0] [c027dfdc] SyS_write+0x6c/0xf0 > [ 78.300839] [c004d7b93e30] [c000a17c] syscall_exit+0x0/0x7c > [ 78.300840] Instruction dump: > [ 78.300841] 780a0560 79482ea4 7ce94214 2fa7 41de0014 7d09402a 396b4000 > 7907ffe3 > [ 78.300844] 4082ff54 3cc2fff9 8926b83a 69290001 <0b09> 2fa9 > 40de006c 3860fff0 > [ 78.300847] ---[ end trace dfec8da06ebbc762 ]--- > > The warning is triggered because there is a udev rule that automatically > tries to online memory after it has been added. The udev rule varies from > distro to distro, but will generally look something like: > > SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", > ATTR{state}="online" > > On any architecture that uses memory_probe_store to reserve memory, > this can interrupt the memory reservation process. This patch modifies > memory_probe_store to take the hotplug sysfs lock to prevent the online > of added memory before the completion of the probe. > > Signed-off-by: John Allen <jal...@linux.vnet.ibm.com> > --- Looks good to me. Reviewed-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com> Thanks, Yasuaki Ishimatsu > v2: Move call to unlock_device_hotplug under "out" label > > diff --git a/drivers/base/memory.c b/drivers/base/memory.c > index bece691..7c50415 100644 > --- a/drivers/base/memory.c > +++ b/drivers/base/memory.c > @@ -422,6 +422,10 @@ memory_probe_store(struct device *dev, struct > device_attribute *attr, > if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1)) > return -EINVAL; > > + ret = lock_device_hotplug_sysfs(); > + if (ret) > + return ret; > + > for (i = 0; i < sections_per_block; i++) { > nid = memory_add_physaddr_to_nid(phys_addr); > ret = add_memory(nid, phys_addr, > @@ -434,6 +438,7 @@ memory_probe_store(struct device *dev, struct > device_attrib
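Editor's illustration, not part of the patch: the race described above can be sketched in user space. This is a minimal, single-threaded model under stated assumptions. The names try_lock_hotplug(), try_online_block() and probe_block() are hypothetical, the "udev" online attempt is simulated by a call made right after the first section is added, and a busy lock is reported as -EBUSY (the retry behaviour of the real lock_device_hotplug_sysfs() is not modelled).

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device hotplug lock. */
static atomic_flag hotplug_lock = ATOMIC_FLAG_INIT;

/* Hypothetical geometry: one memory block is made of four sections. */
static const int sections_per_block = 4;
static int sections_added;

/* Try-lock semantics: 0 on success, -EBUSY if someone else holds the lock. */
static int try_lock_hotplug(void)
{
	return atomic_flag_test_and_set(&hotplug_lock) ? -EBUSY : 0;
}

static void unlock_hotplug(void)
{
	atomic_flag_clear(&hotplug_lock);
}

/* Models the online request that a udev rule issues for a new block. */
static int try_online_block(void)
{
	int ret = try_lock_hotplug();

	if (ret)
		return ret;
	/* A block that is only partially added must not be onlined. */
	ret = (sections_added == sections_per_block) ? 0 : -EINVAL;
	unlock_hotplug();
	return ret;
}

/*
 * Models the probe loop. With take_lock == false the simulated online
 * request runs against a half-built block; with take_lock == true it is
 * turned away until the probe has finished.
 */
static int probe_block(bool take_lock, int *midway_online)
{
	int ret;

	assert(midway_online != NULL);
	if (take_lock) {
		ret = try_lock_hotplug();
		if (ret)
			return ret;
	}

	sections_added = 0;
	for (int i = 0; i < sections_per_block; i++) {
		sections_added++;
		if (i == 0)	/* simulated udev event after the first section */
			*midway_online = try_online_block();
	}

	if (take_lock)
		unlock_hotplug();
	return 0;
}
```

In this model, an unlocked probe lets the simulated online request observe a block with one of four sections, while a locked probe makes that request fail until the lock is released and the block is complete.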
Re: [PATCH] slab: Fix nodeid bounds check for non-contiguous node IDs
(2014/12/01 7:16), Paul Mackerras wrote:
> The bounds check for nodeid in cache_alloc_node gives false
> positives on machines where the node IDs are not contiguous, leading
> to a panic at boot time. For example, on a POWER8 machine the node
> IDs are typically 0, 1, 16 and 17. This means that num_online_nodes()
> returns 4, so when cache_alloc_node is called with nodeid = 16 the
> VM_BUG_ON triggers.

Do you have the call trace? If you have it, please add it in the description.

> To fix this, we instead compare the nodeid with MAX_NUMNODES, and
> additionally make sure it isn't negative (since nodeid is an int).
> The check is there mainly to protect the array dereference in the
> get_node() call in the next line, and the array being dereferenced is
> of size MAX_NUMNODES. If the nodeid is in range but invalid, the
> BUG_ON in the next line will catch that.
>
> Signed-off-by: Paul Mackerras <pau...@samba.org>

Do you need to backport it into -stable kernels?

> ---
> diff --git a/mm/slab.c b/mm/slab.c
> index eb2b2ea..f34e053 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
>  	void *obj;
>  	int x;
>
> -	VM_BUG_ON(nodeid > num_online_nodes());
> +	VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES);

How about using:
	VM_BUG_ON(!node_online(nodeid));

When allocating the memory, the node of the memory being allocated must be
online. But your code cannot check the condition.

Thanks,
Yasuaki Ishimatsu

>  	n = get_node(cachep, nodeid);
>  	BUG_ON(!n);
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"d...@kvack.org"> em...@kvack.org </a>

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] slab: Fix nodeid bounds check for non-contiguous node IDs
(2014/12/01 9:42), Paul Mackerras wrote:
> On Mon, Dec 01, 2014 at 09:14:40AM +0900, Yasuaki Ishimatsu wrote:
>> (2014/12/01 7:16), Paul Mackerras wrote:
>>> The bounds check for nodeid in cache_alloc_node gives false
>>> positives on machines where the node IDs are not contiguous, leading
>>> to a panic at boot time. For example, on a POWER8 machine the node
>>> IDs are typically 0, 1, 16 and 17. This means that num_online_nodes()
>>> returns 4, so when cache_alloc_node is called with nodeid = 16 the
>>> VM_BUG_ON triggers.
>>
>> Do you have the call trace? If you have it, please add it in the description.
>
> I can get it easily enough.
>
>>> To fix this, we instead compare the nodeid with MAX_NUMNODES, and
>>> additionally make sure it isn't negative (since nodeid is an int).
>>> The check is there mainly to protect the array dereference in the
>>> get_node() call in the next line, and the array being dereferenced is
>>> of size MAX_NUMNODES. If the nodeid is in range but invalid, the
>>> BUG_ON in the next line will catch that.
>>>
>>> Signed-off-by: Paul Mackerras <pau...@samba.org>
>>
>> Do you need to backport it into -stable kernels?
>
> It does need to go to stable, yes, for 3.10 and later.
>
>>> ---
>>> diff --git a/mm/slab.c b/mm/slab.c
>>> index eb2b2ea..f34e053 100644
>>> --- a/mm/slab.c
>>> +++ b/mm/slab.c
>>> @@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
>>>  	void *obj;
>>>  	int x;
>>>
>>> -	VM_BUG_ON(nodeid > num_online_nodes());
>>> +	VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES);
>>
>> How about use:
>> 	VM_BUG_ON(!node_online(nodeid));
>
> That would not be better, since node_online() doesn't bounds-check its
> argument.

Ah. You are right.

>> When allocating the memory, the node of the memory being allocated must be
>> online. But your code cannot check the condition.
>
> The following two lines:
>
>>>  	n = get_node(cachep, nodeid);
>>>  	BUG_ON(!n);
>
> effectively check that condition already, as I tried to explain in the
> commit message.

O.K. I understood.

Thanks,
Yasuaki Ishimatsu

> Regards,
> Paul.
Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
(2014/12/01 13:28), Paul Mackerras wrote:
> The bounds check for nodeid in cache_alloc_node gives false
> positives on machines where the node IDs are not contiguous, leading
> to a panic at boot time. For example, on a POWER8 machine the node
> IDs are typically 0, 1, 16 and 17. This means that num_online_nodes()
> returns 4, so when cache_alloc_node is called with nodeid = 16 the
> VM_BUG_ON triggers, like this:
>
> kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079!
> Oops: Exception in kernel mode, sig: 5 [#1]
> SMP NR_CPUS=1024 NUMA PowerNV
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-rc5-kvm+ #17
> task: c13ba230 ti: c1494000 task.ti: c1494000
> NIP: c0264f6c LR: c0264f5c CTR:
> REGS: c14979a0 TRAP: 0700   Not tainted  (3.18.0-rc5-kvm+)
> MSR: 92021032 <SF,HV,VEC,ME,IR,DR,RI>  CR: 28000448  XER: 2000
> CFAR: c047e978 SOFTE: 0
> GPR00: c0264f5c c1497c20 c1499d48 0004
> GPR04: 0100 0010 0068
> GPR08: 0001 082d c0cca5a8
> GPR12: 48000448 cfda 01003bd44ff0 10020578
> GPR16: 01003bd44ff8 01003bd45000 0001
> GPR20: 0010
> GPR24: c00ffe80 c0c824ec 0068 c00ffe80
> GPR28: 0010 c00ffe80 0010
> NIP [c0264f6c] .cache_alloc_node+0x6c/0x270
> LR [c0264f5c] .cache_alloc_node+0x5c/0x270
> Call Trace:
> [c1497c20] [c0264f5c] .cache_alloc_node+0x5c/0x270 (unreliable)
> [c1497cf0] [c026552c] .kmem_cache_alloc_node_trace+0xdc/0x360
> [c1497dc0] [c0c824ec] .init_list+0x3c/0x128
> [c1497e50] [c0c827b4] .kmem_cache_init+0x1dc/0x258
> [c1497ef0] [c0c54090] .start_kernel+0x2a0/0x568
> [c1497f90] [c0008c6c] start_here_common+0x20/0xa8
> Instruction dump:
> 7c7d1b78 7c962378 4bda4e91 6000 3c620004 38800100 386370d8 48219959
> 6000 7f83e000 7d301026 5529effe 0b09 393c0010 79291f24 7d3d4a14
>
> To fix this, we instead compare the nodeid with MAX_NUMNODES, and
> additionally make sure it isn't negative (since nodeid is an int).
> The check is there mainly to protect the array dereference in the
> get_node() call in the next line, and the array being dereferenced is
> of size MAX_NUMNODES. If the nodeid is in range but invalid (for
> example if the node is off-line), the BUG_ON in the next line will
> catch that.
>
> Signed-off-by: Paul Mackerras <pau...@samba.org>
> ---

Looks good to me.

Reviewed-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

If you need to backport it into -stable kernel, please read
Documentation/stable_kernel_rules.txt.

Thanks,
Yasuaki Ishimatsu

> v2: include the oops message in the patch description
>
>  mm/slab.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index eb2b2ea..f34e053 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
>  	void *obj;
>  	int x;
>
> -	VM_BUG_ON(nodeid > num_online_nodes());
> +	VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES);
>  	n = get_node(cachep, nodeid);
>  	BUG_ON(!n);
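Editor's illustration, not part of the thread: a small user-space sketch of why the old check misfires on sparse node IDs and how the two-stage check discussed above behaves. It is a simplified model under stated assumptions. MAX_NUMNODES is set to a hypothetical 32, the helper names are invented for this sketch, and "get_node() returns NULL" is approximated by a lookup in an online-node table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical value; the real MAX_NUMNODES depends on the kernel config. */
#define MAX_NUMNODES 32

/* Sparse node IDs as on the POWER8 machine in the report: 0, 1, 16, 17. */
static const bool node_is_online[MAX_NUMNODES] = {
	[0] = true, [1] = true, [16] = true, [17] = true,
};

/* Models num_online_nodes(): a count, not the highest node ID. */
static int count_online_nodes(void)
{
	int n = 0;

	for (int i = 0; i < MAX_NUMNODES; i++)
		if (node_is_online[i])
			n++;
	return n;
}

/* Old check: compares a node ID against a node count. */
static bool old_check_rejects(int nodeid)
{
	return nodeid > count_online_nodes();
}

/* New check: rejects only IDs that would index outside the array. */
static bool new_check_rejects(int nodeid)
{
	return nodeid < 0 || nodeid >= MAX_NUMNODES;
}

/*
 * Second stage, standing in for BUG_ON(!n) after get_node(): an in-range
 * ID that does not name a usable node is still caught here.
 */
static bool node_lookup_fails(int nodeid)
{
	assert(!new_check_rejects(nodeid));	/* array access is safe */
	return !node_is_online[nodeid];
}
```

With this table, node 16 is rejected by the old check because 16 is greater than the four online nodes, while the new check accepts it and leaves invalid in-range IDs such as 5 to the second stage.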
Re: [PATCH v2 0/4] Unify CPU hotplug lock interface
(2013/08/30 9:22), Toshi Kani wrote:
> lock_device_hotplug() was recently introduced to serialize CPU & Memory
> online/offline and hotplug operations, along with sysfs online interface
> restructure (commit 4f3549d7). With this new locking scheme,
> cpu_hotplug_driver_lock() is redundant and is no longer necessary.
>
> This patchset makes sure that lock_device_hotplug() covers all CPU online/
> offline interfaces, and then removes cpu_hotplug_driver_lock().
>
> v2:
>  - Rebased to the pm tree, bleeding-edge.
>  - Changed patch 2/4 to use lock_device_hotplug_sysfs().
>
> ---
> Toshi Kani (4):
>   hotplug, x86: Fix online state in cpu0 debug interface
>   hotplug, x86: Add hotplug lock to missing places
>   hotplug, x86: Disable ARCH_CPU_PROBE_RELEASE on x86
>   hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock()
>
> ---

The patch-set looks good to me.

Acked-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

Thanks,
Yasuaki Ishimatsu

>  arch/powerpc/kernel/smp.c              | 12 --
>  arch/powerpc/platforms/pseries/dlpar.c | 40 +-
>  arch/x86/Kconfig                       |  4
>  arch/x86/kernel/smpboot.c              | 21 --
>  arch/x86/kernel/topology.c             | 11 ++
>  drivers/base/cpu.c                     | 34 +++--
>  include/linux/cpu.h                    | 13 ---
>  7 files changed, 45 insertions(+), 90 deletions(-)
Re: [PATCH 7/7] drivers: base: refactor add_memory_section() to add_memory_block()
(2013/08/22 17:20), Yasuaki Ishimatsu wrote:
> (2013/08/21 2:13), Seth Jennings wrote:
>> Right now memory_dev_init() maintains the memory block pointer
>> between iterations of add_memory_section(). This is nasty.
>>
>> This patch refactors add_memory_section() to become add_memory_block().
>> The refactoring pulls the section scanning out of memory_dev_init()
>> and simplifies the signature.
>>
>> Signed-off-by: Seth Jennings <sjenn...@linux.vnet.ibm.com>
>> ---
>>  drivers/base/memory.c | 48 +---
>>  1 file changed, 21 insertions(+), 27 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 7d9d3bc..021283a 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -602,32 +602,31 @@ static int init_memory_block(struct memory_block **memory,
>>  	return ret;
>>  }
>>
>> -static int add_memory_section(struct mem_section *section,
>> -			struct memory_block **mem_p)
>> +static int add_memory_block(int base_section_nr)
>>  {
>> -	struct memory_block *mem = NULL;
>> -	int scn_nr = __section_nr(section);
>> -	int ret = 0;
>> -
>> -	if (mem_p && *mem_p) {
>> -		if (scn_nr >= (*mem_p)->start_section_nr &&
>> -		    scn_nr <= (*mem_p)->end_section_nr) {
>> -			mem = *mem_p;
>> -		}
>> -	}
>> +	struct memory_block *mem;
>> +	int i, ret, section_count = 0, section_nr;
>>
>> -	if (mem)
>> -		mem->section_count++;
>> -	else {
>> -		ret = init_memory_block(&mem, section, MEM_ONLINE);
>> -		/* store memory_block pointer for next loop */
>> -		if (!ret && mem_p)
>> -			*mem_p = mem;
>> +	for (i = base_section_nr;
>> +	     (i < base_section_nr + sections_per_block) && i < NR_MEM_SECTIONS;
>> +	     i++) {
>> +		if (!present_section_nr(i))
>> +			continue;
>> +		if (section_count == 0)
>> +			section_nr = i;
>> +		section_count++;
>>  	}
>>
>> -	return ret;
>> +	if (section_count == 0)
>> +		return 0;
>> +	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
>> +	if (ret)
>> +		return ret;
>> +	mem->section_count = section_count;
>> +	return 0;
>>  }
>>
>> +
>>  /*
>>   * need an interface for the VM to add new memory regions,
>>   * but without onlining it.
>> @@ -733,7 +732,6 @@ int __init memory_dev_init(void)
>>  	int ret;
>>  	int err;
>>  	unsigned long block_sz;
>> -	struct memory_block *mem = NULL;
>>
>>  	ret = subsys_system_register(&memory_subsys, memory_root_attr_groups);
>>  	if (ret)
>> @@ -747,12 +745,8 @@ int __init memory_dev_init(void)
>>  	 * during boot and have been initialized
>>  	 */
>>  	mutex_lock(&mem_sysfs_mutex);
>> -	for (i = 0; i < NR_MEM_SECTIONS; i++) {
>> -		if (!present_section_nr(i))
>> -			continue;
>> -		/* don't need to reuse memory_block if only one per block */
>> -		err = add_memory_section(__nr_to_section(i),
>> -				 (sections_per_block == 1) ? NULL : &mem);
>> +	for (i = 0; i < NR_MEM_SECTIONS; i += sections_per_block) {
>
> Why do you remove present_section_nr() check?

Sorry for the noise. I understood. The check was moved into
add_memory_block(). So it was removed.

Thanks,
Yasuaki Ishimatsu

>> +		err = add_memory_block(i);
>>  		if (!ret)
>
> Thanks,
> Yasuaki Ishimatsu
>
>>  			ret = err;
>>  	}
Re: [PATCH 1/7] drivers: base: move mutex lock out of add_memory_section()
(2013/08/21 2:24), Seth Jennings wrote:
> Gah! Forgot the cover letter.
>
> This patchset just seeks to clean up and refactor some things in
> memory.c for better understanding and possibly better performance due
> to a decrease in mutex acquisitions and refcount churn at boot time.
> No functional change is intended by this set!

All patches were
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>
Tested-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

Thanks,
Yasuaki Ishimatsu

> Seth
Re: [PATCH 7/7] drivers: base: refactor add_memory_section() to add_memory_block()
(2013/08/21 2:13), Seth Jennings wrote:
> Right now memory_dev_init() maintains the memory block pointer
> between iterations of add_memory_section(). This is nasty.
>
> This patch refactors add_memory_section() to become add_memory_block().
> The refactoring pulls the section scanning out of memory_dev_init()
> and simplifies the signature.
>
> Signed-off-by: Seth Jennings <sjenn...@linux.vnet.ibm.com>
> ---
>  drivers/base/memory.c | 48 +---
>  1 file changed, 21 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 7d9d3bc..021283a 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -602,32 +602,31 @@ static int init_memory_block(struct memory_block **memory,
>  	return ret;
>  }
>
> -static int add_memory_section(struct mem_section *section,
> -			struct memory_block **mem_p)
> +static int add_memory_block(int base_section_nr)
>  {
> -	struct memory_block *mem = NULL;
> -	int scn_nr = __section_nr(section);
> -	int ret = 0;
> -
> -	if (mem_p && *mem_p) {
> -		if (scn_nr >= (*mem_p)->start_section_nr &&
> -		    scn_nr <= (*mem_p)->end_section_nr) {
> -			mem = *mem_p;
> -		}
> -	}
> +	struct memory_block *mem;
> +	int i, ret, section_count = 0, section_nr;
>
> -	if (mem)
> -		mem->section_count++;
> -	else {
> -		ret = init_memory_block(&mem, section, MEM_ONLINE);
> -		/* store memory_block pointer for next loop */
> -		if (!ret && mem_p)
> -			*mem_p = mem;
> +	for (i = base_section_nr;
> +	     (i < base_section_nr + sections_per_block) && i < NR_MEM_SECTIONS;
> +	     i++) {
> +		if (!present_section_nr(i))
> +			continue;
> +		if (section_count == 0)
> +			section_nr = i;
> +		section_count++;
>  	}
>
> -	return ret;
> +	if (section_count == 0)
> +		return 0;
> +	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
> +	if (ret)
> +		return ret;
> +	mem->section_count = section_count;
> +	return 0;
>  }
>
> +
>  /*
>   * need an interface for the VM to add new memory regions,
>   * but without onlining it.
> @@ -733,7 +732,6 @@ int __init memory_dev_init(void)
>  	int ret;
>  	int err;
>  	unsigned long block_sz;
> -	struct memory_block *mem = NULL;
>
>  	ret = subsys_system_register(&memory_subsys, memory_root_attr_groups);
>  	if (ret)
> @@ -747,12 +745,8 @@ int __init memory_dev_init(void)
>  	 * during boot and have been initialized
>  	 */
>  	mutex_lock(&mem_sysfs_mutex);
> -	for (i = 0; i < NR_MEM_SECTIONS; i++) {
> -		if (!present_section_nr(i))
> -			continue;
> -		/* don't need to reuse memory_block if only one per block */
> -		err = add_memory_section(__nr_to_section(i),
> -				 (sections_per_block == 1) ? NULL : &mem);
> +	for (i = 0; i < NR_MEM_SECTIONS; i += sections_per_block) {

Why do you remove present_section_nr() check?

> +		err = add_memory_block(i);
>  		if (!ret)

Thanks,
Yasuaki Ishimatsu

>  			ret = err;
>  	}
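Editor's illustration, not part of the thread: the per-block section scan in the patch can be modelled in user space to show where the presence check ends up after the refactoring. This is a sketch under stated assumptions. NR_MEM_SECTIONS, sections_per_block and the presence map are hypothetical toy values, and scan_block() and count_blocks() are invented names that only mirror the loop structure of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, deliberately tiny geometry for illustration. */
#define NR_MEM_SECTIONS 16
static const int sections_per_block = 4;

/* Hypothetical presence map: sections 1, 2, 8 and 15 are present. */
static const bool section_present[NR_MEM_SECTIONS] = {
	[1] = true, [2] = true, [8] = true, [15] = true,
};

struct block_scan {
	int first_section_nr;	/* first present section, -1 if none */
	int section_count;	/* number of present sections in the block */
};

/* Mirrors the loop in the patch: the presence check lives in here. */
static struct block_scan scan_block(int base_section_nr)
{
	struct block_scan r = { .first_section_nr = -1, .section_count = 0 };

	assert(base_section_nr >= 0 && base_section_nr < NR_MEM_SECTIONS);
	for (int i = base_section_nr;
	     i < base_section_nr + sections_per_block && i < NR_MEM_SECTIONS;
	     i++) {
		if (!section_present[i])
			continue;
		if (r.section_count == 0)
			r.first_section_nr = i;
		r.section_count++;
	}
	return r;
}

/* Mirrors the caller: step one block at a time, skip empty blocks. */
static int count_blocks(void)
{
	int blocks = 0;

	for (int i = 0; i < NR_MEM_SECTIONS; i += sections_per_block)
		if (scan_block(i).section_count > 0)
			blocks++;
	return blocks;
}
```

The caller no longer tests each section itself; it advances by sections_per_block and relies on the scan to report an empty block, which is the point raised in the review question above.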
Re: [Patch v4 08/12] memory-hotplug: remove memmap of sparse-vmemmap
; + + pte = pte_offset_kernel(pmd, addr); + for (; addr end; pte++, addr += PAGE_SIZE) { + next = (addr + PAGE_SIZE) PAGE_MASK; + if (next end) + next = end; + + if (pte_none(*pte)) + continue; + if (IS_ALIGNED(addr, PAGE_SIZE) + IS_ALIGNED(end, PAGE_SIZE)) { + vmemmap_free_pages(pte_page(*pte), 0); + spin_lock(init_mm.page_table_lock); + pte_clear(init_mm, addr, pte); + spin_unlock(init_mm.page_table_lock); If addr or end is not alianed with PAGE_SIZE, you may leak some memory. yes, I think we can handle this situation with the method you mentioned in the change log: 1. When removing memory, the page structs of the revmoved memory are filled with 0xFD. 2. All page structs are filled with 0xFD on PT/PMD, PT/PMD can be cleared. In this case, the page used as PT/PMD can be freed. By the way, why is 0xFD? There is no reason. I just filled the page with unique number. Thanks, Yasuaki Ishimatsu + } + } + + free_pte_table(pmd); + __flush_tlb_all(); +} + +static void vmemmap_pmd_remove(pud_t *pud, unsigned long addr, unsigned long end) +{ + unsigned long next; + pmd_t *pmd; + + pmd = pmd_offset(pud, addr); + for (; addr end; addr = next, pmd++) { + next = pmd_addr_end(addr, end); + if (pmd_none(*pmd)) + continue; + + if (cpu_has_pse) { + unsigned long pte_base; + + if (IS_ALIGNED(addr, PMD_SIZE) + IS_ALIGNED(next, PMD_SIZE)) { + vmemmap_free_pages(pmd_page(*pmd), + get_order(PMD_SIZE)); + spin_lock(init_mm.page_table_lock); + pmd_clear(pmd); + spin_unlock(init_mm.page_table_lock); + continue; + } + + /* +* We use 2M page, but we need to remove part of them, +* so split 2M page to 4K page. +*/ + pte_base = get_zeroed_page(GFP_ATOMIC | __GFP_NOTRACK); get_zeored_page() may fail. You should handle this error. That means system is out of memory, I will trigger a bug_on. 
+ split_large_page((pte_t *)pmd, addr, (pte_t *)pte_base); + __flush_tlb_all(); + + spin_lock(init_mm.page_table_lock); + pmd_populate_kernel(init_mm, pmd, (pte_t *)pte_base); + spin_unlock(init_mm.page_table_lock); + } + + vmemmap_pte_remove(pmd, addr, next); + } + + free_pmd_table(pud); + __flush_tlb_all(); +} + +static void vmemmap_pud_remove(pgd_t *pgd, unsigned long addr, unsigned long end) +{ + unsigned long next; + pud_t *pud; + + pud = pud_offset(pgd, addr); + for (; addr end; addr = next, pud++) { + next = pud_addr_end(addr, end); + if (pud_none(*pud)) + continue; + + vmemmap_pmd_remove(pud, addr, next); + } + + free_pud_table(pgd); + __flush_tlb_all(); +} + +void vmemmap_free(struct page *memmap, unsigned long nr_pages) +{ + unsigned long addr = (unsigned long)memmap; + unsigned long end = (unsigned long)(memmap + nr_pages); + unsigned long next; + + for (; addr end; addr = next) { + pgd_t *pgd = pgd_offset_k(addr); + + next = pgd_addr_end(addr, end); + if (!pgd_present(*pgd)) + continue; + + vmemmap_pud_remove(pgd, addr, next); + sync_global_pgds(addr, next); The parameter for sync_global_pgds() is [start, end], not [start, end) yes, thanks. + } +} +#endif diff --git a/mm/sparse.c b/mm/sparse.c index fac95f2..3a16d68 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -613,12 +613,13 @@ static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid, /* This will make the necessary allocations eventually. */ return sparse_mem_map_populate(pnum, nid); } -static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages) +static void __kfree_section_memmap(struct page *page, unsigned long nr_pages) Why do you change this line? 0k, it is no need to change. { - return; /* XXX: Not implemented yet */ + vmemmap_free(page, nr_pages); } static void free_map_bootmem(struct page *page, unsigned long nr_pages) { + vmemmap_free(page, nr_pages); } #else static struct page *__kmalloc_section_memmap(unsigned long nr_pages
Re: [Patch v4 00/12] memory-hotplug: hot-remove physical memory
Hi Andrew,

2012/11/28 4:27, Andrew Morton wrote:
> On Tue, 27 Nov 2012 18:00:10 +0800
> Wen Congyang <we...@cn.fujitsu.com> wrote:
>
>> The patch-set was divided from following thread's patch-set.
>>     https://lkml.org/lkml/2012/9/5/201
>>
>> The last version of this patchset:
>>     https://lkml.org/lkml/2012/11/1/93
>
> As we're now at -rc7 I'd prefer to take a look at all of this after the
> 3.7 release - please resend everything shortly after 3.8-rc1.

Almost all of the memory hotplug patches have been merged into your tree
and Rafael's tree, and they are waiting for the v3.8 merge window to open.
This patch-set is the only one remaining, so we hope that it can be merged
into v3.8. With this patch-set merged into v3.8, memory hot plug becomes
possible on Linux on x86_64.

Thanks,
Yasuaki Ishimatsu

>> If you want to know the reason, please read following thread.
>>
>> https://lkml.org/lkml/2012/10/2/83
>
> Please include the rationale within each version of the patchset rather
> than by linking to an old email. Because
>
> a) this way, more people are likely to read it
>
> b) it permits the text to be maintained as the code evolves
>
> c) it permits the text to be included in the mainline commit, where
>    people can find it.
>
>> The patch-set has only the function of kernel core side for physical
>> memory hot remove. So if you use the patch, please apply following
>> patches.
>>
>> - bug fix for memory hot remove
>>   https://lkml.org/lkml/2012/10/31/269
>> - acpi framework
>>   https://lkml.org/lkml/2012/10/26/175
>
> What's happening with the acpi framework? Has it received any feedback
> from the ACPI developers?
Re: [PATCH v3 00/12] memory-hotplug: hot-remove physical memory
Hi Andrew,

The patch-set targets linux-3.8, so we would like you to merge it into
your tree.

The patch-set incorporates many review comments, and there are currently
no outstanding comments on it. We have also spent a lot of time verifying
the patch-set, and we found and fixed many bugs along the way. So we
believe that Linux on x86_64 can support memory hot remove with this
patch-set.

Thanks,
Yasuaki Ishimatsu

2012/11/01 18:44, Wen Congyang wrote:
> The patch-set was divided from following thread's patch-set.
>     https://lkml.org/lkml/2012/9/5/201
>
> The last version of this patchset:
>     https://lkml.org/lkml/2012/10/23/213
>
> If you want to know the reason, please read following thread.
>
> https://lkml.org/lkml/2012/10/2/83
>
> The patch-set has only the function of kernel core side for physical
> memory hot remove. So if you use the patch, please apply following
> patches.
>
> - bug fix for memory hot remove
>   https://lkml.org/lkml/2012/10/31/269
> - acpi framework
>   https://lkml.org/lkml/2012/10/26/175
>
> The patches can free/remove the following things:
>
>   - /sys/firmware/memmap/X/{end, start, type} : [PATCH 2/10]
>   - mem_section and related sysfs files       : [PATCH 3-4/10]
>   - memmap of sparse-vmemmap                  : [PATCH 5-7/10]
>   - page table of removed memory              : [RFC PATCH 8/10]
>   - node and related sysfs files              : [RFC PATCH 9-10/10]
>
> * [PATCH 2/10] checks whether the memory can be removed or not.
>
> If you find lack of function for physical memory hot-remove, please let me
> know.
>
> How to test this patchset?
> 1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
>    ACPI_HOTPLUG_MEMORY must be selected.
> 2. load the module acpi_memhotplug
> 3. hotplug the memory device (it depends on your hardware)
>    You will see the memory device under the directory /sys/bus/acpi/devices/.
>    Its name is PNP0C80:XX.
> 4. online/offline pages provided by this memory device
>    You can write online/offline to /sys/devices/system/memory/memoryX/state to
>    online/offline pages provided by this memory device
> 5. hotremove the memory device
>    You can hotremove the memory device by the hardware, or writing 1 to
>    /sys/bus/acpi/devices/PNP0C80:XX/eject.
>
> Note: if the memory provided by the memory device is used by the kernel, it
> can't be offlined. It is not a bug.
>
> Known problems:
> 1. hotremoving memory device may cause kernel panic
>    This bug will be fixed by Liu Jiang's patch:
>    https://lkml.org/lkml/2012/7/3/1
>
> Changelogs from v2 to v3:
>  Patch9: call sync_global_pgds() if pgd is changed
>  Patch10: fix a problem in the patch
>
> Changelogs from v1 to v2:
>  Patch1: new patch, offline memory twice. 1st iterate: offline every non primary
>          memory block. 2nd iterate: offline primary (i.e. first added) memory
>          block.
>
>  Patch3: new patch, no logical change, just remove redundant codes.
>
>  Patch9: merge the patch from wujianguo into this patch. flush tlb on all cpu
>          after the pagetable is changed.
>
>  Patch12: new patch, free node_data when a node is offlined
>
> Wen Congyang (6):
>   memory-hotplug: try to offline the memory twice to avoid dependence
>   memory-hotplug: remove redundant codes
>   memory-hotplug: introduce new function arch_remove_memory() for
>     removing page table depends on architecture
>   memory-hotplug: remove page table of x86_64 architecture
>   memory-hotplug: remove sysfs file of node
>   memory-hotplug: free node_data when a node is offlined
>
> Yasuaki Ishimatsu (6):
>   memory-hotplug: check whether all memory blocks are offlined or not
>     when removing memory
>   memory-hotplug: remove /sys/firmware/memmap/X sysfs
>   memory-hotplug: unregister memory section on SPARSEMEM_VMEMMAP
>   memory-hotplug: implement register_page_bootmem_info_section of
>     sparse-vmemmap
>   memory-hotplug: remove memmap of sparse-vmemmap
>   memory-hotplug: memory_hotplug: clear zone when removing the memory
>
>  arch/ia64/mm/discontig.c             |  14 ++
>  arch/ia64/mm/init.c                  |  18 ++
>  arch/powerpc/mm/init_64.c            |  14 ++
>  arch/powerpc/mm/mem.c                |  12 +
>  arch/s390/mm/init.c                  |  12 +
>  arch/s390/mm/vmem.c                  |  14 ++
>  arch/sh/mm/init.c                    |  17 ++
>  arch/sparc/mm/init_64.c              |  14 ++
>  arch/tile/mm/init.c                  |   8 +
>  arch/x86/include/asm/pgtable_types.h |   1 +
>  arch/x86/mm/init_32.c                |  12 +
>  arch/x86/mm/init_64.c                | 417 +++
>  arch/x86/mm/pageattr.c               |  47 ++--
>  drivers/acpi/acpi_memhotplug.c       |   8 +-
>  drivers/base/memory.c                |   6 +
>  drivers/firmware/memmap.c            |  98 +++-
>  include/linux/firmware-map.h         |   6 +
>  include/linux/memory_hotplug.h       |  15 +-
>  include/linux/mm.h
Re: [PATCH v3 11/12] memory-hotplug: remove sysfs file of node
Hi Wen,

This patch cannot be applied on top of the latest ACPI framework patch-set:

https://lkml.org/lkml/2012/11/15/21

because acpi_memory_disable_device() is removed by that patch-set.

I updated the patch and attached it to this mail.

2012/11/01 18:44, Wen Congyang wrote:
> This patch introduces a new function try_offline_node() to
> remove sysfs file of node when all memory sections of this
> node are removed. If some memory sections of this node are
> not removed, this function does nothing.
>
> CC: David Rientjes <rient...@google.com>
> CC: Jiang Liu <liu...@gmail.com>
> CC: Len Brown <len.br...@intel.com>
> CC: Christoph Lameter <c...@linux.com>
> Cc: Minchan Kim <minchan@gmail.com>
> CC: Andrew Morton <a...@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
> CC: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>
> Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
> ---
>  drivers/acpi/acpi_memhotplug.c |  8 +++++++-
>  include/linux/memory_hotplug.h |  2 +-
>  mm/memory_hotplug.c            | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 64 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 24c807f..0780f99 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -310,7 +310,9 @@ static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
>  {
>  	int result;
>  	struct acpi_memory_info *info, *n;
> +	int node;
>  
> +	node = acpi_get_node(mem_device->device->handle);
>  
>  	/*
>  	 * Ask the VM to offline this memory range.
> @@ -318,7 +320,11 @@ static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
>  	 */
>  	list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
>  		if (info->enabled) {
> -			result = remove_memory(info->start_addr, info->length);
> +			if (node < 0)
> +				node = memory_add_physaddr_to_nid(
> +					info->start_addr);
> +			result = remove_memory(node, info->start_addr,
> +						info->length);
>  			if (result)
>  				return result;
>  		}
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index d4c4402..7b4cfe6 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -231,7 +231,7 @@ extern int arch_add_memory(int nid, u64 start, u64 size);
>  extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
>  extern int offline_memory_block(struct memory_block *mem);
>  extern bool is_memblock_offlined(struct memory_block *mem);
> -extern int remove_memory(u64 start, u64 size);
> +extern int remove_memory(int node, u64 start, u64 size);
>  extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
>  								int nr_pages);
>  extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 7bcced0..d965da3 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -29,6 +29,7 @@
>  #include <linux/suspend.h>
>  #include <linux/mm_inline.h>
>  #include <linux/firmware-map.h>
> +#include <linux/stop_machine.h>
>  
>  #include <asm/tlbflush.h>
>  
> @@ -1299,7 +1300,58 @@ static int is_memblock_offlined_cb(struct memory_block *mem, void *arg)
>  	return ret;
>  }
>  
> -int __ref remove_memory(u64 start, u64 size)
> +static int check_cpu_on_node(void *data)
> +{
> +	struct pglist_data *pgdat = data;
> +	int cpu;
> +
> +	for_each_present_cpu(cpu) {
> +		if (cpu_to_node(cpu) == pgdat->node_id)
> +			/*
> +			 * the cpu on this node isn't removed, and we can't
> +			 * offline this node.
> +			 */
> +			return -EBUSY;
> +	}
> +
> +	return 0;
> +}
> +
> +/* offline the node if all memory sections of this node are removed */
> +static void try_offline_node(int nid)
> +{
> +	unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
> +	unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;
> +	unsigned long pfn;
> +
> +	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> +		unsigned long section_nr = pfn_to_section_nr(pfn);
> +
> +		if (!present_section_nr(section_nr))
> +			continue;
> +
> +		if (pfn_to_nid(pfn) != nid)
> +			continue;
> +
> +		/*
> +		 * some memory sections of this node are not removed, and we
> +		 * can't offline node now.
> +		 */
> +		return;
> +	}
> +
> +	if (stop_machine(check_cpu_on_node, NODE_DATA(nid), NULL))
> +		return;
> +
> +	/*
> +	 * all memory/cpu of this node are removed, we can offline
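
For readers following the quoted patch, the decision made by try_offline_node() can be modelled in user space. This is only a rough sketch under stated assumptions: struct model_section, model_node_can_offline() and the flat arrays are hypothetical stand-ins for the kernel's section and CPU iteration, not kernel API, and the part of the quoted function that is cut off above is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in; this is not a kernel type. */
struct model_section {
	bool present;	/* models present_section_nr() */
	int nid;	/* models pfn_to_nid() for the section's first pfn */
};

/*
 * Returns true when the node could be offlined: no present section still
 * belongs to nid, and no present CPU is assigned to nid.
 */
static bool model_node_can_offline(int nid,
				   const struct model_section *sections,
				   size_t nr_sections,
				   const int *cpu_nids, size_t nr_cpus)
{
	size_t i;

	for (i = 0; i < nr_sections; i++) {
		if (!sections[i].present)
			continue;
		if (sections[i].nid != nid)
			continue;
		/* some memory of this node is still there */
		return false;
	}

	/* models check_cpu_on_node(): any CPU left on the node blocks it */
	for (i = 0; i < nr_cpus; i++)
		if (cpu_nids[i] == nid)
			return false;

	return true;
}
```

As in the patch, one remaining section or one remaining CPU is enough to leave the node alone.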
Re: [PATCH 5/10] memory-hotplug : memory-hotplug: check page type in get_page_bootmem
Hi Kosaki,

Sorry for the late reply.

2012/10/13 4:28, KOSAKI Motohiro wrote:
> On Thu, Oct 4, 2012 at 10:32 PM, Yasuaki Ishimatsu
> <isimatu.yasu...@jp.fujitsu.com> wrote:
> > The function get_page_bootmem() may be called more than once for the
> > same page. There is no need to set the page's type and private fields
> > if this is not the first call for that page.
> >
> > Note: the patch is just an optimization and does not fix any problem.
> >
> > CC: David Rientjes <rient...@google.com>
> > CC: Jiang Liu <liu...@gmail.com>
> > CC: Len Brown <len.br...@intel.com>
> > CC: Christoph Lameter <c...@linux.com>
> > Cc: Minchan Kim <minchan@gmail.com>
> > CC: Andrew Morton <a...@linux-foundation.org>
> > CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
> > CC: Wen Congyang <we...@cn.fujitsu.com>
> > Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>
> > ---
> >  mm/memory_hotplug.c | 15 +++++++++++----
> >  1 file changed, 11 insertions(+), 4 deletions(-)
> >
> > Index: linux-3.6/mm/memory_hotplug.c
> > ===================================================================
> > --- linux-3.6.orig/mm/memory_hotplug.c	2012-10-04 18:29:58.284676075 +0900
> > +++ linux-3.6/mm/memory_hotplug.c	2012-10-04 18:30:03.454680542 +0900
> > @@ -95,10 +95,17 @@ static void release_memory_resource(stru
> >  static void get_page_bootmem(unsigned long info, struct page *page,
> >  			     unsigned long type)
> >  {
> > -	page->lru.next = (struct list_head *) type;
> > -	SetPagePrivate(page);
> > -	set_page_private(page, info);
> > -	atomic_inc(&page->_count);
> > +	unsigned long page_type;
> > +
> > +	page_type = (unsigned long)page->lru.next;
>
> If I understand correctly, page->lru.next might be uninitialized yet.

Ah yes. I misunderstood...

Hi Wen,

When you update the physical hot remove patch-set, please drop this patch.

Thanks,
Yasuaki Ishimatsu

> Moreover, I have not seen any good effect in this patch.
> I don't understand why we need to increase code complexity.
>
> > +	if (page_type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
> > +	    page_type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
> > +		page->lru.next = (struct list_head *)type;
> > +		SetPagePrivate(page);
> > +		set_page_private(page, info);
> > +		atomic_inc(&page->_count);
> > +	} else
> > +		atomic_inc(&page->_count);
> >  }
> >
> >  /* reference to __meminit __free_pages_bootmem is valid
> >
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majord...@kvack.org. For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: d...@kvack.org em...@kvack.org
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 2/10] memory-hotplug : remove /sys/firmware/memmap/X sysfs
2012/10/06 4:36, KOSAKI Motohiro wrote:
> On Thu, Oct 4, 2012 at 10:26 PM, Yasuaki Ishimatsu
> <isimatu.yasu...@jp.fujitsu.com> wrote:
> > When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type}
> > sysfs files are created. But there is no code to remove these files. The patch
> > implements the function to remove them.
> >
> > Note : The code does not free firmware_map_entry since there is no way to free
> >        memory which is allocated by bootmem.
>
> You have to explain why this is ok. I guess the unfreed firmware_map_entry is
> reused at next online memory and don't make memory leak, right?

Unfortunately, no. It leaks memory, about the size of one firmware_map_entry. If we hot add memory, the slab allocator prepares other memory for the firmware_map_entry.

In my understanding, if the memory is allocated by the bootmem allocator, it is not managed by the slab allocator, so we cannot use kfree() on it. On the other hand, the page that holds the entry may also hold various other data allocated by the bootmem allocator, apart from the firmware_map_entry, so we cannot free the page either.

So the patch leaks memory. But I think the leak is very small and does not affect the system.
CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- drivers/firmware/memmap.c| 98 ++- include/linux/firmware-map.h |6 ++ mm/memory_hotplug.c |7 ++- 3 files changed, 108 insertions(+), 3 deletions(-) Index: linux-3.6/drivers/firmware/memmap.c === --- linux-3.6.orig/drivers/firmware/memmap.c2012-10-04 18:27:05.195500420 +0900 +++ linux-3.6/drivers/firmware/memmap.c 2012-10-04 18:27:18.901514330 +0900 @@ -21,6 +21,7 @@ #include linux/types.h #include linux/bootmem.h #include linux/slab.h +#include linux/mm.h /* * Data types -- @@ -41,6 +42,7 @@ struct firmware_map_entry { const char *type; /* type of the memory range */ struct list_headlist; /* entry for the linked list */ struct kobject kobj; /* kobject for each entry */ + unsigned intbootmem:1; /* allocated from bootmem */ Use bool. We'll update it. }; /* @@ -79,7 +81,26 @@ static const struct sysfs_ops memmap_att .show = memmap_attr_show, }; + +static inline struct firmware_map_entry * +to_memmap_entry(struct kobject *kobj) +{ + return container_of(kobj, struct firmware_map_entry, kobj); +} + +static void release_firmware_map_entry(struct kobject *kobj) +{ + struct firmware_map_entry *entry = to_memmap_entry(kobj); + + if (entry-bootmem) + /* There is no way to free memory allocated from bootmem */ + return; + + kfree(entry); +} + static struct kobj_type memmap_ktype = { + .release= release_firmware_map_entry, .sysfs_ops = memmap_attr_ops, .default_attrs = def_attrs, }; @@ -94,6 +115,7 @@ static struct kobj_type memmap_ktype = { * in firmware initialisation code in one single thread of execution. 
*/ static LIST_HEAD(map_entries); +static DEFINE_SPINLOCK(map_entries_lock); /** * firmware_map_add_entry() - Does the real work to add a firmware memmap entry. @@ -118,11 +140,25 @@ static int firmware_map_add_entry(u64 st INIT_LIST_HEAD(entry-list); kobject_init(entry-kobj, memmap_ktype); + spin_lock(map_entries_lock); list_add_tail(entry-list, map_entries); + spin_unlock(map_entries_lock); return 0; } +/** + * firmware_map_remove_entry() - Does the real work to remove a firmware + * memmap entry. + * @entry: removed entry. + **/ +static inline void firmware_map_remove_entry(struct firmware_map_entry *entry) Don't use inline in *.c file. gcc is wise than you. We'll update it. +{ + spin_lock(map_entries_lock); + list_del(entry-list); + spin_unlock(map_entries_lock); +} + /* * Add memmap entry on sysfs */ @@ -144,6 +180,35 @@ static int add_sysfs_fw_map_entry(struct return 0; } +/* + * Remove memmap entry on sysfs + */ +static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry) +{ + kobject_put(entry-kobj); +} + +/* + * Search memmap entry + */ + +static struct firmware_map_entry * __meminit +firmware_map_find_entry(u64 start, u64 end, const char *type) +{ + struct firmware_map_entry *entry; + + spin_lock(map_entries_lock); + list_for_each_entry(entry, map_entries, list
Re: linux-next: build failure after merge of the origin tree
Hi Stephen,

2012/10/10 8:45, Andrew Morton wrote:
> On Wed, 10 Oct 2012 10:21:50 +1100 Stephen Rothwell <s...@canb.auug.org.au> wrote:
>
> > Hi Linus,
> >
> > In Linus' tree, today's linux-next build (powerpc ppc64_defconfig) failed
> > like this:
> >
> > arch/powerpc/platforms/pseries/hotplug-memory.c: In function 'pseries_remove_memblock':
> > arch/powerpc/platforms/pseries/hotplug-memory.c:103:17: error: unused variable 'pfn' [-Werror=unused-variable]
> >
> > Caused by commit d760afd4d257 (memory-hotplug: suppress "Trying to free
> > nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>" warning). I can't
> > see what the point of the pfn variable is
>
> This:
>
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c~a
> +++ a/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -101,7 +101,7 @@ static int pseries_remove_memblock(unsig
>  	sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
>  	for (i = 0; i < sections_to_remove; i++) {
>  		unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
> -		ret = __remove_pages(zone, start_pfn, PAGES_PER_SECTION);
> +		ret = __remove_pages(zone, pfn, PAGES_PER_SECTION);
>  		if (ret)
>  			return ret;
>  	}

I believe the error is fixed by this patch. Could you try it?

Thanks,
Yasuaki Ishimatsu

> > and this patch never appeared in linux-next before being merged. :-(
>
> It was first sighted October 3.
>
> > I have reverted that commit for today.
> >
> > If this patch truly was authored yesterday (according the Author Date in
> > git), why was it merged yesterday while still under discussion? And the
> > latest update to it still has this build problem ... did anyone even try
> > to build this for powerpc (since that architecture was obviously affected)?
>
> Apparently not - the ppc bit was a best-effort fixup for a patch which
> addresses an x86 problem.
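
The effect of the one-line fix can be seen in a small user-space model of the loop: each iteration has to hand a different section's first pfn to __remove_pages(), not start_pfn every time. This is a sketch only; MODEL_PAGES_PER_SECTION and model_section_pfns() are hypothetical names, and the real PAGES_PER_SECTION value depends on the architecture.

```c
#include <stddef.h>

/* Hypothetical value for illustration; the real constant is arch-specific. */
#define MODEL_PAGES_PER_SECTION 4096UL

/*
 * Fill out[] with the first pfn of every section-sized chunk that the
 * fixed loop would pass to __remove_pages(). Returns the number of
 * entries written (at most out_len).
 */
static size_t model_section_pfns(unsigned long start_pfn,
				 unsigned long nr_pages,
				 unsigned long *out, size_t out_len)
{
	unsigned long sections_to_remove = nr_pages / MODEL_PAGES_PER_SECTION;
	unsigned long i;
	size_t n = 0;

	for (i = 0; i < sections_to_remove && n < out_len; i++) {
		/* the fix: advance by one section on every iteration */
		out[n++] = start_pfn + i * MODEL_PAGES_PER_SECTION;
	}

	return n;
}
```

With the original line, every iteration would have used start_pfn, so only the first section would ever be removed and 'pfn' stayed unused, which is what the compiler complained about.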
Re: memory-hotplug : suppres Trying to free nonexistent resource XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY warning
Hi Andrew,

2012/10/06 6:09, Andrew Morton wrote:
> On Thu, 4 Oct 2012 14:31:09 +0900
> Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com> wrote:
>
> > When our x86 box calls __remove_pages(), release_mem_region() shows
> > many warnings. And x86 box cannot unregister iomem_resource.
> >
> >   "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>"
> >
> > release_mem_region() has been changed as called in each
> > PAGES_PER_SECTION chunk since applying a patch(de7f0cba96786c).
> > Because powerpc registers iomem_resource in each PAGES_PER_SECTION chunk.
> > But when I hot add memory on x86 box, iomem_resource is register in each
> > _CRS not PAGES_PER_SECTION chunk. So x86 box unregisters iomem_resource.
> >
> > The patch fixes the problem.
> >
> > --- linux-3.6.orig/arch/powerpc/platforms/pseries/hotplug-memory.c	2012-10-04 14:22:59.833520792 +0900
> > +++ linux-3.6/arch/powerpc/platforms/pseries/hotplug-memory.c	2012-10-04 14:23:05.150521411 +0900
> > @@ -77,7 +77,8 @@ static int pseries_remove_memblock(unsig
> >  {
> >  	unsigned long start, start_pfn;
> >  	struct zone *zone;
> > -	int ret;
> > +	int i, ret;
> > +	int sections_to_remove;
> >
> >  	start_pfn = base >> PAGE_SHIFT;
> >
> > @@ -97,9 +98,13 @@ static int pseries_remove_memblock(unsig
> >  	 * to sysfs state file and we can't remove sysfs entries
> >  	 * while writing to it. So we have to defer it to here.
> >  	 */
> > -	ret = __remove_pages(zone, start_pfn, memblock_size >> PAGE_SHIFT);
> > -	if (ret)
> > -		return ret;
> > +	sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
> > +	for (i = 0; i < sections_to_remove; i++) {
> > +		unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
> > +		ret = __remove_pages(zone, start_pfn, PAGES_PER_SECTION);
> > +		if (ret)
> > +			return ret;
> > +	}
>
> It is inappropriate that `i' have a signed 32-bit type. I doubt if
> there's any possibility of an overflow bug here, but using a consistent
> and well-chosen type would eliminate all doubt.
>
> Note that __remove_pages() does use an unsigned long for this, although
> it stupidly calls that variable i, despite the C programmers'
> expectation that a variable called i has type int.
>
> The same applies to `sections_to_remove', but __remove_pages() went and
> decided to use an `int' for that variable. Sigh.
>
> Anyway, please have a think, and see if we can come up with the best
> and most accurate choice of types and identifiers in this code.

Your concern is right. An overflow bug may occur in the future, so I changed the type of i and sections_to_remove to unsigned long. Please merge this version into your tree instead of the previous patch.

__remove_pages() has the same problem, so I'll fix it too.

---
When our x86 box calls __remove_pages(), release_mem_region() shows many warnings. And the x86 box cannot unregister iomem_resource.

  "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>"

release_mem_region() has been called for each PAGES_PER_SECTION chunk since a patch (de7f0cba96786c) was applied, because powerpc registers iomem_resource for each PAGES_PER_SECTION chunk. But when I hot add memory on an x86 box, iomem_resource is registered for each _CRS, not for each PAGES_PER_SECTION chunk. So x86 box unregisters iomem_resource.

The patch fixes the problem.

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>
---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 11 ++++++++---
 mm/memory_hotplug.c                             |  4 ++--
 2 files changed, 10 insertions(+), 5 deletions(-)

Index: linux-3.6/arch/powerpc/platforms/pseries/hotplug-memory.c
===================================================================
--- linux-3.6.orig/arch/powerpc/platforms/pseries/hotplug-memory.c	2012-10-05 14:33:09.516197839 +0900
+++ linux-3.6/arch/powerpc/platforms/pseries/hotplug-memory.c	2012-10-09 11:27:50.555709827 +0900
@@ -78,6 +78,7 @@ static int pseries_remove_memblock(unsig
 	unsigned long start, start_pfn;
 	struct zone *zone;
 	int ret;
+	unsigned long i, sections_to_remove;

 	start_pfn = base >> PAGE_SHIFT;

@@ -97,9 +98,13 @@ static int pseries_remove_memblock(unsig
 	 * to sysfs state file and we can't remove sysfs entries
 	 * while writing to it. So we have to defer it to here.
 	 */
-	ret = __remove_pages(zone, start_pfn, memblock_size >> PAGE_SHIFT);
-	if (ret)
-		return ret;
+	sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
+	for (i = 0; i
Re: [RFC v9 PATCH 16/21] memory-hotplug: free memmap of sparse-vmemmap
Hi Chen, Sorry for late reply. 2012/10/02 13:21, Ni zhan Chen wrote: On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote: From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com All pages of virtual mapping in removed memory cannot be freed, since some pages used as PGD/PUD includes not only removed memory but also other memory. So the patch checks whether page can be freed or not. How to check whether page can be freed or not? 1. When removing memory, the page structs of the revmoved memory are filled with 0FD. 2. All page structs are filled with 0xFD on PT/PMD, PT/PMD can be cleared. In this case, the page used as PT/PMD can be freed. Applying patch, __remove_section() of CONFIG_SPARSEMEM_VMEMMAP is integrated into one. So __remove_section() of CONFIG_SPARSEMEM_VMEMMAP is deleted. Note: vmemmap_kfree() and vmemmap_free_bootmem() are not implemented for ia64, ppc, s390, and sparc. CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com CC: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- arch/ia64/mm/discontig.c |8 +++ arch/powerpc/mm/init_64.c |8 +++ arch/s390/mm/vmem.c |8 +++ arch/sparc/mm/init_64.c |8 +++ arch/x86/mm/init_64.c | 119 + include/linux/mm.h|2 + mm/memory_hotplug.c | 17 +-- mm/sparse.c |5 +- 8 files changed, 158 insertions(+), 17 deletions(-) diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c index 33943db..0d23b69 100644 --- a/arch/ia64/mm/discontig.c +++ b/arch/ia64/mm/discontig.c @@ -823,6 +823,14 @@ int __meminit vmemmap_populate(struct page *start_page, return vmemmap_populate_basepages(start_page, size, node); } +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages) +{ +} + 
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages) +{ +} + void register_page_bootmem_memmap(unsigned long section_nr, struct page *start_page, unsigned long size) { diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c index 3690c44..835a2b3 100644 --- a/arch/powerpc/mm/init_64.c +++ b/arch/powerpc/mm/init_64.c @@ -299,6 +299,14 @@ int __meminit vmemmap_populate(struct page *start_page, return 0; } +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages) +{ +} + +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages) +{ +} + void register_page_bootmem_memmap(unsigned long section_nr, struct page *start_page, unsigned long size) { diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c index eda55cd..4b42b0b 100644 --- a/arch/s390/mm/vmem.c +++ b/arch/s390/mm/vmem.c @@ -227,6 +227,14 @@ out: return ret; } +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages) +{ +} + +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages) +{ +} + void register_page_bootmem_memmap(unsigned long section_nr, struct page *start_page, unsigned long size) { diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c index add1cc7..1384826 100644 --- a/arch/sparc/mm/init_64.c +++ b/arch/sparc/mm/init_64.c @@ -2078,6 +2078,14 @@ void __meminit vmemmap_populate_print_last(void) } } +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages) +{ +} + +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages) +{ +} + void register_page_bootmem_memmap(unsigned long section_nr, struct page *start_page, unsigned long size) { diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 0075592..4e8f8a4 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1138,6 +1138,125 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node) return 0; } +#define PAGE_INUSE 0xFD + +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end, +struct page **pp, 
int *page_size) +{ +pgd_t *pgd; +pud_t *pud; +pmd_t *pmd; +pte_t *pte; +void *page_addr; +unsigned long next; + +*pp = NULL; + +pgd = pgd_offset_k(addr); +if (pgd_none(*pgd)) +return pgd_addr_end(addr, end); + +pud = pud_offset(pgd, addr); +if (pud_none(*pud)) +return pud_addr_end(addr, end); + +if (!cpu_has_pse) { +next = (addr + PAGE_SIZE) PAGE_MASK; +pmd = pmd_offset(pud, addr); +if (pmd_none(*pmd)) +return next; + +pte = pte_offset_kernel(pmd, addr); +if (pte_none(*pte)) +return next; + +*page_size = PAGE_SIZE; +*pp = pte_page
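
The freeing rule from the patch description (fill the page structs of removed memory with 0xFD, and free the backing page only once the whole page carries that marker) can be sketched in user space as follows. This is a simplified model under stated assumptions: MODEL_PAGE_SIZE and both helper names are hypothetical, and the real patch walks kernel page tables rather than a plain byte buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_INUSE 0xFD	/* same marker value as PAGE_INUSE in the patch */
#define MODEL_PAGE_SIZE  4096	/* hypothetical page size for this sketch */

/* Mark the bytes that backed a removed memory range. */
static void model_mark_removed(unsigned char *page, size_t off, size_t len)
{
	memset(page + off, MODEL_PAGE_INUSE, len);
}

/* The backing page may be freed only when every byte carries the marker. */
static bool model_page_can_be_freed(const unsigned char *page)
{
	size_t i;

	for (i = 0; i < MODEL_PAGE_SIZE; i++)
		if (page[i] != MODEL_PAGE_INUSE)
			return false;

	return true;
}
```

A page that is only partly marked still backs memmap entries of memory that has not been removed, so it has to stay.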
[PATCH 0/10] memory-hotplug: hot-remove physical memory
This patch-set was split out of the patch-set in the following thread:

https://lkml.org/lkml/2012/9/5/201

If you want to know the reason, please read the following thread:

https://lkml.org/lkml/2012/10/2/83

The patch-set contains only the kernel core side of physical memory hot remove. So if you use it, please also apply the following patches:

- bug fix for memory hot remove
  https://lkml.org/lkml/2012/9/27/39
  https://lkml.org/lkml/2012/10/2/83
  http://www.spinics.net/lists/linux-mm/msg42982.html
- acpi framework
  https://lkml.org/lkml/2012/10/3/126
  https://lkml.org/lkml/2012/10/3/641

The patches can free/remove the following things:

  - /sys/firmware/memmap/X/{end, start, type} : [PATCH 2/10]
  - mem_section and related sysfs files       : [PATCH 3-4/10]
  - memmap of sparse-vmemmap                  : [PATCH 5-7/10]
  - page table of removed memory              : [RFC PATCH 8/10]
  - node and related sysfs files              : [RFC PATCH 9-10/10]

* [PATCH 1/10] checks whether the memory can be removed or not.

If you find any function missing for physical memory hot-remove, please let me know.

How to test this patchset?
1. Apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE
   and ACPI_HOTPLUG_MEMORY must be selected.
2. Load the module acpi_memhotplug.
3. Hotplug the memory device (it depends on your hardware).
   You will see the memory device under the directory /sys/bus/acpi/devices/.
   Its name is PNP0C80:XX.
4. Online/offline pages provided by this memory device.
   You can write online/offline to /sys/devices/system/memory/memoryX/state to
   online/offline pages provided by this memory device.
5. Hotremove the memory device.
   You can hotremove the memory device by the hardware, or by writing 1 to
   /sys/bus/acpi/devices/PNP0C80:XX/eject.

Note: if the memory provided by the memory device is used by the kernel, it can't be offlined. It is not a bug.

Known problems:
1. Memory can't be offlined when CONFIG_MEMCG is selected.
   For example: there is a memory device on node 1. The address range
   is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
   and memory11 under the directory /sys/devices/system/memory/.
   If CONFIG_MEMCG is selected, we allocate memory to store page cgroup
   when we online pages. When we online memory8, the memory that stores its
   page cgroup is not provided by this memory device. But when we online
   memory9, the memory that stores its page cgroup may be provided by memory8.
   So we can't offline memory8 now. We should offline the memory in the
   reverse order.
   When the memory device is hotremoved, we automatically offline the memory
   provided by this memory device. But we don't know which memory was onlined
   first, so offlining memory may fail. In such a case, you should offline the
   memory by hand before hotremoving the memory device.
2. Hotremoving a memory device may cause a kernel panic.
   This bug will be fixed by Liu Jiang's patch:
   https://lkml.org/lkml/2012/7/3/1
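
The "offline by hand in the reverse order" workaround from known problem 1 could look roughly like the user-space sketch below. It rests on an assumption that is not guaranteed by the cover letter: that the blocks were onlined in ascending numeric order, so descending order is the reverse. The helper names are hypothetical; only the sysfs path and the "offline" keyword come from the text above, and writing to the state file needs root.

```c
#include <stdio.h>

/*
 * Format the sysfs state path of one memory block.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int memory_state_path(char *buf, size_t len, unsigned int block)
{
	int n = snprintf(buf, len,
			 "/sys/devices/system/memory/memory%u/state", block);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write "offline" to blocks last, last - 1, ..., first and stop at the
 * first failure. Assumes (hypothetically) that the blocks were onlined in
 * ascending order. An empty range (first > last) does nothing.
 * Returns 0 on success, -1 on the first error.
 */
static int offline_blocks_reverse(unsigned int first, unsigned int last)
{
	char path[64];
	unsigned int block = last + 1;

	while (block-- > first) {
		FILE *f;
		int ok;

		if (memory_state_path(path, sizeof(path), block))
			return -1;

		f = fopen(path, "w");
		if (!f)
			return -1;

		ok = fputs("offline", f) >= 0;
		/* the kernel reports a refused offline when the write is flushed */
		if (fclose(f) != 0 || !ok)
			return -1;
	}

	return 0;
}
```

For the example in the cover letter this would be offline_blocks_reverse(8, 11), which tries memory11 first and memory8 last.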
[PATCH 1/10] memory-hotplug : check whether memory is offline or not when removing memory
When calling remove_memory(), the memory should be offline. If the function is used on online memory, a kernel panic may occur.

So the patch checks whether the memory is offline or not.

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>
---
 drivers/base/memory.c  | 39 +++++++++++++++++++++++++++++++++++++++
 include/linux/memory.h |  5 +++++
 mm/memory_hotplug.c    | 17 +++++++++++++++--
 3 files changed, 59 insertions(+), 2 deletions(-)

Index: linux-3.6/drivers/base/memory.c
===================================================================
--- linux-3.6.orig/drivers/base/memory.c	2012-10-04 14:22:57.0 +0900
+++ linux-3.6/drivers/base/memory.c	2012-10-04 14:45:46.653585860 +0900
@@ -70,6 +70,45 @@ void unregister_memory_isolate_notifier(
 }
 EXPORT_SYMBOL(unregister_memory_isolate_notifier);

+bool is_memblk_offline(unsigned long start, unsigned long size)
+{
+	struct memory_block *mem = NULL;
+	struct mem_section *section;
+	unsigned long start_pfn, end_pfn;
+	unsigned long pfn, section_nr;
+
+	start_pfn = PFN_DOWN(start);
+	end_pfn = PFN_UP(start + size);
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+		section_nr = pfn_to_section_nr(pfn);
+		if (!present_section_nr(section_nr))
+			continue;
+
+		section = __nr_to_section(section_nr);
+		/* same memblock? */
+		if (mem)
+			if ((section_nr >= mem->start_section_nr) &&
+			    (section_nr <= mem->end_section_nr))
+				continue;
+
+		mem = find_memory_block_hinted(section, mem);
+		if (!mem)
+			continue;
+		if (mem->state == MEM_OFFLINE)
+			continue;
+
+		kobject_put(&mem->dev.kobj);
+		return false;
+	}
+
+	if (mem)
+		kobject_put(&mem->dev.kobj);
+
+	return true;
+}
+EXPORT_SYMBOL(is_memblk_offline);
+
 /*
  * register_memory - Setup a sysfs device for a memory block
  */
Index: linux-3.6/include/linux/memory.h
===================================================================
--- linux-3.6.orig/include/linux/memory.h	2012-10-02 18:00:22.0 +0900
+++ linux-3.6/include/linux/memory.h	2012-10-04 14:44:40.902581028 +0900
@@ -106,6 +106,10 @@ static inline int memory_isolate_notify(
 {
 	return 0;
 }
+static inline bool is_memblk_offline(unsigned long start, unsigned long size)
+{
+	return false;
+}
 #else
 extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
@@ -120,6 +124,7 @@ extern int memory_isolate_notify(unsigne
 extern struct memory_block *find_memory_block_hinted(struct mem_section *,
 							struct memory_block *);
 extern struct memory_block *find_memory_block(struct mem_section *);
+extern bool is_memblk_offline(unsigned long start, unsigned long size);
 #define CONFIG_MEM_BLOCK_SIZE	(PAGES_PER_SECTION<<PAGE_SHIFT)
 enum mem_add_context { BOOT, HOTPLUG };
 #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
Index: linux-3.6/mm/memory_hotplug.c
===================================================================
--- linux-3.6.orig/mm/memory_hotplug.c	2012-10-04 14:31:08.0 +0900
+++ linux-3.6/mm/memory_hotplug.c	2012-10-04 14:58:22.449687986 +0900
@@ -1045,8 +1045,21 @@ int offline_memory(u64 start, u64 size)

 int remove_memory(int nid, u64 start, u64 size)
 {
-	/* It is not implemented yet*/
-	return 0;
+	int ret = 0;
+	lock_memory_hotplug();
+	/*
+	 * The memory might become online by other task, even if you offline it.
+	 * So we check whether the memory has been onlined or not.
+	 */
+	if (!is_memblk_offline(start, size)) {
+		pr_warn("memory removing [mem %#010llx-%#010llx] failed, "
+			"because the memory range is online\n",
+			start, start + size);
+		ret = -EAGAIN;
+	}
+
+	unlock_memory_hotplug();
+	return ret;
 }
 EXPORT_SYMBOL_GPL(remove_memory);
 #else
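
The check performed by is_memblk_offline() boils down to "every memory block that overlaps the range must be offline". A rough user-space model of that rule, assuming a flat array in place of the kernel's section lookup (struct model_block and model_range_is_offline() are hypothetical names, not kernel API):

```c
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_ONLINE, MODEL_OFFLINE };

/* Hypothetical stand-in for struct memory_block. */
struct model_block {
	unsigned long start_section_nr;
	unsigned long end_section_nr;	/* inclusive */
	enum model_state state;
};

/*
 * True only if every block overlapping [first_section, last_section]
 * is offline. Sections covered by no block are skipped, like sections
 * that are not present in the patch.
 */
static bool model_range_is_offline(const struct model_block *blocks,
				   size_t nr_blocks,
				   unsigned long first_section,
				   unsigned long last_section)
{
	size_t i;

	for (i = 0; i < nr_blocks; i++) {
		if (blocks[i].end_section_nr < first_section ||
		    blocks[i].start_section_nr > last_section)
			continue;	/* no overlap with the range */
		if (blocks[i].state != MODEL_OFFLINE)
			return false;
	}

	return true;
}
```

In the patch, remove_memory() returns -EAGAIN when this kind of check fails, so the caller can offline the range and retry.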
[PATCH 2/10] memory-hotplug : remove /sys/firmware/memmap/X sysfs
When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type} sysfs files are created. But there is no code to remove these files. The patch implements the function to remove them. Note : The code does not free firmware_map_entry since there is no way to free memory which is allocated by bootmem. CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- drivers/firmware/memmap.c| 98 ++- include/linux/firmware-map.h |6 ++ mm/memory_hotplug.c |7 ++- 3 files changed, 108 insertions(+), 3 deletions(-) Index: linux-3.6/drivers/firmware/memmap.c === --- linux-3.6.orig/drivers/firmware/memmap.c2012-10-04 18:27:05.195500420 +0900 +++ linux-3.6/drivers/firmware/memmap.c 2012-10-04 18:27:18.901514330 +0900 @@ -21,6 +21,7 @@ #include linux/types.h #include linux/bootmem.h #include linux/slab.h +#include linux/mm.h /* * Data types -- @@ -41,6 +42,7 @@ struct firmware_map_entry { const char *type; /* type of the memory range */ struct list_headlist; /* entry for the linked list */ struct kobject kobj; /* kobject for each entry */ + unsigned intbootmem:1; /* allocated from bootmem */ }; /* @@ -79,7 +81,26 @@ static const struct sysfs_ops memmap_att .show = memmap_attr_show, }; + +static inline struct firmware_map_entry * +to_memmap_entry(struct kobject *kobj) +{ + return container_of(kobj, struct firmware_map_entry, kobj); +} + +static void release_firmware_map_entry(struct kobject *kobj) +{ + struct firmware_map_entry *entry = to_memmap_entry(kobj); + + if (entry-bootmem) + /* There is no way to free memory allocated from bootmem */ + return; + + kfree(entry); +} + static struct kobj_type memmap_ktype = { + .release= 
release_firmware_map_entry, .sysfs_ops = memmap_attr_ops, .default_attrs = def_attrs, }; @@ -94,6 +115,7 @@ static struct kobj_type memmap_ktype = { * in firmware initialisation code in one single thread of execution. */ static LIST_HEAD(map_entries); +static DEFINE_SPINLOCK(map_entries_lock); /** * firmware_map_add_entry() - Does the real work to add a firmware memmap entry. @@ -118,11 +140,25 @@ static int firmware_map_add_entry(u64 st INIT_LIST_HEAD(entry-list); kobject_init(entry-kobj, memmap_ktype); + spin_lock(map_entries_lock); list_add_tail(entry-list, map_entries); + spin_unlock(map_entries_lock); return 0; } +/** + * firmware_map_remove_entry() - Does the real work to remove a firmware + * memmap entry. + * @entry: removed entry. + **/ +static inline void firmware_map_remove_entry(struct firmware_map_entry *entry) +{ + spin_lock(map_entries_lock); + list_del(entry-list); + spin_unlock(map_entries_lock); +} + /* * Add memmap entry on sysfs */ @@ -144,6 +180,35 @@ static int add_sysfs_fw_map_entry(struct return 0; } +/* + * Remove memmap entry on sysfs + */ +static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry) +{ + kobject_put(entry-kobj); +} + +/* + * Search memmap entry + */ + +static struct firmware_map_entry * __meminit +firmware_map_find_entry(u64 start, u64 end, const char *type) +{ + struct firmware_map_entry *entry; + + spin_lock(map_entries_lock); + list_for_each_entry(entry, map_entries, list) + if ((entry-start == start) (entry-end == end) + (!strcmp(entry-type, type))) { + spin_unlock(map_entries_lock); + return entry; + } + + spin_unlock(map_entries_lock); + return NULL; +} + /** * firmware_map_add_hotplug() - Adds a firmware mapping entry when we do * memory hotplug. 
@@ -193,9 +258,36 @@ int __init firmware_map_add_early(u64 st if (WARN_ON(!entry)) return -ENOMEM; + entry-bootmem = 1; return firmware_map_add_entry(start, end, type, entry); } +/** + * firmware_map_remove() - remove a firmware mapping entry + * @start: Start of the memory range. + * @end: End of the memory range. + * @type: Type of the memory range. + * + * removes a firmware mapping entry. + * + * Returns 0 on success, or -EINVAL if no entry. + **/ +int __meminit firmware_map_remove(u64 start, u64 end, const char *type) +{ + struct firmware_map_entry *entry; + + entry
[PATCH 3/10] memory-hotplug : introduce new function arch_remove_memory() for removing page table depends on architecture
From: Wen Congyang we...@cn.fujitsu.com

For removing memory, we need to remove the page table. But how to do that
depends on the architecture. So the patch introduces arch_remove_memory()
for removing the page table. For now it only calls __remove_pages().

Note: __remove_pages() is not implemented for some architectures
      (I don't know how to implement it for s390).

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 arch/ia64/mm/init.c            |   18
 arch/powerpc/mm/mem.c          |   12
 arch/s390/mm/init.c            |   12
 arch/sh/mm/init.c              |   17
 arch/tile/mm/init.c            |    8
 arch/x86/mm/init_32.c          |   12
 arch/x86/mm/init_64.c          |   15
 include/linux/memory_hotplug.h |    1
 mm/memory_hotplug.c            |    1
 9 files changed, 96 insertions(+)

Index: linux-3.6/arch/ia64/mm/init.c
===================================================================
--- linux-3.6.orig/arch/ia64/mm/init.c	2012-10-04 18:27:03.082498276 +0900
+++ linux-3.6/arch/ia64/mm/init.c	2012-10-04 18:28:50.087606867 +0900
@@ -688,6 +688,24 @@ int arch_add_memory(int nid, u64 start,
 
 	return ret;
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	struct zone *zone;
+	int ret;
+
+	zone = page_zone(pfn_to_page(start_pfn));
+	ret = __remove_pages(zone, start_pfn, nr_pages);
+	if (ret)
+		pr_warn("%s: Problem encountered in __remove_pages() as"
+			" ret=%d\n", __func__, ret);
+
+	return ret;
+}
+#endif
 #endif
 
 /*
Index: linux-3.6/arch/powerpc/mm/mem.c
===================================================================
--- linux-3.6.orig/arch/powerpc/mm/mem.c	2012-10-04 18:27:03.084498278 +0900
+++ linux-3.6/arch/powerpc/mm/mem.c	2012-10-04 18:28:50.094606874 +0900
@@ -133,6 +133,18 @@ int arch_add_memory(int nid, u64 start,
 
 	return __add_pages(nid, zone, start_pfn, nr_pages);
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	struct zone *zone;
+
+	zone = page_zone(pfn_to_page(start_pfn));
+	return __remove_pages(zone, start_pfn, nr_pages);
+}
+#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 /*
Index: linux-3.6/arch/s390/mm/init.c
===================================================================
--- linux-3.6.orig/arch/s390/mm/init.c	2012-10-04 18:27:03.080498274 +0900
+++ linux-3.6/arch/s390/mm/init.c	2012-10-04 18:28:50.104606884 +0900
@@ -257,4 +257,16 @@ int arch_add_memory(int nid, u64 start,
 		vmem_remove_mapping(start, size);
 	return rc;
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+	/*
+	 * There is no hardware or firmware interface which could trigger a
+	 * hot memory remove on s390. So there is nothing that needs to be
+	 * implemented.
+	 */
+	return -EBUSY;
+}
+#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
Index: linux-3.6/arch/sh/mm/init.c
===================================================================
--- linux-3.6.orig/arch/sh/mm/init.c	2012-10-04 18:27:03.091498285 +0900
+++ linux-3.6/arch/sh/mm/init.c	2012-10-04 18:28:50.116606897 +0900
@@ -558,4 +558,21 @@ int memory_add_physaddr_to_nid(u64 addr)
 EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
 #endif
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	struct zone *zone;
+	int ret;
+
+	zone = page_zone(pfn_to_page(start_pfn));
+	ret = __remove_pages(zone, start_pfn, nr_pages);
+	if (unlikely(ret))
+		pr_warn("%s: Failed, __remove_pages() == %d\n", __func__,
+			ret);
+
+	return ret;
+}
+#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
Index: linux-3.6/arch/tile/mm/init.c
===================================================================
--- linux-3.6.orig/arch/tile/mm/init.c	2012-10-04 18:27:03.078498272 +0900
+++ linux-3.6/arch/tile/mm/init.c	2012-10-04 18:28:50.122606903 +0900
@@ -935,6 +935,14 @@ int remove_memory(u64 start, u64 size)
 {
 	return -EINVAL;
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64
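Most of the arch_remove_memory() variants in this patch begin by turning a byte range into a page-frame range with two right shifts. A minimal userspace sketch of that arithmetic follows. It assumes 4 KiB pages (the kernel's PAGE_SHIFT is per architecture and configuration), and `struct pfn_range` and `bytes_to_pfn_range()` are hypothetical names used only for illustration, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel's PAGE_SHIFT depends on the arch. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical result container; not a kernel type. */
struct pfn_range {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* Convert a page-aligned physical byte range into a page-frame range. */
static struct pfn_range bytes_to_pfn_range(uint64_t start, uint64_t size)
{
	const uint64_t page_mask = (1ULL << SKETCH_PAGE_SHIFT) - 1;
	struct pfn_range r;

	/* This sketch only handles page-aligned input. */
	assert((start & page_mask) == 0);
	assert((size & page_mask) == 0);

	r.start_pfn = (unsigned long)(start >> SKETCH_PAGE_SHIFT);
	r.nr_pages = (unsigned long)(size >> SKETCH_PAGE_SHIFT);
	return r;
}
```

For example, a 512 MiB range starting at 1 GiB maps to start_pfn 0x40000 and nr_pages 0x20000 under the 4 KiB assumption.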
[PATCH 4/10] memory-hotplug : unregister memory section on SPARSEMEM_VMEMMAP
Currently __remove_section() for SPARSEMEM_VMEMMAP does nothing. But even if
we use SPARSEMEM_VMEMMAP, we can unregister the memory_section.

So the patch adds unregister_memory_section() to __remove_section().

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 mm/memory_hotplug.c |   13
 1 file changed, 8 insertions(+), 5 deletions(-)

Index: linux-3.6/mm/memory_hotplug.c
===================================================================
--- linux-3.6.orig/mm/memory_hotplug.c	2012-10-04 18:29:50.577668254 +0900
+++ linux-3.6/mm/memory_hotplug.c	2012-10-04 18:29:58.284676075 +0900
@@ -279,11 +279,14 @@ static int __meminit __add_section(int n
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 static int __remove_section(struct zone *zone, struct mem_section *ms)
 {
-	/*
-	 * XXX: Freeing memmap with vmemmap is not implement yet.
-	 *      This should be removed later.
-	 */
-	return -EBUSY;
+	int ret = -EINVAL;
+
+	if (!valid_section(ms))
+		return ret;
+
+	ret = unregister_memory_section(ms);
+
+	return ret;
 }
 #else
 static int __remove_section(struct zone *zone, struct mem_section *ms)

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 5/10] memory-hotplug : memory-hotplug: check page type in get_page_bootmem
get_page_bootmem() may be called more than once for the same page. There is
no need to set the page's type and private fields again if this is not the
first call for that page.

Note: the patch is just an optimization and does not fix any problem.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 mm/memory_hotplug.c |   15
 1 file changed, 11 insertions(+), 4 deletions(-)

Index: linux-3.6/mm/memory_hotplug.c
===================================================================
--- linux-3.6.orig/mm/memory_hotplug.c	2012-10-04 18:29:58.284676075 +0900
+++ linux-3.6/mm/memory_hotplug.c	2012-10-04 18:30:03.454680542 +0900
@@ -95,10 +95,17 @@ static void release_memory_resource(stru
 static void get_page_bootmem(unsigned long info,  struct page *page,
 			     unsigned long type)
 {
-	page->lru.next = (struct list_head *) type;
-	SetPagePrivate(page);
-	set_page_private(page, info);
-	atomic_inc(&page->_count);
+	unsigned long page_type;
+
+	page_type = (unsigned long)page->lru.next;
+	if (page_type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
+	    page_type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
+		page->lru.next = (struct list_head *)type;
+		SetPagePrivate(page);
+		set_page_private(page, info);
+		atomic_inc(&page->_count);
+	} else
+		atomic_inc(&page->_count);
 }
 
 /* reference to __meminit __free_pages_bootmem is valid
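The "first call sets the tag, every call takes a reference" behaviour described in this patch can be modelled in plain userspace C. This is only a sketch under stated assumptions: `struct sketch_page` is a hypothetical stand-in for struct page, and the numeric range chosen for SKETCH_MIN_TYPE/SKETCH_MAX_TYPE is illustrative, not the kernel's actual MEMORY_HOTPLUG_*_BOOTMEM_TYPE values.

```c
#include <assert.h>

/* Assumption: illustrative tag range; the kernel's real values may differ. */
#define SKETCH_MIN_TYPE 12UL
#define SKETCH_MAX_TYPE 15UL

/* Hypothetical stand-in for the few struct page fields involved. */
struct sketch_page {
	unsigned long type;		/* models page->lru.next used as a tag */
	unsigned long private_info;	/* models page_private(page) */
	int count;			/* models page->_count */
};

static void sketch_get_page_bootmem(unsigned long info,
				    struct sketch_page *page,
				    unsigned long type)
{
	assert(type >= SKETCH_MIN_TYPE && type <= SKETCH_MAX_TYPE);

	/* Tag outside the valid range means the page is not registered yet. */
	if (page->type < SKETCH_MIN_TYPE || page->type > SKETCH_MAX_TYPE) {
		page->type = type;
		page->private_info = info;
	}

	/* Every call, first or repeated, takes one reference. */
	page->count++;
}
```

Calling the helper twice on the same page leaves the tag and info from the first call in place and raises the count to 2.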
[PATCH 6/10] memory-hotplug : implement register_page_bootmem_info_section of sparse-vmemmap
To remove a sparse-vmemmap memmap region that was allocated from bootmem,
the region needs to be registered by get_page_bootmem(). So the patch
searches the pages of the virtual mapping and registers them with
get_page_bootmem().

Note: register_page_bootmem_memmap() is not implemented for ia64, ppc, s390,
and sparc.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 arch/ia64/mm/discontig.c       |    6
 arch/powerpc/mm/init_64.c      |    6
 arch/s390/mm/vmem.c            |    6
 arch/sparc/mm/init_64.c        |    6
 arch/x86/mm/init_64.c          |   52
 include/linux/memory_hotplug.h |   11
 include/linux/mm.h             |    3
 mm/memory_hotplug.c            |   37
 8 files changed, 113 insertions(+), 14 deletions(-)

Index: linux-3.6/include/linux/memory_hotplug.h
===================================================================
--- linux-3.6.orig/include/linux/memory_hotplug.h	2012-10-04 17:15:03.029828127 +0900
+++ linux-3.6/include/linux/memory_hotplug.h	2012-10-04 17:15:59.010833688 +0900
@@ -163,17 +163,10 @@ static inline void arch_refresh_nodedata
 #endif /* CONFIG_NUMA */
 #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
-{
-}
-static inline void put_page_bootmem(struct page *page)
-{
-}
-#else
 extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
 extern void put_page_bootmem(struct page *page);
-#endif
+extern void get_page_bootmem(unsigned long ingo, struct page *page,
+			     unsigned long type);
 
 /*
  * Lock for memory hotplug guarantees 1) all callbacks for memory hotplug
Index: linux-3.6/mm/memory_hotplug.c
===================================================================
--- linux-3.6.orig/mm/memory_hotplug.c	2012-10-04 17:15:27.213831361 +0900
+++ linux-3.6/mm/memory_hotplug.c	2012-10-04 17:37:00.176401540 +0900
@@ -91,9 +91,8 @@ static void release_memory_resource(stru
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
-#ifndef CONFIG_SPARSEMEM_VMEMMAP
-static void get_page_bootmem(unsigned long info,  struct page *page,
-			     unsigned long type)
+void get_page_bootmem(unsigned long info,  struct page *page,
+		      unsigned long type)
 {
 	unsigned long page_type;
 
@@ -127,6 +126,7 @@ void __ref put_page_bootmem(struct page
 
 }
 
+#ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
 	unsigned long *usemap, mapsize, section_nr, i;
@@ -160,6 +160,36 @@ static void register_page_bootmem_info_s
 		get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
 
 }
+#else
+static void register_page_bootmem_info_section(unsigned long start_pfn)
+{
+	unsigned long *usemap, mapsize, section_nr, i;
+	struct mem_section *ms;
+	struct page *page, *memmap;
+
+	if (!pfn_valid(start_pfn))
+		return;
+
+	section_nr = pfn_to_section_nr(start_pfn);
+	ms = __nr_to_section(section_nr);
+
+	memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+
+	page = virt_to_page(memmap);
+	mapsize = sizeof(struct page) * PAGES_PER_SECTION;
+	mapsize = PAGE_ALIGN(mapsize) >> PAGE_SHIFT;
+
+	register_page_bootmem_memmap(section_nr, memmap, PAGES_PER_SECTION);
+
+	usemap = __nr_to_section(section_nr)->pageblock_flags;
+	page = virt_to_page(usemap);
+
+	mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT;
+
+	for (i = 0; i < mapsize; i++, page++)
+		get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
+}
+#endif
 
 void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
@@ -202,7 +232,6 @@ void register_page_bootmem_info_node(str
 			register_page_bootmem_info_section(pfn);
 	}
 }
-#endif /* !CONFIG_SPARSEMEM_VMEMMAP */
 
 static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
 			   unsigned long end_pfn)
Index: linux-3.6/arch/ia64/mm/discontig.c
===================================================================
--- linux-3.6.orig/arch/ia64/mm/discontig.c	2012-10-01 08:47:46.0 +0900
+++ linux-3.6/arch/ia64/mm/discontig.c	2012-10-04 17:15:59.209833459 +0900
@@ -822,4 +822,10 @@ int __meminit vmemmap_populate(struct pa
 {
 	return vmemmap_populate_basepages(start_page, size, node);
 }
+
+void register_page_bootmem_memmap(unsigned long section_nr
[PATCH 7/10] memory-hotplug : remove memmap of sparse-vmemmap
Not all pages of the virtual mapping in removed memory can be freed, since
some pages used as PGD/PUD cover not only the removed memory but also other
memory. So the patch checks whether a page can be freed or not.

How to check whether a page can be freed or not?
 1. When removing memory, the page structs of the removed memory are filled
    with 0xFD.
 2. If all page structs on a PT/PMD are filled with 0xFD, the PT/PMD can be
    cleared. In this case, the page used as PT/PMD can be freed.

With this patch applied, the two versions of __remove_section() are
integrated into one, so the CONFIG_SPARSEMEM_VMEMMAP version of
__remove_section() is deleted.

Note: vmemmap_kfree() and vmemmap_free_bootmem() are not implemented for
ia64, ppc, s390, and sparc.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 arch/ia64/mm/discontig.c  |    8
 arch/powerpc/mm/init_64.c |    8
 arch/s390/mm/vmem.c       |    8
 arch/sparc/mm/init_64.c   |    8
 arch/x86/mm/init_64.c     |  119
 include/linux/mm.h        |    2
 mm/memory_hotplug.c       |   17
 mm/sparse.c               |    5
 8 files changed, 158 insertions(+), 17 deletions(-)

Index: linux-3.6/arch/ia64/mm/discontig.c
===================================================================
--- linux-3.6.orig/arch/ia64/mm/discontig.c	2012-10-04 18:30:15.475692638 +0900
+++ linux-3.6/arch/ia64/mm/discontig.c	2012-10-04 18:30:21.145698389 +0900
@@ -823,6 +823,14 @@ int __meminit vmemmap_populate(struct pa
 	return vmemmap_populate_basepages(start_page, size, node);
 }
 
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
 void register_page_bootmem_memmap(unsigned long section_nr,
 				  struct page *start_page, unsigned long size)
 {
Index: linux-3.6/arch/powerpc/mm/init_64.c
===================================================================
--- linux-3.6.orig/arch/powerpc/mm/init_64.c	2012-10-04 18:30:15.494692657 +0900
+++ linux-3.6/arch/powerpc/mm/init_64.c	2012-10-04 18:30:21.150698394 +0900
@@ -299,6 +299,14 @@ int __meminit vmemmap_populate(struct pa
 	return 0;
 }
 
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
 void register_page_bootmem_memmap(unsigned long section_nr,
 				  struct page *start_page, unsigned long size)
 {
Index: linux-3.6/arch/s390/mm/vmem.c
===================================================================
--- linux-3.6.orig/arch/s390/mm/vmem.c	2012-10-04 18:30:15.506692670 +0900
+++ linux-3.6/arch/s390/mm/vmem.c	2012-10-04 18:30:21.157698401 +0900
@@ -227,6 +227,14 @@ out:
 	return ret;
 }
 
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
 void register_page_bootmem_memmap(unsigned long section_nr,
 				  struct page *start_page, unsigned long size)
 {
Index: linux-3.6/arch/sparc/mm/init_64.c
===================================================================
--- linux-3.6.orig/arch/sparc/mm/init_64.c	2012-10-04 18:30:15.512692676 +0900
+++ linux-3.6/arch/sparc/mm/init_64.c	2012-10-04 18:30:21.163698408 +0900
@@ -2078,6 +2078,14 @@ void __meminit vmemmap_populate_print_la
 	}
 }
 
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
 void register_page_bootmem_memmap(unsigned long section_nr,
 				  struct page *start_page, unsigned long size)
 {
Index: linux-3.6/arch/x86/mm/init_64.c
===================================================================
--- linux-3.6.orig/arch/x86/mm/init_64.c	2012-10-04 18:30:15.517692681 +0900
+++ linux-3.6/arch/x86/mm/init_64.c	2012-10-04 18:30:21.171698416 +0900
@@ -993,6 +993,125 @@ vmemmap_populate(struct page *start_page
 	return 0;
 }
 
+#define PAGE_INUSE 0xFD
+
+unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
+			    struct page **pp, int *page_size)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	void *page_addr;
+	unsigned long next;
+
+	*pp = NULL;
+
+	pgd = pgd_offset_k(addr);
+	if (pgd_none(*pgd))
+		return pgd_addr_end(addr, end);
+
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud))
+		return
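The 0xFD marking scheme described in this patch can be illustrated with a small userspace model: the part of a backing page that held removed page structs is filled with the marker byte, and the page is considered freeable only when every byte carries the marker. The helper names below are hypothetical and the buffer is an ordinary byte array, so this is a sketch of the idea rather than the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same marker value as PAGE_INUSE in the patch. */
#define SKETCH_PAGE_INUSE 0xFD

/* Mark a removed region inside a backing page (hypothetical helper). */
static void sketch_mark_removed(unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	memset(buf, SKETCH_PAGE_INUSE, len);
}

/* The backing page can be freed only if every byte carries the marker. */
static bool sketch_page_can_be_freed(const unsigned char *page, size_t len)
{
	size_t i;

	assert(page != NULL);
	for (i = 0; i < len; i++) {
		if (page[i] != SKETCH_PAGE_INUSE)
			return false;
	}
	return true;
}
```

A page whose first half is marked is not yet freeable; once the second half is marked as well, the check succeeds.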
[PATCH 8/10] memory-hotplug : remove page table of x86_64 architecture
From: Wen Congyang we...@cn.fujitsu.com

When hot-removing memory, we should remove the page table entries for that
memory. So the patch searches the page table for the removed memory and
clears those entries.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 arch/x86/include/asm/pgtable_types.h |    1
 arch/x86/mm/init_64.c                |  147
 arch/x86/mm/pageattr.c               |   47
 3 files changed, 173 insertions(+), 22 deletions(-)

Index: linux-3.6/arch/x86/mm/init_64.c
===================================================================
--- linux-3.6.orig/arch/x86/mm/init_64.c	2012-10-04 18:30:21.171698416 +0900
+++ linux-3.6/arch/x86/mm/init_64.c	2012-10-04 18:30:27.317704652 +0900
@@ -675,6 +675,151 @@ int arch_add_memory(int nid, u64 start,
 }
 EXPORT_SYMBOL_GPL(arch_add_memory);
 
+static void __meminit
+phys_pte_remove(pte_t *pte_page, unsigned long addr, unsigned long end)
+{
+	unsigned pages = 0;
+	int i = pte_index(addr);
+
+	pte_t *pte = pte_page + pte_index(addr);
+
+	for (; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE, pte++) {
+
+		if (addr >= end)
+			break;
+
+		if (!pte_present(*pte))
+			continue;
+
+		pages++;
+		set_pte(pte, __pte(0));
+	}
+
+	update_page_count(PG_LEVEL_4K, -pages);
+}
+
+static void __meminit
+phys_pmd_remove(pmd_t *pmd_page, unsigned long addr, unsigned long end)
+{
+	unsigned long pages = 0, next;
+	int i = pmd_index(addr);
+
+	for (; i < PTRS_PER_PMD; i++, addr = next) {
+		unsigned long pte_phys;
+		pmd_t *pmd = pmd_page + pmd_index(addr);
+		pte_t *pte;
+
+		if (addr >= end)
+			break;
+
+		next = (addr & PMD_MASK) + PMD_SIZE;
+
+		if (!pmd_present(*pmd))
+			continue;
+
+		if (pmd_large(*pmd)) {
+			if ((addr & ~PMD_MASK) == 0 && next <= end) {
+				set_pmd(pmd, __pmd(0));
+				pages++;
+				continue;
+			}
+
+			/*
+			 * We use 2M page, but we need to remove part of them,
+			 * so split 2M page to 4K page.
+			 */
+			pte = alloc_low_page(&pte_phys);
+			__split_large_page((pte_t *)pmd, addr, pte);
+
+			spin_lock(&init_mm.page_table_lock);
+			pmd_populate_kernel(&init_mm, pmd, __va(pte_phys));
+			spin_unlock(&init_mm.page_table_lock);
+		}
+
+		spin_lock(&init_mm.page_table_lock);
+		pte = map_low_page((pte_t *)pmd_page_vaddr(*pmd));
+		phys_pte_remove(pte, addr, end);
+		unmap_low_page(pte);
+		spin_unlock(&init_mm.page_table_lock);
+	}
+	update_page_count(PG_LEVEL_2M, -pages);
+}
+
+static void __meminit
+phys_pud_remove(pud_t *pud_page, unsigned long addr, unsigned long end)
+{
+	unsigned long pages = 0, next;
+	int i = pud_index(addr);
+
+	for (; i < PTRS_PER_PUD; i++, addr = next) {
+		unsigned long pmd_phys;
+		pud_t *pud = pud_page + pud_index(addr);
+		pmd_t *pmd;
+
+		if (addr >= end)
+			break;
+
+		next = (addr & PUD_MASK) + PUD_SIZE;
+
+		if (!pud_present(*pud))
+			continue;
+
+		if (pud_large(*pud)) {
+			if ((addr & ~PUD_MASK) == 0 && next <= end) {
+				set_pud(pud, __pud(0));
+				pages++;
+				continue;
+			}
+
+			/*
+			 * We use 1G page, but we need to remove part of them,
+			 * so split 1G page to 2M page.
+			 */
+			pmd = alloc_low_page(&pmd_phys);
+			__split_large_page((pte_t *)pud, addr, (pte_t *)pmd);
+
+			spin_lock(&init_mm.page_table_lock);
+			pud_populate(&init_mm, pud, __va(pmd_phys));
+			spin_unlock(&init_mm.page_table_lock);
+		}
+
+		pmd = map_low_page(pmd_offset(pud, 0));
+		phys_pmd_remove(pmd, addr, end);
+		unmap_low_page(pmd);
+		__flush_tlb_all();
+	}
+	__flush_tlb_all
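The decision the patch makes for a large (2M or 1G) mapping is whether the range being removed covers the whole entry, in which case the entry is simply cleared, or only part of it, in which case the entry is split first. A minimal userspace sketch of that test follows; the enum and function names are hypothetical, and `entry_size` stands in for PMD_SIZE or PUD_SIZE.

```c
#include <assert.h>

/* Hypothetical outcome labels for a large page-table entry. */
enum sketch_action {
	SKETCH_CLEAR_WHOLE_ENTRY,	/* range covers the entire large page */
	SKETCH_SPLIT_THEN_CLEAR		/* range covers only part of it */
};

/*
 * entry_size plays the role of PMD_SIZE or PUD_SIZE and must be a power
 * of two. Address overflow near the top of the address space is ignored
 * in this sketch.
 */
static enum sketch_action sketch_large_entry_action(unsigned long addr,
						    unsigned long end,
						    unsigned long entry_size)
{
	unsigned long mask, next;

	assert(entry_size != 0 && (entry_size & (entry_size - 1)) == 0);

	mask = ~(entry_size - 1);		/* like PMD_MASK */
	next = (addr & mask) + entry_size;	/* start of the next entry */

	if ((addr & ~mask) == 0 && next <= end)
		return SKETCH_CLEAR_WHOLE_ENTRY;

	return SKETCH_SPLIT_THEN_CLEAR;
}
```

With a 2 MiB entry size, removing [0x200000, 0x400000) clears the whole entry, while a range that starts at 0x201000 or ends at 0x300000 requires a split.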
[PATCH 9/10] memory-hotplug : memory_hotplug: clear zone when removing the memory
When memory is added, we update the zone's and pgdat's start_pfn and
spanned_pages in the function __add_zone(). So we should revert them when
the memory is removed.

The patch adds a new function __remove_zone() to do this.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 mm/memory_hotplug.c |  207
 1 file changed, 207 insertions(+)

Index: linux-3.6/mm/memory_hotplug.c
===================================================================
--- linux-3.6.orig/mm/memory_hotplug.c	2012-10-04 18:30:21.182698427 +0900
+++ linux-3.6/mm/memory_hotplug.c	2012-10-04 18:30:31.767709165 +0900
@@ -312,10 +312,213 @@ static int __meminit __add_section(int n
 	return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
 }
 
+/* find the smallest valid pfn in the range [start_pfn, end_pfn) */
+static int find_smallest_section_pfn(int nid, struct zone *zone,
+				     unsigned long start_pfn,
+				     unsigned long end_pfn)
+{
+	struct mem_section *ms;
+
+	for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) {
+		ms = __pfn_to_section(start_pfn);
+
+		if (unlikely(!valid_section(ms)))
+			continue;
+
+		if (unlikely(pfn_to_nid(start_pfn)) != nid)
+			continue;
+
+		if (zone && zone != page_zone(pfn_to_page(start_pfn)))
+			continue;
+
+		return start_pfn;
+	}
+
+	return 0;
+}
+
+/* find the biggest valid pfn in the range [start_pfn, end_pfn). */
+static int find_biggest_section_pfn(int nid, struct zone *zone,
+				    unsigned long start_pfn,
+				    unsigned long end_pfn)
+{
+	struct mem_section *ms;
+	unsigned long pfn;
+
+	/* pfn is the end pfn of a memory section. */
+	pfn = end_pfn - 1;
+	for (; pfn >= start_pfn; pfn -= PAGES_PER_SECTION) {
+		ms = __pfn_to_section(pfn);
+
+		if (unlikely(!valid_section(ms)))
+			continue;
+
+		if (unlikely(pfn_to_nid(pfn)) != nid)
+			continue;
+
+		if (zone && zone != page_zone(pfn_to_page(pfn)))
+			continue;
+
+		return pfn;
+	}
+
+	return 0;
+}
+
+static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
+			     unsigned long end_pfn)
+{
+	unsigned long zone_start_pfn = zone->zone_start_pfn;
+	unsigned long zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
+	unsigned long pfn;
+	struct mem_section *ms;
+	int nid = zone_to_nid(zone);
+
+	zone_span_writelock(zone);
+	if (zone_start_pfn == start_pfn) {
+		/*
+		 * If the section is smallest section in the zone, it need
+		 * shrink zone->zone_start_pfn and zone->zone_spanned_pages.
+		 * In this case, we find second smallest valid mem_section
+		 * for shrinking zone.
+		 */
+		pfn = find_smallest_section_pfn(nid, zone, end_pfn,
+						zone_end_pfn);
+		if (pfn) {
+			zone->zone_start_pfn = pfn;
+			zone->spanned_pages = zone_end_pfn - pfn;
+		}
+	} else if (zone_end_pfn == end_pfn) {
+		/*
+		 * If the section is biggest section in the zone, it need
+		 * shrink zone->spanned_pages.
+		 * In this case, we find second biggest valid mem_section for
+		 * shrinking zone.
+		 */
+		pfn = find_biggest_section_pfn(nid, zone, zone_start_pfn,
+					       start_pfn);
+		if (pfn)
+			zone->spanned_pages = pfn - zone_start_pfn + 1;
+	}
+
+	/*
+	 * The section is not biggest or smallest mem_section in the zone, it
+	 * only creates a hole in the zone. So in this case, we need not
+	 * change the zone. But perhaps, the zone has only hole data. Thus
+	 * it check the zone has only hole or not.
+	 */
+	pfn = zone_start_pfn;
+	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) {
+		ms = __pfn_to_section(pfn);
+
+		if (unlikely(!valid_section(ms)))
+			continue;
+
+		if (page_zone(pfn_to_page(pfn)) != zone)
+			continue;
+
+		/* If the section
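The span-shrinking idea in this patch (trim the start or the end of the zone to the nearest remaining valid section, leave interior holes alone, and collapse the span if nothing is left) can be modelled on a plain array of flags. The sketch below is a simplification under stated assumptions: it works in units of sections rather than pfns, it always re-trims both ends instead of only the end that was touched, and `struct sketch_span` and `sketch_shrink_span()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical span, counted in sections instead of pfns. */
struct sketch_span {
	size_t start;	/* index of the first section in the span */
	size_t count;	/* number of sections spanned, holes included */
};

/*
 * Trim the span so that it starts and ends on a present section.
 * present[] must cover at least start + count entries. Interior holes
 * are left in place; an all-hole span collapses to {0, 0}.
 */
static void sketch_shrink_span(struct sketch_span *span, const bool *present)
{
	size_t lo, hi;

	assert(span != NULL && present != NULL);

	lo = span->start;
	hi = span->start + span->count;	/* exclusive end */

	while (lo < hi && !present[lo])
		lo++;
	while (hi > lo && !present[hi - 1])
		hi--;

	span->start = (lo < hi) ? lo : 0;
	span->count = hi - lo;
}
```

For flags {0, 1, 0, 1, 0, 0} and an initial span of six sections, the result starts at section 1 and spans three sections, keeping the hole at index 2.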
[PATCH 10/10] memory-hotplug : remove sysfs file of node
From: Wen Congyang we...@cn.fujitsu.com

This patch introduces a new function try_offline_node() to remove the sysfs
files of a node when all memory sections of this node are removed. If some
memory sections of this node are not removed, this function does nothing.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 mm/memory_hotplug.c |   54
 1 file changed, 54 insertions(+)

Index: linux-3.6/mm/memory_hotplug.c
===================================================================
--- linux-3.6.orig/mm/memory_hotplug.c	2012-10-04 18:30:31.767709165 +0900
+++ linux-3.6/mm/memory_hotplug.c	2012-10-04 18:32:46.907842637 +0900
@@ -29,6 +29,7 @@
 #include <linux/suspend.h>
 #include <linux/mm_inline.h>
 #include <linux/firmware-map.h>
+#include <linux/stop_machine.h>
 
 #include <asm/tlbflush.h>
 
@@ -1276,6 +1277,57 @@ int offline_memory(u64 start, u64 size)
 	return 0;
 }
 
+static int check_cpu_on_node(void *data)
+{
+	struct pglist_data *pgdat = data;
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		if (cpu_to_node(cpu) == pgdat->node_id)
+			/*
+			 * the cpu on this node is onlined, and we can't
+			 * offline this node.
+			 */
+			return -EBUSY;
+	}
+
+	return 0;
+}
+
+/* offline the node if all memory sections of this node are removed */
+static void try_offline_node(int nid)
+{
+	unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
+	unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+		unsigned long section_nr = pfn_to_section_nr(pfn);
+
+		if (!present_section_nr(section_nr))
+			continue;
+
+		if (pfn_to_nid(pfn) != nid)
+			continue;
+
+		/*
+		 * some memory sections of this node are not removed, and we
+		 * can't offline node now.
+		 */
+		return;
+	}
+
+	if (stop_machine(check_cpu_on_node, NODE_DATA(nid), NULL))
+		return;
+
+	/*
+	 * all memory sections of this node are removed, we can offline this
+	 * node now.
+	 */
+	node_set_offline(nid);
+	unregister_one_node(nid);
+}
+
 int __ref remove_memory(int nid, u64 start, u64 size)
 {
 	int ret = 0;
@@ -1296,6 +1348,8 @@ int __ref remove_memory(int nid, u64 sta
 	firmware_map_remove(start, start + size, "System RAM");
 
 	arch_remove_memory(start, size);
+
+	try_offline_node(nid);
 out:
 	unlock_memory_hotplug();
 	return ret;
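The two conditions try_offline_node() checks in this patch (no present memory section still belongs to the node, and no online CPU sits on the node) can be expressed as a small predicate. The following userspace sketch is only a model: `struct sketch_section` and the flat arrays are hypothetical substitutes for the kernel's section and CPU iterators, and stop_machine() has no counterpart here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of one memory section; not a kernel type. */
struct sketch_section {
	bool present;	/* section still has memory registered */
	int nid;	/* node the section belongs to */
};

/*
 * Return true only if no present section and no online CPU belongs to
 * node nid. online_cpu_nid[i] is the node of the i-th online CPU.
 */
static bool sketch_node_can_go_offline(int nid,
				       const struct sketch_section *sections,
				       size_t nr_sections,
				       const int *online_cpu_nid,
				       size_t nr_online_cpus)
{
	size_t i;

	assert(sections != NULL || nr_sections == 0);
	assert(online_cpu_nid != NULL || nr_online_cpus == 0);

	/* Any remaining section on this node blocks offlining. */
	for (i = 0; i < nr_sections; i++) {
		if (sections[i].present && sections[i].nid == nid)
			return false;
	}

	/* Any online CPU on this node blocks offlining as well. */
	for (i = 0; i < nr_online_cpus; i++) {
		if (online_cpu_nid[i] == nid)
			return false;
	}

	return true;
}
```

The section check runs first, mirroring the order in the patch, so the CPU scan is skipped when memory is still present on the node.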
memory-hotplug : suppress "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>" warning
When our x86 box calls __remove_pages(), release_mem_region() shows many
warnings, and the x86 box cannot unregister iomem_resource.

  "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>"

Since patch de7f0cba96786c was applied, release_mem_region() has been called
for each PAGES_PER_SECTION chunk, because powerpc registers iomem_resource
in PAGES_PER_SECTION chunks. But when I hot-add memory on an x86 box,
iomem_resource is registered per _CRS, not per PAGES_PER_SECTION chunk. So
the x86 box fails to unregister iomem_resource.

The patch fixes the problem.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   13
 mm/memory_hotplug.c                             |    4
 2 files changed, 11 insertions(+), 6 deletions(-)

Index: linux-3.6/arch/powerpc/platforms/pseries/hotplug-memory.c
===================================================================
--- linux-3.6.orig/arch/powerpc/platforms/pseries/hotplug-memory.c	2012-10-04 14:22:59.833520792 +0900
+++ linux-3.6/arch/powerpc/platforms/pseries/hotplug-memory.c	2012-10-04 14:23:05.150521411 +0900
@@ -77,7 +77,8 @@ static int pseries_remove_memblock(unsig
 {
 	unsigned long start, start_pfn;
 	struct zone *zone;
-	int ret;
+	int i, ret;
+	int sections_to_remove;
 
 	start_pfn = base >> PAGE_SHIFT;
 
@@ -97,9 +98,13 @@ static int pseries_remove_memblock(unsig
 	 * to sysfs state file and we can't remove sysfs entries
 	 * while writing to it. So we have to defer it to here.
 	 */
-	ret = __remove_pages(zone, start_pfn, memblock_size >> PAGE_SHIFT);
-	if (ret)
-		return ret;
+	sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
+	for (i = 0; i < sections_to_remove; i++) {
+		unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
+		ret = __remove_pages(zone, start_pfn,  PAGES_PER_SECTION);
+		if (ret)
+			return ret;
+	}
 
 	/*
 	 * Update memory regions for memory remove
Index: linux-3.6/mm/memory_hotplug.c
===================================================================
--- linux-3.6.orig/mm/memory_hotplug.c	2012-10-04 14:22:59.829520788 +0900
+++ linux-3.6/mm/memory_hotplug.c	2012-10-04 14:23:25.860527278 +0900
@@ -362,11 +362,11 @@ int __remove_pages(struct zone *zone, un
 	BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
 	BUG_ON(nr_pages % PAGES_PER_SECTION);
 
+	release_mem_region(phys_start_pfn << PAGE_SHIFT, nr_pages * PAGE_SIZE);
+
 	sections_to_remove = nr_pages / PAGES_PER_SECTION;
 	for (i = 0; i < sections_to_remove; i++) {
 		unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
-		release_mem_region(pfn << PAGE_SHIFT,
-				   PAGES_PER_SECTION << PAGE_SHIFT);
 		ret = __remove_section(zone, __pfn_to_section(pfn));
 		if (ret)
 			break;
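Both hunks in this patch revolve around walking a pfn range one PAGES_PER_SECTION chunk at a time. As a rough illustration of that iteration, the userspace sketch below computes the starting pfn of each chunk. It is not the patch's code: `sketch_section_chunks()` is a hypothetical helper, the section size is passed in as a parameter, and each chunk start is derived from the loop index.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Write the starting pfn of each section-sized chunk of the range
 * [start_pfn, start_pfn + nr_pages) into out[], up to out_cap entries,
 * and return the total number of chunks. nr_pages must be a multiple
 * of pages_per_section, matching the BUG_ON() check in __remove_pages().
 */
static size_t sketch_section_chunks(unsigned long start_pfn,
				    unsigned long nr_pages,
				    unsigned long pages_per_section,
				    unsigned long *out, size_t out_cap)
{
	size_t i, n;

	assert(pages_per_section != 0);
	assert(nr_pages % pages_per_section == 0);

	n = nr_pages / pages_per_section;
	for (i = 0; i < n && i < out_cap; i++)
		out[i] = start_pfn + i * pages_per_section;

	return n;
}
```

With a section size of 0x8000 pages, a range of 0x18000 pages starting at pfn 0x8000 yields three chunks starting at 0x8000, 0x10000 and 0x18000.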
Re: [RFC v9 PATCH 03/21] memory-hotplug: store the node id in acpi_memory_device
Hi Chen,

2012/09/28 12:21, Ni zhan Chen wrote:
> On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:
>> From: Wen Congyang we...@cn.fujitsu.com
>>
>> The memory device has only one node id. Store the node id when
>> enabling the memory device, and we can reuse it when removing the
>> memory device.
>
> one question:
> if NUMA emulation is used, will the memory device be associated with
> one node or ...?

A memory device has only one node, even if you use NUMA emulation.

Thanks,
Yasuaki Ishimatsu

>> CC: David Rientjes rient...@google.com
>> CC: Jiang Liu liu...@gmail.com
>> CC: Len Brown len.br...@intel.com
>> CC: Benjamin Herrenschmidt b...@kernel.crashing.org
>> CC: Paul Mackerras pau...@samba.org
>> CC: Christoph Lameter c...@linux.com
>> Cc: Minchan Kim minchan@gmail.com
>> CC: Andrew Morton a...@linux-foundation.org
>> CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
>> CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
>> Signed-off-by: Wen Congyang we...@cn.fujitsu.com
>> Reviewed-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
>> ---
>>  drivers/acpi/acpi_memhotplug.c |    4
>>  1 files changed, 4 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
>> index 2a7beac..7873832 100644
>> --- a/drivers/acpi/acpi_memhotplug.c
>> +++ b/drivers/acpi/acpi_memhotplug.c
>> @@ -83,6 +83,7 @@ struct acpi_memory_info {
>>  struct acpi_memory_device {
>>      struct acpi_device * device;
>>      unsigned int state;    /* State of the memory device */
>> +    int nid;
>>      struct list_head res_list;
>>  };
>>
>> @@ -256,6 +257,9 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>>          info->enabled = 1;
>>          num_enabled++;
>>      }
>> +
>> +    mem_device->nid = node;
>> +
>>      if (!num_enabled) {
>>          printk(KERN_ERR PREFIX "add_memory failed\n");
>>          mem_device->state = MEMORY_INVALID_STATE;
Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
Hi Chen, 2012/10/02 8:45, Ni zhan Chen wrote: On 10/01/2012 12:44 PM, Yasuaki Ishimatsu wrote: Hi Chen, 2012/09/29 17:19, Ni zhan Chen wrote: On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote: From: Wen Congyang we...@cn.fujitsu.com This patch series aims to support physical memory hot-remove. The patches can free/remove the following things: - acpi_memory_info : [RFC PATCH 4/19] - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19] - iomem_resource: [RFC PATCH 9/19] - mem_section and related sysfs files : [RFC PATCH 10-11, 13-16/19] - page table of removed memory : [RFC PATCH 12/19] - node and related sysfs files : [RFC PATCH 18-19/19] If you find lack of function for physical memory hot-remove, please let me know. How to test this patchset? 1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE, ACPI_HOTPLUG_MEMORY must be selected. 2. load the module acpi_memhotplug Hi Yasuaki, where is the acpi_memhotplug module? If you build acpi_memhotplug as module, it is created under /lib/modules/kernel-version/driver/acpi/ directory. It depends on config ACPI_HOTPLUG_MEMORY. The confing is [*], it becomes built-in function. So you don't need to care about it. Thanks, Yasuaki Ishimatsu Hi Yasuaki, I build the kernel, MEMORY_HOTPLUG, MEMORY_HOTREMOVE, ACPI_HOTPLUG_MEMORY are seleted as [*], but I can't find PNP0C80:XX under the directory /sys/bus/acpi/devices/. 
[root@localhost ~]# ls /sys/bus/acpi/devices/ device:00 device:07 device:0e device:15 device:1c device:23 device:2a LNXCPU:00 LNXCPU:07PNP0501:00 PNP0C02:00 PNP0C0F:02 PNP0C14:01 device:01 device:08 device:0f device:16 device:1d device:24 device:2b LNXCPU:01 LNXPWRBN:00 PNP0800:00 PNP0C02:01 PNP0C0F:03 PNP0C31:00 device:02 device:09 device:10 device:17 device:1e device:25 device:2c LNXCPU:02 LNXSYSTM:00 PNP0A08:00 PNP0C02:02 PNP0C0F:04 device:03 device:0a device:11 device:18 device:1f device:26 device:2d LNXCPU:03 PNP:00 PNP0B00:00 PNP0C04:00 PNP0C0F:05 device:04 device:0b device:12 device:19 device:20 device:27 device:2e LNXCPU:04 PNP0100:00 PNP0C01:00 PNP0C0C:00 PNP0C0F:06 device:05 device:0c device:13 device:1a device:21 device:28 device:2f LNXCPU:05 PNP0103:00 PNP0C01:01 PNP0C0F:00 PNP0C0F:07 device:06 device:0d device:14 device:1b device:22 device:29 INT3F0D:00 LNXCPU:06 PNP0200:00 PNP0C01:02 PNP0C0F:01 PNP0C14:00 then what I miss ? thanks. It depend on hardware. It seems that your system does not support memory hotplug. If you use KVM, you can try memory hotplug on KVM guest by applying Vasilis' patch-set. http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01389.html Thanks, Yasuaki Ishimatsu 3. hotplug the memory device(it depends on your hardware) You will see the memory device under the directory /sys/bus/acpi/devices/. Its name is PNP0C80:XX. 4. online/offline pages provided by this memory device You can write online/offline to /sys/devices/system/memory/memoryX/state to online/offline pages provided by this memory device 5. hotremove the memory device You can hotremove the memory device by the hardware, or writing 1 to /sys/bus/acpi/devices/PNP0C80:XX/eject. Note: if the memory provided by the memory device is used by the kernel, it can't be offlined. It is not a bug. Known problems: 1. memory can't be offlined when CONFIG_MEMCG is selected. For example: there is a memory device on node 1. The address range is [1G, 1.5G). 
>>>>    You will find 4 new directories memory8, memory9, memory10, and
>>>>    memory11 under the directory /sys/devices/system/memory/.
>>>>    If CONFIG_MEMCG is selected, we will allocate memory to store page
>>>>    cgroup when we online pages. When we online memory8, the memory that
>>>>    stores the page cgroup is not provided by this memory device. But when
>>>>    we online memory9, the memory that stores the page cgroup may be
>>>>    provided by memory8. So we can't offline memory8 now. We should offline
>>>>    the memory in the reverse order.
>>>>    When the memory device is hotremoved, we will auto offline memory
>>>>    provided by this memory device. But we don't know which memory is
>>>>    onlined first, so offlining memory may fail. In such a case, you should
>>>>    offline the memory by hand before hotremoving the memory device.
>>>> 2. hotremoving memory device may cause a kernel panic
>>>>    This bug will be fixed by Liu Jiang's patch:
>>>>    https://lkml.org/lkml/2012/7/3/1
>>>>
>>>> change log of v9:
>>>>  [RFC PATCH v9 8/21]
>>>>     * add a lock to protect the list map_entries
>>>>     * add an indicator to firmware_map_entry to remember whether the
>>>>       memory is allocated from bootmem
>>>>  [RFC PATCH v9 10/21]
>>>>     * change the macro to inline function
>>>>  [RFC PATCH v9 19/21]
>>>>     * don't offline the node if the cpu on the node is onlined
>>>>  [RFC PATCH v9 21/21]
>>>>     * create new patch: auto offline page_cgroup
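Step 4 and the reverse-order advice under known problem 1 can be sketched from userspace. This is only an illustration, assuming the memory8..memory11 example above; the helper names memory_state_path(), set_block_state() and offline_blocks_reversed() are hypothetical and are not part of the patchset.

```c
#include <stdio.h>
#include <string.h>

/* Build "/sys/devices/system/memory/memoryN/state" for block N.
 * Returns 0 on success, -1 if the buffer is too small. */
int memory_state_path(char *buf, size_t len, unsigned int block)
{
	int n = snprintf(buf, len,
			 "/sys/devices/system/memory/memory%u/state", block);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "online" or "offline" to a state file. Returns 0 on success. */
int set_block_state(const char *path, const char *state)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(state, f) >= 0;
	/* The write reaches the kernel on flush, so a refused request
	 * (for example a block that is still in use) shows up here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Offline blocks last..first, the reverse of the assumed online order.
 * Returns -1 if every block was offlined, otherwise the number of the
 * first block that could not be offlined. */
int offline_blocks_reversed(unsigned int first, unsigned int last)
{
	char path[64];
	unsigned int b = last;

	for (;;) {
		if (memory_state_path(path, sizeof(path), b) != 0 ||
		    set_block_state(path, "offline") != 0)
			return (int)b;
		if (b == first)
			break;
		b--;
	}
	return -1;
}
```

For the example device, offline_blocks_reversed(8, 11) would try memory11 first and memory8 last, which matches the order suggested for the CONFIG_MEMCG case. It needs root and real hot-pluggable blocks to do anything useful.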
Re: [RFC v9 PATCH 01/21] memory-hotplug: rename remove_memory() to offline_memory()/offline_pages()
Hi Kosaki-san,

2012/09/29 7:15, KOSAKI Motohiro wrote:
> On Thu, Sep 27, 2012 at 11:50 PM, Yasuaki Ishimatsu
> isimatu.yasu...@jp.fujitsu.com wrote:
>> Hi Chen,
>>
>> 2012/09/28 11:22, Ni zhan Chen wrote:
>>> On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:
>>>> From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
>>>>
>>>> remove_memory() only tries to offline pages. It is called in two cases:
>>>> 1. hot remove a memory device
>>>> 2. echo offline > /sys/devices/system/memory/memoryXX/state
>>>>
>>>> In the 1st case, we should also change the memory block's state, and
>>>> notify userspace that the memory block's state has changed after
>>>> offlining pages.
>>>>
>>>> So rename remove_memory() to offline_memory()/offline_pages(). And in
>>>> the 1st case, offline_memory() will be used. The function
>>>> offline_memory() is not implemented. In the 2nd case, offline_pages()
>>>> will be used.
>>>
>>> But this time there is not a function associated with add_memory.
>>
>> To associate with add_memory() later, we renamed it.
>
> Then, you introduced bisect breakage. It is definitely unacceptable.

What does bisect breakage mean?

Thanks,
Yasuaki Ishimatsu

> NAK.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC v9 PATCH 13/21] memory-hotplug: check page type in get_page_bootmem
Hi Chen,

2012/09/29 11:15, Ni zhan Chen wrote:
> On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:
>> From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
>>
>> The function get_page_bootmem() may be called more than once for the same
>> page. There is no need to set the page's type and private fields if this
>> is not the first call for the page.
>>
>> Note: the patch is just an optimization and does not fix any problem.
>
> Hi Yasuaki,
>
> this patch is reasonable to me. I have another question associated with
> get_page_bootmem(). The question is from another Fujitsu guy's patch
> changelog [commit : 04753278769f3]. The changelog said that:
>
>  1) When the memmap of removing section is allocated on other
>     section by bootmem, it should/can be free.
>  2) When the memmap of removing section is allocated on the
>     same section, it shouldn't be freed. Because the section has to be
>     logical memory offlined already and all pages must be isolated against
>     page allocator. If it is freed, page allocator may use it which will
>     be removed physically soon.
>
> but I don't see how his patch guarantees 2); that is, his patch doesn't
> guarantee that the memmap of the removing section, which is allocated on
> another section by bootmem, is not freed. I hope to get your explanation
> in detail, thanks in advance. :-)

In my understanding, the patch does not guarantee it.
Please see [commit : 0c0a4a517a31e]. free_map_bootmem() in the commit
guarantees it.

Thanks,
Yasuaki Ishimatsu

>> CC: David Rientjes rient...@google.com
>> CC: Jiang Liu liu...@gmail.com
>> CC: Len Brown len.br...@intel.com
>> CC: Benjamin Herrenschmidt b...@kernel.crashing.org
>> CC: Paul Mackerras pau...@samba.org
>> CC: Christoph Lameter c...@linux.com
>> Cc: Minchan Kim minchan@gmail.com
>> CC: Andrew Morton a...@linux-foundation.org
>> CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
>> CC: Wen Congyang we...@cn.fujitsu.com
>> Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
>> ---
>>  mm/memory_hotplug.c |   15 +++
>>  1 files changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index d736df3..26a5012 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -95,10 +95,17 @@ static void release_memory_resource(struct resource *res)
>>  static void get_page_bootmem(unsigned long info, struct page *page,
>>  			     unsigned long type)
>>  {
>> -	page->lru.next = (struct list_head *) type;
>> -	SetPagePrivate(page);
>> -	set_page_private(page, info);
>> -	atomic_inc(&page->_count);
>> +	unsigned long page_type;
>> +
>> +	page_type = (unsigned long)page->lru.next;
>> +	if (page_type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
>> +	    page_type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
>> +		page->lru.next = (struct list_head *)type;
>> +		SetPagePrivate(page);
>> +		set_page_private(page, info);
>> +		atomic_inc(&page->_count);
>> +	} else
>> +		atomic_inc(&page->_count);
>>  }
>>
>>  /* reference to __meminit __free_pages_bootmem is valid
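The effect of the check in this patch can be modelled outside the kernel. The following is a minimal sketch under stated assumptions, not the kernel code: struct fake_page is a made-up stand-in for struct page, and MODEL_MIN_TYPE / MODEL_MAX_TYPE are illustrative values standing in for MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE and MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE.

```c
/* Illustrative stand-ins for the bootmem type range (assumed values). */
#define MODEL_MIN_TYPE 12UL
#define MODEL_MAX_TYPE 14UL

/* Made-up stand-in for the struct page fields the patch touches. */
struct fake_page {
	unsigned long type;	/* models page->lru.next */
	unsigned long priv;	/* models page_private(page) */
	int has_private;	/* models the PagePrivate flag */
	int count;		/* models page->_count */
};

/* First call for a page records type and info; every call takes a
 * reference. Later calls leave the recorded type and info untouched. */
void get_page_bootmem_model(unsigned long info, struct fake_page *page,
			    unsigned long type)
{
	if (page->type < MODEL_MIN_TYPE || page->type > MODEL_MAX_TYPE) {
		page->type = type;
		page->has_private = 1;
		page->priv = info;
	}
	page->count++;
}
```

In this model a second call with a different type or info only bumps the count, which is the behaviour the changelog describes as "just optimization".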
Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
Hi Chen,

2012/09/29 17:19, Ni zhan Chen wrote:
> On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:
>> From: Wen Congyang we...@cn.fujitsu.com
>>
>> This patch series aims to support physical memory hot-remove.
>>
>> The patches can free/remove the following things:
>>
>>   - acpi_memory_info                          : [RFC PATCH 4/19]
>>   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
>>   - iomem_resource                            : [RFC PATCH 9/19]
>>   - mem_section and related sysfs files       : [RFC PATCH 10-11, 13-16/19]
>>   - page table of removed memory              : [RFC PATCH 12/19]
>>   - node and related sysfs files              : [RFC PATCH 18-19/19]
>>
>> If you find any function missing for physical memory hot-remove, please
>> let me know.
>>
>> How to test this patchset?
>> 1. apply this patchset and build the kernel. MEMORY_HOTPLUG,
>>    MEMORY_HOTREMOVE, ACPI_HOTPLUG_MEMORY must be selected.
>> 2. load the module acpi_memhotplug
>
> Hi Yasuaki,
>
> where is the acpi_memhotplug module?

If you build acpi_memhotplug as a module, it is created under the
/lib/modules/kernel-version/driver/acpi/ directory. It depends on config
ACPI_HOTPLUG_MEMORY. If the config is [*], it becomes a built-in function,
so you don't need to care about it.

Thanks,
Yasuaki Ishimatsu

>> 3. hotplug the memory device (it depends on your hardware)
>>    You will see the memory device under the directory
>>    /sys/bus/acpi/devices/. Its name is PNP0C80:XX.
>> 4. online/offline pages provided by this memory device
>>    You can write online/offline to
>>    /sys/devices/system/memory/memoryX/state to online/offline pages
>>    provided by this memory device
>> 5. hotremove the memory device
>>    You can hotremove the memory device by the hardware, or by writing 1
>>    to /sys/bus/acpi/devices/PNP0C80:XX/eject.
>>
>> Note: if the memory provided by the memory device is used by the kernel,
>> it can't be offlined. It is not a bug.
>>
>> Known problems:
>> 1. memory can't be offlined when CONFIG_MEMCG is selected.
>>    For example: there is a memory device on node 1. The address range
>>    is [1G, 1.5G).
>>    You will find 4 new directories memory8, memory9, memory10, and
>>    memory11 under the directory /sys/devices/system/memory/.
>>    If CONFIG_MEMCG is selected, we will allocate memory to store page
>>    cgroup when we online pages. When we online memory8, the memory that
>>    stores the page cgroup is not provided by this memory device. But when
>>    we online memory9, the memory that stores the page cgroup may be
>>    provided by memory8. So we can't offline memory8 now. We should offline
>>    the memory in the reverse order.
>>    When the memory device is hotremoved, we will auto offline memory
>>    provided by this memory device. But we don't know which memory is
>>    onlined first, so offlining memory may fail. In such a case, you should
>>    offline the memory by hand before hotremoving the memory device.
>> 2. hotremoving memory device may cause a kernel panic
>>    This bug will be fixed by Liu Jiang's patch:
>>    https://lkml.org/lkml/2012/7/3/1
>>
>> change log of v9:
>>  [RFC PATCH v9 8/21]
>>     * add a lock to protect the list map_entries
>>     * add an indicator to firmware_map_entry to remember whether the
>>       memory is allocated from bootmem
>>  [RFC PATCH v9 10/21]
>>     * change the macro to inline function
>>  [RFC PATCH v9 19/21]
>>     * don't offline the node if the cpu on the node is onlined
>>  [RFC PATCH v9 21/21]
>>     * create new patch: auto offline page_cgroup when onlining memory
>>       block failed
>>
>> change log of v8:
>>  [RFC PATCH v8 17/20]
>>     * Fix problems when one node's range includes the other nodes
>>  [RFC PATCH v8 18/20]
>>     * fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or
>>       CONFIG_HUGETLBFS is not defined.
>>  [RFC PATCH v8 19/20]
>>     * don't offline node when some memory sections are not removed
>>  [RFC PATCH v8 20/20]
>>     * create new patch: clear hwpoisoned flag when onlining pages
>>
>> change log of v7:
>>  [RFC PATCH v7 4/19]
>>     * do not continue if acpi_memory_device_remove_memory() fails.
>>  [RFC PATCH v7 15/19]
>>     * handle usemap in register_page_bootmem_info_section() too.
>>
>> change log of v6:
>>  [RFC PATCH v6 12/19]
>>     * fix building error on other architectures than x86
>>  [RFC PATCH v6 15-16/19]
>>     * fix building error on other architectures than x86
>>
>> change log of v5:
>>  * merge the patchset to clear page table and the patchset to hot remove
>>    memory (from ishimatsu) to one big patchset.
>>  [RFC PATCH v5 1/19]
>>     * rename remove_memory() to offline_memory()/offline_pages()
>>  [RFC PATCH v5 2/19]
>>     * new patch: implement offline_memory(). This function offlines pages,
>>       updates the memory block's state, and notifies userspace that the
>>       memory block's state has changed.
>>  [RFC PATCH v5 4/19]
>>     * offline and remove memory in acpi_memory_disable_device() too.
>>  [RFC PATCH v5 17/19]
>>     * new patch: add a new function __remove_zone() to revert the things
>>       done in the function __add_zone().
>>  [RFC PATCH v5 18/19]
>>     * flush work before resetting node device.
>>
>> change log of v4:
>>  * remove
Re: [RFC v9 PATCH 01/21] memory-hotplug: rename remove_memory() to offline_memory()/offline_pages()
Hi Chen,

2012/09/28 11:22, Ni zhan Chen wrote:
> On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:
>> From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
>>
>> remove_memory() only tries to offline pages. It is called in two cases:
>> 1. hot remove a memory device
>> 2. echo offline > /sys/devices/system/memory/memoryXX/state
>>
>> In the 1st case, we should also change the memory block's state, and
>> notify userspace that the memory block's state has changed after
>> offlining pages.
>>
>> So rename remove_memory() to offline_memory()/offline_pages(). And in
>> the 1st case, offline_memory() will be used. The function
>> offline_memory() is not implemented. In the 2nd case, offline_pages()
>> will be used.
>
> But this time there is not a function associated with add_memory.

To associate with add_memory() later, we renamed it.

Thanks,
Yasuaki Ishimatsu

>> CC: David Rientjes rient...@google.com
>> CC: Jiang Liu liu...@gmail.com
>> CC: Len Brown len.br...@intel.com
>> CC: Benjamin Herrenschmidt b...@kernel.crashing.org
>> CC: Paul Mackerras pau...@samba.org
>> CC: Christoph Lameter c...@linux.com
>> Cc: Minchan Kim minchan@gmail.com
>> CC: Andrew Morton a...@linux-foundation.org
>> CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
>> Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
>> Signed-off-by: Wen Congyang we...@cn.fujitsu.com
>> ---
>>  drivers/acpi/acpi_memhotplug.c |    2 +-
>>  drivers/base/memory.c          |    9 +++--
>>  include/linux/memory_hotplug.h |    3 ++-
>>  mm/memory_hotplug.c            |   22 ++
>>  4 files changed, 20 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
>> index 24c807f..2a7beac 100644
>> --- a/drivers/acpi/acpi_memhotplug.c
>> +++ b/drivers/acpi/acpi_memhotplug.c
>> @@ -318,7 +318,7 @@ static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
>>  	 */
>>  	list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
>>  		if (info->enabled) {
>> -			result = remove_memory(info->start_addr, info->length);
>> +			result = offline_memory(info->start_addr, info->length);
>>  			if (result)
>>  				return result;
>>  		}
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 7dda4f7..44e7de6 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -248,26 +248,23 @@ static bool pages_correctly_reserved(unsigned long start_pfn,
>>  static int
>>  memory_block_action(unsigned long phys_index, unsigned long action)
>>  {
>> -	unsigned long start_pfn, start_paddr;
>> +	unsigned long start_pfn;
>>  	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
>>  	struct page *first_page;
>>  	int ret;
>>
>>  	first_page = pfn_to_page(phys_index << PFN_SECTION_SHIFT);
>> +	start_pfn = page_to_pfn(first_page);
>>
>>  	switch (action) {
>>  		case MEM_ONLINE:
>> -			start_pfn = page_to_pfn(first_page);
>> -
>>  			if (!pages_correctly_reserved(start_pfn, nr_pages))
>>  				return -EBUSY;
>>
>>  			ret = online_pages(start_pfn, nr_pages);
>>  			break;
>>  		case MEM_OFFLINE:
>> -			start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
>> -			ret = remove_memory(start_paddr,
>> -					    nr_pages << PAGE_SHIFT);
>> +			ret = offline_pages(start_pfn, nr_pages);
>>  			break;
>>  		default:
>>  			WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
>> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>> index 910550f..c183f39 100644
>> --- a/include/linux/memory_hotplug.h
>> +++ b/include/linux/memory_hotplug.h
>> @@ -233,7 +233,8 @@ static inline int is_mem_section_removable(unsigned long pfn,
>>  extern int mem_online_node(int nid);
>>  extern int add_memory(int nid, u64 start, u64 size);
>>  extern int arch_add_memory(int nid, u64 start, u64 size);
>> -extern int remove_memory(u64 start, u64 size);
>> +extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
>> +extern int offline_memory(u64 start, u64 size);
>>  extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
>>  								int nr_pages);
>>  extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms);
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 3ad25f9..bb42316 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -866,7 +866,7 @@ check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
>>  	return offlined;
>>  }
>>
>> -static int __ref offline_pages(unsigned long start_pfn,
>> +static int __ref __offline_pages(unsigned long start_pfn,
>>  		  unsigned long end_pfn, unsigned long timeout)
>>  {
>>  	unsigned long pfn, nr_pages, expire;
>> @@ -994,18 +994,24 @@ out:
>>  	return ret;
>>  }
>>
>> -int remove_memory(u64 start, u64 size)
>> +int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
>>  {
>> -	unsigned long start_pfn, end_pfn;
>> +	return __offline_pages(start_pfn, start_pfn + nr_pages, 120 * HZ);
>> +}
>>
>> -	start_pfn = PFN_DOWN(start
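The interface change in this patch can be illustrated outside the kernel. The following is a minimal sketch, not the kernel implementation: it assumes 4 KiB pages (MODEL_PAGE_SHIFT is a made-up constant) and page-aligned inputs, and the helper names range_from_bytes() and range_from_pfns() are hypothetical. It shows that a byte-based (start, size) pair and a pfn-based (start_pfn, nr_pages) pair describe the same half-open pfn range, so memory_block_action() has no need to turn its pfn into a physical address first.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

struct pfn_range {
	uint64_t start_pfn;	/* first pfn in the range */
	uint64_t end_pfn;	/* one past the last pfn */
};

/* Byte-based interface, the way remove_memory(start, size) was called.
 * Assumes start and size are page aligned. */
struct pfn_range range_from_bytes(uint64_t start, uint64_t size)
{
	struct pfn_range r;

	r.start_pfn = start >> MODEL_PAGE_SHIFT;
	r.end_pfn = r.start_pfn + (size >> MODEL_PAGE_SHIFT);
	return r;
}

/* Pfn-based interface, the way offline_pages(start_pfn, nr_pages)
 * is called after the patch. */
struct pfn_range range_from_pfns(uint64_t start_pfn, uint64_t nr_pages)
{
	struct pfn_range r;

	r.start_pfn = start_pfn;
	r.end_pfn = start_pfn + nr_pages;
	return r;
}
```

For the [1G, 1.5G) device used as an example elsewhere in this thread, both helpers give the same range under the 4 KiB assumption.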
Re: [RFC v9 PATCH 05/21] memory-hotplug: check whether memory is present or not
Hi Wen,

2012/09/11 11:15, Wen Congyang wrote:
> Hi, ishimatsu
>
> At 09/05/2012 05:25 PM, we...@cn.fujitsu.com Wrote:
>> From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
>>
>> If the system supports memory hot-remove, online_pages() may online
>> removed pages. So online_pages() needs to check whether the pages being
>> onlined are present or not.
>
> Because we use memory_block_change_state() for hot-removing memory, I
> think this patch can be removed. What do you think?

Please explain the details a little more. If we use
memory_block_change_state(), does the conflict never occur? Why?

Thanks,
Yasuaki Ishimatsu

> Thanks
> Wen Congyang
>
>> CC: David Rientjes rient...@google.com
>> CC: Jiang Liu liu...@gmail.com
>> CC: Len Brown len.br...@intel.com
>> CC: Benjamin Herrenschmidt b...@kernel.crashing.org
>> CC: Paul Mackerras pau...@samba.org
>> CC: Christoph Lameter c...@linux.com
>> Cc: Minchan Kim minchan@gmail.com
>> CC: Andrew Morton a...@linux-foundation.org
>> CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
>> CC: Wen Congyang we...@cn.fujitsu.com
>> Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
>> ---
>>  include/linux/mmzone.h |   19 +++
>>  mm/memory_hotplug.c    |   13 +
>>  2 files changed, 32 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 2daa54f..ac3ae30 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1180,6 +1180,25 @@ void sparse_init(void);
>>  #define sparse_index_init(_sec, _nid)  do {} while (0)
>>  #endif /* CONFIG_SPARSEMEM */
>>
>> +#ifdef CONFIG_SPARSEMEM
>> +static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
>> +{
>> +	int i;
>> +	for (i = 0; i < nr_pages; i++) {
>> +		if (pfn_present(pfn + i))
>> +			continue;
>> +		else
>> +			return -EINVAL;
>> +	}
>> +	return 0;
>> +}
>> +#else
>> +static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
>> +{
>> +	return 0;
>> +}
>> +#endif /* CONFIG_SPARSEMEM*/
>> +
>>  #ifdef CONFIG_NODES_SPAN_OTHER_NODES
>>  bool early_pfn_in_nid(unsigned long pfn, int nid);
>>  #else
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 49f7747..299747d 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -467,6 +467,19 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>>  	struct memory_notify arg;
>>
>>  	lock_memory_hotplug();
>> +	/*
>> +	 * If system supports memory hot-remove, the memory may have been
>> +	 * removed. So we check whether the memory has been removed or not.
>> +	 *
>> +	 * Note: When CONFIG_SPARSEMEM is defined, pfns_present() become
>> +	 *       effective. If CONFIG_SPARSEMEM is not defined, pfns_present()
>> +	 *       always returns 0.
>> +	 */
>> +	ret = pfns_present(pfn, nr_pages);
>> +	if (ret) {
>> +		unlock_memory_hotplug();
>> +		return ret;
>> +	}
>>  	arg.start_pfn = pfn;
>>  	arg.nr_pages = nr_pages;
>>  	arg.status_change_nid = -1;
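The presence check that this patch adds to online_pages() can be tried out in a small userspace model. This is only a sketch: struct presence_map and model_pfn_present() are hypothetical stand-ins for the kernel's section lookup behind pfn_present(), and the loop counter is an unsigned long here (the patch uses int) so that it has the same width as nr_pages.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical presence map: one flag per pfn, standing in for the
 * section lookup that pfn_present() does in the kernel. */
struct presence_map {
	const bool *present;
	unsigned long nr_pfns;
};

static bool model_pfn_present(const struct presence_map *map,
			      unsigned long pfn)
{
	return pfn < map->nr_pfns && map->present[pfn];
}

/* Returns 0 if every pfn in [pfn, pfn + nr_pages) is present,
 * -EINVAL as soon as one is missing. */
int model_pfns_present(const struct presence_map *map,
		       unsigned long pfn, unsigned long nr_pages)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++) {
		if (!model_pfn_present(map, pfn + i))
			return -EINVAL;
	}
	return 0;
}
```

As in the patch, an empty range counts as present, and the caller is expected to bail out before doing any other work when the result is non-zero.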
Re: [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory
Hi Wen,

2012/09/01 5:49, Andrew Morton wrote:
> On Tue, 28 Aug 2012 18:00:07 +0800 we...@cn.fujitsu.com wrote:
>> This patch series aims to support physical memory hot-remove.
>
> Have you had much review and testing feedback yet?
>
>> The patches can free/remove the following things:
>>
>>   - acpi_memory_info                          : [RFC PATCH 4/19]
>>   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
>>   - iomem_resource                            : [RFC PATCH 9/19]
>>   - mem_section and related sysfs files       : [RFC PATCH 10-11, 13-16/19]
>>   - page table of removed memory              : [RFC PATCH 12/19]
>>   - node and related sysfs files              : [RFC PATCH 18-19/19]
>>
>> If you find any function missing for physical memory hot-remove, please
>> let me know.
>
> I doubt if many people have hardware which permits physical memory
> removal? How would you suggest that people with regular hardware can
> test these changes?

How do you test the patch? As Andrew says, hot-removing memory needs
particular hardware. I think so too. So many people may want to know how to
test the patch. If we apply the following patch to a KVM guest, can we
hot-remove memory on the KVM guest?

http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01389.html

Thanks,
Yasuaki Ishimatsu

>> Known problems:
>> 1. memory can't be offlined when CONFIG_MEMCG is selected.
>
> That's quite a problem! Do you have a description of why this is the
> case, and a plan for fixing it?
Re: [RFC v8 PATCH 13/20] memory-hotplug: check page type in get_page_bootmem
Hi Wen,

2012/09/04 12:46, Wen Congyang wrote:
> Hi, isimatu-san
>
> At 09/01/2012 05:30 AM, Andrew Morton Wrote:
>> On Tue, 28 Aug 2012 18:00:20 +0800 we...@cn.fujitsu.com wrote:
>>> From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
>>>
>>> There is a possibility that get_page_bootmem() is called to the same
>>> page many times. So when get_page_bootmem is called to the same page,
>>> the function only increments page->_count.
>>
>> I really don't understand this explanation, even after having looked at
>> the code. Can you please have another attempt at the changelog?
>
> What is the problem that you want to fix? The function get_page_bootmem()
> may be called to the same page more than once, but I don't find any
> problem with the current implementation.

The patch is just an optimization. It does not fix a problem.

As you know, the function may be called many times for the same page.
I think that once the page type, the PagePrivate flag and page->private
have been set for a page, they need not be set again. So we check page_type
when get_page_bootmem() is called. If they have already been set, only
page->_count is incremented.

Thanks,
Yasuaki Ishimatsu

> Thanks
> Wen Congyang
>
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -95,10 +95,17 @@ static void release_memory_resource(struct resource *res)
>>>  static void get_page_bootmem(unsigned long info, struct page *page,
>>>  			     unsigned long type)
>>>  {
>>> -	page->lru.next = (struct list_head *) type;
>>> -	SetPagePrivate(page);
>>> -	set_page_private(page, info);
>>> -	atomic_inc(&page->_count);
>>> +	unsigned long page_type;
>>> +
>>> +	page_type = (unsigned long) page->lru.next;
>>> +	if (page_type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
>>> +	    page_type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
>>> +		page->lru.next = (struct list_head *) type;
>>> +		SetPagePrivate(page);
>>> +		set_page_private(page, info);
>>> +		atomic_inc(&page->_count);
>>> +	} else
>>> +		atomic_inc(&page->_count);
>>>  }
>>
>> And a code comment which explains what is going on would be good. As is
>> always the case ;)
Re: [RFC PATCH v5 00/19] memory-hotplug: hot-remove physical memory
Hi Wen,

2012/07/27 19:20, Wen Congyang wrote:
> This patch series aims to support physical memory hot-remove.
>
> The patches can free/remove the following things:
>
>   - acpi_memory_info                          : [RFC PATCH 4/19]
>   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
>   - iomem_resource                            : [RFC PATCH 9/19]
>   - mem_section and related sysfs files       : [RFC PATCH 10-11, 13-16/19]
>   - page table of removed memory              : [RFC PATCH 12/19]
>   - node and related sysfs files              : [RFC PATCH 18-19/19]
>
> If you find any function missing for physical memory hot-remove, please
> let me know.
>
> change log of v5:
>  * merge the patchset to clear page table and the patchset to hot remove
>    memory (from ishimatsu) to one big patchset.

Thank you for merging patches. I'll review next Monday.

Thanks,
Yasuaki Ishimatsu

>  [RFC PATCH v5 1/19]
>     * rename remove_memory() to offline_memory()/offline_pages()
>  [RFC PATCH v5 2/19]
>     * new patch: implement offline_memory(). This function offlines pages,
>       updates the memory block's state, and notifies userspace that the
>       memory block's state has changed.
>  [RFC PATCH v5 4/19]
>     * offline and remove memory in acpi_memory_disable_device() too.
>  [RFC PATCH v5 17/19]
>     * new patch: add a new function __remove_zone() to revert the things
>       done in the function __add_zone().
>  [RFC PATCH v5 18/19]
>     * flush work before resetting node device.
>
> change log of v4:
>  * remove "memory-hotplug : unify argument of
>    firmware_map_add_early/hotplug" from the patch series, since the patch
>    is a bugfix. It is being discussed on another thread. But for testing
>    the patch series, the patch is needed. So I added the patch as
>    [PATCH 0/13].
>
>  [RFC PATCH v4 2/13]
>     * check memory is online or not at remove_memory()
>     * add memory_add_physaddr_to_nid() to acpi_memory_device_remove() for
>       getting node id
>  [RFC PATCH v4 3/13]
>     * create new patch : check memory is online or not at online_pages()
>  [RFC PATCH v4 4/13]
>     * add __ref section to remove_memory()
>     * call firmware_map_remove_entry() before
>       remove_sysfs_fw_map_entry()
>  [RFC PATCH v4 11/13]
>     * rewrite register_page_bootmem_memmap() for removing page used as
>       PT/PMD
>
> change log of v3:
>  * rebase to 3.5.0-rc6
>  [RFC PATCH v2 2/13]
>     * remove extra kobject_put()
>     * The patch was commented by Wen. Wen's comment is
>       acpi_memory_device_remove() should ignore a return value of
>       remove_memory() since caller does not care the return value.
>       But I did not change it since I think caller should care the
>       return value. And I am trying to fix it as follow:
>       https://lkml.org/lkml/2012/7/5/624
>  [RFC PATCH v2 4/13]
>     * remove a firmware_memmap_entry allocated by kzmalloc()
>
> change log of v2:
>  [RFC PATCH v2 2/13]
>     * check whether memory block is offline or not before calling
>       offline_memory()
>     * check whether section is valid or not in is_memblk_offline()
>     * call kobject_put() for each memory_block in is_memblk_offline()
>  [RFC PATCH v2 3/13]
>     * unify the end argument of firmware_map_add_early/hotplug
>  [RFC PATCH v2 4/13]
>     * add release_firmware_map_entry() for freeing firmware_map_entry
>  [RFC PATCH v2 6/13]
>     * add release_memory_block() for freeing memory_block
>  [RFC PATCH v2 11/13]
>     * fix wrong arguments of free_pages()
>
> Wen Congyang (5):
>   memory-hotplug: implement offline_memory()
>   memory-hotplug: store the node id in acpi_memory_device
>   memory-hotplug: export the function acpi_bus_remove()
>   memory-hotplug: call acpi_bus_remove() to remove memory device
>   memory-hotplug: introduce new function arch_remove_memory()
>
> Yasuaki Ishimatsu (14):
>   memory-hotplug: rename remove_memory() to
>     offline_memory()/offline_pages()
>   memory-hotplug: offline and remove memory when removing the memory
>     device
>   memory-hotplug: check whether memory is present or not
>   memory-hotplug: remove /sys/firmware/memmap/X sysfs
>   memory-hotplug: does not release memory region in PAGES_PER_SECTION
>     chunks
>   memory-hotplug: add memory_block_release
>   memory-hotplug: remove_memory calls __remove_pages
>   memory-hotplug: check page type in get_page_bootmem
>   memory-hotplug: move register_page_bootmem_info_node and
>     put_page_bootmem for sparse-vmemmap
>   memory-hotplug: implement register_page_bootmem_info_section of
>     sparse-vmemmap
>   memory-hotplug: free memmap of sparse-vmemmap
>   memory_hotplug: clear zone when the memory is removed
>   memory-hotplug: add node_device_release
>   memory-hotplug: remove sysfs file of node
>
>  arch/ia64/mm/init.c                             |   16 +
>  arch/powerpc/mm/mem.c                           |   14 +
>  arch/powerpc/platforms/pseries/hotplug-memory.c |   16 +-
>  arch/s390/mm/init.c                             |    8 +
>  arch/sh/mm/init.c                               |   15 +
>  arch/tile/mm/init.c                             |    8 +
>  arch
Re: [RFC PATCH v5 19/19] memory-hotplug: remove sysfs file of node
Hi Wen,

2012/07/27 19:36, Wen Congyang wrote:
> From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
>
> The patch adds node_set_offline() and unregister_one_node() to
> remove_memory() for removing sysfs file of node.
>
> CC: David Rientjes rient...@google.com
> CC: Jiang Liu liu...@gmail.com
> CC: Len Brown len.br...@intel.com
> CC: Benjamin Herrenschmidt b...@kernel.crashing.org
> CC: Paul Mackerras pau...@samba.org
> CC: Christoph Lameter c...@linux.com
> Cc: Minchan Kim minchan@gmail.com
> CC: Andrew Morton a...@linux-foundation.org
> CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
> CC: Wen Congyang we...@cn.fujitsu.com
> Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
> ---
>  mm/memory_hotplug.c |    5 +
>  1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 5ac035f..5681968 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1267,6 +1267,11 @@ int __ref remove_memory(int nid, u64 start, u64 size)
>  	/* remove memmap entry */
>  	firmware_map_remove(start, start + size, "System RAM");
>
> +	if (!node_present_pages(nid)) {

Applying [PATCH v5 17/19], pgdat->node_spanned_pages can become 0 when
all memory of the pgdat is removed. When pgdat->node_spanned_pages is 0,
it means the pgdat has no memory. So I think node_spanned_pages() is better.

Thanks,
Yasuaki Ishimatsu

> +		node_set_offline(nid);
> +		unregister_one_node(nid);
> +	}
> +
>  	arch_remove_memory(start, size);
>  out:
>  	unlock_memory_hotplug();
Re: [RFC PATCH 0/8] memory-hotplug : hot-remove physical memory(clear page table)
Hi Wen,

Good news!! I was waiting for this patch to come.
Applying the patches, can we hot-remove physical memory completely?

Thanks,
Yasuaki Ishimatsu

2012/07/20 16:06, Wen Congyang wrote:
> This patch series aims to support physical memory hot-remove (clear page
> table).
>
> This patch series is based on ishimatsu's patch series. You can get it
> here:
> http://www.spinics.net/lists/linux-acpi/msg36804.html
>
> The patches can remove the following things:
>   - page table of removed memory
>
> If you find any function missing for physical memory hot-remove, please
> let me know.
>
> Note:
> * The patch "remove memory info from list before freeing it" is being
>   discussed in another thread. But for testing the patch series, the
>   patch is needed. So I added the patch as [PATCH 0/8].
> * You need to apply ishimatsu's patch series first before applying this
>   patch series.
>
> Wen Congyang (8):
>   memory-hotplug: store the node id in acpi_memory_device
>   memory-hotplug: offline memory only when it is onlined
>   memory-hotplug: call remove_memory() to cleanup when removing memory
>     device
>   memory-hotplug: export the function acpi_bus_remove()
>   memory-hotplug: call acpi_bus_remove() to remove memory device
>   memory-hotplug: introduce new function arch_remove_memory()
>   x86: make __split_large_page() generally available
>   memory-hotplug: implement arch_remove_memory()
>
>  arch/ia64/mm/init.c                  |   16
>  arch/powerpc/mm/mem.c                |   14 +++
>  arch/s390/mm/init.c                  |    8 ++
>  arch/sh/mm/init.c                    |   15 +++
>  arch/tile/mm/init.c                  |    8 ++
>  arch/x86/include/asm/pgtable_types.h |    1 +
>  arch/x86/mm/init_32.c                |   10 ++
>  arch/x86/mm/init_64.c                |  160 ++
>  arch/x86/mm/pageattr.c               |   47 +-
>  drivers/acpi/acpi_memhotplug.c       |   24 --
>  drivers/acpi/scan.c                  |    3 +-
>  include/acpi/acpi_bus.h              |    1 +
>  include/linux/memory_hotplug.h       |    1 +
>  mm/memory_hotplug.c                  |    2 +-
>  14 files changed, 280 insertions(+), 30 deletions(-)
Re: [RFC PATCH 1/8] memory-hotplug: store the node id in acpi_memory_device
Hi Wen,

2012/07/20 16:09, Wen Congyang wrote:
> The memory device has only one node id. Store the node id when
> enabling the memory device, and we can reuse it when removing the
> memory device.
>
> CC: David Rientjes rient...@google.com
> CC: Jiang Liu liu...@gmail.com
> CC: Len Brown len.br...@intel.com
> CC: Benjamin Herrenschmidt b...@kernel.crashing.org
> CC: Paul Mackerras pau...@samba.org
> CC: Christoph Lameter c...@linux.com
> Cc: Minchan Kim minchan@gmail.com
> CC: Andrew Morton a...@linux-foundation.org
> CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
> CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
> Signed-off-by: Wen Congyang we...@cn.fujitsu.com
> ---

It looks good to me.

Reviewed-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

Thanks,
Yasuaki Ishimatsu

>  drivers/acpi/acpi_memhotplug.c |    8 +---
>  1 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 5cafd6b..db8de39 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -84,6 +84,7 @@ struct acpi_memory_info {
>  struct acpi_memory_device {
>  	struct acpi_device * device;
>  	unsigned int state;	/* State of the memory device */
> +	int nid;
>  	struct list_head res_list;
>  };
>
> @@ -257,6 +258,9 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>  		info->enabled = 1;
>  		num_enabled++;
>  	}
> +
> +	mem_device->nid = node;
> +
>  	if (!num_enabled) {
>  		printk(KERN_ERR PREFIX "add_memory failed\n");
>  		mem_device->state = MEMORY_INVALID_STATE;
> @@ -463,7 +467,7 @@ static int acpi_memory_device_remove(struct acpi_device *device, int type)
>
>  	mem_device = acpi_driver_data(device);
>
> -	node = acpi_get_node(mem_device->device->handle);
> +	node = mem_device->nid;
>  	list_for_each_entry_safe(info, tmp, &mem_device->res_list, list) {
>  		if (!info->enabled)
>  			continue;
> @@ -473,8 +477,6 @@ static int acpi_memory_device_remove(struct acpi_device *device, int type)
>  			if (result)
>  				return result;
>  		}
> -		if (node < 0)
> -			node = memory_add_physaddr_to_nid(info->start_addr);
>
>  		result = remove_memory(node, info->start_addr, info->length);
>  		if (result)
Re: [RFC PATCH 2/8] memory-hotplug: offline memory only when it is onlined
Hi Wen,

2012/07/20 16:10, Wen Congyang wrote:
> offline_memory() will fail if the memory is not onlined. So check
> whether the memory is onlined before calling offline_memory().
>
> CC: David Rientjes rient...@google.com
> CC: Jiang Liu liu...@gmail.com
> CC: Len Brown len.br...@intel.com
> CC: Benjamin Herrenschmidt b...@kernel.crashing.org
> CC: Paul Mackerras pau...@samba.org
> CC: Christoph Lameter c...@linux.com
> Cc: Minchan Kim minchan@gmail.com
> CC: Andrew Morton a...@linux-foundation.org
> CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
> CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
> Signed-off-by: Wen Congyang we...@cn.fujitsu.com
> ---

I have no comment.

Reviewed-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

Thanks,
Yasuaki Ishimatsu

>  drivers/acpi/acpi_memhotplug.c |   10 +++---
>  1 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index db8de39..712e767 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -323,9 +323,13 @@ static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
>  	 */
>  	list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
>  		if (info->enabled) {
> -			result = offline_memory(info->start_addr, info->length);
> -			if (result)
> -				return result;
> +			if (!is_memblk_offline(info->start_addr,
> +					       info->length)) {
> +				result = offline_memory(info->start_addr,
> +							info->length);
> +				if (result)
> +					return result;
> +			}
>  		}
>  		list_del(&info->list);
>  		kfree(info);
Re: [RFC PATCH 3/8] memory-hotplug: call remove_memory() to cleanup when removing memory device
Hi Wen,

2012/07/20 16:10, Wen Congyang wrote:
> We should remove the following things when removing the memory device:
> 1. memmap and related sysfs files
> 2. iomem_resource
> 3. mem_section and related sysfs files
> 4. node and related sysfs files
>
> The function remove_memory() can do this. So call it after the memory
> device is offlined.
>
> CC: David Rientjes rient...@google.com
> CC: Jiang Liu liu...@gmail.com
> CC: Len Brown len.br...@intel.com
> CC: Benjamin Herrenschmidt b...@kernel.crashing.org
> CC: Paul Mackerras pau...@samba.org
> CC: Christoph Lameter c...@linux.com
> Cc: Minchan Kim minchan@gmail.com
> CC: Andrew Morton a...@linux-foundation.org
> CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
> CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
> Signed-off-by: Wen Congyang we...@cn.fujitsu.com
> ---

I have no comment.

Reviewed-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

Thanks,
Yasuaki Ishimatsu

>  drivers/acpi/acpi_memhotplug.c |    7 ++-
>  1 files changed, 6 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 712e767..58e4e63 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -315,7 +315,7 @@ static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
>  {
>  	int result;
>  	struct acpi_memory_info *info, *n;
> -
> +	int node = mem_device->nid;
>
>  	/*
>  	 * Ask the VM to offline this memory range.
> @@ -330,6 +330,11 @@ static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
>  				if (result)
>  					return result;
>  			}
> +
> +			result = remove_memory(node, info->start_addr,
> +					       info->length);
> +			if (result)
> +				return result;
>  		}
>  		list_del(&info->list);
>  		kfree(info);
Re: [RFC PATCH 6/8] memory-hotplug: introduce new function arch_remove_memory()
2012/07/20 16:12, Wen Congyang wrote:
> We don't call __add_pages() directly in the function add_memory()
> because some other architecture related things need to be done
> before or after calling __add_pages(). So we should not call
> __remove_pages() directly in the function remove_memory().
> Introduce new function arch_remove_memory() to revert the things
> done in arch_add_memory().
>
> Note: the function for x86_64 will be implemented later. And I don't
> know how to implement it for s390.

I think you need to CC the other arch MLs so that they can review the patch.

> CC: David Rientjes rient...@google.com
> CC: Jiang Liu liu...@gmail.com
> CC: Len Brown len.br...@intel.com
> CC: Benjamin Herrenschmidt b...@kernel.crashing.org
> CC: Paul Mackerras pau...@samba.org
> CC: Christoph Lameter c...@linux.com
> Cc: Minchan Kim minchan@gmail.com
> CC: Andrew Morton a...@linux-foundation.org
> CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
> CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
> Signed-off-by: Wen Congyang we...@cn.fujitsu.com
> ---
>  arch/ia64/mm/init.c            |   16
>  arch/powerpc/mm/mem.c          |   14 ++
>  arch/s390/mm/init.c            |    8
>  arch/sh/mm/init.c              |   15 +++
>  arch/tile/mm/init.c            |    8
>  arch/x86/mm/init_32.c          |   10 ++
>  arch/x86/mm/init_64.c          |    7 +++
>  include/linux/memory_hotplug.h |    1 +
>  mm/memory_hotplug.c            |    2 +-
>  9 files changed, 80 insertions(+), 1 deletions(-)
>
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index 0eab454..1e345ed 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -688,6 +688,22 @@ int arch_add_memory(int nid, u64 start, u64 size)
>
>  	return ret;
>  }
> +
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +int arch_remove_memory(u64 start, u64 size)
> +{
> +	unsigned long start_pfn = start >> PAGE_SHIFT;
> +	unsigned long nr_pages = size >> PAGE_SHIFT;
> +	int ret;
> +
> +	ret = __remove_pages(start_pfn, nr_pages);
> +	if (ret)
> +		pr_warn("%s: Problem encountered in __remove_pages() as"
> +			" ret=%d\n", __func__, ret);
> +
> +	return ret;
> +}
> +#endif
>  #endif
>
>  /*
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index baaafde..249cef4 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -133,6 +133,20 @@ int arch_add_memory(int nid, u64 start, u64 size)
>
>  	return __add_pages(nid, zone, start_pfn, nr_pages);
>  }
> +
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +int arch_remove_memory(u64 start, u64 size)
> +{
> +	unsigned long start_pfn = start >> PAGE_SHIFT;
> +	unsigned long nr_pages = size >> PAGE_SHIFT;
> +
> +	start = (unsigned long)__va(start);
> +	if (remove_section_mapping(start, start + size))
> +		return -EINVAL;
> +
> +	return __remove_pages(start_pfn, nr_pages);
> +}
> +#endif
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>
>  /*
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index 2bea060..3de0d5b 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -259,4 +259,12 @@ int arch_add_memory(int nid, u64 start, u64 size)
>  		vmem_remove_mapping(start, size);
>  	return rc;
>  }
> +
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +int arch_remove_memory(u64 start, u64 size)
> +{
> +	/* TODO */
> +	return -EBUSY;
> +}
> +#endif
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
> index 82cc576..fc84491 100644
> --- a/arch/sh/mm/init.c
> +++ b/arch/sh/mm/init.c
> @@ -558,4 +558,19 @@ int memory_add_physaddr_to_nid(u64 addr)
>  EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
>  #endif
>
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +int arch_remove_memory(u64 start, u64 size)
> +{
> +	unsigned long start_pfn = start >> PAGE_SHIFT;
> +	unsigned long nr_pages = size >> PAGE_SHIFT;
> +	int ret;
> +
> +	ret = __remove_pages(start_pfn, nr_pages);
> +	if (unlikely(ret))
> +		pr_warn("%s: Failed, __remove_pages() == %d\n", __func__,
> +			ret);
> +
> +	return ret;
> +}
> +#endif
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
> index 630dd2c..bdd8a99 100644
> --- a/arch/tile/mm/init.c
> +++ b/arch/tile/mm/init.c
> @@ -947,6 +947,14 @@ int remove_memory(u64 start, u64 size)
>  {
>  	return -EINVAL;
>  }
> +
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +int arch_remove_memory(u64 start, u64 size)
> +{
> +	/* TODO */
> +	return -EBUSY;
> +}
> +#endif
>  #endif
>
>  struct kmem_cache *pgd_cache;
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index 575d86f..a690153 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -842,6 +842,16 @@ int arch_add_memory(int nid, u64 start, u64 size)
>
>  	return __add_pages(nid, zone, start_pfn, nr_pages);
>  }
> +
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +int arch_remove_memory(unsigned long start, unsigned long size)
> +{
> +	unsigned long start_pfn = start >> PAGE_SHIFT;
> +	unsigned long nr_pages = size >> PAGE_SHIFT;
> +
> +	return __remove_pages
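Most of the arch_remove_memory() variants quoted in this patch start the same way, by turning a physical address range into a page frame range. A minimal userspace sketch of that arithmetic follows. It assumes 4 KiB pages (a PAGE_SHIFT of 12; the real value depends on architecture and configuration), and `struct pfn_range` and `phys_to_pfn_range()` are hypothetical names used only for illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical holder for the two values each arch function computes. */
struct pfn_range {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* Convert a physical byte range into a page frame number range. */
static struct pfn_range phys_to_pfn_range(uint64_t start, uint64_t size)
{
	struct pfn_range r = {
		.start_pfn = (unsigned long)(start >> PAGE_SHIFT),
		.nr_pages  = (unsigned long)(size >> PAGE_SHIFT),
	};

	return r;
}
```

For example, a 128 MiB range starting at 4 GiB maps to start_pfn 0x100000 and nr_pages 0x8000 under these assumptions.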
Re: [RFC PATCH v4 11/13] memory-hotplug : free memmap of sparse-vmemmap
Hi Wen,

2012/07/19 14:58, Wen Congyang wrote:
> At 07/18/2012 06:16 PM, Yasuaki Ishimatsu Wrote:
>> Not all pages of the virtual mapping in removed memory can be freed, since
>> a page used as PGD/PUD may cover not only the removed memory but also other
>> memory. So the patch checks whether a page can be freed or not.
>>
>> How to check whether a page can be freed or not?
>>  1. When removing memory, the page structs of the removed memory are filled
>>     with 0xFD.
>>  2. If all page structs on a PT/PMD are filled with 0xFD, the PT/PMD can be
>>     cleared. In this case, the page used as PT/PMD can be freed.
>>
>> Applying the patch, __remove_section() of CONFIG_SPARSEMEM_VMEMMAP is
>> integrated into one. So __remove_section() of CONFIG_SPARSEMEM_VMEMMAP is
>> deleted.
>>
>> CC: David Rientjes rient...@google.com
>> CC: Jiang Liu liu...@gmail.com
>> CC: Len Brown len.br...@intel.com
>> CC: Benjamin Herrenschmidt b...@kernel.crashing.org
>> CC: Paul Mackerras pau...@samba.org
>> CC: Christoph Lameter c...@linux.com
>> Cc: Minchan Kim minchan@gmail.com
>> CC: Andrew Morton a...@linux-foundation.org
>> CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
>> CC: Wen Congyang we...@cn.fujitsu.com
>> Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
>>
>> ---
>>  arch/x86/mm/init_64.c |  121 ++
>>  include/linux/mm.h    |    2
>>  mm/memory_hotplug.c   |   19 ---
>>  mm/sparse.c           |    5 +-
>>  4 files changed, 128 insertions(+), 19 deletions(-)
>>
>> Index: linux-3.5-rc6/include/linux/mm.h
>> ===
>> --- linux-3.5-rc6.orig/include/linux/mm.h	2012-07-18 18:01:28.0 +0900
>> +++ linux-3.5-rc6/include/linux/mm.h	2012-07-18 18:03:05.551168773 +0900
>> @@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
>>  void vmemmap_populate_print_last(void);
>>  void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
>>  				  unsigned long size);
>> +void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages);
>> +void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages);
>>
>>  enum mf_flags {
>>  	MF_COUNT_INCREASED = 1 << 0,
>> Index: linux-3.5-rc6/mm/sparse.c
>> ===
>> --- linux-3.5-rc6.orig/mm/sparse.c	2012-07-18 17:59:25.0 +0900
>> +++ linux-3.5-rc6/mm/sparse.c	2012-07-18 18:03:05.553168749 +0900
>> @@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
>>  	/* This will make the necessary allocations eventually. */
>>  	return sparse_mem_map_populate(pnum, nid);
>>  }
>> -static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
>> +static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
>>  {
>> -	return; /* XXX: Not implemented yet */
>> +	vmemmap_kfree(page, nr_pages);
>>  }
>>  static void free_map_bootmem(struct page *page, unsigned long nr_pages)
>>  {
>> +	vmemmap_free_bootmem(page, nr_pages);
>>  }
>>  #else
>>  static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
>> Index: linux-3.5-rc6/arch/x86/mm/init_64.c
>> ===
>> --- linux-3.5-rc6.orig/arch/x86/mm/init_64.c	2012-07-18 18:01:28.0 +0900
>> +++ linux-3.5-rc6/arch/x86/mm/init_64.c	2012-07-18 18:03:05.564168611 +0900
>> @@ -978,6 +978,127 @@ vmemmap_populate(struct page *start_page
>>  	return 0;
>>  }
>>
>> +#define PAGE_INUSE 0xFD
>> +
>> +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
>> +			    struct page **pp, int *page_size)
>> +{
>> +	pgd_t *pgd;
>> +	pud_t *pud;
>> +	pmd_t *pmd;
>> +	pte_t *pte;
>> +	void *page_addr;
>> +	unsigned long next;
>> +
>> +	*pp = NULL;
>> +
>> +	pgd = pgd_offset_k(addr);
>> +	if (pgd_none(*pgd))
>> +		return pgd_addr_end(addr, end);
>> +
>> +	pud = pud_offset(pgd, addr);
>> +	if (pud_none(*pud))
>> +		return pud_addr_end(addr,end);
>> +
>> +	if (!cpu_has_pse) {
>> +		next = (addr + PAGE_SIZE) & PAGE_MASK;
>> +		pmd = pmd_offset(pud, addr);
>> +		if (pmd_none(*pmd))
>> +			return next;
>> +
>> +		pte = pte_offset_kernel(pmd, addr);
>> +		if (pte_none(*pte))
>> +			return next;
>> +
>> +		*page_size = PAGE_SIZE;
>> +		*pp = pte_page(*pte);
>> +	} else {
>> +		next = pmd_addr_end(addr, end);
>> +
>> +		pmd = pmd_offset(pud, addr);
>> +		if (pmd_none(*pmd))
>> +			return next;
>> +
>> +		*page_size = PMD_SIZE;
>> +		*pp = pmd_page(*pmd);
>> +	}
>> +
>> +	/*
>> +	 * Removed page structs are filled with 0xFD.
>> +	 */
>> +	memset((void *)addr, PAGE_INUSE, next - addr);
>> +
>> +	page_addr = page_address(*pp);
>> +
>> +	/*
>> +	 * Check the page is filled with 0xFD or not.
>> +	 * memchr_inv() returns the address. In this case, we cannot
>> +	 * clear PTE/PUD entry, since the page is used by other
[RESEND RFC PATCH v4 11/13] memory-hotplug : free memmap of sparse-vmemmap
Not all pages of the virtual mapping in removed memory can be freed, since
a page used as PGD/PUD may cover not only the removed memory but also other
memory. So the patch checks whether a page can be freed or not.

How to check whether a page can be freed or not?
 1. When removing memory, the page structs of the removed memory are filled
    with 0xFD.
 2. If all page structs on a PT/PMD are filled with 0xFD, the PT/PMD can be
    cleared. In this case, the page used as PT/PMD can be freed.

Applying the patch, __remove_section() of CONFIG_SPARSEMEM_VMEMMAP is
integrated into one. So __remove_section() of CONFIG_SPARSEMEM_VMEMMAP is
deleted.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 arch/x86/mm/init_64.c |  121 ++
 include/linux/mm.h    |    2
 mm/memory_hotplug.c   |   17 ---
 mm/sparse.c           |    5 +-
 4 files changed, 128 insertions(+), 17 deletions(-)

Index: linux-3.5-rc6/include/linux/mm.h
===
--- linux-3.5-rc6.orig/include/linux/mm.h	2012-07-19 15:07:48.836986796 +0900
+++ linux-3.5-rc6/include/linux/mm.h	2012-07-19 15:07:59.101858469 +0900
@@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
 void vmemmap_populate_print_last(void);
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
 				  unsigned long size);
+void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages);
+void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages);

 enum mf_flags {
 	MF_COUNT_INCREASED = 1 << 0,
Index: linux-3.5-rc6/mm/sparse.c
===
--- linux-3.5-rc6.orig/mm/sparse.c	2012-07-19 11:57:09.065797011 +0900
+++ linux-3.5-rc6/mm/sparse.c	2012-07-19 15:07:59.114858306 +0900
@@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
 	/* This will make the necessary allocations eventually. */
 	return sparse_mem_map_populate(pnum, nid);
 }
-static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
+static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
 {
-	return; /* XXX: Not implemented yet */
+	vmemmap_kfree(page, nr_pages);
 }
 static void free_map_bootmem(struct page *page, unsigned long nr_pages)
 {
+	vmemmap_free_bootmem(page, nr_pages);
 }
 #else
 static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
Index: linux-3.5-rc6/arch/x86/mm/init_64.c
===
--- linux-3.5-rc6.orig/arch/x86/mm/init_64.c	2012-07-19 15:07:48.898986022 +0900
+++ linux-3.5-rc6/arch/x86/mm/init_64.c	2012-07-19 15:14:05.870273270 +0900
@@ -978,6 +978,127 @@ vmemmap_populate(struct page *start_page
 	return 0;
 }

+#define PAGE_INUSE 0xFD
+
+unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
+			    struct page **pp, int *page_size)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	void *page_addr;
+	unsigned long next;
+
+	*pp = NULL;
+
+	pgd = pgd_offset_k(addr);
+	if (pgd_none(*pgd))
+		return pgd_addr_end(addr, end);
+
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud))
+		return pud_addr_end(addr, end);
+
+	if (!cpu_has_pse) {
+		next = (addr + PAGE_SIZE) & PAGE_MASK;
+		pmd = pmd_offset(pud, addr);
+		if (pmd_none(*pmd))
+			return next;
+
+		pte = pte_offset_kernel(pmd, addr);
+		if (pte_none(*pte))
+			return next;
+
+		*page_size = PAGE_SIZE;
+		*pp = pte_page(*pte);
+	} else {
+		next = pmd_addr_end(addr, end);
+
+		pmd = pmd_offset(pud, addr);
+		if (pmd_none(*pmd))
+			return next;
+
+		*page_size = PMD_SIZE;
+		*pp = pmd_page(*pmd);
+	}
+
+	/*
+	 * Removed page structs are filled with 0xFD.
+	 */
+	memset((void *)addr, PAGE_INUSE, next - addr);
+
+	page_addr = page_address(*pp);
+
+	/*
+	 * Check the page is filled with 0xFD or not.
+	 * memchr_inv() returns the address. In this case, we cannot
+	 * clear PTE/PUD entry, since the page is used by other.
+	 * So we cannot also free the page.
+	 *
+	 * memchr_inv() returns NULL. In this case, we
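The 0xFD check described in this patch can be tried out in userspace. The sketch below is an illustration under assumptions, not the kernel code: `first_mismatch()` is a hand-written stand-in for the kernel's memchr_inv(), `SKETCH_PAGE_SIZE` assumes a 4 KiB backing page, and `mark_removed_and_check()` is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_INUSE 0xFD
#define SKETCH_PAGE_SIZE 4096	/* assumption: 4 KiB backing page */

/*
 * Userspace stand-in for memchr_inv(): return the first byte that
 * differs from c, or NULL if every byte in the buffer equals c.
 */
static const void *first_mismatch(const void *buf, int c, size_t len)
{
	const unsigned char *p = buf;
	size_t i;

	for (i = 0; i < len; i++)
		if (p[i] != (unsigned char)c)
			return p + i;
	return NULL;
}

/*
 * Mark [off, off + len) of the backing page as removed, then report
 * whether the whole page is now filled with 0xFD (1) or is still
 * partly used by something else (0).
 */
static int mark_removed_and_check(unsigned char *page, size_t off, size_t len)
{
	memset(page + off, PAGE_INUSE, len);
	return first_mismatch(page, PAGE_INUSE, SKETCH_PAGE_SIZE) == NULL;
}
```

Removing only half of the ranges that share a backing page leaves it in use; once the other half is removed too, the page is completely 0xFD and could be freed.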
Re: [RFC PATCH v4 1/13] memory-hotplug : rename remove_memory to offline_memory
Hi Bob,

2012/07/19 17:19, Bob Liu wrote:
> Hi Yasuaki,
>
> On Wed, Jul 18, 2012 at 6:05 PM, Yasuaki Ishimatsu
> isimatu.yasu...@jp.fujitsu.com wrote:
>> remove_memory() does not remove memory but just offlines memory. The patch
>> changes name of it to offline_memory().
>
> Since offline_memory() just aligns the start/end pfn and there is no matching
> online_memory() function, I think it's better to remove this function and
> add the alignment into offline_pages().

If we change it, the arguments of the two functions become different, as
follows:

  online_pages  : page frame number and number of page frames
  offline_pages : memory address and memory length

I think that is ugly, so I don't want to change it.

As you say, there is no function that matches offline_memory().
If we create an exported function for onlining pages, it should be
named online_memory().

Thanks,
Yasuaki Ishimatsu

>> CC: David Rientjes rient...@google.com
>> CC: Jiang Liu liu...@gmail.com
>> CC: Len Brown len.br...@intel.com
>> CC: Benjamin Herrenschmidt b...@kernel.crashing.org
>> CC: Paul Mackerras pau...@samba.org
>> CC: Christoph Lameter c...@linux.com
>> Cc: Minchan Kim minchan@gmail.com
>> CC: Andrew Morton a...@linux-foundation.org
>> CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
>> CC: Wen Congyang we...@cn.fujitsu.com
>> Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
>>
>> ---
>>  drivers/acpi/acpi_memhotplug.c |    2 +-
>>  drivers/base/memory.c          |    4 ++--
>>  include/linux/memory_hotplug.h |    2 +-
>>  mm/memory_hotplug.c            |    6 +++---
>>  4 files changed, 7 insertions(+), 7 deletions(-)
>>
>> Index: linux-3.5-rc4/drivers/acpi/acpi_memhotplug.c
>> ===
>> --- linux-3.5-rc4.orig/drivers/acpi/acpi_memhotplug.c	2012-07-03 14:21:46.102416917 +0900
>> +++ linux-3.5-rc4/drivers/acpi/acpi_memhotplug.c	2012-07-03 14:21:49.458374960 +0900
>> @@ -318,7 +318,7 @@ static int acpi_memory_disable_device(st
>>  	 */
>>  	list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
>>  		if (info->enabled) {
>> -			result = remove_memory(info->start_addr, info->length);
>> +			result = offline_memory(info->start_addr, info->length);
>>  			if (result)
>>  				return result;
>>  		}
>> Index: linux-3.5-rc4/drivers/base/memory.c
>> ===
>> --- linux-3.5-rc4.orig/drivers/base/memory.c	2012-07-03 14:21:46.095417003 +0900
>> +++ linux-3.5-rc4/drivers/base/memory.c	2012-07-03 14:21:49.459374948 +0900
>> @@ -266,8 +266,8 @@ memory_block_action(unsigned long phys_i
>>  			break;
>>  		case MEM_OFFLINE:
>>  			start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
>> -			ret = remove_memory(start_paddr,
>> -					    nr_pages << PAGE_SHIFT);
>> +			ret = offline_memory(start_paddr,
>> +					     nr_pages << PAGE_SHIFT);
>>  			break;
>>  		default:
>>  			WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
>> Index: linux-3.5-rc4/mm/memory_hotplug.c
>> ===
>> --- linux-3.5-rc4.orig/mm/memory_hotplug.c	2012-07-03 14:21:46.102416917 +0900
>> +++ linux-3.5-rc4/mm/memory_hotplug.c	2012-07-03 14:21:49.466374860 +0900
>> @@ -990,7 +990,7 @@ out:
>>  	return ret;
>>  }
>>
>> -int remove_memory(u64 start, u64 size)
>> +int offline_memory(u64 start, u64 size)
>>  {
>>  	unsigned long start_pfn, end_pfn;
>>
>> @@ -999,9 +999,9 @@ int remove_memory(u64 start, u64 size)
>>  	return offline_pages(start_pfn, end_pfn, 120 * HZ);
>>  }
>>  #else
>> -int remove_memory(u64 start, u64 size)
>> +int offline_memory(u64 start, u64 size)
>>  {
>>  	return -EINVAL;
>>  }
>>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>> -EXPORT_SYMBOL_GPL(remove_memory);
>> +EXPORT_SYMBOL_GPL(offline_memory);
>> Index: linux-3.5-rc4/include/linux/memory_hotplug.h
>> ===
>> --- linux-3.5-rc4.orig/include/linux/memory_hotplug.h	2012-07-03 14:21:46.102416917 +0900
>> +++ linux-3.5-rc4/include/linux/memory_hotplug.h	2012-07-03 14:21:49.471374796 +0900
>> @@ -233,7 +233,7 @@ static inline int is_mem_section_removab
>>  extern int mem_online_node(int nid);
>>  extern int add_memory(int nid, u64 start, u64 size);
>>  extern int arch_add_memory(int nid, u64 start, u64 size);
>> -extern int remove_memory(u64 start, u64 size);
>> +extern int offline_memory(u64 start, u64 size);
>>  extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
>>  								int nr_pages);
>>  extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org. For more info
Re: [RFC PATCH v4 7/13] memory-hotplug : remove_memory calls __remove_pages
Hi Bob,

2012/07/19 17:32, Bob Liu wrote:
> On Wed, Jul 18, 2012 at 6:12 PM, Yasuaki Ishimatsu
> isimatu.yasu...@jp.fujitsu.com wrote:
>> The patch adds __remove_pages() to remove_memory(). Then the range given by
>> the phys_start_pfn and nr_pages arguments of __remove_pages() may span
>> different zones. So the zone argument is removed from __remove_pages() and
>> __remove_pages() calculates the zone in each section.
>>
>> When CONFIG_SPARSEMEM_VMEMMAP is defined, there is no way to remove a memmap.
>> So __remove_section only calls unregister_memory_section().
>>
>> CC: David Rientjes rient...@google.com
>> CC: Jiang Liu liu...@gmail.com
>> CC: Len Brown len.br...@intel.com
>> CC: Benjamin Herrenschmidt b...@kernel.crashing.org
>> CC: Paul Mackerras pau...@samba.org
>> CC: Christoph Lameter c...@linux.com
>> Cc: Minchan Kim minchan@gmail.com
>> CC: Andrew Morton a...@linux-foundation.org
>> CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
>> CC: Wen Congyang we...@cn.fujitsu.com
>> Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
>>
>> ---
>>  arch/powerpc/platforms/pseries/hotplug-memory.c |    5 +
>>  include/linux/memory_hotplug.h                  |    3 +--
>>  mm/memory_hotplug.c                             |   19 ---
>>  3 files changed, 14 insertions(+), 13 deletions(-)
>>
>> Index: linux-3.5-rc6/mm/memory_hotplug.c
>> ===
>> --- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-18 18:00:27.440145432 +0900
>> +++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-18 18:01:02.070712487 +0900
>> @@ -275,11 +275,14 @@ static int __meminit __add_section(int n
>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>>  static int __remove_section(struct zone *zone, struct mem_section *ms)
>>  {
>> -	/*
>> -	 * XXX: Freeing memmap with vmemmap is not implement yet.
>> -	 *      This should be removed later.
>> -	 */
>> -	return -EBUSY;
>> +	int ret = -EINVAL;
>> +
>> +	if (!valid_section(ms))
>> +		return ret;
>> +
>> +	ret = unregister_memory_section(ms);
>> +
>
> I saw a patch from Jiang Liu, "mm/hotplug: free zone->pageset when a zone
> becomes empty", to free the zone->pageset, and I think there may be more
> cleanup needed when a zone becomes empty.
>
> We already have __add_zone() in __add_section(); what about adding a function
> like __remove_zone() to do the cleanup here?

Thank you for your comment.
As you say, I think a cleanup function for the zone is necessary.
So I'll update it.

Thanks,
Yasuaki Ishimatsu.

>> +	return ret;
>>  }
>>  #else
>>  static int __remove_section(struct zone *zone, struct mem_section *ms)
>> @@ -346,11 +349,11 @@ EXPORT_SYMBOL_GPL(__add_pages);
>>   * sure that pages are marked reserved and zones are adjust properly by
>>   * calling offline_pages().
>>   */
>> -int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
>> -		 unsigned long nr_pages)
>> +int __remove_pages(unsigned long phys_start_pfn, unsigned long nr_pages)
>>  {
>>  	unsigned long i, ret = 0;
>>  	int sections_to_remove;
>> +	struct zone *zone;
>>
>>  	/*
>>  	 * We can only remove entire sections
>> @@ -363,6 +366,7 @@ int __remove_pages(struct zone *zone, un
>>  	sections_to_remove = nr_pages / PAGES_PER_SECTION;
>>  	for (i = 0; i < sections_to_remove; i++) {
>>  		unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
>> +		zone = page_zone(pfn_to_page(pfn));
>>  		ret = __remove_section(zone, __pfn_to_section(pfn));
>>  		if (ret)
>>  			break;
>> @@ -1031,6 +1035,7 @@ int __ref remove_memory(int nid, u64 sta
>>  	/* remove memmap entry */
>>  	firmware_map_remove(start, start + size, "System RAM");
>>
>> +	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT);
>>  out:
>>  	unlock_memory_hotplug();
>>  	return ret;
>> Index: linux-3.5-rc6/include/linux/memory_hotplug.h
>> ===
>> --- linux-3.5-rc6.orig/include/linux/memory_hotplug.h	2012-07-18 18:00:27.445145371 +0900
>> +++ linux-3.5-rc6/include/linux/memory_hotplug.h	2012-07-18 18:00:40.461982690 +0900
>> @@ -89,8 +89,7 @@ extern bool is_pageblock_removable_noloc
>>  /* reasonably generic interface to expand the physical pages in a zone  */
>>  extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
>>  	unsigned long nr_pages);
>> -extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
>> -	unsigned long nr_pages);
>> +extern int __remove_pages(unsigned long start_pfn, unsigned long nr_pages);
>>
>>  #ifdef CONFIG_NUMA
>>  extern int memory_add_physaddr_to_nid(u64 start);
>> Index: linux-3.5-rc6/arch/powerpc/platforms/pseries/hotplug-memory.c
>> ===
>> --- linux-3.5-rc6.orig/arch/powerpc/platforms/pseries/hotplug-memory.c	2012-07-18 18:00:27.442145407 +0900
>> +++ linux-3.5-rc6/arch/powerpc/platforms/pseries/hotplug-memory.c	2012-07-18 18:00:40.470982578 +0900
>> @@ -76,7 +76,6 @@ unsigned long memory_block_size_bytes(vo
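The per-section zone lookup that this patch moves into __remove_pages() can be sketched in standalone C. This is only an illustration under assumptions: `zone_of_pfn()` is a hypothetical two-zone lookup standing in for page_zone(pfn_to_page(pfn)), `PAGES_PER_SECTION` assumes 128 MiB sections with 4 KiB pages, and the sketch returns -1 for unaligned input rather than reproducing how the kernel handles it.

```c
#define PAGES_PER_SECTION 32768UL	/* assumption: 128 MiB sections, 4 KiB pages */

/* Hypothetical zone lookup: pfns below the boundary are zone 0, others zone 1. */
static int zone_of_pfn(unsigned long pfn, unsigned long boundary_pfn)
{
	return pfn < boundary_pfn ? 0 : 1;
}

/*
 * Walk a pfn range one whole section at a time and resolve the zone
 * separately for each section, since the range may cross a zone boundary.
 * counts[z] is incremented for every section found in zone z.
 * Returns the number of sections visited, or -1 if the range is not
 * section-aligned (only entire sections can be handled).
 */
static long count_sections_per_zone(unsigned long start_pfn,
				    unsigned long nr_pages,
				    unsigned long boundary_pfn,
				    unsigned long counts[2])
{
	unsigned long i, sections;

	if (start_pfn % PAGES_PER_SECTION || nr_pages % PAGES_PER_SECTION)
		return -1;

	sections = nr_pages / PAGES_PER_SECTION;
	for (i = 0; i < sections; i++) {
		unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;

		counts[zone_of_pfn(pfn, boundary_pfn)]++;
	}
	return (long)sections;
}
```

A caller that passed a single zone for the whole range would get the second half of such a range wrong, which is the reason given above for dropping the zone argument.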
Re: [RFC PATCH v4 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove
Hi Wen,

2012/07/19 16:23, Wen Congyang wrote:
> At 07/18/2012 06:06 PM, Yasuaki Ishimatsu Wrote:
>> acpi_memory_device_remove() has been prepared to remove physical memory.
>> But the function currently only frees acpi_memory_device.
>>
>> The patch adds the following functions into acpi_memory_device_remove():
>>   - offline memory
>>   - remove physical memory. It only checks whether memory is online or not.
>>   - free acpi_memory_device
>>
>> CC: David Rientjes rient...@google.com
>> CC: Jiang Liu liu...@gmail.com
>> CC: Len Brown len.br...@intel.com
>> CC: Benjamin Herrenschmidt b...@kernel.crashing.org
>> CC: Paul Mackerras pau...@samba.org
>> CC: Christoph Lameter c...@linux.com
>> Cc: Minchan Kim minchan@gmail.com
>> CC: Andrew Morton a...@linux-foundation.org
>> CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
>> CC: Wen Congyang we...@cn.fujitsu.com
>> Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
>>
>> ---
>>  drivers/acpi/acpi_memhotplug.c |   27 ++-
>>  drivers/base/memory.c          |   39 +++
>>  include/linux/memory.h         |    5 +
>>  include/linux/memory_hotplug.h |    5 +
>>  mm/memory_hotplug.c            |   22 ++
>>  5 files changed, 97 insertions(+), 1 deletion(-)
>>
>> Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c
>> ===
>> --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c	2012-07-17 11:20:15.117796971 +0900
>> +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c	2012-07-17 13:36:30.325594022 +0900
>> @@ -29,6 +29,7 @@
>>  #include <linux/module.h>
>>  #include <linux/init.h>
>>  #include <linux/types.h>
>> +#include <linux/memory.h>
>>  #include <linux/memory_hotplug.h>
>>  #include <linux/slab.h>
>>  #include <acpi/acpi_drivers.h>
>> @@ -452,12 +453,36 @@ static int acpi_memory_device_add(struct
>>  static int acpi_memory_device_remove(struct acpi_device *device, int type)
>>  {
>>  	struct acpi_memory_device *mem_device = NULL;
>> -
>> +	struct acpi_memory_info *info, *tmp;
>> +	int result;
>> +	int node;
>>
>>  	if (!device || !acpi_driver_data(device))
>>  		return -EINVAL;
>>
>>  	mem_device = acpi_driver_data(device);
>> +
>> +	node = acpi_get_node(mem_device->device->handle);
>> +	list_for_each_entry_safe(info, tmp, &mem_device->res_list, list) {
>> +		if (!info->enabled)
>> +			continue;
>> +
>> +		if (!is_memblk_offline(info->start_addr, info->length)) {
>> +			result = offline_memory(info->start_addr, info->length);
>> +			if (result)
>> +				return result;
>> +		}
>> +		if (node < 0)
>> +			node = memory_add_physaddr_to_nid(info->start_addr);
>> +
>> +		result = remove_memory(node, info->start_addr, info->length);
>> +		if (result)
>> +			return result;
>> +
>> +		list_del(&info->list);
>> +		kfree(info);
>> +	}
>> +
>>  	kfree(mem_device);
>>
>>  	return 0;
>> Index: linux-3.5-rc6/include/linux/memory_hotplug.h
>> ===
>> --- linux-3.5-rc6.orig/include/linux/memory_hotplug.h	2012-07-17 11:20:15.133796772 +0900
>> +++ linux-3.5-rc6/include/linux/memory_hotplug.h	2012-07-17 11:29:41.490716352 +0900
>> @@ -221,6 +221,7 @@ static inline void unlock_memory_hotplug
>>  #ifdef CONFIG_MEMORY_HOTREMOVE
>>
>>  extern int is_mem_section_removable(unsigned long pfn, unsigned long nr_pages);
>> +extern int remove_memory(int nid, u64 start, u64 size);
>>
>>  #else
>>  static inline int is_mem_section_removable(unsigned long pfn,
>> @@ -228,6 +229,10 @@ static inline int is_mem_section_removab
>>  {
>>  	return 0;
>>  }
>> +static inline int remove_memory(int nid, u64 start, u64 size)
>> +{
>> +	return -EBUSY;
>> +}
>>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>>
>>  extern int mem_online_node(int nid);
>> Index: linux-3.5-rc6/mm/memory_hotplug.c
>> ===
>> --- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-17 11:20:15.129796821 +0900
>> +++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-17 13:25:18.952986069 +0900
>> @@ -998,6 +998,28 @@ int offline_memory(u64 start, u64 size)
>>  	end_pfn = start_pfn + PFN_DOWN(size);
>>  	return offline_pages(start_pfn, end_pfn, 120 * HZ);
>>  }
>> +
>> +int remove_memory(int nid, u64 start, u64 size)
>> +{
>> +	int ret = -EBUSY;
>> +	lock_memory_hotplug();
>> +	/*
>> +	 * The memory might become online by other task, even if you offine it.
>> +	 * So we check whether the cpu has been onlined or not.
>> +	 */
>> +	if (!is_memblk_offline(start, size)) {
>> +		pr_warn("memory removing [mem %#010llx-%#010llx] failed, "
>> +			"because the memmory range is online\n",
>> +			start, start + size);
>> +		ret = -EAGAIN;
>> +	}
>> +
>> +	unlock_memory_hotplug();
>> +	return ret;
>> +
>> +}
>> +EXPORT_SYMBOL_GPL(remove_memory);
>> +
>>  #else
Re: [RFC PATCH v4 11/13] memory-hotplug : free memmap of sparse-vmemmap
Hi Wen,

2012/07/19 18:45, Wen Congyang wrote:
> At 07/18/2012 06:16 PM, Yasuaki Ishimatsu Wrote:
>> Not all pages of the virtual mapping in removed memory can be freed, since
>> a page used as PGD/PUD may cover not only the removed memory but also other
>> memory. So the patch checks whether a page can be freed or not.
>>
>> How to check whether a page can be freed or not?
>>  1. When removing memory, the page structs of the removed memory are filled
>>     with 0xFD.
>>  2. If all page structs on a PT/PMD are filled with 0xFD, the PT/PMD can be
>>     cleared. In this case, the page used as PT/PMD can be freed.
>>
>> Applying the patch, __remove_section() of CONFIG_SPARSEMEM_VMEMMAP is
>> integrated into one. So __remove_section() of CONFIG_SPARSEMEM_VMEMMAP is
>> deleted.
>>
>> CC: David Rientjes rient...@google.com
>> CC: Jiang Liu liu...@gmail.com
>> CC: Len Brown len.br...@intel.com
>> CC: Benjamin Herrenschmidt b...@kernel.crashing.org
>> CC: Paul Mackerras pau...@samba.org
>> CC: Christoph Lameter c...@linux.com
>> Cc: Minchan Kim minchan@gmail.com
>> CC: Andrew Morton a...@linux-foundation.org
>> CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
>> CC: Wen Congyang we...@cn.fujitsu.com
>> Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
>>
>> ---
>>  arch/x86/mm/init_64.c |  121 ++
>>  include/linux/mm.h    |    2
>>  mm/memory_hotplug.c   |   19 ---
>>  mm/sparse.c           |    5 +-
>>  4 files changed, 128 insertions(+), 19 deletions(-)
>>
>> Index: linux-3.5-rc6/include/linux/mm.h
>> ===
>> --- linux-3.5-rc6.orig/include/linux/mm.h	2012-07-18 18:01:28.0 +0900
>> +++ linux-3.5-rc6/include/linux/mm.h	2012-07-18 18:03:05.551168773 +0900
>> @@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
>>  void vmemmap_populate_print_last(void);
>>  void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
>>  				  unsigned long size);
>> +void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages);
>> +void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages);
>>
>>  enum mf_flags {
>>  	MF_COUNT_INCREASED = 1 << 0,
>> Index: linux-3.5-rc6/mm/sparse.c
>> ===
>> --- linux-3.5-rc6.orig/mm/sparse.c	2012-07-18 17:59:25.0 +0900
>> +++ linux-3.5-rc6/mm/sparse.c	2012-07-18 18:03:05.553168749 +0900
>> @@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
>>  	/* This will make the necessary allocations eventually. */
>>  	return sparse_mem_map_populate(pnum, nid);
>>  }
>> -static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
>> +static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
>>  {
>> -	return; /* XXX: Not implemented yet */
>> +	vmemmap_kfree(page, nr_pages);
>>  }
>>  static void free_map_bootmem(struct page *page, unsigned long nr_pages)
>>  {
>> +	vmemmap_free_bootmem(page, nr_pages);
>>  }
>>  #else
>>  static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
>> Index: linux-3.5-rc6/arch/x86/mm/init_64.c
>> ===
>> --- linux-3.5-rc6.orig/arch/x86/mm/init_64.c	2012-07-18 18:01:28.0 +0900
>> +++ linux-3.5-rc6/arch/x86/mm/init_64.c	2012-07-18 18:03:05.564168611 +0900
>> @@ -978,6 +978,127 @@ vmemmap_populate(struct page *start_page
>>  	return 0;
>>  }
>>
>> +#define PAGE_INUSE 0xFD
>> +
>> +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
>> +			    struct page **pp, int *page_size)
>> +{
>> +	pgd_t *pgd;
>> +	pud_t *pud;
>> +	pmd_t *pmd;
>> +	pte_t *pte;
>> +	void *page_addr;
>> +	unsigned long next;
>> +
>> +	*pp = NULL;
>> +
>> +	pgd = pgd_offset_k(addr);
>> +	if (pgd_none(*pgd))
>> +		return pgd_addr_end(addr, end);
>> +
>> +	pud = pud_offset(pgd, addr);
>> +	if (pud_none(*pud))
>> +		return pud_addr_end(addr,end);
>> +
>> +	if (!cpu_has_pse) {
>> +		next = (addr + PAGE_SIZE) & PAGE_MASK;
>> +		pmd = pmd_offset(pud, addr);
>> +		if (pmd_none(*pmd))
>> +			return next;
>> +
>> +		pte = pte_offset_kernel(pmd, addr);
>> +		if (pte_none(*pte))
>> +			return next;
>> +
>> +		*page_size = PAGE_SIZE;
>> +		*pp = pte_page(*pte);
>> +	} else {
>> +		next = pmd_addr_end(addr, end);
>> +
>> +		pmd = pmd_offset(pud, addr);
>> +		if (pmd_none(*pmd))
>> +			return next;
>> +
>> +		*page_size = PMD_SIZE;
>> +		*pp = pmd_page(*pmd);
>> +	}
>> +
>> +	/*
>> +	 * Removed page structs are filled with 0xFD.
>> +	 */
>> +	memset((void *)addr, PAGE_INUSE, next - addr);
>> +
>> +	page_addr = page_address(*pp);
>> +
>> +	/*
>> +	 * Check the page is filled with 0xFD or not.
>> +	 * memchr_inv() returns the address. In this case, we cannot
>> +	 * clear PTE/PUD entry, since the page is used by other
[RFC PATCH v4 0/13] memory-hotplug : hot-remove physical memory
This patch series aims to support physical memory hot-remove.

  [RFC PATCH v4 1/13] memory-hotplug : rename remove_memory to offline_memory
  [RFC PATCH v4 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove
  [RFC PATCH v4 3/13] memory-hotplug : check whether memory is present or not
  [RFC PATCH v4 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs
  [RFC PATCH v4 5/13] memory-hotplug : does not release memory region in PAGES_PER_SECTION chunks
  [RFC PATCH v4 6/13] memory-hotplug : add memory_block_release
  [RFC PATCH v4 7/13] memory-hotplug : remove_memory calls __remove_pages
  [RFC PATCH v4 8/13] memory-hotplug : check page type in get_page_bootmem
  [RFC PATCH v4 9/13] memory-hotplug : move register_page_bootmem_info_node and put_page_bootmem for sparse-vmemmap
  [RFC PATCH v4 10/13] memory-hotplug : implement register_page_bootmem_info_section of sparse-vmemmap
  [RFC PATCH v4 11/13] memory-hotplug : free memmap of sparse-vmemmap
  [RFC PATCH v4 12/13] memory-hotplug : add node_device_release
  [RFC PATCH v4 13/13] memory-hotplug : remove sysfs file of node

Even if you apply these patches, you cannot remove the physical memory
completely, since the patches are still under development. Other components
can already be removed, though. I would like your cooperation in improving
physical memory hot-remove, so please review these patches and send your
comments and ideas.

The patches can free/remove the following things:

  - acpi_memory_info                          : [RFC PATCH 2/13]
  - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 4/13]
  - iomem_resource                            : [RFC PATCH 5/13]
  - mem_section and related sysfs files       : [RFC PATCH 6-11/13]
  - node and related sysfs files              : [RFC PATCH 12-13/13]

The patches cannot handle the following yet:

  - page table of removed memory

If you find any function missing for physical memory hot-remove, please let me
know.

change log of v4:
 * remove "memory-hotplug : unify argument of firmware_map_add_early/hotplug"
   from the patch series, since the patch is a bugfix. It is being discussed
   on another thread. But the patch is needed for testing the series, so I
   added it as [PATCH 0/13].

 [RFC PATCH v4 2/13]
 * check memory is online or not at remove_memory()
 * add memory_add_physaddr_to_nid() to acpi_memory_device_remove() for
   getting node id

 [RFC PATCH v4 3/13]
 * create new patch : check memory is online or not at online_pages()

 [RFC PATCH v4 4/13]
 * add __ref section to remove_memory()
 * call firmware_map_remove_entry() before remove_sysfs_fw_map_entry()

 [RFC PATCH v4 11/13]
 * rewrite register_page_bootmem_memmap() for removing page used as PT/PMD

change log of v3:
 * rebase to 3.5.0-rc6

 [RFC PATCH v2 2/13]
 * remove extra kobject_put()
 * Wen commented on the patch: acpi_memory_device_remove() should ignore the
   return value of remove_memory(), since the caller does not care about the
   return value. But I did not change it, since I think the caller should
   care about the return value. I am trying to fix it as follows:

   https://lkml.org/lkml/2012/7/5/624

 [RFC PATCH v2 4/13]
 * remove a firmware_memmap_entry allocated by kzalloc()

change log of v2:
 [RFC PATCH v2 2/13]
 * check whether memory block is offline or not before calling
   offline_memory()
 * check whether section is valid or not in is_memblk_offline()
 * call kobject_put() for each memory_block in is_memblk_offline()

 [RFC PATCH v2 3/13]
 * unify the end argument of firmware_map_add_early/hotplug

 [RFC PATCH v2 4/13]
 * add release_firmware_map_entry() for freeing firmware_map_entry

 [RFC PATCH v2 6/13]
 * add release_memory_block() for freeing memory_block

 [RFC PATCH v2 11/13]
 * fix wrong arguments of free_pages()

---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   16 +-
 arch/x86/mm/init_64.c                           |  144
 drivers/acpi/acpi_memhotplug.c                  |   28
 drivers/base/memory.c                           |   54 -
 drivers/base/node.c                             |    7 +
 drivers/firmware/memmap.c                       |   78 -
 include/linux/firmware-map.h                    |    6 +
 include/linux/memory.h                          |    5
 include/linux/memory_hotplug.h                  |   17 --
 include/linux/mm.h                              |    5
 mm/memory_hotplug.c                             |   98
 mm/sparse.c                                     |    5
 12 files changed, 414 insertions(+), 49 deletions(-)

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[RFC PATCH 0/13] firmware_map : unify argument of firmware_map_add_early/hotplug
There are two ways to create /sys/firmware/memmap/X sysfs:

  - firmware_map_add_early
    When the system starts, it is called from e820_reserve_resources()
  - firmware_map_add_hotplug
    When the memory is hot plugged, it is called from add_memory()

But these functions are called with inconsistent values for the end argument,
as below:

  - end argument of firmware_map_add_early()   : start + size - 1
  - end argument of firmware_map_add_hotplug() : start + size

The patch unifies them to start + size. Even with the patch applied, the
content of the /sys/firmware/memmap/X/end file does not change.

CC: Thomas Gleixner <t...@linutronix.de>
CC: Ingo Molnar <mi...@kernel.org>
CC: H. Peter Anvin <h...@zytor.com>
CC: Tejun Heo <t...@kernel.org>
CC: Andrew Morton <a...@linux-foundation.org>
Reviewed-by: Dave Hansen <d...@linux.vnet.ibm.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 arch/x86/kernel/e820.c    |    2 +-
 drivers/firmware/memmap.c |    8
 2 files changed, 5 insertions(+), 5 deletions(-)

Index: linux-3.5-rc6/arch/x86/kernel/e820.c
===
--- linux-3.5-rc6.orig/arch/x86/kernel/e820.c	2012-07-18 17:19:38.391365260 +0900
+++ linux-3.5-rc6/arch/x86/kernel/e820.c	2012-07-18 17:19:43.616300222 +0900
@@ -944,7 +944,7 @@ void __init e820_reserve_resources(void)
 	for (i = 0; i < e820_saved.nr_map; i++) {
 		struct e820entry *entry = &e820_saved.map[i];
 		firmware_map_add_early(entry->addr,
-			entry->addr + entry->size - 1,
+			entry->addr + entry->size,
 			e820_type_to_string(entry->type));
 	}
 }
Index: linux-3.5-rc6/drivers/firmware/memmap.c
===
--- linux-3.5-rc6.orig/drivers/firmware/memmap.c	2012-07-18 17:19:38.388365299 +0900
+++ linux-3.5-rc6/drivers/firmware/memmap.c	2012-07-18 18:30:47.608390251 +0900
@@ -98,7 +98,7 @@ static LIST_HEAD(map_entries);
 /**
  * firmware_map_add_entry() - Does the real work to add a firmware memmap entry.
  * @start: Start of the memory range.
- * @end:   End of the memory range (inclusive).
+ * @end:   End of the memory range.
  * @type:  Type of the memory range.
  * @entry: Pre-allocated (either kmalloc() or bootmem allocator), uninitialised
  *         entry.
@@ -113,7 +113,7 @@ static int firmware_map_add_entry(u64 st
 	BUG_ON(start > end);

 	entry->start = start;
-	entry->end = end;
+	entry->end = end - 1;
 	entry->type = type;
 	INIT_LIST_HEAD(&entry->list);
 	kobject_init(&entry->kobj, &memmap_ktype);
@@ -148,7 +148,7 @@ static int add_sysfs_fw_map_entry(struct
  * firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
  * memory hotplug.
  * @start: Start of the memory range.
- * @end:   End of the memory range (inclusive).
+ * @end:   End of the memory range.
  * @type:  Type of the memory range.
  *
  * Adds a firmware mapping entry. This function is for memory hotplug, it is
@@ -175,7 +175,7 @@ int __meminit firmware_map_add_hotplug(u
 /**
  * firmware_map_add_early() - Adds a firmware mapping entry.
  * @start: Start of the memory range.
- * @end:   End of the memory range (inclusive).
+ * @end:   End of the memory range.
  * @type:  Type of the memory range.
  *
  * Adds a firmware mapping entry. This function uses the bootmem allocator
[RFC PATCH v4 1/13] memory-hotplug : rename remove_memory to offline_memory
remove_memory() does not remove memory; it only offlines it. The patch renames
it to offline_memory().

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 drivers/acpi/acpi_memhotplug.c |    2 +-
 drivers/base/memory.c          |    4 ++--
 include/linux/memory_hotplug.h |    2 +-
 mm/memory_hotplug.c            |    6 +++---
 4 files changed, 7 insertions(+), 7 deletions(-)

Index: linux-3.5-rc4/drivers/acpi/acpi_memhotplug.c
===
--- linux-3.5-rc4.orig/drivers/acpi/acpi_memhotplug.c	2012-07-03 14:21:46.102416917 +0900
+++ linux-3.5-rc4/drivers/acpi/acpi_memhotplug.c	2012-07-03 14:21:49.458374960 +0900
@@ -318,7 +318,7 @@ static int acpi_memory_disable_device(st
 	 */
 	list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
 		if (info->enabled) {
-			result = remove_memory(info->start_addr, info->length);
+			result = offline_memory(info->start_addr, info->length);
 			if (result)
 				return result;
 		}
Index: linux-3.5-rc4/drivers/base/memory.c
===
--- linux-3.5-rc4.orig/drivers/base/memory.c	2012-07-03 14:21:46.095417003 +0900
+++ linux-3.5-rc4/drivers/base/memory.c	2012-07-03 14:21:49.459374948 +0900
@@ -266,8 +266,8 @@ memory_block_action(unsigned long phys_i
 			break;
 		case MEM_OFFLINE:
 			start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
-			ret = remove_memory(start_paddr,
-					    nr_pages << PAGE_SHIFT);
+			ret = offline_memory(start_paddr,
+					     nr_pages << PAGE_SHIFT);
 			break;
 		default:
 			WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
Index: linux-3.5-rc4/mm/memory_hotplug.c
===
--- linux-3.5-rc4.orig/mm/memory_hotplug.c	2012-07-03 14:21:46.102416917 +0900
+++ linux-3.5-rc4/mm/memory_hotplug.c	2012-07-03 14:21:49.466374860 +0900
@@ -990,7 +990,7 @@ out:
 	return ret;
 }

-int remove_memory(u64 start, u64 size)
+int offline_memory(u64 start, u64 size)
 {
 	unsigned long start_pfn, end_pfn;

@@ -999,9 +999,9 @@ int remove_memory(u64 start, u64 size)
 	return offline_pages(start_pfn, end_pfn, 120 * HZ);
 }
 #else
-int remove_memory(u64 start, u64 size)
+int offline_memory(u64 start, u64 size)
 {
 	return -EINVAL;
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
-EXPORT_SYMBOL_GPL(remove_memory);
+EXPORT_SYMBOL_GPL(offline_memory);
Index: linux-3.5-rc4/include/linux/memory_hotplug.h
===
--- linux-3.5-rc4.orig/include/linux/memory_hotplug.h	2012-07-03 14:21:46.102416917 +0900
+++ linux-3.5-rc4/include/linux/memory_hotplug.h	2012-07-03 14:21:49.471374796 +0900
@@ -233,7 +233,7 @@ static inline int is_mem_section_removab
 extern int mem_online_node(int nid);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int arch_add_memory(int nid, u64 start, u64 size);
-extern int remove_memory(u64 start, u64 size);
+extern int offline_memory(u64 start, u64 size);
 extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
 								int nr_pages);
 extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms);
[RFC PATCH v4 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove
acpi_memory_device_remove() has been prepared to remove physical memory.
But currently the function only frees acpi_memory_device. The patch adds the
following functions to acpi_memory_device_remove():

  - offline memory
  - remove physical memory. It only checks whether memory is online or not.
  - free acpi_memory_device

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 drivers/acpi/acpi_memhotplug.c |   27 ++-
 drivers/base/memory.c          |   39 +++
 include/linux/memory.h         |    5 +
 include/linux/memory_hotplug.h |    5 +
 mm/memory_hotplug.c            |   22 ++
 5 files changed, 97 insertions(+), 1 deletion(-)

Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c
===
--- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c	2012-07-17 11:20:15.117796971 +0900
+++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c	2012-07-17 13:36:30.325594022 +0900
@@ -29,6 +29,7 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/types.h>
+#include <linux/memory.h>
 #include <linux/memory_hotplug.h>
 #include <linux/slab.h>
 #include <acpi/acpi_drivers.h>
@@ -452,12 +453,36 @@ static int acpi_memory_device_add(struct
 static int acpi_memory_device_remove(struct acpi_device *device, int type)
 {
 	struct acpi_memory_device *mem_device = NULL;
-
+	struct acpi_memory_info *info, *tmp;
+	int result;
+	int node;

 	if (!device || !acpi_driver_data(device))
 		return -EINVAL;

 	mem_device = acpi_driver_data(device);
+
+	node = acpi_get_node(mem_device->device->handle);
+	list_for_each_entry_safe(info, tmp, &mem_device->res_list, list) {
+		if (!info->enabled)
+			continue;
+
+		if (!is_memblk_offline(info->start_addr, info->length)) {
+			result = offline_memory(info->start_addr, info->length);
+			if (result)
+				return result;
+		}
+		if (node < 0)
+			node = memory_add_physaddr_to_nid(info->start_addr);
+
+		result = remove_memory(node, info->start_addr, info->length);
+		if (result)
+			return result;
+
+		list_del(&info->list);
+		kfree(info);
+	}
+
 	kfree(mem_device);

 	return 0;
Index: linux-3.5-rc6/include/linux/memory_hotplug.h
===
--- linux-3.5-rc6.orig/include/linux/memory_hotplug.h	2012-07-17 11:20:15.133796772 +0900
+++ linux-3.5-rc6/include/linux/memory_hotplug.h	2012-07-17 11:29:41.490716352 +0900
@@ -221,6 +221,7 @@ static inline void unlock_memory_hotplug
 #ifdef CONFIG_MEMORY_HOTREMOVE

 extern int is_mem_section_removable(unsigned long pfn, unsigned long nr_pages);
+extern int remove_memory(int nid, u64 start, u64 size);

 #else
 static inline int is_mem_section_removable(unsigned long pfn,
@@ -228,6 +229,10 @@ static inline int is_mem_section_removab
 {
 	return 0;
 }
+static inline int remove_memory(int nid, u64 start, u64 size)
+{
+	return -EBUSY;
+}
 #endif /* CONFIG_MEMORY_HOTREMOVE */

 extern int mem_online_node(int nid);
Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-17 11:20:15.129796821 +0900
+++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-17 13:25:18.952986069 +0900
@@ -998,6 +998,28 @@ int offline_memory(u64 start, u64 size)
 	end_pfn = start_pfn + PFN_DOWN(size);
 	return offline_pages(start_pfn, end_pfn, 120 * HZ);
 }
+
+int remove_memory(int nid, u64 start, u64 size)
+{
+	int ret = -EBUSY;
+	lock_memory_hotplug();
+	/*
+	 * The memory might become online by other task, even if you offline it.
+	 * So we check whether the cpu has been onlined or not.
+	 */
+	if (!is_memblk_offline(start, size)) {
+		pr_warn("memory removing [mem %#010llx-%#010llx] failed, "
+			"because the memory range is online\n",
+			start, start + size);
+		ret = -EAGAIN;
+	}
+
+	unlock_memory_hotplug();
+	return ret;
+
+}
+EXPORT_SYMBOL_GPL(remove_memory);
+
 #else
 int offline_memory(u64 start, u64 size)
 {
Index: linux-3.5-rc6/drivers/base/memory.c
[PATCH v4 3/13] memory-hotplug : check whether memory is present or not
If the system supports memory hot-remove, online_pages() may online pages that
have been removed. So online_pages() needs to check whether the pages being
onlined are present or not.

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 include/linux/mmzone.h |   21 +
 mm/memory_hotplug.c    |   13 +
 2 files changed, 34 insertions(+)

Index: linux-3.5-rc6/include/linux/mmzone.h
===
--- linux-3.5-rc6.orig/include/linux/mmzone.h	2012-07-08 09:23:56.0 +0900
+++ linux-3.5-rc6/include/linux/mmzone.h	2012-07-17 16:10:21.588186145 +0900
@@ -1168,6 +1168,27 @@ void sparse_init(void);
 #define sparse_index_init(_sec, _nid)  do {} while (0)
 #endif /* CONFIG_SPARSEMEM */

+#ifdef CONFIG_SPARSEMEM
+static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
+{
+	int i;
+	for (i = 0; i < nr_pages; i++) {
+		if (pfn_present(pfn + 1))
+			continue;
+		else {
+			unlock_memory_hotplug();
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+#else
+static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
+{
+	return 0;
+}
+#endif /* CONFIG_SPARSEMEM*/
+
 #ifdef CONFIG_NODES_SPAN_OTHER_NODES
 bool early_pfn_in_nid(unsigned long pfn, int nid);
 #else
Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-17 14:26:40.0 +0900
+++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-17 16:09:50.070580170 +0900
@@ -467,6 +467,19 @@ int __ref online_pages(unsigned long pfn
 	struct memory_notify arg;

 	lock_memory_hotplug();
+	/*
+	 * If system supports memory hot-remove, the memory may have been
+	 * removed. So we check whether the memory has been removed or not.
+	 *
+	 * Note: When CONFIG_SPARSEMEM is defined, pfns_present() becomes
+	 *       effective. If CONFIG_SPARSEMEM is not defined, pfns_present()
+	 *       always returns 0.
+	 */
+	ret = pfns_present(pfn, nr_pages);
+	if (ret) {
+		unlock_memory_hotplug();
+		return ret;
+	}
 	arg.start_pfn = pfn;
 	arg.nr_pages = nr_pages;
 	arg.status_change_nid = -1;
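For reviewers, the intended check can be sketched in userspace as follows. This
is only an illustration under assumptions: fake_present[] and
fake_pfn_present() are hypothetical stand-ins for the kernel's section
bookkeeping and pfn_present(), and pfns_present_sketch() is not the posted
function. The sketch indexes with pfn + i so that every page frame in the range
is tested, and it takes no lock itself, so unlocking stays with the caller.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_FAKE_PFNS 16

/* Hypothetical presence map; the kernel tracks this per memory section. */
static bool fake_present[NR_FAKE_PFNS];

/* Stand-in for pfn_present(): out-of-range frames count as absent. */
static bool fake_pfn_present(unsigned long pfn)
{
	return pfn < NR_FAKE_PFNS && fake_present[pfn];
}

/*
 * Return 0 if every page frame in [pfn, pfn + nr_pages) is present,
 * -EINVAL otherwise. No locking is done here.
 */
static int pfns_present_sketch(unsigned long pfn, unsigned long nr_pages)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++) {
		if (!fake_pfn_present(pfn + i))
			return -EINVAL;
	}
	return 0;
}
```

Keeping the helper free of lock handling means a caller such as online_pages()
has a single unlock on its error path.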
[RFC PATCH v4 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs
When memory is (hot-)added to the system, /sys/firmware/memmap/X/{end, start,
type} sysfs files are created. But there is no code to remove these files. The
patch implements the function to remove them.

Note : The code does not free a firmware_map_entry that was allocated by
       bootmem, since there is no way to free such memory.

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 drivers/firmware/memmap.c    |   78 ++-
 include/linux/firmware-map.h |    6 +++
 mm/memory_hotplug.c          |    9 +++-
 3 files changed, 90 insertions(+), 3 deletions(-)

Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-18 17:20:05.670024283 +0900
+++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-18 17:51:03.933189930 +0900
@@ -1012,9 +1012,9 @@ int offline_memory(u64 start, u64 size)
 	return offline_pages(start_pfn, end_pfn, 120 * HZ);
 }

-int remove_memory(int nid, u64 start, u64 size)
+int __ref remove_memory(int nid, u64 start, u64 size)
 {
-	int ret = -EBUSY;
+	int ret = 0;
 	lock_memory_hotplug();
 	/*
 	 * The memory might become online by other task, even if you offline it.
@@ -1025,8 +1025,13 @@ int remove_memory(int nid, u64 start, u6
 			"because the memory range is online\n",
 			start, start + size);
 		ret = -EAGAIN;
+		goto out;
 	}

+	/* remove memmap entry */
+	firmware_map_remove(start, start + size, "System RAM");
+
+out:
 	unlock_memory_hotplug();
 	return ret;

Index: linux-3.5-rc6/include/linux/firmware-map.h
===
--- linux-3.5-rc6.orig/include/linux/firmware-map.h	2012-07-18 17:19:37.007382563 +0900
+++ linux-3.5-rc6/include/linux/firmware-map.h	2012-07-18 17:42:20.804730245 +0900
@@ -25,6 +25,7 @@

 int firmware_map_add_early(u64 start, u64 end, const char *type);
 int firmware_map_add_hotplug(u64 start, u64 end, const char *type);
+int firmware_map_remove(u64 start, u64 end, const char *type);

 #else /* CONFIG_FIRMWARE_MEMMAP */

@@ -38,6 +39,11 @@ static inline int firmware_map_add_hotpl
 	return 0;
 }

+static inline int firmware_map_remove(u64 start, u64 end, const char *type)
+{
+	return 0;
+}
+
 #endif /* CONFIG_FIRMWARE_MEMMAP */

 #endif /* _LINUX_FIRMWARE_MAP_H */
Index: linux-3.5-rc6/drivers/firmware/memmap.c
===
--- linux-3.5-rc6.orig/drivers/firmware/memmap.c	2012-07-18 17:19:43.618300182 +0900
+++ linux-3.5-rc6/drivers/firmware/memmap.c	2012-07-18 17:42:20.846729721 +0900
@@ -21,6 +21,7 @@
 #include <linux/types.h>
 #include <linux/bootmem.h>
 #include <linux/slab.h>
+#include <linux/mm.h>

 /*
  * Data types --
@@ -79,7 +80,22 @@ static const struct sysfs_ops memmap_att
 	.show = memmap_attr_show,
 };

+#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, kobj)
+
+static void release_firmware_map_entry(struct kobject *kobj)
+{
+	struct firmware_map_entry *entry = to_memmap_entry(kobj);
+	struct page *page;
+
+	page = virt_to_page(entry);
+	if (PageSlab(page) || PageCompound(page))
+		kfree(entry);
+
+	/* There is no way to free memory allocated from bootmem*/
+}
+
 static struct kobj_type memmap_ktype = {
+	.release	= release_firmware_map_entry,
 	.sysfs_ops	= &memmap_attr_ops,
 	.default_attrs	= def_attrs,
 };
@@ -123,6 +139,16 @@ static int firmware_map_add_entry(u64 st
 	return 0;
 }

+/**
+ * firmware_map_remove_entry() - Does the real work to remove a firmware
+ * memmap entry.
+ * @entry: removed entry.
+ **/
+static inline void firmware_map_remove_entry(struct firmware_map_entry *entry)
+{
+	list_del(&entry->list);
+}
+
 /*
  * Add memmap entry on sysfs
  */
@@ -144,6 +170,31 @@ static int add_sysfs_fw_map_entry(struct
 	return 0;
 }

+/*
+ * Remove memmap entry on sysfs
+ */
+static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry)
+{
+	kobject_put(&entry->kobj);
+}
+
+/*
+ * Search memmap entry
+ */
+
+struct firmware_map_entry * __meminit
+find_firmware_map_entry(u64 start, u64 end, const char *type)
+{
+	struct firmware_map_entry *entry;
+
+	list_for_each_entry(entry, &map_entries, list)
+		if ((entry
[RFC PATCH v4 5/13] memory-hotplug : does not release memory region in PAGES_PER_SECTION chunks
Since commit de7f0cba96786c was applied, release_mem_region() has been called
in PAGES_PER_SECTION chunks, because register_memory_resource() is called in
PAGES_PER_SECTION chunks by add_memory(). But this appears to depend on the
firmware. If CRS are written in PAGES_PER_SECTION chunks in the ACPI DSDT
table, register_memory_resource() is called in PAGES_PER_SECTION chunks. But
if CRS are written in DIMM units in the ACPI DSDT table,
register_memory_resource() is called in DIMM units. So release_mem_region()
should not be called in PAGES_PER_SECTION chunks. The patch fixes it.

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   13 +
 mm/memory_hotplug.c                             |    4 ++--
 2 files changed, 11 insertions(+), 6 deletions(-)

Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-18 17:51:03.933189930 +0900
+++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-18 17:51:17.550020005 +0900
@@ -358,11 +358,11 @@ int __remove_pages(struct zone *zone, un
 	BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
 	BUG_ON(nr_pages % PAGES_PER_SECTION);

+	release_mem_region(phys_start_pfn << PAGE_SHIFT, nr_pages * PAGE_SIZE);
+
 	sections_to_remove = nr_pages / PAGES_PER_SECTION;
 	for (i = 0; i < sections_to_remove; i++) {
 		unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
-		release_mem_region(pfn << PAGE_SHIFT,
-				   PAGES_PER_SECTION << PAGE_SHIFT);
 		ret = __remove_section(zone, __pfn_to_section(pfn));
 		if (ret)
 			break;
Index: linux-3.5-rc6/arch/powerpc/platforms/pseries/hotplug-memory.c
===
--- linux-3.5-rc6.orig/arch/powerpc/platforms/pseries/hotplug-memory.c	2012-07-18 17:50:49.893365814 +0900
+++ linux-3.5-rc6/arch/powerpc/platforms/pseries/hotplug-memory.c	2012-07-18 17:51:17.553019968 +0900
@@ -77,7 +77,8 @@ static int pseries_remove_memblock(unsig
 {
 	unsigned long start, start_pfn;
 	struct zone *zone;
-	int ret;
+	int i, ret;
+	int sections_to_remove;

 	start_pfn = base >> PAGE_SHIFT;

@@ -97,9 +98,13 @@ static int pseries_remove_memblock(unsig
 	 * to sysfs state file and we can't remove sysfs entries
 	 * while writing to it. So we have to defer it to here.
 	 */
-	ret = __remove_pages(zone, start_pfn, memblock_size >> PAGE_SHIFT);
-	if (ret)
-		return ret;
+	sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
+	for (i = 0; i < sections_to_remove; i++) {
+		unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
+		ret = __remove_pages(zone, start_pfn, PAGES_PER_SECTION);
+		if (ret)
+			return ret;
+	}

 	/*
 	 * Update memory regions for memory remove
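The section-by-section walk on the pseries side can be sketched in userspace as
below. This is a minimal illustration under assumptions, not the kernel code:
the SKETCH_* constants are hypothetical values, remove_one_section() is a
stand-in for a per-section __remove_pages() call that only records what it was
asked to remove, and remove_memblock_sketch() is a made-up name. The sketch
passes the per-iteration pfn to the removal call, so that each section is
visited exactly once.

```c
#define SKETCH_PAGE_SHIFT		12
#define SKETCH_PAGES_PER_SECTION	4096UL	/* hypothetical section size */
#define SKETCH_MAX_RECORDS		8

/* Start pfn of every section the stand-in was asked to remove. */
static unsigned long removed_pfns[SKETCH_MAX_RECORDS];
static int nr_removed;

/* Stand-in for removing one section; it only records the request. */
static int remove_one_section(unsigned long pfn, unsigned long nr_pages)
{
	if (nr_pages != SKETCH_PAGES_PER_SECTION ||
	    nr_removed >= SKETCH_MAX_RECORDS)
		return -1;
	removed_pfns[nr_removed++] = pfn;
	return 0;
}

/* Walk a memory block one section at a time, stopping on the first error. */
static int remove_memblock_sketch(unsigned long base, unsigned long size)
{
	unsigned long start_pfn = base >> SKETCH_PAGE_SHIFT;
	unsigned long sections = (size >> SKETCH_PAGE_SHIFT) /
				 SKETCH_PAGES_PER_SECTION;
	unsigned long i;
	int ret;

	for (i = 0; i < sections; i++) {
		unsigned long pfn = start_pfn + i * SKETCH_PAGES_PER_SECTION;

		ret = remove_one_section(pfn, SKETCH_PAGES_PER_SECTION);
		if (ret)
			return ret;
	}
	return 0;
}
```

Recording the visited pfns makes it easy to check that a block of N sections
produces N distinct, consecutive section starts.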
[RFC PATCH v4 7/13] memory-hotplug : remove_memory calls __remove_pages
The patch adds a call to __remove_pages() to remove_memory(). The range given
by the phys_start_pfn and nr_pages arguments of __remove_pages() may then span
different zones. So the zone argument is removed from __remove_pages(), and
__remove_pages() calculates the zone for each section.

When CONFIG_SPARSEMEM_VMEMMAP is defined, there is no way to remove a memmap.
So __remove_section() only calls unregister_memory_section().

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 arch/powerpc/platforms/pseries/hotplug-memory.c |    5 +
 include/linux/memory_hotplug.h                  |    3 +--
 mm/memory_hotplug.c                             |   19 ---
 3 files changed, 14 insertions(+), 13 deletions(-)

Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-18 18:00:27.440145432 +0900
+++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-18 18:01:02.070712487 +0900
@@ -275,11 +275,14 @@ static int __meminit __add_section(int n
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 static int __remove_section(struct zone *zone, struct mem_section *ms)
 {
-	/*
-	 * XXX: Freeing memmap with vmemmap is not implement yet.
-	 *      This should be removed later.
-	 */
-	return -EBUSY;
+	int ret = -EINVAL;
+
+	if (!valid_section(ms))
+		return ret;
+
+	ret = unregister_memory_section(ms);
+
+	return ret;
 }
 #else
 static int __remove_section(struct zone *zone, struct mem_section *ms)
@@ -346,11 +349,11 @@ EXPORT_SYMBOL_GPL(__add_pages);
  * sure that pages are marked reserved and zones are adjust properly by
  * calling offline_pages().
  */
-int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
-		 unsigned long nr_pages)
+int __remove_pages(unsigned long phys_start_pfn, unsigned long nr_pages)
 {
 	unsigned long i, ret = 0;
 	int sections_to_remove;
+	struct zone *zone;

 	/*
 	 * We can only remove entire sections
@@ -363,6 +366,7 @@ int __remove_pages(struct zone *zone, un
 	sections_to_remove = nr_pages / PAGES_PER_SECTION;
 	for (i = 0; i < sections_to_remove; i++) {
 		unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
+		zone = page_zone(pfn_to_page(pfn));
 		ret = __remove_section(zone, __pfn_to_section(pfn));
 		if (ret)
 			break;
@@ -1031,6 +1035,7 @@ int __ref remove_memory(int nid, u64 sta
 	/* remove memmap entry */
 	firmware_map_remove(start, start + size, "System RAM");

+	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT);
 out:
 	unlock_memory_hotplug();
 	return ret;
Index: linux-3.5-rc6/include/linux/memory_hotplug.h
===
--- linux-3.5-rc6.orig/include/linux/memory_hotplug.h	2012-07-18 18:00:27.445145371 +0900
+++ linux-3.5-rc6/include/linux/memory_hotplug.h	2012-07-18 18:00:40.461982690 +0900
@@ -89,8 +89,7 @@ extern bool is_pageblock_removable_noloc
 /* reasonably generic interface to expand the physical pages in a zone */
 extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
 	unsigned long nr_pages);
-extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
-	unsigned long nr_pages);
+extern int __remove_pages(unsigned long start_pfn, unsigned long nr_pages);

 #ifdef CONFIG_NUMA
 extern int memory_add_physaddr_to_nid(u64 start);
Index: linux-3.5-rc6/arch/powerpc/platforms/pseries/hotplug-memory.c
===
--- linux-3.5-rc6.orig/arch/powerpc/platforms/pseries/hotplug-memory.c	2012-07-18 18:00:27.442145407 +0900
+++ linux-3.5-rc6/arch/powerpc/platforms/pseries/hotplug-memory.c	2012-07-18 18:00:40.470982578 +0900
@@ -76,7 +76,6 @@ unsigned long memory_block_size_bytes(vo
 static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
 {
 	unsigned long start, start_pfn;
-	struct zone *zone;
 	int i, ret;
 	int sections_to_remove;

@@ -87,8 +86,6 @@ static int pseries_remove_memblock(unsig
 		return 0;
 	}

-	zone = page_zone(pfn_to_page(start_pfn));
-
 	/*
 	 * Remove section mappings and sysfs entries for the
 	 * section of the memory we are removing.
@@ -101,7 +98,7 @@ static int pseries_remove_memblock(unsig
 	sections_to_remove = (memblock_size >> PAGE_SHIFT
[RFC PATCH v4 6/13] memory-hotplug : add memory_block_release
When remove_memory_block() is called, the following message is shown at
device_release():

  Device 'memory528' does not have a release() function, it is broken and must be fixed.

remove_memory_block() calls kfree(mem). I think it should be called from
device_release(). So the patch implements memory_block_release().

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 drivers/base/memory.c |   11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Index: linux-3.5-rc6/drivers/base/memory.c
===
--- linux-3.5-rc6.orig/drivers/base/memory.c	2012-07-18 17:50:49.659368740 +0900
+++ linux-3.5-rc6/drivers/base/memory.c	2012-07-18 17:51:28.655881214 +0900
@@ -109,6 +109,15 @@ bool is_memblk_offline(unsigned long sta
 }
 EXPORT_SYMBOL(is_memblk_offline);

+#define to_memory_block(device) container_of(device, struct memory_block, dev)
+
+static void release_memory_block(struct device *dev)
+{
+	struct memory_block *mem = to_memory_block(dev);
+
+	kfree(mem);
+}
+
 /*
  * register_memory - Setup a sysfs device for a memory block
  */
@@ -119,6 +128,7 @@ int register_memory(struct memory_block

 	memory->dev.bus = &memory_subsys;
 	memory->dev.id = memory->start_section_nr / sections_per_block;
+	memory->dev.release = release_memory_block;

 	error = device_register(&memory->dev);
 	return error;
@@ -669,7 +679,6 @@ int remove_memory_block(unsigned long no
 		mem_remove_simple_file(mem, phys_device);
 		mem_remove_simple_file(mem, removable);
 		unregister_memory(mem);
-		kfree(mem);
 	} else
 		kobject_put(&mem->dev.kobj);

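The reason for freeing from the release callback instead of from the remover
can be sketched in userspace as follows. This is an illustration under
assumptions, not the driver core: struct device_sketch, device_put_sketch(),
container_of_sketch() and the other *_sketch names are hypothetical stand-ins
that model only a reference count and a release hook. The containing object is
freed when the last reference is dropped, not when removal is requested.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, minimal model of a refcounted device. */
struct device_sketch {
	int refcount;
	void (*release)(struct device_sketch *dev);
};

/* Hypothetical stand-in for struct memory_block with an embedded device. */
struct memory_block_sketch {
	unsigned long start_section_nr;
	struct device_sketch dev;
};

/* Recover the containing structure from a pointer to its member. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int blocks_freed;

/* Release hook: runs only when the last reference goes away. */
static void release_memory_block_sketch(struct device_sketch *dev)
{
	struct memory_block_sketch *mem =
		container_of_sketch(dev, struct memory_block_sketch, dev);

	free(mem);
	blocks_freed++;
}

/* Drop one reference; call the release hook on the final put. */
static void device_put_sketch(struct device_sketch *dev)
{
	if (--dev->refcount == 0 && dev->release)
		dev->release(dev);
}

/* Allocate a block holding one initial reference. */
static struct memory_block_sketch *memory_block_alloc_sketch(unsigned long nr)
{
	struct memory_block_sketch *mem = calloc(1, sizeof(*mem));

	if (!mem)
		return NULL;
	mem->start_section_nr = nr;
	mem->dev.refcount = 1;
	mem->dev.release = release_memory_block_sketch;
	return mem;
}
```

If some other holder still has a reference when the block is unregistered, the
memory stays valid until that holder drops it, which a direct kfree() in the
remove path could not guarantee.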
[RFC PATCH v4 8/13] memory-hotplug : check page type in get_page_bootmem
get_page_bootmem() may be called on the same page many times. So when
get_page_bootmem() is called again on the same page, the function only
increments page->_count.

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 mm/memory_hotplug.c |   15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-18 18:01:02.070712487 +0900
+++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-18 18:01:12.586581077 +0900
@@ -95,10 +95,17 @@ static void release_memory_resource(stru
 static void get_page_bootmem(unsigned long info,  struct page *page,
 			     unsigned long type)
 {
-	page->lru.next = (struct list_head *) type;
-	SetPagePrivate(page);
-	set_page_private(page, info);
-	atomic_inc(&page->_count);
+	unsigned long page_type;
+
+	page_type = (unsigned long) page->lru.next;
+	if (type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
+	    type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
+		page->lru.next = (struct list_head *) type;
+		SetPagePrivate(page);
+		set_page_private(page, info);
+		atomic_inc(&page->_count);
+	} else
+		atomic_inc(&page->_count);
 }

 /* reference to __meminit __free_pages_bootmem is valid
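One reading of the behaviour described above ("tag the page on first use, only
count afterwards") can be sketched in userspace as follows. This is an
illustration under assumptions: struct page_sketch, the SKETCH_* bounds and
get_page_bootmem_sketch() are hypothetical, and the sketch tests the type
already stored in the page, which is how I understand the description rather
than a statement about the posted code.

```c
/* Hypothetical bounds for valid bootmem type tags (illustration only). */
#define SKETCH_MIN_BOOTMEM_TYPE	12UL
#define SKETCH_MAX_BOOTMEM_TYPE	15UL

/* Hypothetical stand-in for the struct page fields involved. */
struct page_sketch {
	unsigned long type;		/* models page->lru.next used as a tag */
	unsigned long private_info;	/* models page_private(page) */
	int count;			/* models page->_count */
};

/*
 * Tag the page with type and info only if it does not carry a valid
 * bootmem tag yet; in every case take one more reference.
 */
static void get_page_bootmem_sketch(unsigned long info,
				    struct page_sketch *page,
				    unsigned long type)
{
	unsigned long page_type = page->type;

	if (page_type < SKETCH_MIN_BOOTMEM_TYPE ||
	    page_type > SKETCH_MAX_BOOTMEM_TYPE) {
		page->type = type;
		page->private_info = info;
	}
	page->count++;
}
```

With this shape, a second registration of the same page leaves the first tag
and info in place and only raises the count, so each registration can later be
balanced by one put.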
[RFC PATCH v4 9/13] memory-hotplug : move register_page_bootmem_info_node and put_page_bootmem for sparse-vmemmap
To implement register_page_bootmem_info_node for sparse-vmemmap,
register_page_bootmem_info_node and put_page_bootmem are moved to
memory_hotplug.c.

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 include/linux/memory_hotplug.h |    9 -
 mm/memory_hotplug.c            |    8 ++--
 2 files changed, 6 insertions(+), 11 deletions(-)

Index: linux-3.5-rc6/include/linux/memory_hotplug.h
===
--- linux-3.5-rc6.orig/include/linux/memory_hotplug.h	2012-07-18 18:00:40.461982690 +0900
+++ linux-3.5-rc6/include/linux/memory_hotplug.h	2012-07-18 18:01:24.217435670 +0900
@@ -160,17 +160,8 @@ static inline void arch_refresh_nodedata
 #endif /* CONFIG_NUMA */
 #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */

-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
-{
-}
-static inline void put_page_bootmem(struct page *page)
-{
-}
-#else
 extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
 extern void put_page_bootmem(struct page *page);
-#endif

 /*
  * Lock for memory hotplug guarantees 1) all callbacks for memory hotplug
Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-18 18:01:12.586581077 +0900
+++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-18 18:01:24.221435622 +0900
@@ -91,7 +91,6 @@ static void release_memory_resource(stru
 }

 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
-#ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void get_page_bootmem(unsigned long info,  struct page *page,
 			     unsigned long type)
 {
@@ -127,6 +126,7 @@ void __ref put_page_bootmem(struct page

 }

+#ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
 	unsigned long *usemap, mapsize, section_nr, i;
@@ -163,6 +163,11 @@ static void register_page_bootmem_info_s
 	get_page_bootmem(section_nr, page, MIX_SECTION_INFO);

 }
+#else
+static inline void register_page_bootmem_info_section(unsigned long start_pfn)
+{
+}
+#endif

 void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
@@ -198,7 +203,6 @@ void register_page_bootmem_info_node(str
 		register_page_bootmem_info_section(pfn);

 }
-#endif /* !CONFIG_SPARSEMEM_VMEMMAP */

 static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
 			   unsigned long end_pfn)
[RFC PATCH v4 10/13] memory-hotplug : implement register_page_bootmem_info_section of sparse-vmemmap
For removing memmap region of sparse-vmemmap which is allocated bootmem, memmap region of sparse-vmemmap needs to be registered by get_page_bootmem(). So the patch searches pages of virtual mapping and registers the pages by get_page_bootmem(). CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com CC: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- arch/x86/mm/init_64.c | 52 + include/linux/memory_hotplug.h |2 + include/linux/mm.h |3 +- mm/memory_hotplug.c| 23 +++--- 4 files changed, 76 insertions(+), 4 deletions(-) Index: linux-3.5-rc6/mm/memory_hotplug.c === --- linux-3.5-rc6.orig/mm/memory_hotplug.c 2012-07-18 18:01:24.221435622 +0900 +++ linux-3.5-rc6/mm/memory_hotplug.c 2012-07-18 18:01:28.156386427 +0900 @@ -91,8 +91,8 @@ static void release_memory_resource(stru } #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE -static void get_page_bootmem(unsigned long info, struct page *page, -unsigned long type) +void get_page_bootmem(unsigned long info, struct page *page, + unsigned long type) { unsigned long page_type; @@ -164,8 +164,25 @@ static void register_page_bootmem_info_s } #else -static inline void register_page_bootmem_info_section(unsigned long start_pfn) +static void register_page_bootmem_info_section(unsigned long start_pfn) { + unsigned long mapsize, section_nr; + struct mem_section *ms; + struct page *page, *memmap; + + if (!pfn_valid(start_pfn)) + return; + + section_nr = pfn_to_section_nr(start_pfn); + ms = __nr_to_section(section_nr); + + memmap = sparse_decode_mem_map(ms-section_mem_map, section_nr); + + page = virt_to_page(memmap); + mapsize = sizeof(struct page) * PAGES_PER_SECTION; + mapsize = 
PAGE_ALIGN(mapsize) PAGE_SHIFT; + + register_page_bootmem_memmap(section_nr, memmap, PAGES_PER_SECTION); } #endif Index: linux-3.5-rc6/include/linux/mm.h === --- linux-3.5-rc6.orig/include/linux/mm.h 2012-07-18 17:59:51.225598230 +0900 +++ linux-3.5-rc6/include/linux/mm.h2012-07-18 18:01:28.161386365 +0900 @@ -1586,7 +1586,8 @@ int vmemmap_populate_basepages(struct pa unsigned long pages, int node); int vmemmap_populate(struct page *start_page, unsigned long pages, int node); void vmemmap_populate_print_last(void); - +void register_page_bootmem_memmap(unsigned long section_nr, struct page *map, + unsigned long size); enum mf_flags { MF_COUNT_INCREASED = 1 0, Index: linux-3.5-rc6/arch/x86/mm/init_64.c === --- linux-3.5-rc6.orig/arch/x86/mm/init_64.c2012-07-18 17:59:51.221598278 +0900 +++ linux-3.5-rc6/arch/x86/mm/init_64.c 2012-07-18 18:01:28.169386264 +0900 @@ -978,6 +978,58 @@ vmemmap_populate(struct page *start_page return 0; } +void register_page_bootmem_memmap(unsigned long section_nr, + struct page *start_page, unsigned long size) +{ + unsigned long addr = (unsigned long)start_page; + unsigned long end = (unsigned long)(start_page + size); + unsigned long next; + pgd_t *pgd; + pud_t *pud; + pmd_t *pmd; + + for (; addr end; addr = next) { + pte_t *pte = NULL; + + pgd = pgd_offset_k(addr); + if (pgd_none(*pgd)) { + next = (addr + PAGE_SIZE) PAGE_MASK; + continue; + } + get_page_bootmem(section_nr, pgd_page(*pgd), MIX_SECTION_INFO); + + pud = pud_offset(pgd, addr); + if (pud_none(*pud)) { + next = (addr + PAGE_SIZE) PAGE_MASK; + continue; + } + get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO); + + if (!cpu_has_pse) { + next = (addr + PAGE_SIZE) PAGE_MASK; + pmd = pmd_offset(pud, addr); + if (pmd_none(*pmd)) + continue; + get_page_bootmem(section_nr, pmd_page(*pmd), +MIX_SECTION_INFO); + + pte = pte_offset_kernel(pmd, addr); + if (pte_none(*pte
[RFC PATCH v4 11/13] memory-hotplug : free memmap of sparse-vmemmap
Not all pages of the virtual mapping in removed memory can be freed, since
a page used as PGD/PUD may cover not only the removed memory but also other
memory. So the patch checks whether a page can be freed or not.

How to check whether a page can be freed or not?
 1. When removing memory, the page structs of the removed memory are filled
    with 0xFD.
 2. If all page structs on a PT/PMD are filled with 0xFD, the PT/PMD can be
    cleared. In this case, the page used as PT/PMD can be freed.

With this patch applied, __remove_section() of CONFIG_SPARSEMEM_VMEMMAP is
integrated into the common one. So __remove_section() of
CONFIG_SPARSEMEM_VMEMMAP is deleted.

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 arch/x86/mm/init_64.c |  121 ++
 include/linux/mm.h    |    2
 mm/memory_hotplug.c   |   19 ---
 mm/sparse.c           |    5 +-
 4 files changed, 128 insertions(+), 19 deletions(-)

Index: linux-3.5-rc6/include/linux/mm.h
===================================================================
--- linux-3.5-rc6.orig/include/linux/mm.h	2012-07-18 18:01:28.0 +0900
+++ linux-3.5-rc6/include/linux/mm.h	2012-07-18 18:03:05.551168773 +0900
@@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
 void vmemmap_populate_print_last(void);
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
 				  unsigned long size);
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages);
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages);
 
 enum mf_flags {
 	MF_COUNT_INCREASED = 1 << 0,
Index: linux-3.5-rc6/mm/sparse.c
===================================================================
--- linux-3.5-rc6.orig/mm/sparse.c	2012-07-18 17:59:25.0 +0900
+++ linux-3.5-rc6/mm/sparse.c	2012-07-18 18:03:05.553168749 +0900
@@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
 	/* This will make the necessary allocations eventually. */
 	return sparse_mem_map_populate(pnum, nid);
 }
-static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
+static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
 {
-	return; /* XXX: Not implemented yet */
+	vmemmap_kfree(page, nr_pages);
 }
 static void free_map_bootmem(struct page *page, unsigned long nr_pages)
 {
+	vmemmap_free_bootmem(page, nr_pages);
 }
 #else
 static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
Index: linux-3.5-rc6/arch/x86/mm/init_64.c
===================================================================
--- linux-3.5-rc6.orig/arch/x86/mm/init_64.c	2012-07-18 18:01:28.0 +0900
+++ linux-3.5-rc6/arch/x86/mm/init_64.c	2012-07-18 18:03:05.564168611 +0900
@@ -978,6 +978,127 @@ vmemmap_populate(struct page *start_page
 	return 0;
 }
 
+#define PAGE_INUSE 0xFD
+
+unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
+			    struct page **pp, int *page_size)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	void *page_addr;
+	unsigned long next;
+
+	*pp = NULL;
+
+	pgd = pgd_offset_k(addr);
+	if (pgd_none(*pgd))
+		return pgd_addr_end(addr, end);
+
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud))
+		return pud_addr_end(addr, end);
+
+	if (!cpu_has_pse) {
+		next = (addr + PAGE_SIZE) & PAGE_MASK;
+		pmd = pmd_offset(pud, addr);
+		if (pmd_none(*pmd))
+			return next;
+
+		pte = pte_offset_kernel(pmd, addr);
+		if (pte_none(*pte))
+			return next;
+
+		*page_size = PAGE_SIZE;
+		*pp = pte_page(*pte);
+	} else {
+		next = pmd_addr_end(addr, end);
+
+		pmd = pmd_offset(pud, addr);
+		if (pmd_none(*pmd))
+			return next;
+
+		*page_size = PMD_SIZE;
+		*pp = pmd_page(*pmd);
+	}
+
+	/*
+	 * Removed page structs are filled with 0xFD.
+	 */
+	memset((void *)addr, PAGE_INUSE, next - addr);
+
+	page_addr = page_address(*pp);
+
+	/*
+	 * Check the page is filled with 0xFD or not.
+	 * memchr_inv() returns the address. In this case, we cannot
+	 * clear PTE/PUD entry, since the page is used by other.
+	 * So we cannot also free the page.
+	 *
+	 * memchr_inv() returns NULL. In this case, we
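The check described in this patch (fill the page structs of removed memory with 0xFD, then test whether the whole backing page now holds nothing but 0xFD) can be sketched in userspace roughly as follows. This is only an illustration and not the kernel code: first_byte_not() is a hypothetical stand-in for the kernel's memchr_inv(), BACKING_SIZE is an assumed page size, and mark_removed_and_check() is a made-up helper name. Only the PAGE_INUSE value is taken from the patch.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_INUSE   0xFD	/* marker byte, same value as in the patch */
#define BACKING_SIZE 4096	/* assumed size of one backing page (hypothetical) */

/*
 * Hypothetical userspace stand-in for memchr_inv(): returns the address
 * of the first byte that differs from c, or NULL if all n bytes equal c.
 */
static const void *first_byte_not(const void *buf, int c, size_t n)
{
	const unsigned char *p = buf;
	size_t i;

	for (i = 0; i < n; i++)
		if (p[i] != (unsigned char)c)
			return p + i;
	return NULL;
}

/*
 * Mark [off, off + len) of the backing page as removed, then report
 * whether the whole backing page is now unused (1) or still holds
 * data that belongs to other memory (0).
 */
static int mark_removed_and_check(unsigned char *backing, size_t off, size_t len)
{
	memset(backing + off, PAGE_INUSE, len);
	return first_byte_not(backing, PAGE_INUSE, BACKING_SIZE) == NULL;
}
```

In this sketch, removing only half of the range leaves the backing page in use, and removing the second half makes it freeable, which matches the two steps listed in the patch description.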
[RFC PATCH v4 12/13] memory-hotplug : add node_device_release
When calling unregister_node(), the function shows the following message at
device_release().

Device 'node2' does not have a release() function, it is broken and must be
fixed.

So the patch implements node_device_release().

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 drivers/base/node.c |    7 +++++++
 1 file changed, 7 insertions(+)

Index: linux-3.5-rc6/drivers/base/node.c
===================================================================
--- linux-3.5-rc6.orig/drivers/base/node.c	2012-07-18 18:24:29.191121066 +0900
+++ linux-3.5-rc6/drivers/base/node.c	2012-07-18 18:25:47.46983 +0900
@@ -252,6 +252,12 @@ static inline void hugetlb_register_node
 static inline void hugetlb_unregister_node(struct node *node) {}
 #endif
 
+static void node_device_release(struct device *dev)
+{
+	struct node *node_dev = to_node(dev);
+
+	memset(node_dev, 0, sizeof(struct node));
+}
 
 /*
  * register_node - Setup a sysfs device for a node.
@@ -265,6 +271,7 @@ int register_node(struct node *node, int
 
 	node->dev.id = num;
 	node->dev.bus = &node_subsys;
+	node->dev.release = node_device_release;
 	error = device_register(&node->dev);
 
 	if (!error){
[RFC PATCH v4 13/13] memory-hotplug : remove sysfs file of node
The patch adds node_set_offline() and unregister_one_node() to
remove_memory() to remove the sysfs files of the node.

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 mm/memory_hotplug.c |    5 +++++
 1 file changed, 5 insertions(+)

Index: linux-3.5-rc6/mm/memory_hotplug.c
===================================================================
--- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-18 18:25:11.036597977 +0900
+++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-18 18:25:54.860050109 +0900
@@ -1048,6 +1048,11 @@ int __ref remove_memory(int nid, u64 sta
 	/* remove memmap entry */
 	firmware_map_remove(start, start + size, "System RAM");
 
+	if (!node_present_pages(nid)) {
+		node_set_offline(nid);
+		unregister_one_node(nid);
+	}
+
 	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT);
 out:
 	unlock_memory_hotplug();
Re: [PATCH v4 3/13] memory-hotplug : check whether memory is present or not
Hi Wen,

2012/07/18 19:25, Wen Congyang wrote:
> At 07/18/2012 06:07 PM, Yasuaki Ishimatsu Wrote:
>> If system supports memory hot-remove, online_pages() may online removed pages.
>> So online_pages() need to check whether onlining pages are present or not.
>>
>> CC: David Rientjes <rient...@google.com>
>> CC: Jiang Liu <liu...@gmail.com>
>> CC: Len Brown <len.br...@intel.com>
>> CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
>> CC: Paul Mackerras <pau...@samba.org>
>> CC: Christoph Lameter <c...@linux.com>
>> Cc: Minchan Kim <minchan@gmail.com>
>> CC: Andrew Morton <a...@linux-foundation.org>
>> CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
>> CC: Wen Congyang <we...@cn.fujitsu.com>
>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>
>>
>> ---
>>  include/linux/mmzone.h |   21 +++++++++++++++++++++
>>  mm/memory_hotplug.c    |   13 +++++++++++++
>>  2 files changed, 34 insertions(+)
>>
>> Index: linux-3.5-rc6/include/linux/mmzone.h
>> ===================================================================
>> --- linux-3.5-rc6.orig/include/linux/mmzone.h	2012-07-08 09:23:56.0 +0900
>> +++ linux-3.5-rc6/include/linux/mmzone.h	2012-07-17 16:10:21.588186145 +0900
>> @@ -1168,6 +1168,27 @@ void sparse_init(void);
>>  #define sparse_index_init(_sec, _nid)  do {} while (0)
>>  #endif /* CONFIG_SPARSEMEM */
>>
>> +#ifdef CONFIG_SPARSEMEM
>> +static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
>> +{
>> +	int i;
>> +	for (i = 0; i < nr_pages; i++) {
>> +		if (pfn_present(pfn + 1))
>> +			continue;
>> +		else {
>> +			unlock_memory_hotplug();
>
> Why do you unlock memory hotplug here? The caller will do it.

Ah, you are right. In this case, the function should only return -EINVAL.

Thanks,
Yasuaki Ishimatsu

> Thanks
> Wen Congyang
>
>> +			return -EINVAL;
>> +		}
>> +	}
>> +	return 0;
>> +}
>> +#else
>> +static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
>> +{
>> +	return 0;
>> +}
>> +#endif /* CONFIG_SPARSEMEM*/
>> +
>>  #ifdef CONFIG_NODES_SPAN_OTHER_NODES
>>  bool early_pfn_in_nid(unsigned long pfn, int nid);
>>  #else
>> Index: linux-3.5-rc6/mm/memory_hotplug.c
>> ===================================================================
>> --- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-17 14:26:40.0 +0900
>> +++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-17 16:09:50.070580170 +0900
>> @@ -467,6 +467,19 @@ int __ref online_pages(unsigned long pfn
>>  	struct memory_notify arg;
>>
>>  	lock_memory_hotplug();
>> +	/*
>> +	 * If system supports memory hot-remove, the memory may have been
>> +	 * removed. So we check whether the memory has been removed or not.
>> +	 *
>> +	 * Note: When CONFIG_SPARSEMEM is defined, pfns_present() become
>> +	 * effective. If CONFIG_SPARSEMEM is not defined, pfns_present()
>> +	 * always returns 0.
>> +	 */
>> +	ret = pfns_present(pfn, nr_pages);
>> +	if (ret) {
>> +		unlock_memory_hotplug();
>> +		return ret;
>> +	}
>>  	arg.start_pfn = pfn;
>>  	arg.nr_pages = nr_pages;
>>  	arg.status_change_nid = -1;
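A minimal userspace sketch of the outcome agreed in this exchange: the helper only reports -EINVAL and leaves unlocking to its caller. The names pfns_present_sketch(), pfn_present_stub(), present_map and NR_PFNS are hypothetical stand-ins, not kernel symbols. The sketch also assumes the loop is meant to test every pfn in the range, so it checks pfn + i, whereas the quoted patch checks pfn + 1.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_PFNS 16			/* size of the toy pfn space (hypothetical) */

static bool present_map[NR_PFNS];	/* toy stand-in for section presence */

/* Hypothetical stand-in for pfn_present(). */
static bool pfn_present_stub(unsigned long pfn)
{
	return pfn < NR_PFNS && present_map[pfn];
}

/*
 * Returns 0 if every pfn in [pfn, pfn + nr_pages) is present and
 * -EINVAL otherwise. No lock is taken or dropped here, so the caller
 * stays responsible for releasing whatever lock it holds.
 */
static int pfns_present_sketch(unsigned long pfn, unsigned long nr_pages)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++)
		if (!pfn_present_stub(pfn + i))
			return -EINVAL;
	return 0;
}
```

Keeping the helper free of locking means the caller's error path, which already unlocks in the quoted online_pages() hunk, releases the lock exactly once.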
Re: [RFC PATCH v4 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs
Hi Wen, 2012/07/18 19:33, Wen Congyang wrote: At 07/18/2012 06:09 PM, Yasuaki Ishimatsu Wrote: When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type} sysfs files are created. But there is no code to remove these files. The patch implements the function to remove them. Note : The code does not free firmware_map_entry since there is no way to free memory which is allocated by bootmem. CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com CC: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- drivers/firmware/memmap.c| 78 ++- include/linux/firmware-map.h |6 +++ mm/memory_hotplug.c |9 +++- 3 files changed, 90 insertions(+), 3 deletions(-) Index: linux-3.5-rc6/mm/memory_hotplug.c === --- linux-3.5-rc6.orig/mm/memory_hotplug.c 2012-07-18 17:20:05.670024283 +0900 +++ linux-3.5-rc6/mm/memory_hotplug.c2012-07-18 17:51:03.933189930 +0900 @@ -1012,9 +1012,9 @@ int offline_memory(u64 start, u64 size) return offline_pages(start_pfn, end_pfn, 120 * HZ); } -int remove_memory(int nid, u64 start, u64 size) +int __ref remove_memory(int nid, u64 start, u64 size) { -int ret = -EBUSY; +int ret = 0; lock_memory_hotplug(); /* * The memory might become online by other task, even if you offine it. 
@@ -1025,8 +1025,13 @@ int remove_memory(int nid, u64 start, u6 because the memmory range is online\n, start, start + size); ret = -EAGAIN; +goto out; } +/* remove memmap entry */ +firmware_map_remove(start, start + size, System RAM); + +out: unlock_memory_hotplug(); return ret; Index: linux-3.5-rc6/include/linux/firmware-map.h === --- linux-3.5-rc6.orig/include/linux/firmware-map.h 2012-07-18 17:19:37.007382563 +0900 +++ linux-3.5-rc6/include/linux/firmware-map.h 2012-07-18 17:42:20.804730245 +0900 @@ -25,6 +25,7 @@ int firmware_map_add_early(u64 start, u64 end, const char *type); int firmware_map_add_hotplug(u64 start, u64 end, const char *type); +int firmware_map_remove(u64 start, u64 end, const char *type); #else /* CONFIG_FIRMWARE_MEMMAP */ @@ -38,6 +39,11 @@ static inline int firmware_map_add_hotpl return 0; } +static inline int firmware_map_remove(u64 start, u64 end, const char *type) +{ +return 0; +} + #endif /* CONFIG_FIRMWARE_MEMMAP */ #endif /* _LINUX_FIRMWARE_MAP_H */ Index: linux-3.5-rc6/drivers/firmware/memmap.c === --- linux-3.5-rc6.orig/drivers/firmware/memmap.c 2012-07-18 17:19:43.618300182 +0900 +++ linux-3.5-rc6/drivers/firmware/memmap.c 2012-07-18 17:42:20.846729721 +0900 @@ -21,6 +21,7 @@ #include linux/types.h #include linux/bootmem.h #include linux/slab.h +#include linux/mm.h /* * Data types -- @@ -79,7 +80,22 @@ static const struct sysfs_ops memmap_att .show = memmap_attr_show, }; +#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, kobj) + +static void release_firmware_map_entry(struct kobject *kobj) +{ +struct firmware_map_entry *entry = to_memmap_entry(kobj); +struct page *page; + +page = virt_to_page(entry); +if (PageSlab(page) || PageCompound(page)) +kfree(entry); IIRC, this function's implementation is changed. Why do you do it? If PageCompound(page), should we check page-first_page's flags? I forgot to write the change to change log. 
Jiang and Christoph discussed how to find the slab page: - https://lkml.org/lkml/2012/7/6/333 Then, Christoph proposed this method. So I changed it. Thanks, Yasuaki Ishimatsu Thanks Wen Congyang + +/* There is no way to free memory allocated from bootmem*/ +} + static struct kobj_type memmap_ktype = { +.release= release_firmware_map_entry, .sysfs_ops = memmap_attr_ops, .default_attrs = def_attrs, }; @@ -123,6 +139,16 @@ static int firmware_map_add_entry(u64 st return 0; } +/** + * firmware_map_remove_entry() - Does the real work to remove a firmware + * memmap entry. + * @entry: removed entry. + **/ +static inline void firmware_map_remove_entry(struct firmware_map_entry *entry) +{ +list_del(entry
Re: [RFC PATCH v3 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs
Hi Wen, 2012/07/13 18:10, Wen Congyang wrote: At 07/09/2012 06:26 PM, Yasuaki Ishimatsu Wrote: When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type} sysfs files are created. But there is no code to remove these files. The patch implements the function to remove them. Note : The code does not free firmware_map_entry since there is no way to free memory which is allocated by bootmem. CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com CC: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- drivers/firmware/memmap.c| 78 ++- include/linux/firmware-map.h |6 +++ mm/memory_hotplug.c |6 ++- 3 files changed, 88 insertions(+), 2 deletions(-) Index: linux-3.5-rc6/mm/memory_hotplug.c === --- linux-3.5-rc6.orig/mm/memory_hotplug.c 2012-07-09 18:23:13.323844923 +0900 +++ linux-3.5-rc6/mm/memory_hotplug.c2012-07-09 18:23:19.522767424 +0900 @@ -661,7 +661,11 @@ EXPORT_SYMBOL_GPL(add_memory); int remove_memory(int nid, u64 start, u64 size) { -return -EBUSY; +lock_memory_hotplug(); +/* remove memmap entry */ +firmware_map_remove(start, start + size - 1, System RAM); firmware_map_remove() is in meminit section, so remove_memory() should be in ref section. I'll add it. 
Thanks, Yasuaki Ishimatsu Thanks Wen Congyang +unlock_memory_hotplug(); +return 0; } EXPORT_SYMBOL_GPL(remove_memory); Index: linux-3.5-rc6/include/linux/firmware-map.h === --- linux-3.5-rc6.orig/include/linux/firmware-map.h 2012-07-09 18:23:09.532892314 +0900 +++ linux-3.5-rc6/include/linux/firmware-map.h 2012-07-09 18:23:19.523767412 +0900 @@ -25,6 +25,7 @@ int firmware_map_add_early(u64 start, u64 end, const char *type); int firmware_map_add_hotplug(u64 start, u64 end, const char *type); +int firmware_map_remove(u64 start, u64 end, const char *type); #else /* CONFIG_FIRMWARE_MEMMAP */ @@ -38,6 +39,11 @@ static inline int firmware_map_add_hotpl return 0; } +static inline int firmware_map_remove(u64 start, u64 end, const char *type) +{ +return 0; +} + #endif /* CONFIG_FIRMWARE_MEMMAP */ #endif /* _LINUX_FIRMWARE_MAP_H */ Index: linux-3.5-rc6/drivers/firmware/memmap.c === --- linux-3.5-rc6.orig/drivers/firmware/memmap.c 2012-07-09 18:23:09.532892314 +0900 +++ linux-3.5-rc6/drivers/firmware/memmap.c 2012-07-09 18:25:46.371931554 +0900 @@ -21,6 +21,7 @@ #include linux/types.h #include linux/bootmem.h #include linux/slab.h +#include linux/mm.h /* * Data types -- @@ -79,7 +80,22 @@ static const struct sysfs_ops memmap_att .show = memmap_attr_show, }; +#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, kobj) + +static void release_firmware_map_entry(struct kobject *kobj) +{ +struct firmware_map_entry *entry = to_memmap_entry(kobj); +struct page *head_page; + +head_page = virt_to_head_page(entry); +if (PageSlab(head_page)) +kfree(entry); + +/* There is no way to free memory allocated from bootmem*/ +} + static struct kobj_type memmap_ktype = { +.release= release_firmware_map_entry, .sysfs_ops = memmap_attr_ops, .default_attrs = def_attrs, }; @@ -123,6 +139,16 @@ static int firmware_map_add_entry(u64 st return 0; } +/** + * firmware_map_remove_entry() - Does the real work to remove a firmware + * memmap entry. + * @entry: removed entry. 
+ **/ +static inline void firmware_map_remove_entry(struct firmware_map_entry *entry) +{ +list_del(entry-list); +} + /* * Add memmap entry on sysfs */ @@ -144,6 +170,31 @@ static int add_sysfs_fw_map_entry(struct return 0; } +/* + * Remove memmap entry on sysfs + */ +static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry) +{ +kobject_put(entry-kobj); +} + +/* + * Search memmap entry + */ + +struct firmware_map_entry * __meminit +find_firmware_map_entry(u64 start, u64 end, const char *type) +{ +struct firmware_map_entry *entry; + +list_for_each_entry(entry, map_entries, list) +if ((entry-start == start) (entry-end == end) +(!strcmp(entry-type, type))) +return
Re: [RFC PATCH v3 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs
Hi Wen, 2012/07/16 11:32, Wen Congyang wrote: At 07/09/2012 06:26 PM, Yasuaki Ishimatsu Wrote: When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type} sysfs files are created. But there is no code to remove these files. The patch implements the function to remove them. Note : The code does not free firmware_map_entry since there is no way to free memory which is allocated by bootmem. CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com CC: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- drivers/firmware/memmap.c| 78 ++- include/linux/firmware-map.h |6 +++ mm/memory_hotplug.c |6 ++- 3 files changed, 88 insertions(+), 2 deletions(-) Index: linux-3.5-rc6/mm/memory_hotplug.c === --- linux-3.5-rc6.orig/mm/memory_hotplug.c 2012-07-09 18:23:13.323844923 +0900 +++ linux-3.5-rc6/mm/memory_hotplug.c2012-07-09 18:23:19.522767424 +0900 @@ -661,7 +661,11 @@ EXPORT_SYMBOL_GPL(add_memory); int remove_memory(int nid, u64 start, u64 size) { -return -EBUSY; +lock_memory_hotplug(); +/* remove memmap entry */ +firmware_map_remove(start, start + size - 1, System RAM); +unlock_memory_hotplug(); +return 0; } EXPORT_SYMBOL_GPL(remove_memory); Index: linux-3.5-rc6/include/linux/firmware-map.h === --- linux-3.5-rc6.orig/include/linux/firmware-map.h 2012-07-09 18:23:09.532892314 +0900 +++ linux-3.5-rc6/include/linux/firmware-map.h 2012-07-09 18:23:19.523767412 +0900 @@ -25,6 +25,7 @@ int firmware_map_add_early(u64 start, u64 end, const char *type); int firmware_map_add_hotplug(u64 start, u64 end, const char *type); +int firmware_map_remove(u64 start, u64 end, const char *type); #else /* CONFIG_FIRMWARE_MEMMAP */ @@ 
-38,6 +39,11 @@ static inline int firmware_map_add_hotpl return 0; } +static inline int firmware_map_remove(u64 start, u64 end, const char *type) +{ +return 0; +} + #endif /* CONFIG_FIRMWARE_MEMMAP */ #endif /* _LINUX_FIRMWARE_MAP_H */ Index: linux-3.5-rc6/drivers/firmware/memmap.c === --- linux-3.5-rc6.orig/drivers/firmware/memmap.c 2012-07-09 18:23:09.532892314 +0900 +++ linux-3.5-rc6/drivers/firmware/memmap.c 2012-07-09 18:25:46.371931554 +0900 @@ -21,6 +21,7 @@ #include linux/types.h #include linux/bootmem.h #include linux/slab.h +#include linux/mm.h /* * Data types -- @@ -79,7 +80,22 @@ static const struct sysfs_ops memmap_att .show = memmap_attr_show, }; +#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, kobj) + +static void release_firmware_map_entry(struct kobject *kobj) +{ +struct firmware_map_entry *entry = to_memmap_entry(kobj); +struct page *head_page; + +head_page = virt_to_head_page(entry); +if (PageSlab(head_page)) +kfree(entry); + +/* There is no way to free memory allocated from bootmem*/ +} + static struct kobj_type memmap_ktype = { +.release= release_firmware_map_entry, .sysfs_ops = memmap_attr_ops, .default_attrs = def_attrs, }; @@ -123,6 +139,16 @@ static int firmware_map_add_entry(u64 st return 0; } +/** + * firmware_map_remove_entry() - Does the real work to remove a firmware + * memmap entry. + * @entry: removed entry. 
+ **/ +static inline void firmware_map_remove_entry(struct firmware_map_entry *entry) +{ +list_del(entry-list); +} + /* * Add memmap entry on sysfs */ @@ -144,6 +170,31 @@ static int add_sysfs_fw_map_entry(struct return 0; } +/* + * Remove memmap entry on sysfs + */ +static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry) +{ +kobject_put(entry-kobj); +} + +/* + * Search memmap entry + */ + +struct firmware_map_entry * __meminit +find_firmware_map_entry(u64 start, u64 end, const char *type) +{ +struct firmware_map_entry *entry; + +list_for_each_entry(entry, map_entries, list) +if ((entry-start == start) (entry-end == end) +(!strcmp(entry-type, type))) +return entry; + +return NULL; +} + /** * firmware_map_add_hotplug() - Adds a firmware mapping entry when we do * memory hotplug. @@ -196,6 +247,32
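The lookup in find_firmware_map_entry() quoted above returns an entry only when start, end and type all match exactly. A rough userspace sketch of that matching rule, using a plain singly linked list instead of the kernel's list_head, might look like the following. struct map_entry and find_entry() are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct firmware_map_entry. */
struct map_entry {
	unsigned long long start;
	unsigned long long end;
	const char *type;
	struct map_entry *next;
};

/*
 * Walk the list and return the first entry whose start, end and type
 * all match exactly, or NULL if there is none.
 */
static struct map_entry *find_entry(struct map_entry *head,
				    unsigned long long start,
				    unsigned long long end,
				    const char *type)
{
	struct map_entry *e;

	for (e = head; e; e = e->next)
		if (e->start == start && e->end == end &&
		    !strcmp(e->type, type))
			return e;
	return NULL;
}
```

Because the match is exact, a caller that passes an inclusive end (start + size - 1) will not find an entry that was stored with an exclusive end (start + size), or the other way round, so the add and remove paths have to agree on the convention.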
Re: [RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove
Hi Wen, 2012/07/13 12:26, Wen Congyang wrote: At 07/09/2012 06:24 PM, Yasuaki Ishimatsu Wrote: acpi_memory_device_remove() has been prepared to remove physical memory. But, the function only frees acpi_memory_device currentlry. The patch adds following functions into acpi_memory_device_remove(): - offline memory - remove physical memory (only return -EBUSY) - free acpi_memory_device CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com CC: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- drivers/acpi/acpi_memhotplug.c | 26 +- drivers/base/memory.c | 39 +++ include/linux/memory.h |5 + include/linux/memory_hotplug.h |1 + mm/memory_hotplug.c|8 5 files changed, 78 insertions(+), 1 deletion(-) Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c === --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c2012-07-09 18:08:29.946888653 +0900 +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c 2012-07-09 18:08:43.470719531 +0900 @@ -29,6 +29,7 @@ #include linux/module.h #include linux/init.h #include linux/types.h +#include linux/memory.h #include linux/memory_hotplug.h #include linux/slab.h #include acpi/acpi_drivers.h @@ -452,12 +453,35 @@ static int acpi_memory_device_add(struct static int acpi_memory_device_remove(struct acpi_device *device, int type) { struct acpi_memory_device *mem_device = NULL; - +struct acpi_memory_info *info, *tmp; +int result; +int node; if (!device || !acpi_driver_data(device)) return -EINVAL; mem_device = acpi_driver_data(device); + +node = acpi_get_node(mem_device-device-handle); + +list_for_each_entry_safe(info, tmp, mem_device-res_list, list) { +if (!info-enabled) +continue; + +if 
(!is_memblk_offline(info-start_addr, info-length)) { +result = offline_memory(info-start_addr, info-length); +if (result) +return result; +} + +result = remove_memory(node, info-start_addr, info-length); +if (result) +return result; + +list_del(info-list); +kfree(info); +} + kfree(mem_device); return 0; Index: linux-3.5-rc6/include/linux/memory_hotplug.h === --- linux-3.5-rc6.orig/include/linux/memory_hotplug.h2012-07-09 18:08:29.955888542 +0900 +++ linux-3.5-rc6/include/linux/memory_hotplug.h 2012-07-09 18:08:43.471719518 +0900 @@ -233,6 +233,7 @@ static inline int is_mem_section_removab extern int mem_online_node(int nid); extern int add_memory(int nid, u64 start, u64 size); extern int arch_add_memory(int nid, u64 start, u64 size); +extern int remove_memory(int nid, u64 start, u64 size); Here should be: #ifdef CONFIG_MEMORY_HOTREMOVE extern int remove_memory(int nid, u64 start, u64 size); #else static int inline remove_memory(int nid, u64 start, u64 size) { return -EBUSY; } #endif O.K. I'll update it. Thanks, Yasuaki Ishimatsu extern int offline_memory(u64 start, u64 size); extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn, int nr_pages); Index: linux-3.5-rc6/mm/memory_hotplug.c === --- linux-3.5-rc6.orig/mm/memory_hotplug.c 2012-07-09 18:08:29.953888567 +0900 +++ linux-3.5-rc6/mm/memory_hotplug.c2012-07-09 18:08:43.476719455 +0900 @@ -659,6 +659,14 @@ out: } EXPORT_SYMBOL_GPL(add_memory); +int remove_memory(int nid, u64 start, u64 size) +{ +return -EBUSY; + +} +EXPORT_SYMBOL_GPL(remove_memory); We only need to implement this function when CONFIG_MEMORY_HOTREMOVE is defined here. 
Thanks Wen Congyang + + #ifdef CONFIG_MEMORY_HOTREMOVE /* * A free page on the buddy free lists (not the per-cpu lists) has PageBuddy Index: linux-3.5-rc6/drivers/base/memory.c === --- linux-3.5-rc6.orig/drivers/base/memory.c 2012-07-09 18:08:29.947888640 +0900 +++ linux-3.5-rc6/drivers/base/memory.c 2012-07-09 18:10:54.880076739 +0900 @@ -70,6 +70,45 @@ void unregister_memory_isolate_notifier
Re: [RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove
Hi Wen,

2012/07/13 19:40, Wen Congyang wrote:
> At 07/09/2012 06:24 PM, Yasuaki Ishimatsu Wrote:
>> acpi_memory_device_remove() has been prepared to remove physical memory.
>> But, the function only frees acpi_memory_device currently.
>>
>> The patch adds following functions into acpi_memory_device_remove():
>>   - offline memory
>>   - remove physical memory (only return -EBUSY)
>>   - free acpi_memory_device
>>
>> CC: David Rientjes <rient...@google.com>
>> CC: Jiang Liu <liu...@gmail.com>
>> CC: Len Brown <len.br...@intel.com>
>> CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
>> CC: Paul Mackerras <pau...@samba.org>
>> CC: Christoph Lameter <c...@linux.com>
>> Cc: Minchan Kim <minchan@gmail.com>
>> CC: Andrew Morton <a...@linux-foundation.org>
>> CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
>> CC: Wen Congyang <we...@cn.fujitsu.com>
>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>
>>
>> ---
>>  drivers/acpi/acpi_memhotplug.c |   26 +-
>>  drivers/base/memory.c          |   39 +++
>>  include/linux/memory.h         |    5 +
>>  include/linux/memory_hotplug.h |    1 +
>>  mm/memory_hotplug.c            |    8
>>  5 files changed, 78 insertions(+), 1 deletion(-)
>>
>> Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c
>> ===================================================================
>> --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c	2012-07-09 18:08:29.946888653 +0900
>> +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c	2012-07-09 18:08:43.470719531 +0900
>> @@ -29,6 +29,7 @@
>>  #include <linux/module.h>
>>  #include <linux/init.h>
>>  #include <linux/types.h>
>> +#include <linux/memory.h>
>>  #include <linux/memory_hotplug.h>
>>  #include <linux/slab.h>
>>  #include <acpi/acpi_drivers.h>
>> @@ -452,12 +453,35 @@ static int acpi_memory_device_add(struct
>>  static int acpi_memory_device_remove(struct acpi_device *device, int type)
>>  {
>>  	struct acpi_memory_device *mem_device = NULL;
>> -
>> +	struct acpi_memory_info *info, *tmp;
>> +	int result;
>> +	int node;
>>
>>  	if (!device || !acpi_driver_data(device))
>>  		return -EINVAL;
>>
>>  	mem_device = acpi_driver_data(device);
>> +
>> +	node = acpi_get_node(mem_device->device->handle);
>
> acpi_get_node() may return -1, and you should call
> memory_add_physaddr_to_nid() to get the node id.

O.K. I'll update it.

Thanks,
Yasuaki Ishimatsu

> Thanks
> Wen Congyang
>
>> +
>> +	list_for_each_entry_safe(info, tmp, &mem_device->res_list, list) {
>> +		if (!info->enabled)
>> +			continue;
>> +
>> +		if (!is_memblk_offline(info->start_addr, info->length)) {
>> +			result = offline_memory(info->start_addr, info->length);
>> +			if (result)
>> +				return result;
>> +		}
>> +
>> +		result = remove_memory(node, info->start_addr, info->length);
>> +		if (result)
>> +			return result;
>> +
>> +		list_del(&info->list);
>> +		kfree(info);
>> +	}
>> +
>>  	kfree(mem_device);
>>
>>  	return 0;
>> Index: linux-3.5-rc6/include/linux/memory_hotplug.h
>> ===================================================================
>> --- linux-3.5-rc6.orig/include/linux/memory_hotplug.h	2012-07-09 18:08:29.955888542 +0900
>> +++ linux-3.5-rc6/include/linux/memory_hotplug.h	2012-07-09 18:08:43.471719518 +0900
>> @@ -233,6 +233,7 @@ static inline int is_mem_section_removab
>>  extern int mem_online_node(int nid);
>>  extern int add_memory(int nid, u64 start, u64 size);
>>  extern int arch_add_memory(int nid, u64 start, u64 size);
>> +extern int remove_memory(int nid, u64 start, u64 size);
>>  extern int offline_memory(u64 start, u64 size);
>>  extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
>>  								int nr_pages);
>> Index: linux-3.5-rc6/mm/memory_hotplug.c
>> ===================================================================
>> --- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-09 18:08:29.953888567 +0900
>> +++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-09 18:08:43.476719455 +0900
>> @@ -659,6 +659,14 @@ out:
>>  }
>>  EXPORT_SYMBOL_GPL(add_memory);
>>
>> +int remove_memory(int nid, u64 start, u64 size)
>> +{
>> +	return -EBUSY;
>> +
>> +}
>> +EXPORT_SYMBOL_GPL(remove_memory);
>> +
>> +
>>  #ifdef CONFIG_MEMORY_HOTREMOVE
>>  /*
>>   * A free page on the buddy free lists (not the per-cpu lists) has PageBuddy
>> Index: linux-3.5-rc6/drivers/base/memory.c
>> ===================================================================
>> --- linux-3.5-rc6.orig/drivers/base/memory.c	2012-07-09 18:08:29.947888640 +0900
>> +++ linux-3.5-rc6/drivers/base/memory.c	2012-07-09 18:10:54.880076739 +0900
>> @@ -70,6 +70,45 @@ void unregister_memory_isolate_notifier(
>>  }
>>  EXPORT_SYMBOL(unregister_memory_isolate_notifier);
>>
>> +bool is_memblk_offline(unsigned long start, unsigned long size)
>> +{
>> +	struct memory_block *mem = NULL;
>> +	struct mem_section *section;
>> +	unsigned
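Wen's review comment (acpi_get_node() may return -1, so fall back to memory_add_physaddr_to_nid()) can be sketched in plain userspace C as below. This is only an illustration under stated assumptions: stub_acpi_get_node(), stub_physaddr_to_nid() and resolve_node() are hypothetical stand-ins, not the kernel functions, and the "1 GiB per node" layout is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for acpi_get_node(): returns -1 when no node
 * information is available for the handle (handle 0 in this model).
 */
static int stub_acpi_get_node(int handle)
{
	return handle == 0 ? -1 : handle;
}

/*
 * Hypothetical stand-in for memory_add_physaddr_to_nid(): this model
 * simply assumes that every node owns 1 GiB of physical address space.
 */
static int stub_physaddr_to_nid(uint64_t start)
{
	return (int)(start >> 30);
}

/* Prefer the firmware-provided node, fall back to the address lookup. */
static int resolve_node(int handle, uint64_t start_addr)
{
	int node = stub_acpi_get_node(handle);

	if (node < 0)
		node = stub_physaddr_to_nid(start_addr);
	return node;
}
```

In the quoted patch the same check would presumably sit right after the acpi_get_node() call, but since each resource in res_list has its own start_addr, where exactly to do the fallback is left open here.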
Re: [RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove
Hi Wen, 2012/07/13 12:35, Wen Congyang wrote: At 07/09/2012 06:24 PM, Yasuaki Ishimatsu Wrote: acpi_memory_device_remove() has been prepared to remove physical memory. But, the function only frees acpi_memory_device currentlry. The patch adds following functions into acpi_memory_device_remove(): - offline memory - remove physical memory (only return -EBUSY) - free acpi_memory_device CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com CC: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- drivers/acpi/acpi_memhotplug.c | 26 +- drivers/base/memory.c | 39 +++ include/linux/memory.h |5 + include/linux/memory_hotplug.h |1 + mm/memory_hotplug.c|8 5 files changed, 78 insertions(+), 1 deletion(-) Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c === --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c2012-07-09 18:08:29.946888653 +0900 +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c 2012-07-09 18:08:43.470719531 +0900 @@ -29,6 +29,7 @@ #include linux/module.h #include linux/init.h #include linux/types.h +#include linux/memory.h #include linux/memory_hotplug.h #include linux/slab.h #include acpi/acpi_drivers.h @@ -452,12 +453,35 @@ static int acpi_memory_device_add(struct static int acpi_memory_device_remove(struct acpi_device *device, int type) { struct acpi_memory_device *mem_device = NULL; - +struct acpi_memory_info *info, *tmp; +int result; +int node; if (!device || !acpi_driver_data(device)) return -EINVAL; mem_device = acpi_driver_data(device); + +node = acpi_get_node(mem_device-device-handle); + +list_for_each_entry_safe(info, tmp, mem_device-res_list, list) { +if (!info-enabled) +continue; + +if 
(!is_memblk_offline(info-start_addr, info-length)) { +result = offline_memory(info-start_addr, info-length); +if (result) +return result; +} + +result = remove_memory(node, info-start_addr, info-length); The user may online the memory between offline_memory() and remove_memory(). So I think we should lock memory hotplug before check the memory's status and release it after remove_memory(). How about get mem_block-state_mutex of removed memory? When offlining memory, we need to change memory_block-state into MEM_OFFLINE. In this case, we get mem_block-state_mutex. So I think the mutex lock is beneficial. Thanks, Yasuaki Ishimatsu Thanks Wen Congyang +if (result) +return result; + +list_del(info-list); +kfree(info); +} + kfree(mem_device); return 0; Index: linux-3.5-rc6/include/linux/memory_hotplug.h === --- linux-3.5-rc6.orig/include/linux/memory_hotplug.h2012-07-09 18:08:29.955888542 +0900 +++ linux-3.5-rc6/include/linux/memory_hotplug.h 2012-07-09 18:08:43.471719518 +0900 @@ -233,6 +233,7 @@ static inline int is_mem_section_removab extern int mem_online_node(int nid); extern int add_memory(int nid, u64 start, u64 size); extern int arch_add_memory(int nid, u64 start, u64 size); +extern int remove_memory(int nid, u64 start, u64 size); extern int offline_memory(u64 start, u64 size); extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn, int nr_pages); Index: linux-3.5-rc6/mm/memory_hotplug.c === --- linux-3.5-rc6.orig/mm/memory_hotplug.c 2012-07-09 18:08:29.953888567 +0900 +++ linux-3.5-rc6/mm/memory_hotplug.c2012-07-09 18:08:43.476719455 +0900 @@ -659,6 +659,14 @@ out: } EXPORT_SYMBOL_GPL(add_memory); +int remove_memory(int nid, u64 start, u64 size) +{ +return -EBUSY; + +} +EXPORT_SYMBOL_GPL(remove_memory); + + #ifdef CONFIG_MEMORY_HOTREMOVE /* * A free page on the buddy free lists (not the per-cpu lists) has PageBuddy Index: linux-3.5-rc6/drivers/base/memory.c === --- linux-3.5-rc6.orig/drivers/base/memory.c 2012-07-09 18:08:29.947888640 
+0900 +++ linux-3.5-rc6/drivers/base/memory.c 2012-07-09 18:10:54.880076739
Re: [RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove
Hi Wen, 2012/07/17 10:44, Yasuaki Ishimatsu wrote: Hi Wen, 2012/07/13 12:35, Wen Congyang wrote: At 07/09/2012 06:24 PM, Yasuaki Ishimatsu Wrote: acpi_memory_device_remove() has been prepared to remove physical memory. But, the function only frees acpi_memory_device currentlry. The patch adds following functions into acpi_memory_device_remove(): - offline memory - remove physical memory (only return -EBUSY) - free acpi_memory_device CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com CC: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- drivers/acpi/acpi_memhotplug.c | 26 +- drivers/base/memory.c | 39 +++ include/linux/memory.h |5 + include/linux/memory_hotplug.h |1 + mm/memory_hotplug.c|8 5 files changed, 78 insertions(+), 1 deletion(-) Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c === --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c 2012-07-09 18:08:29.946888653 +0900 +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c2012-07-09 18:08:43.470719531 +0900 @@ -29,6 +29,7 @@ #include linux/module.h #include linux/init.h #include linux/types.h +#include linux/memory.h #include linux/memory_hotplug.h #include linux/slab.h #include acpi/acpi_drivers.h @@ -452,12 +453,35 @@ static int acpi_memory_device_add(struct static int acpi_memory_device_remove(struct acpi_device *device, int type) { struct acpi_memory_device *mem_device = NULL; - + struct acpi_memory_info *info, *tmp; + int result; + int node; if (!device || !acpi_driver_data(device)) return -EINVAL; mem_device = acpi_driver_data(device); + + node = acpi_get_node(mem_device-device-handle); + + list_for_each_entry_safe(info, tmp, mem_device-res_list, list) 
{ + if (!info-enabled) + continue; + + if (!is_memblk_offline(info-start_addr, info-length)) { + result = offline_memory(info-start_addr, info-length); + if (result) + return result; + } + + result = remove_memory(node, info-start_addr, info-length); The user may online the memory between offline_memory() and remove_memory(). So I think we should lock memory hotplug before check the memory's status and release it after remove_memory(). How about get mem_block-state_mutex of removed memory? When offlining memory, we need to change memory_block-state into MEM_OFFLINE. In this case, we get mem_block-state_mutex. So I think the mutex lock is beneficial. It is not good idea since remove_memory frees mem_block structure... Do you have any ideas? Thanks, Yasuaki Ishimatsu Thanks, Yasuaki Ishimatsu Thanks Wen Congyang + if (result) + return result; + + list_del(info-list); + kfree(info); + } + kfree(mem_device); return 0; Index: linux-3.5-rc6/include/linux/memory_hotplug.h === --- linux-3.5-rc6.orig/include/linux/memory_hotplug.h 2012-07-09 18:08:29.955888542 +0900 +++ linux-3.5-rc6/include/linux/memory_hotplug.h2012-07-09 18:08:43.471719518 +0900 @@ -233,6 +233,7 @@ static inline int is_mem_section_removab extern int mem_online_node(int nid); extern int add_memory(int nid, u64 start, u64 size); extern int arch_add_memory(int nid, u64 start, u64 size); +extern int remove_memory(int nid, u64 start, u64 size); extern int offline_memory(u64 start, u64 size); extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn, int nr_pages); Index: linux-3.5-rc6/mm/memory_hotplug.c === --- linux-3.5-rc6.orig/mm/memory_hotplug.c 2012-07-09 18:08:29.953888567 +0900 +++ linux-3.5-rc6/mm/memory_hotplug.c 2012-07-09 18:08:43.476719455 +0900 @@ -659,6 +659,14 @@ out: } EXPORT_SYMBOL_GPL(add_memory); +int remove_memory(int nid, u64 start, u64 size) +{ + return -EBUSY; + +} +EXPORT_SYMBOL_GPL(remove_memory); + + #ifdef CONFIG_MEMORY_HOTREMOVE /* * A free page on the buddy 
free lists (not the per-cpu lists) has PageBuddy Index: linux-3.5-rc6/drivers/base/memory.c
Re: [RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove
Hi Wen, 2012/07/17 11:32, Wen Congyang wrote: At 07/17/2012 09:54 AM, Yasuaki Ishimatsu Wrote: Hi Wen, 2012/07/17 10:44, Yasuaki Ishimatsu wrote: Hi Wen, 2012/07/13 12:35, Wen Congyang wrote: At 07/09/2012 06:24 PM, Yasuaki Ishimatsu Wrote: acpi_memory_device_remove() has been prepared to remove physical memory. But, the function only frees acpi_memory_device currentlry. The patch adds following functions into acpi_memory_device_remove(): - offline memory - remove physical memory (only return -EBUSY) - free acpi_memory_device CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com CC: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- drivers/acpi/acpi_memhotplug.c | 26 +- drivers/base/memory.c | 39 +++ include/linux/memory.h |5 + include/linux/memory_hotplug.h |1 + mm/memory_hotplug.c|8 5 files changed, 78 insertions(+), 1 deletion(-) Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c === --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c 2012-07-09 18:08:29.946888653 +0900 +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c 2012-07-09 18:08:43.470719531 +0900 @@ -29,6 +29,7 @@ #include linux/module.h #include linux/init.h #include linux/types.h +#include linux/memory.h #include linux/memory_hotplug.h #include linux/slab.h #include acpi/acpi_drivers.h @@ -452,12 +453,35 @@ static int acpi_memory_device_add(struct static int acpi_memory_device_remove(struct acpi_device *device, int type) { struct acpi_memory_device *mem_device = NULL; - + struct acpi_memory_info *info, *tmp; + int result; + int node; if (!device || !acpi_driver_data(device)) return -EINVAL; mem_device = acpi_driver_data(device); + + node = 
acpi_get_node(mem_device-device-handle); + + list_for_each_entry_safe(info, tmp, mem_device-res_list, list) { + if (!info-enabled) + continue; + + if (!is_memblk_offline(info-start_addr, info-length)) { + result = offline_memory(info-start_addr, info-length); + if (result) + return result; + } + + result = remove_memory(node, info-start_addr, info-length); The user may online the memory between offline_memory() and remove_memory(). So I think we should lock memory hotplug before check the memory's status and release it after remove_memory(). How about get mem_block-state_mutex of removed memory? When offlining memory, we need to change memory_block-state into MEM_OFFLINE. In this case, we get mem_block-state_mutex. So I think the mutex lock is beneficial. It is not good idea since remove_memory frees mem_block structure... Do you have any ideas? Hmm, split offline_memory() to 2 functions: offline_pages() and __offline_pages() offline_pages() lock_memory_hotplug(); __offline_pages(); unlock_memory_hotplug(); and implement remove_memory() like this: remove_memory() lock_memory_hotplug() if (!is_memblk_offline()) { __offline_pages(); } // cleanup unlock_memory_hotplug(); What about this? I also thought about it once. But a problem remains. Current offilne_pages() cannot realize the memory has been removed by remove_memory(). So even if protecting the race by lock_memory_hotplug(), offline_pages() can offline the removed memory. offline_pages() should have the means to know the memory was removed. But I don't have good idea. 
Thanks, Yasuaki Ishimatsu Thanks Wen Congyang Thanks, Yasuaki Ishimatsu Thanks, Yasuaki Ishimatsu Thanks Wen Congyang + if (result) + return result; + + list_del(info-list); + kfree(info); + } + kfree(mem_device); return 0; Index: linux-3.5-rc6/include/linux/memory_hotplug.h === --- linux-3.5-rc6.orig/include/linux/memory_hotplug.h 2012-07-09 18:08:29.955888542 +0900 +++ linux-3.5-rc6/include/linux/memory_hotplug.h 2012-07-09 18:08:43.471719518 +0900 @@ -233,6 +233,7 @@ static inline int is_mem_section_removab extern int mem_online_node(int nid); extern int add_memory(int nid, u64 start, u64 size); extern int arch_add_memory(int nid, u64 start, u64 size); +extern int
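For readers following the locking proposal above (split the offline path into a locked wrapper and an unlocked helper, and let remove_memory() call the helper while it already holds the lock), here is a minimal single-threaded model in userspace C. All names with the split_ prefix are hypothetical; the "lock" is just a counter that asserts on double acquisition, so this only demonstrates the call structure, not real mutual exclusion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a non-recursive hotplug lock. */
static int split_lock_depth;

static void split_lock(void)
{
	assert(split_lock_depth == 0);	/* taking it twice would deadlock */
	split_lock_depth++;
}

static void split_unlock(void)
{
	assert(split_lock_depth == 1);
	split_lock_depth--;
}

/* Hypothetical memory block state; not a kernel structure. */
struct split_block {
	bool online;
	bool removed;
};

/* Unlocked helper, playing the role of __offline_pages(). */
static int split_offline_locked(struct split_block *b)
{
	assert(split_lock_depth == 1);	/* caller must hold the lock */
	b->online = false;
	return 0;
}

/* Locked wrapper, playing the role of offline_pages(). */
static int split_offline(struct split_block *b)
{
	int ret;

	split_lock();
	ret = split_offline_locked(b);
	split_unlock();
	return ret;
}

/* Remove path: check, offline and clean up under one lock hold. */
static int split_remove(struct split_block *b)
{
	int ret = 0;

	split_lock();
	if (b->online)
		ret = split_offline_locked(b);
	if (!ret)
		b->removed = true;	/* stands in for "// cleanup" */
	split_unlock();
	return ret;
}
```

The point of the split is that split_remove() never calls the locked wrapper, so the check and the removal happen in one critical section and nobody can online the block in between.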
Re: [RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove
Hi Wen, 2012/07/17 12:32, Wen Congyang wrote: At 07/17/2012 11:08 AM, Yasuaki Ishimatsu Wrote: Hi Wen, 2012/07/17 11:32, Wen Congyang wrote: At 07/17/2012 09:54 AM, Yasuaki Ishimatsu Wrote: Hi Wen, 2012/07/17 10:44, Yasuaki Ishimatsu wrote: Hi Wen, 2012/07/13 12:35, Wen Congyang wrote: At 07/09/2012 06:24 PM, Yasuaki Ishimatsu Wrote: acpi_memory_device_remove() has been prepared to remove physical memory. But, the function only frees acpi_memory_device currentlry. The patch adds following functions into acpi_memory_device_remove(): - offline memory - remove physical memory (only return -EBUSY) - free acpi_memory_device CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com CC: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- drivers/acpi/acpi_memhotplug.c | 26 +- drivers/base/memory.c | 39 +++ include/linux/memory.h |5 + include/linux/memory_hotplug.h |1 + mm/memory_hotplug.c|8 5 files changed, 78 insertions(+), 1 deletion(-) Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c === --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c 2012-07-09 18:08:29.946888653 +0900 +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c2012-07-09 18:08:43.470719531 +0900 @@ -29,6 +29,7 @@ #include linux/module.h #include linux/init.h #include linux/types.h +#include linux/memory.h #include linux/memory_hotplug.h #include linux/slab.h #include acpi/acpi_drivers.h @@ -452,12 +453,35 @@ static int acpi_memory_device_add(struct static int acpi_memory_device_remove(struct acpi_device *device, int type) { struct acpi_memory_device *mem_device = NULL; - + struct acpi_memory_info *info, *tmp; + int result; + int node; if (!device || 
!acpi_driver_data(device)) return -EINVAL; mem_device = acpi_driver_data(device); + + node = acpi_get_node(mem_device-device-handle); + + list_for_each_entry_safe(info, tmp, mem_device-res_list, list) { + if (!info-enabled) + continue; + + if (!is_memblk_offline(info-start_addr, info-length)) { + result = offline_memory(info-start_addr, info-length); + if (result) + return result; + } + + result = remove_memory(node, info-start_addr, info-length); The user may online the memory between offline_memory() and remove_memory(). So I think we should lock memory hotplug before check the memory's status and release it after remove_memory(). How about get mem_block-state_mutex of removed memory? When offlining memory, we need to change memory_block-state into MEM_OFFLINE. In this case, we get mem_block-state_mutex. So I think the mutex lock is beneficial. It is not good idea since remove_memory frees mem_block structure... Do you have any ideas? Hmm, split offline_memory() to 2 functions: offline_pages() and __offline_pages() offline_pages() lock_memory_hotplug(); __offline_pages(); unlock_memory_hotplug(); and implement remove_memory() like this: remove_memory() lock_memory_hotplug() if (!is_memblk_offline()) { __offline_pages(); } // cleanup unlock_memory_hotplug(); What about this? I also thought about it once. But a problem remains. Current offilne_pages() cannot realize the memory has been removed by remove_memory(). So even if protecting the race by lock_memory_hotplug(), offline_pages() can offline the removed memory. offline_pages() should have the means to know the memory was removed. But I don't have good idea. We can not online/offline part of memory block, so what about this? It seems you do not understand my concern. When memory_remove() and offline_pages() run to same memory simultaneously, offline_pages runs to removed memory. 
memory_remove()                  | offline_pages()
---------------------------------+------------------------------------------
lock_memory_hotplug()            |
                                 | wait at lock_memory_hotplug()
remove memory                    |
unlock_memory_hotplug()          |
                                 | wake up and start offline_pages()
                                 | offline page
                                 | => but the memory has already been removed
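The interleaving above can be replayed in a small userspace C model. The thread itself leaves open how offline_pages() should learn that the memory is gone; the sketch below merely assumes one possible answer (a "present" flag that is re-checked after the lock is taken) and is not what the patch series does. Every race_ name is hypothetical, and the lock is a plain flag because the replay is single-threaded.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one memory block; not a kernel structure. */
struct race_block {
	bool present;	/* false once the block has been removed */
	bool online;
};

/* Stands in for lock_memory_hotplug()/unlock_memory_hotplug(). */
static bool race_locked;

static void race_lock(void)
{
	assert(!race_locked);
	race_locked = true;
}

static void race_unlock(void)
{
	assert(race_locked);
	race_locked = false;
}

/* Left column of the diagram: remove the block under the lock. */
static int race_remove_memory(struct race_block *b)
{
	int ret = 0;

	race_lock();
	if (!b->present) {
		ret = -ENODEV;
	} else {
		b->online = false;
		b->present = false;
	}
	race_unlock();
	return ret;
}

/*
 * Right column of the diagram: by the time the lock is obtained the
 * block may be gone, so the state is re-validated before touching it.
 */
static int race_offline_pages(struct race_block *b)
{
	int ret = 0;

	race_lock();
	if (!b->present)
		ret = -ENODEV;	/* assumed error code for "already removed" */
	else
		b->online = false;
	race_unlock();
	return ret;
}
```

Calling race_remove_memory() and then race_offline_pages() on the same block reproduces the order in the diagram; with the re-check the late offline request fails instead of operating on removed memory.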
Re: [RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove
Hi Wen, 2012/07/17 14:17, Wen Congyang wrote: At 07/17/2012 12:51 PM, Yasuaki Ishimatsu Wrote: Hi Wen, 2012/07/17 12:32, Wen Congyang wrote: At 07/17/2012 11:08 AM, Yasuaki Ishimatsu Wrote: Hi Wen, 2012/07/17 11:32, Wen Congyang wrote: At 07/17/2012 09:54 AM, Yasuaki Ishimatsu Wrote: Hi Wen, 2012/07/17 10:44, Yasuaki Ishimatsu wrote: Hi Wen, 2012/07/13 12:35, Wen Congyang wrote: At 07/09/2012 06:24 PM, Yasuaki Ishimatsu Wrote: acpi_memory_device_remove() has been prepared to remove physical memory. But, the function only frees acpi_memory_device currentlry. The patch adds following functions into acpi_memory_device_remove(): - offline memory - remove physical memory (only return -EBUSY) - free acpi_memory_device CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com CC: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- drivers/acpi/acpi_memhotplug.c | 26 +- drivers/base/memory.c | 39 +++ include/linux/memory.h |5 + include/linux/memory_hotplug.h |1 + mm/memory_hotplug.c|8 5 files changed, 78 insertions(+), 1 deletion(-) Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c === --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c 2012-07-09 18:08:29.946888653 +0900 +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c 2012-07-09 18:08:43.470719531 +0900 @@ -29,6 +29,7 @@ #include linux/module.h #include linux/init.h #include linux/types.h +#include linux/memory.h #include linux/memory_hotplug.h #include linux/slab.h #include acpi/acpi_drivers.h @@ -452,12 +453,35 @@ static int acpi_memory_device_add(struct static int acpi_memory_device_remove(struct acpi_device *device, int type) { struct acpi_memory_device *mem_device = 
NULL; - + struct acpi_memory_info *info, *tmp; + int result; + int node; if (!device || !acpi_driver_data(device)) return -EINVAL; mem_device = acpi_driver_data(device); + + node = acpi_get_node(mem_device-device-handle); + + list_for_each_entry_safe(info, tmp, mem_device-res_list, list) { + if (!info-enabled) + continue; + + if (!is_memblk_offline(info-start_addr, info-length)) { + result = offline_memory(info-start_addr, info-length); + if (result) + return result; + } + + result = remove_memory(node, info-start_addr, info-length); The user may online the memory between offline_memory() and remove_memory(). So I think we should lock memory hotplug before check the memory's status and release it after remove_memory(). How about get mem_block-state_mutex of removed memory? When offlining memory, we need to change memory_block-state into MEM_OFFLINE. In this case, we get mem_block-state_mutex. So I think the mutex lock is beneficial. It is not good idea since remove_memory frees mem_block structure... Do you have any ideas? Hmm, split offline_memory() to 2 functions: offline_pages() and __offline_pages() offline_pages() lock_memory_hotplug(); __offline_pages(); unlock_memory_hotplug(); and implement remove_memory() like this: remove_memory() lock_memory_hotplug() if (!is_memblk_offline()) { __offline_pages(); } // cleanup unlock_memory_hotplug(); What about this? I also thought about it once. But a problem remains. Current offilne_pages() cannot realize the memory has been removed by remove_memory(). So even if protecting the race by lock_memory_hotplug(), offline_pages() can offline the removed memory. offline_pages() should have the means to know the memory was removed. But I don't have good idea. We can not online/offline part of memory block, so what about this? It seems you do not understand my concern. When memory_remove() and offline_pages() run to same memory simultaneously, offline_pages runs to removed memory. 
memory_remove()                  | offline_pages()
---------------------------------+------------------------------------------
lock_memory_hotplug()            |
                                 | wait at lock_memory_hotplug()
remove memory                    |
unlock_memory_hotplug()          |
                                 | wake up and start offline_pages
Re: [RFC PATCH v3 3/13] memory-hotplug : unify argument of firmware_map_add_early/hotplug
Hi Dave,

2012/07/12 22:40, Dave Hansen wrote:
> On 07/11/2012 09:52 PM, Yasuaki Ishimatsu wrote:
>> Does the following patch include your comment? If O.K., I will separate
>> the patch from the series and send it for bug fix.
>
> Looks sane to me. It does now mean that the calling conventions for some
> of the other firmware_map*() functions are different, but I think that's
> OK since they're only used internally to memmap.c.

Thank you for reviewing my patch.
I'll send the patch.

Thanks,
Yasuaki Ishimatsu

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH v3 3/13] memory-hotplug : unify argument of firmware_map_add_early/hotplug
Hi Dave,

2012/07/12 22:40, Dave Hansen wrote:
> On 07/11/2012 09:52 PM, Yasuaki Ishimatsu wrote:
>> Does the following patch include your comment? If O.K., I will separate
>> the patch from the series and send it for bug fix.
>
> Looks sane to me. It does now mean that the calling conventions for some
> of the other firmware_map*() functions are different, but I think that's
> OK since they're only used internally to memmap.c.

Can I add Reviewed-by: Dave Hansen to the patch?

Thanks,
Yasuaki Ishimatsu
Re: [RFC PATCH v3 11/13] memory-hotplug : free memmap of sparse-vmemmap
Hi Wen, 2012/07/11 15:25, Wen Congyang wrote: At 07/11/2012 01:52 PM, Yasuaki Ishimatsu Wrote: 2012/07/11 14:06, Wen Congyang wrote: Hi Wen, At 07/09/2012 06:33 PM, Yasuaki Ishimatsu Wrote: I don't think that all pages of virtual mapping in removed memory can be freed, since page which type is MIX_SECTION_INFO is difficult to free. So, the patch only frees page which type is SECTION_INFO at first. CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com CC: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- arch/x86/mm/init_64.c | 91 ++ include/linux/mm.h|2 + mm/memory_hotplug.c |5 ++ mm/sparse.c |5 +- 4 files changed, 101 insertions(+), 2 deletions(-) Index: linux-3.5-rc4/include/linux/mm.h === --- linux-3.5-rc4.orig/include/linux/mm.h 2012-07-03 14:22:18.530011567 +0900 +++ linux-3.5-rc4/include/linux/mm.h 2012-07-03 14:22:20.83872 +0900 @@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_ void vmemmap_populate_print_last(void); void register_page_bootmem_memmap(unsigned long section_nr, struct page *map, unsigned long size); +void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages); +void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages); enum mf_flags { MF_COUNT_INCREASED = 1 0, Index: linux-3.5-rc4/mm/sparse.c === --- linux-3.5-rc4.orig/mm/sparse.c 2012-07-03 14:21:45.071429805 +0900 +++ linux-3.5-rc4/mm/sparse.c 2012-07-03 14:22:21.000983767 +0900 @@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti /* This will make the necessary allocations eventually. 
*/ return sparse_mem_map_populate(pnum, nid); } -static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages) +static void __kfree_section_memmap(struct page *page, unsigned long nr_pages) { - return; /* XXX: Not implemented yet */ + vmemmap_kfree(page, nr_pages); Hmm, I think you try to free the memory allocated in kmalloc_section_memmap(). Yes. } static void free_map_bootmem(struct page *page, unsigned long nr_pages) { + vmemmap_free_bootmem(page, nr_pages); } Hmm, which function is the memory you try to free allocated in? The function try to free memory allocated from bootmem. The memory has been registered by get_page_bootmem(). So we can free the memory by put_page_bootmem(). OK, I will read these codes, and check it. #else static struct page *__kmalloc_section_memmap(unsigned long nr_pages) Index: linux-3.5-rc4/arch/x86/mm/init_64.c === --- linux-3.5-rc4.orig/arch/x86/mm/init_64.c 2012-07-03 14:22:18.538011465 +0900 +++ linux-3.5-rc4/arch/x86/mm/init_64.c2012-07-03 14:22:21.007983103 +0900 @@ -978,6 +978,97 @@ vmemmap_populate(struct page *start_page return 0; } +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end, +struct page **pp) +{ + pgd_t *pgd; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + unsigned long next; + + *pp = NULL; + + pgd = pgd_offset_k(addr); + if (pgd_none(*pgd)) + return (addr + PAGE_SIZE) PAGE_MASK; Hmm, why not goto next pgd? Does it mean return (addr + PGDIR_SIZE) PGDIR_MASK? + + pud = pud_offset(pgd, addr); + if (pud_none(*pud)) + return (addr + PAGE_SIZE) PAGE_MASK; + + if (!cpu_has_pse) { + next = (addr + PAGE_SIZE) PAGE_MASK; + pmd = pmd_offset(pud, addr); + if (pmd_none(*pmd)) + return next; + + pte = pte_offset_kernel(pmd, addr); + if (pte_none(*pte)) + return next; + + *pp = pte_page(*pte); + pte_clear(init_mm, addr, pte); I think you should flush tlb here. Thanks, I'll update it. 
+ } else { + next = pmd_addr_end(addr, end); + + pmd = pmd_offset(pud, addr); + if (pmd_none(*pmd)) + return next; + + *pp = pmd_page(*pmd); + pmd_clear(pmd); + } + + return next; +} + +void __meminit +vmemmap_kfree(struct page *memmap, unsigned long nr_pages) +{ + unsigned long addr = (unsigned long)memmap; + unsigned long end = (unsigned long)(memmap + nr_pages); + unsigned long next
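The question raised above (step by PAGE_SIZE or by PGDIR_SIZE when a pgd entry is empty) comes down to the same rounding expression with a different granule. A small userspace C sketch of that arithmetic follows; the SIM_ constants assume 4 KiB pages and a 39-bit pgd shift as on 4-level x86_64, and next_boundary() is a hypothetical helper, not a kernel macro.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration: 4 KiB pages, 512 GiB per pgd entry. */
#define SIM_PAGE_SHIFT	12
#define SIM_PGDIR_SHIFT	39
#define SIM_PAGE_SIZE	(UINT64_C(1) << SIM_PAGE_SHIFT)
#define SIM_PGDIR_SIZE	(UINT64_C(1) << SIM_PGDIR_SHIFT)

/*
 * Same shape as "(addr + SIZE) & MASK" in the quoted code, with
 * MASK = ~(SIZE - 1): the first SIZE-aligned address above addr.
 * size must be a power of two.
 */
static uint64_t next_boundary(uint64_t addr, uint64_t size)
{
	return (addr + size) & ~(size - 1);
}

/* Number of loop iterations needed to walk [addr, end) with a given step. */
static uint64_t steps_to_reach(uint64_t addr, uint64_t end, uint64_t size)
{
	uint64_t n = 0;

	while (addr < end) {
		addr = next_boundary(addr, size);
		n++;
	}
	return n;
}
```

Both step sizes are correct in the sense that no populated entry is skipped, but when the pgd entry is empty, stepping by the pgd size reaches the next entry in one iteration instead of visiting every page address inside the hole.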
Re: [RFC PATCH v3 3/13] memory-hotplug : unify argument of firmware_map_add_early/hotplug
Hi Dave,

2012/07/12 0:30, Dave Hansen wrote:
> On 07/09/2012 03:25 AM, Yasuaki Ishimatsu wrote:
>> @@ -642,7 +642,7 @@ int __ref add_memory(int nid, u64 start,
>>  	}
>>
>>  	/* create new memmap entry */
>> -	firmware_map_add_hotplug(start, start + size, "System RAM");
>> +	firmware_map_add_hotplug(start, start + size - 1, "System RAM");
>
> I know the firmware_map_*() calls use inclusive end addresses
> internally, but do we really need to expose them? Both of the callers
> you mentioned do:
>
> 	firmware_map_add_hotplug(start, start + size - 1, "System RAM");
>
> or
>
> 	firmware_map_add_early(entry->addr,
> 		entry->addr + entry->size - 1,
> 		e820_type_to_string(entry->type));
>
> So it seems a _bit_ silly to keep all of the callers doing this size-1
> thing. I also noted that the new caller that you added does the same
> thing.
>
> Could we just change the external calling convention to be exclusive?

Thank you for your comment. Does the following patch include your comment?
If O.K., I will separate the patch from the series and send it for bug fix.

---
 arch/x86/kernel/e820.c    |    2 +-
 drivers/firmware/memmap.c |    8
 2 files changed, 5 insertions(+), 5 deletions(-)

Index: linux-next/arch/x86/kernel/e820.c
===================================================================
--- linux-next.orig/arch/x86/kernel/e820.c	2012-07-02 09:50:23.0 +0900
+++ linux-next/arch/x86/kernel/e820.c	2012-07-12 13:30:45.942318179 +0900
@@ -944,7 +944,7 @@
 	for (i = 0; i < e820_saved.nr_map; i++) {
 		struct e820entry *entry = &e820_saved.map[i];
 		firmware_map_add_early(entry->addr,
-			entry->addr + entry->size - 1,
+			entry->addr + entry->size,
 			e820_type_to_string(entry->type));
 	}
 }
Index: linux-next/drivers/firmware/memmap.c
===================================================================
--- linux-next.orig/drivers/firmware/memmap.c	2012-07-02 09:50:26.0 +0900
+++ linux-next/drivers/firmware/memmap.c	2012-07-12 13:40:53.823318481 +0900
@@ -98,7 +98,7 @@
 /**
  * firmware_map_add_entry() - Does the real work to add a firmware memmap entry.
  * @start: Start of the memory range.
- * @end: End of the memory range (inclusive).
+ * @end: End of the memory range.
  * @type: Type of the memory range.
  * @entry: Pre-allocated (either kmalloc() or bootmem allocator), uninitialised
  * entry.
@@ -113,7 +113,7 @@
 	BUG_ON(start > end);

 	entry->start = start;
-	entry->end = end;
+	entry->end = end - 1;
 	entry->type = type;
 	INIT_LIST_HEAD(&entry->list);
 	kobject_init(&entry->kobj, &memmap_ktype);
@@ -148,7 +148,7 @@
  * firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
  * memory hotplug.
  * @start: Start of the memory range.
- * @end: End of the memory range (inclusive).
+ * @end: End of the memory range.
  * @type: Type of the memory range.
  *
  * Adds a firmware mapping entry. This function is for memory hotplug, it is
@@ -175,7 +175,7 @@
 /**
  * firmware_map_add_early() - Adds a firmware mapping entry.
  * @start: Start of the memory range.
- * @end: End of the memory range (inclusive).
+ * @end: End of the memory range.
  * @type: Type of the memory range.
  *
  * Adds a firmware mapping entry. This function uses the bootmem allocator
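The convention discussed in this message (callers pass an exclusive end, the entry keeps an inclusive end internally) can be illustrated with a short userspace C sketch. struct sim_map_entry, sim_map_add() and sim_map_size() are hypothetical names for this example only. Note that the sketch asserts start < end, which is stricter than the BUG_ON(start > end) visible in the quoted hunk; that is a choice made here so that end - 1 can never fall below start, not something the patch states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical analogue of a firmware map entry. */
struct sim_map_entry {
	uint64_t start;
	uint64_t end;	/* stored inclusive, as in the quoted patch */
};

/*
 * Takes an exclusive end (start + size) and converts it once, so that
 * callers no longer have to write "start + size - 1" themselves.
 */
static void sim_map_add(struct sim_map_entry *e, uint64_t start,
			uint64_t end_excl)
{
	assert(start < end_excl);	/* sketch-only check, see lead-in */
	e->start = start;
	e->end = end_excl - 1;
}

/* Size recovered from the inclusive representation. */
static uint64_t sim_map_size(const struct sim_map_entry *e)
{
	return e->end - e->start + 1;
}
```

With this shape a caller holding (start, size) just passes start + size, and the single "- 1" lives in one place.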
Re: [RFC PATCH v3 0/13] memory-hotplug : hot-remove physical memory
Hi Christoph,

2012/07/10 0:18, Christoph Lameter wrote:
> On Mon, 9 Jul 2012, Yasuaki Ishimatsu wrote:
>
>> Even if you apply these patches, you cannot remove the physical memory
>> completely since these patches are still under development. I want you to
>> cooperate to improve the physical memory hot-remove. So please review these
>> patches and give your comment/idea.
>
> Could you at least give a method on how you want to do physical memory
> removal?

We plan to release a dynamically hardware-partitionable system. It will be
able to hot-remove/add a system board which includes memory and CPUs. But as
you know, Linux does not support memory hot-remove on x86 boxes. So I am
trying to develop it.

The current plan for hot-removing a system board is to use the container
driver. Thus I define the system board in the ACPI DSDT table as a container
device. Hot-adding a container device is already supported. And if the
container device has an _EJ0 ACPI method, an "eject" file for removing the
container device is provided as follows:

# ls -l /sys/bus/acpi/devices/ACPI0004\:01/eject
--w-------. 1 root root 4096 Jul 10 18:19 /sys/bus/acpi/devices/ACPI0004:01/eject

When I hot-remove the container device, I echo 1 to the file as follows:

# echo 1 > /sys/bus/acpi/devices/ACPI0004\:02/eject

Then acpi_bus_trim() is called, and it calls acpi_memory_device_remove() to
remove the memory device. But that code currently does nothing. So I
developed the rest of the function.

> You would have to remove all objects from the range you want to
> physically remove. That is only possible under special circumstances and
> with a limited set of objects. Even if you exclusively use ZONE_MOVABLE
> you still may get cases where pages are pinned for a long time.

I know that. So my memory hot-remove plan is as follows:

1. Hot-add a system board.
   All memory on the system board is offline.

2. Online the memory as removable pages.
   This function is not supported yet. It is being developed by Lai as
   follows:
   http://lkml.indiana.edu/hypermail/linux/kernel/1207.0/01478.html
   Once it is supported, I will be able to create movable memory.

3. Hot-remove the memory via the container device's eject file.

Thanks,
Yasuaki Ishimatsu

> I am not sure that these patches are useful unless we know where you are
> going with this. If we end up with a situation where we still cannot
> remove physical memory then this patchset is not helpful.
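The last step of the plan above (writing 1 to the container device's eject file) can also be done from a program instead of the shell. The following is a minimal user-space sketch only; `write_sysfs_attr()` and `eject_container()` are hypothetical helper names, and the sysfs path layout is taken from the `ls` output quoted in the mail.

```c
#include <stdio.h>

/* Write a short string to a sysfs-style file; returns 0 on success, -1 on failure. */
static int write_sysfs_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	/* The write may only be flushed (and rejected) at close time. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Request ejection of an ACPI container device.
 * 'dev' is the device name as it appears under /sys/bus/acpi/devices,
 * e.g. "ACPI0004:01" (assumed to expose an "eject" attribute).
 */
static int eject_container(const char *dev)
{
	char path[256];
	int n = snprintf(path, sizeof(path),
			 "/sys/bus/acpi/devices/%s/eject", dev);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return write_sysfs_attr(path, "1");
}
```

The eject file is write-only and root-owned, so `eject_container()` is expected to fail with -1 for an unprivileged caller or on a machine without such a device.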
Re: [RFC PATCH v3 0/13] memory-hotplug : hot-remove physical memory
Hi Jiang, 2012/07/11 1:50, Jiang Liu wrote: On 07/10/2012 05:58 PM, Yasuaki Ishimatsu wrote: Hi Christoph, 2012/07/10 0:18, Christoph Lameter wrote: On Mon, 9 Jul 2012, Yasuaki Ishimatsu wrote: Even if you apply these patches, you cannot remove the physical memory completely since these patches are still under development. I want you to cooperate to improve the physical memory hot-remove. So please review these patches and give your comment/idea. Could you at least give a method on how you want to do physical memory removal? We plan to release a dynamic hardware partitionable system. It will be able to hot remove/add a system board which included memory and cpu. But as you know, Linux does not support memory hot-remove on x86 box. So I try to develop it. Current plan to hot remove system board is to use container driver. Thus I define the system board in ACPI DSDT table as a container device. It have supported hot-add a container device. And if container device has _EJ0 ACPI method, eject file to remove the container device is prepared as follow: # ls -l /sys/bus/acpi/devices/ACPI0004\:01/eject --w---. 1 root root 4096 Jul 10 18:19 /sys/bus/acpi/devices/ACPI0004:01/eject When I hot-remove the container device, I echo 1 to the file as follow: #echo 1 /sys/bus/acpi/devices/ACPI0004\:02/eject Then acpi_bus_trim() is called. And it calls acpi_memory_device_remove() for removing memory device. But the code does not do nothing. So I developed the continuation of the function. You would have to remove all objects from the range you want to physically remove. That is only possible under special circumstances and with a limited set of objects. Even if you exclusively use ZONE_MOVEABLE you still may get cases where pages are pinned for a long time. I know it. So my memory hot-remove plan is as follows: 1. hot-added a system board All memory which included the system board is offline. 2. online the memory as removable page The function has not supported yet. 
It is being developed by Lai as follow: http://lkml.indiana.edu/hypermail/linux/kernel/1207.0/01478.html If it is supported, I will be able to create movable memory. 3. hot-remove the memory by container device's eject file We have implemented a prototype to do physical node (mem + CPU + IOH) hotplug for Itanium and is now porting it to x86. But with currently solution, memory hotplug functionality may cause 10-20% performance decrease because we concentrate all DMA/Normal memory to the first NUMA node, and all other NUMA nodes only hosts ZONE_MOVABLE. We are working on solution to minimize the performance drop now. Thank you for your interesting response. I have a question. How do you move all other NUMA nodes to ZONE_MOVABLE? To use ZONE_MOVABLE, we need to use boot options like kernelcore or movablecore. But it is not enough, since the requested amount is spread evenly throughout all nodes in the system. So I think we do not have way to move all other NUMA node to ZONE_MOVABLE. Thanks, Yasuaki Ishimatsu Thanks, Yasuaki Ishimatsu I am not sure that these patches are useful unless we know where you are going with this. If we end up with a situation where we still cannot remove physical memory then this patchset is not helpful. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH v3 0/13] memory-hotplug : hot-remove physical memory
Hi Jiang, 2012/07/11 9:21, Jiang Liu wrote: On 07/11/2012 08:09 AM, Yasuaki Ishimatsu wrote: Hi Jiang, 2012/07/11 1:50, Jiang Liu wrote: On 07/10/2012 05:58 PM, Yasuaki Ishimatsu wrote: Hi Christoph, 2012/07/10 0:18, Christoph Lameter wrote: On Mon, 9 Jul 2012, Yasuaki Ishimatsu wrote: Even if you apply these patches, you cannot remove the physical memory completely since these patches are still under development. I want you to cooperate to improve the physical memory hot-remove. So please review these patches and give your comment/idea. Could you at least give a method on how you want to do physical memory removal? We plan to release a dynamic hardware partitionable system. It will be able to hot remove/add a system board which included memory and cpu. But as you know, Linux does not support memory hot-remove on x86 box. So I try to develop it. Current plan to hot remove system board is to use container driver. Thus I define the system board in ACPI DSDT table as a container device. It have supported hot-add a container device. And if container device has _EJ0 ACPI method, eject file to remove the container device is prepared as follow: # ls -l /sys/bus/acpi/devices/ACPI0004\:01/eject --w---. 1 root root 4096 Jul 10 18:19 /sys/bus/acpi/devices/ACPI0004:01/eject When I hot-remove the container device, I echo 1 to the file as follow: #echo 1 /sys/bus/acpi/devices/ACPI0004\:02/eject Then acpi_bus_trim() is called. And it calls acpi_memory_device_remove() for removing memory device. But the code does not do nothing. So I developed the continuation of the function. You would have to remove all objects from the range you want to physically remove. That is only possible under special circumstances and with a limited set of objects. Even if you exclusively use ZONE_MOVEABLE you still may get cases where pages are pinned for a long time. I know it. So my memory hot-remove plan is as follows: 1. hot-added a system board All memory which included the system board is offline. 
2. online the memory as removable page The function has not supported yet. It is being developed by Lai as follow: http://lkml.indiana.edu/hypermail/linux/kernel/1207.0/01478.html If it is supported, I will be able to create movable memory. 3. hot-remove the memory by container device's eject file We have implemented a prototype to do physical node (mem + CPU + IOH) hotplug for Itanium and is now porting it to x86. But with currently solution, memory hotplug functionality may cause 10-20% performance decrease because we concentrate all DMA/Normal memory to the first NUMA node, and all other NUMA nodes only hosts ZONE_MOVABLE. We are working on solution to minimize the performance drop now. Thank you for your interesting response. I have a question. How do you move all other NUMA nodes to ZONE_MOVABLE? To use ZONE_MOVABLE, we need to use boot options like kernelcore or movablecore. But it is not enough, since the requested amount is spread evenly throughout all nodes in the system. So I think we do not have way to move all other NUMA node to ZONE_MOVABLE. We have modified the ZONE_MOVABLE spreading and bootmem allocation. If the kernelcore or movablecore kernel parameters are present, we follow current behavior. If those parameter are absent and the platform supports physical hotplug, we will concentrate DMA/NORMAL memory to specific nodes. That's interesting. I want to know more details, if you do not mind. Current kernel doesn't do the behavior, does it? So I think you have some patches for changing the behavior. Will you merge these patches into community kernel? Thanks, Yasuaki Ishimatsu Thanks, Yasuaki Ishimatsu Thanks, Yasuaki Ishimatsu I am not sure that these patches are useful unless we know where you are going with this. If we end up with a situation where we still cannot remove physical memory then this patchset is not helpful. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH v3 11/13] memory-hotplug : free memmap of sparse-vmemmap
2012/07/11 14:06, Wen Congyang wrote: Hi Wen, At 07/09/2012 06:33 PM, Yasuaki Ishimatsu Wrote: I don't think that all pages of virtual mapping in removed memory can be freed, since page which type is MIX_SECTION_INFO is difficult to free. So, the patch only frees page which type is SECTION_INFO at first. CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com CC: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- arch/x86/mm/init_64.c | 91 ++ include/linux/mm.h|2 + mm/memory_hotplug.c |5 ++ mm/sparse.c |5 +- 4 files changed, 101 insertions(+), 2 deletions(-) Index: linux-3.5-rc4/include/linux/mm.h === --- linux-3.5-rc4.orig/include/linux/mm.h2012-07-03 14:22:18.530011567 +0900 +++ linux-3.5-rc4/include/linux/mm.h 2012-07-03 14:22:20.83872 +0900 @@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_ void vmemmap_populate_print_last(void); void register_page_bootmem_memmap(unsigned long section_nr, struct page *map, unsigned long size); +void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages); +void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages); enum mf_flags { MF_COUNT_INCREASED = 1 0, Index: linux-3.5-rc4/mm/sparse.c === --- linux-3.5-rc4.orig/mm/sparse.c 2012-07-03 14:21:45.071429805 +0900 +++ linux-3.5-rc4/mm/sparse.c2012-07-03 14:22:21.000983767 +0900 @@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti /* This will make the necessary allocations eventually. 
*/ return sparse_mem_map_populate(pnum, nid); } -static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages) +static void __kfree_section_memmap(struct page *page, unsigned long nr_pages) { -return; /* XXX: Not implemented yet */ +vmemmap_kfree(page, nr_pages); Hmm, I think you try to free the memory allocated in kmalloc_section_memmap(). Yes. } static void free_map_bootmem(struct page *page, unsigned long nr_pages) { +vmemmap_free_bootmem(page, nr_pages); } Hmm, which function is the memory you try to free allocated in? The function try to free memory allocated from bootmem. The memory has been registered by get_page_bootmem(). So we can free the memory by put_page_bootmem(). #else static struct page *__kmalloc_section_memmap(unsigned long nr_pages) Index: linux-3.5-rc4/arch/x86/mm/init_64.c === --- linux-3.5-rc4.orig/arch/x86/mm/init_64.c 2012-07-03 14:22:18.538011465 +0900 +++ linux-3.5-rc4/arch/x86/mm/init_64.c 2012-07-03 14:22:21.007983103 +0900 @@ -978,6 +978,97 @@ vmemmap_populate(struct page *start_page return 0; } +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end, + struct page **pp) +{ +pgd_t *pgd; +pud_t *pud; +pmd_t *pmd; +pte_t *pte; +unsigned long next; + +*pp = NULL; + +pgd = pgd_offset_k(addr); +if (pgd_none(*pgd)) +return (addr + PAGE_SIZE) PAGE_MASK; Hmm, why not goto next pgd? Does it mean return (addr + PGDIR_SIZE) PGDIR_MASK? + +pud = pud_offset(pgd, addr); +if (pud_none(*pud)) +return (addr + PAGE_SIZE) PAGE_MASK; + +if (!cpu_has_pse) { +next = (addr + PAGE_SIZE) PAGE_MASK; +pmd = pmd_offset(pud, addr); +if (pmd_none(*pmd)) +return next; + +pte = pte_offset_kernel(pmd, addr); +if (pte_none(*pte)) +return next; + +*pp = pte_page(*pte); +pte_clear(init_mm, addr, pte); I think you should flush tlb here. Thanks, I'll update it. 
+} else { +next = pmd_addr_end(addr, end); + +pmd = pmd_offset(pud, addr); +if (pmd_none(*pmd)) +return next; + +*pp = pmd_page(*pmd); +pmd_clear(pmd); +} + +return next; +} + +void __meminit +vmemmap_kfree(struct page *memmap, unsigned long nr_pages) +{ +unsigned long addr = (unsigned long)memmap; +unsigned long end = (unsigned long)(memmap + nr_pages); +unsigned long next; +unsigned int order; +struct page *page; + +for (; addr end; addr = next) { +page = NULL
Re: [RFC PATCH v2 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs
Hi Wen, 2012/07/06 18:20, Wen Congyang wrote: At 07/06/2012 04:27 PM, Yasuaki Ishimatsu Wrote: Hi Wen, 2012/07/04 19:01, Wen Congyang wrote: At 07/04/2012 01:52 PM, Yasuaki Ishimatsu Wrote: Hi Wen, 2012/07/04 14:08, Wen Congyang wrote: At 07/04/2012 12:45 PM, Yasuaki Ishimatsu Wrote: Hi Wen, 2012/07/03 15:35, Wen Congyang wrote: At 07/03/2012 01:56 PM, Yasuaki Ishimatsu Wrote: When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type} sysfs files are created. But there is no code to remove these files. The patch implements the function to remove them. Note : The code does not free firmware_map_entry since there is no way to free memory which is allocated by bootmem. CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- drivers/firmware/memmap.c| 70 +++ include/linux/firmware-map.h |6 +++ mm/memory_hotplug.c |6 +++ 3 files changed, 81 insertions(+), 1 deletion(-) Index: linux-3.5-rc4/mm/memory_hotplug.c === --- linux-3.5-rc4.orig/mm/memory_hotplug.c 2012-07-03 14:22:00.190240794 +0900 +++ linux-3.5-rc4/mm/memory_hotplug.c 2012-07-03 14:22:03.549198802 +0900 @@ -661,7 +661,11 @@ EXPORT_SYMBOL_GPL(add_memory); int remove_memory(int nid, u64 start, u64 size) { - return -EBUSY; + lock_memory_hotplug(); + /* remove memmap entry */ + firmware_map_remove(start, start + size - 1, System RAM); + unlock_memory_hotplug(); + return 0; } EXPORT_SYMBOL_GPL(remove_memory); Index: linux-3.5-rc4/include/linux/firmware-map.h === --- linux-3.5-rc4.orig/include/linux/firmware-map.h2012-07-03 14:21:45.766421116 +0900 +++ linux-3.5-rc4/include/linux/firmware-map.h 2012-07-03 14:22:03.550198789 +0900 @@ 
-25,6 +25,7 @@ int firmware_map_add_early(u64 start, u64 end, const char *type); int firmware_map_add_hotplug(u64 start, u64 end, const char *type); +int firmware_map_remove(u64 start, u64 end, const char *type); #else /* CONFIG_FIRMWARE_MEMMAP */ @@ -38,6 +39,11 @@ static inline int firmware_map_add_hotpl return 0; } +static inline int firmware_map_remove(u64 start, u64 end, const char *type) +{ + return 0; +} + #endif /* CONFIG_FIRMWARE_MEMMAP */ #endif /* _LINUX_FIRMWARE_MAP_H */ Index: linux-3.5-rc4/drivers/firmware/memmap.c === --- linux-3.5-rc4.orig/drivers/firmware/memmap.c 2012-07-03 14:21:45.761421180 +0900 +++ linux-3.5-rc4/drivers/firmware/memmap.c2012-07-03 14:22:03.569198549 +0900 @@ -79,7 +79,16 @@ static const struct sysfs_ops memmap_att .show = memmap_attr_show, }; +static void release_firmware_map_entry(struct kobject *kobj) +{ + /* + * FIXME : There is no idea. + * How to free the entry which allocated bootmem? + */ I find a function free_bootmem(), but I am not sure whether it can work here. It cannot work here. Another problem: how to check whether the entry uses bootmem? When firmware_map_entry is allocated by kzalloc(), the page has PG_slab. This is not true. In my test, I find the page does not have PG_slab sometimes. I think that it depends on the allocated size. firmware_map_entry size is smaller than PAGE_SIZE. So the page has PG_Slab. In my test, I add printk in the function firmware_map_add_hotplug() to display page's flags. And sometimes the page is not allocated by slab(I use PageSlab() to verify it). How did you check it? Could you send your debug patch? When the memory is not allocated from slab, the flags is 0x108000. Thank you for sending the patch. I think the page to not have PageSlab is a compound page. 
So we can check whether the entry is allocate from bootmem or not as follow: static void release_firmware_map_entry(struct kobject *kobj) { struct firmware_map_entry *entry = to_memmap_entry(kobj); struct page *head_page; head_page = virt_to_head_page(entry); if (PageSlab(head_page)) kfree(etnry); else /* the entry is allocated from bootmem */ } Thanks, Yasuaki Ishimatsu From 8dd51368d6c03edf7edc89cab17441e3741c39c7 Mon Sep 17 00:00:00 2001 From: Wen Congyang we...@cn.fujitsu.com Date: Wed, 4 Jul 2012 16:05:26 +0800 Subject: [PATCH] debug
[RFC PATCH v3 0/13] memory-hotplug : hot-remove physical memory
This patch series aims to support physical memory hot-remove.

  [RFC PATCH v3 1/13] memory-hotplug : rename remove_memory to offline_memory
  [RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove
  [RFC PATCH v3 3/13] memory-hotplug : unify argument of firmware_map_add_early/hotplug
  [RFC PATCH v3 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs
  [RFC PATCH v3 5/13] memory-hotplug : does not release memory region in PAGES_PER_SECTION chunks
  [RFC PATCH v3 6/13] memory-hotplug : add memory_block_release
  [RFC PATCH v3 7/13] memory-hotplug : remove_memory calls __remove_pages
  [RFC PATCH v3 8/13] memory-hotplug : check page type in get_page_bootmem
  [RFC PATCH v3 9/13] memory-hotplug : move register_page_bootmem_info_node and put_page_bootmem for sparse-vmemmap
  [RFC PATCH v3 10/13] memory-hotplug : implement register_page_bootmem_info_section of sparse-vmemmap
  [RFC PATCH v3 11/13] memory-hotplug : free memmap of sparse-vmemmap
  [RFC PATCH v3 12/13] memory-hotplug : add node_device_release
  [RFC PATCH v3 13/13] memory-hotplug : remove sysfs file of node

Even if you apply these patches, you cannot remove the physical memory
completely, since these patches are still under development. I would
appreciate your help in improving physical memory hot-remove, so please
review these patches and give your comments/ideas.

The patches can free/remove the following things:

  - acpi_memory_info                          : [RFC PATCH 2/13]
  - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 4/13]
  - iomem_resource                            : [RFC PATCH 5/13]
  - mem_section and related sysfs files       : [RFC PATCH 6-11/13]
  - node and related sysfs files              : [RFC PATCH 12-13/13]

The patches cannot do the following things yet:

  - page table of removed memory

If you find any function missing for physical memory hot-remove, please let
me know.

change log of v3:
 * rebase to 3.5.0-rc6

 [RFC PATCH v2 2/13]
 * remove extra kobject_put()
 * The patch was commented on by Wen. Wen's comment is that
   acpi_memory_device_remove() should ignore the return value of
   remove_memory() since the caller does not care about the return value.
   But I did not change it, since I think the caller should care about the
   return value. I am trying to fix it as follows:
   https://lkml.org/lkml/2012/7/5/624

 [RFC PATCH v2 4/13]
 * remove a firmware_map_entry allocated by kzalloc()

change log of v2:
 [RFC PATCH v2 2/13]
 * check whether memory block is offline or not before calling
   offline_memory()
 * check whether section is valid or not in is_memblk_offline()
 * call kobject_put() for each memory_block in is_memblk_offline()

 [RFC PATCH v2 3/13]
 * unify the end argument of firmware_map_add_early/hotplug

 [RFC PATCH v2 4/13]
 * add release_firmware_map_entry() for freeing firmware_map_entry

 [RFC PATCH v2 6/13]
 * add release_memory_block() for freeing memory_block

 [RFC PATCH v2 11/13]
 * fix wrong arguments of free_pages()

---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   16 +-
 arch/x86/mm/init_64.c                           |  144
 drivers/acpi/acpi_memhotplug.c                  |   28
 drivers/base/memory.c                           |   54 -
 drivers/base/node.c                             |    7 +
 drivers/firmware/memmap.c                       |   78 -
 include/linux/firmware-map.h                    |    6 +
 include/linux/memory.h                          |    5
 include/linux/memory_hotplug.h                  |   17 --
 include/linux/mm.h                              |    5
 mm/memory_hotplug.c                             |   98
 mm/sparse.c                                     |    5
 12 files changed, 414 insertions(+), 49 deletions(-)
[RFC PATCH v3 1/13] memory-hotplug : rename remove_memory to offline_memory
remove_memory() does not remove memory; it only offlines it. This patch
renames it to offline_memory().

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 drivers/acpi/acpi_memhotplug.c |    2 +-
 drivers/base/memory.c          |    4 ++--
 include/linux/memory_hotplug.h |    2 +-
 mm/memory_hotplug.c            |    6 +++---
 4 files changed, 7 insertions(+), 7 deletions(-)

Index: linux-3.5-rc4/drivers/acpi/acpi_memhotplug.c
===================================================================
--- linux-3.5-rc4.orig/drivers/acpi/acpi_memhotplug.c	2012-07-03 14:21:46.102416917 +0900
+++ linux-3.5-rc4/drivers/acpi/acpi_memhotplug.c	2012-07-03 14:21:49.458374960 +0900
@@ -318,7 +318,7 @@ static int acpi_memory_disable_device(st
 	 */
 	list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
 		if (info->enabled) {
-			result = remove_memory(info->start_addr, info->length);
+			result = offline_memory(info->start_addr, info->length);
 			if (result)
 				return result;
 		}
Index: linux-3.5-rc4/drivers/base/memory.c
===================================================================
--- linux-3.5-rc4.orig/drivers/base/memory.c	2012-07-03 14:21:46.095417003 +0900
+++ linux-3.5-rc4/drivers/base/memory.c	2012-07-03 14:21:49.459374948 +0900
@@ -266,8 +266,8 @@ memory_block_action(unsigned long phys_i
 			break;
 		case MEM_OFFLINE:
 			start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
-			ret = remove_memory(start_paddr,
-					    nr_pages << PAGE_SHIFT);
+			ret = offline_memory(start_paddr,
+					     nr_pages << PAGE_SHIFT);
 			break;
 		default:
 			WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
Index: linux-3.5-rc4/mm/memory_hotplug.c
===================================================================
--- linux-3.5-rc4.orig/mm/memory_hotplug.c	2012-07-03 14:21:46.102416917 +0900
+++ linux-3.5-rc4/mm/memory_hotplug.c	2012-07-03 14:21:49.466374860 +0900
@@ -990,7 +990,7 @@ out:
 	return ret;
 }

-int remove_memory(u64 start, u64 size)
+int offline_memory(u64 start, u64 size)
 {
 	unsigned long start_pfn, end_pfn;

@@ -999,9 +999,9 @@ int remove_memory(u64 start, u64 size)
 	return offline_pages(start_pfn, end_pfn, 120 * HZ);
 }
 #else
-int remove_memory(u64 start, u64 size)
+int offline_memory(u64 start, u64 size)
 {
 	return -EINVAL;
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
-EXPORT_SYMBOL_GPL(remove_memory);
+EXPORT_SYMBOL_GPL(offline_memory);
Index: linux-3.5-rc4/include/linux/memory_hotplug.h
===================================================================
--- linux-3.5-rc4.orig/include/linux/memory_hotplug.h	2012-07-03 14:21:46.102416917 +0900
+++ linux-3.5-rc4/include/linux/memory_hotplug.h	2012-07-03 14:21:49.471374796 +0900
@@ -233,7 +233,7 @@ static inline int is_mem_section_removab
 extern int mem_online_node(int nid);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int arch_add_memory(int nid, u64 start, u64 size);
-extern int remove_memory(u64 start, u64 size);
+extern int offline_memory(u64 start, u64 size);
 extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
 								int nr_pages);
 extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms);
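The diff above does not show how offline_memory() turns its (start, size) byte arguments into the PFN range handed to offline_pages(); the hunk boundaries skip that part of the function body. As a rough user-space sketch of that kind of conversion, assuming 4 KiB pages and the usual round-down PFN_DOWN() arithmetic (both are assumptions here, and `struct pfn_range` and `bytes_to_pfn_range()` are hypothetical names):

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define PFN_DOWN(x) ((unsigned long)((x) >> PAGE_SHIFT))

/* Half-open page frame range [start_pfn, end_pfn). */
struct pfn_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Convert a physical byte range (start, size) into a page frame range. */
static struct pfn_range bytes_to_pfn_range(uint64_t start, uint64_t size)
{
	struct pfn_range r;

	r.start_pfn = PFN_DOWN(start);
	/* The length is derived from 'size', not from 'start'. */
	r.end_pfn = r.start_pfn + PFN_DOWN(size);
	return r;
}
```

For a 128 MiB block at 1 GiB, this gives start_pfn 0x40000 and end_pfn 0x48000, i.e. 0x8000 pages.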
[RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove
acpi_memory_device_remove() has been prepared to remove physical memory. But, the function only frees acpi_memory_device currentlry. The patch adds following functions into acpi_memory_device_remove(): - offline memory - remove physical memory (only return -EBUSY) - free acpi_memory_device CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com CC: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- drivers/acpi/acpi_memhotplug.c | 26 +- drivers/base/memory.c | 39 +++ include/linux/memory.h |5 + include/linux/memory_hotplug.h |1 + mm/memory_hotplug.c|8 5 files changed, 78 insertions(+), 1 deletion(-) Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c === --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c 2012-07-09 18:08:29.946888653 +0900 +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c2012-07-09 18:08:43.470719531 +0900 @@ -29,6 +29,7 @@ #include linux/module.h #include linux/init.h #include linux/types.h +#include linux/memory.h #include linux/memory_hotplug.h #include linux/slab.h #include acpi/acpi_drivers.h @@ -452,12 +453,35 @@ static int acpi_memory_device_add(struct static int acpi_memory_device_remove(struct acpi_device *device, int type) { struct acpi_memory_device *mem_device = NULL; - + struct acpi_memory_info *info, *tmp; + int result; + int node; if (!device || !acpi_driver_data(device)) return -EINVAL; mem_device = acpi_driver_data(device); + + node = acpi_get_node(mem_device-device-handle); + + list_for_each_entry_safe(info, tmp, mem_device-res_list, list) { + if (!info-enabled) + continue; + + if (!is_memblk_offline(info-start_addr, info-length)) { + result = offline_memory(info-start_addr, 
info-length); + if (result) + return result; + } + + result = remove_memory(node, info-start_addr, info-length); + if (result) + return result; + + list_del(info-list); + kfree(info); + } + kfree(mem_device); return 0; Index: linux-3.5-rc6/include/linux/memory_hotplug.h === --- linux-3.5-rc6.orig/include/linux/memory_hotplug.h 2012-07-09 18:08:29.955888542 +0900 +++ linux-3.5-rc6/include/linux/memory_hotplug.h2012-07-09 18:08:43.471719518 +0900 @@ -233,6 +233,7 @@ static inline int is_mem_section_removab extern int mem_online_node(int nid); extern int add_memory(int nid, u64 start, u64 size); extern int arch_add_memory(int nid, u64 start, u64 size); +extern int remove_memory(int nid, u64 start, u64 size); extern int offline_memory(u64 start, u64 size); extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn, int nr_pages); Index: linux-3.5-rc6/mm/memory_hotplug.c === --- linux-3.5-rc6.orig/mm/memory_hotplug.c 2012-07-09 18:08:29.953888567 +0900 +++ linux-3.5-rc6/mm/memory_hotplug.c 2012-07-09 18:08:43.476719455 +0900 @@ -659,6 +659,14 @@ out: } EXPORT_SYMBOL_GPL(add_memory); +int remove_memory(int nid, u64 start, u64 size) +{ + return -EBUSY; + +} +EXPORT_SYMBOL_GPL(remove_memory); + + #ifdef CONFIG_MEMORY_HOTREMOVE /* * A free page on the buddy free lists (not the per-cpu lists) has PageBuddy Index: linux-3.5-rc6/drivers/base/memory.c === --- linux-3.5-rc6.orig/drivers/base/memory.c2012-07-09 18:08:29.947888640 +0900 +++ linux-3.5-rc6/drivers/base/memory.c 2012-07-09 18:10:54.880076739 +0900 @@ -70,6 +70,45 @@ void unregister_memory_isolate_notifier( } EXPORT_SYMBOL(unregister_memory_isolate_notifier); +bool is_memblk_offline(unsigned long start, unsigned long size) +{ + struct memory_block *mem = NULL; + struct mem_section *section; + unsigned long start_pfn, end_pfn; + unsigned long pfn, section_nr; + + start_pfn = PFN_DOWN(start); + end_pfn = start_pfn + PFN_DOWN(start); + + for (pfn = start_pfn; pfn end_pfn; pfn += PAGES_PER_SECTION) { + 
section_nr = pfn_to_section_nr(pfn); + if (!present_section_nr(section_nr
[RFC PATCH v3 3/13] memory-hotplug : unify argument of firmware_map_add_early/hotplug
There are two ways to create /sys/firmware/memmap/X sysfs entries:

  - firmware_map_add_early
    When the system starts, it is called from e820_reserve_resources().
  - firmware_map_add_hotplug
    When memory is hot-plugged, it is called from add_memory().

But these functions are called with inconsistent values for the end
argument, as below:

  - end argument of firmware_map_add_early()   : start + size - 1
  - end argument of firmware_map_add_hotplug() : start + size

The patch unifies them to start + size - 1.

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 mm/memory_hotplug.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-3.5-rc6/mm/memory_hotplug.c
===================================================================
--- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-09 18:08:43.476719455 +0900
+++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-09 18:13:57.664791810 +0900
@@ -642,7 +642,7 @@ int __ref add_memory(int nid, u64 start,
 	}

 	/* create new memmap entry */
-	firmware_map_add_hotplug(start, start + size, "System RAM");
+	firmware_map_add_hotplug(start, start + size - 1, "System RAM");

 	goto out;
[RFC PATCH v3 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs
When memory is (hot)added to the system, the sysfs files
/sys/firmware/memmap/X/{end, start, type} are created. But there is no
code to remove these files. The patch implements the function to remove
them.

Note: the code does not free a firmware_map_entry that was allocated by
bootmem, since there is no way to free such memory.

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 drivers/firmware/memmap.c    |   78 ++-
 include/linux/firmware-map.h |    6 +++
 mm/memory_hotplug.c          |    6 ++-
 3 files changed, 88 insertions(+), 2 deletions(-)

Index: linux-3.5-rc6/mm/memory_hotplug.c
===================================================================
--- linux-3.5-rc6.orig/mm/memory_hotplug.c	2012-07-09 18:23:13.323844923 +0900
+++ linux-3.5-rc6/mm/memory_hotplug.c	2012-07-09 18:23:19.522767424 +0900
@@ -661,7 +661,11 @@ EXPORT_SYMBOL_GPL(add_memory);

 int remove_memory(int nid, u64 start, u64 size)
 {
-	return -EBUSY;
+	lock_memory_hotplug();
+	/* remove memmap entry */
+	firmware_map_remove(start, start + size - 1, "System RAM");
+	unlock_memory_hotplug();
+	return 0;
 }
 EXPORT_SYMBOL_GPL(remove_memory);

Index: linux-3.5-rc6/include/linux/firmware-map.h
===================================================================
--- linux-3.5-rc6.orig/include/linux/firmware-map.h	2012-07-09 18:23:09.532892314 +0900
+++ linux-3.5-rc6/include/linux/firmware-map.h	2012-07-09 18:23:19.523767412 +0900
@@ -25,6 +25,7 @@

 int firmware_map_add_early(u64 start, u64 end, const char *type);
 int firmware_map_add_hotplug(u64 start, u64 end, const char *type);
+int firmware_map_remove(u64 start, u64 end, const char *type);

 #else /* CONFIG_FIRMWARE_MEMMAP */

@@ -38,6 +39,11 @@ static inline int firmware_map_add_hotpl
 	return 0;
 }

+static inline int firmware_map_remove(u64 start, u64 end, const char *type)
+{
+	return 0;
+}
+
 #endif /* CONFIG_FIRMWARE_MEMMAP */

 #endif /* _LINUX_FIRMWARE_MAP_H */
Index: linux-3.5-rc6/drivers/firmware/memmap.c
===================================================================
--- linux-3.5-rc6.orig/drivers/firmware/memmap.c	2012-07-09 18:23:09.532892314 +0900
+++ linux-3.5-rc6/drivers/firmware/memmap.c	2012-07-09 18:25:46.371931554 +0900
@@ -21,6 +21,7 @@
 #include <linux/types.h>
 #include <linux/bootmem.h>
 #include <linux/slab.h>
+#include <linux/mm.h>

 /*
  * Data types --
@@ -79,7 +80,22 @@ static const struct sysfs_ops memmap_att
 	.show = memmap_attr_show,
 };

+#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, kobj)
+
+static void release_firmware_map_entry(struct kobject *kobj)
+{
+	struct firmware_map_entry *entry = to_memmap_entry(kobj);
+	struct page *head_page;
+
+	head_page = virt_to_head_page(entry);
+	if (PageSlab(head_page))
+		kfree(entry);
+
+	/* There is no way to free memory allocated from bootmem*/
+}
+
 static struct kobj_type memmap_ktype = {
+	.release	= release_firmware_map_entry,
 	.sysfs_ops	= &memmap_attr_ops,
 	.default_attrs	= def_attrs,
 };
@@ -123,6 +139,16 @@ static int firmware_map_add_entry(u64 st
 	return 0;
 }

+/**
+ * firmware_map_remove_entry() - Does the real work to remove a firmware
+ * memmap entry.
+ * @entry: removed entry.
+ **/
+static inline void firmware_map_remove_entry(struct firmware_map_entry *entry)
+{
+	list_del(&entry->list);
+}
+
 /*
  * Add memmap entry on sysfs
  */
@@ -144,6 +170,31 @@ static int add_sysfs_fw_map_entry(struct
 	return 0;
 }

+/*
+ * Remove memmap entry on sysfs
+ */
+static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry)
+{
+	kobject_put(&entry->kobj);
+}
+
+/*
+ * Search memmap entry
+ */
+
+struct firmware_map_entry * __meminit
+find_firmware_map_entry(u64 start, u64 end, const char *type)
+{
+	struct firmware_map_entry *entry;
+
+	list_for_each_entry(entry, &map_entries, list)
+		if ((entry->start == start) && (entry->end == end) &&
+		    (!strcmp(entry->type, type)))
+			return entry;
+
+	return NULL;
+}
+
 /**
  * firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
  * memory hotplug.
@@ -196,6 +247,32 @@ int __init firmware_map_add_early(u64 st
 	return firmware_map_add_entry(start, end, type, entry);
 }

+/**
+ * firmware_map_remove() - remove a firmware mapping entry
+ * @start: Start
[RFC PATCH v3 5/13] memory-hotplug : does not release memory region in PAGES_PER_SECTION chunks
Since commit de7f0cba96786c, release_mem_region() has been called in
PAGES_PER_SECTION chunks, because register_memory_resource() is called
in PAGES_PER_SECTION chunks by add_memory(). But this appears to depend
on the firmware. If the CRS entries in the ACPI DSDT table are written
in PAGES_PER_SECTION chunks, register_memory_resource() is called in
PAGES_PER_SECTION chunks. But if the CRS entries are written per DIMM,
register_memory_resource() is called per DIMM. So release_mem_region()
should not be called in PAGES_PER_SECTION chunks. The patch fixes it.

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   13 +
 mm/memory_hotplug.c                             |    4 ++--
 2 files changed, 11 insertions(+), 6 deletions(-)

Index: linux-3.5-rc4/mm/memory_hotplug.c
===================================================================
--- linux-3.5-rc4.orig/mm/memory_hotplug.c	2012-07-03 14:22:03.549198802 +0900
+++ linux-3.5-rc4/mm/memory_hotplug.c	2012-07-03 14:22:05.919169458 +0900
@@ -358,11 +358,11 @@ int __remove_pages(struct zone *zone, un
 	BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
 	BUG_ON(nr_pages % PAGES_PER_SECTION);

+	release_mem_region(phys_start_pfn << PAGE_SHIFT, nr_pages * PAGE_SIZE);
+
 	sections_to_remove = nr_pages / PAGES_PER_SECTION;
 	for (i = 0; i < sections_to_remove; i++) {
 		unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
-		release_mem_region(pfn << PAGE_SHIFT,
-				   PAGES_PER_SECTION << PAGE_SHIFT);
 		ret = __remove_section(zone, __pfn_to_section(pfn));
 		if (ret)
 			break;
Index: linux-3.5-rc4/arch/powerpc/platforms/pseries/hotplug-memory.c
===================================================================
--- linux-3.5-rc4.orig/arch/powerpc/platforms/pseries/hotplug-memory.c	2012-07-03 14:21:45.641422678 +0900
+++ linux-3.5-rc4/arch/powerpc/platforms/pseries/hotplug-memory.c	2012-07-03 14:22:05.920169437 +0900
@@ -77,7 +77,8 @@ static int pseries_remove_memblock(unsig
 {
 	unsigned long start, start_pfn;
 	struct zone *zone;
-	int ret;
+	int i, ret;
+	int sections_to_remove;

 	start_pfn = base >> PAGE_SHIFT;

@@ -97,9 +98,13 @@ static int pseries_remove_memblock(unsig
 	 * to sysfs state file and we can't remove sysfs entries
 	 * while writing to it. So we have to defer it to here.
 	 */
-	ret = __remove_pages(zone, start_pfn, memblock_size >> PAGE_SHIFT);
-	if (ret)
-		return ret;
+	sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
+	for (i = 0; i < sections_to_remove; i++) {
+		unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
+		ret = __remove_pages(zone, start_pfn, PAGES_PER_SECTION);
+		if (ret)
+			return ret;
+	}

 	/*
 	 * Update memory regions for memory remove
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
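The pseries change in 5/13 walks a memory block one section at a time. A minimal userspace sketch of that arithmetic follows. The page and section sizes are illustrative values only, and section_start_pfns() is a hypothetical helper, not something from the patch. It assumes each iteration is meant to operate on the pfn computed for that section.

```c
#include <stddef.h>

/* Illustrative values only; the kernel derives these per architecture. */
#define PAGE_SHIFT		12UL
#define PAGES_PER_SECTION	(1UL << 15)	/* 128 MiB with 4 KiB pages */

/*
 * Hypothetical helper: return the number of whole sections covered by a
 * block of size_bytes starting at base, and store the first pfn of each
 * section in out[] (at most max entries are written).
 */
static size_t section_start_pfns(unsigned long base, unsigned long size_bytes,
				 unsigned long *out, size_t max)
{
	unsigned long start_pfn = base >> PAGE_SHIFT;
	size_t sections = (size_bytes >> PAGE_SHIFT) / PAGES_PER_SECTION;
	size_t i;

	for (i = 0; i < sections && i < max; i++)
		out[i] = start_pfn + i * PAGES_PER_SECTION;

	return sections;
}
```

With these values, a 256 MiB block at physical address 0x10000000 covers two sections, starting at pfn 0x10000 and pfn 0x18000.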
[RFC PATCH v3 6/13] memory-hotplug : add memory_block_release
When remove_memory_block() is called, the following message is shown
from device_release():

  Device 'memory528' does not have a release() function, it is broken
  and must be fixed.

remove_memory_block() calls kfree(mem). I think the kfree() should be
called from device_release() instead. So the patch implements a release
function, release_memory_block().

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 drivers/base/memory.c |   11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Index: linux-3.5-rc6/drivers/base/memory.c
===================================================================
--- linux-3.5-rc6.orig/drivers/base/memory.c	2012-07-09 18:10:54.880076739 +0900
+++ linux-3.5-rc6/drivers/base/memory.c	2012-07-09 18:19:20.471755922 +0900
@@ -109,6 +109,15 @@ bool is_memblk_offline(unsigned long sta
 }
 EXPORT_SYMBOL(is_memblk_offline);

+#define to_memory_block(device) container_of(device, struct memory_block, dev)
+
+static void release_memory_block(struct device *dev)
+{
+	struct memory_block *mem = to_memory_block(dev);
+
+	kfree(mem);
+}
+
 /*
  * register_memory - Setup a sysfs device for a memory block
  */
@@ -119,6 +128,7 @@ int register_memory(struct memory_block

 	memory->dev.bus = &memory_subsys;
 	memory->dev.id = memory->start_section_nr / sections_per_block;
+	memory->dev.release = release_memory_block;

 	error = device_register(&memory->dev);
 	return error;
@@ -669,7 +679,6 @@ int remove_memory_block(unsigned long no
 		mem_remove_simple_file(mem, phys_device);
 		mem_remove_simple_file(mem, removable);
 		unregister_memory(mem);
-		kfree(mem);
 	} else
 		kobject_put(&mem->dev.kobj);
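The reason 6/13 moves the kfree() into a release callback can be shown with a small userspace model. This is only a sketch: struct device, device_get() and device_put() below are simplified stand-ins written for this example, not the kernel's driver-core definitions, and the blocks_freed counter exists purely so the behaviour can be observed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel macro of the same name. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in: a reference count plus a release callback. */
struct device {
	int refcount;
	void (*release)(struct device *dev);
};

struct memory_block {
	unsigned long start_section_nr;
	struct device dev;
};

static int blocks_freed;	/* demonstration only */

/* Runs when the last reference is dropped; frees the enclosing object. */
static void release_memory_block(struct device *dev)
{
	struct memory_block *mem = container_of(dev, struct memory_block, dev);

	free(mem);
	blocks_freed++;
}

static struct memory_block *memory_block_create(unsigned long nr)
{
	struct memory_block *mem = calloc(1, sizeof(*mem));

	if (!mem)
		return NULL;
	mem->start_section_nr = nr;
	mem->dev.refcount = 1;
	mem->dev.release = release_memory_block;
	return mem;
}

static void device_get(struct device *dev)
{
	dev->refcount++;
}

/* Drop one reference; the owner is freed only when the last one goes. */
static void device_put(struct device *dev)
{
	if (--dev->refcount == 0 && dev->release)
		dev->release(dev);
}
```

In this model, freeing the memory_block directly while another holder still has a reference would leave that holder with a dangling pointer. Freeing from the release callback defers the free until the final device_put().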
[RFC PATCH v3 7/13] memory-hotplug : remove_memory calls __remove_pages
The patch adds a call to __remove_pages() to remove_memory(). The range
given by the phys_start_pfn and nr_pages arguments of __remove_pages()
may then span different zones. So the zone argument is removed from
__remove_pages(), and __remove_pages() calculates the zone for each
section.

When CONFIG_SPARSEMEM_VMEMMAP is defined, there is no way to remove a
memmap. So __remove_section() only calls unregister_memory_section().

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 arch/powerpc/platforms/pseries/hotplug-memory.c |    5 +
 include/linux/memory_hotplug.h                  |    3 +--
 mm/memory_hotplug.c                             |   20 +---
 3 files changed, 15 insertions(+), 13 deletions(-)

Index: linux-3.5-rc4/mm/memory_hotplug.c
===================================================================
--- linux-3.5-rc4.orig/mm/memory_hotplug.c	2012-07-03 14:22:05.919169458 +0900
+++ linux-3.5-rc4/mm/memory_hotplug.c	2012-07-03 14:22:10.170116406 +0900
@@ -275,11 +275,14 @@ static int __meminit __add_section(int n
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 static int __remove_section(struct zone *zone, struct mem_section *ms)
 {
-	/*
-	 * XXX: Freeing memmap with vmemmap is not implement yet.
-	 *      This should be removed later.
-	 */
-	return -EBUSY;
+	int ret;
+
+	if (!valid_section(ms))
+		return ret;
+
+	ret = unregister_memory_section(ms);
+
+	return ret;
 }
 #else
 static int __remove_section(struct zone *zone, struct mem_section *ms)
@@ -346,11 +349,11 @@ EXPORT_SYMBOL_GPL(__add_pages);
  * sure that pages are marked reserved and zones are adjust properly by
  * calling offline_pages().
  */
-int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
-		 unsigned long nr_pages)
+int __remove_pages(unsigned long phys_start_pfn, unsigned long nr_pages)
 {
 	unsigned long i, ret = 0;
 	int sections_to_remove;
+	struct zone *zone;

 	/*
 	 * We can only remove entire sections
@@ -363,6 +366,7 @@ int __remove_pages(struct zone *zone, un
 	sections_to_remove = nr_pages / PAGES_PER_SECTION;
 	for (i = 0; i < sections_to_remove; i++) {
 		unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
+		zone = page_zone(pfn_to_page(pfn));
 		ret = __remove_section(zone, __pfn_to_section(pfn));
 		if (ret)
 			break;
@@ -664,6 +668,8 @@ int remove_memory(int nid, u64 start, u6
 	lock_memory_hotplug();
 	/* remove memmap entry */
 	firmware_map_remove(start, start + size - 1, "System RAM");
+
+	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT);
 	unlock_memory_hotplug();
 	return 0;

Index: linux-3.5-rc4/include/linux/memory_hotplug.h
===================================================================
--- linux-3.5-rc4.orig/include/linux/memory_hotplug.h	2012-07-03 14:21:58.330264047 +0900
+++ linux-3.5-rc4/include/linux/memory_hotplug.h	2012-07-03 14:22:10.170116406 +0900
@@ -89,8 +89,7 @@ extern bool is_pageblock_removable_noloc
 /* reasonably generic interface to expand the physical pages in a zone */
 extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
 	unsigned long nr_pages);
-extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
-	unsigned long nr_pages);
+extern int __remove_pages(unsigned long start_pfn, unsigned long nr_pages);

 #ifdef CONFIG_NUMA
 extern int memory_add_physaddr_to_nid(u64 start);
Index: linux-3.5-rc4/arch/powerpc/platforms/pseries/hotplug-memory.c
===================================================================
--- linux-3.5-rc4.orig/arch/powerpc/platforms/pseries/hotplug-memory.c	2012-07-03 14:22:05.920169437 +0900
+++ linux-3.5-rc4/arch/powerpc/platforms/pseries/hotplug-memory.c	2012-07-03 14:22:10.172116353 +0900
@@ -76,7 +76,6 @@ unsigned long memory_block_size_bytes(vo
 static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
 {
 	unsigned long start, start_pfn;
-	struct zone *zone;
 	int i, ret;
 	int sections_to_remove;

@@ -87,8 +86,6 @@ static int pseries_remove_memblock(unsig
 		return 0;
 	}

-	zone = page_zone(pfn_to_page(start_pfn));
-
 	/*
 	 * Remove section mappings and sysfs entries for the
 	 * section of the memory we are removing.
@@ -101,7 +98,7 @@ static int pseries_remove_memblock(unsig
 	sections_to_remove = (memblock_size >> PAGE_SHIFT
[RFC PATCH v3 8/13] memory-hotplug : check page type in get_page_bootmem
get_page_bootmem() may be called on the same page many times. So when
get_page_bootmem() is called again on the same page, the function only
increments page->_count.

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 mm/memory_hotplug.c |   15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

Index: linux-3.5-rc4/mm/memory_hotplug.c
===================================================================
--- linux-3.5-rc4.orig/mm/memory_hotplug.c	2012-07-03 14:22:10.170116406 +0900
+++ linux-3.5-rc4/mm/memory_hotplug.c	2012-07-03 14:22:12.299089413 +0900
@@ -95,10 +95,17 @@ static void release_memory_resource(stru
 static void get_page_bootmem(unsigned long info, struct page *page,
 			     unsigned long type)
 {
-	page->lru.next = (struct list_head *) type;
-	SetPagePrivate(page);
-	set_page_private(page, info);
-	atomic_inc(&page->_count);
+	unsigned long page_type;
+
+	page_type = (unsigned long) page->lru.next;
+	if (type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
+	    type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
+		page->lru.next = (struct list_head *) type;
+		SetPagePrivate(page);
+		set_page_private(page, info);
+		atomic_inc(&page->_count);
+	} else
+		atomic_inc(&page->_count);
 }

 /* reference to __meminit __free_pages_bootmem is valid
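The behaviour described in the 8/13 changelog, tag a page once and only count later calls, can be modelled in userspace as below. This is a sketch under stated assumptions: struct page_sketch and get_page_bootmem_sketch() are hypothetical names, the enum values are stand-ins for the kernel's bootmem types, and the check is assumed to apply to the type already stored on the page (the page_type local in the patch).

```c
/* Stand-ins for the kernel's bootmem page types; values are illustrative. */
enum {
	MIN_BOOTMEM_TYPE = 12,
	SECTION_INFO = MIN_BOOTMEM_TYPE,
	MIX_SECTION_INFO,
	NODE_INFO,
	MAX_BOOTMEM_TYPE = NODE_INFO
};

/* Hypothetical reduced page descriptor for this example. */
struct page_sketch {
	unsigned long type;	/* plays the role of page->lru.next */
	unsigned long info;	/* plays the role of page_private(page) */
	int count;		/* plays the role of page->_count */
};

static void get_page_bootmem_sketch(unsigned long info,
				    struct page_sketch *page,
				    unsigned long type)
{
	unsigned long page_type = page->type;

	/* Not yet tagged with a bootmem type: record type and info once. */
	if (page_type < MIN_BOOTMEM_TYPE || page_type > MAX_BOOTMEM_TYPE) {
		page->type = type;
		page->info = info;
	}

	/* Every call, first or repeated, takes one reference. */
	page->count++;
}
```

Calling the function twice on the same page leaves the first type and info in place and brings the count to two, so a matching number of put operations is needed before the page can be released.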
[RFC PATCH v3 9/13] memory-hotplug : move register_page_bootmem_info_node and put_page_bootmem for sparse-vmemmap
To implement register_page_bootmem_info_node() for sparse-vmemmap,
register_page_bootmem_info_node() and put_page_bootmem() are moved to
memory_hotplug.c.

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 include/linux/memory_hotplug.h |    9 -
 mm/memory_hotplug.c            |    8 ++--
 2 files changed, 6 insertions(+), 11 deletions(-)

Index: linux-3.5-rc4/include/linux/memory_hotplug.h
===================================================================
--- linux-3.5-rc4.orig/include/linux/memory_hotplug.h	2012-07-03 14:22:10.170116406 +0900
+++ linux-3.5-rc4/include/linux/memory_hotplug.h	2012-07-03 14:22:14.409063086 +0900
@@ -160,17 +160,8 @@ static inline void arch_refresh_nodedata
 #endif /* CONFIG_NUMA */
 #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */

-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
-{
-}
-static inline void put_page_bootmem(struct page *page)
-{
-}
-#else
 extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
 extern void put_page_bootmem(struct page *page);
-#endif

 /*
  * Lock for memory hotplug guarantees 1) all callbacks for memory hotplug
Index: linux-3.5-rc4/mm/memory_hotplug.c
===================================================================
--- linux-3.5-rc4.orig/mm/memory_hotplug.c	2012-07-03 14:22:12.299089413 +0900
+++ linux-3.5-rc4/mm/memory_hotplug.c	2012-07-03 14:22:14.419062959 +0900
@@ -91,7 +91,6 @@ static void release_memory_resource(stru
 }

 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
-#ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void get_page_bootmem(unsigned long info, struct page *page,
 			     unsigned long type)
 {
@@ -127,6 +126,7 @@ void __ref put_page_bootmem(struct page

 }

+#ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
 	unsigned long *usemap, mapsize, section_nr, i;
@@ -163,6 +163,11 @@ static void register_page_bootmem_info_s
 		get_page_bootmem(section_nr, page, MIX_SECTION_INFO);

 }
+#else
+static inline void register_page_bootmem_info_section(unsigned long start_pfn)
+{
+}
+#endif

 void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
@@ -198,7 +203,6 @@ void register_page_bootmem_info_node(str
 		register_page_bootmem_info_section(pfn);

 }
-#endif /* !CONFIG_SPARSEMEM_VMEMMAP */

 static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
 			   unsigned long end_pfn)
[RFC PATCH v3 10/13] memory-hotplug : implement register_page_bootmem_info_section of sparse-vmemmap
To remove a sparse-vmemmap memmap region that was allocated from
bootmem, the region needs to be registered with get_page_bootmem(). So
the patch walks the pages of the virtual mapping and registers them
with get_page_bootmem().

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 arch/x86/mm/init_64.c          |   53 +
 include/linux/memory_hotplug.h |    2 +
 include/linux/mm.h             |    3 +-
 mm/memory_hotplug.c            |   23 +++--
 4 files changed, 77 insertions(+), 4 deletions(-)

Index: linux-3.5-rc4/mm/memory_hotplug.c
===================================================================
--- linux-3.5-rc4.orig/mm/memory_hotplug.c	2012-07-03 14:22:14.419062959 +0900
+++ linux-3.5-rc4/mm/memory_hotplug.c	2012-07-03 14:22:18.522011667 +0900
@@ -91,8 +91,8 @@ static void release_memory_resource(stru
 }

 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
-static void get_page_bootmem(unsigned long info, struct page *page,
-			     unsigned long type)
+void get_page_bootmem(unsigned long info, struct page *page,
+		      unsigned long type)
 {
 	unsigned long page_type;

@@ -164,8 +164,25 @@ static void register_page_bootmem_info_s

 }
 #else
-static inline void register_page_bootmem_info_section(unsigned long start_pfn)
+static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
+	unsigned long mapsize, section_nr;
+	struct mem_section *ms;
+	struct page *page, *memmap;
+
+	if (!pfn_valid(start_pfn))
+		return;
+
+	section_nr = pfn_to_section_nr(start_pfn);
+	ms = __nr_to_section(section_nr);
+
+	memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+
+	page = virt_to_page(memmap);
+	mapsize = sizeof(struct page) * PAGES_PER_SECTION;
+	mapsize = PAGE_ALIGN(mapsize) >> PAGE_SHIFT;
+
+	register_page_bootmem_memmap(section_nr, memmap, PAGES_PER_SECTION);
 }
 #endif

Index: linux-3.5-rc4/include/linux/mm.h
===================================================================
--- linux-3.5-rc4.orig/include/linux/mm.h	2012-07-03 14:21:45.223427904 +0900
+++ linux-3.5-rc4/include/linux/mm.h	2012-07-03 14:22:18.530011567 +0900
@@ -1586,7 +1586,8 @@ int vmemmap_populate_basepages(struct pa
 						unsigned long pages, int node);
 int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
 void vmemmap_populate_print_last(void);
-
+void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
+				  unsigned long size);

 enum mf_flags {
 	MF_COUNT_INCREASED = 1 << 0,
Index: linux-3.5-rc4/arch/x86/mm/init_64.c
===================================================================
--- linux-3.5-rc4.orig/arch/x86/mm/init_64.c	2012-07-03 14:21:45.228427843 +0900
+++ linux-3.5-rc4/arch/x86/mm/init_64.c	2012-07-03 14:22:18.538011465 +0900
@@ -978,6 +978,59 @@ vmemmap_populate(struct page *start_page
 	return 0;
 }

+void __meminit
+register_page_bootmem_memmap(unsigned long section_nr, struct page *start_page,
+			     unsigned long size)
+{
+	unsigned long addr = (unsigned long)start_page;
+	unsigned long end = (unsigned long)(start_page + size);
+	unsigned long next;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	for (; addr < end; addr = next) {
+		pte_t *pte = NULL;
+
+		pgd = pgd_offset_k(addr);
+		if (pgd_none(*pgd)) {
+			next = (addr + PAGE_SIZE) & PAGE_MASK;
+			continue;
+		}
+		get_page_bootmem(section_nr, pgd_page(*pgd), MIX_SECTION_INFO);
+
+		pud = pud_offset(pgd, addr);
+		if (pud_none(*pud)) {
+			next = (addr + PAGE_SIZE) & PAGE_MASK;
+			continue;
+		}
+		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
+
+		if (!cpu_has_pse) {
+			next = (addr + PAGE_SIZE) & PAGE_MASK;
+			pmd = pmd_offset(pud, addr);
+			if (pmd_none(*pmd))
+				continue;
+			get_page_bootmem(section_nr, pmd_page(*pmd),
+					 MIX_SECTION_INFO);
+
+			pte = pte_offset_kernel(pmd, addr);
+			if (pte_none(*pte
[RFC PATCH v3 11/13] memory-hotplug : free memmap of sparse-vmemmap
I don't think that all pages of the virtual mapping for removed memory
can be freed, since pages of type MIX_SECTION_INFO are difficult to
free. So, as a first step, the patch only frees pages of type
SECTION_INFO.

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 arch/x86/mm/init_64.c |   91 ++
 include/linux/mm.h    |    2 +
 mm/memory_hotplug.c   |    5 ++
 mm/sparse.c           |    5 +-
 4 files changed, 101 insertions(+), 2 deletions(-)

Index: linux-3.5-rc4/include/linux/mm.h
===================================================================
--- linux-3.5-rc4.orig/include/linux/mm.h	2012-07-03 14:22:18.530011567 +0900
+++ linux-3.5-rc4/include/linux/mm.h	2012-07-03 14:22:20.83872 +0900
@@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
 void vmemmap_populate_print_last(void);
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
 				  unsigned long size);
+void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages);
+void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages);

 enum mf_flags {
 	MF_COUNT_INCREASED = 1 << 0,
Index: linux-3.5-rc4/mm/sparse.c
===================================================================
--- linux-3.5-rc4.orig/mm/sparse.c	2012-07-03 14:21:45.071429805 +0900
+++ linux-3.5-rc4/mm/sparse.c	2012-07-03 14:22:21.000983767 +0900
@@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
 	/* This will make the necessary allocations eventually. */
 	return sparse_mem_map_populate(pnum, nid);
 }
-static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
+static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
 {
-	return; /* XXX: Not implemented yet */
+	vmemmap_kfree(page, nr_pages);
 }
 static void free_map_bootmem(struct page *page, unsigned long nr_pages)
 {
+	vmemmap_free_bootmem(page, nr_pages);
 }
 #else
 static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
Index: linux-3.5-rc4/arch/x86/mm/init_64.c
===================================================================
--- linux-3.5-rc4.orig/arch/x86/mm/init_64.c	2012-07-03 14:22:18.538011465 +0900
+++ linux-3.5-rc4/arch/x86/mm/init_64.c	2012-07-03 14:22:21.007983103 +0900
@@ -978,6 +978,97 @@ vmemmap_populate(struct page *start_page
 	return 0;
 }

+unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
+				      struct page **pp)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	unsigned long next;
+
+	*pp = NULL;
+
+	pgd = pgd_offset_k(addr);
+	if (pgd_none(*pgd))
+		return (addr + PAGE_SIZE) & PAGE_MASK;
+
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud))
+		return (addr + PAGE_SIZE) & PAGE_MASK;
+
+	if (!cpu_has_pse) {
+		next = (addr + PAGE_SIZE) & PAGE_MASK;
+		pmd = pmd_offset(pud, addr);
+		if (pmd_none(*pmd))
+			return next;
+
+		pte = pte_offset_kernel(pmd, addr);
+		if (pte_none(*pte))
+			return next;
+
+		*pp = pte_page(*pte);
+		pte_clear(&init_mm, addr, pte);
+	} else {
+		next = pmd_addr_end(addr, end);
+
+		pmd = pmd_offset(pud, addr);
+		if (pmd_none(*pmd))
+			return next;
+
+		*pp = pmd_page(*pmd);
+		pmd_clear(pmd);
+	}
+
+	return next;
+}
+
+void __meminit
+vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+	unsigned long addr = (unsigned long)memmap;
+	unsigned long end = (unsigned long)(memmap + nr_pages);
+	unsigned long next;
+	unsigned int order;
+	struct page *page;
+
+	for (; addr < end; addr = next) {
+		page = NULL;
+		next = find_and_clear_pte_page(addr, end, &page);
+		if (!page)
+			continue;
+
+		if (is_vmalloc_addr(page_address(page)))
+			vfree(page_address(page));
+		else {
+			order = next - addr;
+			free_pages((unsigned long)page_address(page),
+				   get_order(order));
+		}
+	}
+}
+
+void __meminit
+vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+	unsigned long addr
[RFC PATCH v3 12/13] memory-hotplug : add node_device_release
When unregister_node() is called, the following message is shown from
device_release():

  Device 'node2' does not have a release() function, it is broken and
  must be fixed.

So the patch implements node_device_release().

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 drivers/base/node.c |    7 +++
 1 file changed, 7 insertions(+)

Index: linux-3.5-rc4/drivers/base/node.c
===================================================================
--- linux-3.5-rc4.orig/drivers/base/node.c	2012-07-03 14:21:44.882432167 +0900
+++ linux-3.5-rc4/drivers/base/node.c	2012-07-03 14:22:23.296951921 +0900
@@ -252,6 +252,12 @@ static inline void hugetlb_register_node
 static inline void hugetlb_unregister_node(struct node *node) {}
 #endif

+static void node_device_release(struct device *dev)
+{
+	struct node *node_dev = to_node(dev);
+
+	memset(node_dev, 0, sizeof(struct node));
+}

 /*
  * register_node - Setup a sysfs device for a node.
@@ -265,6 +271,7 @@ int register_node(struct node *node, int

 	node->dev.id = num;
 	node->dev.bus = &node_subsys;
+	node->dev.release = node_device_release;
 	error = device_register(&node->dev);

 	if (!error){
[RFC PATCH v3 13/13] memory-hotplug : remove sysfs file of node
The patch adds calls to node_set_offline() and unregister_one_node() to
remove_memory() so that the sysfs files of the node are removed.

CC: David Rientjes <rient...@google.com>
CC: Jiang Liu <liu...@gmail.com>
CC: Len Brown <len.br...@intel.com>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
CC: Paul Mackerras <pau...@samba.org>
CC: Christoph Lameter <c...@linux.com>
Cc: Minchan Kim <minchan@gmail.com>
CC: Andrew Morton <a...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
CC: Wen Congyang <we...@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

---
 mm/memory_hotplug.c |    5 +
 1 file changed, 5 insertions(+)

Index: linux-3.5-rc4/mm/memory_hotplug.c
===================================================================
--- linux-3.5-rc4.orig/mm/memory_hotplug.c	2012-07-03 14:22:21.012982694 +0900
+++ linux-3.5-rc4/mm/memory_hotplug.c	2012-07-03 14:22:25.405925554 +0900
@@ -702,6 +702,11 @@ int remove_memory(int nid, u64 start, u6
 	/* remove memmap entry */
 	firmware_map_remove(start, start + size - 1, "System RAM");

+	if (!node_present_pages(nid)) {
+		node_set_offline(nid);
+		unregister_one_node(nid);
+	}
+
 	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT);
 	unlock_memory_hotplug();
 	return 0;