Re: [PATCH v3 0/6] mm: Further memory block device cleanups

2019-06-21 Thread Andrew Morton
On Fri, 21 Jun 2019 20:24:59 +0200 David Hildenbrand  wrote:

> @Qian Cai, unfortunately I can't reproduce.
> 
> If you get the chance, it would be great if you could retry with
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 972c5336bebf..742f99ddd148 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -868,6 +868,9 @@ int walk_memory_blocks(unsigned long start, unsigned long size,
>  	unsigned long block_id;
>  	int ret = 0;
>  
> +	if (!size)
> +		return;
> +
>  	for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
>  		mem = find_memory_block_by_id(block_id);
>  		if (!mem)
> 
> 
> 
> If both start and size are 0, we would get a very long loop. This
> would mean that we have an online node that does not span any pages at
> all (pgdat->node_start_pfn = 0, start_pfn + pgdat->node_spanned_pages = 0).

I think I'll make that a `return 0' and I won't drop patches 4-6 for
now, as we appear to have this fixed.



From: David Hildenbrand 
Subject: drivers-base-memoryc-get-rid-of-find_memory_block_hinted-v3-fix

handle zero-length walks

Link: http://lkml.kernel.org/r/1c2edc22-afd7-2211-c4c7-40e54e500...@redhat.com
Reported-by: Qian Cai 
Tested-by: Qian Cai 
Signed-off-by: Andrew Morton 
---

 drivers/base/memory.c |3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/base/memory.c~drivers-base-memoryc-get-rid-of-find_memory_block_hinted-v3-fix
+++ a/drivers/base/memory.c
@@ -866,6 +866,9 @@ int walk_memory_blocks(unsigned long sta
 	unsigned long block_id;
 	int ret = 0;
 
+	if (!size)
+		return 0;
+
 	for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
 		mem = find_memory_block_by_id(block_id);
 		if (!mem)




Re: [PATCH v3 0/6] mm: Further memory block device cleanups

2019-06-21 Thread Qian Cai
On Fri, 2019-06-21 at 20:24 +0200, David Hildenbrand wrote:

Re: [PATCH v3 0/6] mm: Further memory block device cleanups

2019-06-21 Thread David Hildenbrand
On 21.06.19 21:07, Qian Cai wrote:

Re: [PATCH v3 0/6] mm: Further memory block device cleanups

2019-06-21 Thread Qian Cai
On Fri, 2019-06-21 at 20:56 +0200, David Hildenbrand wrote:

Re: [PATCH v3 0/6] mm: Further memory block device cleanups

2019-06-21 Thread David Hildenbrand
On 21.06.19 20:24, David Hildenbrand wrote:

Re: [PATCH v3 0/6] mm: Further memory block device cleanups

2019-06-21 Thread David Hildenbrand
On 21.06.19 17:15, Qian Cai wrote:

Re: [PATCH v3 0/6] mm: Further memory block device cleanups

2019-06-21 Thread Qian Cai
On Thu, 2019-06-20 at 20:31 +0200, David Hildenbrand wrote:
> @Andrew: Only patch 1, 4 and 6 changed compared to v1.
> 
> Some further cleanups around memory block devices. In particular, clean up
> and simplify walk_memory_range(), along with some other minor cleanups.
> 
> Compiled + tested on x86 with DIMMs under QEMU. Compile-tested on ppc64.
> 
> v2 -> v3:
> - "mm/memory_hotplug: Rename walk_memory_range() and pass start+size .."
> -- Avoid warning on ppc.
> - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
> -- Fixup a comment regarding hinted devices.
> 
> v1 -> v2:
> - "mm: Section numbers use the type "unsigned long""
> -- "unsigned long i" -> "unsigned long nr", in one case -> "int i"
> - "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
> -- Fix compilation error
> -- Get rid of the "hint" parameter completely
> 
> David Hildenbrand (6):
>   mm: Section numbers use the type "unsigned long"
>   drivers/base/memory: Use "unsigned long" for block ids
>   mm: Make register_mem_sect_under_node() static
>   mm/memory_hotplug: Rename walk_memory_range() and pass start+size
> instead of pfns
>   mm/memory_hotplug: Move and simplify walk_memory_blocks()
>   drivers/base/memory.c: Get rid of find_memory_block_hinted()
> 
>  arch/powerpc/platforms/powernv/memtrace.c |  23 ++---
>  drivers/acpi/acpi_memhotplug.c            |  19 +---
>  drivers/base/memory.c                     | 120 +-
>  drivers/base/node.c                       |   8 +-
>  include/linux/memory.h                    |   5 +-
>  include/linux/memory_hotplug.h            |   2 -
>  include/linux/mmzone.h                    |   4 +-
>  include/linux/node.h                      |   7 --
>  mm/memory_hotplug.c                       |  57 +-
>  mm/sparse.c                               |  12 +--
>  10 files changed, 106 insertions(+), 151 deletions(-)
> 

This series causes a few machines to fail to boot, triggering endless soft
lockups. Reverting these commits fixed the issue.

97f4217d1da0 Revert "mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfns"
c608eebf33c6 Revert "mm-memory_hotplug-rename-walk_memory_range-and-pass-startsize-instead-of-pfns-fix"
34b5e4ab7558 Revert "mm/memory_hotplug: move and simplify walk_memory_blocks()"
59a9f3eec5d1 Revert "drivers/base/memory.c: Get rid of find_memory_block_hinted()"
5cfcd52288b6 Revert "drivers-base-memoryc-get-rid-of-find_memory_block_hinted-v3"

[4.582081][T1] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[4.590405][T1] ACPI: bus type PCI registered
[4.592908][T1] PCI: MMCONFIG for domain  [bus 00-ff] at [mem 0x8000-0x8fff] (base 0x8000)
[4.601860][T1] PCI: MMCONFIG at [mem 0x8000-0x8fff] reserved in E820
[4.601860][T1] PCI: Using configuration type 1 for base access
[   28.661336][   C16] watchdog: BUG: soft lockup - CPU#16 stuck for 22s! [swapper/0:1]
[   28.671351][   C16] Modules linked in:
[   28.671354][   C16] CPU: 16 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5-next-20190621+ #1
[   28.681366][   C16] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018
[   28.691334][   C16] RIP: 0010:_raw_spin_unlock_irqrestore+0x2f/0x40
[   28.701334][   C16] Code: 55 48 89 e5 41 54 49 89 f4 be 01 00 00 00 53 48 8b 55 08 48 89 fb 48 8d 7f 18 e8 4c 89 7d ff 48 89 df e8 94 f9 7d ff 41 54 9d <65> ff 0d c2 44 8d 48 5b 41 5c 5d c3 0f 1f 44 00 00 0f 1f 44 00 00
[   28.711354][   C16] RSP: 0018:888205b27bf8 EFLAGS: 0246 ORIG_RAX: ff13
[   28.721372][   C16] RAX:  RBX: 8882053d6138 RCX: b6f2a3b8
[   28.731371][   C16] RDX: 111040a7ac27 RSI: dc00 RDI: 8882053d6138
[   28.741371][   C16] RBP: 888205b27c08 R08: ed1040a7ac28 R09: ed1040a7ac27
[   28.751334][   C16] R10: ed1040a7ac27 R11: 8882053d613b R12: 0246
[   28.751370][   C16] R13: 888205b27c98 R14: 8884504d0a20 R15:
[   28.761368][   C16] FS:  () GS:88845450() knlGS:
[   28.771373][   C16] CS:  0010 DS:  ES:  CR0: 80050033
[   28.781334][   C16] CR2:  CR3: 0007c9012000 CR4: 001406a0
[   28.791333][   C16] Call Trace:
[   28.791374][   C16]  klist_next+0xd8/0x1c0
[   28.791374][   C16]  subsys_find_device_by_id+0x13b/0x1f0
[   28.801334][   C16]  ? bus_find_device_by_name+0x20/0x20
[   28.801370][   C16]  ? kobject_put+0x23/0x250
[   28.811333][   C16]  walk_memory_blocks+0x6c/0xb8
[   28.811353][   C16]  ? write_policy_show+0x40/0x40
[   28.821334][   C16]  link_mem_sections+0x7e/0xa0
[   28.821369][   C16]  ? unregister_memory_block_under_nodes+0x210/0x210
[   28.831353][   C16]  ? __register_one_node+0x3bd/0x600
[   28.831353][   C16]  topology_init+0xbf/0x126
[   28.841364][   C16]  ? enable_cpu0_hotplug+0x1a/0x1a
[   28.841368][   C16]