Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-16 Thread Mike Rapoport
On Tue, Mar 16, 2021 at 10:08:10AM +0100, David Hildenbrand wrote:
> On 16.03.21 09:58, Liang, Liang (Leo) wrote:
> > [AMD Public Use]
> > 
> > Hi David,
> > 
> > root@scbu-Chachani:~# cat /proc/mtrr
> > reg00: base=0x0 (0MB), size= 2048MB, count=1: write-back
> > reg01: base=0x0ffe0 ( 4094MB), size=2MB, count=1: write-protect
> > reg02: base=0x1 ( 4096MB), size=   16MB, count=1: write-protect
> 
> ^ there it is
> 
> https://wiki.osdev.org/MTRR
> 
> "Reads allocate cache lines on a cache miss. All writes update main memory.
> 
> Cache lines are not allocated on a write miss. Write hits invalidate the
> cache line and update main memory. "
> 
> AFAIU, writes completely bypass the caches and go directly to main memory. If
> there are cache lines from a previous read, they are invalidated. So I think
> a read(addr), write(addr), read(addr), ... pattern will be especially slow,
> which is what we have in the kstream benchmark.
> 
> 
> The question is:
> 
> who sets this up without owning the memory?
> Is the memory actually special/slow or is that setting wrong?

I really doubt that 16M at 0x1 in a system with 8G RAM would
*physically* differ from the neighbouring memory.

> Buggy firmware/BIOS?
> Buggy device driver?

[0.27] MTRR default type: uncachable
[0.28] MTRR fixed ranges enabled:
[0.30]   0-9 write-back
[0.31]   A-B uncachable
[0.32]   C-F write-through
[0.33] MTRR variable ranges enabled:
[0.34]   0 base  mask 8000 write-back
[0.36]   1 base FFE0 mask FFE0 write-protect
[0.37]   2 base 0001 mask FF00 write-protect

As we have the range at 0x1 write-protected reported that early in
boot I'd say it's BIOS.

The question is how to reliably detect that this is a bogus setting...

[0.38]   3 base FFDE mask FFFE write-protect
[0.39]   4 base FF00 mask FFF8 write-protect
[0.40]   5 disabled
[0.41]   6 disabled
[0.42]   7 disabled
[0.42] TOM2: 00028000 aka 10240M


-- 
Sincerely yours,
Mike.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-16 Thread David Hildenbrand

On 16.03.21 09:43, Liang, Liang (Leo) wrote:

> [AMD Public Use]
> 
> Hi David,
> 
> Thanks for your explanation. We saw the slow boot issue on our farm/QA
> machines and on mine. All of the machines use the same SoC/board.


I cannot spot anything really special in the logs -- it's just ordinary 
system ram -- except:


[0.27] MTRR fixed ranges enabled:
[0.28]   0-9 write-back
[0.29]   A-B uncachable
[0.30]   C-F write-through
[0.31] MTRR variable ranges enabled:
[0.32]   0 base  mask 8000 write-back
[0.34]   1 base FFE0 mask FFE0 write-protect
[0.35]   2 base 0001 mask FF00 write-protect
[0.36]   3 base FFDE mask FFFE write-protect
[0.38]   4 base FF00 mask FFF8 write-protect
[0.39]   5 disabled
[0.39]   6 disabled
[0.40]   7 disabled

Not sure if "2 base 0001" indicates something nasty. Not sure 
how to interpret the masks.
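
For readers wondering how a variable-range MTRR base/mask pair maps to an
address range, here is a minimal sketch. It assumes the usual x86 rule that an
address matches when (addr & mask) == (base & mask), and that the mask is
contiguous. The base, mask and 40-bit physical address width below are
illustrative values chosen to describe a 16 MiB range at 4096 MiB; they are not
copied from the log above, whose hex digits are truncated in this archive.

```python
def mtrr_range(base: int, mask: int, phys_bits: int = 40) -> tuple[int, int]:
    """Return (start, size) in bytes for a variable-range MTRR.

    Assumes a contiguous mask: all ones above the range size, zeros below.
    """
    full = (1 << phys_bits) - 1          # all valid physical address bits
    size = ((~mask) & full) + 1          # zero bits in the mask span the range
    if size & (size - 1):
        raise ValueError("mask is not contiguous")
    start = base & mask                  # range is naturally aligned to its size
    return start, size


def mtrr_covers(addr: int, base: int, mask: int) -> bool:
    """True if addr falls inside the MTRR described by base/mask."""
    return (addr & mask) == (base & mask)


# Hypothetical values: 16 MiB at the 4 GiB boundary, 40 physical address bits.
BASE = 0x1_0000_0000
MASK = 0xFF_FF00_0000
start, size = mtrr_range(BASE, MASK)
print(f"start={start:#x} size={size >> 20} MiB")
```

The more low-order zero bits the mask has, the larger the range; the base only
selects where that naturally aligned block sits.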


Can you provide the output of "cat /proc/mtrr" ?

--
Thanks,

David / dhildenb



Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-16 Thread David Hildenbrand

On 16.03.21 09:58, Liang, Liang (Leo) wrote:

> [AMD Public Use]
> 
> Hi David,
> 
> root@scbu-Chachani:~# cat /proc/mtrr
> reg00: base=0x0 (0MB), size= 2048MB, count=1: write-back
> reg01: base=0x0ffe0 ( 4094MB), size=2MB, count=1: write-protect
> reg02: base=0x1 ( 4096MB), size=   16MB, count=1: write-protect


^ there it is

https://wiki.osdev.org/MTRR

"Reads allocate cache lines on a cache miss. All writes update main memory.

Cache lines are not allocated on a write miss. Write hits invalidate the 
cache line and update main memory. "


AFAIU, writes completely bypass the caches and go directly to main
memory. If there are cache lines from a previous read, they are
invalidated. So I think a read(addr), write(addr), read(addr), ...
pattern will be especially slow, which is what we have in the kstream
benchmark.
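
To illustrate why that access pattern hurts, here is a toy model, not a
description of real hardware. It counts main-memory accesses for a single cache
line under a simplified write-back policy and under the write-protect behaviour
quoted above (read misses allocate, every write goes to memory and invalidates
the line). The function name and the counting model are assumptions made purely
for illustration.

```python
def memory_accesses(pattern: str, policy: str) -> int:
    """Count main-memory accesses for one cache line.

    pattern: string of 'r' (read) and 'w' (write) operations on the same address.
    policy:  'write-back' or 'write-protect' (simplified models).
    """
    cached = False
    accesses = 0
    for op in pattern:
        if op == "r":
            if not cached:
                accesses += 1      # read miss: fetch the line and allocate it
                cached = True
        elif op == "w":
            if policy == "write-back":
                if not cached:
                    accesses += 1  # write miss: allocate the line (write-allocate)
                    cached = True
                # write hit: stays in cache, no memory access in this model
            elif policy == "write-protect":
                accesses += 1      # every write updates main memory
                cached = False     # a write hit invalidates the line
            else:
                raise ValueError(f"unknown policy: {policy}")
        else:
            raise ValueError(f"unknown operation: {op}")
    return accesses


pattern = "rwr" * 1000             # read(addr), write(addr), read(addr), ...
print("write-back:   ", memory_accesses(pattern, "write-back"))
print("write-protect:", memory_accesses(pattern, "write-protect"))
```

In this model the write-back case touches memory once, while the write-protect
case goes to memory on every write and on the read that follows it.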



The question is:

who sets this up without owning the memory?
Is the memory actually special/slow or is that setting wrong?
Buggy firmware/BIOS?
Buggy device driver?



> reg03: base=0x0ffde ( 4093MB), size=  128KB, count=1: write-protect
> reg04: base=0x0ff00 ( 4080MB), size=  512KB, count=1: write-protect



--
Thanks,

David / dhildenb



Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-16 Thread David Hildenbrand

On 16.03.21 12:02, Liang, Liang (Leo) wrote:

> [AMD Public Use]
> 
> Hi David and Mike,
> 
> It's a BIOS bug, now fixed by a new BIOS. Thank you so much! Cheers!
> 
> [0.34] MTRR variable ranges enabled:
> [0.35]   0 base  mask 8000 write-back
> [0.37]   1 base FFE0 mask FFE0 write-protect
> [0.39]   2 base FFDE mask FFFE write-protect
> [0.40]   3 base FF00 mask FFF8 write-protect
> [0.41]   4 disabled
> [0.42]   5 disabled
> [0.43]   6 disabled
> [0.44]   7 disabled
> [0.45] TOM2: 00028000 aka 10240M
> 
> root@scbu-Chachani:/home/scbu# cat /proc/mtrr
> reg00: base=0x0 (0MB), size= 2048MB, count=1: write-back
> reg01: base=0x0ffe0 ( 4094MB), size=2MB, count=1: write-protect
> reg02: base=0x0ffde ( 4093MB), size=  128KB, count=1: write-protect
> reg03: base=0x0ff00 ( 4080MB), size=  512KB, count=1: write-protect


Great :)

(another latent BUG found with 7fef431be9c9 :) )

--
Thanks,

David / dhildenb



Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-16 Thread David Hildenbrand

On 16.03.21 09:00, Liang, Liang (Leo) wrote:

> [AMD Public Use]
> 
> Hi Mike,
> 
> Thanks for the help. The patch works for me and boot time is back to normal.
> So is it a fix, or just a workaround?


Hi Leo,

excluding up to 16 MiB of memory on every system just because that 
single platform is weird is not acceptable.


I think we have to figure out

a) why that memory is so special. This is weird.
b) why the platform doesn't indicate it in a special way. Why is it 
ordinary system RAM but still *that* slow?

c) how we can reliably identify such memory and exclude it.

I'll have a peek at the memory layout of that machine from boot logs 
next to figure out if we can answer any of these questions.


Just to verify: this does happen on multiple machines, not just a single 
one? (i.e., we're not dealing with faulty RAM)


--
Thanks,

David / dhildenb



RE: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-16 Thread Liang, Liang (Leo)
[AMD Public Use]

Hi David and Mike,

It's a BIOS bug, now fixed by a new BIOS. Thank you so much! Cheers!

[0.34] MTRR variable ranges enabled:
[0.35]   0 base  mask 8000 write-back
[0.37]   1 base FFE0 mask FFE0 write-protect
[0.39]   2 base FFDE mask FFFE write-protect
[0.40]   3 base FF00 mask FFF8 write-protect
[0.41]   4 disabled
[0.42]   5 disabled
[0.43]   6 disabled
[0.44]   7 disabled
[0.45] TOM2: 00028000 aka 10240M

root@scbu-Chachani:/home/scbu# cat /proc/mtrr
reg00: base=0x0 (0MB), size= 2048MB, count=1: write-back
reg01: base=0x0ffe0 ( 4094MB), size=2MB, count=1: write-protect
reg02: base=0x0ffde ( 4093MB), size=  128KB, count=1: write-protect
reg03: base=0x0ff00 ( 4080MB), size=  512KB, count=1: write-protect

BRs,
Leo
-Original Message-
From: Mike Rapoport  
Sent: Tuesday, March 16, 2021 6:30 PM
To: David Hildenbrand 
Cc: Liang, Liang (Leo) ; Deucher, Alexander 
; linux-ker...@vger.kernel.org; amd-gfx list 
; Andrew Morton ; 
Huang, Ray ; Koenig, Christian ; 
Rafael J. Wysocki ; George Kennedy 

Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail 
in __free_pages_core()")

On Tue, Mar 16, 2021 at 10:08:10AM +0100, David Hildenbrand wrote:
> On 16.03.21 09:58, Liang, Liang (Leo) wrote:
> > [AMD Public Use]
> > 
> > Hi David,
> > 
> > root@scbu-Chachani:~# cat /proc/mtrr
> > reg00: base=0x0 (0MB), size= 2048MB, count=1: write-back
> > reg01: base=0x0ffe0 ( 4094MB), size=2MB, count=1: write-protect
> > reg02: base=0x1 ( 4096MB), size=   16MB, count=1: write-protect
> 
> ^ there it is
> 
> https://wiki.osdev.org/MTRR
> 
> "Reads allocate cache lines on a cache miss. All writes update main memory.
> 
> Cache lines are not allocated on a write miss. Write hits invalidate 
> the cache line and update main memory. "
> 
> AFAIU, writes completely bypass the caches and go directly to main
> memory. If there are cache lines from a previous read, they are
> invalidated. So I think a read(addr), write(addr), read(addr), ...
> pattern will be especially slow, which is what we have in the kstream
> benchmark.
> 
> 
> The question is:
> 
> who sets this up without owning the memory?
> Is the memory actually special/slow or is that setting wrong?

I really doubt that 16M at 0x1 in a system with 8G RAM would
*physically* differ from the neighbouring memory.

> Buggy firmware/BIOS?
> Buggy device driver?

[0.27] MTRR default type: uncachable
[0.28] MTRR fixed ranges enabled:
[0.30]   0-9 write-back
[0.31]   A-B uncachable
[0.32]   C-F write-through
[0.33] MTRR variable ranges enabled:
[0.34]   0 base  mask 8000 write-back
[0.36]   1 base FFE0 mask FFE0 write-protect
[0.37]   2 base 0001 mask FF00 write-protect

As we have the range at 0x1 write-protected reported that early in boot 
I'd say it's BIOS.

The question is how to reliably detect that this is a bogus setting...

[0.38]   3 base FFDE mask FFFE write-protect
[0.39]   4 base FF00 mask FFF8 write-protect
[0.40]   5 disabled
[0.41]   6 disabled
[0.42]   7 disabled
[0.42] TOM2: 00028000 aka 10240M


--
Sincerely yours,
Mike.


RE: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-16 Thread Liang, Liang (Leo)
[AMD Public Use]

Hi David,

root@scbu-Chachani:~# cat /proc/mtrr
reg00: base=0x0 (0MB), size= 2048MB, count=1: write-back
reg01: base=0x0ffe0 ( 4094MB), size=2MB, count=1: write-protect
reg02: base=0x1 ( 4096MB), size=   16MB, count=1: write-protect
reg03: base=0x0ffde ( 4093MB), size=  128KB, count=1: write-protect
reg04: base=0x0ff00 ( 4080MB), size=  512KB, count=1: write-protect
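
For anyone wanting to check other machines, here is a small sketch that parses
/proc/mtrr-style text and lists the ranges that are not write-back. It relies
only on the MB figures and the type string, in the layout shown above. The
regular expression, the helper names and the 4096 MB cut-off are assumptions of
this sketch, and whether a given non-write-back range is actually bogus still
needs human judgement.

```python
import re

# Matches lines such as:
#   reg02: base=0x1 ( 4096MB), size=   16MB, count=1: write-protect
LINE_RE = re.compile(
    r"reg(?P<reg>\d+): base=0x[0-9a-fA-F]+ \(\s*(?P<base>\d+)MB\), "
    r"size=\s*(?P<size>\d+)(?P<unit>[KM]B), count=\d+: (?P<type>[\w-]+)"
)
UNIT_KB = {"KB": 1, "MB": 1024}


def parse_mtrr(text):
    """Return a list of dicts: reg, base_mb, size_kb, type."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            entries.append({
                "reg": int(m["reg"]),
                "base_mb": int(m["base"]),
                "size_kb": int(m["size"]) * UNIT_KB[m["unit"]],
                "type": m["type"],
            })
    return entries


def non_write_back(entries, min_base_mb=0):
    """Entries that are not write-back and start at or above min_base_mb."""
    return [e for e in entries
            if e["type"] != "write-back" and e["base_mb"] >= min_base_mb]


# Sample text copied from the output above (hex digits as they appear here).
SAMPLE = """\
reg00: base=0x0 (0MB), size= 2048MB, count=1: write-back
reg01: base=0x0ffe0 ( 4094MB), size=2MB, count=1: write-protect
reg02: base=0x1 ( 4096MB), size=   16MB, count=1: write-protect
reg03: base=0x0ffde ( 4093MB), size=  128KB, count=1: write-protect
reg04: base=0x0ff00 ( 4080MB), size=  512KB, count=1: write-protect
"""

for e in non_write_back(parse_mtrr(SAMPLE), min_base_mb=4096):
    print(e)
```

On a live system the same functions can be fed the contents of /proc/mtrr
instead of SAMPLE.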

BRs,
Leo
-Original Message-
From: David Hildenbrand  
Sent: Tuesday, March 16, 2021 4:54 PM
To: Liang, Liang (Leo) ; Mike Rapoport 
Cc: Deucher, Alexander ; 
linux-ker...@vger.kernel.org; amd-gfx list ; 
Andrew Morton ; Huang, Ray ; 
Koenig, Christian ; Rafael J. Wysocki 
; George Kennedy 
Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail 
in __free_pages_core()")

On 16.03.21 09:43, Liang, Liang (Leo) wrote:
> [AMD Public Use]
> 
> Hi David,
> 
> Thanks for your explanation. We saw the slow boot issue on our farm/QA
> machines and on mine. All of the machines use the same SoC/board.

I cannot spot anything really special in the logs -- it's just ordinary system 
ram -- except:

[0.27] MTRR fixed ranges enabled:
[0.28]   0-9 write-back
[0.29]   A-B uncachable
[0.30]   C-F write-through
[0.31] MTRR variable ranges enabled:
[0.32]   0 base  mask 8000 write-back
[0.34]   1 base FFE0 mask FFE0 write-protect
[0.35]   2 base 0001 mask FF00 write-protect
[0.36]   3 base FFDE mask FFFE write-protect
[0.38]   4 base FF00 mask FFF8 write-protect
[0.39]   5 disabled
[0.39]   6 disabled
[0.40]   7 disabled

Not sure if "2 base 0001" indicates something nasty. Not sure how to 
interpret the masks.

Can you provide the output of "cat /proc/mtrr" ?

--
Thanks,

David / dhildenb


RE: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-16 Thread Liang, Liang (Leo)
[AMD Public Use]

Hi David,

Thanks for your explanation. We saw the slow boot issue on our farm/QA
machines and on mine. All of the machines use the same SoC/board.

BRs,
Leo
-Original Message-
From: David Hildenbrand  
Sent: Tuesday, March 16, 2021 4:38 PM
To: Liang, Liang (Leo) ; Mike Rapoport 
Cc: Deucher, Alexander ; 
linux-ker...@vger.kernel.org; amd-gfx list ; 
Andrew Morton ; Huang, Ray ; 
Koenig, Christian ; Rafael J. Wysocki 
; George Kennedy 
Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail 
in __free_pages_core()")

On 16.03.21 09:00, Liang, Liang (Leo) wrote:
> [AMD Public Use]
> 
> Hi Mike,
> 
> Thanks for the help. The patch works for me and boot time is back to normal.
> So is it a fix, or just a workaround?

Hi Leo,

excluding up to 16 MiB of memory on every system just because that single 
platform is weird is not acceptable.

I think we have to figure out

a) why that memory is so special. This is weird.
b) why the platform doesn't indicate it in a special way. Why is it ordinary 
system RAM but still *that* slow?
c) how we can reliably identify such memory and exclude it.

I'll have a peek at the memory layout of that machine from boot logs next to 
figure out if we can answer any of these questions.

Just to verify: this does happen on multiple machines, not just a single one? 
(i.e., we're not dealing with faulty RAM)

--
Thanks,

David / dhildenb


Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-16 Thread Mike Rapoport
Hi Leo,

On Tue, Mar 16, 2021 at 12:36:29AM +, Liang, Liang (Leo) wrote:
> 
> Hi David,
> 
> Sorry for the late reply. With 7fef431be9c9 reverted, the dmesg is attached.
> The anomaly looks like this:
> [  +0.027833] [0x7800 - 0x783f] 20925 MB/s / 25405 MB/s
> [  +1.363596] [0x0001 - 0x0001003f] 222 MB/s / 222 MB/s
> [  +1.562192] [0x00010040 - 0x0001007f] 222 MB/s / 222 MB/s
> [  +1.881332] [0x00010080 - 0x000100bf] 195 MB/s / 159 MB/s
> [  +1.383388] [0x000100c0 - 0x000100ff] 219 MB/s / 221 MB/s
> [  +0.029342] [0x00010100 - 0x0001013f] 19807 MB/s / 24125 MB/s
> 
> What is the problem here? Do you want to check the acpi tables?

It seems the first 16M at 0x0001 are two orders of magnitude slower than
the rest of the memory, as if there were a different memory device there.
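
As a quick check of the "two orders of magnitude" figure, this sketch compares
the first column of the kstream numbers quoted above. The values are copied
from Leo's output; the variable names are only for illustration.

```python
import math

# First-column bandwidth (MB/s) from the quoted kstream output.
normal_chunks = [20925, 19807]          # chunks just before and after the slow range
slow_chunks = [222, 222, 195, 219]      # the four chunks inside the slow 16M

normal = sum(normal_chunks) / len(normal_chunks)
slow = sum(slow_chunks) / len(slow_chunks)
ratio = normal / slow

print(f"normal ~{normal:.0f} MB/s, slow ~{slow:.1f} MB/s")
print(f"ratio ~{ratio:.0f}x, i.e. about 10^{math.log10(ratio):.1f}")
```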

This would explain why with 7fef431be9c9 everything gets slower as we
allocate the first (and probably quite critical) data from those 16M.

No idea how this could be related to ACPI and why ACPI initialization
causes the huge slowdown on its own.

Can you please try booting with 7fef431be9c9 still applied and with this
patch (not even compile tested):

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d883176ef2ce..780f11ca14c9 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -778,6 +778,7 @@ void __init setup_arch(char **cmdline_p)
 * L1TF its contents can be leaked to user processes.
 */
memblock_reserve(0, PAGE_SIZE);
+   memblock_reserve(0x0001, SZ_16M);
 
early_reserve_initrd();
 
 
> BRs,
> Leo
> -Original Message-
> From: David Hildenbrand  
> Sent: Monday, March 15, 2021 9:04 PM
> To: Mike Rapoport 
> Cc: Liang, Liang (Leo) ; Deucher, Alexander 
> ; linux-ker...@vger.kernel.org; amd-gfx list 
> ; Andrew Morton ; 
> Huang, Ray ; Koenig, Christian ; 
> Rafael J. Wysocki ; George Kennedy 
> 
> Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail 
> in __free_pages_core()")
> 
> On 13.03.21 14:48, Mike Rapoport wrote:
> > Hi,
> > 
> > On Sat, Mar 13, 2021 at 10:05:23AM +0100, David Hildenbrand wrote:
> >>> On 13.03.2021 at 05:04, Liang, Liang (Leo) wrote:
> >>>
> >>> Hi David,
> >>>
> >>> Which benchmark tool do you prefer? Memtest86+ or something else?
> >>
> >> Hi Leo,
> >>
> >> I think you want something that runs under Linux natively.
> >>
> >> I'm planning on coding up a kernel module to walk all 4MB pages in 
> >> the freelists and perform a stream benchmark individually. Then we 
> >> might be able to identify the problematic range - if there is a 
> >> problematic range :)
> > 
> > My wild guess would be that the pages that are now at the head of free 
> > lists have wrong caching enabled. Might be worth checking in your test 
> > module.
> 
> I hacked something up real quick:
> 
> https://github.com/davidhildenbrand/kstream
> 
> Only briefly tested inside a VM. The output looks something like
> 
> [...]
> [ 8396.432225] [0x4580 - 0x45bf] 25322 MB/s / 38948 MB/s
> [ 8396.448749] [0x45c0 - 0x45ff] 24481 MB/s / 38946 MB/s
> [ 8396.465197] [0x4600 - 0x463f] 24892 MB/s / 39170 MB/s
> [ 8396.481552] [0x4640 - 0x467f] 25222 MB/s / 39156 MB/s
> [ 8396.498012] [0x4680 - 0x46bf] 24416 MB/s / 39159 MB/s
> [ 8396.514397] [0x46c0 - 0x46ff] 25469 MB/s / 38940 MB/s
> [ 8396.530849] [0x4700 - 0x473f] 24885 MB/s / 38734 MB/s
> [ 8396.547195] [0x4740 - 0x477f] 25458 MB/s / 38941 MB/s
> [...]
> 
> The benchmark allocates one 4 MiB chunk at a time and runs a simplified 
> STREAM benchmark a) without flushing caches b) flushing caches before every 
> memory access.
> 
> It would be great if you could run that with the *old behavior* kernel (IOW, 
> without 7fef431be9c9), so we might still be lucky to catch the problematic 
> area in the freelist.
> 
> Let's see if that will indicate anything.
> 
> --
> Thanks,
> 
> David / dhildenb



-- 
Sincerely yours,
Mike.


RE: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-16 Thread Liang, Liang (Leo)
[AMD Public Use]

Hi Mike,

Thanks for the help. The patch works for me and boot time is back to normal.
So is it a fix, or just a workaround?

BRs,
Leo
-Original Message-
From: Mike Rapoport  
Sent: Tuesday, March 16, 2021 2:50 PM
To: Liang, Liang (Leo) 
Cc: David Hildenbrand ; Deucher, Alexander 
; linux-ker...@vger.kernel.org; amd-gfx list 
; Andrew Morton ; 
Huang, Ray ; Koenig, Christian ; 
Rafael J. Wysocki ; George Kennedy 

Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail 
in __free_pages_core()")

Hi Leo,

On Tue, Mar 16, 2021 at 12:36:29AM +, Liang, Liang (Leo) wrote:
> 
> Hi David,
> 
> Sorry for the late reply. With 7fef431be9c9 reverted, the dmesg is attached.
> The anomaly looks like this:
> [  +0.027833] [0x7800 - 0x783f] 20925 MB/s / 25405 MB/s
> [  +1.363596] [0x0001 - 0x0001003f] 222 MB/s / 222 MB/s
> [  +1.562192] [0x00010040 - 0x0001007f] 222 MB/s / 222 MB/s
> [  +1.881332] [0x00010080 - 0x000100bf] 195 MB/s / 159 MB/s
> [  +1.383388] [0x000100c0 - 0x000100ff] 219 MB/s / 221 MB/s
> [  +0.029342] [0x00010100 - 0x0001013f] 19807 MB/s / 24125 MB/s
> 
> What is the problem here? Do you want to check the acpi tables?

It seems the first 16M at 0x0001 are two orders of magnitude slower than
the rest of the memory, as if there were a different memory device there.

This would explain why with 7fef431be9c9 everything gets slower as we allocate 
the first (and probably quite critical) data from those 16M.

No idea how this could be related to ACPI and why ACPI initialization causes 
the huge slowdown on its own.

Can you please try booting with 7fef431be9c9 still applied and with this patch 
(not even compile tested):

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d883176ef2ce..780f11ca14c9 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -778,6 +778,7 @@ void __init setup_arch(char **cmdline_p)
 * L1TF its contents can be leaked to user processes.
 */
memblock_reserve(0, PAGE_SIZE);
+   memblock_reserve(0x0001, SZ_16M);
 
early_reserve_initrd();
 
 
> BRs,
> Leo
> -Original Message-
> From: David Hildenbrand 
> Sent: Monday, March 15, 2021 9:04 PM
> To: Mike Rapoport 
> Cc: Liang, Liang (Leo) ; Deucher, Alexander 
> ; linux-ker...@vger.kernel.org; amd-gfx 
> list ; Andrew Morton 
> ; Huang, Ray ; Koenig, 
> Christian ; Rafael J. Wysocki 
> ; George Kennedy 
> Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages 
> to tail in __free_pages_core()")
> 
> On 13.03.21 14:48, Mike Rapoport wrote:
> > Hi,
> > 
> > On Sat, Mar 13, 2021 at 10:05:23AM +0100, David Hildenbrand wrote:
> >>> On 13.03.2021 at 05:04, Liang, Liang (Leo) wrote:
> >>>
> >>> Hi David,
> >>>
> >>> Which benchmark tool do you prefer? Memtest86+ or something else?
> >>
> >> Hi Leo,
> >>
> >> I think you want something that runs under Linux natively.
> >>
> >> I'm planning on coding up a kernel module to walk all 4MB pages in 
> >> the freelists and perform a stream benchmark individually. Then we 
> >> might be able to identify the problematic range - if there is a 
> >> problematic range :)
> > 
> > My wild guess would be that the pages that are now at the head of 
> > free lists have wrong caching enabled. Might be worth checking in 
> > your test module.
> 
> I hacked something up real quick:
> 
> https://github.com/davidhildenbrand/kstream
> 
> Only briefly tested inside a VM. The output looks something like
> 
> [...]
> [ 8396.432225] [0x4580 - 0x45bf] 25322 MB/s / 38948 MB/s
> [ 8396.448749] [0x45c0 - 0x45ff] 24481 MB/s / 38946 MB/s
> [ 8396.465197] [0x4600 - 0x463f] 24892 MB/s / 39170 MB/s
> [ 8396.481552] [0x4640 - 0x467f] 25222 MB/s / 39156 MB/s
> [ 8396.498012] [0x4680 - 0x46bf] 24416 MB/s / 39159 MB/s
> [ 8396.514397] [0x46c0 - 0x46ff] 25469 MB/s / 38940 MB/s
> [ 8396.530849] [0x4700 - 0x473f] 24885 MB/s / 38734 MB/s
> [ 8396.547195] [0x4740 - 0x477f] 25458 MB/s / 38941 MB/s
> [...]
> 
> The benchmark allocates one 4 MiB chunk at a time and runs a simplified 
> STREAM benchmark a) without flushing caches b) flushing caches before every 
> memory access.
> 
> It would be great if you could run that with the *old behavior* kernel (IOW, 
> without 7fef431be9c9), so we might 

Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-15 Thread David Hildenbrand

On 13.03.21 14:48, Mike Rapoport wrote:

> Hi,
> 
> On Sat, Mar 13, 2021 at 10:05:23AM +0100, David Hildenbrand wrote:
> >>> On 13.03.2021 at 05:04, Liang, Liang (Leo) wrote:
> >>>
> >>> Hi David,
> >>>
> >>> Which benchmark tool do you prefer? Memtest86+ or something else?
> >>
> >> Hi Leo,
> >>
> >> I think you want something that runs under Linux natively.
> >>
> >> I'm planning on coding up a kernel module to walk all 4MB pages in the
> >> freelists and perform a stream benchmark individually. Then we might be
> >> able to identify the problematic range - if there is a problematic range :)
> > 
> > My wild guess would be that the pages that are now at the head of free
> > lists have wrong caching enabled. Might be worth checking in your test
> > module.


I hacked something up real quick:

https://github.com/davidhildenbrand/kstream

Only briefly tested inside a VM. The output looks something like

[...]
[ 8396.432225] [0x4580 - 0x45bf] 25322 MB/s / 38948 MB/s
[ 8396.448749] [0x45c0 - 0x45ff] 24481 MB/s / 38946 MB/s
[ 8396.465197] [0x4600 - 0x463f] 24892 MB/s / 39170 MB/s
[ 8396.481552] [0x4640 - 0x467f] 25222 MB/s / 39156 MB/s
[ 8396.498012] [0x4680 - 0x46bf] 24416 MB/s / 39159 MB/s
[ 8396.514397] [0x46c0 - 0x46ff] 25469 MB/s / 38940 MB/s
[ 8396.530849] [0x4700 - 0x473f] 24885 MB/s / 38734 MB/s
[ 8396.547195] [0x4740 - 0x477f] 25458 MB/s / 38941 MB/s

[...]

The benchmark allocates one 4 MiB chunk at a time and runs a simplified 
STREAM benchmark a) without flushing caches b) flushing caches before 
every memory access.
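
For a rough feel of what such a per-chunk measurement looks like, here is a
user-space analogue. It is not kstream: it cannot choose physical pages or
flush caches, and interpreter overhead affects the result, so the absolute
numbers mean little. It only shows the shape of the loop, namely copy a 4 MiB
buffer repeatedly and report MB/s. All names are hypothetical.

```python
import time

CHUNK = 4 * 1024 * 1024  # 4 MiB, matching the chunk size mentioned above


def copy_bandwidth_mb_s(chunk_size: int = CHUNK, repeats: int = 50) -> float:
    """Copy a buffer `repeats` times and return the observed rate in MB/s."""
    src = bytearray(chunk_size)
    dst = bytearray(chunk_size)
    start = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src                    # one pass of a STREAM-style "copy" kernel
    elapsed = time.perf_counter() - start
    return (chunk_size * repeats) / (1024 * 1024) / elapsed


if __name__ == "__main__":
    print(f"{copy_bandwidth_mb_s():.0f} MB/s")
```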


It would be great if you could run that with the *old behavior* kernel 
(IOW, without 7fef431be9c9), so we might still be lucky to catch the 
problematic area in the freelist.


Let's see if that will indicate anything.

--
Thanks,

David / dhildenb



Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-15 Thread Mike Rapoport
Hi,

On Sat, Mar 13, 2021 at 10:05:23AM +0100, David Hildenbrand wrote:
> > On 13.03.2021 at 05:04, Liang, Liang (Leo) wrote:
> > 
> > Hi David,
> > 
> > Which benchmark tool do you prefer? Memtest86+ or something else?
> 
> Hi Leo,
> 
> I think you want something that runs under Linux natively.
> 
> I'm planning on coding up a kernel module to walk all 4MB pages in the
> freelists and perform a stream benchmark individually. Then we might be
> able to identify the problematic range - if there is a problematic range :)

My wild guess would be that the pages that are now at the head of free
lists have wrong caching enabled. Might be worth checking in your test
module.

> Guess I'll have it running by Monday and let you know.
> 
> Cheers!

-- 
Sincerely yours,
Mike.


Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-13 Thread David Hildenbrand

> On 13.03.2021 at 05:04, Liang, Liang (Leo) wrote:
> 
> [AMD Public Use]
> 
> Hi David,
> 
> Which benchmark tool do you prefer? Memtest86+ or something else?

Hi Leo,

I think you want something that runs under Linux natively.

I'm planning on coding up a kernel module to walk all 4MB pages in the
freelists and perform a stream benchmark individually. Then we might be able to
identify the problematic range - if there is a problematic range :) Guess I'll
have it running by Monday and let you know.

Cheers!

> 
> BRs,
> Leo
> -Original Message-
> From: David Hildenbrand  
> Sent: Saturday, March 13, 2021 12:47 AM
> To: Liang, Liang (Leo) ; Deucher, Alexander 
> ; linux-ker...@vger.kernel.org; amd-gfx list 
> ; Andrew Morton 
> Cc: Huang, Ray ; Koenig, Christian 
> ; Mike Rapoport ; Rafael J. 
> Wysocki ; George Kennedy 
> Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail 
> in __free_pages_core()")
> 
>> On 12.03.21 17:19, Liang, Liang (Leo) wrote:
>> [AMD Public Use]
>> 
>> Dmesg attached.
>> 
> 
> 
> So, looks like the "real" slowdown starts once the buddy is up and running 
> (no surprise).
> 
> 
> [0.044035] Memory: 6856724K/7200304K available (14345K kernel code, 9699K 
> rwdata, 5276K rodata, 2628K init, 12104K bss, 343324K reserved, 0K 
> cma-reserved)
> [0.044045] random: get_random_u64 called from 
> __kmem_cache_create+0x33/0x460 with crng_init=1
> [0.049025] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=16, Nodes=1
> [0.050036] ftrace: allocating 47158 entries in 185 pages
> [0.097487] ftrace: allocated 185 pages with 5 groups
> [0.109210] rcu: Hierarchical RCU implementation.
> 
> vs.
> 
> [0.041115] Memory: 6869396K/7200304K available (14345K kernel code, 3433K 
> rwdata, 5284K rodata, 2624K init, 6088K bss, 330652K reserved, 0K 
> cma-reserved)
> [0.041127] random: get_random_u64 called from 
> __kmem_cache_create+0x31/0x430 with crng_init=1
> [0.041309] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=16, Nodes=1
> [0.041335] ftrace: allocating 47184 entries in 185 pages
> [0.055719] ftrace: allocated 185 pages with 5 groups
> [0.055863] rcu: Hierarchical RCU implementation.
> 
> 
> And it gets especially bad during ACPI table processing:
> 
> [4.158303] ACPI: Added _OSI(Module Device)
> [4.158767] ACPI: Added _OSI(Processor Device)
> [4.159230] ACPI: Added _OSI(3.0 _SCP Extensions)
> [4.159705] ACPI: Added _OSI(Processor Aggregator Device)
> [4.160551] ACPI: Added _OSI(Linux-Dell-Video)
> [4.161359] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
> [4.162264] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
> [   17.713421] ACPI: 13 ACPI AML tables successfully acquired and loaded
> [   18.716065] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
> [   20.743828] ACPI: EC: EC started
> [   20.744155] ACPI: EC: interrupt blocked
> [   20.945956] ACPI: EC: EC_CMD/EC_SC=0x666, EC_DATA=0x662
> [   20.946618] ACPI: \_SB_.PCI0.LPC0.EC0_: Boot DSDT EC used to handle 
> transactions
> [   20.947348] ACPI: Interpreter enabled
> [   20.951278] ACPI: (supports S0 S3 S4 S5)
> [   20.951632] ACPI: Using IOAPIC for interrupt routing
> 
> vs.
> 
> [0.216039] ACPI: Added _OSI(Module Device)
> [0.216041] ACPI: Added _OSI(Processor Device)
> [0.216043] ACPI: Added _OSI(3.0 _SCP Extensions)
> [0.216044] ACPI: Added _OSI(Processor Aggregator Device)
> [0.216046] ACPI: Added _OSI(Linux-Dell-Video)
> [0.216048] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
> [0.216049] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
> [0.228259] ACPI: 13 ACPI AML tables successfully acquired and loaded
> [0.229527] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
> [0.231663] ACPI: EC: EC started
> [0.231666] ACPI: EC: interrupt blocked
> [0.233664] ACPI: EC: EC_CMD/EC_SC=0x666, EC_DATA=0x662
> [0.233667] ACPI: \_SB_.PCI0.LPC0.EC0_: Boot DSDT EC used to handle 
> transactions
> [0.233670] ACPI: Interpreter enabled
> [0.233685] ACPI: (supports S0 S3 S4 S5)
> [0.233687] ACPI: Using IOAPIC for interrupt routing
> 
> The jump from 4.1 -> 17.7 is especially bad.
> 
> Which might in fact indicate that this could be related to using some very 
> special slow (ACPI?) memory for ordinary purposes, interfering with actual 
> ACPI users?
> 
> But again, just a wild guess, because the system is extremely slow 
> afterwards, however, we don't have any pauses without any signs of life for 
> that long.
> 
> 
> It would be interesting to run a simple memory bandwidth benchmark on the 
> fast kernel with differing sizes up to running OOM to see if there is really 
> some memory that is just horribly slow once allocated and used.
> 
> --
> Thanks,
> 
> David / dhildenb
> 



RE: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-12 Thread Liang, Liang (Leo)
[AMD Public Use]

Hi David,

Which benchmark tool do you prefer? Memtest86+ or something else?

BRs,
Leo
-Original Message-
From: David Hildenbrand  
Sent: Saturday, March 13, 2021 12:47 AM
To: Liang, Liang (Leo) ; Deucher, Alexander 
; linux-ker...@vger.kernel.org; amd-gfx list 
; Andrew Morton 
Cc: Huang, Ray ; Koenig, Christian 
; Mike Rapoport ; Rafael J. 
Wysocki ; George Kennedy 
Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail 
in __free_pages_core()")

On 12.03.21 17:19, Liang, Liang (Leo) wrote:
> [AMD Public Use]
> 
> Dmesg attached.
> 


So, looks like the "real" slowdown starts once the buddy is up and running (no 
surprise).


[0.044035] Memory: 6856724K/7200304K available (14345K kernel code, 9699K rwdata, 5276K rodata, 2628K init, 12104K bss, 343324K reserved, 0K cma-reserved)
[0.044045] random: get_random_u64 called from __kmem_cache_create+0x33/0x460 with crng_init=1
[0.049025] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=16, Nodes=1
[0.050036] ftrace: allocating 47158 entries in 185 pages
[0.097487] ftrace: allocated 185 pages with 5 groups
[0.109210] rcu: Hierarchical RCU implementation.

vs.

[0.041115] Memory: 6869396K/7200304K available (14345K kernel code, 3433K rwdata, 5284K rodata, 2624K init, 6088K bss, 330652K reserved, 0K cma-reserved)
[0.041127] random: get_random_u64 called from __kmem_cache_create+0x31/0x430 with crng_init=1
[0.041309] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=16, Nodes=1
[0.041335] ftrace: allocating 47184 entries in 185 pages
[0.055719] ftrace: allocated 185 pages with 5 groups
[0.055863] rcu: Hierarchical RCU implementation.


And it gets especially bad during ACPI table processing:

[4.158303] ACPI: Added _OSI(Module Device)
[4.158767] ACPI: Added _OSI(Processor Device)
[4.159230] ACPI: Added _OSI(3.0 _SCP Extensions)
[4.159705] ACPI: Added _OSI(Processor Aggregator Device)
[4.160551] ACPI: Added _OSI(Linux-Dell-Video)
[4.161359] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[4.162264] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[   17.713421] ACPI: 13 ACPI AML tables successfully acquired and loaded
[   18.716065] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
[   20.743828] ACPI: EC: EC started
[   20.744155] ACPI: EC: interrupt blocked
[   20.945956] ACPI: EC: EC_CMD/EC_SC=0x666, EC_DATA=0x662
[   20.946618] ACPI: \_SB_.PCI0.LPC0.EC0_: Boot DSDT EC used to handle transactions
[   20.947348] ACPI: Interpreter enabled
[   20.951278] ACPI: (supports S0 S3 S4 S5)
[   20.951632] ACPI: Using IOAPIC for interrupt routing

vs.

[0.216039] ACPI: Added _OSI(Module Device)
[0.216041] ACPI: Added _OSI(Processor Device)
[0.216043] ACPI: Added _OSI(3.0 _SCP Extensions)
[0.216044] ACPI: Added _OSI(Processor Aggregator Device)
[0.216046] ACPI: Added _OSI(Linux-Dell-Video)
[0.216048] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[0.216049] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[0.228259] ACPI: 13 ACPI AML tables successfully acquired and loaded
[0.229527] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
[0.231663] ACPI: EC: EC started
[0.231666] ACPI: EC: interrupt blocked
[0.233664] ACPI: EC: EC_CMD/EC_SC=0x666, EC_DATA=0x662
[0.233667] ACPI: \_SB_.PCI0.LPC0.EC0_: Boot DSDT EC used to handle transactions
[0.233670] ACPI: Interpreter enabled
[0.233685] ACPI: (supports S0 S3 S4 S5)
[0.233687] ACPI: Using IOAPIC for interrupt routing

The jump from 4.1 -> 17.7 is especially bad.

Which might in fact indicate that this could be related to using some very 
special slow (ACPI?) memory for ordinary purposes, interfering with actual ACPI 
users?

But again, just a wild guess, because the system is extremely slow afterwards, 
however, we don't have any pauses without any signs of life for that long.


It would be interesting to run a simple memory bandwidth benchmark on the fast 
kernel with differing sizes up to running OOM to see if there is really some 
memory that is just horribly slow once allocated and used.

--
Thanks,

David / dhildenb


Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-12 Thread David Hildenbrand

On 12.03.21 17:19, Liang, Liang (Leo) wrote:

[AMD Public Use]

Dmesg attached.




So, looks like the "real" slowdown starts once the buddy is up and running (no 
surprise).


[0.044035] Memory: 6856724K/7200304K available (14345K kernel code, 9699K rwdata, 5276K rodata, 2628K init, 12104K bss, 343324K reserved, 0K cma-reserved)
[0.044045] random: get_random_u64 called from __kmem_cache_create+0x33/0x460 with crng_init=1
[0.049025] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=16, Nodes=1
[0.050036] ftrace: allocating 47158 entries in 185 pages
[0.097487] ftrace: allocated 185 pages with 5 groups
[0.109210] rcu: Hierarchical RCU implementation.

vs.

[0.041115] Memory: 6869396K/7200304K available (14345K kernel code, 3433K rwdata, 5284K rodata, 2624K init, 6088K bss, 330652K reserved, 0K cma-reserved)
[0.041127] random: get_random_u64 called from __kmem_cache_create+0x31/0x430 with crng_init=1
[0.041309] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=16, Nodes=1
[0.041335] ftrace: allocating 47184 entries in 185 pages
[0.055719] ftrace: allocated 185 pages with 5 groups
[0.055863] rcu: Hierarchical RCU implementation.


And it gets especially bad during ACPI table processing:

[4.158303] ACPI: Added _OSI(Module Device)
[4.158767] ACPI: Added _OSI(Processor Device)
[4.159230] ACPI: Added _OSI(3.0 _SCP Extensions)
[4.159705] ACPI: Added _OSI(Processor Aggregator Device)
[4.160551] ACPI: Added _OSI(Linux-Dell-Video)
[4.161359] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[4.162264] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[   17.713421] ACPI: 13 ACPI AML tables successfully acquired and loaded
[   18.716065] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
[   20.743828] ACPI: EC: EC started
[   20.744155] ACPI: EC: interrupt blocked
[   20.945956] ACPI: EC: EC_CMD/EC_SC=0x666, EC_DATA=0x662
[   20.946618] ACPI: \_SB_.PCI0.LPC0.EC0_: Boot DSDT EC used to handle transactions
[   20.947348] ACPI: Interpreter enabled
[   20.951278] ACPI: (supports S0 S3 S4 S5)
[   20.951632] ACPI: Using IOAPIC for interrupt routing

vs.

[0.216039] ACPI: Added _OSI(Module Device)
[0.216041] ACPI: Added _OSI(Processor Device)
[0.216043] ACPI: Added _OSI(3.0 _SCP Extensions)
[0.216044] ACPI: Added _OSI(Processor Aggregator Device)
[0.216046] ACPI: Added _OSI(Linux-Dell-Video)
[0.216048] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[0.216049] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[0.228259] ACPI: 13 ACPI AML tables successfully acquired and loaded
[0.229527] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
[0.231663] ACPI: EC: EC started
[0.231666] ACPI: EC: interrupt blocked
[0.233664] ACPI: EC: EC_CMD/EC_SC=0x666, EC_DATA=0x662
[0.233667] ACPI: \_SB_.PCI0.LPC0.EC0_: Boot DSDT EC used to handle transactions
[0.233670] ACPI: Interpreter enabled
[0.233685] ACPI: (supports S0 S3 S4 S5)
[0.233687] ACPI: Using IOAPIC for interrupt routing

The jump from 4.1 -> 17.7 is especially bad.
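
As a rough illustration of how jumps like this can be located, the following sketch (not part of the original thread) parses dmesg-style timestamps and lists the largest gaps between consecutive lines. The helper names `parse_dmesg` and `largest_gaps` are hypothetical, and the sample input is only the slow-boot excerpt quoted above.

```python
import re

# Matches a leading "[   17.713421]" style timestamp followed by the message.
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def parse_dmesg(text):
    """Return (timestamp_seconds, message) pairs for lines that carry a timestamp."""
    entries = []
    for line in text.splitlines():
        m = TS_RE.match(line.strip())
        if m:
            entries.append((float(m.group(1)), m.group(2)))
    return entries


def largest_gaps(entries, top=3):
    """Return the `top` largest (gap_seconds, previous_message, next_message) triples."""
    gaps = [
        (cur[0] - prev[0], prev[1], cur[1])
        for prev, cur in zip(entries, entries[1:])
    ]
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]


# Excerpt of the slow boot log from this thread (illustrative input only).
SLOW_BOOT = """\
[4.162264] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[   17.713421] ACPI: 13 ACPI AML tables successfully acquired and loaded
[   18.716065] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
[   20.743828] ACPI: EC: EC started
"""

if __name__ == "__main__":
    for gap, before, after in largest_gaps(parse_dmesg(SLOW_BOOT)):
        print(f"{gap:10.6f}s  {before!r} -> {after!r}")
```

Running the same function over the fast and the slow log would make it easy to see which boot phases differ the most.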

Which might in fact indicate that this could be related to using
some very special slow (ACPI?) memory for ordinary purposes,
interfering with actual ACPI users?

But again, just a wild guess, because the system is extremely slow
afterwards, however, we don't have any pauses without any signs of
life for that long.


It would be interesting to run a simple memory bandwidth benchmark
on the fast kernel with differing sizes up to running OOM to see if
there is really some memory that is just horribly slow once allocated and
used.
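
A minimal sketch of such a sweep, assuming a plain userspace copy test is good enough as a first indicator (this is not a tool named in the thread). The names `copy_bandwidth` and `sweep` are hypothetical. Userspace cannot choose which physical pages back a buffer, so a slow region would only show up as a drop at some sizes, and how an out-of-memory condition manifests depends on the system's overcommit settings.

```python
import time


def copy_bandwidth(size_bytes, repeats=3):
    """Copy one buffer into another and return the best observed rate in MiB/s."""
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        dst[:] = src  # same-length slice assignment, a bulk memory copy
        best = min(best, time.perf_counter() - t0)
    # Guard against a zero reading from a very short copy.
    return size_bytes / max(best, 1e-9) / 2**20


def sweep(start=1 << 20, max_bytes=1 << 26):
    """Double the buffer size from `start` up to `max_bytes`, recording MiB/s."""
    results = []
    size = start
    while size <= max_bytes:
        try:
            results.append((size, copy_bandwidth(size)))
        except MemoryError:
            # Allocation failed; stop instead of pushing further.
            break
        size *= 2
    return results


if __name__ == "__main__":
    for size, rate in sweep():
        print(f"{size >> 20:6d} MiB  {rate:10.1f} MiB/s")
```

Raising `max_bytes` towards the amount of installed RAM approximates the "up to running OOM" part; that should be done with care on a machine that matters.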

--
Thanks,

David / dhildenb



RE: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-12 Thread Deucher, Alexander
[AMD Public Use]

> -----Original Message-----
> From: David Hildenbrand 
> Sent: Friday, March 12, 2021 10:48 AM
> To: Deucher, Alexander ; linux-
> ker...@vger.kernel.org; amd-gfx list ;
> Andrew Morton ; Liang, Liang (Leo)
> 
> Cc: Huang, Ray ; Koenig, Christian
> ; Mike Rapoport ;
> Rafael J. Wysocki ; George Kennedy
> 
> Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to
> tail in __free_pages_core()")
> 
> > 8G (with some carve out for the integrated GPU).
> > [0.044181] Memory: 6858688K/7200304K available (14345K kernel code, 9659K rwdata, 4980K rodata, 2484K init, 12292K bss, 341360K reserved, 0K cma-reserved)
> >
> > Nothing particularly special about these systems that I am aware of.  I'll
> > see if we can repro this issue on any other platforms, but so far, not one
> > has noticed any problems.
> >
> >>
> >> Increasing the boot time from a few seconds to 2-3 minutes does not
> >> smell like some corner case cache effects we might be hitting in this
> >> particular instance - there have been minor reports that it either
> >> slightly increased or slightly decreased initial system performance,
> >> but that was about it.
> >>
> >> Either, yet another latent BUG (but why? why should memory access
> >> suddenly be that slow? I could only guess that we are now making
> >> sooner use of very slow memory), or there is really something else
> >> weird going on.
> >
> > Looks like pretty much everything is slower based on the timestamps in
> > the dmesg output.  There is a big jump here:
> 
> If we're really dealing with some specific slow memory regions and that
> memory gets allocated for something that gets used regularly, then we might
> get a general slowdown. Hard to identify, though :)
> 
> >
> >> [3.758596] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
> >> [3.759372] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
> >> [   16.177983] ACPI: 13 ACPI AML tables successfully acquired and loaded
> >> [   17.099316] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
> >> [   18.969959] ACPI: EC: EC started
> >
> > And here:
> >
> >> [   36.566608] PCI: CLS 64 bytes, default 64
> >> [   36.575383] Trying to unpack rootfs image as initramfs...
> >> [   44.594348] Initramfs unpacking failed: Decoding failed
> >> [   44.765141] Freeing initrd memory: 46348K
> >
> > Also seeing soft lockups:
> >> [  124.588634] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [swapper/1:0]
> 
> Yes, I noticed that -- there is a heavy slowdown somewhere.
> 
> As that patch is v5.10 already (and we're close to v5.12) I assume something
> is particularly weird about the platform you are running on - because this is
> the first time I see a report like that.

Well, this platform is not yet widely available outside of AMD so it's not 
likely to have been seen by anyone else, but there is nothing special about it 
compared to any other AMD platforms beyond that that I am aware of.

Alex



Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-12 Thread David Hildenbrand

8G (with some carve out for the integrated GPU).
[0.044181] Memory: 6858688K/7200304K available (14345K kernel code, 9659K rwdata, 4980K rodata, 2484K init, 12292K bss, 341360K reserved, 0K cma-reserved)

Nothing particularly special about these systems that I am aware of.  I'll see 
if we can repro this issue on any other platforms, but so far, not one has 
noticed any problems.



Increasing the boot time from a few seconds to 2-3 minutes does not smell
like some corner case cache effects we might be hitting in this particular
instance - there have been minor reports that it either slightly increased or
slightly decreased initial system performance, but that was about it.

Either, yet another latent BUG (but why? why should memory access
suddenly be that slow? I could only guess that we are now making sooner
use of very slow memory), or there is really something else weird going on.


Looks like pretty much everything is slower based on the timestamps in the 
dmesg output.  There is a big jump here:


If we're really dealing with some specific slow memory regions and that 
memory gets allocated for something that gets used regularly, then we 
might get a general slowdown. Hard to identify, though :)





[3.758596] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[3.759372] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[   16.177983] ACPI: 13 ACPI AML tables successfully acquired and loaded
[   17.099316] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
[   18.969959] ACPI: EC: EC started


And here:


[   36.566608] PCI: CLS 64 bytes, default 64
[   36.575383] Trying to unpack rootfs image as initramfs...
[   44.594348] Initramfs unpacking failed: Decoding failed
[   44.765141] Freeing initrd memory: 46348K


Also seeing soft lockups:

[  124.588634] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [swapper/1:0]


Yes, I noticed that -- there is a heavy slowdown somewhere.

As that patch is v5.10 already (and we're close to v5.12) I assume 
something is particularly weird about the platform you are running on - 
because this is the first time I see a report like that.


--
Thanks,

David / dhildenb



Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-12 Thread David Hildenbrand

On 12.03.21 15:06, Deucher, Alexander wrote:

[AMD Public Use]


-----Original Message-----
From: David Hildenbrand 
Sent: Thursday, March 11, 2021 10:03 AM
To: Deucher, Alexander ; linux-
ker...@vger.kernel.org; amd-gfx list ;
Andrew Morton 
Cc: Huang, Ray ; Koenig, Christian
; Liang, Liang (Leo) ;
Mike Rapoport ; Rafael J. Wysocki
; George Kennedy 
Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to
tail in __free_pages_core()")

On 11.03.21 15:41, Deucher, Alexander wrote:

[AMD Public Use]

Booting kernels on certain AMD platforms takes 2-3 minutes with the patch
in the subject.  Reverting it restores quick boot times (few seconds).  Any
ideas?




Hi,

We just discovered latent BUGs in ACPI code whereby ACPI tables are
exposed to the page allocator as ordinary, free system RAM. With the
patch you mention, the order in which pages get allocated from the page
allocator is changed - which makes the BUG trigger more easily.

I could imagine that someone allocates and uses that memory on your
platform, and I could imagine that such accesses are very slow.

I cannot tell if that is the root cause, but at least it would make sense.

See
https://lore.kernel.org/patchwork/patch/1389314/

You might want to give that patch a try (not sure if it's the latest
version). CCing George


Thanks for the patch.  Unfortunately it didn't help.  Any other ideas?  Is 
there a newer version of that patch?



@George?

It's interesting that this only applies to these special AMD systems so  
far. Is there anything particular about these systems? How much memory  
do these systems have?


Increasing the boot time from a few seconds to 2-3 minutes does not  
smell like some corner case cache effects we might be hitting in this  
particular instance - there have been minor reports that it either  
slightly increased or slightly decreased initial system performance, but
that was about it.


Either, yet another latent BUG (but why? why should memory access  
suddenly be that slow? I could only guess that we are now making sooner  
use of very slow memory), or there is really something else weird going on.


Cheers!


Alex



--
Thanks,

David / dhildenb



RE: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-12 Thread Deucher, Alexander
[AMD Public Use]

> -----Original Message-----
> From: David Hildenbrand 
> Sent: Friday, March 12, 2021 9:12 AM
> To: Deucher, Alexander ; linux-
> ker...@vger.kernel.org; amd-gfx list ;
> Andrew Morton 
> Cc: Huang, Ray ; Koenig, Christian
> ; Liang, Liang (Leo) ;
> Mike Rapoport ; Rafael J. Wysocki
> ; George Kennedy 
> Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to
> tail in __free_pages_core()")
> 
> On 12.03.21 15:06, Deucher, Alexander wrote:
> > [AMD Public Use]
> >
> >> -----Original Message-----
> >> From: David Hildenbrand 
> >> Sent: Thursday, March 11, 2021 10:03 AM
> >> To: Deucher, Alexander ; linux-
> >> ker...@vger.kernel.org; amd-gfx list ;
> >> Andrew Morton 
> >> Cc: Huang, Ray ; Koenig, Christian
> >> ; Liang, Liang (Leo)
> ;
> >> Mike Rapoport ; Rafael J. Wysocki
> >> ; George Kennedy 
> >> Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages
> >> to tail in __free_pages_core()")
> >>
> >> On 11.03.21 15:41, Deucher, Alexander wrote:
> >>> [AMD Public Use]
> >>>
> >>> Booting kernels on certain AMD platforms takes 2-3 minutes with the
> >>> patch in the subject.  Reverting it restores quick boot times (few
> >>> seconds).  Any ideas?
> >>>
> >>
> >> Hi,
> >>
> >> We just discovered latent BUGs in ACPI code whereby ACPI tables are
> >> exposed to the page allocator as ordinary, free system RAM. With the
> >> patch you mention, the order in which pages get allocated from the
> >> page allocator is changed - which makes the BUG trigger more easily.
> >>
> >> I could imagine that someone allocates and uses that memory on your
> >> platform, and I could imagine that such accesses are very slow.
> >>
> >> I cannot tell if that is the root cause, but at least it would make sense.
> >>
> >> See
> >> https://lore.kernel.org/patchwork/patch/1389314/
> >>
> >> You might want to give that patch a try (not sure if it's the latest
> >> version). CCing George
> >
> > Thanks for the patch.  Unfortunately it didn't help.  Any other ideas?  Is
> > there a newer version of that patch?
> >
> 
> @George?
> 
> It's interesting that this only applies to these special AMD systems so far.
> Is there anything particular about these systems? How much memory do these
> systems have?

8G (with some carve out for the integrated GPU).
[0.044181] Memory: 6858688K/7200304K available (14345K kernel code, 9659K rwdata, 4980K rodata, 2484K init, 12292K bss, 341360K reserved, 0K cma-reserved)

Nothing particularly special about these systems that I am aware of.  I'll see 
if we can repro this issue on any other platforms, but so far, not one has 
noticed any problems.

> 
> Increasing the boot time from a few seconds to 2-3 minutes does not smell
> like some corner case cache effects we might be hitting in this particular
> instance - there have been minor reports that it either slightly increased or
> slightly decreased initial system performance, but that was about it.
> 
> Either, yet another latent BUG (but why? why should memory access
> suddenly be that slow? I could only guess that we are now making sooner
> use of very slow memory), or there is really something else weird going on.

Looks like pretty much everything is slower based on the timestamps in the 
dmesg output.  There is a big jump here:

> [3.758596] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
> [3.759372] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
> [   16.177983] ACPI: 13 ACPI AML tables successfully acquired and loaded
> [   17.099316] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
> [   18.969959] ACPI: EC: EC started

And here:

> [   36.566608] PCI: CLS 64 bytes, default 64
> [   36.575383] Trying to unpack rootfs image as initramfs...
> [   44.594348] Initramfs unpacking failed: Decoding failed
> [   44.765141] Freeing initrd memory: 46348K

Also seeing soft lockups:
> [  124.588634] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [swapper/1:0]

@Liang, Liang (Leo) can you attach the dmesg outputs with 7fef431be9c9 reverted 
and without?

Alex

> 
> Cheers!
> 
> > Alex
> 
> 
> --
> Thanks,
> 
> David / dhildenb


RE: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-12 Thread Deucher, Alexander
[AMD Public Use]

> -----Original Message-----
> From: David Hildenbrand 
> Sent: Thursday, March 11, 2021 10:03 AM
> To: Deucher, Alexander ; linux-
> ker...@vger.kernel.org; amd-gfx list ;
> Andrew Morton 
> Cc: Huang, Ray ; Koenig, Christian
> ; Liang, Liang (Leo) ;
> Mike Rapoport ; Rafael J. Wysocki
> ; George Kennedy 
> Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to
> tail in __free_pages_core()")
> 
> On 11.03.21 15:41, Deucher, Alexander wrote:
> > [AMD Public Use]
> >
> > Booting kernels on certain AMD platforms takes 2-3 minutes with the patch
> > in the subject.  Reverting it restores quick boot times (few seconds).  Any
> > ideas?
> >
> 
> Hi,
> 
> We just discovered latent BUGs in ACPI code whereby ACPI tables are
> exposed to the page allocator as ordinary, free system RAM. With the
> patch you mention, the order in which pages get allocated from the page
> allocator is changed - which makes the BUG trigger more easily.
> 
> I could imagine that someone allocates and uses that memory on your
> platform, and I could imagine that such accesses are very slow.
> 
> I cannot tell if that is the root cause, but at least it would make sense.
> 
> See
> https://lore.kernel.org/patchwork/patch/1389314/
> 
> You might want to give that patch a try (not sure if it's the latest
> version). CCing George

Thanks for the patch.  Unfortunately it didn't help.  Any other ideas?  Is 
there a newer version of that patch?

Alex

> 
> Thanks
> 
> > Thanks,
> >
> > Alex
> >
> > [0.00] Linux version 5.11.0-7490c004ae7e (jenkins@24dbd4b4380b)
> (gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, GNU ld (GNU Binutils for Ubuntu)
> 2.30) #20210308 SMP Sun Mar 7 20:04:05 UTC 2021
> > [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-5.11.0-
> 7490c004ae7e root=UUID=459758f3-5106-4173-b9bc-cf9d528828ec ro
> resume=UUID=23390f67-bbaf-42c1-b31d-64ef7288e39e amd_iommu=off
> nokaslr
> > [0.00] KERNEL supported cpus:
> > [0.00]   Intel GenuineIntel
> > [0.00]   AMD AuthenticAMD
> > [0.00]   Hygon HygonGenuine
> > [0.00]   Centaur CentaurHauls
> > [0.00]   zhaoxin   Shanghai
> > [0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point
> registers'
> > [0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
> > [0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
> > [0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
> > [0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 
> > bytes,
> using 'compacted' format.
> > [0.00] BIOS-provided physical RAM map:
> > [0.00] BIOS-e820: [mem 0x-0x0009efff]
> usable
> > [0.00] BIOS-e820: [mem 0x0009f000-0x000b]
> reserved
> > [0.00] BIOS-e820: [mem 0x0010-0x09af]
> usable
> > [0.00] BIOS-e820: [mem 0x09b0-0x09df]
> reserved
> > [0.00] BIOS-e820: [mem 0x09e0-0x09ef]
> usable
> > [0.00] BIOS-e820: [mem 0x09f0-0x09f10fff]
> ACPI NVS
> > [0.00] BIOS-e820: [mem 0x09f11000-0x6c56efff]
> usable
> > [0.00] BIOS-e820: [mem 0x6c56f000-0x6c56]
> reserved
> > [0.00] BIOS-e820: [mem 0x6c57-0x7877efff]
> usable
> > [0.00] BIOS-e820: [mem 0x7877f000-0x7af7efff]
> reserved
> > [0.00] BIOS-e820: [mem 0x7af7f000-0x7cf7efff]
> ACPI NVS
> > [0.00] BIOS-e820: [mem 0x7cf7f000-0x7cffefff]
> ACPI data
> > [0.00] BIOS-e820: [mem 0x7cfff000-0x7cff]
> usable
> > [0.00] BIOS-e820: [mem 0x7d00-0x7dff]
> reserved
> > [0.00] BIOS-e820: [mem 0x7f00-0x7fff]
> reserved
> > [0.00] BIOS-e820: [mem 0xa000-0xa00f]
> reserved
> > [0.00] BIOS-e820: [mem 0xf000-0xf7ff]
> reserved
> > [0.00] BIOS-e820: [mem 0xfec0-0xfec01fff]
> reserved
> > [0.00] BIOS-e820: [mem 0xfec1-0xfec10fff]
> reserved
> > [0.00] BIOS-e820: [mem 0xfec2-0xfec20fff]
> reserved
> > [0.00] BIOS-e820: [mem 0xfed8-0xfed81fff]
> reserved
> > [0.00] BIOS-e820: [mem 0xfedc-0xfedd]
> reserved
> > [0.00] BIOS-e820: [mem 0xfee0-0xfee00fff]
> reserved
> > [

Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-11 Thread David Hildenbrand

On 11.03.21 15:41, Deucher, Alexander wrote:

[AMD Public Use]

Booting kernels on certain AMD platforms takes 2-3 minutes with the patch in 
the subject.  Reverting it restores quick boot times (few seconds).  Any ideas?



Hi,

We just discovered latent BUGs in ACPI code whereby ACPI tables are  
exposed to the page allocator as ordinary, free system RAM. With the  
patch you mention, the order in which pages get allocated from the page  
allocator is changed - which makes the BUG trigger more easily.
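
To illustrate why the placement matters, here is a toy model (an assumption-laden sketch, not the kernel's actual free-list code): pages are handed to a free list in ascending address order, and allocations always take from the head. The function name `allocation_order` and the use of small integers as page frame numbers are purely illustrative.

```python
from collections import deque


def allocation_order(pfns, to_tail):
    """Free pages in the given order, then allocate everything from the head.

    to_tail=False models placing each freed page at the head of the list,
    to_tail=True models placing it at the tail.
    """
    free_list = deque()
    for pfn in pfns:
        if to_tail:
            free_list.append(pfn)
        else:
            free_list.appendleft(pfn)
    # Allocation always pops from the head of the list.
    return [free_list.popleft() for _ in range(len(free_list))]


if __name__ == "__main__":
    print("head placement:", allocation_order(range(4), to_tail=False))
    print("tail placement:", allocation_order(range(4), to_tail=True))
```

In this model, head placement hands out the most recently freed (highest) pages first, while tail placement hands them out in the order they were freed, so a different set of physical pages ends up in early use.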


I could imagine that someone allocates and uses that memory on your  
platform, and I could imagine that such accesses are very slow.


I cannot tell if that is the root cause, but at least it would make sense.

See https://lore.kernel.org/patchwork/patch/1389314/

You might want to give that patch a try (not sure if it's the latest  
version). CCing George


Thanks


Thanks,

Alex

[0.00] Linux version 5.11.0-7490c004ae7e (jenkins@24dbd4b4380b) (gcc 
(Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, GNU ld (GNU Binutils for Ubuntu) 2.30) 
#20210308 SMP Sun Mar 7 20:04:05 UTC 2021
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-5.11.0-7490c004ae7e 
root=UUID=459758f3-5106-4173-b9bc-cf9d528828ec ro 
resume=UUID=23390f67-bbaf-42c1-b31d-64ef7288e39e amd_iommu=off nokaslr
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Hygon HygonGenuine
[0.00]   Centaur CentaurHauls
[0.00]   zhaoxin   Shanghai
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, 
using 'compacted' format.
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009efff] usable
[0.00] BIOS-e820: [mem 0x0009f000-0x000b] reserved
[0.00] BIOS-e820: [mem 0x0010-0x09af] usable
[0.00] BIOS-e820: [mem 0x09b0-0x09df] reserved
[0.00] BIOS-e820: [mem 0x09e0-0x09ef] usable
[0.00] BIOS-e820: [mem 0x09f0-0x09f10fff] ACPI NVS
[0.00] BIOS-e820: [mem 0x09f11000-0x6c56efff] usable
[0.00] BIOS-e820: [mem 0x6c56f000-0x6c56] reserved
[0.00] BIOS-e820: [mem 0x6c57-0x7877efff] usable
[0.00] BIOS-e820: [mem 0x7877f000-0x7af7efff] reserved
[0.00] BIOS-e820: [mem 0x7af7f000-0x7cf7efff] ACPI NVS
[0.00] BIOS-e820: [mem 0x7cf7f000-0x7cffefff] ACPI data
[0.00] BIOS-e820: [mem 0x7cfff000-0x7cff] usable
[0.00] BIOS-e820: [mem 0x7d00-0x7dff] reserved
[0.00] BIOS-e820: [mem 0x7f00-0x7fff] reserved
[0.00] BIOS-e820: [mem 0xa000-0xa00f] reserved
[0.00] BIOS-e820: [mem 0xf000-0xf7ff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec01fff] reserved
[0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[0.00] BIOS-e820: [mem 0xfec2-0xfec20fff] reserved
[0.00] BIOS-e820: [mem 0xfed8-0xfed81fff] reserved
[0.00] BIOS-e820: [mem 0xfedc-0xfedd] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xff08-0xffdd] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00023f37] usable
[0.00] BIOS-e820: [mem 0x00023f38-0x00027fff] reserved
[0.00] NX (Execute Disable) protection: active
[0.00] e820: update [mem 0x6a275018-0x6a283857] usable ==> usable
[0.00] e820: update [mem 0x6a275018-0x6a283857] usable ==> usable
[0.00] e820: update [mem 0x6c572018-0x6c57c657] usable ==> usable
[0.00] e820: update [mem 0x6c572018-0x6c57c657] usable ==> usable
[0.00] extended physical RAM map:
[0.00] reserve setup_data: [mem 0x-0x0009efff] 
usable
[0.00] reserve setup_data: [mem 0x0009f000-0x000b] 
reserved
[0.00] reserve setup_data: [mem 0x0010-0x09af] 
usable
[0.00] reserve setup_data: [mem 0x09b0-0x09df] 
reserved
[0.00] reserve setup_data: [mem 0x09e0-0x09ef] 
usable
[0.00] reserve setup_data: [mem 0x09f0-0x09f10fff] 
ACPI NVS
[0.00] reserve setup_data: [mem 

slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

2021-03-11 Thread Deucher, Alexander
[AMD Public Use]

Booting kernels on certain AMD platforms takes 2-3 minutes with the patch in 
the subject.  Reverting it restores quick boot times (few seconds).  Any ideas?

Thanks,

Alex

[0.00] Linux version 5.11.0-7490c004ae7e (jenkins@24dbd4b4380b) (gcc 
(Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, GNU ld (GNU Binutils for Ubuntu) 2.30) 
#20210308 SMP Sun Mar 7 20:04:05 UTC 2021
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-5.11.0-7490c004ae7e 
root=UUID=459758f3-5106-4173-b9bc-cf9d528828ec ro 
resume=UUID=23390f67-bbaf-42c1-b31d-64ef7288e39e amd_iommu=off nokaslr
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Hygon HygonGenuine
[0.00]   Centaur CentaurHauls
[0.00]   zhaoxin   Shanghai  
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, 
using 'compacted' format.
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009efff] usable
[0.00] BIOS-e820: [mem 0x0009f000-0x000b] reserved
[0.00] BIOS-e820: [mem 0x0010-0x09af] usable
[0.00] BIOS-e820: [mem 0x09b0-0x09df] reserved
[0.00] BIOS-e820: [mem 0x09e0-0x09ef] usable
[0.00] BIOS-e820: [mem 0x09f0-0x09f10fff] ACPI NVS
[0.00] BIOS-e820: [mem 0x09f11000-0x6c56efff] usable
[0.00] BIOS-e820: [mem 0x6c56f000-0x6c56] reserved
[0.00] BIOS-e820: [mem 0x6c57-0x7877efff] usable
[0.00] BIOS-e820: [mem 0x7877f000-0x7af7efff] reserved
[0.00] BIOS-e820: [mem 0x7af7f000-0x7cf7efff] ACPI NVS
[0.00] BIOS-e820: [mem 0x7cf7f000-0x7cffefff] ACPI data
[0.00] BIOS-e820: [mem 0x7cfff000-0x7cff] usable
[0.00] BIOS-e820: [mem 0x7d00-0x7dff] reserved
[0.00] BIOS-e820: [mem 0x7f00-0x7fff] reserved
[0.00] BIOS-e820: [mem 0xa000-0xa00f] reserved
[0.00] BIOS-e820: [mem 0xf000-0xf7ff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec01fff] reserved
[0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[0.00] BIOS-e820: [mem 0xfec2-0xfec20fff] reserved
[0.00] BIOS-e820: [mem 0xfed8-0xfed81fff] reserved
[0.00] BIOS-e820: [mem 0xfedc-0xfedd] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xff08-0xffdd] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00023f37] usable
[0.00] BIOS-e820: [mem 0x00023f38-0x00027fff] reserved
[0.00] NX (Execute Disable) protection: active
[0.00] e820: update [mem 0x6a275018-0x6a283857] usable ==> usable
[0.00] e820: update [mem 0x6a275018-0x6a283857] usable ==> usable
[0.00] e820: update [mem 0x6c572018-0x6c57c657] usable ==> usable
[0.00] e820: update [mem 0x6c572018-0x6c57c657] usable ==> usable
[0.00] extended physical RAM map:
[0.00] reserve setup_data: [mem 0x-0x0009efff] 
usable
[0.00] reserve setup_data: [mem 0x0009f000-0x000b] 
reserved
[0.00] reserve setup_data: [mem 0x0010-0x09af] 
usable
[0.00] reserve setup_data: [mem 0x09b0-0x09df] 
reserved
[0.00] reserve setup_data: [mem 0x09e0-0x09ef] 
usable
[0.00] reserve setup_data: [mem 0x09f0-0x09f10fff] 
ACPI NVS
[0.00] reserve setup_data: [mem 0x09f11000-0x6a275017] 
usable
[0.00] reserve setup_data: [mem 0x6a275018-0x6a283857] 
usable
[0.00] reserve setup_data: [mem 0x6a283858-0x6c56efff] 
usable
[0.00] reserve setup_data: [mem 0x6c56f000-0x6c56] 
reserved
[0.00] reserve setup_data: [mem 0x6c57-0x6c572017] 
usable
[0.00] reserve setup_data: [mem 0x6c572018-0x6c57c657] 
usable
[0.00] reserve setup_data: [mem 0x6c57c658-0x7877efff] 
usable
[0.00] reserve setup_data: [mem 0x7877f000-0x7af7efff] 
reserved
[0.00] reserve setup_data: [mem