Hi Baoquan,

There could still be some memory initialization problem with
the draft patch. I see a lot of page corruption errors.

BUG: Bad page state in process swapper  pfn:ab0803c

Here is the call trace:

[    0.262826]  dump_stack+0x57/0x6a
[    0.262827]  bad_page.cold.119+0x63/0x93
[    0.262828]  __free_pages_ok+0x31f/0x330
[    0.262829]  memblock_free_all+0x153/0x1bf
[    0.262830]  mem_init+0x23/0x1f2
[    0.262831]  start_kernel+0x299/0x57a
[    0.262832]  secondary_startup_64_no_verify+0xb8/0xbb

I don't see this in the dmesg log with the vanilla kernel.
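
For context, that message comes from the page allocator's sanity checks on
the free path: when memblock_free_all() hands a page to the buddy allocator
and its struct page still has stale or uninitialized state (flags, mapping,
refcount), the free path flags it. Below is a very rough userspace model of
that check, just to illustrate the failure mode -- the field names are only
loosely modelled on struct page and the flag mask is a placeholder, not the
real PAGE_FLAGS_CHECK_AT_FREE value.

#include <stdio.h>
#include <stddef.h>

/* Very simplified model of the state the free path expects a page to be in. */
struct fake_page {
        unsigned long flags;      /* must have no "check at free" bits set */
        void *mapping;            /* must be NULL                          */
        int refcount;             /* must be 0 when handed to the buddy    */
        int mapcount;             /* convention: -1 means "not mapped"     */
};

#define CHECK_AT_FREE_MASK 0xffUL /* placeholder mask, illustrative only   */

static int page_expected_state(const struct fake_page *p)
{
        return p->mapping == NULL &&
               p->refcount == 0 &&
               p->mapcount == -1 &&
               (p->flags & CHECK_AT_FREE_MASK) == 0;
}

int main(void)
{
        /* A struct page whose backing memory was never initialized ends up
         * with garbage in these fields, which is what trips the check. */
        struct fake_page uninitialized = {
                .flags = 0x5a5a, .mapping = (void *)0x5a5a,
                .refcount = 0x5a5a, .mapcount = 0x5a5a,
        };

        if (!page_expected_state(&uninitialized))
                printf("BUG: Bad page state (model)\n");
        return 0;
}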

It looks like the overhead due to this initialization problem
is around 3 secs.

[    0.262831]  start_kernel+0x299/0x57a
[    0.262832]  secondary_startup_64_no_verify+0xb8/0xbb
[    3.758185] Memory: 3374072K/1073740756K available (12297K kernel code, 5778K rwdata, 4376K rodata, 2352K init, 6480K bss, 16999716K reserved, 0K cma-reserved)

But the draft patch does fix the originally reported problem of around
2 secs (log snippet below), hence the net delay of about 1 sec.

[    0.024752]   Normal zone: 1445888 pages used for memmap
[    0.024753]   Normal zone: 89391104 pages, LIFO batch:63
[    0.027379] ACPI: PM-Timer IO Port: 0x448
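
For reference, here is the arithmetic behind those rough figures, using only
the timestamps quoted in this thread (a throwaway userspace sketch; the
numbers are ballpark):

#include <stdio.h>

int main(void)
{
        /* timestamps (in seconds) copied from the dmesg snippets in this thread */
        double patch_startup   = 0.262832;  /* secondary_startup_64_no_verify */
        double patch_meminfo   = 3.758185;  /* "Memory: ... available" line   */
        double patch_pmtimer   = 0.027379;  /* ACPI: PM-Timer, patched kernel */
        double vanilla_pmtimer = 2.096982;  /* ACPI: PM-Timer, vanilla kernel */

        /* ~3.5 s spent around memblock_free_all() in the patched boot */
        printf("init overhead : %.2f secs\n", patch_meminfo - patch_startup);

        /* ~2.1 s saved before the PM-Timer line, i.e. the memmap_init win */
        printf("memmap win    : %.2f secs\n", vanilla_pmtimer - patch_pmtimer);

        /* the difference is roughly the ~1 sec net delay mentioned above */
        printf("net delay     : %.2f secs\n",
               (patch_meminfo - patch_startup) - (vanilla_pmtimer - patch_pmtimer));
        return 0;
}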


________________________________________
From: Rahul Gopakumar <gopakum...@vmware.com>
Sent: 22 October 2020 10:51 PM
To: b...@redhat.com
Cc: linux...@kvack.org; linux-kernel@vger.kernel.org; 
a...@linux-foundation.org; natechancel...@gmail.com; ndesaulni...@google.com; 
clang-built-li...@googlegroups.com; rost...@goodmis.org; Rajender M; Yiu Cho 
Lau; Peter Jonasson; Venkatesh Rajaram
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

Hi Baoquan,

>> Can you tell how you measure the boot time?

Our test is actually boothalt; the time reported by this test
includes both boot-up and shutdown time.

>> Above, you said "Patch on latest commit - 20.161 secs",
>> could you tell where this 20.161 secs comes from,

So this time is boot-up time + shutdown time.

From the dmesg.log it looks like memmap_init is taking less time
with the patch. Let me take a closer look to confirm this and also to
find where the 1-sec delay in the patch run is coming from; see the
sketch below.
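
To narrow that down, I plan to compare the large timestamp gaps in the two
logs with something like the throwaway helper below. It prints every
interval between consecutive dmesg lines that exceeds half a second
(illustrative only; it assumes the usual "[ seconds.micros]" dmesg prefix):

#include <stdio.h>
#include <string.h>

/*
 * Print every interval between consecutive dmesg lines that exceeds
 * THRESHOLD seconds.  Run it on both logs and compare the output:
 *
 *     ./dmesg_gaps < patch_dmesg.log
 *     ./dmesg_gaps < vanilla_dmesg.log
 */
#define THRESHOLD 0.5

int main(void)
{
        char line[1024], prev_line[1024] = "";
        double ts, prev_ts = -1.0;

        while (fgets(line, sizeof(line), stdin)) {
                if (sscanf(line, "[%lf]", &ts) != 1)
                        continue;               /* not a timestamped line */
                if (prev_ts >= 0.0 && ts - prev_ts > THRESHOLD)
                        printf("gap of %.3f s between:\n  %s  %s",
                               ts - prev_ts, prev_line, line);
                prev_ts = ts;
                strcpy(prev_line, line);
        }
        return 0;
}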


From: b...@redhat.com <b...@redhat.com>
Sent: 22 October 2020 9:34 AM
To: Rahul Gopakumar <gopakum...@vmware.com>
Cc: linux...@kvack.org <linux...@kvack.org>; linux-kernel@vger.kernel.org 
<linux-kernel@vger.kernel.org>; a...@linux-foundation.org 
<a...@linux-foundation.org>; natechancel...@gmail.com 
<natechancel...@gmail.com>; ndesaulni...@google.com <ndesaulni...@google.com>; 
clang-built-li...@googlegroups.com <clang-built-li...@googlegroups.com>; 
rost...@goodmis.org <rost...@goodmis.org>; Rajender M <ma...@vmware.com>; Yiu 
Cho Lau <lauyi...@vmware.com>; Peter Jonasson <pjonas...@vmware.com>; Venkatesh 
Rajaram <rajar...@vmware.com>
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

Hi Rahul,

On 10/20/20 at 03:26pm, Rahul Gopakumar wrote:
> >> Here, do you mean it even cost more time with the patch applied?
>
> Yes, we ran it multiple times and it looks like there is a
> very minor increase with the patch.
>
......
> On 10/20/20 at 01:45pm, Rahul Gopakumar wrote:
> > Hi Baoquan,
> >
> > We had some trouble applying the patch to the problem commit and the latest
> > upstream commit. Steven (CC'ed) helped us by providing the updated draft
> > patch. We applied it on the latest commit
> > (3e4fb4346c781068610d03c12b16c0cfb0fd24a3), and it doesn't look like it
> > improves the performance numbers.
>
> Thanks for your feedback. From the code, I am sure what the problem is,
> but I didn't test it on a system with huge memory. I forgot to mention that
> my draft patch is based on the akpm/master branch since it's a mm issue; it
> might be a little different from Linus's mainline kernel, sorry for the
> inconvenience.
>
> I will test and debug this on a server with 4T of memory in our lab, and
> update if there is any progress.
>
> >
> > Patch on latest commit - 20.161 secs
> > Vanilla latest commit - 19.50 secs
>

Can you tell how you measure the boot time? I checked the boot logs you
attached; e.g. in the two logs below, I saw that patch_dmesg.log even takes
less time during memmap init. I have now got a machine with 1T of memory for
testing, but didn't see an obvious time cost increase. Above, you said
"Patch on latest commit - 20.161 secs"; could you tell where this 20.161
secs comes from, so that I can investigate and reproduce it on my system?

patch_dmesg.log:
[    0.023126] Initmem setup node 1 [mem 0x0000005600000000-0x000000aaffffffff]
[    0.023128] On node 1 totalpages: 89128960
[    0.023129]   Normal zone: 1392640 pages used for memmap
[    0.023130]   Normal zone: 89128960 pages, LIFO batch:63
[    0.023893] Initmem setup node 2 [mem 0x000000ab00000000-0x000001033fffffff]
[    0.023895] On node 2 totalpages: 89391104
[    0.023896]   Normal zone: 1445888 pages used for memmap
[    0.023897]   Normal zone: 89391104 pages, LIFO batch:63
[    0.026744] ACPI: PM-Timer IO Port: 0x448
[    0.026747] ACPI: Local APIC address 0xfee00000

vanilla_dmesg.log:
[    0.024295] Initmem setup node 1 [mem 0x0000005600000000-0x000000aaffffffff]
[    0.024298] On node 1 totalpages: 89128960
[    0.024299]   Normal zone: 1392640 pages used for memmap
[    0.024299]   Normal zone: 89128960 pages, LIFO batch:63
[    0.025289] Initmem setup node 2 [mem 0x000000ab00000000-0x000001033fffffff]
[    0.025291] On node 2 totalpages: 89391104
[    0.025292]   Normal zone: 1445888 pages used for memmap
[    0.025293]   Normal zone: 89391104 pages, LIFO batch:63
[    2.096982] ACPI: PM-Timer IO Port: 0x448
[    2.096987] ACPI: Local APIC address 0xfee00000
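
Side note: the "pages used for memmap" counts in both logs line up with the
node ranges, i.e. the memmap covers the spanned range at sizeof(struct page)
bytes per 4K page. A quick userspace check, assuming a 64-byte struct page:

#include <stdio.h>

int main(void)
{
        /* node ranges taken from the "Initmem setup node" lines above;
         * ends are exclusive (the logs print the inclusive ...fffffff) */
        unsigned long long node1_start = 0x0000005600000000ULL;
        unsigned long long node1_end   = 0x000000ab00000000ULL;
        unsigned long long node2_start = 0x000000ab00000000ULL;
        unsigned long long node2_end   = 0x0000010340000000ULL;

        /* assumptions: 4 KiB pages, 64-byte struct page */
        unsigned long long page_size = 4096, struct_page_size = 64;

        unsigned long long n1_spanned = (node1_end - node1_start) / page_size;
        unsigned long long n2_spanned = (node2_end - node2_start) / page_size;

        /* expect 1392640 and 1445888, matching both dmesg logs */
        printf("node 1 memmap pages: %llu\n",
               n1_spanned * struct_page_size / page_size);
        printf("node 2 memmap pages: %llu\n",
               n2_spanned * struct_page_size / page_size);
        return 0;
}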
