On Sat, Nov 14, 2020 at 01:25:44PM +0100, Greg Kroah-Hartman wrote:
> On Sat, Nov 14, 2020 at 03:19:17PM +0800, Feng Tang wrote:
> > Hi Greg,
> > 
> > On Fri, Nov 13, 2020 at 07:46:57AM +0100, Greg Kroah-Hartman wrote:
> > > On Thu, Nov 12, 2020 at 10:06:25PM +0800, kernel test robot wrote:
> > > > 
> > > > Greeting,
> > > > 
> > > > FYI, we noticed a 7.5% improvement of fio.read_iops due to commit:
> > > > 
> > > > 
> > > > commit: 9522750c66c689b739e151fcdf895420dc81efc0 ("Fonts: Replace 
> > > > discarded const qualifier")
> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > 
> > > I strongly doubt this :)
> > 
> > We just double checked, the test was run 4 times and the result are
> > very stable.
> > 
> > The commit does looks irrelevant to fio test, and we just further
> > checked the System map of the 2 kernels, and many data's alignment
> > have been changed (systemmaps attached).
> > 
> > We have a hack debug patch to make data sections of each .o file to
> > be aligned, with that the fio result gap could be reduced from +7.5%
> > to +3.8%, so there is still some other factor affecting the benchmark,
> > which need more checking. And with the same debug method of forcing
> > data sections aligned, 2 other strange performance bumps[1][2] reported
> > by 0day could be recovered.
> > 
> > [1]. https://lore.kernel.org/lkml/20200205123216.GO12867@shao2-debian/
> > [2]. https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/
> 
> That's really odd.  Why wouldn't .o sections be aligned already and how
> does that affect the real .ko files that are created from that?  What
> alignment are you forcing?

Our debug patch is a hack that forces 16K alignment (chosen to fit the
other rules in the linker script so that the kernel still boots), as below:

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 1bf7e31..de5ddc8 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -156,7 +156,9 @@ SECTIONS
        X86_ALIGN_RODATA_END
 
        /* Data */
-       .data : AT(ADDR(.data) - LOAD_OFFSET) {
+       .data : AT(ADDR(.data) - LOAD_OFFSET)
+       SUBALIGN(16384)
+       {
                /* Start of data section */
                _sdata = .;

To make it boot with our kernel config, we also have to disable
CONFIG_DYNAMIC_DEBUG to avoid a kernel panic.
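
To illustrate why forcing the alignment helps isolate the effect, here is
a toy model (not part of the debug patch). It assumes SUBALIGN simply
replaces each input section's own alignment, which is my understanding of
the GNU ld behavior. The object names and sizes are made up, and 64-byte
cache lines are assumed.

```python
def layout(sections, subalign=None):
    """Return {name: start offset} for input sections placed back to back.

    sections: list of (name, size, natural_alignment) tuples.
    subalign: if given, used instead of each section's natural alignment
              (a simplified model of SUBALIGN in a linker script).
    """
    offset = 0
    placed = {}
    for name, size, align in sections:
        a = subalign if subalign is not None else align
        offset = (offset + a - 1) // a * a  # round up to the alignment
        placed[name] = offset
        offset += size
    return placed


# Hypothetical object files; only the first one changes size between builds.
before = [("font.o", 40, 8), ("a.o", 100, 8), ("b.o", 24, 8)]
after = [("font.o", 48, 8), ("a.o", 100, 8), ("b.o", 24, 8)]

for label, sub in (("natural", None), ("SUBALIGN(16384)", 16384)):
    old, new = layout(before, sub), layout(after, sub)
    # A section "moved" if its offset within a 64-byte cache line changed.
    moved = [n for n in old if old[n] % 64 != new[n] % 64]
    print(label, "-> sections whose cache-line offset moved:", moved)
```

In the natural layout, a size change in one object shifts the cache-line
offset of everything placed after it. With the forced alignment every
section starts on a 16K boundary, so unrelated data keeps its offset.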

> And also, what hardware is seeing this performance gains?  Something is
> fitting into a cache now that previously wasn't, and tracking that down
> seems like it would be very worthwhile as that is a non-trivial speedup
> that some developers take years to achieve with code changes.

It's an x86 server with 2S/48C/96T (2 sockets, 48 cores, 96 threads), and
the fio parameters are:

        [global]
        bs=2M
        ioengine=mmap
        iodepth=32
        size=4473924266
        nr_files=1
        filesize=4473924266
        direct=0
        runtime=240
        invalidate=1
        fallocate=posix
        io_size=4473924266
        file_service_type=roundrobin
        random_distribution=random
        group_reporting
        pre_read=0

        time_based

        [task_0]
        rw=read
        directory=/fs/pmem0
        numjobs=24

        [task_1]
        rw=read
        directory=/fs/pmem1
        numjobs=24

And yes, we also think it's cacheline related, and we are looking into
it further. We are actually investigating 2 other similarly strange
performance changes at the moment:

https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/
https://lore.kernel.org/lkml/20201004132838.GU393@shao2-debian/

So it may take some time. And to be frank, there have been quite a few
similar older cases where we couldn't figure out the exact cause.
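
For anyone who wants to look at the attached System.map files, a rough
sketch of the kind of comparison involved is below. It is not the tool we
actually use. It assumes the usual "address type name" line format and
64-byte cache lines, and the function names are only illustrative.

```python
CACHE_LINE = 64  # assumption: 64-byte cache lines
DATA_TYPES = set("dDbBrR")  # nm type letters for data, bss and rodata


def parse_system_map(text):
    """Map data symbol name -> address, keeping the first entry per name."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that is not "address type name"
        addr, kind, name = parts
        if kind in DATA_TYPES:
            symbols.setdefault(name, int(addr, 16))
    return symbols


def changed_offsets(old_text, new_text, line_size=CACHE_LINE):
    """List (name, old_offset, new_offset) where the in-line offset differs."""
    old = parse_system_map(old_text)
    new = parse_system_map(new_text)
    changed = []
    for name in sorted(old.keys() & new.keys()):
        o, n = old[name] % line_size, new[name] % line_size
        if o != n:
            changed.append((name, o, n))
    return changed


# Made-up example maps; real input would be the two System.map files.
old_map = "ffffffff82000000 D foo\nffffffff82000040 D bar\nffffffff81000000 T _text\n"
new_map = "ffffffff82000000 D foo\nffffffff82000050 D bar\nffffffff81000010 T _text\n"
print(changed_offsets(old_map, new_map))
```

Symbols sharing a name (static variables in different files) are
collapsed to the first entry here, so this is only a first-pass filter.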

Thanks,
Feng

> thanks,
> 
> greg k-h
