On Tue, 2018-02-27 at 15:42 +0800, kemi wrote:
> 
> On 2018年02月26日 20:33, Jeff Layton wrote:
> > On Mon, 2018-02-26 at 06:43 -0500, Jeff Layton wrote:
> > > On Mon, 2018-02-26 at 16:38 +0800, Ye Xiaolong wrote:
> > > > On 02/25, Jeff Layton wrote:
> > > > > On Sun, 2018-02-25 at 23:05 +0800, kernel test robot wrote:
> > > > > > Greeting,
> > > > > > 
> > > > > > FYI, we noticed a -18.0% regression of aim7.jobs-per-min due to 
> > > > > > commit:
> > > > > > 
> > > > > > 
> > > > > > commit: c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e ("iversion: make 
> > > > > > inode_cmp_iversion{+raw} return bool instead of s64")
> > > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 
> > > > > > master
> > > > > > 
> > > > > > in testcase: aim7
> > > > > > on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 
> > > > > > 3.00GHz with 384G memory
> > > > > > with following parameters:
> > > > > > 
> > > > > >     disk: 4BRD_12G
> > > > > >     md: RAID0
> > > > > >     fs: xfs
> > > > > >     test: disk_src
> > > > > >     load: 3000
> > > > > >     cpufreq_governor: performance
> > > > > > 
> > > > > > test-description: AIM7 is a traditional UNIX system level benchmark 
> > > > > > suite which is used to test and measure the performance of 
> > > > > > multiuser system.
> > > > > > test-url: 
> > > > > > https://sourceforge.net/projects/aimbench/files/aim-suite7/
> > > > > > 
> > > > > > 
> > > > > 
> > > > > I'm a bit suspicious of this result.
> > > > > 
> > > > > This patch only changes inode_cmp_iversion{+raw} (since renamed to
> > > > > inode_eq_iversion{+raw}), and that neither should ever be called from
> > > > > xfs. The patch is fairly trivial too, and I wouldn't expect a big
> > > > > performance hit.
> > > > 
> > > > I tried to queue 4 more times test for both commit c0cef30e4f and its 
> > > > parent,
> > > > the result seems quite stable.
> > > > 
> > > > c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e:
> > > >  "aim7.jobs-per-min": [
> > > >     32964.01,
> > > >     32938.68,
> > > >     33068.18,
> > > >     32886.32,
> > > >     32843.72,
> > > >     32798.83,
> > > >     32898.34,
> > > >     32952.55
> > > >   ],
> > > > 
> > > > 3da90b159b146672f830bcd2489dd3a1f4e9e089:
> > > >   "aim7.jobs-per-min": [
> > > >     40239.65,
> > > >     40163.33,
> > > >     40353.32,
> > > >     39976.9,
> > > >     40185.75,
> > > >     40411.3,
> > > >     40213.58,
> > > >     39900.69
> > > >   ],
> > > > 
> > > > Any other test data you may need?
> > > > 
> > > > > 
> > > > > Is IMA involved here at all? I didn't see any evidence of it, but the
> > > > > kernel config did have it enabled.
> > > > > 
> > > > 
> > > > Sorry, not quite familiar with IMA, could you tell more about how to 
> > > > check it?
> > > > 
> > > 
> > > Thanks for retesting it, but I'm at a loss for why we're seeing this:
> > > 
> > > IMA is the the integrity management subsystem. It will use the iversion
> > > field to determine whether to remeasure files during remeasurement.  It
> > > looks like the kernel config has it enabled, but it doesn't look like
> > > it's in use, based on the info in the initial report.
> > > 
> > > This patch only affects two inlined functions inode_cmp_iversion and
> > > inode_cmp_iversion_raw. The patch is pretty trivial (as Linus points
> > > out). These functions are only called from IMA and fs-specific code
> > > (usually in readdir implementations to detect directory changes).
> > > 
> > > XFS does not call either of these functions however, so I'm a little
> > > unclear on how this patch could slow anything down on this test. The
> > > only thing I can think to do here would be to profile this and see what
> > > stands out.
> > > 
> > > Note that we do need to keep this in perspective too. This 18%
> > > regression on this test follows around a ~230% improvement that occurred
> > > when we merged the bulk of these patches. It's should still be quite a
> > > bit faster than the v4.15 in this regard.
> > > 
> > > Still, it'd be good to understand what's going on here.
> > > 
> > > 
> > 
> > Could we see the dmesg from this boot? It'd be good to confirm that IMA
> > is not involved here, as that's the only place that I can see that would
> > call into this code at all here.
> > 
> 
> See attachment for info on dmesg/perf-profile/compare_result.
> Feel free to let Xiaolong or me know if anything else you would like to check.
> 

Many thanks,

Only one caller of the functions touched by this patch shows up in the
profiles: ima_file_free. That calls ima_check_last_writer, which calls
inode_cmp_iversion. The lines from the profiles show:

3da90b159b146672f830bcd2489dd3a1f4e9e089:
     0.00%     0.00%  [kernel.kallsyms]   [k] ima_file_free

c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e:
     0.01%     0.01%  [kernel.kallsyms]   [k] ima_file_free

Seems pretty insignificant, but perhaps that is somehow accounting for
the difference. This is called when a file is freed so there could be an
effect I guess if we're closing a lot of files for write.

Looking at the disassembly from the builds on my box there is some
slight difference, since we did  alter the implementation.
inode->iversion is 0x150 bytes into the struct on my builds:

3da90b159b146672f830bcd2489dd3a1f4e9e089:

   0xffffffff813ae858 <+136>:   je     0xffffffff813ae871 <ima_file_free+161>
   0xffffffff813ae85a <+138>:   mov    0x150(%rbp),%rsi
   0xffffffff813ae861 <+145>:   mov    0x20(%rax),%rcx
   0xffffffff813ae865 <+149>:   and    $0xfffffffffffffffe,%rsi
   0xffffffff813ae869 <+153>:   add    %rcx,%rcx
   0xffffffff813ae86c <+156>:   cmp    %rcx,%rsi
   0xffffffff813ae86f <+159>:   je     0xffffffff813ae899 <ima_file_free+201>

c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e:

   0xffffffff813ae828 <+136>:   je     0xffffffff813ae83a <ima_file_free+154>
   0xffffffff813ae82a <+138>:   mov    0x150(%rbp),%rcx
   0xffffffff813ae831 <+145>:   shr    %rcx
   0xffffffff813ae834 <+148>:   cmp    %rcx,0x20(%rax)
   0xffffffff813ae838 <+152>:   je     0xffffffff813ae862 <ima_file_free+194>

The patched one looks like it ought to be more efficient. It's certainly
fewer instructions and it doesn't touch %rsi now. Are shifts more
expensive?

In any case, what might be good at this point as a sanity check is to
turn off CONFIG_IMA and recheck this. That should tell us whether we're
on the right track here. With that disabled, this patch should have no
effect on anything at all.

Thanks,
-- 
Jeff Layton <jlay...@redhat.com>

Reply via email to