Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/7099
to look at the new patch set (#4).
Change subject: codegen: fix regression in inlining after LLVM 4.0 upgrade
......................................................................
codegen: fix regression in inlining after LLVM 4.0 upgrade
With the upgrade to LLVM 4.0, the performance of our generated code
regressed significantly. After looking at the generated assembly from
codegen-test, I could see that there were many 'call' instructions that
didn't appear in the earlier output, which led me to suspect the
inliner.
After adding a 'module->dump()' call and looking at the output, I could
see that the calls were to utility functions like BitmapTest() which
should be inlined due to their very small size. However, the function
was marked 'noinline' in precompiled.ll. After a bit of Googling I came
across a thread[1] in which someone else noticed that all of their
functions had
unexpected 'noinline' attributes after upgrading to 4.0.
After following the breadcrumbs, I found some advice to change the way
in which we emit precompiled.ll to disable the built-in LLVM passes
which were responsible for adding this attribute.
This had one unintended side-effect which initially broke codegen in
TSAN builds (which use libc++ instead of libstdcxx). Fixing this required
changing the debug builds to still run the most basic ('-O0')
optimization passes at JIT time instead of skipping the optimization
pass manager entirely. See the new comment in module_builder.cc for
details.
That also caused the size of generated code in UBSAN builds to increase,
so I had to increase the limit on the number of dumped instructions per
function to avoid breaking one of the tests.
During my investigation, I also found that we should have been passing
the optimization and size levels into the inlining pass, so I threw
that change in for good measure.
This fixed the perf regression. I benchmarked using:
memrowset-test --roundtrip_num_rows=10000000 \
--gtest_filter=\*InsertCount\* \
--gtest_repeat=10
LLVM 3.9 (before the regression) (asm at [2]):
I0607 12:00:17.540612 23532 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.295s user 0.294s sys 0.000s
I0607 12:00:23.842226 23532 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.297s user 0.297s sys 0.000s
I0607 12:00:29.710149 23532 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.296s user 0.296s sys 0.000s
I0607 12:00:35.587321 23532 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.296s user 0.296s sys 0.000s
I0607 12:00:41.451495 23532 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.296s user 0.296s sys 0.000s
I0607 12:00:47.309710 23532 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.297s user 0.297s sys 0.000s
I0607 12:00:53.183537 23532 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.297s user 0.297s sys 0.000s
I0607 12:00:59.051522 23532 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.297s user 0.297s sys 0.000s
I0607 12:01:04.935763 23532 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.297s user 0.298s sys 0.000s
I0607 12:01:10.821828 23532 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.297s user 0.299s sys 0.000s
LLVM 4.0 (before this fix) (asm at [3]):
I0607 12:14:07.771476 12117 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.524s user 0.525s sys 0.000s
I0607 12:14:14.283260 12117 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.525s user 0.525s sys 0.000s
I0607 12:14:20.376360 12117 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.543s user 0.543s sys 0.000s
I0607 12:14:26.457681 12117 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.530s user 0.530s sys 0.000s
I0607 12:14:32.535305 12117 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.532s user 0.532s sys 0.000s
I0607 12:14:38.615105 12117 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.531s user 0.532s sys 0.000s
I0607 12:14:44.697655 12117 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.531s user 0.531s sys 0.000s
I0607 12:14:50.779160 12117 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.532s user 0.532s sys 0.000s
I0607 12:14:56.852665 12117 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.531s user 0.530s sys 0.000s
I0607 12:15:02.954947 12117 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.538s user 0.540s sys 0.000s
LLVM 4.0 (after this fix) (asm at [4]):
I0607 12:17:37.953876 17916 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.292s user 0.292s sys 0.000s
I0607 12:17:44.214912 17916 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.295s user 0.295s sys 0.000s
I0607 12:17:50.057200 17916 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.299s user 0.299s sys 0.000s
I0607 12:17:55.882935 17916 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.292s user 0.292s sys 0.000s
I0607 12:18:01.725445 17916 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.293s user 0.293s sys 0.000s
I0607 12:18:07.562417 17916 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.293s user 0.294s sys 0.000s
I0607 12:18:13.421221 17916 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.293s user 0.293s sys 0.000s
I0607 12:18:19.268613 17916 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.293s user 0.293s sys 0.000s
I0607 12:18:25.113490 17916 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.293s user 0.294s sys 0.000s
I0607 12:18:30.950954 17916 memrowset-test.cc:446] Time spent Scanning rows
where all are committed: real 0.293s user 0.294s sys 0.000s
Looking at the assembly for the three tests above, it's clear that the
inlining is broken in 4.0 without this fix, and with it, the generated
assembly is a little better (shorter) than 3.9.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-May/113175.html
[2] https://gist.github.com/b4b0365ea1570f33d55c6c3dffd4a8af
[3] https://gist.github.com/128ecb292fddd069d5397a76dc53fc11
[4] https://gist.github.com/0cb01f454737d6c2dc457f0cdb6bb52d
Change-Id: I7e449df80e5cd02b9ce2dbf4daa5cf8151007dcf
---
M src/kudu/codegen/CMakeLists.txt
M src/kudu/codegen/code_generator.cc
M src/kudu/codegen/module_builder.cc
3 files changed, 35 insertions(+), 12 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/7099/4
--
To view, visit http://gerrit.cloudera.org:8080/7099
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7e449df80e5cd02b9ce2dbf4daa5cf8151007dcf
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>