Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr
On Tue, 27 Oct 2020 14:00:34 GMT, Claes Redestad wrote: > The static `ThreadHeapSampler::_log_table` is currently initialized on JVM > bootstrap to an overhead of ~67k instructions (linux-x64). By turning the > initialization into a constexpr, we can precalculate the helper table at > compile time, which trades a runtime overhead for a small, 8kb, static > footprint increase. > > I compared `fast_log2` with the `log2` builtin with a naive benchmarking > experiment[1] (not included in this PR) and show that the `fast_log2` is > ~2.5x faster than `log2` on my system. And that without the lookup table we'd > be much worse. So I think it makes sense to preserve this optimization, but > get rid of the startup overhead: > > [5.428s][debug][heapsampling] log2, 0.0751173 secs > [5.457s][debug][heapsampling] fast_log2, 0.0298244 secs > [5.622s][debug][heapsampling] fast_log2_uncached, 0.1645569 secs > > I've verified that this refactoring does not affect performance in this naive > setup. > > [1] https://github.com/openjdk/jdk/compare/master...cl4es:log2_micro?expand=1 Dear @cl4es, I am not a reviewer, just have 1 comment that maybe you need to update the Year info in the headers of touched files. Thanks. Lin - PR: https://git.openjdk.java.net/jdk/pull/880
Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr
On Thu, 29 Oct 2020 08:56:13 GMT, Lin Zang wrote: >> The static `ThreadHeapSampler::_log_table` is currently initialized on JVM >> bootstrap to an overhead of ~67k instructions (linux-x64). By turning the >> initialization into a constexpr, we can precalculate the helper table at >> compile time, which trades a runtime overhead for a small, 8kb, static >> footprint increase. >> >> I compared `fast_log2` with the `log2` builtin with a naive benchmarking >> experiment[1] (not included in this PR) and show that the `fast_log2` is >> ~2.5x faster than `log2` on my system. And that without the lookup table >> we'd be much worse. So I think it makes sense to preserve this optimization, >> but get rid of the startup overhead: >> >> [5.428s][debug][heapsampling] log2, 0.0751173 secs >> [5.457s][debug][heapsampling] fast_log2, 0.0298244 secs >> [5.622s][debug][heapsampling] fast_log2_uncached, 0.1645569 secs >> >> I've verified that this refactoring does not affect performance in this >> naive setup. >> >> [1] https://github.com/openjdk/jdk/compare/master...cl4es:log2_micro?expand=1 > > Dear @cl4es, > I am not a reviewer, just have 1 comment that maybe you need to update the > Year info in the headers of touched files. > > Thanks. > Lin Unfortunately there's currently no portable way to use `std::log` (or any of the other `std` math functions) in a constexpr, so I had to resort to a code generator approach instead. It's either that or withdrawing this PR. Using UL and a debug-only block to implement an adhoc code generator (`-Xlog:heapsampling+generate::none`) might be a bit unorthodox, but I think it turned out OK. - PR: https://git.openjdk.java.net/jdk/pull/880
Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr [v2]
> The static `ThreadHeapSampler::_log_table` is currently initialized on JVM > bootstrap to an overhead of ~67k instructions (linux-x64). By turning the > initialization into a constexpr, we can precalculate the helper table at > compile time, which trades a runtime overhead for a small, 8kb, static > footprint increase. > > I compared `fast_log2` with the `log2` builtin with a naive benchmarking > experiment[1] (not included in this PR) and show that the `fast_log2` is > ~2.5x faster than `log2` on my system. And that without the lookup table we'd > be much worse. So I think it makes sense to preserve this optimization, but > get rid of the startup overhead: > > [5.428s][debug][heapsampling] log2, 0.0751173 secs > [5.457s][debug][heapsampling] fast_log2, 0.0298244 secs > [5.622s][debug][heapsampling] fast_log2_uncached, 0.1645569 secs > > I've verified that this refactoring does not affect performance in this naive > setup. > > [1] https://github.com/openjdk/jdk/compare/master...cl4es:log2_micro?expand=1 Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into threadHeapSampler_constexpr - Remove _log_table_initialized assert - Refactor ThreadHeapSampler::_log_table as constexpr - Changes: - all: https://git.openjdk.java.net/jdk/pull/880/files - new: https://git.openjdk.java.net/jdk/pull/880/files/2a66158a..f28b9cc4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=00-01 Stats: 1880 lines in 75 files changed: 1373 ins; 312 del; 195 mod Patch: https://git.openjdk.java.net/jdk/pull/880.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/880/head:pull/880 PR: https://git.openjdk.java.net/jdk/pull/880
Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr [v2]
On Tue, 27 Oct 2020 14:56:33 GMT, Claes Redestad wrote: >> The static `ThreadHeapSampler::_log_table` is currently initialized on JVM >> bootstrap to an overhead of ~67k instructions (linux-x64). By turning the >> initialization into a constexpr, we can precalculate the helper table at >> compile time, which trades a runtime overhead for a small, 8kb, static >> footprint increase. >> >> I compared `fast_log2` with the `log2` builtin with a naive benchmarking >> experiment[1] (not included in this PR) and show that the `fast_log2` is >> ~2.5x faster than `log2` on my system. And that without the lookup table >> we'd be much worse. So I think it makes sense to preserve this optimization, >> but get rid of the startup overhead: >> >> [5.428s][debug][heapsampling] log2, 0.0751173 secs >> [5.457s][debug][heapsampling] fast_log2, 0.0298244 secs >> [5.622s][debug][heapsampling] fast_log2_uncached, 0.1645569 secs >> >> I've verified that this refactoring does not affect performance in this >> naive setup. >> >> [1] https://github.com/openjdk/jdk/compare/master...cl4es:log2_micro?expand=1 > > Claes Redestad has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains three additional > commits since the last revision: > > - Merge branch 'master' into threadHeapSampler_constexpr > - Remove _log_table_initialized assert > - Refactor ThreadHeapSampler::_log_table as constexpr Changes requested by iklam (Reviewer). src/hotspot/share/runtime/threadHeapSampler.cpp line 48: > 46: }; > 47: > 48: const FastLogTable > ThreadHeapSampler::_log_table; To guarantee that `_log_table` is evaluated at C++ compile time, it's best to change the code to constexpr FastLogTable _log_table; There are 2 reasons: 1. C++ guarantees compile-time evaluation only if the expression is used in a "constexpr context". You can read more from [here](https://isocpp.org/blog/2013/01/when-does-a-constexpr-function-get-evaluated-at-compile-time-stackoverflow). 2. In the future, if the `FastLogTable` constructor is modified in a way that cannot be evaluated at compile time, (e.g., someone removes `constexpr` from the constructor's declaration by mistake, the C++ compiler will catch that and tell you that `_log_table` cannot be `constexpr`. Unfortunately, you cannot use `constexpr` in forward declarations, so you should either move the definition of `FastLogTable` to the hpp file, or remove the declaration of `_log_table` from the hpp file. - PR: https://git.openjdk.java.net/jdk/pull/880
Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr [v3]
> The static `ThreadHeapSampler::_log_table` is currently initialized on JVM > bootstrap to an overhead of ~67k instructions (linux-x64). By turning the > initialization into a constexpr, we can precalculate the helper table at > compile time, which trades a runtime overhead for a small, 8kb, static > footprint increase. > > I compared `fast_log2` with the `log2` builtin with a naive benchmarking > experiment[1] (not included in this PR) and show that the `fast_log2` is > ~2.5x faster than `log2` on my system. And that without the lookup table we'd > be much worse. So I think it makes sense to preserve this optimization, but > get rid of the startup overhead: > > [5.428s][debug][heapsampling] log2, 0.0751173 secs > [5.457s][debug][heapsampling] fast_log2, 0.0298244 secs > [5.622s][debug][heapsampling] fast_log2_uncached, 0.1645569 secs > > I've verified that this refactoring does not affect performance in this naive > setup. > > [1] https://github.com/openjdk/jdk/compare/master...cl4es:log2_micro?expand=1 Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Declare and define log_table as a static constexpr inside threadHeapSampler.cpp - Merge branch 'master' into threadHeapSampler_constexpr - Merge branch 'master' into threadHeapSampler_constexpr - Remove _log_table_initialized assert - Refactor ThreadHeapSampler::_log_table as constexpr - Changes: - all: https://git.openjdk.java.net/jdk/pull/880/files - new: https://git.openjdk.java.net/jdk/pull/880/files/f28b9cc4..62148744 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=01-02 Stats: 922 lines in 50 files changed: 556 ins; 221 del; 145 mod Patch: https://git.openjdk.java.net/jdk/pull/880.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/880/head:pull/880 PR: https://git.openjdk.java.net/jdk/pull/880
Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr [v2]
On Tue, 27 Oct 2020 23:08:13 GMT, Ioi Lam wrote: >> Claes Redestad has updated the pull request with a new target base due to a >> merge or a rebase. The incremental webrev excludes the unrelated changes >> brought in by the merge/rebase. The pull request contains three additional >> commits since the last revision: >> >> - Merge branch 'master' into threadHeapSampler_constexpr >> - Remove _log_table_initialized assert >> - Refactor ThreadHeapSampler::_log_table as constexpr > > src/hotspot/share/runtime/threadHeapSampler.cpp line 48: > >> 46: }; >> 47: >> 48: const FastLogTable >> ThreadHeapSampler::_log_table; > > To guarantee that `_log_table` is evaluated at C++ compile time, it's best to > change the code to > > constexpr FastLogTable _log_table; > > There are 2 reasons: > > 1. C++ guarantees compile-time evaluation only if the expression is used in a > "constexpr context". You can read more from > [here](https://isocpp.org/blog/2013/01/when-does-a-constexpr-function-get-evaluated-at-compile-time-stackoverflow). > 2. In the future, if the `FastLogTable` constructor is modified in a way that > cannot be evaluated at compile time, (e.g., someone removes `constexpr` from > the constructor's declaration by mistake, the C++ compiler will catch that > and tell you that `_log_table` cannot be `constexpr`. > > Unfortunately, you cannot use `constexpr` in forward declarations, so you > should either move the definition of `FastLogTable` to the hpp file, or > remove the declaration of `_log_table` from the hpp file. Ok, makes sense. I went with removing the `_log_table` declaration from the hpp file. I think things came out a bit cleaner as a result, too. - PR: https://git.openjdk.java.net/jdk/pull/880
Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr [v4]
> The static `ThreadHeapSampler::_log_table` is currently initialized on JVM > bootstrap to an overhead of ~67k instructions (linux-x64). By turning the > initialization into a constexpr, we can precalculate the helper table at > compile time, which trades a runtime overhead for a small, 8kb, static > footprint increase. > > I compared `fast_log2` with the `log2` builtin with a naive benchmarking > experiment[1] (not included in this PR) and show that the `fast_log2` is > ~2.5x faster than `log2` on my system. And that without the lookup table we'd > be much worse. So I think it makes sense to preserve this optimization, but > get rid of the startup overhead: > > [5.428s][debug][heapsampling] log2, 0.0751173 secs > [5.457s][debug][heapsampling] fast_log2, 0.0298244 secs > [5.622s][debug][heapsampling] fast_log2_uncached, 0.1645569 secs > > I've verified that this refactoring does not affect performance in this naive > setup. > > [1] https://github.com/openjdk/jdk/compare/master...cl4es:log2_micro?expand=1 Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge branch 'master' into threadHeapSampler_constexpr - Declare and define log_table as a static constexpr inside threadHeapSampler.cpp - Merge branch 'master' into threadHeapSampler_constexpr - Merge branch 'master' into threadHeapSampler_constexpr - Remove _log_table_initialized assert - Refactor ThreadHeapSampler::_log_table as constexpr - Changes: - all: https://git.openjdk.java.net/jdk/pull/880/files - new: https://git.openjdk.java.net/jdk/pull/880/files/62148744..f796147d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=02-03 Stats: 364 lines in 32 files changed: 294 ins; 21 del; 49 mod Patch: https://git.openjdk.java.net/jdk/pull/880.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/880/head:pull/880 PR: https://git.openjdk.java.net/jdk/pull/880
Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr [v2]
On Tue, 27 Oct 2020 23:08:21 GMT, Ioi Lam wrote: >> Claes Redestad has updated the pull request with a new target base due to a >> merge or a rebase. The incremental webrev excludes the unrelated changes >> brought in by the merge/rebase. The pull request contains three additional >> commits since the last revision: >> >> - Merge branch 'master' into threadHeapSampler_constexpr >> - Remove _log_table_initialized assert >> - Refactor ThreadHeapSampler::_log_table as constexpr > > Changes requested by iklam (Reviewer). The change to declare the log_table as constexpr exposed an issue with the std `log` function not being constexpr, which now cause a build error on Windows and Mac. It works on Linux, but likely not robustly so since I seem to have been stumbling into a non-compliant quirk: https://stackoverflow.com/questions/50477974/constexpr-exp-log-pow - PR: https://git.openjdk.java.net/jdk/pull/880
Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr [v4]
On Wed, 28 Oct 2020 14:36:59 GMT, Claes Redestad wrote: >> The static `ThreadHeapSampler::_log_table` is currently initialized on JVM >> bootstrap to an overhead of ~67k instructions (linux-x64). By turning the >> initialization into a constexpr, we can precalculate the helper table at >> compile time, which trades a runtime overhead for a small, 8kb, static >> footprint increase. >> >> I compared `fast_log2` with the `log2` builtin with a naive benchmarking >> experiment[1] (not included in this PR) and show that the `fast_log2` is >> ~2.5x faster than `log2` on my system. And that without the lookup table >> we'd be much worse. So I think it makes sense to preserve this optimization, >> but get rid of the startup overhead: >> >> [5.428s][debug][heapsampling] log2, 0.0751173 secs >> [5.457s][debug][heapsampling] fast_log2, 0.0298244 secs >> [5.622s][debug][heapsampling] fast_log2_uncached, 0.1645569 secs >> >> I've verified that this refactoring does not affect performance in this >> naive setup. >> >> [1] https://github.com/openjdk/jdk/compare/master...cl4es:log2_micro?expand=1 > > Claes Redestad has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains six additional > commits since the last revision: > > - Merge branch 'master' into threadHeapSampler_constexpr > - Declare and define log_table as a static constexpr inside > threadHeapSampler.cpp > - Merge branch 'master' into threadHeapSampler_constexpr > - Merge branch 'master' into threadHeapSampler_constexpr > - Remove _log_table_initialized assert > - Refactor ThreadHeapSampler::_log_table as constexpr Hi Ioi and Claes, This looks nice. One nit: + _values[i] = (log(1.0 + static_cast(i+0.5) / N) / log(2.0)); Need spaces around '+' sign. Also, it can be just one line. Thanks, Serguei - Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/880
Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr [v5]
> The static `ThreadHeapSampler::_log_table` is currently initialized on JVM > bootstrap to an overhead of ~67k instructions (linux-x64). By turning the > initialization into a constexpr, we can precalculate the helper table at > compile time, which trades a runtime overhead for a small, 8kb, static > footprint increase. > > I compared `fast_log2` with the `log2` builtin with a naive benchmarking > experiment[1] (not included in this PR) and show that the `fast_log2` is > ~2.5x faster than `log2` on my system. And that without the lookup table we'd > be much worse. So I think it makes sense to preserve this optimization, but > get rid of the startup overhead: > > [5.428s][debug][heapsampling] log2, 0.0751173 secs > [5.457s][debug][heapsampling] fast_log2, 0.0298244 secs > [5.622s][debug][heapsampling] fast_log2_uncached, 0.1645569 secs > > I've verified that this refactoring does not affect performance in this naive > setup. > > [1] https://github.com/openjdk/jdk/compare/master...cl4es:log2_micro?expand=1 Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Desugar constexpr into code generator and output - Changes: - all: https://git.openjdk.java.net/jdk/pull/880/files - new: https://git.openjdk.java.net/jdk/pull/880/files/f796147d..61bac3dc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=03-04 Stats: 292 lines in 2 files changed: 281 ins; 0 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/880.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/880/head:pull/880 PR: https://git.openjdk.java.net/jdk/pull/880
Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr [v6]
> The static `ThreadHeapSampler::_log_table` is currently initialized on JVM > bootstrap to an overhead of ~67k instructions (linux-x64). By turning the > initialization into a constexpr, we can precalculate the helper table at > compile time, which trades a runtime overhead for a small, 8kb, static > footprint increase. > > I compared `fast_log2` with the `log2` builtin with a naive benchmarking > experiment[1] (not included in this PR) and show that the `fast_log2` is > ~2.5x faster than `log2` on my system. And that without the lookup table we'd > be much worse. So I think it makes sense to preserve this optimization, but > get rid of the startup overhead: > > [5.428s][debug][heapsampling] log2, 0.0751173 secs > [5.457s][debug][heapsampling] fast_log2, 0.0298244 secs > [5.622s][debug][heapsampling] fast_log2_uncached, 0.1645569 secs > > I've verified that this refactoring does not affect performance in this naive > setup. > > [1] https://github.com/openjdk/jdk/compare/master...cl4es:log2_micro?expand=1 Claes Redestad has updated the pull request incrementally with two additional commits since the last revision: - Add explicit include of logging - Add const, fix copyright - Changes: - all: https://git.openjdk.java.net/jdk/pull/880/files - new: https://git.openjdk.java.net/jdk/pull/880/files/61bac3dc..a802814f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=04-05 Stats: 5 lines in 2 files changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/880.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/880/head:pull/880 PR: https://git.openjdk.java.net/jdk/pull/880