Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr

2020-10-29 Thread Lin Zang
On Tue, 27 Oct 2020 14:00:34 GMT, Claes Redestad  wrote:

> The static `ThreadHeapSampler::_log_table` is currently initialized on JVM 
> bootstrap to an overhead of ~67k instructions (linux-x64). By turning the 
> initialization into a constexpr, we can precalculate the helper table at 
> compile time, which trades a runtime overhead for a small, 8kb, static 
> footprint increase.
> 
> I compared `fast_log2` with the `log2` builtin with a naive benchmarking 
> experiment[1] (not included in this PR) and show that the `fast_log2` is 
> ~2.5x faster than `log2` on my system. And that without the lookup table we'd 
> be much worse. So I think it makes sense to preserve this optimization, but 
> get rid of the startup overhead:
> 
> [5.428s][debug][heapsampling] log2, 0.0751173 secs
> [5.457s][debug][heapsampling] fast_log2, 0.0298244 secs
> [5.622s][debug][heapsampling] fast_log2_uncached, 0.1645569 secs
> 
> I've verified that this refactoring does not affect performance in this naive 
> setup.
> 
> [1] https://github.com/openjdk/jdk/compare/master...cl4es:log2_micro?expand=1

Dear @cl4es, 
I am not a reviewer, just have 1 comment that maybe you need to update the Year 
info in the headers of touched files. 

Thanks.
Lin

-

PR: https://git.openjdk.java.net/jdk/pull/880


Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr

2020-10-29 Thread Claes Redestad
On Thu, 29 Oct 2020 08:56:13 GMT, Lin Zang  wrote:

>> The static `ThreadHeapSampler::_log_table` is currently initialized on JVM 
>> bootstrap to an overhead of ~67k instructions (linux-x64). By turning the 
>> initialization into a constexpr, we can precalculate the helper table at 
>> compile time, which trades a runtime overhead for a small, 8kb, static 
>> footprint increase.
>> 
>> I compared `fast_log2` with the `log2` builtin with a naive benchmarking 
>> experiment[1] (not included in this PR) and show that the `fast_log2` is 
>> ~2.5x faster than `log2` on my system. And that without the lookup table 
>> we'd be much worse. So I think it makes sense to preserve this optimization, 
>> but get rid of the startup overhead:
>> 
>> [5.428s][debug][heapsampling] log2, 0.0751173 secs
>> [5.457s][debug][heapsampling] fast_log2, 0.0298244 secs
>> [5.622s][debug][heapsampling] fast_log2_uncached, 0.1645569 secs
>> 
>> I've verified that this refactoring does not affect performance in this 
>> naive setup.
>> 
>> [1] https://github.com/openjdk/jdk/compare/master...cl4es:log2_micro?expand=1
>
> Dear @cl4es, 
> I am not a reviewer, just have 1 comment that maybe you need to update the 
> Year info in the headers of touched files. 
> 
> Thanks.
> Lin

Unfortunately there's currently no portable way to use `std::log` (or any of 
the other `std` math functions) in a constexpr, so I had to resort to a code 
generator approach instead. It's either that or withdrawing this PR.

Using UL and a debug-only block to implement an adhoc code generator 
(`-Xlog:heapsampling+generate::none`) might be a bit unorthodox, but I think it 
turned out OK.

-

PR: https://git.openjdk.java.net/jdk/pull/880


Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr [v2]

2020-10-27 Thread Claes Redestad
> The static `ThreadHeapSampler::_log_table` is currently initialized on JVM 
> bootstrap to an overhead of ~67k instructions (linux-x64). By turning the 
> initialization into a constexpr, we can precalculate the helper table at 
> compile time, which trades a runtime overhead for a small, 8kb, static 
> footprint increase.
> 
> I compared `fast_log2` with the `log2` builtin with a naive benchmarking 
> experiment[1] (not included in this PR) and show that the `fast_log2` is 
> ~2.5x faster than `log2` on my system. And that without the lookup table we'd 
> be much worse. So I think it makes sense to preserve this optimization, but 
> get rid of the startup overhead:
> 
> [5.428s][debug][heapsampling] log2, 0.0751173 secs
> [5.457s][debug][heapsampling] fast_log2, 0.0298244 secs
> [5.622s][debug][heapsampling] fast_log2_uncached, 0.1645569 secs
> 
> I've verified that this refactoring does not affect performance in this naive 
> setup.
> 
> [1] https://github.com/openjdk/jdk/compare/master...cl4es:log2_micro?expand=1

Claes Redestad has updated the pull request with a new target base due to a 
merge or a rebase. The incremental webrev excludes the unrelated changes 
brought in by the merge/rebase. The pull request contains three additional 
commits since the last revision:

 - Merge branch 'master' into threadHeapSampler_constexpr
 - Remove _log_table_initialized assert
 - Refactor ThreadHeapSampler::_log_table as constexpr

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/880/files
  - new: https://git.openjdk.java.net/jdk/pull/880/files/2a66158a..f28b9cc4

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=00-01

  Stats: 1880 lines in 75 files changed: 1373 ins; 312 del; 195 mod
  Patch: https://git.openjdk.java.net/jdk/pull/880.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/880/head:pull/880

PR: https://git.openjdk.java.net/jdk/pull/880


Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr [v2]

2020-10-27 Thread Ioi Lam
On Tue, 27 Oct 2020 14:56:33 GMT, Claes Redestad  wrote:

>> The static `ThreadHeapSampler::_log_table` is currently initialized on JVM 
>> bootstrap to an overhead of ~67k instructions (linux-x64). By turning the 
>> initialization into a constexpr, we can precalculate the helper table at 
>> compile time, which trades a runtime overhead for a small, 8kb, static 
>> footprint increase.
>> 
>> I compared `fast_log2` with the `log2` builtin with a naive benchmarking 
>> experiment[1] (not included in this PR) and show that the `fast_log2` is 
>> ~2.5x faster than `log2` on my system. And that without the lookup table 
>> we'd be much worse. So I think it makes sense to preserve this optimization, 
>> but get rid of the startup overhead:
>> 
>> [5.428s][debug][heapsampling] log2, 0.0751173 secs
>> [5.457s][debug][heapsampling] fast_log2, 0.0298244 secs
>> [5.622s][debug][heapsampling] fast_log2_uncached, 0.1645569 secs
>> 
>> I've verified that this refactoring does not affect performance in this 
>> naive setup.
>> 
>> [1] https://github.com/openjdk/jdk/compare/master...cl4es:log2_micro?expand=1
>
> Claes Redestad has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains three additional 
> commits since the last revision:
> 
>  - Merge branch 'master' into threadHeapSampler_constexpr
>  - Remove _log_table_initialized assert
>  - Refactor ThreadHeapSampler::_log_table as constexpr

Changes requested by iklam (Reviewer).

src/hotspot/share/runtime/threadHeapSampler.cpp line 48:

> 46: };
> 47: 
> 48: const FastLogTable 
> ThreadHeapSampler::_log_table;

To guarantee that `_log_table` is evaluated at C++ compile time, it's best to 
change the code to

constexpr FastLogTable _log_table;

There are 2 reasons:

1. C++ guarantees compile-time evaluation only if the expression is used in a 
"constexpr context". You can read more from 
[here](https://isocpp.org/blog/2013/01/when-does-a-constexpr-function-get-evaluated-at-compile-time-stackoverflow).
2. In the future, if the `FastLogTable` constructor is modified in a way that 
cannot be evaluated at compile time, (e.g., someone removes `constexpr` from 
the constructor's declaration by mistake, the C++ compiler will catch that and 
tell you that `_log_table` cannot be `constexpr`.

Unfortunately,  you cannot use `constexpr` in forward declarations, so you 
should either move the definition of `FastLogTable` to the hpp file, or remove 
the declaration of `_log_table` from the hpp file.

-

PR: https://git.openjdk.java.net/jdk/pull/880


Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr [v3]

2020-10-28 Thread Claes Redestad
> The static `ThreadHeapSampler::_log_table` is currently initialized on JVM 
> bootstrap to an overhead of ~67k instructions (linux-x64). By turning the 
> initialization into a constexpr, we can precalculate the helper table at 
> compile time, which trades a runtime overhead for a small, 8kb, static 
> footprint increase.
> 
> I compared `fast_log2` with the `log2` builtin with a naive benchmarking 
> experiment[1] (not included in this PR) and show that the `fast_log2` is 
> ~2.5x faster than `log2` on my system. And that without the lookup table we'd 
> be much worse. So I think it makes sense to preserve this optimization, but 
> get rid of the startup overhead:
> 
> [5.428s][debug][heapsampling] log2, 0.0751173 secs
> [5.457s][debug][heapsampling] fast_log2, 0.0298244 secs
> [5.622s][debug][heapsampling] fast_log2_uncached, 0.1645569 secs
> 
> I've verified that this refactoring does not affect performance in this naive 
> setup.
> 
> [1] https://github.com/openjdk/jdk/compare/master...cl4es:log2_micro?expand=1

Claes Redestad has updated the pull request with a new target base due to a 
merge or a rebase. The incremental webrev excludes the unrelated changes 
brought in by the merge/rebase. The pull request contains five additional 
commits since the last revision:

 - Declare and define log_table as a static constexpr inside 
threadHeapSampler.cpp
 - Merge branch 'master' into threadHeapSampler_constexpr
 - Merge branch 'master' into threadHeapSampler_constexpr
 - Remove _log_table_initialized assert
 - Refactor ThreadHeapSampler::_log_table as constexpr

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/880/files
  - new: https://git.openjdk.java.net/jdk/pull/880/files/f28b9cc4..62148744

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=01-02

  Stats: 922 lines in 50 files changed: 556 ins; 221 del; 145 mod
  Patch: https://git.openjdk.java.net/jdk/pull/880.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/880/head:pull/880

PR: https://git.openjdk.java.net/jdk/pull/880


Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr [v2]

2020-10-28 Thread Claes Redestad
On Tue, 27 Oct 2020 23:08:13 GMT, Ioi Lam  wrote:

>> Claes Redestad has updated the pull request with a new target base due to a 
>> merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains three additional 
>> commits since the last revision:
>> 
>>  - Merge branch 'master' into threadHeapSampler_constexpr
>>  - Remove _log_table_initialized assert
>>  - Refactor ThreadHeapSampler::_log_table as constexpr
>
> src/hotspot/share/runtime/threadHeapSampler.cpp line 48:
> 
>> 46: };
>> 47: 
>> 48: const FastLogTable 
>> ThreadHeapSampler::_log_table;
> 
> To guarantee that `_log_table` is evaluated at C++ compile time, it's best to 
> change the code to
> 
> constexpr FastLogTable _log_table;
> 
> There are 2 reasons:
> 
> 1. C++ guarantees compile-time evaluation only if the expression is used in a 
> "constexpr context". You can read more from 
> [here](https://isocpp.org/blog/2013/01/when-does-a-constexpr-function-get-evaluated-at-compile-time-stackoverflow).
> 2. In the future, if the `FastLogTable` constructor is modified in a way that 
> cannot be evaluated at compile time, (e.g., someone removes `constexpr` from 
> the constructor's declaration by mistake, the C++ compiler will catch that 
> and tell you that `_log_table` cannot be `constexpr`.
> 
> Unfortunately,  you cannot use `constexpr` in forward declarations, so you 
> should either move the definition of `FastLogTable` to the hpp file, or 
> remove the declaration of `_log_table` from the hpp file.

Ok, makes sense.

I went with removing the `_log_table` declaration from the hpp file. I think 
things came out a bit cleaner as a result, too.

-

PR: https://git.openjdk.java.net/jdk/pull/880


Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr [v4]

2020-10-28 Thread Claes Redestad
> The static `ThreadHeapSampler::_log_table` is currently initialized on JVM 
> bootstrap to an overhead of ~67k instructions (linux-x64). By turning the 
> initialization into a constexpr, we can precalculate the helper table at 
> compile time, which trades a runtime overhead for a small, 8kb, static 
> footprint increase.
> 
> I compared `fast_log2` with the `log2` builtin with a naive benchmarking 
> experiment[1] (not included in this PR) and show that the `fast_log2` is 
> ~2.5x faster than `log2` on my system. And that without the lookup table we'd 
> be much worse. So I think it makes sense to preserve this optimization, but 
> get rid of the startup overhead:
> 
> [5.428s][debug][heapsampling] log2, 0.0751173 secs
> [5.457s][debug][heapsampling] fast_log2, 0.0298244 secs
> [5.622s][debug][heapsampling] fast_log2_uncached, 0.1645569 secs
> 
> I've verified that this refactoring does not affect performance in this naive 
> setup.
> 
> [1] https://github.com/openjdk/jdk/compare/master...cl4es:log2_micro?expand=1

Claes Redestad has updated the pull request with a new target base due to a 
merge or a rebase. The incremental webrev excludes the unrelated changes 
brought in by the merge/rebase. The pull request contains six additional 
commits since the last revision:

 - Merge branch 'master' into threadHeapSampler_constexpr
 - Declare and define log_table as a static constexpr inside 
threadHeapSampler.cpp
 - Merge branch 'master' into threadHeapSampler_constexpr
 - Merge branch 'master' into threadHeapSampler_constexpr
 - Remove _log_table_initialized assert
 - Refactor ThreadHeapSampler::_log_table as constexpr

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/880/files
  - new: https://git.openjdk.java.net/jdk/pull/880/files/62148744..f796147d

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=02-03

  Stats: 364 lines in 32 files changed: 294 ins; 21 del; 49 mod
  Patch: https://git.openjdk.java.net/jdk/pull/880.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/880/head:pull/880

PR: https://git.openjdk.java.net/jdk/pull/880


Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr [v2]

2020-10-28 Thread Claes Redestad
On Tue, 27 Oct 2020 23:08:21 GMT, Ioi Lam  wrote:

>> Claes Redestad has updated the pull request with a new target base due to a 
>> merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains three additional 
>> commits since the last revision:
>> 
>>  - Merge branch 'master' into threadHeapSampler_constexpr
>>  - Remove _log_table_initialized assert
>>  - Refactor ThreadHeapSampler::_log_table as constexpr
>
> Changes requested by iklam (Reviewer).

The change to declare the log_table as constexpr exposed an issue with the std 
`log` function not being constexpr, which now cause a build error on Windows 
and Mac. It works on Linux, but likely not robustly so since I seem to have 
been stumbling into a non-compliant quirk: 
https://stackoverflow.com/questions/50477974/constexpr-exp-log-pow

-

PR: https://git.openjdk.java.net/jdk/pull/880


Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr [v4]

2020-10-28 Thread Serguei Spitsyn
On Wed, 28 Oct 2020 14:36:59 GMT, Claes Redestad  wrote:

>> The static `ThreadHeapSampler::_log_table` is currently initialized on JVM 
>> bootstrap to an overhead of ~67k instructions (linux-x64). By turning the 
>> initialization into a constexpr, we can precalculate the helper table at 
>> compile time, which trades a runtime overhead for a small, 8kb, static 
>> footprint increase.
>> 
>> I compared `fast_log2` with the `log2` builtin with a naive benchmarking 
>> experiment[1] (not included in this PR) and show that the `fast_log2` is 
>> ~2.5x faster than `log2` on my system. And that without the lookup table 
>> we'd be much worse. So I think it makes sense to preserve this optimization, 
>> but get rid of the startup overhead:
>> 
>> [5.428s][debug][heapsampling] log2, 0.0751173 secs
>> [5.457s][debug][heapsampling] fast_log2, 0.0298244 secs
>> [5.622s][debug][heapsampling] fast_log2_uncached, 0.1645569 secs
>> 
>> I've verified that this refactoring does not affect performance in this 
>> naive setup.
>> 
>> [1] https://github.com/openjdk/jdk/compare/master...cl4es:log2_micro?expand=1
>
> Claes Redestad has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains six additional 
> commits since the last revision:
> 
>  - Merge branch 'master' into threadHeapSampler_constexpr
>  - Declare and define log_table as a static constexpr inside 
> threadHeapSampler.cpp
>  - Merge branch 'master' into threadHeapSampler_constexpr
>  - Merge branch 'master' into threadHeapSampler_constexpr
>  - Remove _log_table_initialized assert
>  - Refactor ThreadHeapSampler::_log_table as constexpr

Hi Ioi and Claes,

This looks nice.
One nit:
+  _values[i] = (log(1.0 + static_cast(i+0.5) / N)
 / log(2.0));

Need spaces around '+' sign. Also, it can be just one line.

Thanks,
Serguei

-

Marked as reviewed by sspitsyn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/880


Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr [v5]

2020-10-29 Thread Claes Redestad
> The static `ThreadHeapSampler::_log_table` is currently initialized on JVM 
> bootstrap to an overhead of ~67k instructions (linux-x64). By turning the 
> initialization into a constexpr, we can precalculate the helper table at 
> compile time, which trades a runtime overhead for a small, 8kb, static 
> footprint increase.
> 
> I compared `fast_log2` with the `log2` builtin with a naive benchmarking 
> experiment[1] (not included in this PR) and show that the `fast_log2` is 
> ~2.5x faster than `log2` on my system. And that without the lookup table we'd 
> be much worse. So I think it makes sense to preserve this optimization, but 
> get rid of the startup overhead:
> 
> [5.428s][debug][heapsampling] log2, 0.0751173 secs
> [5.457s][debug][heapsampling] fast_log2, 0.0298244 secs
> [5.622s][debug][heapsampling] fast_log2_uncached, 0.1645569 secs
> 
> I've verified that this refactoring does not affect performance in this naive 
> setup.
> 
> [1] https://github.com/openjdk/jdk/compare/master...cl4es:log2_micro?expand=1

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Desugar constexpr into code generator and output

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/880/files
  - new: https://git.openjdk.java.net/jdk/pull/880/files/f796147d..61bac3dc

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=03-04

  Stats: 292 lines in 2 files changed: 281 ins; 0 del; 11 mod
  Patch: https://git.openjdk.java.net/jdk/pull/880.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/880/head:pull/880

PR: https://git.openjdk.java.net/jdk/pull/880


Re: RFR: 8255455: Refactor ThreadHeapSampler::_log_table as constexpr [v6]

2020-10-29 Thread Claes Redestad
> The static `ThreadHeapSampler::_log_table` is currently initialized on JVM 
> bootstrap to an overhead of ~67k instructions (linux-x64). By turning the 
> initialization into a constexpr, we can precalculate the helper table at 
> compile time, which trades a runtime overhead for a small, 8kb, static 
> footprint increase.
> 
> I compared `fast_log2` with the `log2` builtin with a naive benchmarking 
> experiment[1] (not included in this PR) and show that the `fast_log2` is 
> ~2.5x faster than `log2` on my system. And that without the lookup table we'd 
> be much worse. So I think it makes sense to preserve this optimization, but 
> get rid of the startup overhead:
> 
> [5.428s][debug][heapsampling] log2, 0.0751173 secs
> [5.457s][debug][heapsampling] fast_log2, 0.0298244 secs
> [5.622s][debug][heapsampling] fast_log2_uncached, 0.1645569 secs
> 
> I've verified that this refactoring does not affect performance in this naive 
> setup.
> 
> [1] https://github.com/openjdk/jdk/compare/master...cl4es:log2_micro?expand=1

Claes Redestad has updated the pull request incrementally with two additional 
commits since the last revision:

 - Add explicit include of logging
 - Add const, fix copyright

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/880/files
  - new: https://git.openjdk.java.net/jdk/pull/880/files/61bac3dc..a802814f

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=880&range=04-05

  Stats: 5 lines in 2 files changed: 2 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/880.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/880/head:pull/880

PR: https://git.openjdk.java.net/jdk/pull/880