Re: RFR: 8203359: Container level resources events [v14]

2021-05-19 Thread Jaroslav Bachorik
> With this change it becomes possible to surface various cgroup level metrics 
> (available via `jdk.internal.platform.Metrics`) as JFR events.
> 
> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
> turned into JFR events to start with.
> * CPU related metrics
> * Memory related metrics
> * I/O related metrics
> 
> For each of those subsystems a configuration data will be emitted as well. 
> The initial proposal is to emit the configuration data events at least once 
> per chunk and the metrics values at 30 seconds interval. 
> By using these values the emitted events seem to contain useful information 
> without increasing overhead (the metrics values are read from `/proc` 
> filesystem so that should not be done too frequently).

Jaroslav Bachorik has updated the pull request incrementally with one 
additional commit since the last revision:

  Remove conditinal registration of event hooks

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3126/files
  - new: https://git.openjdk.java.net/jdk/pull/3126/files/e0f5855b..37931f4a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=13
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=12-13

  Stats: 16 lines in 1 file changed: 1 ins; 4 del; 11 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3126.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3126/head:pull/3126

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v9]

2021-05-19 Thread Jaroslav Bachorik
On Wed, 19 May 2021 17:21:26 GMT, Erik Gahlin  wrote:

> I think you need to add the hook, for the event metadata to be correct. 
> Otherwise, the "period" setting will not show up.

Yes. The failed test log would indicate also the rest of the metadata not being 
in a good shape. But with the hook registered everything works fine.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v9]

2021-05-19 Thread Erik Gahlin
On Wed, 21 Apr 2021 22:38:32 GMT, Erik Gahlin  wrote:

>> Jaroslav Bachorik has refreshed the contents of this pull request, and 
>> previous commits have been removed. The incremental views will show 
>> differences compared to the previous content of the PR.
>
> I wonder if something similar to below could be added to 
> jdk.jfr.internal.Utils:
> 
> private static Metrics[] metrics;
> public static Metrics getMetrics() {
> if (metrics == null) {
> metrics = new Metrics[] { Metrics.systemMetrics() };
> }
> return metrics[0];
> }
> 
> public static boolean shouldSkipBytecode(String eventName, Class 
> superClass) {
> if (superClass != AbstractJDKEvent.class) {
> return false;
> }
> if (!eventName.startsWith("jdk.Container")) {
> return false;
> }
> return getMetrics() == null;
> }
> 
> Then we could add checks to 
> jdk.jfr.internal.JVMUpcalls::bytesForEagerInstrumentation(...)
> 
> eventName = ei.getEventName();
> if (Utils.shouldSkipBytecode(eventName, superClass))) {
> return oldBytes;
> }
> 
> and jdk.jfr.internal.JVMUpcalls:onRetransform(...)
> 
> if (jdk.internal.event.Event.class.isAssignableFrom(clazz) && 
> !Modifier.isAbstract(clazz.getModifiers())) {
> if (Utils.shouldSkipBytecode(clazz.getName(), clazz.getSuperclass())) 
> {
> return oldBytes;
> }
> 
> This way we would not pay for generating bytecode for events in a 
> non-container environment. 
> 
> Not sure if it works, but could perhaps make startup faster? We would still 
> pay for generating the event handlers during registration, but it's much 
> trickier to avoid since we need to store the event type somewhere.

> @egahlin Unfortunately, I had to make one late change in the periodic event 
> hook registration.
> If the events are registered conditionally only when running in a container 
> the event metadata are not correct and `TestDefaultConfigurations.java` test 
> will fail. When I register the hooks unconditionally, the metadata is 
> correctly generated and the test passes.
> I will hold off integration until I hear back from you whether this is 
> acceptable or I should try to find an alternative solution.

I think we should always register the metadata, even if you can't get the 
event. 

That's how we handle different GCs. Users must be able to explore events even 
if they can't use them. For example, you must be able to configure container 
events in JMC (with correct labels/descriptions) without actually connecting to 
a JVM running in a Docker container.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v9]

2021-05-19 Thread Jaroslav Bachorik
On Wed, 21 Apr 2021 22:38:32 GMT, Erik Gahlin  wrote:

>> Jaroslav Bachorik has refreshed the contents of this pull request, and 
>> previous commits have been removed. The incremental views will show 
>> differences compared to the previous content of the PR.
>
> I wonder if something similar to below could be added to 
> jdk.jfr.internal.Utils:
> 
> private static Metrics[] metrics;
> public static Metrics getMetrics() {
> if (metrics == null) {
> metrics = new Metrics[] { Metrics.systemMetrics() };
> }
> return metrics[0];
> }
> 
> public static boolean shouldSkipBytecode(String eventName, Class 
> superClass) {
> if (superClass != AbstractJDKEvent.class) {
> return false;
> }
> if (!eventName.startsWith("jdk.Container")) {
> return false;
> }
> return getMetrics() == null;
> }
> 
> Then we could add checks to 
> jdk.jfr.internal.JVMUpcalls::bytesForEagerInstrumentation(...)
> 
> eventName = ei.getEventName();
> if (Utils.shouldSkipBytecode(eventName, superClass))) {
> return oldBytes;
> }
> 
> and jdk.jfr.internal.JVMUpcalls:onRetransform(...)
> 
> if (jdk.internal.event.Event.class.isAssignableFrom(clazz) && 
> !Modifier.isAbstract(clazz.getModifiers())) {
> if (Utils.shouldSkipBytecode(clazz.getName(), clazz.getSuperclass())) 
> {
> return oldBytes;
> }
> 
> This way we would not pay for generating bytecode for events in a 
> non-container environment. 
> 
> Not sure if it works, but could perhaps make startup faster? We would still 
> pay for generating the event handlers during registration, but it's much 
> trickier to avoid since we need to store the event type somewhere.

@egahlin Unfortunately, I had to make one late change in the periodic event 
hook registration.
If the events are registered conditionally only when running in a container the 
event metadata are not correct and `TestDefaultConfigurations.java` test will 
fail. When I register the hooks unconditionally, the metadata is correctly 
generated and the test passes.
I will hold off integration until I hear back from you whether this is 
acceptable or I should try to find an alternative solution.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v13]

2021-05-19 Thread Severin Gehwolf
On Wed, 19 May 2021 15:23:55 GMT, Jaroslav Bachorik  
wrote:

>> With this change it becomes possible to surface various cgroup level metrics 
>> (available via `jdk.internal.platform.Metrics`) as JFR events.
>> 
>> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
>> turned into JFR events to start with.
>> * CPU related metrics
>> * Memory related metrics
>> * I/O related metrics
>> 
>> For each of those subsystems a configuration data will be emitted as well. 
>> The initial proposal is to emit the configuration data events at least once 
>> per chunk and the metrics values at 30 seconds interval. 
>> By using these values the emitted events seem to contain useful information 
>> without increasing overhead (the metrics values are read from `/proc` 
>> filesystem so that should not be done too frequently).
>
> Jaroslav Bachorik has refreshed the contents of this pull request, and 
> previous commits have been removed. The incremental views will show 
> differences compared to the previous content of the PR. The pull request 
> contains one new commit since the last revision:
> 
>   Fix event metadata

LGTM

-

Marked as reviewed by sgehwolf (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v12]

2021-05-19 Thread Jaroslav Bachorik
On Wed, 19 May 2021 14:11:40 GMT, Jaroslav Bachorik  
wrote:

>> With this change it becomes possible to surface various cgroup level metrics 
>> (available via `jdk.internal.platform.Metrics`) as JFR events.
>> 
>> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
>> turned into JFR events to start with.
>> * CPU related metrics
>> * Memory related metrics
>> * I/O related metrics
>> 
>> For each of those subsystems a configuration data will be emitted as well. 
>> The initial proposal is to emit the configuration data events at least once 
>> per chunk and the metrics values at 30 seconds interval. 
>> By using these values the emitted events seem to contain useful information 
>> without increasing overhead (the metrics values are read from `/proc` 
>> filesystem so that should not be done too frequently).
>
> Jaroslav Bachorik has updated the pull request with a new target base due to 
> a merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 11 additional 
> commits since the last revision:
> 
>  - Fix event metadata
>  - Small fixes
>  - Remove trailing spaces
>  - Doh
>  - Report container type and register events conditionally
>  - Remove unused test files
>  - Initial test support for JFR container events
>  - Update the JFR control files
>  - Split off the CPU throttling metrics
>  - Formatting spaces
>  - ... and 1 more: 
> https://git.openjdk.java.net/jdk/compare/a32c4b70...c3fa274c

Thanks for the review!
I've fixed the outstanding test failures and the patch is in its final form.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v13]

2021-05-19 Thread Jaroslav Bachorik
> With this change it becomes possible to surface various cgroup level metrics 
> (available via `jdk.internal.platform.Metrics`) as JFR events.
> 
> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
> turned into JFR events to start with.
> * CPU related metrics
> * Memory related metrics
> * I/O related metrics
> 
> For each of those subsystems a configuration data will be emitted as well. 
> The initial proposal is to emit the configuration data events at least once 
> per chunk and the metrics values at 30 seconds interval. 
> By using these values the emitted events seem to contain useful information 
> without increasing overhead (the metrics values are read from `/proc` 
> filesystem so that should not be done too frequently).

Jaroslav Bachorik has refreshed the contents of this pull request, and previous 
commits have been removed. The incremental views will show differences compared 
to the previous content of the PR. The pull request contains one new commit 
since the last revision:

  Fix event metadata

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3126/files
  - new: https://git.openjdk.java.net/jdk/pull/3126/files/c3fa274c..e0f5855b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=12
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=11-12

  Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3126.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3126/head:pull/3126

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v12]

2021-05-19 Thread Jaroslav Bachorik
> With this change it becomes possible to surface various cgroup level metrics 
> (available via `jdk.internal.platform.Metrics`) as JFR events.
> 
> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
> turned into JFR events to start with.
> * CPU related metrics
> * Memory related metrics
> * I/O related metrics
> 
> For each of those subsystems a configuration data will be emitted as well. 
> The initial proposal is to emit the configuration data events at least once 
> per chunk and the metrics values at 30 seconds interval. 
> By using these values the emitted events seem to contain useful information 
> without increasing overhead (the metrics values are read from `/proc` 
> filesystem so that should not be done too frequently).

Jaroslav Bachorik has updated the pull request with a new target base due to a 
merge or a rebase. The incremental webrev excludes the unrelated changes 
brought in by the merge/rebase. The pull request contains 11 additional commits 
since the last revision:

 - Fix event metadata
 - Small fixes
 - Remove trailing spaces
 - Doh
 - Report container type and register events conditionally
 - Remove unused test files
 - Initial test support for JFR container events
 - Update the JFR control files
 - Split off the CPU throttling metrics
 - Formatting spaces
 - ... and 1 more: https://git.openjdk.java.net/jdk/compare/2f8f5ce3...c3fa274c

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3126/files
  - new: https://git.openjdk.java.net/jdk/pull/3126/files/fddd1727..c3fa274c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=11
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=10-11

  Stats: 638229 lines in 6576 files changed: 92464 ins; 530978 del; 14787 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3126.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3126/head:pull/3126

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v11]

2021-05-19 Thread Jaroslav Bachorik
On Wed, 19 May 2021 10:00:04 GMT, Severin Gehwolf  wrote:

>> Jaroslav Bachorik has updated the pull request with a new target base due to 
>> a merge or a rebase. The pull request now contains 10 commits:
>> 
>>  - Small fixes
>>  - Remove trailing spaces
>>  - Doh
>>  - Report container type and register events conditionally
>>  - Remove unused test files
>>  - Initial test support for JFR container events
>>  - Update the JFR control files
>>  - Split off the CPU throttling metrics
>>  - Formatting spaces
>>  - 8203359: Container level resources events
>
> src/jdk.jfr/share/classes/jdk/jfr/events/ContainerMemoryUsageEvent.java line 
> 48:
> 
>> 46: @Description("(attempts per second * 1000), if enabled, that the 
>> operating system tries to satisfy a memory request for any " +
>> 47:  "process in the current container when no free memory 
>> is readily available.")
>> 48: public double memoryPressure;
> 
> Should this `memoryPressure` field go from `ContainerMemoryUsageEvent` class? 
> It's not set anywhere is it? would be cgroup v1 only api so I'm not sure it 
> should be there for a generic event like this.

Yes. Removing.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v11]

2021-05-19 Thread Severin Gehwolf
On Mon, 3 May 2021 13:16:06 GMT, Jaroslav Bachorik  
wrote:

>> With this change it becomes possible to surface various cgroup level metrics 
>> (available via `jdk.internal.platform.Metrics`) as JFR events.
>> 
>> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
>> turned into JFR events to start with.
>> * CPU related metrics
>> * Memory related metrics
>> * I/O related metrics
>> 
>> For each of those subsystems a configuration data will be emitted as well. 
>> The initial proposal is to emit the configuration data events at least once 
>> per chunk and the metrics values at 30 seconds interval. 
>> By using these values the emitted events seem to contain useful information 
>> without increasing overhead (the metrics values are read from `/proc` 
>> filesystem so that should not be done too frequently).
>
> Jaroslav Bachorik has updated the pull request with a new target base due to 
> a merge or a rebase. The pull request now contains 10 commits:
> 
>  - Small fixes
>  - Remove trailing spaces
>  - Doh
>  - Report container type and register events conditionally
>  - Remove unused test files
>  - Initial test support for JFR container events
>  - Update the JFR control files
>  - Split off the CPU throttling metrics
>  - Formatting spaces
>  - 8203359: Container level resources events

Changes requested by sgehwolf (Reviewer).

src/jdk.jfr/share/classes/jdk/jfr/events/ContainerMemoryUsageEvent.java line 48:

> 46: @Description("(attempts per second * 1000), if enabled, that the 
> operating system tries to satisfy a memory request for any " +
> 47:  "process in the current container when no free memory is 
> readily available.")
> 48: public double memoryPressure;

Should this `memoryPressure` field go from `ContainerMemoryUsageEvent` class? 
It's not set anywhere is it? would be cgroup v1 only api so I'm not sure it 
should be there for a generic event like this.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v11]

2021-05-18 Thread Erik Gahlin
On Mon, 3 May 2021 13:16:06 GMT, Jaroslav Bachorik  
wrote:

>> With this change it becomes possible to surface various cgroup level metrics 
>> (available via `jdk.internal.platform.Metrics`) as JFR events.
>> 
>> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
>> turned into JFR events to start with.
>> * CPU related metrics
>> * Memory related metrics
>> * I/O related metrics
>> 
>> For each of those subsystems a configuration data will be emitted as well. 
>> The initial proposal is to emit the configuration data events at least once 
>> per chunk and the metrics values at 30 seconds interval. 
>> By using these values the emitted events seem to contain useful information 
>> without increasing overhead (the metrics values are read from `/proc` 
>> filesystem so that should not be done too frequently).
>
> Jaroslav Bachorik has updated the pull request with a new target base due to 
> a merge or a rebase. The pull request now contains 10 commits:
> 
>  - Small fixes
>  - Remove trailing spaces
>  - Doh
>  - Report container type and register events conditionally
>  - Remove unused test files
>  - Initial test support for JFR container events
>  - Update the JFR control files
>  - Split off the CPU throttling metrics
>  - Formatting spaces
>  - 8203359: Container level resources events

Looks good, but if there are test issues they should be fixed.

-

Marked as reviewed by egahlin (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v11]

2021-05-03 Thread Jaroslav Bachorik
> With this change it becomes possible to surface various cgroup level metrics 
> (available via `jdk.internal.platform.Metrics`) as JFR events.
> 
> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
> turned into JFR events to start with.
> * CPU related metrics
> * Memory related metrics
> * I/O related metrics
> 
> For each of those subsystems a configuration data will be emitted as well. 
> The initial proposal is to emit the configuration data events at least once 
> per chunk and the metrics values at 30 seconds interval. 
> By using these values the emitted events seem to contain useful information 
> without increasing overhead (the metrics values are read from `/proc` 
> filesystem so that should not be done too frequently).

Jaroslav Bachorik has updated the pull request with a new target base due to a 
merge or a rebase. The incremental webrev excludes the unrelated changes 
brought in by the merge/rebase. The pull request contains 10 additional commits 
since the last revision:

 - Small fixes
 - Remove trailing spaces
 - Doh
 - Report container type and register events conditionally
 - Remove unused test files
 - Initial test support for JFR container events
 - Update the JFR control files
 - Split off the CPU throttling metrics
 - Formatting spaces
 - 8203359: Container level resources events

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3126/files
  - new: https://git.openjdk.java.net/jdk/pull/3126/files/04c3f092..fddd1727

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=10
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=09-10

  Stats: 93348 lines in 2177 files changed: 37265 ins; 49988 del; 6095 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3126.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3126/head:pull/3126

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v10]

2021-05-03 Thread Jaroslav Bachorik
On Tue, 27 Apr 2021 09:40:01 GMT, Severin Gehwolf  wrote:

>> Jaroslav Bachorik has updated the pull request with a new target base due to 
>> a merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains 13 additional 
>> commits since the last revision:
>> 
>>  - Prevent event container bytecode generation if no container present
>>  - Fix event metadata
>>  - Roll back conditional registration of container events
>>  - Remove container events flag
>>  - Remove trailing spaces
>>  - Doh
>>  - Report container type and register events conditionally
>>  - Remove unused test files
>>  - Initial test support for JFR container events
>>  - Update the JFR control files
>>  - ... and 3 more: 
>> https://git.openjdk.java.net/jdk/compare/3322e9ff...04c3f092
>
> test/hotspot/jtreg/containers/docker/TestJFREvents.java line 147:
> 
>> 145:   .addClassOptions(eventName, 
>> "period=endChunk"))
>> 146: .shouldHaveExitValue(0)
>> 147: .shouldContain(memoryPressureFld)
> 
> This test fails for me on cgroupv1 with:
> 
> 
> --System.err:(42/1407)--
>  stdout: [= EventType: jdk.ContainerMemoryUsage
> startTime = 946400166
> duration = 0
> eventThread = {
>   osName = "main"
>   osThreadId = 6
>   javaName = "main"
>   javaThreadId = 1
>   group = {
> parent = {
>   parent = N/A
>   name = "system"
> }
> name = "main"
>   }
> }
> 
> stackTrace = null
> memoryFailCount = 0
> memoryUsage = 57786368
> swapMemoryUsage = 57782272
> ];
>  stderr: []
>  exitValue = 0
> 
> java.lang.RuntimeException: 'memoryPressure' missing from stdout/stderr
> 
> at 
> jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:206)
> at TestJFREvents.testMemoryUsage(TestJFREvents.java:147)
> at TestJFREvents.main(TestJFREvents.java:77)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> at 
> com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
> at java.base/java.lang.Thread.run(Thread.java:831)
> 
> JavaTest Message: Test threw exception: java.lang.RuntimeException: 
> 'memoryPressure' missing from stdout/stderr
> 
> JavaTest Message: shutting down test
> 
> 
> I think `memoryPressure` got removed from the code and, thus, should get 
> removed from the test.

Will fix

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v10]

2021-05-03 Thread Jaroslav Bachorik
On Mon, 26 Apr 2021 21:20:57 GMT, Erik Gahlin  wrote:

>> Jaroslav Bachorik has updated the pull request with a new target base due to 
>> a merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains 13 additional 
>> commits since the last revision:
>> 
>>  - Prevent event container bytecode generation if no container present
>>  - Fix event metadata
>>  - Roll back conditional registration of container events
>>  - Remove container events flag
>>  - Remove trailing spaces
>>  - Doh
>>  - Report container type and register events conditionally
>>  - Remove unused test files
>>  - Initial test support for JFR container events
>>  - Update the JFR control files
>>  - ... and 3 more: 
>> https://git.openjdk.java.net/jdk/compare/c00a534f...04c3f092
>
> src/jdk.jfr/share/classes/jdk/jfr/internal/Utils.java line 729:
> 
>> 727: 
>> 728: public static boolean shouldSkipBytecode(String eventName, Class 
>> superClass) {
>> 729: if 
>> (!superClass.getName().equals("jdk.jfr.events.AbstractJDKEvent")) {
> 
> Was there a problem checking against the class instance? If so, perhaps you 
> could add a check that the class is in the boot class loader (null).

Yes, `AbstractJDKEvent` is package private so it is not accessible from here.

> src/jdk.jfr/share/classes/jdk/jfr/internal/Utils.java line 737:
> 
>> 735: private static Metrics getMetrics() {
>> 736: if (metrics == null) {
>> 737: metrics = Metrics.systemMetrics();
> 
> Will this not lead to a lookup every time in an non-container environment?

Yes. Now I see why you used `Metrics[]` - will fix.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v9]

2021-04-27 Thread Severin Gehwolf
On Thu, 22 Apr 2021 15:08:59 GMT, Jaroslav Bachorik  
wrote:

>> I wonder if something similar to below could be added to 
>> jdk.jfr.internal.Utils:
>> 
>> private static Metrics[] metrics;
>> public static Metrics getMetrics() {
>> if (metrics == null) {
>> metrics = new Metrics[] { Metrics.systemMetrics() };
>> }
>> return metrics[0];
>> }
>> 
>> public static boolean shouldSkipBytecode(String eventName, Class 
>> superClass) {
>> if (superClass != AbstractJDKEvent.class) {
>> return false;
>> }
>> if (!eventName.startsWith("jdk.Container")) {
>> return false;
>> }
>> return getMetrics() == null;
>> }
>> 
>> Then we could add checks to 
>> jdk.jfr.internal.JVMUpcalls::bytesForEagerInstrumentation(...)
>> 
>> eventName = ei.getEventName();
>> if (Utils.shouldSkipBytecode(eventName, superClass))) {
>> return oldBytes;
>> }
>> 
>> and jdk.jfr.internal.JVMUpcalls:onRetransform(...)
>> 
>> if (jdk.internal.event.Event.class.isAssignableFrom(clazz) && 
>> !Modifier.isAbstract(clazz.getModifiers())) {
>> if (Utils.shouldSkipBytecode(clazz.getName(), 
>> clazz.getSuperclass())) {
>> return oldBytes;
>> }
>> 
>> This way we would not pay for generating bytecode for events in a 
>> non-container environment. 
>> 
>> Not sure if it works, but could perhaps make startup faster? We would still 
>> pay for generating the event handlers during registration, but it's much 
>> trickier to avoid since we need to store the event type somewhere.
>
> @egahlin Sounds good.
> Any particular reason you are using `Metrics[]` array?

> @jbachorik Has this been tested on cgroups v1 and cgroups v2 Linux systems?

OK. I've tested the latest iteration on both (cgroup v2 and cgroup v1). Testing 
looks good other than the `memoryPressure` issue.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v10]

2021-04-27 Thread Severin Gehwolf
On Thu, 22 Apr 2021 16:00:32 GMT, Jaroslav Bachorik  
wrote:

>> With this change it becomes possible to surface various cgroup level metrics 
>> (available via `jdk.internal.platform.Metrics`) as JFR events.
>> 
>> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
>> turned into JFR events to start with.
>> * CPU related metrics
>> * Memory related metrics
>> * I/O related metrics
>> 
>> For each of those subsystems a configuration data will be emitted as well. 
>> The initial proposal is to emit the configuration data events at least once 
>> per chunk and the metrics values at 30 seconds interval. 
>> By using these values the emitted events seem to contain useful information 
>> without increasing overhead (the metrics values are read from `/proc` 
>> filesystem so that should not be done too frequently).
>
> Jaroslav Bachorik has updated the pull request with a new target base due to 
> a merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 13 additional 
> commits since the last revision:
> 
>  - Prevent event container bytecode generation if no container present
>  - Fix event metadata
>  - Roll back conditional registration of container events
>  - Remove container events flag
>  - Remove trailing spaces
>  - Doh
>  - Report container type and register events conditionally
>  - Remove unused test files
>  - Initial test support for JFR container events
>  - Update the JFR control files
>  - ... and 3 more: 
> https://git.openjdk.java.net/jdk/compare/23dc6e83...04c3f092

@jbachorik The test needs fixing.

test/hotspot/jtreg/containers/docker/TestJFREvents.java line 147:

> 145:   .addClassOptions(eventName, 
> "period=endChunk"))
> 146: .shouldHaveExitValue(0)
> 147: .shouldContain(memoryPressureFld)

This test fails for me on cgroupv1 with:


--System.err:(42/1407)--
 stdout: [= EventType: jdk.ContainerMemoryUsage
startTime = 946400166
duration = 0
eventThread = {
  osName = "main"
  osThreadId = 6
  javaName = "main"
  javaThreadId = 1
  group = {
parent = {
  parent = N/A
  name = "system"
}
name = "main"
  }
}

stackTrace = null
memoryFailCount = 0
memoryUsage = 57786368
swapMemoryUsage = 57782272
];
 stderr: []
 exitValue = 0

java.lang.RuntimeException: 'memoryPressure' missing from stdout/stderr

at 
jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:206)
at TestJFREvents.testMemoryUsage(TestJFREvents.java:147)
at TestJFREvents.main(TestJFREvents.java:77)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at 
com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
at java.base/java.lang.Thread.run(Thread.java:831)

JavaTest Message: Test threw exception: java.lang.RuntimeException: 
'memoryPressure' missing from stdout/stderr

JavaTest Message: shutting down test


I think `memoryPressure` got removed from the code and, thus, should get 
removed from the test.

-

Changes requested by sgehwolf (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v9]

2021-04-27 Thread Severin Gehwolf
On Thu, 22 Apr 2021 15:08:59 GMT, Jaroslav Bachorik  
wrote:

>> I wonder if something similar to below could be added to 
>> jdk.jfr.internal.Utils:
>> 
>> private static Metrics[] metrics;
>> public static Metrics getMetrics() {
>> if (metrics == null) {
>> metrics = new Metrics[] { Metrics.systemMetrics() };
>> }
>> return metrics[0];
>> }
>> 
>> public static boolean shouldSkipBytecode(String eventName, Class 
>> superClass) {
>> if (superClass != AbstractJDKEvent.class) {
>> return false;
>> }
>> if (!eventName.startsWith("jdk.Container")) {
>> return false;
>> }
>> return getMetrics() == null;
>> }
>> 
>> Then we could add checks to 
>> jdk.jfr.internal.JVMUpcalls::bytesForEagerInstrumentation(...)
>> 
>> eventName = ei.getEventName();
>> if (Utils.shouldSkipBytecode(eventName, superClass))) {
>> return oldBytes;
>> }
>> 
>> and jdk.jfr.internal.JVMUpcalls:onRetransform(...)
>> 
>> if (jdk.internal.event.Event.class.isAssignableFrom(clazz) && 
>> !Modifier.isAbstract(clazz.getModifiers())) {
>> if (Utils.shouldSkipBytecode(clazz.getName(), 
>> clazz.getSuperclass())) {
>> return oldBytes;
>> }
>> 
>> This way we would not pay for generating bytecode for events in a 
>> non-container environment. 
>> 
>> Not sure if it works, but could perhaps make startup faster? We would still 
>> pay for generating the event handlers during registration, but it's much 
>> trickier to avoid since we need to store the event type somewhere.
>
> @egahlin Sounds good.
> Any particular reason you are using `Metrics[]` array?

@jbachorik Has this been tested on cgroups v1 and cgroups v2 Linux systems?

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v10]

2021-04-26 Thread Erik Gahlin
On Thu, 22 Apr 2021 16:00:32 GMT, Jaroslav Bachorik  
wrote:

>> With this change it becomes possible to surface various cgroup level metrics 
>> (available via `jdk.internal.platform.Metrics`) as JFR events.
>> 
>> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
>> turned into JFR events to start with.
>> * CPU related metrics
>> * Memory related metrics
>> * I/O related metrics
>> 
>> For each of those subsystems a configuration data will be emitted as well. 
>> The initial proposal is to emit the configuration data events at least once 
>> per chunk and the metrics values at 30 seconds interval. 
>> By using these values the emitted events seem to contain useful information 
>> without increasing overhead (the metrics values are read from `/proc` 
>> filesystem so that should not be done too frequently).
>
> Jaroslav Bachorik has updated the pull request with a new target base due to 
> a merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 13 additional 
> commits since the last revision:
> 
>  - Prevent event container bytecode generation if no container present
>  - Fix event metadata
>  - Roll back conditional registration of container events
>  - Remove container events flag
>  - Remove trailing spaces
>  - Doh
>  - Report container type and register events conditionally
>  - Remove unused test files
>  - Initial test support for JFR container events
>  - Update the JFR control files
>  - ... and 3 more: 
> https://git.openjdk.java.net/jdk/compare/904d9495...04c3f092

src/jdk.jfr/share/classes/jdk/jfr/internal/Utils.java line 729:

> 727: 
> 728: public static boolean shouldSkipBytecode(String eventName, Class 
> superClass) {
> 729: if 
> (!superClass.getName().equals("jdk.jfr.events.AbstractJDKEvent")) {

Was there a problem checking against the class instance? If so, perhaps you 
could add a check that the class is in the boot class loader (null).

src/jdk.jfr/share/classes/jdk/jfr/internal/Utils.java line 737:

> 735: private static Metrics getMetrics() {
> 736: if (metrics == null) {
> 737: metrics = Metrics.systemMetrics();

Will this not lead to a lookup every time in an non-container environment?

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v9]

2021-04-22 Thread Jaroslav Bachorik
On Wed, 21 Apr 2021 22:38:32 GMT, Erik Gahlin  wrote:

>> Jaroslav Bachorik has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Fix event metadata
>
> I wonder if something similar to below could be added to 
> jdk.jfr.internal.Utils:
> 
> private static Metrics[] metrics;
> public static Metrics getMetrics() {
> if (metrics == null) {
> metrics = new Metrics[] { Metrics.systemMetrics() };
> }
> return metrics[0];
> }
> 
> public static boolean shouldSkipBytecode(String eventName, Class 
> superClass) {
> if (superClass != AbstractJDKEvent.class) {
> return false;
> }
> if (!eventName.startsWith("jdk.Container")) {
> return false;
> }
> return getMetrics() == null;
> }
> 
> Then we could add checks to 
> jdk.jfr.internal.JVMUpcalls::bytesForEagerInstrumentation(...)
> 
> eventName = ei.getEventName();
> if (Utils.shouldSkipBytecode(eventName, superClass))) {
> return oldBytes;
> }
> 
> and jdk.jfr.internal.JVMUpcalls:onRetransform(...)
> 
> if (jdk.internal.event.Event.class.isAssignableFrom(clazz) && 
> !Modifier.isAbstract(clazz.getModifiers())) {
> if (Utils.shouldSkipBytecode(clazz.getName(), clazz.getSuperclass())) 
> {
> return oldBytes;
> }
> 
> This way we would not pay for generating bytecode for events in a 
> non-container environment. 
> 
> Not sure if it works, but could perhaps make startup faster? We would still 
> pay for generating the event handlers during registration, but it's much 
> trickier to avoid since we need to store the event type somewhere.

@egahlin Sounds good.
Any particular reason you are using `Metrics[]` array?

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v10]

2021-04-22 Thread Jaroslav Bachorik
> With this change it becomes possible to surface various cgroup level metrics 
> (available via `jdk.internal.platform.Metrics`) as JFR events.
> 
> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
> turned into JFR events to start with.
> * CPU related metrics
> * Memory related metrics
> * I/O related metrics
> 
> For each of those subsystems a configuration data will be emitted as well. 
> The initial proposal is to emit the configuration data events at least once 
> per chunk and the metrics values at 30 seconds interval. 
> By using these values the emitted events seem to contain useful information 
> without increasing overhead (the metrics values are read from `/proc` 
> filesystem so that should not be done too frequently).

Jaroslav Bachorik has updated the pull request with a new target base due to a 
merge or a rebase. The incremental webrev excludes the unrelated changes 
brought in by the merge/rebase. The pull request contains 13 additional commits 
since the last revision:

 - Prevent event container bytecode generation if no container present
 - Fix event metadata
 - Roll back conditional registration of container events
 - Remove container events flag
 - Remove trailing spaces
 - Doh
 - Report container type and register events conditionally
 - Remove unused test files
 - Initial test support for JFR container events
 - Update the JFR control files
 - ... and 3 more: https://git.openjdk.java.net/jdk/compare/7babcb00...04c3f092

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3126/files
  - new: https://git.openjdk.java.net/jdk/pull/3126/files/f766faf0..04c3f092

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=09
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=08-09

  Stats: 36024 lines in 1257 files changed: 5443 ins; 26264 del; 4317 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3126.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3126/head:pull/3126

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v9]

2021-04-21 Thread Erik Gahlin
On Wed, 21 Apr 2021 13:34:28 GMT, Jaroslav Bachorik  
wrote:

>> With this change it becomes possible to surface various cgroup level metrics 
>> (available via `jdk.internal.platform.Metrics`) as JFR events.
>> 
>> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
>> turned into JFR events to start with.
>> * CPU related metrics
>> * Memory related metrics
>> * I/O related metrics
>> 
>> For each of those subsystems a configuration data will be emitted as well. 
>> The initial proposal is to emit the configuration data events at least once 
>> per chunk and the metrics values at 30 seconds interval. 
>> By using these values the emitted events seem to contain useful information 
>> without increasing overhead (the metrics values are read from `/proc` 
>> filesystem so that should not be done too frequently).
>
> Jaroslav Bachorik has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Fix event metadata

I wonder if something similar to below could be added to jdk.jfr.internal.Utils:

private static Metrics[] metrics;
public static Metrics getMetrics() {
if (metrics == null) {
metrics = new Metrics[] { Metrics.systemMetrics() };
}
return metrics[0];
}

public static boolean shouldSkipBytecode(String eventName, Class 
superClass) {
if (superClass.getClassLoader() != null) {
return false;
}
if (!eventName.startsWith("jdk.Container")) {
return false;
}
return getMetrics() == null;
}

Then we could add checks to 
jdk.jfr.internal.JVMUpcalls::bytesForEagerInstrumentation(...)

eventName = ei.getEventName();
if (Utils.shouldSkipBytecode(eventName, superClass))) {
return oldBytes;
}

and jdk.jfr.internal.JVMUpcalls:onRetransform(...)

if (jdk.internal.event.Event.class.isAssignableFrom(clazz) && 
!Modifier.isAbstract(clazz.getModifiers())) {
if (Utils.shouldSkipBytecode(clazz.getName(), clazz.getSuperclass())) {
return oldBytes;
}

This way we would not pay for generating bytecode for events in a non-container 
environment. 

Not sure if it works, but could perhaps make startup faster? We would still pay 
for generating the event handlers during registration, but it's much trickier 
to avoid since we need to store the event type somewhere.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v7]

2021-04-21 Thread Jaroslav Bachorik
On Wed, 21 Apr 2021 11:49:10 GMT, Erik Gahlin  wrote:

>> Ok. So what would be a good rule-of-thumb for when one should introduce a 
>> flag?
>
> I think we want to limit the number flags/options There are already too many, 
> preferably we would have none, but some of the ones we have today, like gc 
> and exception, are necessary, because the impact of have them on by default 
> would be catastrophic (long stop the world pauses and possibly million of 
> events per second).

Thanks!
Removed the flag.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v8]

2021-04-21 Thread Jaroslav Bachorik
On Wed, 21 Apr 2021 13:21:32 GMT, Jaroslav Bachorik  
wrote:

>> How does it look in proc?
>
> This was based on 
> https://github.com/openjdk/jdk/blob/da860290c2657c0fb1de8c77c8dffdb35f1cf938/src/java.base/linux/classes/jdk/internal/platform/CgroupV1Metrics.java#L149
> 
> However, that metric is  'internal'  only so it really does not make sense to 
> keep it here and is there only by mistake.

Removed

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v8]

2021-04-21 Thread Jaroslav Bachorik
On Wed, 21 Apr 2021 11:50:25 GMT, Erik Gahlin  wrote:

>> I guess we could fit those events under `Operating System/Memory` and 
>> `Operating System/Processor` categories.
>> What about I/O? Currently, there is only `Operating System/Network` 
>> category. The options are:
>> * Add `Operating System/IO` category and move `Network` to `Operating 
>> System/IO/Network`
>> * Add `Operation System/FileSystem` category and move the container IO event 
>> there
>> 
>> What would you prefer?
>
> I prefer adding File System (or having separate container category).

Added `Operating System/File System` category

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v8]

2021-04-21 Thread Jaroslav Bachorik
On Wed, 14 Apr 2021 10:26:44 GMT, Erik Gahlin  wrote:

>> Jaroslav Bachorik has updated the pull request with a new target base due to 
>> a merge or a rebase. The pull request now contains 11 commits:
>> 
>>  - Roll back conditional registration of container events
>>  - Remove container events flag
>>  - Remove trailing spaces
>>  - Doh
>>  - Report container type and register events conditionally
>>  - Remove unused test files
>>  - Initial test support for JFR container events
>>  - Update the JFR control files
>>  - Split off the CPU throttling metrics
>>  - Formatting spaces
>>  - ... and 1 more: 
>> https://git.openjdk.java.net/jdk/compare/e80012ed...67a61bd7
>
> src/jdk.jfr/share/classes/jdk/jfr/events/ContainerCPUThrottlingEvent.java 
> line 46:
> 
>> 44: public class ContainerCPUThrottlingEvent extends AbstractJDKEvent {
>> 45:   @Label("CPU Elapsed Slices")
>> 46:   @Description("Number of time-slice periods that have elapsed if a CPU 
>> quota has been setup for the container.")
> 
> If the description is one sentence, period should not be included.

Fixed in all locations

> src/jdk.jfr/share/classes/jdk/jfr/events/ContainerCPUUsageEvent.java line 46:
> 
>> 44: public class ContainerCPUUsageEvent extends AbstractJDKEvent {
>> 45:   @Label("CPU Time")
>> 46:   @Description("Aggregate time, in nanoseconds, consumed by all tasks in 
>> the container.")
> 
> We usually skip the unit "nanoseconds" in descriptions when the field has a 
> content type that describes the unit.

Gone

> src/jdk.jfr/share/classes/jdk/jfr/events/ContainerConfigurationEvent.java 
> line 45:
> 
>> 43: @Description("A set of container specific attributes.")
>> 44: public final class ContainerConfigurationEvent extends AbstractJDKEvent {
>> 45: @Label("Container type")
> 
> Capitalize "type" in the label

Done

> src/jdk.jfr/share/classes/jdk/jfr/events/ContainerConfigurationEvent.java 
> line 78:
> 
>> 76: 
>> 77: @Label("Memory and Swap Limit")
>> 78: @Description("Maximum amount of physical memory and swap space, in 
>> bytes, that can be allocated in the container.")
> 
> No need to mention bytes in the description when the field has DataAmount 
> annotation.

Yep. Done.

> src/jdk.jfr/share/classes/jdk/jfr/events/ContainerIOUsageEvent.java line 47:
> 
>> 45: public class ContainerIOUsageEvent extends AbstractJDKEvent {
>> 46: 
>> 47:   @Label("BlkIO Request Count")
> 
> Change to "Block IO"

Done

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v9]

2021-04-21 Thread Jaroslav Bachorik
> With this change it becomes possible to surface various cgroup level metrics 
> (available via `jdk.internal.platform.Metrics`) as JFR events.
> 
> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
> turned into JFR events to start with.
> * CPU related metrics
> * Memory related metrics
> * I/O related metrics
> 
> For each of those subsystems a configuration data will be emitted as well. 
> The initial proposal is to emit the configuration data events at least once 
> per chunk and the metrics values at 30 seconds interval. 
> By using these values the emitted events seem to contain useful information 
> without increasing overhead (the metrics values are read from `/proc` 
> filesystem so that should not be done too frequently).

Jaroslav Bachorik has updated the pull request incrementally with one 
additional commit since the last revision:

  Fix event metadata

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3126/files
  - new: https://git.openjdk.java.net/jdk/pull/3126/files/67a61bd7..f766faf0

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=08
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=07-08

  Stats: 36 lines in 5 files changed: 0 ins; 5 del; 31 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3126.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3126/head:pull/3126

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v8]

2021-04-21 Thread Jaroslav Bachorik
On Wed, 21 Apr 2021 11:55:48 GMT, Erik Gahlin  wrote:

>> I don't see this event field being set anywhere? Which Metrics API call is 
>> this using?
>
> How does it look in proc?

This was based on 
https://github.com/openjdk/jdk/blob/da860290c2657c0fb1de8c77c8dffdb35f1cf938/src/java.base/linux/classes/jdk/internal/platform/CgroupV1Metrics.java#L149

However, that metric is  'internal'  only so it really does not make sense to 
keep it here and is there only by mistake.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v8]

2021-04-21 Thread Erik Gahlin
On Wed, 14 Apr 2021 14:32:32 GMT, Severin Gehwolf  wrote:

>> This is taken as reported by cgroups - I didn't want to change the semantics 
>> so it does not confuse people familiar with cgroups.
>
> I don't see this event field being set anywhere? Which Metrics API call is 
> this using?

How does it look in proc?

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v7]

2021-04-21 Thread Erik Gahlin
On Wed, 14 Apr 2021 08:28:01 GMT, Jaroslav Bachorik  
wrote:

>> src/jdk.jfr/share/conf/jfr/default.jfc line 1051:
>> 
>>> 1049:   false
>>> 1050: 
>>> 1051:   true
>> 
>> I don't think we should create "flag" for "Container Events". Instead we 
>> should treat them like CPU and other OS events, always on. Since JFR can be 
>> used outside a container, it seems wrong to have this as an option.
>
> Ok. So what would be a good rule-of-thumb for when one should introduce a 
> flag?

I think we want to limit the number flags/options There are already too many, 
preferably we would have none, but some of the ones we have today, like gc and 
exception, are necessary, because the impact of have them on by default would 
be catastrophic (long stop the world pauses and possibly million of events per 
second).

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v8]

2021-04-21 Thread Erik Gahlin
On Wed, 14 Apr 2021 12:14:33 GMT, Jaroslav Bachorik  
wrote:

>> src/jdk.jfr/share/classes/jdk/jfr/events/ContainerCPUThrottlingEvent.java 
>> line 44:
>> 
>>> 42: @Category({"Operating System", "Container", "Processor"})
>>> 43: @Description("Container CPU throttling related information.")
>>> 44: public class ContainerCPUThrottlingEvent extends AbstractJDKEvent {
>> 
>> I wonder if we should put all the container events in the same category 
>> {"Operating System", "Container"}, Or possibly add them under the already 
>> existing categories under "Operating System"?
>
> I guess we could fit those events under `Operating System/Memory` and 
> `Operating System/Processor` categories.
> What about I/O? Currently, there is only `Operating System/Network` category. 
> The options are:
> * Add `Operating System/IO` category and move `Network` to `Operating 
> System/IO/Network`
> * Add `Operation System/FileSystem` category and move the container IO event 
> there
> 
> What would you prefer?

I prefer adding File System (or having separate container category).

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v8]

2021-04-14 Thread Severin Gehwolf
On Wed, 14 Apr 2021 12:03:12 GMT, Jaroslav Bachorik  
wrote:

>> src/jdk.jfr/share/classes/jdk/jfr/events/ContainerMemoryUsageEvent.java line 
>> 46:
>> 
>>> 44: public final class ContainerMemoryUsageEvent extends AbstractJDKEvent {
>>> 45: @Label("Memory Pressure")
>>> 46: @Description("(attempts per second * 1000), if enabled, that the 
>>> operating system tries to satisfy a memory request for any " +
>> 
>> This unit seems a bit strange. Do we really need to multiply by 1000?
>
> This is taken as reported by cgroups - I didn't want to change the semantics 
> so it does not confuse people familiar with cgroups.

I don't see this event field being set anywhere? Which Metrics API call is this 
using?

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v8]

2021-04-14 Thread Jaroslav Bachorik
On Wed, 14 Apr 2021 10:25:09 GMT, Erik Gahlin  wrote:

>> Jaroslav Bachorik has updated the pull request with a new target base due to 
>> a merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains 11 additional 
>> commits since the last revision:
>> 
>>  - Roll back conditional registration of container events
>>  - Remove container events flag
>>  - Remove trailing spaces
>>  - Doh
>>  - Report container type and register events conditionally
>>  - Remove unused test files
>>  - Initial test support for JFR container events
>>  - Update the JFR control files
>>  - Split off the CPU throttling metrics
>>  - Formatting spaces
>>  - ... and 1 more: 
>> https://git.openjdk.java.net/jdk/compare/19aad098...67a61bd7
>
> src/jdk.jfr/share/classes/jdk/jfr/events/ContainerCPUThrottlingEvent.java 
> line 44:
> 
>> 42: @Category({"Operating System", "Container", "Processor"})
>> 43: @Description("Container CPU throttling related information.")
>> 44: public class ContainerCPUThrottlingEvent extends AbstractJDKEvent {
> 
> I wonder if we should put all the container events in the same category 
> {"Operating System", "Container"}, Or possibly add them under the already 
> existing categories under "Operating System"?

I guess we could fit those events under `Operating System/Memory` and 
`Operating System/Processor` categories.
What about I/O? Currently, there is only `Operating System/Network` category. 
The options are:
* Add `Operating System/IO` category and move `Network` to `Operating 
System/IO/Network`
* Add `Operation System/FileSystem` category and move the container IO event 
there

What would you prefer?

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v8]

2021-04-14 Thread Jaroslav Bachorik
On Wed, 14 Apr 2021 11:17:14 GMT, Erik Gahlin  wrote:

>> Jaroslav Bachorik has updated the pull request with a new target base due to 
>> a merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains 11 additional 
>> commits since the last revision:
>> 
>>  - Roll back conditional registration of container events
>>  - Remove container events flag
>>  - Remove trailing spaces
>>  - Doh
>>  - Report container type and register events conditionally
>>  - Remove unused test files
>>  - Initial test support for JFR container events
>>  - Update the JFR control files
>>  - Split off the CPU throttling metrics
>>  - Formatting spaces
>>  - ... and 1 more: 
>> https://git.openjdk.java.net/jdk/compare/b72abe91...67a61bd7
>
> src/jdk.jfr/share/classes/jdk/jfr/events/ContainerMemoryUsageEvent.java line 
> 46:
> 
>> 44: public final class ContainerMemoryUsageEvent extends AbstractJDKEvent {
>> 45: @Label("Memory Pressure")
>> 46: @Description("(attempts per second * 1000), if enabled, that the 
>> operating system tries to satisfy a memory request for any " +
> 
> This unit seems a bit strange. Do we really need to multiply by 1000?

This is taken as reported by cgroups - I didn't want to change the semantics so 
it does not confuse people familiar with cgroups.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v8]

2021-04-14 Thread Erik Gahlin
On Wed, 14 Apr 2021 08:31:55 GMT, Jaroslav Bachorik  
wrote:

>> With this change it becomes possible to surface various cgroup level metrics 
>> (available via `jdk.internal.platform.Metrics`) as JFR events.
>> 
>> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
>> turned into JFR events to start with.
>> * CPU related metrics
>> * Memory related metrics
>> * I/O related metrics
>> 
>> For each of those subsystems a configuration data will be emitted as well. 
>> The initial proposal is to emit the configuration data events at least once 
>> per chunk and the metrics values at 30 seconds interval. 
>> By using these values the emitted events seem to contain useful information 
>> without increasing overhead (the metrics values are read from `/proc` 
>> filesystem so that should not be done too frequently).
>
> Jaroslav Bachorik has updated the pull request with a new target base due to 
> a merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 11 additional 
> commits since the last revision:
> 
>  - Roll back conditional registration of container events
>  - Remove container events flag
>  - Remove trailing spaces
>  - Doh
>  - Report container type and register events conditionally
>  - Remove unused test files
>  - Initial test support for JFR container events
>  - Update the JFR control files
>  - Split off the CPU throttling metrics
>  - Formatting spaces
>  - ... and 1 more: 
> https://git.openjdk.java.net/jdk/compare/1f93be70...67a61bd7

src/jdk.jfr/share/classes/jdk/jfr/events/ContainerCPUThrottlingEvent.java line 
44:

> 42: @Category({"Operating System", "Container", "Processor"})
> 43: @Description("Container CPU throttling related information.")
> 44: public class ContainerCPUThrottlingEvent extends AbstractJDKEvent {

I wonder if we should put all the container events in the same category 
{"Operating System", "Container"}, Or possibly add them under the already 
existing categories under "Operating System"?

src/jdk.jfr/share/classes/jdk/jfr/events/ContainerCPUThrottlingEvent.java line 
46:

> 44: public class ContainerCPUThrottlingEvent extends AbstractJDKEvent {
> 45:   @Label("CPU Elapsed Slices")
> 46:   @Description("Number of time-slice periods that have elapsed if a CPU 
> quota has been setup for the container.")

If the description is one sentence, period should not be included.

src/jdk.jfr/share/classes/jdk/jfr/events/ContainerCPUUsageEvent.java line 46:

> 44: public class ContainerCPUUsageEvent extends AbstractJDKEvent {
> 45:   @Label("CPU Time")
> 46:   @Description("Aggregate time, in nanoseconds, consumed by all tasks in 
> the container.")

We usually skip the unit "nanoseconds" in descriptions when the field has a 
content type that describes the unit.

src/jdk.jfr/share/classes/jdk/jfr/events/ContainerConfigurationEvent.java line 
45:

> 43: @Description("A set of container specific attributes.")
> 44: public final class ContainerConfigurationEvent extends AbstractJDKEvent {
> 45: @Label("Container type")

Capitalize "type" in the label

src/jdk.jfr/share/classes/jdk/jfr/events/ContainerConfigurationEvent.java line 
78:

> 76: 
> 77: @Label("Memory and Swap Limit")
> 78: @Description("Maximum amount of physical memory and swap space, in 
> bytes, that can be allocated in the container.")

No need to mention bytes in the description when the field has DataAmount 
annotation.

src/jdk.jfr/share/classes/jdk/jfr/events/ContainerIOUsageEvent.java line 47:

> 45: public class ContainerIOUsageEvent extends AbstractJDKEvent {
> 46: 
> 47:   @Label("BlkIO Request Count")

Change to "Block IO"

src/jdk.jfr/share/classes/jdk/jfr/events/ContainerMemoryUsageEvent.java line 46:

> 44: public final class ContainerMemoryUsageEvent extends AbstractJDKEvent {
> 45: @Label("Memory Pressure")
> 46: @Description("(attempts per second * 1000), if enabled, that the 
> operating system tries to satisfy a memory request for any " +

This unit seems a bit strange. Do we really need to multiply by 1000?

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v7]

2021-04-14 Thread Erik Gahlin
On Wed, 14 Apr 2021 08:28:37 GMT, Jaroslav Bachorik  
wrote:

>> src/jdk.jfr/share/classes/jdk/jfr/internal/instrument/JDKEvents.java line 
>> 163:
>> 
>>> 161: private static void initializeContainerEvents() {
>>> 162: containerMetrics = Container.metrics();
>>> 163: if (containerMetrics != null) {
>> 
>> I understand this will reduce startup time, but it's contrary to how we 
>> treat other events. 
>> 
>> We register events, even if they can't be used. We want users to see what 
>> events are available (and their metadata) and use JMC recording wizard or 
>> other means to configure a .jfc file without actually being connected to a 
>> containerized process. We want the same events to minimize (subtle) platform 
>> dependent bugs.
>> 
>> I think we should try to find other means to reduce the startup time. It's 
>> better to have consistent behaviour, but an initial implementation than 
>> isn't as performant, than inconsistent behavior and somewhat faster 
>> implementation.
>> 
>> At some point we will need to address the startup cost of registering Java 
>> events anyway. For example, we could generate metadata at build time in a 
>> binary format, similar to what we already do with native events. Could even 
>> be the same file. Then we can have hundreds of Java events without the cost 
>> of reflection and unnecessary class loading at startup. We could add a 
>> simple check so that bytecode for the container events (commit() etc) are 
>> not generated unless in a container environment. A couple of (cached) checks 
>> in JVMUpcalls may be sufficient to prevent instrumentation cost.
>
> Right. So, for the initial drop I will just leave the container events 
> registered unconditionally.

I think that is fine.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v8]

2021-04-14 Thread Jaroslav Bachorik
> With this change it becomes possible to surface various cgroup level metrics 
> (available via `jdk.internal.platform.Metrics`) as JFR events.
> 
> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
> turned into JFR events to start with.
> * CPU related metrics
> * Memory related metrics
> * I/O related metrics
> 
> For each of those subsystems a configuration data will be emitted as well. 
> The initial proposal is to emit the configuration data events at least once 
> per chunk and the metrics values at 30 seconds interval. 
> By using these values the emitted events seem to contain useful information 
> without increasing overhead (the metrics values are read from `/proc` 
> filesystem so that should not be done too frequently).

Jaroslav Bachorik has updated the pull request with a new target base due to a 
merge or a rebase. The incremental webrev excludes the unrelated changes 
brought in by the merge/rebase. The pull request contains 11 additional commits 
since the last revision:

 - Roll back conditional registration of container events
 - Remove container events flag
 - Remove trailing spaces
 - Doh
 - Report container type and register events conditionally
 - Remove unused test files
 - Initial test support for JFR container events
 - Update the JFR control files
 - Split off the CPU throttling metrics
 - Formatting spaces
 - ... and 1 more: https://git.openjdk.java.net/jdk/compare/faed1f5b...67a61bd7

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3126/files
  - new: https://git.openjdk.java.net/jdk/pull/3126/files/f4372e23..67a61bd7

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=07
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=06-07

  Stats: 76531 lines in 2761 files changed: 49167 ins; 17238 del; 10126 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3126.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3126/head:pull/3126

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v7]

2021-04-14 Thread Jaroslav Bachorik
On Mon, 12 Apr 2021 18:53:07 GMT, Erik Gahlin  wrote:

>> Jaroslav Bachorik has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Remove trailing spaces
>
> src/jdk.jfr/share/classes/jdk/jfr/internal/instrument/JDKEvents.java line 163:
> 
>> 161: private static void initializeContainerEvents() {
>> 162: containerMetrics = Container.metrics();
>> 163: if (containerMetrics != null) {
> 
> I understand this will reduce startup time, but it's contrary to how we treat 
> other events. 
> 
> We register events, even if they can't be used. We want users to see what 
> events are available (and their metadata) and use JMC recording wizard or 
> other means to configure a .jfc file without actually being connected to a 
> containerized process. We want the same events to minimize (subtle) platform 
> dependent bugs.
> 
> I think we should try to find other means to reduce the startup time. It's 
> better to have consistent behaviour, but an initial implementation than isn't 
> as performant, than inconsistent behavior and somewhat faster implementation.
> 
> At some point we will need to address the startup cost of registering Java 
> events anyway. For example, we could generate metadata at build time in a 
> binary format, similar to what we already do with native events. Could even 
> be the same file. Then we can have hundreds of Java events without the cost 
> of reflection and unnecessary class loading at startup. We could add a simple 
> check so that bytecode for the container events (commit() etc) are not 
> generated unless in a container environment. A couple of (cached) checks in 
> JVMUpcalls may be sufficient to prevent instrumentation cost.

Right. So, for the initial drop I will just leave the container events 
registered unconditionally.

> src/jdk.jfr/share/conf/jfr/default.jfc line 1051:
> 
>> 1049:   false
>> 1050: 
>> 1051:   true
> 
> I don't think we should create "flag" for "Container Events". Instead we 
> should treat them like CPU and other OS events, always on. Since JFR can be 
> used outside a container, it seems wrong to have this as an option.

Ok. So what would be a good rule-of-thumb for when one should introduce a flag?

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v7]

2021-04-12 Thread Erik Gahlin
On Fri, 2 Apr 2021 12:33:03 GMT, Jaroslav Bachorik  
wrote:

>> With this change it becomes possible to surface various cgroup level metrics 
>> (available via `jdk.internal.platform.Metrics`) as JFR events.
>> 
>> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
>> turned into JFR events to start with.
>> * CPU related metrics
>> * Memory related metrics
>> * I/O related metrics
>> 
>> For each of those subsystems a configuration data will be emitted as well. 
>> The initial proposal is to emit the configuration data events at least once 
>> per chunk and the metrics values at 30 seconds interval. 
>> By using these values the emitted events seem to contain useful information 
>> without increasing overhead (the metrics values are read from `/proc` 
>> filesystem so that should not be done too frequently).
>
> Jaroslav Bachorik has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Remove trailing spaces

src/jdk.jfr/share/classes/jdk/jfr/internal/instrument/JDKEvents.java line 163:

> 161: private static void initializeContainerEvents() {
> 162: containerMetrics = Container.metrics();
> 163: if (containerMetrics != null) {

I understand this will reduce startup time, but it's contrary to how we treat 
other events. 

We register events, even if they can't be used. We want users to see what 
events are available (and their metadata) and use JMC recording wizard or other 
means to configure a .jfc file without actually being connected to a 
containerized process. We want the same events to minimize (subtle) platform 
dependent bugs.

I think we should try to find other means to reduce the startup time. It's 
better to have consistent behaviour, but an initial implementation than isn't 
as performant, than inconsistent behavior and somewhat faster implementation.

At some point we will need to address the startup cost of registering Java 
events anyway. For example, we could generate metadata at build time in a 
binary format, similar to what we already do with native events. Could even be 
the same file. Then we can have hundreds of Java events without the cost of 
reflection and unnecessary class loading at startup. We could add a simple 
check so that bytecode for the container events (commit() etc) are not 
generated unless in a container environment. A couple of (cached) checks in 
JVMUpcalls may be sufficient to prevent instrumentation cost.

src/jdk.jfr/share/conf/jfr/default.jfc line 1051:

> 1049:   false
> 1050: 
> 1051:   true

I don't think we should create "flag" for "Container Events". Instead we should 
treat them like CPU and other OS events, always on. Since JFR can be used 
outside a container, it seems wrong to have this as an option.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v7]

2021-04-02 Thread Jaroslav Bachorik
> With this change it becomes possible to surface various cgroup level metrics 
> (available via `jdk.internal.platform.Metrics`) as JFR events.
> 
> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
> turned into JFR events to start with.
> * CPU related metrics
> * Memory related metrics
> * I/O related metrics
> 
> For each of those subsystems a configuration data will be emitted as well. 
> The initial proposal is to emit the configuration data events at least once 
> per chunk and the metrics values at 30 seconds interval. 
> By using these values the emitted events seem to contain useful information 
> without increasing overhead (the metrics values are read from `/proc` 
> filesystem so that should not be done too frequently).

Jaroslav Bachorik has updated the pull request incrementally with one 
additional commit since the last revision:

  Remove trailing spaces

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3126/files
  - new: https://git.openjdk.java.net/jdk/pull/3126/files/78dc85d3..f4372e23

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=05-06

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3126.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3126/head:pull/3126

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v6]

2021-04-02 Thread Jaroslav Bachorik
> With this change it becomes possible to surface various cgroup level metrics 
> (available via `jdk.internal.platform.Metrics`) as JFR events.
> 
> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
> turned into JFR events to start with.
> * CPU related metrics
> * Memory related metrics
> * I/O related metrics
> 
> For each of those subsystems a configuration data will be emitted as well. 
> The initial proposal is to emit the configuration data events at least once 
> per chunk and the metrics values at 30 seconds interval. 
> By using these values the emitted events seem to contain useful information 
> without increasing overhead (the metrics values are read from `/proc` 
> filesystem so that should not be done too frequently).

Jaroslav Bachorik has updated the pull request incrementally with one 
additional commit since the last revision:

  Doh

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3126/files
  - new: https://git.openjdk.java.net/jdk/pull/3126/files/2514b1a7..78dc85d3

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=04-05

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3126.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3126/head:pull/3126

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v5]

2021-04-02 Thread Jaroslav Bachorik
> With this change it becomes possible to surface various cgroup level metrics 
> (available via `jdk.internal.platform.Metrics`) as JFR events.
> 
> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
> turned into JFR events to start with.
> * CPU related metrics
> * Memory related metrics
> * I/O related metrics
> 
> For each of those subsystems a configuration data will be emitted as well. 
> The initial proposal is to emit the configuration data events at least once 
> per chunk and the metrics values at 30 seconds interval. 
> By using these values the emitted events seem to contain useful information 
> without increasing overhead (the metrics values are read from `/proc` 
> filesystem so that should not be done too frequently).

Jaroslav Bachorik has updated the pull request incrementally with one 
additional commit since the last revision:

  Report container type and register events conditionally

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3126/files
  - new: https://git.openjdk.java.net/jdk/pull/3126/files/79c91f57..2514b1a7

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=03-04

  Stats: 41 lines in 2 files changed: 24 ins; 8 del; 9 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3126.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3126/head:pull/3126

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events

2021-04-02 Thread Jaroslav Bachorik
On Thu, 1 Apr 2021 15:55:59 GMT, Jaroslav Bachorik  
wrote:

>>> Does each getter call result in parsing /proc, or do things aggregated over 
>>> several calls or hooks?
>> 
>> From the looks of it the event emitting code uses `Metrics.java` interface 
>> for retrieving the info. Each call to a method exposed by Metrics result in 
>> file IO on some cgroup (v1 or v2) interface file(s) in `/sys/fs/...`. I 
>> don't see any aggregation being done.
>> 
>> On the hotspot side, we implemented some caching for frequent calls 
>> (JDK-8232207, JDK-8227006), but we didn't do that yet for the Java side 
>> since there wasn't any need (so far). If calls are becoming frequent with 
>> this it should be reconsidered.
>> 
>> So +1 on getting some data on what the perf penalty of this is.
>
> Thanks to all for chiming in!
> 
> I have added the tests to 
> `test/hotspot/jtreg/containers/docker/TestJFREvents.java` where there already 
> were some templates for the container event data.
> 
> As for the performance - as expected, extracting the data from `/proc` is not 
> exactly cheap. On my test c5.4xlarge instance I am getting an average 
> wall-clock time to generate the usage/throttling events (one instance of 
> each) of ~15ms.
> I would argue that 15ms per 30s (the default emission period for those 
> events) might be acceptable to start with. 
> 
> Caching of cgroups parsed data would help if the emission period is shorter 
> than the cache TTL. This is exacerbated by the fact that (almost) each 
> container event type requires data from a different cgroups control file - 
> hence the data will not be shared between the event type instances even if 
> cached. Realistically, caching benefits would become visible only for 
> sub-second emission periods.
> 
> If the caching is still required I would suggest having a follow up ticket 
> just for that - it will require setting up some benchmarks to justify the 
> changes that would need to be done in the metrics implementation.

I tried to measure the startup regression and here are my observations:
* Startup is not affected unless the application is started with JFR
* The extra events and hooks take ~5ms on my work machine
* It is possible not to register those events and hooks in a non-container env 
- then the overhead is 20-50us which it takes to figure out whether running in 
container

In order to minimize the effect this change will have on the startup I would 
suggest using conditional registration unless I hear strong objections to that.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events

2021-04-01 Thread Jaroslav Bachorik
On Fri, 26 Mar 2021 11:05:56 GMT, Severin Gehwolf  wrote:

>> Does each getter call result in parsing /proc, or do things aggregated over 
>> several calls or hooks?
>> 
>> Do you have any data how expensive the invocations are? 
>> 
>> You could for example try to measure it by temporary making the events 
>> durational, and fetch the values between begin() and end(), and perhaps show 
>> a 'jfr print --events Container* recording.jfr' printout. 
>> 
>> If possible, it would be interesting to get some idea about the startup cost 
>> as well
>> 
>> If not too much overhead, I think it would be nice to skip the "flag" in the 
>> .jfcs, and always record the events in a container environment.
>> 
>> I know there is a way to test JFR using Docker, maybe @mseledts could 
>> provide information? Some sanity tests would be good to have.
>
>> Does each getter call result in parsing /proc, or do things aggregated over 
>> several calls or hooks?
> 
> From the looks of it the event emitting code uses `Metrics.java` interface 
> for retrieving the info. Each call to a method exposed by Metrics result in 
> file IO on some cgroup (v1 or v2) interface file(s) in `/sys/fs/...`. I don't 
> see any aggregation being done.
> 
> On the hotspot side, we implemented some caching for frequent calls 
> (JDK-8232207, JDK-8227006), but we didn't do that yet for the Java side since 
> there wasn't any need (so far). If calls are becoming frequent with this it 
> should be reconsidered.
> 
> So +1 on getting some data on what the perf penalty of this is.

Thanks to all for chiming in!

I have added the tests to 
`test/hotspot/jtreg/containers/docker/TestJFREvents.java` where there already 
were some templates for the container event data.

As for the performance - as expected, extracting the data from `/proc` is not 
exactly cheap. On my test c5.4xlarge instance I am getting an average 
wall-clock time to generate the usage/throttling events (one instance of each) 
of ~15ms.
I would argue that 15ms per 30s (the default emission period for those events) 
might be acceptable to start with. 

Caching of cgroups parsed data would help if the emission period is shorter 
than the cache TTL. This is exacerbated by the fact that (almost) each 
container event type requires data from a different cgroups control file - 
hence the data will not be shared between the event type instances even if 
cached. Realistically, caching benefits would become visible only for 
sub-second emission periods.

If the caching is still required I would suggest having a follow up ticket just 
for that - it will require setting up some benchmarks to justify the changes 
that would need to be done in the metrics implementation.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v4]

2021-04-01 Thread Jaroslav Bachorik
> With this change it becomes possible to surface various cgroup level metrics 
> (available via `jdk.internal.platform.Metrics`) as JFR events.
> 
> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
> turned into JFR events to start with.
> * CPU related metrics
> * Memory related metrics
> * I/O related metrics
> 
> For each of those subsystems a configuration data will be emitted as well. 
> The initial proposal is to emit the configuration data events at least once 
> per chunk and the metrics values at 30 seconds interval. 
> By using these values the emitted events seem to contain useful information 
> without increasing overhead (the metrics values are read from `/proc` 
> filesystem so that should not be done too frequently).

Jaroslav Bachorik has updated the pull request incrementally with two 
additional commits since the last revision:

 - Remove unused test files
 - Initial test support for JFR container events

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3126/files
  - new: https://git.openjdk.java.net/jdk/pull/3126/files/82570022..79c91f57

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=02-03

  Stats: 117 lines in 4 files changed: 99 ins; 6 del; 12 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3126.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3126/head:pull/3126

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events

2021-03-26 Thread Severin Gehwolf
On Thu, 25 Mar 2021 23:28:18 GMT, Erik Gahlin  wrote:

> Does each getter call result in parsing /proc, or do things aggregated over 
> several calls or hooks?

>From the looks of it the event emitting code uses `Metrics.java` interface for 
>retrieving the info. Each call to a method exposed by Metrics result in file 
>IO on some cgroup (v1 or v2) interface file(s) in `/sys/fs/...`. I don't see 
>any aggregation being done.

On the hotspot side, we implemented some caching for frequent calls 
(JDK-8232207, JDK-8227006), but we didn't do that yet for the Java side since 
there wasn't any need (so far). If calls are becoming frequent with this it 
should be reconsidered.

So +1 on getting some data on what the perf penalty of this is.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events

2021-03-25 Thread Erik Gahlin
On Wed, 24 Mar 2021 18:39:06 GMT, Severin Gehwolf  wrote:

>> With this change it becomes possible to surface various cgroup level metrics 
>> (available via `jdk.internal.platform.Metrics`) as JFR events.
>> 
>> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
>> turned into JFR events to start with.
>> * CPU related metrics
>> * Memory related metrics
>> * I/O related metrics
>> 
>> For each of those subsystems a configuration data will be emitted as well. 
>> The initial proposal is to emit the configuration data events at least once 
>> per chunk and the metrics values at 30 seconds interval. 
>> By using these values the emitted events seem to contain useful information 
>> without increasing overhead (the metrics values are read from `/proc` 
>> filesystem so that should not be done too frequently).
>
> @jbachorik Would it make sense for `ContainerConfigurationEvent` to include 
> the underlying cgroup version info (v1 or legacy vs. v2 or unified)? 
> `Metrics.getProvider()` should give that info.

Does each getter call result in parsing /proc, or do things aggregated over 
several calls or hooks?

Do you have any data how expensive the invocations are? 

You could for example try to measure it by temporary making the events 
durational, and fetch the values between begin() and end(), and perhaps show a 
'jfr print --events Container* recording.jfr' printout. 

If possible, it would be interesting to get some idea about the startup cost as 
well

If not too much overhead, I think it would be nice to skip the "flag" in the 
.jfcs, and always record the events in a container environment.

I know there is a way to test JFR using Docker, maybe @mseledts could provide 
information? Some sanity tests would be good to have.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events

2021-03-24 Thread Severin Gehwolf
On Mon, 22 Mar 2021 15:57:12 GMT, Jaroslav Bachorik  
wrote:

> With this change it becomes possible to surface various cgroup level metrics 
> (available via `jdk.internal.platform.Metrics`) as JFR events.
> 
> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
> turned into JFR events to start with.
> * CPU related metrics
> * Memory related metrics
> * I/O related metrics
> 
> For each of those subsystems a configuration data will be emitted as well. 
> The initial proposal is to emit the configuration data events at least once 
> per chunk and the metrics values at 30 seconds interval. 
> By using these values the emitted events seem to contain useful information 
> without increasing overhead (the metrics values are read from `/proc` 
> filesystem so that should not be done too frequently).

@jbachorik Would it make sense for `ContainerConfigurationEvent` to include the 
underlying cgroup version info (v1 or legacy vs. v2 or unified)? 
`Metrics.getProvider()` should give that info.

-

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v3]

2021-03-24 Thread Jaroslav Bachorik
> With this change it becomes possible to surface various cgroup level metrics 
> (available via `jdk.internal.platform.Metrics`) as JFR events.
> 
> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
> turned into JFR events to start with.
> * CPU related metrics
> * Memory related metrics
> * I/O related metrics
> 
> For each of those subsystems a configuration data will be emitted as well. 
> The initial proposal is to emit the configuration data events at least once 
> per chunk and the metrics values at 30 seconds interval. 
> By using these values the emitted events seem to contain useful information 
> without increasing overhead (the metrics values are read from `/proc` 
> filesystem so that should not be done too frequently).

Jaroslav Bachorik has updated the pull request incrementally with one 
additional commit since the last revision:

  Update the JFR control files

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3126/files
  - new: https://git.openjdk.java.net/jdk/pull/3126/files/90de0d8c..82570022

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=01-02

  Stats: 10 lines in 2 files changed: 10 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3126.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3126/head:pull/3126

PR: https://git.openjdk.java.net/jdk/pull/3126


Re: RFR: 8203359: Container level resources events [v2]

2021-03-24 Thread Jaroslav Bachorik
> With this change it becomes possible to surface various cgroup level metrics 
> (available via `jdk.internal.platform.Metrics`) as JFR events.
> 
> Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
> turned into JFR events to start with.
> * CPU related metrics
> * Memory related metrics
> * I/O related metrics
> 
> For each of those subsystems a configuration data will be emitted as well. 
> The initial proposal is to emit the configuration data events at least once 
> per chunk and the metrics values at 30 seconds interval. 
> By using these values the emitted events seem to contain useful information 
> without increasing overhead (the metrics values are read from `/proc` 
> filesystem so that should not be done too frequently).

Jaroslav Bachorik has updated the pull request incrementally with one 
additional commit since the last revision:

  Split off the CPU throttling metrics

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3126/files
  - new: https://git.openjdk.java.net/jdk/pull/3126/files/9ba3c5a6..90de0d8c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=00-01

  Stats: 88 lines in 3 files changed: 73 ins; 15 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3126.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3126/head:pull/3126

PR: https://git.openjdk.java.net/jdk/pull/3126


RFR: 8203359: Container level resources events

2021-03-22 Thread Jaroslav Bachorik
With this change it becomes possible to surface various cgroup level metrics 
(available via `jdk.internal.platform.Metrics`) as JFR events.

Only a subset of the metrics exposed by `jdk.internal.platform.Metrics` is 
turned into JFR events to start with.
* CPU related metrics
* Memory related metrics
* I/O related metrics

For each of those subsystems a configuration data will be emitted as well. The 
initial proposal is to emit the configuration data events at least once per 
chunk and the metrics values at 30 seconds interval. 
By using these values the emitted events seem to contain useful information 
without increasing overhead (the metrics values are read from `/proc` 
filesystem so that should not be done too frequently).

-

Commit messages:
 - Formatting spaces
 - 8203359: Container level resources events

Changes: https://git.openjdk.java.net/jdk/pull/3126/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3126&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8203359
  Stats: 389 lines in 8 files changed: 386 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3126.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3126/head:pull/3126

PR: https://git.openjdk.java.net/jdk/pull/3126