I applied your patch to the latest JDK 15 sources and ran the container tests on Oracle Linux 8.1 with podman/cgroupv2 enabled. There were some issues. I’m not sure if its my setup or not.
I also ran the same build on Ubuntu with docker/cgroupv1. I didn't see any failures on the cgroupv1 system. Here are some notes: The podman version on OL 8.1 doesn't yes support cgroupv2. I built the latest from sources for this test. The docker tests take a very long time under podman! Longer than the cgroupv1 run. cpusets and cpusets.mems are blank on host and if none are specified on podman/docker run command. On cgroupv1 they are host values if not specified for container. Effective cpusets and cpusets.mems are set properly in a container. HOST OUTPUT: ./java -XshowSettings:system -version Operating System Metrics: Provider: cgroupv2 Effective CPU Count: 32 CPU Period: -1 CPU Quota: -1 CPU Shares: -1 List of Processors: N/A List of Effective Processors: N/A List of Memory Nodes: N/A List of Available Memory Nodes: N/A Memory Limit: Unlimited Memory Soft Limit: Unlimited Memory & Swap Limit: Unlimited CONTAINER OUTPUT: # podman run -it -v `pwd`:/mnt ubuntu bash root@3c6654a3b834:/mnt/jdk/bin# ./java -XshowSettings:system -version Operating System Metrics: Provider: cgroupv2 Effective CPU Count: 32 CPU Period: 100000us CPU Quota: -1 CPU Shares: -1 List of Processors: N/A List of Effective Processors, 32 total: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 List of Memory Nodes: N/A List of Available Memory Nodes, 1 total: 0 Memory Limit: Unlimited Memory Soft Limit: Unlimited Memory & Swap Limit: Unlimited Docker tests fail if /bin/docker is not available in podman setup. We probably should enhance the docker check to also look for podman. Two container tests failed: FAILED: containers/cgroup/PlainRead.java failed Memory Limit is: -2 instead of unlimited or -1. This is because memory.max is not foumd. FAILED: jdk/internal/platform/cgroup/TestCgroupMetrics.java This fails because nr_periods line doesn't always exist. I think you’ve got to enable a quota for this to appear (not sure). Here’s the contents: % more cpu.stat usage_usec 23974562755 user_usec 22257183568 system_usec 1717379186 CGROUPV2 results on Oracle Linux 8.1 --------- testing container APIs FAILED: containers/cgroup/PlainRead.java Passed: containers/docker/DockerBasicTest.java Passed: containers/docker/TestCPUAwareness.java Passed: containers/docker/TestCPUSets.java Passed: containers/docker/TestJcmdWithSideCar.java Passed: containers/docker/TestJFREvents.java Passed: containers/docker/TestJFRNetworkEvents.java Passed: containers/docker/TestMemoryAwareness.java Passed: containers/docker/TestMisc.java Test results: passed: 8; failed: 1 Results written to /export/users/bobv/jdk15/build/jtreg/JTwork Error: Some tests failed or other problems occurred. testing jdk.internal.platform APIs FAILED: jdk/internal/platform/cgroup/TestCgroupMetrics.java Passed: jdk/internal/platform/cgroup/TestCgroupSubsystemController.java Passed: jdk/internal/platform/docker/TestDockerCpuMetrics.java Passed: jdk/internal/platform/docker/TestDockerMemoryMetrics.java Passed: jdk/internal/platform/docker/TestSystemMetrics.java Test results: passed: 4; failed: 1 Results written to /export/users/bobv/jdk15/build/jtreg/JTwork Error: Some tests failed or other problems occurred. testing -XshowSettings:system launcher option Passed: tools/launcher/Settings.java Test results: passed: 1 Bob. > On Feb 11, 2020, at 1:04 PM, Severin Gehwolf <sgehw...@redhat.com> wrote: > > Hi Mandy, Bob, > > Thanks again for the reviews and patience on this. Sorry it took me so > long to get back to this :-/ > > Updated webrev: > Full: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/10/webrev/ > incremental: > http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/10/incremental/webrev/ > > I've tested this with docker tests on cgroup v1 and via podman on a > cgroup v2 system. They pass. I'll be running this through jdk-submit as > well. > > More below. > > On Tue, 2020-01-21 at 16:09 -0800, Mandy Chung wrote: >> Hi Severin, >> >> Thanks for the update. >> >> On 1/21/20 11:30 AM, Severin Gehwolf wrote: >>> Full: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/09/webrev/ >>> incremental: >>> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/09/incremental/webrev/ >>> >> >> I have answered my own question. Most of the metrics used to return 0 if >> unavailable due to IOException reading a file or malformed content and now >> are changed to return -2 due to error fetching the metric. >> >> The following are about limits which used to return -1 if unlimited or no >> limit set. >> public long getCpuQuota(); >> public long getCpuShares(); >> public long getMemoryLimit(); >> public long getMemoryAndSwapLimit(); >> public long getMemorySoftLimit(); >> >> With this patch, only getMemoryLimit and getMemoryAndSwapLimit specify to >> return -1 if unlimited or no limit set. However the implementation does >> return -1. All of the above specify to return -2 if unavailable due to >> error fetching the metric. >> >> I found the implementation quite hard to follow. I spent some time >> reviewing the code to see if the implementation matches the spec but I >> can't easily tell yet. For example, >> CgroupSubsystemController::getLongValueMatchingLine returns -1 when >> IOException occurs. >> CgroupSubsystemController::getLongEntry returns 0L if IOException occurs. >> >> CgroupV1SubsystemController::convertStringToLong returns Long.MAX_VALUE >> when value overflows > > This one is intentional. It's mapped back to unlimited via > longValOrUnlimited(). The reason for this is that cgroup v1 doesn't > have a concept of "unlimited". Unlimited values will be a very large > numbers in cgroup v1 files. > >> CgroupV2SubsystemController::convertStringToLong returns -1 when IOException >> occurs >> >> CgroupV1Subsystem::getCpuShares return -1 if cpu.shares == 0 or 1024 >> CgroupV2Subsystem::getCpuShares returns -1 if cpu.weight == 100 or 0 > > These two are special cases too. See the implementation note of > Metrics.getCpuShares(). In the cgroup v2 case the default value is 100 > (over 1024 in cgroup v1). That's why unlimited is being returned for > those values. > >> CgroupV2Subsystem::getFromCpuMax returns -1 if error in reading cpu.max or >> malformed content >> CgroupV2Subsystem::sumTokensIOStat returns -2 if IOException error >> This is called by getBlkIOServiceCount and getBlkIOServiced >> >> I think this can be improved and add the documentation to describe >> what the methods do. Since Metrics APIs consistently return -2 if >> unavailable due to error in fetching the metric, why some utility >> methods in *Subsystem and *SubsystemController return -1 upon error >> and 0 when unlimited? >> >> I suspect if the getXXXValue and other methods are clearly documented >> with the error cases (possibly renaming the method name if appropriate) >> CgroupV1Subsystem and CgroupV2SubSystem will become very explicit >> to understand. > > This should be fixed now. > > I've gone through the API doc of Metrics.java and have updated it. In > general, I've updated it to return -1 if metric is unavailable (due to > error in reading some files or content being empty), and -2 if not > supported. No method returns -2 currently, but it might change and it's > good to have some way of saying "not implementable" for this subsystem > in the spec. That's my take on it anyway. > > There is also a new unit test for shared controller logic: > TestCgroupSubsystemController.java > > It execises various cases of error/success. > > That is to ensure proper symmetry across the various cases (including > IOException). I've also documented static methods in > CgroupSubsystemController. Overall, all methods now return the same > values for cgroup v1 and cgroup v2 (given the impl nuances) for the > various cases. > >> >> CgroupSubsystem.java >> >> 44 public static final double DOUBLE_RETVAL_NOT_SUPPORTED = >> LONG_RETVAL_NOT_SUPPORTED; >> 49 public static final Boolean BOOL_RETVAL_NOT_SUPPORTED = null; >> >> They are no longer needed, right? > > Removed. > >> >> CgroupSubsystemFactory.java >> >> 89 System.err.println("Warning: Mixed cgroupv1 and cgroupv2 not >> supported. Metrics disabled."); >> >> >> I expect this be a System.Logger log > > Updated. > >> 114 if >> (!Integer.valueOf(0).toString().equals(tokens[0])) { >> >> This can be simplified to if (!"0".equals(tokens[0])) > > Done, thanks! > >> LauncherHelper.java >> >> 407 // Extended cgroupv1 specific metrics >> 408 if (c instanceof MetricsCgroupV1) { >> 409 MetricsCgroupV1 cgroupV1 = (MetricsCgroupV1)c; >> 410 limit = cgroupV1.getKernelMemoryLimit(); >> 411 ostream.println(formatLimitString(limit, INDENT + "Kernel >> Memory Limit: ")); >> 412 limit = cgroupV1.getTcpMemoryLimit(); >> 413 ostream.println(formatLimitString(limit, INDENT + "TCP >> Memory Limit: ")); >> 414 Boolean value = cgroupV1.isMemoryOOMKillEnabled(); >> 415 ostream.println(formatBoolean(value, INDENT + "Out Of Memory >> Killer Enabled: ")); >> 416 value = cgroupV1.isCpuSetMemoryPressureEnabled(); >> 417 ostream.println(formatBoolean(value, INDENT + "CPUSet Memory >> Pressure Enabled: ")); >> 418 } >> >> MetricsCgroupV1 is linux-only. It will fail the compilation when >> building on non-linux. One option is to move this code to >> src/java.base/linux/share/sun/launcher/CgroupMetrics.java >> >> Are they continued to be interesting metrics to be output from >> -XshowSetting? I wonder if they can simply be dropped from the output. >> Bob will have an opinion. > > I've removed those extra cgroup v1 specific metrics printed via > -XshowSettings:system. Not sure what to do with MetricsCgroupV1. It's > only used in tests in webrev 10. On the other hand the idea would be > for consumers to downcast it to MetricsCgroupV1 if they needed those > extra metrics. > > Thanks, > Severin >