lucabro81 opened a new issue, #193:
URL: https://github.com/apache/openserverless/issues/193
### Ⅰ. Issue Description
OpenServerless installation fails on Ubuntu 24.04 with kernel 6.14 due to a
JVM cgroup v2 compatibility issue. The Kafka and Controller pods crash with a
`NullPointerException` in `CgroupV2Subsystem.getInstance`.
### Ⅱ. Describe what happened
The Kafka pod crashes immediately on startup with the following exception:
```
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.base/java.lang.reflect.Method.invoke(Unknown Source)
    at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(Unknown Source)
    at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(Unknown Source)
Caused by: java.lang.NullPointerException
    at java.base/jdk.internal.platform.cgroupv2.CgroupV2Subsystem.getInstance(Unknown Source)
    at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(Unknown Source)
    at java.base/jdk.internal.platform.CgroupMetrics.getInstance(Unknown Source)
    at java.base/jdk.internal.platform.SystemMetrics.instance(Unknown Source)
    at java.base/jdk.internal.platform.Metrics.systemMetrics(Unknown Source)
    at java.base/jdk.internal.platform.Container.metrics(Unknown Source)
    at jdk.management/com.sun.management.internal.OperatingSystemImpl.<init>(Unknown Source)
    at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(Unknown Source)
    at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl$3.nameToMBeanMap(Unknown Source)
    at java.management/sun.management.spi.PlatformMBeanProvider$PlatformComponent.getMBeans(Unknown Source)
    at java.management/java.lang.management.ManagementFactory.getPlatformMXBean(Unknown Source)
    at java.management/java.lang.management.ManagementFactory.getOperatingSystemMXBean(Unknown Source)
    at io.prometheus.jmx.shaded.io.prometheus.client.hotspot.StandardExports.<init>(StandardExports.java:43)
    at io.prometheus.jmx.shaded.io.prometheus.client.hotspot.DefaultExports.register(DefaultExports.java:37)
    at io.prometheus.jmx.shaded.io.prometheus.client.hotspot.DefaultExports.initialize(DefaultExports.java:28)
    at io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:30)
    ... 6 more
*** java.lang.instrument ASSERTION FAILED ***: "result" with message agent load/premain call failed at src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 422
FATAL ERROR in native method: processing of -javaagent failed, processJavaStart failed
```
The Controller pod never starts, so the installation hangs indefinitely
waiting for `pod/controller-0` to become available.
### Ⅲ. Describe what you expected to happen
Installation should complete successfully with all pods running and healthy,
allowing user creation and login.
### Ⅳ. How to reproduce it (as minimally and precisely as possible)
1. Install Ubuntu 24.04 on hardware with NVIDIA kernel 6.14
2. Configure ops with minimal settings
3. Run installation
4. The installation fails with a timeout while waiting for `controller-0`
### Ⅴ. Anything else we need to know?
**Root Cause Analysis:**
This is a known issue with JVM versions prior to JDK 21 running on Linux
kernel 6.12+ with cgroup v2. The problem occurs because:
1. Ubuntu 24.04 with a 6.x kernel and systemd 256+ enforces cgroup v2
2. The memory cgroup controller is missing from `/proc/cgroups`:
```bash
$ cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpu 0 860 1
cpuacct 0 860 1
blkio 0 860 1
devices 0 860 1
freezer 0 860 1
net_cls 0 860 1
perf_event 0 860 1
net_prio 0 860 1
hugetlb 0 860 1
pids 0 860 1
rdma 0 860 1
misc 0 860 1
dmem 0 860 1
```
Note that no `memory` controller is listed.
3. The JVM's `CgroupV2Subsystem.getInstance()` expects the memory controller and
throws a `NullPointerException` when it is missing
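The check in step 2 can be scripted. This is a minimal sketch rather than anything shipped with OpenServerless: the function name `has_cgroup_controller` is hypothetical, and it only inspects a `/proc/cgroups`-style file (defaulting to `/proc/cgroups` itself).

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if controller NAME is listed and
# enabled in a /proc/cgroups-style file. FILE defaults to /proc/cgroups.
has_cgroup_controller() {
    local name="$1" file="${2:-/proc/cgroups}"
    # Columns: subsys_name hierarchy num_cgroups enabled.
    # The '#subsys_name' header never matches a controller name.
    awk -v n="$name" '$1 == n && $4 == 1 { found = 1 } END { exit !found }' "$file"
}

if has_cgroup_controller memory; then
    echo "memory controller listed in /proc/cgroups"
else
    echo "memory controller missing from /proc/cgroups"
fi
```

On a cgroup v2 host it may also be worth comparing against `cat /sys/fs/cgroup/cgroup.controllers`, which lists the controllers available in the unified hierarchy.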
**Temporary Workaround:**
Force cgroup v1 by adding a kernel boot parameter:
```bash
# In /etc/default/grub:
GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"
# Then: sudo update-grub && sudo reboot
```
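After the reboot, it may be worth confirming which hierarchy is actually mounted. The following is a sketch that assumes the usual mount point `/sys/fs/cgroup`; the function name `cgroup_version` is hypothetical. With GNU `stat`, `stat -fc %T` reports `cgroup2fs` for a unified (v2) mount and typically `tmpfs` for a v1 layout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: maps the filesystem type of /sys/fs/cgroup to a
# cgroup version. An explicit type can be passed as $1 (handy for testing).
cgroup_version() {
    local fstype="${1:-$(stat -fc %T /sys/fs/cgroup 2>/dev/null)}"
    case "$fstype" in
        cgroup2fs) echo "v2" ;;       # unified hierarchy
        tmpfs)     echo "v1" ;;       # legacy layout: tmpfs with per-controller mounts
        *)         echo "unknown" ;;  # anything else, or stat failed
    esac
}

echo "cgroup hierarchy: $(cgroup_version)"
```

If this still prints `v2` after the reboot, the boot parameter was probably not picked up; `cat /proc/cmdline` shows what the kernel actually received.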
**Permanent Solution:**
Update the Docker images to use JDK 21+, or apply the JVM patches for cgroup v2
compatibility. Similar issues have been fixed in:
- Elasticsearch 7.17.26+
- OpenJDK bug tracker: https://bugs.openjdk.org/browse/JDK-8287107
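To see whether a given image already ships a new enough JVM, the major version can be extracted from `java -version` output. This is only a sketch: `java_major` is a hypothetical helper, the threshold of 21 comes from the suggestion above, and how `java -version` is run inside the image (for example through `kubectl exec` or `docker run`) depends on the setup.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `java -version` output on stdin and prints the
# major version, e.g. "1.8.0_472" -> 8, "17.0.9" -> 17, "21" -> 21.
java_major() {
    local v
    # Take the quoted version string from the first line that has one.
    v=$(sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$v" in
        1.*) v="${v#1.}"; echo "${v%%.*}" ;;  # legacy 1.x scheme (Java 8 and older)
        *)   echo "${v%%[.-]*}" ;;            # modern scheme; also trims "-ea" suffixes
    esac
}

# `java -version` writes to stderr, hence the 2>&1.
if command -v java >/dev/null 2>&1; then
    major=$(java -version 2>&1 | java_major)
    echo "local JVM major version: ${major:-unknown}"
fi
```

The version that matters is the JVM inside the Kafka and Controller images, not the one on the host, so the same pipe would need to be applied to the output obtained from those containers.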
### Ⅵ. Environment:
- K8S Runtime and version: k3s (installed by ops)
- OPS CLI version: 0.1.0-2409121919.dev
- OS: Ubuntu 24.04 LTS
- Kernel: 6.14.0-1015-nvidia
- Hardware: Dell Pro Max GB10 (NVIDIA GB10 Grace CPU + Blackwell GPU)
- Java version on host: OpenJDK 1.8.0_472
- Cgroup version: v2 (enforced by kernel 6.14)