Ethanlm opened a new pull request #3366:
URL: https://github.com/apache/storm/pull/3366
## What is the purpose of the change
Enables supervisors to use the runc binary to launch workers inside an OCI
container. See docs/OCI-support.md for the justification. The original design and
many parts of the OCI-related code are borrowed from the Verizon Media hadoop-core
team.
This has been used in our production Storm clusters for more than one year.
## How was the change tested
Tested with an example WordCount topology, including the following operations:
1. Launch a topology; the supervisor launches the worker inside the OCI
container
```
-bash-4.2$ sudo runc list
ID                                          PID     STATUS    BUNDLE                                                           CREATED                          OWNER
6703-1a23ca4b-6062-4d08-8ac3-b09e7d35e7cb   21780   running   /home/y/var/storm/workers/1a23ca4b-6062-4d08-8ac3-b09e7d35e7cb   2020-12-21T20:21:17.116971511Z   root
```
2. Manually kill the worker; the supervisor recovers it
3. Profile the worker from the UI (jmap, heap, etc.)
```
-bash-4.2$ sudo ls -ltrh /home/y/var/storm/workers-artifacts/wc1-2-1608581491/6703/
...
-rw-r----- 1 username1 gstorm 64K Dec 21 21:32 worker.log
-rw-r----- 1 username1 gstorm 27K Dec 21 21:51 jstack-17-20201221215145.txt
-rw-r----- 1 username1 gstorm 552M Dec 21 21:52 recording-17-20201221215222.bin
```
4. Check CPU throttling and validate that it is properly enforced. In this
example, the worker is assigned 140% CPU.
```
-bash-4.2$ cat /sys/fs/cgroup/cpu/storm/6703-1a23ca4b-6062-4d08-8ac3-b09e7d35e7cb/cpu.cfs_period_us
100000
-bash-4.2$ cat /sys/fs/cgroup/cpu/storm/6703-1a23ca4b-6062-4d08-8ac3-b09e7d35e7cb/cpu.cfs_quota_us
140000
-bash-4.2$ cat /sys/fs/cgroup/cpu/storm/6703-1a23ca4b-6062-4d08-8ac3-b09e7d35e7cb/cpu.stat
nr_periods 824
nr_throttled 821
throttled_time 104646268666
#Top output:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21800 username1 20 0 5453928 1.7g 28036 S 139.9 22.7 2:15.96 java
21780 username1 20 0 2451376 83324 27448 S 0.0 1.1 0:02.36 java
```
5. Kill the topology; the supervisor kills the worker
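
For reference, the cgroup numbers from step 4 can be sanity-checked with a few lines of Python. This is only a sketch and not part of the change: the helper names (`parse_cpu_stat`, `cpu_limit_percent`, `throttled_fraction`) are made up for illustration, and the inputs are copied from the session output above. It assumes the cgroup v1 CPU controller, where the limit is `cpu.cfs_quota_us / cpu.cfs_period_us` and `throttled_time` in `cpu.stat` is reported in nanoseconds.

```python
def parse_cpu_stat(text):
    """Parse the 'key value' lines of a cgroup v1 cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def cpu_limit_percent(quota_us, period_us):
    """Return the CPU limit as a percentage of one core, or None if unlimited."""
    if quota_us < 0:  # a quota of -1 means no limit is enforced
        return None
    return 100.0 * quota_us / period_us


def throttled_fraction(stats):
    """Fraction of enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


# Values copied from the cpu.cfs_period_us, cpu.cfs_quota_us and cpu.stat
# output shown in step 4.
period_us = 100000
quota_us = 140000
stats = parse_cpu_stat(
    "nr_periods 824\n"
    "nr_throttled 821\n"
    "throttled_time 104646268666\n"
)

limit = cpu_limit_percent(quota_us, period_us)
fraction = throttled_fraction(stats)
throttled_seconds = stats["throttled_time"] / 1e9  # nanoseconds to seconds

print(f"limit: {limit:.0f}% of one core")
print(f"throttled in {fraction:.1%} of periods")
print(f"total throttled time: {throttled_seconds:.1f}s")
```

With these inputs the computed limit is 140%, matching the assigned CPU, and the worker was throttled in nearly every period, which is consistent with the `top` output showing the java process pinned just under 140%.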