Ethanlm commented on a change in pull request #3366:
URL: https://github.com/apache/storm/pull/3366#discussion_r669915785
##########
File path: docs/OCI-support.md
##########
@@ -0,0 +1,1704 @@
+---
+title: OCI/Squashfs Runtime
+layout: documentation
+documentation: true
+---
+
+# OCI/Squashfs Runtime
+
+OCI/Squashfs is a container runtime that allows topologies to run inside Docker containers. However, unlike the existing
+Docker runtime, the images are fetched from HDFS rather than pulled from the Docker registry or pre-loaded
+into Docker on each node. Docker does not need to be installed on the nodes for this runtime to work.
+
+Note: This has only been tested on RHEL7.
+
+## Motivation
+
+#### Docker runtime drawbacks
+Using the current Docker runtime (see [Docker-support.md](Docker-support.md#Docker-Support)) has some drawbacks:
+
+##### Docker Daemons Dependency
+
+The Docker daemons `dockerd` and `containerd` must be running on the system in order for the Docker runtime to function.
+These daemons can also get out of sync, which can cause nontrivial issues for the containers.
+
+##### Docker Registry Issues at Scale
+
+Using the Docker runtime on a large scale Storm cluster can overwhelm the
Docker registry. In practice this requires
+admins to pre-load a Docker image on all the cluster nodes in a controlled
fashion before a large job requesting
+the image can run.
+
+##### Image Costs in Time and Space
+
+Docker stores each image layer as a tar.gz archive. In order to use the layer,
the compressed archive must be unpacked
+into the node's filesystem. This can consume significant disk space,
especially when the reliable image store location
+capacity is relatively small. In addition, unpacking an image layer takes
time, especially when the layer is large or
+contains thousands of files. This additional time for unpacking delays
container launch beyond the time needed to transfer
+the layer data over the network.
+
+#### OCI/Squashfs Runtime advantages
+
+The OCI/Squashfs runtime avoids the drawbacks listed above in the following ways.
+
+##### No Docker dependencies on The Node
+
+Docker does not need to be installed on each node, nor is there a dependency
on a daemon or service that needs to be started
+by an admin before containers can be launched. All that is required to be
present on each node is an OCI-compatible runtime like
+`runc`.
+
+##### Leverages Distributed File Systems For Scale
+
+Images can be fetched via HDFS or other distributed file systems instead of the Docker registry. This prevents a large cluster from
+overwhelming a Docker registry when a big topology causes all of the nodes to request an image at once. This also allows large clusters
+to run topologies more dynamically, as admins no longer need to pre-load images on each node to avoid a flood of
+image requests to the Docker registry.
+
+##### Smaller, Faster images on The Node
+
+The new runtime handles layer localization directly, so layer formats other
than tar archive can be supported. For example, each image layer
+can be converted to squashfs images as part of copying the layers to HDFS.
squashfs is a file system optimized for running directly on a
+compressed image. With squashfs layers the layer data can remain compressed on
the node saving disk space. Container launch after layer
+localization is also faster, as the layers no longer need to be unpacked into
a directory to become usable.
+
+
+## Prerequisite
+
+First, use the `docker-to-squash.py` script to download Docker images and configs, convert the layers to squashfs files, and put them in a directory in HDFS. For example:
+
+```bash
+python docker-to-squash.py pull-build-push-update --hdfs-root hdfs://hostname:port/containers \
+    docker.xxx.com:4443/hadoop-user-images/storm/rhel7:20201202-232133,storm/rhel7:dev_current --log DEBUG --bootstrap
+```
+
+With this command, all the layers belonging to this image are converted to squashfs files and placed under the `./layers` directory;
+the manifest of this image is placed under the `./manifests` directory, named with the sha256 value of the manifest content;
+the config of this image is placed under the `./config` directory, named with the sha256 value of the config content;
+the mapping from the image tag to the sha256 value of the manifest is written to "./image-tag-to-manifest-file".
+
+##### Example
+
+For example, the directory structure is like this:
+
+```bash
+-bash-4.2$ hdfs dfs -ls /containers/*
+Found 1 items
+-r--r--r-- 3 hdfsqa hadoop 7877 2020-12-04 14:29 /containers/config/ef1ff2c7167a1a6cd01e106f51b84a4d400611ba971c53cbc28de7919515ca4e
+-r--r--r-- 3 hdfsqa hadoop 160 2020-12-04 14:30 /containers/image-tag-to-hash
+Found 7 items
+-r--r--r-- 3 hdfsqa hadoop 84697088 2020-12-04 14:28 /containers/layers/152ee1d2cccea9dfe6393d2bdf9d077b67616b2b417b25eb74fc5ffaadcb96f5.sqsh
+-r--r--r-- 3 hdfsqa hadoop 545267712 2020-12-04 14:28 /containers/layers/18ee671016a1bf3ecab07395d93c2cbecd352d59c497a1551e2074d64e1098d9.sqsh
+-r--r--r-- 3 hdfsqa hadoop 12906496 2020-10-06 15:24 /containers/layers/1b73e9433ecca0a6bb152bd7525f2b7c233484d51c24f8a6ba483d5cfd3035dc.sqsh
+-r--r--r-- 3 hdfsqa hadoop 4096 2020-12-04 14:29 /containers/layers/344224962010c03c9ca1f11a9bff0dfcc296ac46d0a55e4ff30a0ad13b9817af.sqsh
+-r--r--r-- 3 hdfsqa hadoop 26091520 2020-10-06 15:22 /containers/layers/3692c3483ef6516fba685b316448e8aaf0fc10bb66818116edc8e5e6800076c7.sqsh
+-r--r--r-- 3 hdfsqa hadoop 4096 2020-12-04 14:29 /containers/layers/8710a3d72f75b45c48ab6b9b67eb6d77caea3dac91a0c30e0831f591cba4887e.sqsh
+-r--r--r-- 3 hdfsqa hadoop 121122816 2020-10-06 15:23 /containers/layers/ea067172a7138f035d89a5c378db6d66c1581d98b0497b21f256e04c3d2b5303.sqsh
+Found 1 items
+-r--r--r-- 3 hdfsqa hadoop 1793 2020-12-04 14:29 /containers/manifests/26fd443859325d5911f3be5c5e231dddca88ee0d526456c0c92dd794148d8585
+```
+
+The `image-tag-to-manifest-file`:
+```bash
+-bash-4.2$ hdfs dfs -cat /containers/image-tag-to-hash
+storm/rhel7:dev_current:26fd443859325d5911f3be5c5e231dddca88ee0d526456c0c92dd794148d8585#docker.xxx.com:4443/hadoop-user-images/storm/rhel7:20201202-232133
+```
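The sample line above suggests a layout of `<tags>:<manifest sha256>#<comment>`. As a rough sketch only, the line could be parsed as below. The comma-separated tag list and the names `TagMapping` and `parse_tag_line` are assumptions made for illustration; they are not taken from Storm or from `docker-to-squash.py`.

```python
from typing import List, NamedTuple


class TagMapping(NamedTuple):
    """One entry of the tag-to-manifest file (hypothetical helper type)."""
    tags: List[str]
    manifest_hash: str
    comment: str


def parse_tag_line(line: str) -> TagMapping:
    """Parse one line, assuming the layout '<tags>:<sha256>#<comment>'."""
    # Everything after the first '#' is treated as a free-form comment.
    body, _, comment = line.strip().partition("#")
    # Tags may themselves contain ':', so split on the LAST ':' to find the hash.
    tags_part, _, manifest_hash = body.rpartition(":")
    # Assumption: several tags for one manifest would be comma-separated.
    tags = [tag for tag in tags_part.split(",") if tag]
    return TagMapping(tags, manifest_hash, comment)
```

Splitting on the last colon matters here because an image tag such as `storm/rhel7:dev_current` already contains one.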
+
+The manifest file
`26fd443859325d5911f3be5c5e231dddca88ee0d526456c0c92dd794148d8585`:
+```json
+{
+ "schemaVersion": 2,
+ "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
+ "config": {
+ "mediaType": "application/vnd.docker.container.image.v1+json",
+ "size": 7877,
+ "digest":
"sha256:ef1ff2c7167a1a6cd01e106f51b84a4d400611ba971c53cbc28de7919515ca4e"
+ },
+ "layers": [
+ {
+ "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
+ "size": 26858854,
+ "digest":
"sha256:3692c3483ef6516fba685b316448e8aaf0fc10bb66818116edc8e5e6800076c7"
+ },
+ {
+ "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
+ "size": 123300113,
+ "digest":
"sha256:ea067172a7138f035d89a5c378db6d66c1581d98b0497b21f256e04c3d2b5303"
+ },
+ {
+ "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
+ "size": 12927624,
+ "digest":
"sha256:1b73e9433ecca0a6bb152bd7525f2b7c233484d51c24f8a6ba483d5cfd3035dc"
+ },
+ {
+ "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
+ "size": 567401434,
+ "digest":
"sha256:18ee671016a1bf3ecab07395d93c2cbecd352d59c497a1551e2074d64e1098d9"
+ },
+ {
+ "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
+ "size": 85748864,
+ "digest":
"sha256:152ee1d2cccea9dfe6393d2bdf9d077b67616b2b417b25eb74fc5ffaadcb96f5"
+ },
+ {
+ "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
+ "size": 186,
+ "digest":
"sha256:344224962010c03c9ca1f11a9bff0dfcc296ac46d0a55e4ff30a0ad13b9817af"
+ },
+ {
+ "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
+ "size": 156,
+ "digest":
"sha256:8710a3d72f75b45c48ab6b9b67eb6d77caea3dac91a0c30e0831f591cba4887e"
+ }
+ ]
+}
+```
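To connect the manifest with the directory listing shown earlier, here is a minimal sketch of how the digests could be mapped to file paths. It assumes the layout from the example (`config/<hex>` and `layers/<hex>.sqsh` under the HDFS root); the function names are hypothetical and this is not the code Storm itself uses.

```python
import hashlib
import posixpath
from typing import Dict, List, Union


def hex_of(digest: str) -> str:
    """Strip the 'sha256:' prefix from a digest such as 'sha256:ab12...'."""
    algorithm, _, hex_value = digest.partition(":")
    if algorithm != "sha256" or not hex_value:
        raise ValueError(f"unexpected digest format: {digest!r}")
    return hex_value


def resource_paths(manifest: dict, root: str) -> Dict[str, Union[str, List[str]]]:
    """Return the assumed config and layer paths for a parsed manifest."""
    config_path = posixpath.join(root, "config", hex_of(manifest["config"]["digest"]))
    layer_paths = [
        posixpath.join(root, "layers", hex_of(layer["digest"]) + ".sqsh")
        for layer in manifest["layers"]
    ]
    return {"config": config_path, "layers": layer_paths}


def manifest_file_name(manifest_bytes: bytes) -> str:
    """The manifest is stored under the sha256 of its own content."""
    return hashlib.sha256(manifest_bytes).hexdigest()
```

Layer order is preserved from the manifest, which matters because the layers are stacked in that order when the container root filesystem is assembled.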
+
+And the config file
`ef1ff2c7167a1a6cd01e106f51b84a4d400611ba971c53cbc28de7919515ca4e` (some of the
content is omitted):
+```json
+{
+ "architecture": "amd64",
+ "config": {
+ "Hostname": "",
+ "Domainname": "",
+ "User": "root",
+ "AttachStdin": false,
+ "AttachStdout": false,
+ "AttachStderr": false,
+ "Tty": false,
+ "OpenStdin": false,
+ "StdinOnce": false,
+ "Env": [
+ "X_SCLS=rh-git218",
+ "LD_LIBRARY_PATH=/opt/rh/httpd24/root/usr/lib64",
+
"PATH=/opt/rh/rh-git218/root/usr/bin:/home/y/bin64:/home/y/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/y/share/yjava_jdk/java/bin",
+ "PERL5LIB=/opt/rh/rh-git218/root/usr/share/perl5/vendor_perl",
+ "LANG=en_US.UTF-8",
+ "LANGUAGE=en_US:en",
+ "LC_ALL=en_US.UTF-8",
+ "JAVA_HOME=/home/y/share/yjava_jdk/java"
+ ],
+ "Cmd": [
+ "/bin/bash"
+ ],
+ "Image":
"sha256:6977cd0735c96d14248e834f775373e40230c134b70f10163c05ce6c6c8873ca",
+ "Volumes": null,
+ "WorkingDir": "",
+ "Entrypoint": null,
+ "OnBuild": null,
+ "Labels": {
+ "name": "xxxxx"
+ }
+ },
+ "container":
"344ff1084dea3e0501a0d426e52c43cd589d6b29f33ab0915b7be8906b9aec41",
+ "container_config": {
+ "Hostname": "344ff1084dea",
+ "Domainname": "",
+ "User": "root",
+ "AttachStdin": false,
+ "AttachStdout": false,
+ "AttachStderr": false,
+ "Tty": false,
+ "OpenStdin": false,
+ "StdinOnce": false,
+ "Env": [
+ "X_SCLS=rh-git218",
+ "LD_LIBRARY_PATH=/opt/rh/httpd24/root/usr/lib64",
+
"PATH=/opt/rh/rh-git218/root/usr/bin:/home/y/bin64:/home/y/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/y/share/yjava_jdk/java/bin",
+ "PERL5LIB=/opt/rh/rh-git218/root/usr/share/perl5/vendor_perl",
+ "LANG=en_US.UTF-8",
+ "LANGUAGE=en_US:en",
+ "LC_ALL=en_US.UTF-8",
+ "JAVA_HOME=/home/y/share/yjava_jdk/java"
+ ],
+ "Cmd": [
+ "/bin/sh",
+ "-c"
+ ],
+ "Image":
"sha256:6977cd0735c96d14248e834f775373e40230c134b70f10163c05ce6c6c8873ca",
+ "Volumes": null,
+ "WorkingDir": "",
+ "Entrypoint": null,
+ "OnBuild": null,
+ "Labels": {
+ "name": "xxxxx"
+ }
+ },
+ "created": "2020-12-02T23:25:47.354704574Z",
+ "docker_version": "19.03.8",
+ "history": [
+ {
+ "created": "2020-02-18T21:43:36.934503462Z",
+ "created_by": "/bin/sh"
+ },
+ {
+ "created": "2020-02-18T21:45:05.729764427Z",
+ "created_by": "/bin/sh"
+ },
+ {
+ "created": "2020-02-18T21:46:36.638896031Z",
+ "created_by": "/bin/sh"
+ },
+ {
+ "created": "2020-12-02T23:21:54.595662813Z",
+ "created_by": "/bin/sh -c #(nop) USER root",
+ "empty_layer": true
+ },
+ {
+ "created": "2020-12-02T23:25:45.822235539Z",
+ "created_by": "/bin/sh -c /opt/python/bin/pip3.6 install --no-cache-dir numpy scipy pandas requests setuptools scikit-learn matplotlib"
+ },
+ {
+ "created": "2020-12-02T23:25:46.708884538Z",
+ "created_by": "/bin/sh -c #(nop) ENV JAVA_HOME=/home/y/share/yjava_jdk/java",
+ "empty_layer": true
+ },
+ {
+ "created": "2020-12-02T23:25:46.770226108Z",
+ "created_by": "/bin/sh -c #(nop) ENV PATH=/opt/rh/rh-git218/root/usr/bin:/home/y/bin64:/home/y/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/y/share/yjava_jdk/java/bin",
+ "empty_layer": true
+ },
+ {
+ "created": "2020-12-02T23:25:46.837263533Z",
+ "created_by": "/bin/sh -c #(nop) COPY file:33283617fbd796b25e53eaf4d26012eea1f610ff9acc0706f11281e86be440dc in /etc/krb5.conf "
+ },
+ {
+ "created": "2020-12-02T23:25:47.237515768Z",
+ "created_by": "/bin/sh -c echo '7.7.4' \u003e /etc/hadoop-dockerfile-version"
+ }
+ ],
+ "os": "linux",
+ "rootfs": {
+ "type": "layers",
+ "diff_ids": [
+
"sha256:9f627fdb0292afbe5e2eb96edc1b3a5d3a8f468e3acf1d29f1509509285c7341",
+
"sha256:83d2667f9458eaf719588a96bb63f2520bd377d29d52f6dbd4ff13c819c08037",
+
"sha256:fcba5f49eef4f3d77d3e73e499a1a4e1914b3f20d903625d27c0aa3ab82f41a3",
+
"sha256:3bd4567d0726f5d6560b548bc0c0400e868f6a27067887a36edd7e8ceafff96c",
+
"sha256:ad56900a1f10e6ef96f17c7e8019384540ab1b34ccce6bda06675473b08d787e",
+
"sha256:ac0a645609f957ab9c4a8a62f8646e99f09a74ada54ed2eaca204c6e183c9ae8",
+ "sha256:9bf10102fc145156f4081c2cacdbadab5816dce4f88eb02881ab739239d316e6"
+ ]
+ }
+}
+```
+
+Note: To use the `docker-to-squash.py`, you need to install
[skopeo](https://github.com/containers/skopeo),
[jq](https://stedolan.github.io/jq/) and squashfs-tools.
+
+
+## Configurations
+
+Then you need to set up storm with the following configs:
+
+| Setting | Description |
+|---------|-------------|
+| `storm.resource.isolation.plugin.enable` | Set to `true` to enable the isolation plugin. `storm.resource.isolation.plugin` determines which plugin to use. If this is set to `false`, `org.apache.storm.container.DefaultResourceIsolationManager` will be used. |
+| `storm.resource.isolation.plugin` | Set to `"org.apache.storm.container.oci.RuncLibContainerManager"` to enable OCI/Squashfs runtime support. |
+| `storm.oci.allowed.images` | A whitelist of docker images that can be used. Users can only choose a docker image from the list. |
+| `storm.oci.image` | The default docker image to be used if the user doesn't specify which image to use. It must belong to `storm.oci.allowed.images`. |
+| `topology.oci.image` | Topologies can specify which image to use. It must belong to `storm.oci.allowed.images`. |
+| `storm.oci.cgroup.root` | The root path of cgroup for docker to use. On RHEL7, it should be "/sys/fs/cgroup". |
+| `storm.oci.cgroup.parent` | `--cgroup-parent` config for the docker command. It must follow the constraints of docker commands. A relative path is converted to an absolute path, because we saw some weird bugs (the cgroup memory directory disappears after a while) when a relative path is used. |
+| `storm.oci.readonly.bindmounts` | A list of read-only bind mounted directories. |
+| `storm.oci.readwrite.bindmounts` | A list of read-write bind mounted directories. |
+| `storm.oci.nscd.dir` | The directory of nscd (name service cache daemon), e.g. "/var/run/nscd/". nscd must be running so that profiling can work properly. |
+| `storm.oci.seccomp.profile` | The JSON file of whitelisted syscalls to be used as a seccomp filter. |
+| `supervisor.worker.launcher` | Full path to the worker-launcher executable. |
+| `storm.oci.image.hdfs.toplevel.dir` | The HDFS location under which the oci image manifests, layers and configs directories exist. |
+| `storm.oci.image.tag.to.manifest.plugin` | The plugin to be used to get the image-tag to manifest mappings. |
+| `storm.oci.localorhdfs.image.tag.to.manifest.plugin.hdfs.hash.file` | The HDFS location of the image-tag to manifest mapping file. You need to set it if `org.apache.storm.container.oci.LocalOrHdfsImageTagToManifestPlugin` is used as `storm.oci.image.tag.to.manifest.plugin`. |
+| `storm.oci.manifest.to.resources.plugin` | The plugin to be used to get oci resource according to the manifest. |
Review comment:
`storm.oci.manifest.to.resources.plugin` and `storm.oci.resources.localizer` have no default values. We only have HDFS-related plugins as of now, and we don't include `storm-hdfs-oci` in the package distribution by default.