[ https://issues.apache.org/jira/browse/DRILL-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065973#comment-17065973 ]
Dobes Vandermeer commented on DRILL-7563:
-----------------------------------------

I have been able to get Drill to run in Kubernetes using the existing Drill Docker images that are currently published. See here for example configs:

[https://gist.github.com/dobesv/98d85b18ee8566891c5122e2b990f0c5]
[https://gist.github.com/dobesv/be5aa3e6e5830e54c0e77b73884333cc]

> Docker & Kubernetes Drill server container
> ------------------------------------------
>
> Key: DRILL-7563
> URL: https://issues.apache.org/jira/browse/DRILL-7563
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
>
> Drill provides two Docker containers:
> * [Build Drill from sources|https://github.com/apache/drill/blob/master/Dockerfile]
> * [Run Drill in interactive embedded mode|https://github.com/apache/drill/blob/master/distribution/Dockerfile]
> User feedback suggests that these are not quite the right solutions to run
> Drill in a K8s (or OpenShift) cluster. In addition, we need a container to
> run a Drill server. This ticket summarizes the tasks involved.
> h4. Container Image
> The container image should:
> * Start with the OpenJDK base image with minimal extra packages.
> * Download and install an official Drill release.
> We may then want to provide two derived images:
> The Drillbit image, which:
> * Configures Drill for production and as needed in the following steps.
> * Provides entry points for the Drillbit and for Sqlline.
> * Exposes Drill's four ports.
> * Accepts as parameters things like the ZK host IP(s).
> The Sqlline image, meant to be run in interactive mode (like the current
> embedded image) and which:
> * Accepts as parameters the ZK host IP(s).
> Both should be published to the official Drill DockerHub account:
> https://hub.docker.com/r/apache/drill
> h4. Runtime Environment
> Drill has very few dependencies, but it must have a running ZK.
> * Start a [ZK container|https://hub.docker.com/_/zookeeper/].
> * A place to store logs, which can be in the container by default or stored
> on the host file system via a volume mount.
> * Access to a data source, which can be configured via a storage plugin
> stored in ZK.
> * Ensure graceful shutdown integration with the Docker shutdown mechanism.
> h4. Running Drill in Docker
> Users must run at least one Drillbit, and may run more. Users may want to run
> Sqlline.
> * The Drillbit container requires, at a minimum, the IP address of the ZK
> instance(s).
> * The Sqlline container requires only the ZK instances, from which it can
> find the Drillbit.
> Users will want to customize some parts of Drill: at least memory, perhaps any
> of the other options. Provide a way to pass this information into the
> container to avoid the need to rebuild the container to change configuration.
> h4. Running Drill in K8s
> The containers should be usable in "plain" Docker. Today, however, many
> people use K8s to orchestrate Docker. Thus, the Drillbit (but probably not
> the Sqlline) container should be designed to work with K8s. An example set of
> K8s YAML files should illustrate how to:
> * Create a host-mount file system to capture Drill logs and query profiles.
> * Optionally write Drill logs to stdout, to be captured by {{fluentd}} or
> similar tools.
> * Pass Drill configuration (both HOCON and environment) as config maps.
> * Pass ZK as an environment variable (the value of which would, one presumes,
> come from some kind of service discovery system).
> The result is that the user should be able to manually tinker with the YAML
> files, then use {{kubectl}} to launch, monitor and stop Drill. The user sets
> cluster size manually by launching the desired number of Drill pods.
> h4. Helm Chart for Drill
> The next step is to wrap the YAML files in a Helm chart, with parameters
> exposed for the config options noted above.
> h4. Drill Operator for K8s
>
> Full K8s integration will require an operator to manage the Drill cluster.
> K8s operators are often written in Go, though doing so is not necessary.
> Drill already includes Drill-on-YARN (DoY), which is, essentially, a "YARN
> operator." Repurpose this code to work with K8s as the target cluster manager
> rather than YARN. Reuse the same operations from DoY: configure, start, resize
> and stop a cluster.
> h4. Security
> The above steps provide an "MVP": minimum viable product - it will run Drill
> with standard options in the various environments. Users who choose to run
> Drill in production will likely require additional security settings. Enable
> SSL? Control ingress? We need to understand what is needed, what Drill
> offers, and how to enable Drill's security features in a containerized
> environment.
> h4. Production Deployments
> With Docker and K8s the old maxim "the devil is in the details" is true in
> spades. There are dozens of ways that Drill can be configured and integrated
> in K8s: "stock K8s", OpenShift, AWS EKS, Google GCP and so on. Only
> experience with users in the field will tell us what is needed. At the same
> time, we must avoid making the solution too complex: "instant ignition" must
> be the key, along with additional, optional, config options needed to achieve
> production deployment goals. This is an area where community contribution
> would be extremely valuable.
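
To make the "pass ZK as an environment variable" requirement concrete, here is a minimal sketch of what a container entry point might do at startup: read the ZK hosts from the environment and render a {{drill-override.conf}} instead of baking one into the image. The variable names {{DRILL_ZK_HOSTS}} and {{DRILL_CLUSTER_ID}} and all three helper functions are hypothetical and are not defined by Drill or its published images. Only the {{drill.exec}} keys {{cluster-id}} and {{zk.connect}} come from Drill's existing configuration file. The sketch assumes host names or IPv4 addresses, not IPv6 literals.

```python
import os


def zk_connect_string(hosts, port=2181):
    """Join ZK host names or IPs into a 'host:port,host:port' string."""
    parts = []
    for host in hosts:
        host = host.strip()
        if not host:
            continue
        # Keep an explicit port if the caller already supplied one.
        parts.append(host if ":" in host else f"{host}:{port}")
    if not parts:
        raise ValueError("at least one ZooKeeper host is required")
    return ",".join(parts)


def render_drill_override(cluster_id, zk_connect):
    """Render a minimal drill-override.conf (HOCON) for a Drillbit."""
    return (
        "drill.exec: {\n"
        f'  cluster-id: "{cluster_id}",\n'
        f'  zk.connect: "{zk_connect}"\n'
        "}\n"
    )


def config_from_env(env=None):
    """Build the config text from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    hosts = env.get("DRILL_ZK_HOSTS", "").split(",")
    cluster_id = env.get("DRILL_CLUSTER_ID", "drillbits1")
    return render_drill_override(cluster_id, zk_connect_string(hosts))
```

In K8s the two variables would typically be filled in from a config map or a service name, so changing the ZK location would not require rebuilding the image.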