+1
> On Apr 7, 2017, at 10:46 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
> Hi Stephen,
>
> I think we should go with 1 and 4:
>
> 1. Try to use existing images that provide what we need. If we don't find
> an existing image, we can always ask another community to provide one and
> help them do so.
> 4. If we don't find a suitable image, then while waiting for one we can
> store the image in our own "IT dockerhub".
>
> Regards
> JB
>
>> On 04/08/2017 01:03 AM, Stephen Sisk wrote:
>> Wanted to see if anyone else had opinions on this/provide a quick update.
>>
>> I think for both elasticsearch and HIFIO that we can find existing,
>> supported images that could serve those purposes - HIFIO is looking like
>> it'll be able to do so for cassandra, which was proving tricky.
>>
>> So to summarize my current proposed solutions: (ordered by my preference)
>> 1. (new) Strongly urge people to find existing docker images that meet our
>> image criteria - regularly updated/security checked
>> 2. Start using helm
>> 3. Push our docker images to docker hub
>> 4. Host our own public container registry
>>
>> S
>>
>>> On Tue, Apr 4, 2017 at 10:16 AM Stephen Sisk <s...@google.com> wrote:
>>>
>>> I'd like to hear what direction folks want to go in, and from there look
>>> at the options. I think for some of these options (like running our own
>>> public registry), they may be able to help, and it's something we should
>>> look at, but I don't assume they have time to work on this type of issue.
>>>
>>> S
>>>
>>> On Tue, Apr 4, 2017 at 10:00 AM Lukasz Cwik <lc...@google.com.invalid>
>>> wrote:
>>>
>>> Is this something that Apache infra could help us with?
>>>
>>> On Mon, Apr 3, 2017 at 7:22 PM, Stephen Sisk <s...@google.com.invalid>
>>> wrote:
>>>
>>>> Summary:
>>>>
>>>> For IO ITs that use data stores that need custom docker images in order to
>>>> run, we can't currently use them in a kubernetes cluster (which is where we
>>>> host our data stores.)
>>>> I have a couple options for how to solve this and am
>>>> looking for feedback from folks involved in creating IO ITs/opinions on
>>>> kubernetes.
>>>>
>>>>
>>>> Details:
>>>>
>>>> We've discussed in the past that we'll want to allow developers to submit
>>>> just a dockerfile, and then we'll use that when creating the data store on
>>>> kubernetes. This is the case for ElasticsearchIO and I assume more data
>>>> stores in the future will want to do this. It's also looking like it'll be
>>>> necessary to use custom docker images for the HadoopInputFormatIO's
>>>> cassandra ITs - to run a cassandra cluster, there doesn't seem to be a good
>>>> image you can use out of the box.
>>>>
>>>> In either case, in order to retrieve a docker image, kubernetes needs a
>>>> container registry - it will read the docker images from there. A simple
>>>> private container registry doesn't work because kubernetes config files are
>>>> static - this means that if local devs try to use the kubernetes files,
>>>> they point at the private container registry and they wouldn't be able to
>>>> retrieve the images since they don't have access. They'd have to manually
>>>> edit the files, which in theory is an option, but I don't consider that to
>>>> be acceptable since it feels pretty unfriendly (it is simple, so if we
>>>> really don't like the below options we can revisit it.)
>>>>
>>>> Quick summary of the options
>>>>
>>>> =======================
>>>>
>>>> We can:
>>>>
>>>> * Start using something like k8 helm - this adds more dependencies, adds a
>>>> small amount of complexity (this is my recommendation, but only by a
>>>> little)
>>>>
>>>> * Start pushing images to docker hub - this means they'll be publicly
>>>> visible and raises the bar for maintenance of those images
>>>>
>>>> * Host our own public container registry - this means running our own
>>>> public service with costs, etc.
>>>>
>>>> Below are detailed discussions of these options.
>>>> You can skip to the "My thoughts on this" section if you're not
>>>> interested in the details.
>>>>
>>>>
>>>> 1. Templated kubernetes images
>>>>
>>>> =========================
>>>>
>>>> Kubernetes (k8) does not currently have built in support for parameterizing
>>>> scripts - there's an issue open for this [1], but it doesn't seem to be
>>>> very active.
>>>>
>>>> There are tools like Kubernetes helm that allow users to specify parameters
>>>> when running their kubernetes scripts. They also enable a lot more (they're
>>>> probably closer to a package manager like apt-get) - see this
>>>> description [3] for an overview.
>>>>
>>>> I'm open to other options besides helm, but it seems to be the officially
>>>> supported one.
>>>>
>>>> How the world would look using helm:
>>>>
>>>> * When developing an IO IT, someone (either the developer or one of us)
>>>> would need to create a chart (the name for the helm script) - it's
>>>> basically another set of config files but in theory is as simple as a
>>>> couple metadata files plus a templatized version of a regular k8 script.
>>>> This should be trivial compared to the task of creating a k8 script.
>>>>
>>>> * When creating an instance of a data store, the developer (or the beam CI
>>>> server) would first build the docker image for the data store and push to
>>>> their container registry, then run a command like
>>>> `helm install -f mydb.yaml --set imageRepo=1.2.3.4`
>>>>
>>>> * When done running tests/developing/etc… the developer/beam CI server
>>>> would run `helm delete -f mydb.yaml`
>>>>
>>>> Upsides:
>>>>
>>>> * Something like helm is pretty interesting - we talked about it as an
>>>> upside and something we wanted to do when we talked about using kubernetes
>>>>
>>>> * We pick up a set of working kubernetes scripts this way.
>>>> The full list is at [2], but some that stood out: mongodb, memcached,
>>>> mysql, postgres, redis, elasticsearch (incubating), kafka (incubating),
>>>> zookeeper (incubating) - this could speed development
>>>>
>>>> Downsides:
>>>>
>>>> * Adds an additional dependency to run our ITs (helm or another k8
>>>> templating tool)
>>>>
>>>> * Requires people to build their own images and run a container registry if
>>>> they don't already have one (it will not surprise you that there's a docker
>>>> image for running the registry [0] - so it's not crazy. :) I *think* this
>>>> will probably just be a simple one/two line command once we have it
>>>> scripted.
>>>>
>>>> * Helm in particular is kind of heavyweight for what we really need - it
>>>> requires running a service in the k8 cluster and adds additional
>>>> complexity.
>>>>
>>>> * Adds to the complexity of creating a new kubernetes script. Until I've
>>>> tried it, I can't really speak to the complexity, but taking a look at the
>>>> instructions [4], it doesn't seem too bad.
>>>>
>>>>
>>>>
>>>>
>>>> 2. Push images to docker hub
>>>>
>>>> =======================
>>>>
>>>> This requires that users push images that we want to use to docker hub, and
>>>> then our IO ITs will rely on that. I think the developer of the dockerfile
>>>> should be responsible for the image - having the beam project responsible
>>>> for a publicly available artifact (like the docker images) outside of our
>>>> core deliverables doesn't seem like the right move.
>>>>
>>>> We would still retain a copy of the source dockerfiles and could regenerate
>>>> the images at any time, so I'm not concerned about a scenario where docker
>>>> hub went away (it would be pretty simple to switch to another repo - just
>>>> change some config files.)
>>>>
>>>> For someone running the k8 scripts (ie, running the IO ITs), this is pretty
>>>> easy - they just run the k8 script like they do today.
>>>>
>>>> For someone creating the k8 scripts (ie, creating the IO ITs), this is more
>>>> complex - either they or we have to push this to docker hub and make sure
>>>> it's up to date, etc.
>>>>
>>>>
>>>> Upsides:
>>>>
>>>> * No additional complexity for IO IT runners.
>>>>
>>>> Downsides:
>>>>
>>>> * Higher bar for creating the image in the first place - someone has to
>>>> maintain the publicly available docker hub image.
>>>>
>>>> * It seems weird to have a custom docker image up on docker hub - maybe
>>>> that's common, but if we need specific changes to images for our needs, I'd
>>>> prefer it be private.
>>>>
>>>>
>>>> 3. Run our own *public* container registry
>>>>
>>>> ==============================================
>>>>
>>>> We would run a beam-specific container registry service - it would be used
>>>> by the apache beam CI servers, but it would also be available for use by
>>>> anyone running beam IO ITs on their local dev setup.
>>>>
>>>> From an IO IT creator's perspective, this would look pretty similar to how
>>>> things are now - they just check in a dockerfile. For someone running the
>>>> k8 scripts, they similarly don't need to think about it.
>>>>
>>>> Upsides:
>>>>
>>>> * We're not adding any additional complexity for the end developer
>>>>
>>>> Downsides:
>>>>
>>>> * Have to keep docker registry software up to date
>>>>
>>>> * The service is a single point of failure for any beam devs running IO ITs
>>>>
>>>> * It can incur costs, etc… As an open source project, it doesn't seem great
>>>> for us to be running a public service.
>>>>
>>>>
>>>>
>>>> My thoughts on this
>>>>
>>>> ===============
>>>>
>>>> In spite of the additional complexity, I think using k8 helm is probably
>>>> the best option. The general goal behind the IO ITs has been to keep
>>>> ourselves self-contained: avoid having centralized infrastructure for those
>>>> running the ITs. Helm is a good match for those criteria.
>>>> I will admit that
>>>> I find the additional dependencies/complexity to be worrisome. However, I
>>>> really like the idea of picking up additional data store configs for free -
>>>> if we were doing this in 5 years, we'd say "we should just use the
>>>> ecosystem of helm charts" and go from there.
>>>>
>>>> I do think that pushing images to docker hub is a viable option, and if the
>>>> community is more excited to do that/wants to push the images there, I'd
>>>> support it. I can see how folks would be hesitant. I would like for the
>>>> developer of the docker file to do
>>>>
>>>> Of the 3 options, I would strongly push back against running a public
>>>> container registry - I would not want to administer it, and I don't think
>>>> we as a project want to be paying for the costs associated with it.
>>>>
>>>> Next steps
>>>>
>>>> =========
>>>>
>>>> Let me know what you think! This is definitely a topic where understanding
>>>> what the community of IO devs wants is helpful. As we discuss, I'll
>>>> probably spend a little time exploring helm since I want to play around
>>>> with it and understand if there are other drawbacks. I ran into this
>>>> question while working on getting the HIFIO cassandra cluster running, so I
>>>> might prototype with that.
>>>>
>>>> I'll create a JIRA for this in the next day or so.
>>>>
>>>> Stephen
>>>>
>>>>
>>>>
>>>> [0] docker registry container - https://hub.docker.com/_/registry/
>>>>
>>>> [1] kubernetes issue open for supporting templates -
>>>> https://github.com/kubernetes/kubernetes/issues/23896
>>>>
>>>> [2] set of available charts - https://github.com/kubernetes/charts
>>>>
>>>> [3] kubernetes helm introduction -
>>>> https://deis.com/blog/2015/introducing-helm-for-kubernetes/
>>>>
>>>> [4] kubernetes charts instructions -
>>>> https://github.com/kubernetes/helm/blob/master/docs/charts.md
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
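
For readers following the thread, the core problem in option 1 is that the k8 config files are static, so the registry address cannot differ between the CI server and a local developer. The sketch below is not helm and not anything from the Beam repository; it only illustrates, using Python's standard `string.Template`, what substituting a parameter such as `imageRepo` into a manifest amounts to. The manifest contents, the `mydb` name, the `tag` parameter, and the `render_manifest` helper are all hypothetical and chosen for illustration.

```python
from string import Template

# Hypothetical manifest template. The pod name "mydb" and the placeholders
# ${imageRepo} and ${tag} are illustrative assumptions, not Beam's real files.
MANIFEST_TEMPLATE = Template("""\
apiVersion: v1
kind: Pod
metadata:
  name: mydb
spec:
  containers:
  - name: mydb
    image: ${imageRepo}/mydb:${tag}
""")


def render_manifest(image_repo: str, tag: str = "latest") -> str:
    """Return the manifest text with the registry address and tag filled in.

    substitute() raises KeyError if a placeholder is left without a value,
    so an incomplete configuration fails early instead of producing a
    manifest that points nowhere.
    """
    return MANIFEST_TEMPLATE.substitute(imageRepo=image_repo, tag=tag)


if __name__ == "__main__":
    # Each developer (or the CI server) passes the address of their own registry.
    print(render_manifest("1.2.3.4"))
```

With the registry address from the thread's example, the rendered manifest contains the line `image: 1.2.3.4/mydb:latest`. A tool like helm does the equivalent through chart templates and `--set` overrides, and adds packaging and release management on top.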