Hi Stephen,

Can we piggyback on the current Apache Docker Hub account? I think images can be hosted there, too.
-E

On Mon, Apr 10, 2017 at 5:22 PM, Stephen Sisk <s...@google.com.invalid> wrote:
> for 4 - there are a number of logistics involved. How do you propose handling cost, potential DOS, etc.? People in different timezones would need to be on call for it, since it impacts people's ability to do dev work (or they need to be okay if it goes down.) Can you give some reasons why you think it's better than the other options? I put it on the list, but I'm strongly not a fan.
>
> S
>
> On Sat, Apr 8, 2017 at 5:31 AM Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > +1
> >
> > > On Apr 7, 2017, at 10:46 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > >
> > > Hi Stephen,
> > >
> > > I think we should go with 1 and 4:
> > >
> > > 1. Try to use existing images providing what we need. If we don't find an existing image, we can always ask and help another community to provide one.
> > > 4. If we don't find a suitable image, then while waiting for one we can store the image in our own "IT dockerhub".
> > >
> > > Regards
> > > JB
> > >
> > >> On 04/08/2017 01:03 AM, Stephen Sisk wrote:
> > >> Wanted to see if anyone else had opinions on this / provide a quick update.
> > >>
> > >> I think for both elasticsearch and HIFIO we can find existing, supported images that could serve those purposes - HIFIO is looking like it'll be able to do so for cassandra, which was proving tricky.
> > >>
> > >> So to summarize my current proposed solutions (ordered by my preference):
> > >> 1. (new) Strongly urge people to find existing docker images that meet our image criteria - regularly updated/security checked
> > >> 2. Start using helm
> > >> 3. Push our docker images to docker hub
> > >> 4. Host our own public container registry
> > >>
> > >> S
> > >>
> > >>> On Tue, Apr 4, 2017 at 10:16 AM Stephen Sisk <s...@google.com> wrote:
> > >>>
> > >>> I'd like to hear what direction folks want to go in, and from there look at the options.
> > >>> I think for some of these options (like running our own public registry), they may be able to, and it's something we should look at, but I don't assume they have time to work on this type of issue.
> > >>>
> > >>> S
> > >>>
> > >>> On Tue, Apr 4, 2017 at 10:00 AM Lukasz Cwik <lc...@google.com.invalid> wrote:
> > >>>
> > >>> Is this something that Apache infra could help us with?
> > >>>
> > >>> On Mon, Apr 3, 2017 at 7:22 PM, Stephen Sisk <s...@google.com.invalid> wrote:
> > >>>
> > >>>> Summary:
> > >>>>
> > >>>> For IO ITs that use data stores needing custom docker images in order to run, we can't currently use those images in a kubernetes cluster (which is where we host our data stores.) I have a couple of options for how to solve this and am looking for feedback from folks involved in creating IO ITs / opinions on kubernetes.
> > >>>>
> > >>>> Details:
> > >>>>
> > >>>> We've discussed in the past that we'll want to allow developers to submit just a dockerfile, and then we'll use that when creating the data store on kubernetes. This is the case for ElasticsearchIO, and I assume more data stores in the future will want to do this. It's also looking like it'll be necessary to use custom docker images for the HadoopInputFormatIO's cassandra ITs - to run a cassandra cluster, there doesn't seem to be a good image you can use out of the box.
> > >>>>
> > >>>> In either case, in order to retrieve a docker image, kubernetes needs a container registry - it will read the docker images from there.
A > > simple > > >>>> private container registry doesn't work because kubernetes config > > files > > >>> are > > >>>> static - this means that if local devs try to use the kubernetes > > files, > > >>>> they point at the private container registry and they wouldn't be > > able to > > >>>> retrieve the images since they don't have access. They'd have to > > manually > > >>>> edit the files, which in theory is an option, but I don't consider > > that > > >>> to > > >>>> be acceptable since it feels pretty unfriendly (it is simple, so if > we > > >>>> really don't like the below options we can revisit it.) > > >>>> > > >>>> Quick summary of the options > > >>>> > > >>>> ======================= > > >>>> > > >>>> We can: > > >>>> > > >>>> * Start using something like k8 helm - this adds more dependencies, > > adds > > >>> a > > >>>> small amount of complexity (this is my recommendation, but only by a > > >>>> little) > > >>>> > > >>>> * Start pushing images to docker hub - this means they'll be > publicly > > >>>> visible and raises the bar for maintenance of those images > > >>>> > > >>>> * Host our own public container registry - this means running our > own > > >>>> public service with costs, etc.. > > >>>> > > >>>> Below are detailed discussions of these options. You can skip to the > > "My > > >>>> thoughts on this" section if you're not interested in the details. > > >>>> > > >>>> > > >>>> 1. Templated kubernetes images > > >>>> > > >>>> ========================= > > >>>> > > >>>> Kubernetes (k8) does not currently have built in support for > > >>> parameterizing > > >>>> scripts - there's an issues open for this[1], but it doesn't seem to > > be > > >>>> very active. > > >>>> > > >>>> There are tools like Kubernetes helm that allow users to specify > > >>> parameters > > >>>> when running their kubernetes scripts. 
> > >>>> They also enable a lot more (they're probably closer to a package manager like apt-get) - see this description[3] for an overview.
> > >>>>
> > >>>> I'm open to other options besides helm, but it seems to be the officially supported one.
> > >>>>
> > >>>> How the world would look using helm:
> > >>>>
> > >>>> * When developing an IO IT, someone (either the developer or one of us) would need to create a chart (the name for a helm script) - it's basically another set of config files, but in theory is as simple as a couple of metadata files plus a templatized version of a regular k8 script. This should be trivial compared to the task of creating a k8 script.
> > >>>>
> > >>>> * When creating an instance of a data store, the developer (or the beam CI server) would first build the docker image for the data store and push it to their container registry, then run a command like `helm install -f mydb.yaml --set imageRepo=1.2.3.4`
> > >>>>
> > >>>> * When done running tests/developing/etc…, the developer/beam CI server would run `helm delete -f mydb.yaml`
> > >>>>
> > >>>> Upsides:
> > >>>>
> > >>>> * Something like helm is pretty interesting - we talked about it as an upside and something we wanted to do when we discussed using kubernetes
> > >>>>
> > >>>> * We pick up a set of working kubernetes scripts this way.
> > >>>> The full list is at [2], but some that stood out: mongodb, memcached, mysql, postgres, redis, elasticsearch (incubating), kafka (incubating), zookeeper (incubating) - this could speed development
> > >>>>
> > >>>> Downsides:
> > >>>>
> > >>>> * Adds an additional dependency to run our ITs (helm or another k8 templating tool)
> > >>>>
> > >>>> * Requires people to build their own images and run a container registry if they don't already have one (it will not surprise you that there's a docker image for running the registry [0] - so it's not crazy. :) I *think* this will probably just be a simple one/two line command once we have it scripted.
> > >>>>
> > >>>> * Helm in particular is kind of heavyweight for what we really need - it requires running a service in the k8 cluster and adds additional complexity.
> > >>>>
> > >>>> * Adds to the complexity of creating a new kubernetes script. Until I've tried it, I can't really speak to the complexity, but taking a look at the instructions [4], it doesn't seem too bad.
> > >>>>
> > >>>> 2. Push images to docker hub
> > >>>> =======================
> > >>>>
> > >>>> This requires that users push the images we want to use to docker hub, and then our IO ITs will rely on that. I think the developer of the dockerfile should be responsible for the image - having the beam project responsible for a publicly available artifact (like the docker images) outside of our core deliverables doesn't seem like the right move.
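For the docker hub option, the ongoing maintenance work would look roughly like this (a sketch; the account and image names are hypothetical, and it assumes push rights to a docker hub repository):

```shell
# Build the image from the checked-in dockerfile, then publish it to
# docker hub. Whoever owns the image repeats this for every security
# fix or version bump - this is the maintenance bar discussed below.
docker build -t someaccount/beam-mydb:1.0 .
docker push someaccount/beam-mydb:1.0
```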
> > >>>> We would still retain a copy of the source dockerfiles and could regenerate the images at any time, so I'm not concerned about a scenario where docker hub went away (it would be pretty simple to switch to another repo - just change some config files.)
> > >>>>
> > >>>> For someone running the k8 scripts (ie, running the IO ITs), this is pretty easy - they just run the k8 script like they do today.
> > >>>>
> > >>>> For someone creating the k8 scripts (ie, creating the IO ITs), this is more complex - either they or we have to push the image to docker hub and make sure it's up to date, etc.
> > >>>>
> > >>>> Upsides:
> > >>>>
> > >>>> * No additional complexity for IO IT runners.
> > >>>>
> > >>>> Downsides:
> > >>>>
> > >>>> * Higher bar for creating the image in the first place - someone has to maintain the publicly available docker hub image.
> > >>>>
> > >>>> * It seems weird to have a custom docker image up on docker hub - maybe that's common, but if we need specific changes to images for our needs, I'd prefer that they stay private.
> > >>>>
> > >>>> 3. Run our own *public* container registry
> > >>>> ==============================================
> > >>>>
> > >>>> We would run a beam-specific container registry service - it would be used by the apache beam CI servers, but it would also be available for use by anyone running beam IO ITs on their local dev setup.
> > >>>>
> > >>>> From an IO IT creator's perspective, this would look pretty similar to how things are now - they just check in a dockerfile. For someone running the k8 scripts, they similarly don't need to think about it.
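For reference, the registry image mentioned in [0] makes standing up the registry process itself a one-liner (per that image's documentation; a genuinely public deployment would additionally need TLS, auth, storage, and monitoring - which is where the administration burden and costs below come from):

```shell
# Run the open-source docker registry, listening on its default port 5000.
docker run -d -p 5000:5000 --restart=always --name registry registry:2
```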
> > >>>> Upsides:
> > >>>>
> > >>>> * We're not adding any additional complexity for the end developer
> > >>>>
> > >>>> Downsides:
> > >>>>
> > >>>> * Have to keep the docker registry software up to date
> > >>>>
> > >>>> * The service is a single point of failure for any beam devs running IO ITs
> > >>>>
> > >>>> * It can incur costs, etc… As an open source project, it doesn't seem great for us to be running a public service.
> > >>>>
> > >>>> My thoughts on this
> > >>>> ===============
> > >>>>
> > >>>> In spite of the additional complexity, I think using k8 helm is probably the best option. The general goal behind the IO ITs has been to keep ourselves self-contained: avoid having centralized infrastructure for those running the ITs. Helm is a good match for those criteria. I will admit that I find the additional dependencies/complexity worrisome. However, I really like the idea of picking up additional data store configs for free - if we were doing this in 5 years, we'd say "we should just use the ecosystem of helm charts" and go from there.
> > >>>>
> > >>>> I do think that pushing images to docker hub is a viable option, and if the community is more excited to do that/wants to push the images there, I'd support it. I can see how folks would be hesitant. I would like for the developer of the docker file to do that.
> > >>>>
> > >>>> Of the 3 options, I would strongly push back against running a public container registry - I would not want to administer it, and I don't think we as a project want to be paying for the costs associated with it.
> > >>>>
> > >>>> Next steps
> > >>>> =========
> > >>>>
> > >>>> Let me know what you think!
> > >>>> This is definitely a topic where understanding what the community of IO devs wants is helpful. As we discuss, I'll probably spend a little time exploring helm, since I want to play around with it and understand whether there are other drawbacks. I ran into this question while working on getting the HIFIO cassandra cluster running, so I might prototype with that.
> > >>>>
> > >>>> I'll create a JIRA for this in the next day or so.
> > >>>>
> > >>>> Stephen
> > >>>>
> > >>>> [0] docker registry container - https://hub.docker.com/_/registry/
> > >>>> [1] kubernetes issue open for supporting templates - https://github.com/kubernetes/kubernetes/issues/23896
> > >>>> [2] set of available charts - https://github.com/kubernetes/charts
> > >>>> [3] kubernetes helm introduction - https://deis.com/blog/2015/introducing-helm-for-kubernetes/
> > >>>> [4] kubernetes charts instructions - https://github.com/kubernetes/helm/blob/master/docs/charts.md
> > >
> > > --
> > > Jean-Baptiste Onofré
> > > jbono...@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com