Hi Stephen,

Can we piggyback on the current Apache Docker Hub account? I think images can be hosted there, too.
-E

On Mon, Apr 10, 2017 at 5:22 PM, Stephen Sisk <s...@google.com.invalid> wrote:
> for 4 - there are a number of logistics involved. How do you propose handling cost, potential DOS, etc.? People in different timezones would need to be on call for it, since it impacts people's ability to do dev work (or they need to be okay if it goes down.) Can you give some reasons why you think it's better than the other options? I put it on the list, but I'm strongly not a fan.
>
> S
>
> On Sat, Apr 8, 2017 at 5:31 AM Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > +1
> >
> > > On Apr 7, 2017, at 10:46 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > >
> > > Hi Stephen,
> > >
> > > I think we should go with 1 and 4:
> > >
> > > 1. Try to use existing images providing what we need. If we don't find an existing image, we can always ask and help another community to provide one.
> > > 4. If we don't find a suitable image, then while waiting for one we can store the image in our own "IT dockerhub".
> > >
> > > Regards
> > > JB
> > >
> > >> On 04/08/2017 01:03 AM, Stephen Sisk wrote:
> > >> Wanted to see if anyone else had opinions on this / provide a quick update.
> > >>
> > >> I think for both elasticsearch and HIFIO we can find existing, supported images that could serve those purposes - HIFIO is looking like it'll be able to do so for cassandra, which was proving tricky.
> > >>
> > >> So to summarize my current proposed solutions (ordered by my preference):
> > >> 1. (new) Strongly urge people to find existing docker images that meet our image criteria - regularly updated/security checked
> > >> 2. Start using helm
> > >> 3. Push our docker images to docker hub
> > >> 4. Host our own public container registry
> > >>
> > >> S
> > >>
> > >>> On Tue, Apr 4, 2017 at 10:16 AM Stephen Sisk <s...@google.com> wrote:
> > >>>
> > >>> I'd like to hear what direction folks want to go in, and from there look at the options.
> > >>> I think for some of these options (like running our own public registry), they may be able to, and it's something we should look at, but I don't assume they have time to work on this type of issue.
> > >>>
> > >>> S
> > >>>
> > >>> On Tue, Apr 4, 2017 at 10:00 AM Lukasz Cwik <lc...@google.com.invalid> wrote:
> > >>>
> > >>> Is this something that Apache infra could help us with?
> > >>>
> > >>> On Mon, Apr 3, 2017 at 7:22 PM, Stephen Sisk <s...@google.com.invalid> wrote:
> > >>>
> > >>>> Summary:
> > >>>>
> > >>>> For IO ITs that use data stores needing custom docker images in order to run, we can't currently use those images in a kubernetes cluster (which is where we host our data stores.) I have a couple of options for how to solve this and am looking for feedback from folks involved in creating IO ITs / opinions on kubernetes.
> > >>>>
> > >>>> Details:
> > >>>>
> > >>>> We've discussed in the past that we'll want to allow developers to submit just a dockerfile, and then we'll use that when creating the data store on kubernetes. This is the case for ElasticsearchIO, and I assume more data stores in the future will want to do this. It's also looking like it'll be necessary to use custom docker images for the HadoopInputFormatIO's cassandra ITs - to run a cassandra cluster, there doesn't seem to be a good image you can use out of the box.
> > >>>>
> > >>>> In either case, in order to retrieve a docker image, kubernetes needs a container registry - it will read the docker images from there.
A > > simple > > >>>> private container registry doesn't work because kubernetes config > > files > > >>> are > > >>>> static - this means that if local devs try to use the kubernetes > > files, > > >>>> they point at the private container registry and they wouldn't be > > able to > > >>>> retrieve the images since they don't have access. They'd have to > > manually > > >>>> edit the files, which in theory is an option, but I don't consider > > that > > >>> to > > >>>> be acceptable since it feels pretty unfriendly (it is simple, so if > we > > >>>> really don't like the below options we can revisit it.) > > >>>> > > >>>> Quick summary of the options > > >>>> > > >>>> ======================= > > >>>> > > >>>> We can: > > >>>> > > >>>> * Start using something like k8 helm - this adds more dependencies, > > adds > > >>> a > > >>>> small amount of complexity (this is my recommendation, but only by a > > >>>> little) > > >>>> > > >>>> * Start pushing images to docker hub - this means they'll be > publicly > > >>>> visible and raises the bar for maintenance of those images > > >>>> > > >>>> * Host our own public container registry - this means running our > own > > >>>> public service with costs, etc.. > > >>>> > > >>>> Below are detailed discussions of these options. You can skip to the > > "My > > >>>> thoughts on this" section if you're not interested in the details. > > >>>> > > >>>> > > >>>> 1. Templated kubernetes images > > >>>> > > >>>> ========================= > > >>>> > > >>>> Kubernetes (k8) does not currently have built in support for > > >>> parameterizing > > >>>> scripts - there's an issues open for this[1], but it doesn't seem to > > be > > >>>> very active. > > >>>> > > >>>> There are tools like Kubernetes helm that allow users to specify > > >>> parameters > > >>>> when running their kubernetes scripts. 
> > >>>> They also enable a lot more (they're probably closer to a package manager like apt-get) - see this description[3] for an overview.
> > >>>>
> > >>>> I'm open to other options besides helm, but it seems to be the officially supported one.
> > >>>>
> > >>>> How the world would look using helm:
> > >>>>
> > >>>> * When developing an IO IT, someone (either the developer or one of us) would need to create a chart (the name for a helm script) - it's basically another set of config files, but in theory is as simple as a couple of metadata files plus a templatized version of a regular k8 script. This should be trivial compared to the task of creating a k8 script.
> > >>>>
> > >>>> * When creating an instance of a data store, the developer (or the beam CI server) would first build the docker image for the data store and push it to their container registry, then run a command like `helm install -f mydb.yaml --set imageRepo=1.2.3.4`
> > >>>>
> > >>>> * When done running tests/developing/etc…, the developer/beam CI server would run `helm delete -f mydb.yaml`
> > >>>>
> > >>>> Upsides:
> > >>>>
> > >>>> * Something like helm is pretty interesting - we talked about it as an upside and something we wanted to do when we discussed using kubernetes
> > >>>>
> > >>>> * We pick up a set of working kubernetes scripts this way.
> > >>>> The full list is at [2], but some that stood out: mongodb, memcached, mysql, postgres, redis, elasticsearch (incubating), kafka (incubating), zookeeper (incubating) - this could speed development
> > >>>>
> > >>>> Downsides:
> > >>>>
> > >>>> * Adds an additional dependency to run our ITs (helm or another k8 templating tool)
> > >>>>
> > >>>> * Requires people to build their own images and run a container registry if they don't already have one (it will not surprise you that there's a docker image for running the registry [0] - so it's not crazy. :) I *think* this will probably just be a simple one/two line command once we have it scripted.
> > >>>>
> > >>>> * Helm in particular is kind of heavyweight for what we really need - it requires running a service in the k8 cluster and adds additional complexity.
> > >>>>
> > >>>> * Adds to the complexity of creating a new kubernetes script. Until I've tried it, I can't really speak to the complexity, but taking a look at the instructions [4], it doesn't seem too bad.
> > >>>>
> > >>>> 2. Push images to docker hub
> > >>>> =======================
> > >>>>
> > >>>> This requires that users push the images we want to use to docker hub, and then our IO ITs will rely on that. I think the developer of the dockerfile should be responsible for the image - having the beam project responsible for a publicly available artifact (like the docker images) outside of our core deliverables doesn't seem like the right move.
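For the docker hub option, the ongoing maintenance work would look roughly like this (a sketch; the account and image names are hypothetical, and it assumes push rights to a docker hub repository):

```shell
# Build the image from the checked-in dockerfile, then publish it to
# docker hub. Whoever owns the image repeats this for every security
# fix or version bump - this is the maintenance bar discussed below.
docker build -t someaccount/beam-mydb:1.0 .
docker push someaccount/beam-mydb:1.0
```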
> > >>>> We would still retain a copy of the source dockerfiles and could regenerate the images at any time, so I'm not concerned about a scenario where docker hub went away (it would be pretty simple to switch to another repo - just change some config files.)
> > >>>>
> > >>>> For someone running the k8 scripts (ie, running the IO ITs), this is pretty easy - they just run the k8 script like they do today.
> > >>>>
> > >>>> For someone creating the k8 scripts (ie, creating the IO ITs), this is more complex - either they or we have to push the image to docker hub and make sure it's up to date, etc.
> > >>>>
> > >>>> Upsides:
> > >>>>
> > >>>> * No additional complexity for IO IT runners.
> > >>>>
> > >>>> Downsides:
> > >>>>
> > >>>> * Higher bar for creating the image in the first place - someone has to maintain the publicly available docker hub image.
> > >>>>
> > >>>> * It seems weird to have a custom docker image up on docker hub - maybe that's common, but if we need specific changes to images for our needs, I'd prefer that they stay private.
> > >>>>
> > >>>> 3. Run our own *public* container registry
> > >>>> ==============================================
> > >>>>
> > >>>> We would run a beam-specific container registry service - it would be used by the apache beam CI servers, but it would also be available for use by anyone running beam IO ITs on their local dev setup.
> > >>>>
> > >>>> From an IO IT creator's perspective, this would look pretty similar to how things are now - they just check in a dockerfile. For someone running the k8 scripts, they similarly don't need to think about it.
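For reference, the registry image mentioned in [0] makes standing up the registry process itself a one-liner (per that image's documentation; a genuinely public deployment would additionally need TLS, auth, storage, and monitoring - which is where the administration burden and costs below come from):

```shell
# Run the open-source docker registry, listening on its default port 5000.
docker run -d -p 5000:5000 --restart=always --name registry registry:2
```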
> > >>>> Upsides:
> > >>>>
> > >>>> * We're not adding any additional complexity for the end developer
> > >>>>
> > >>>> Downsides:
> > >>>>
> > >>>> * Have to keep the docker registry software up to date
> > >>>>
> > >>>> * The service is a single point of failure for any beam devs running IO ITs
> > >>>>
> > >>>> * It can incur costs, etc… As an open source project, it doesn't seem great for us to be running a public service.
> > >>>>
> > >>>> My thoughts on this
> > >>>> ===============
> > >>>>
> > >>>> In spite of the additional complexity, I think using k8 helm is probably the best option. The general goal behind the IO ITs has been to keep ourselves self-contained: avoid having centralized infrastructure for those running the ITs. Helm is a good match for those criteria. I will admit that I find the additional dependencies/complexity worrisome. However, I really like the idea of picking up additional data store configs for free - if we were doing this in 5 years, we'd say "we should just use the ecosystem of helm charts" and go from there.
> > >>>>
> > >>>> I do think that pushing images to docker hub is a viable option, and if the community is more excited to do that/wants to push the images there, I'd support it. I can see how folks would be hesitant. I would like for the developer of the docker file to do that.
> > >>>>
> > >>>> Of the 3 options, I would strongly push back against running a public container registry - I would not want to administer it, and I don't think we as a project want to be paying for the costs associated with it.
> > >>>>
> > >>>> Next steps
> > >>>> =========
> > >>>>
> > >>>> Let me know what you think!
> > >>>> This is definitely a topic where understanding what the community of IO devs wants is helpful. As we discuss, I'll probably spend a little time exploring helm, since I want to play around with it and understand whether there are other drawbacks. I ran into this question while working on getting the HIFIO cassandra cluster running, so I might prototype with that.
> > >>>>
> > >>>> I'll create a JIRA for this in the next day or so.
> > >>>>
> > >>>> Stephen
> > >>>>
> > >>>> [0] docker registry container - https://hub.docker.com/_/registry/
> > >>>> [1] kubernetes issue open for supporting templates - https://github.com/kubernetes/kubernetes/issues/23896
> > >>>> [2] set of available charts - https://github.com/kubernetes/charts
> > >>>> [3] kubernetes helm introduction - https://deis.com/blog/2015/introducing-helm-for-kubernetes/
> > >>>> [4] kubernetes charts instructions - https://github.com/kubernetes/helm/blob/master/docs/charts.md
> > >
> > > --
> > > Jean-Baptiste Onofré
> > > jbono...@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com