Can you expand a bit on why you'd like to do that vs the other options?

One of the downsides of the "we create our own docker images" plan is that
we have to maintain them. How do you want to do that?

To be clear: I'm asking because I'm trying to figure out what the community
wants, and right now I'm just hearing "I want X". It's most helpful for me
to hear "I want X because of reasons A, B, and C".

S

On Mon, Apr 10, 2017 at 9:59 AM Jean-Baptiste Onofré <[email protected]>
wrote:

> Agree it's what I said in a previous email.
>
> Regards
> JB
>
> On Apr 10, 2017, at 18:58, Ekrem Aksoy <[email protected]>
> wrote:
> >Hi Stephen,
> >
> >Can we piggyback on the current Apache Docker Hub account? I think images
> >can be held there, too.
> >
> >-E
> >
> >On Mon, Apr 10, 2017 at 5:22 PM, Stephen Sisk <[email protected]>
> >wrote:
> >
> >> for 4 - there's a number of logistics involved. How do you propose
> >handling
> >> cost, potential DOS, etc? People in different timezones would need to
> >be
> >> oncall for it since it impacts people's ability to do dev work (or they
> >need
> >> to be okay if it goes out.) Can you give some reasons why you think
> >it's
> >> better than the other options? I put it on the list, but I'm strongly
> >not a
> >> fan.
> >>
> >> S
> >>
> >> On Sat, Apr 8, 2017 at 5:31 AM Ted Yu <[email protected]> wrote:
> >>
> >> > +1
> >> >
> >> > > On Apr 7, 2017, at 10:46 PM, Jean-Baptiste Onofré
> ><[email protected]>
> >> > wrote:
> >> > >
> >> > > Hi Stephen,
> >> > >
> >> > > I think we should go to 1 and 4:
> >> > >
> >> > > 1. Try to use existing images that provide what we need. If we don't
> >> > > find an existing image, we can always ask and help another community
> >> > > to provide one.
> >> > > 4. If we don't find a suitable image, then while waiting for one we
> >> > > can store the image in our own "IT dockerhub".
> >> > >
> >> > > Regards
> >> > > JB
> >> > >
> >> > >> On 04/08/2017 01:03 AM, Stephen Sisk wrote:
> >> > >> Wanted to see if anyone else had opinions on this/provide a
> >quick
> >> > update.
> >> > >>
> >> > >> I think for both elasticsearch and HIFIO that we can find
> >existing,
> >> > >> supported images that could serve those purposes - HIFIO is
> >looking
> >> like
> >> > >> it'll be able to do so for cassandra, which was proving tricky.
> >> > >>
> >> > >> So to summarize my current proposed solutions: (ordered by my
> >> > preference)
> >> > >> 1. (new) Strongly urge people to find existing docker images
> >that meet
> >> > our
> >> > >> image criteria - regularly updated/security checked
> >> > >> 2. Start using helm
> >> > >> 3. Push our docker images to docker hub
> >> > >> 4. Host our own public container registry
> >> > >>
> >> > >> S
> >> > >>
> >> > >>> On Tue, Apr 4, 2017 at 10:16 AM Stephen Sisk <[email protected]>
> >> wrote:
> >> > >>>
> >> > >>> I'd like to hear what direction folks want to go in, and from
> >there
> >> > look
> >> > >>> at the options. I think for some of these options (like running
> >our
> >> own
> >> > >>> public registry), they may be able to help, and it's something we
> >should
> >> > look at,
> >> > >>> but I don't assume they have time to work on this type of
> >issue.
> >> > >>>
> >> > >>> S
> >> > >>>
> >> > >>> On Tue, Apr 4, 2017 at 10:00 AM Lukasz Cwik
> ><[email protected]
> >> >
> >> > >>> wrote:
> >> > >>>
> >> > >>> Is this something that Apache infra could help us with?
> >> > >>>
> >> > >>> On Mon, Apr 3, 2017 at 7:22 PM, Stephen Sisk
> ><[email protected]
> >> >
> >> > >>> wrote:
> >> > >>>
> >> > >>>> Summary:
> >> > >>>>
> >> > >>>> For IO ITs that use data stores that need custom docker images
> >in
> >> > order
> >> > >>> to
> >> > >>>> run, we can't currently use them in a kubernetes cluster
> >(which is
> >> > where
> >> > >>> we
> >> > >>>> host our data stores.) I have a couple options for how to
> >solve this
> >> > and
> >> > >>> am
> >> > >>>> looking for feedback from folks involved in creating IO
> >ITs/opinions
> >> > on
> >> > >>>> kubernetes.
> >> > >>>>
> >> > >>>>
> >> > >>>> Details:
> >> > >>>>
> >> > >>>> We've discussed in the past that we'll want to allow
> >developers to
> >> > submit
> >> > >>>> just a dockerfile, and then we'll use that when creating the
> >data
> >> > store
> >> > >>> on
> >> > >>>> kubernetes. This is the case for ElasticsearchIO and I assume
> >more
> >> > data
> >> > >>>> stores in the future will want to do this. It's also looking
> >like
> >> > it'll
> >> > >>> be
> >> > >>>> necessary to use custom docker images for the
> >HadoopInputFormatIO's
> >> > >>>> cassandra ITs - to run a cassandra cluster, there doesn't seem
> >to
> >> be a
> >> > >>> good
> >> > >>>> image you can use out of the box.
> >> > >>>>
> >> > >>>> In either case, in order to retrieve a docker image,
> >kubernetes
> >> needs
> >> > a
> >> > >>>> container registry - it will read the docker images from
> >there. A
> >> > simple
> >> > >>>> private container registry doesn't work because kubernetes
> >config
> >> > files
> >> > >>> are
> >> > >>>> static - this means that if local devs try to use the
> >kubernetes
> >> > files,
> >> > >>>> they point at the private container registry and they wouldn't
> >be
> >> > able to
> >> > >>>> retrieve the images since they don't have access. They'd have
> >to
> >> > manually
> >> > >>>> edit the files, which in theory is an option, but I don't
> >consider
> >> > that
> >> > >>> to
> >> > >>>> be acceptable since it feels pretty unfriendly (it is simple,
> >so if
> >> we
> >> > >>>> really don't like the below options we can revisit it.)
> >> > >>>>
> >> > >>>> Quick summary of the options
> >> > >>>>
> >> > >>>> =======================
> >> > >>>>
> >> > >>>> We can:
> >> > >>>>
> >> > >>>> * Start using something like k8 helm - this adds more
> >dependencies,
> >> > adds
> >> > >>> a
> >> > >>>> small amount of complexity (this is my recommendation, but
> >only by a
> >> > >>>> little)
> >> > >>>>
> >> > >>>> * Start pushing images to docker hub - this means they'll be
> >> publicly
> >> > >>>> visible and raises the bar for maintenance of those images
> >> > >>>>
> >> > >>>> * Host our own public container registry - this means running
> >our
> >> own
> >> > >>>> public service with costs, etc..
> >> > >>>>
> >> > >>>> Below are detailed discussions of these options. You can skip
> >to the
> >> > "My
> >> > >>>> thoughts on this" section if you're not interested in the
> >details.
> >> > >>>>
> >> > >>>>
> >> > >>>> 1. Templated kubernetes images
> >> > >>>>
> >> > >>>> =========================
> >> > >>>>
> >> > >>>> Kubernetes (k8) does not currently have built in support for
> >> > >>> parameterizing
> >> > >>>> scripts - there's an issue open for this [1], but it doesn't
> >seem to
> >> > be
> >> > >>>> very active.
> >> > >>>>
> >> > >>>> There are tools like Kubernetes helm that allow users to
> >specify
> >> > >>> parameters
> >> > >>>> when running their kubernetes scripts. They also enable a lot
> >more
> >> > >>> (they're
> >> > >>>> probably closer to a package manager like apt-get) - see this
> >> > >>>> description[3] for an overview.
> >> > >>>>
> >> > >>>> I'm open to other options besides helm, but it seems to be the
> >> > officially
> >> > >>>> supported one.
> >> > >>>>
> >> > >>>> How the world would look using helm:
> >> > >>>>
> >> > >>>> * When developing an IO IT, someone (either the developer or
> >one of
> >> > us),
> >> > >>>> would need to create a chart (the name for the helm script) -
> >it's
> >> > >>>> basically another set of config files but in theory is as
> >simple as
> >> a
> >> > >>>> couple metadata files plus a templatized version of a regular
> >k8
> >> > script.
> >> > >>>> This should be trivial compared to the task of creating a k8
> >script.
> >> > >>>>
> >> > >>>> *  When creating an instance of a data store, the developer
> >(or the
> >> > beam
> >> > >>> CI
> >> > >>>> server) would first build the docker image for the data store
> >and
> >> > push to
> >> > >>>> their container registry, then run a command like `helm
> >install -f
> >> > >>>> mydb.yaml --set imageRepo=1.2.3.4`
> >> > >>>>
> >> > >>>> * when done running tests/developing/etc…  the developer/beam
> >CI
> >> > server
> >> > >>>> would run `helm delete -f mydb.yaml`
> >> > >>>>
> >> > >>>> Upsides:
> >> > >>>>
> >> > >>>> * Something like helm is pretty interesting - we talked about
> >it as
> >> an
> >> > >>>> upside and something we wanted to do when we talked about
> >using
> >> > >>> kubernetes
> >> > >>>>
> >> > >>>> * We pick up a set of working kubernetes scripts this way. The
> >full
> >> > list
> >> > >>> is
> >> > >>>> at [2], but some ones that stood out: mongodb, memcached,
> >mysql,
> >> > >>> postgres,
> >> > >>>> redis, elasticsearch (incubating), kafka (incubating),
> >zookeeper
> >> > >>>> (incubating) - this could speed development
> >> > >>>>
> >> > >>>> Downsides:
> >> > >>>>
> >> > >>>> * Adds an additional dependency to run our ITs (helm or
> >another k8
> >> > >>>> templating tool)
> >> > >>>>
> >> > >>>> * Requires people to build their own images and run a container
> >registry
> >> > if
> >> > >>>> they don't already have one (it will not surprise you that
> >there's a
> >> > >>> docker
> >> > >>>> image for running the registry [0] - so it's not crazy. :) I
> >*think*
> >> > this
> >> > >>>> will probably just be a simple one/two line command once we
> >have it
> >> > >>>> scripted.
> >> > >>>>
> >> > >>>> * Helm in particular is kind of heavyweight for what we really
> >need
> >> -
> >> > it
> >> > >>>> requires running a service in the k8 cluster and adds
> >additional
> >> > >>>> complexity.
> >> > >>>>
> >> > >>>> * Adds to the complexity of creating a new kubernetes script.
> >Until
> >> > I've
> >> > >>>> tried it, I can't really speak to the complexity, but taking a
> >look
> >> at
> >> > >>> the
> >> > >>>> instructions [4], it doesn't seem too bad.
> >> > >>>>
> >> > >>>>
> >> > >>>>
> >> > >>>>
> >> > >>>> 2. Push images to docker hub
> >> > >>>>
> >> > >>>> =======================
> >> > >>>>
> >> > >>>> This requires that users push images that we want to use to
> >docker
> >> > hub,
> >> > >>> and
> >> > >>>> then our IO ITs will rely on that. I  think the developer of
> >the
> >> > >>> dockerfile
> >> > >>>> should be responsible for the image - having the beam project
> >> > responsible
> >> > >>>> for a publicly available artifact (like the docker images)
> >outside
> >> of
> >> > our
> >> > >>>> core deliverables doesn't seem like the right move.
> >> > >>>>
> >> > >>>> We would still retain a copy of the source dockerfiles and
> >could
> >> > >>> regenerate
> >> > >>>> the images at any time, so I'm not concerned about a scenario
> >where
> >> > >>> docker
> >> > >>>> hub went away (it would be pretty simple to switch to another
> >repo -
> >> > just
> >> > >>>> change some config files.)
> >> > >>>>
> >> > >>>> For someone running the k8 scripts (ie, running the IO ITs),
> >this is
> >> > >>> pretty
> >> > >>>> easy - they just run the k8 script like they do today.
> >> > >>>>
> >> > >>>> For someone creating the k8 scripts (ie, creating the IO ITs),
> >this
> >> is
> >> > >>> more
> >> > >>>> complex - either they or we have to push this to docker hub
> >and make
> >> > sure
> >> > >>>> it's up to date, etc..
> >> > >>>>
> >> > >>>>
> >> > >>>> Upsides:
> >> > >>>>
> >> > >>>> * No additional complexity for IO IT runners.
> >> > >>>>
> >> > >>>> Downsides:
> >> > >>>>
> >> > >>>> * Higher bar for creating the image in the first place -
> >someone has
> >> > to
> >> > >>>> maintain the publicly available docker hub image.
> >> > >>>>
> >> > >>>> * It seems weird to have a custom docker image up on docker
> >hub -
> >> > maybe
> >> > >>>> that's common, but if we need specific changes to images for
> >our
> >> > needs,
> >> > >>> I'd
> >> > >>>> prefer it be private.
> >> > >>>>
> >> > >>>>
> >> > >>>> 3. Run our own *public* container registry
> >> > >>>>
> >> > >>>> ==============================================
> >> > >>>>
> >> > >>>> We would run a beam-specific container registry service - it
> >would
> >> be
> >> > >>> used
> >> > >>>> by the apache beam CI servers, but it would also be available
> >for
> >> use
> >> > by
> >> > >>>> anyone running beam IO ITs on their local dev setup.
> >> > >>>>
> >> > >>>> From an IO IT creator's perspective, this would look pretty
> >similar
> >> to
> >> > how
> >> > >>>> things are now - they just check in a dockerfile. For someone
> >> running
> >> > the
> >> > >>>> k8 scripts, they similarly don't need to think about it.
> >> > >>>>
> >> > >>>> Upsides:
> >> > >>>>
> >> > >>>> * we're not adding any additional complexity for end developer
> >> > >>>>
> >> > >>>> Downsides:
> >> > >>>>
> >> > >>>> * Have to keep docker registry software up to date
> >> > >>>>
> >> > >>>> * The service is a single point of failure for any beam devs running
> >IO
> >> ITs
> >> > >>>>
> >> > >>>> * It can incur costs, etc… As an open source project, it
> >doesn't
> >> seem
> >> > >>> great
> >> > >>>> for us to be running a public service.
> >> > >>>>
> >> > >>>>
> >> > >>>>
> >> > >>>> My thoughts on this
> >> > >>>>
> >> > >>>> ===============
> >> > >>>>
> >> > >>>> In spite of the additional complexity, I think using k8 helm
> >is
> >> > probably
> >> > >>>> the best option. The general goal behind the IO ITs has been
> >to keep
> >> > >>>> ourselves self-contained: avoid having centralized
> >infrastructure
> >> for
> >> > >>> those
> >> > >>>> running the ITs. Helm is a good match for those criteria. I
> >will
> >> admit
> >> > >>> that
> >> > >>>> I find the additional dependencies/complexity to be worrisome.
> >> > However, I
> >> > >>>> really like the idea of picking up additional data store
> >configs for
> >> > >>> free -
> >> > >>>> if we were doing this in 5 years, we'd say "we should just use
> >the
> >> > >>>> ecosystem of helm charts" and go from there.
> >> > >>>>
> >> > >>>> I do think that pushing images to docker hub is a viable
> >option, and
> >> > if
> >> > >>> the
> >> > >>>> community is more excited to do that/wants to push the images
> >there,
> >> > I'd
> >> > >>>> support it. I can see how folks would be hesitant. I would
> >like for
> >> > the
> >> > >>>> developer of the docker file to do
> >> > >>>>
> >> > >>>> Of the 3 options, I would strongly push back against running a
> >> public
> >> > >>>> container registry - I would not want to administer it, and I
> >don't
> >> > think
> >> > >>>> we as a project want to be paying for the costs associated
> >with it.
> >> > >>>>
> >> > >>>> Next steps
> >> > >>>>
> >> > >>>> =========
> >> > >>>>
> >> > >>>> Let me know what you think! This is definitely a topic where
> >> > >>> understanding
> >> > >>>> what the community of IO devs wants is helpful. As we discuss,
> >I'll
> >> > >>>> probably spend a little time exploring helm since I want to
> >play
> >> > around
> >> > >>>> with it and understand if there are other drawbacks. I ran
> >into this
> >> > >>>> question while working on getting the HIFIO cassandra cluster
> >> running,
> >> > >>> so I
> >> > >>>> might prototype with that.
> >> > >>>>
> >> > >>>> I'll create a JIRA for this in the next day or so.
> >> > >>>>
> >> > >>>> Stephen
> >> > >>>>
> >> > >>>>
> >> > >>>>
> >> > >>>> [0] docker registry container -
> >https://hub.docker.com/_/registry/
> >> > >>>>
> >> > >>>> [1] kubernetes issue open for supporting templates -
> >> > >>>> https://github.com/kubernetes/kubernetes/issues/23896
> >> > >>>>
> >> > >>>> [2] set of available charts -
> >https://github.com/kubernetes/charts
> >> > >>>>
> >> > >>>> [3] kubernetes helm introduction -
> >> > >>>> https://deis.com/blog/2015/introducing-helm-for-kubernetes/
> >> > >>>> [4] kubernetes charts instructions -
> >> > >>>> https://github.com/kubernetes/helm/blob/master/docs/charts.md
> >> > >
> >> > > --
> >> > > Jean-Baptiste Onofré
> >> > > [email protected]
> >> > > http://blog.nanthrax.net
> >> > > Talend - http://www.talend.com
> >> >
> >>
>
