Agreed, it's what I said in a previous email.

Regards
JB

On Apr 10, 2017, at 18:58, Ekrem Aksoy <[email protected]> wrote:
>Hi Stephen,
>
>Can we piggyback on the current Apache Docker Hub account? I think images
>can be held there, too.
>
>-E
>
>On Mon, Apr 10, 2017 at 5:22 PM, Stephen Sisk <[email protected]>
>wrote:
>
>> for 4 - there's a number of logistics involved. How do you propose
>handling
>> cost, potential DOS, etc? People in different timezones would need to
>be
>> oncall for it since it impacts people's ability to dev work (or they
>need
>> to be okay if it goes out.) Can you give some reasons why you think
>it's
>> better than the other options? I put it on the list, but I'm strongly
>not a
>> fan.
>>
>> S
>>
>> On Sat, Apr 8, 2017 at 5:31 AM Ted Yu <[email protected]> wrote:
>>
>> > +1
>> >
>> > > On Apr 7, 2017, at 10:46 PM, Jean-Baptiste Onofré
><[email protected]>
>> > wrote:
>> > >
>> > > Hi Stephen,
>> > >
>> > > I think we should go to 1 and 4:
>> > >
>> > > 1. Try to use existing images that provide what we need. If we don't find
>> > > an existing image, we can always ask another community to provide one,
>> > > and help them do so.
>> > > 4. If we don't find a suitable image, then while waiting for one we can
>> > > store the image in our own "IT dockerhub".
>> > >
>> > > Regards
>> > > JB
>> > >
>> > >> On 04/08/2017 01:03 AM, Stephen Sisk wrote:
>> > >> Wanted to see if anyone else had opinions on this/provide a
>quick
>> > update.
>> > >>
>> > >> I think for both elasticsearch and HIFIO that we can find existing,
>> > >> supported images that could serve those purposes - HIFIO is looking like
>> > >> it'll be able to do so for cassandra, which was proving tricky.
>> > >>
>> > >> So to summarize my current proposed solutions: (ordered by my
>> > preference)
>> > >> 1. (new) Strongly urge people to find existing docker images
>that meet
>> > our
>> > >> image criteria - regularly updated/security checked
>> > >> 2. Start using helm
>> > >> 3. Push our docker images to docker hub
>> > >> 4. Host our own public container registry
>> > >>
>> > >> S
>> > >>
>> > >>> On Tue, Apr 4, 2017 at 10:16 AM Stephen Sisk <[email protected]>
>> wrote:
>> > >>>
>> > >>> I'd like to hear what direction folks want to go in, and from there look
>> > >>> at the options. I think for some of these options (like running our own
>> > >>> public registry), they may be able to help, and it's something we should
>> > >>> look at, but I don't assume they have time to work on this type of issue.
>> > >>>
>> > >>> S
>> > >>>
>> > >>> On Tue, Apr 4, 2017 at 10:00 AM Lukasz Cwik
><[email protected]
>> >
>> > >>> wrote:
>> > >>>
>> > >>> Is this something that Apache infra could help us with?
>> > >>>
>> > >>> On Mon, Apr 3, 2017 at 7:22 PM, Stephen Sisk
><[email protected]
>> >
>> > >>> wrote:
>> > >>>
>> > >>>> Summary:
>> > >>>>
>> > >>>> For IO ITs that use data stores that need custom docker images
>in
>> > order
>> > >>> to
>> > >>>> run, we can't currently use them in a kubernetes cluster
>(which is
>> > where
>> > >>> we
>> > >>>> host our data stores.) I have a couple options for how to
>solve this
>> > and
>> > >>> am
>> > >>>> looking for feedback from folks involved in creating IO
>ITs/opinions
>> > on
>> > >>>> kubernetes.
>> > >>>>
>> > >>>>
>> > >>>> Details:
>> > >>>>
>> > >>>> We've discussed in the past that we'll want to allow
>developers to
>> > submit
>> > >>>> just a dockerfile, and then we'll use that when creating the
>data
>> > store
>> > >>> on
>> > >>>> kubernetes. This is the case for ElasticsearchIO and I assume
>more
>> > data
>> > >>>> stores in the future will want to do this. It's also looking
>like
>> > it'll
>> > >>> be
>> > >>>> necessary to use custom docker images for the
>HadoopInputFormatIO's
>> > >>>> cassandra ITs - to run a cassandra cluster, there doesn't seem
>to
>> be a
>> > >>> good
>> > >>>> image you can use out of the box.
>> > >>>>
>> > >>>> In either case, in order to retrieve a docker image,
>kubernetes
>> needs
>> > a
>> > >>>> container registry - it will read the docker images from
>there. A
>> > simple
>> > >>>> private container registry doesn't work because kubernetes
>config
>> > files
>> > >>> are
>> > >>>> static - this means that if local devs try to use the
>kubernetes
>> > files,
>> > >>>> they point at the private container registry and they wouldn't
>be
>> > able to
>> > >>>> retrieve the images since they don't have access. They'd have
>to
>> > manually
>> > >>>> edit the files, which in theory is an option, but I don't
>consider
>> > that
>> > >>> to
>> > >>>> be acceptable since it feels pretty unfriendly (it is simple,
>so if
>> we
>> > >>>> really don't like the below options we can revisit it.)
>> > >>>>
>> > >>>> Quick summary of the options
>> > >>>>
>> > >>>> =======================
>> > >>>>
>> > >>>> We can:
>> > >>>>
>> > >>>> * Start using something like k8 helm - this adds more
>dependencies,
>> > adds
>> > >>> a
>> > >>>> small amount of complexity (this is my recommendation, but
>only by a
>> > >>>> little)
>> > >>>>
>> > >>>> * Start pushing images to docker hub - this means they'll be
>> publicly
>> > >>>> visible and raises the bar for maintenance of those images
>> > >>>>
>> > >>>> * Host our own public container registry - this means running
>our
>> own
>> > >>>> public service with costs, etc..
>> > >>>>
>> > >>>> Below are detailed discussions of these options. You can skip
>to the
>> > "My
>> > >>>> thoughts on this" section if you're not interested in the
>details.
>> > >>>>
>> > >>>>
>> > >>>> 1. Templated kubernetes images
>> > >>>>
>> > >>>> =========================
>> > >>>>
>> > >>>> Kubernetes (k8) does not currently have built in support for
>> > >>>> parameterizing scripts - there's an issue open for this [1], but it
>> > >>>> doesn't seem to be very active.
>> > >>>>
>> > >>>> There are tools like Kubernetes helm that allow users to
>specify
>> > >>> parameters
>> > >>>> when running their kubernetes scripts. They also enable a lot
>more
>> > >>> (they're
>> > >>>> probably closer to a package manager like apt-get) - see this
>> > >>>> description[3] for an overview.
>> > >>>>
>> > >>>> I'm open to other options besides helm, but it seems to be the
>> > officially
>> > >>>> supported one.
>> > >>>>
>> > >>>> How the world would look using helm:
>> > >>>>
>> > >>>> * When developing an IO IT, someone (either the developer or
>one of
>> > us),
>> > >>>> would need to create a chart (the name for the helm script) -
>it's
>> > >>>> basically another set of config files but in theory is as
>simple as
>> a
>> > >>>> couple metadata files plus a templatized version of a regular
>k8
>> > script.
>> > >>>> This should be trivial compared to the task of creating a k8
>script.
>> > >>>>
>> > >>>> *  When creating an instance of a data store, the developer
>(or the
>> > beam
>> > >>> CI
>> > >>>> server) would first build the docker image for the data store
>and
>> > push to
>> > >>>> their container registry, then run a command like `helm
>install -f
>> > >>>> mydb.yaml --set imageRepo=1.2.3.4`
>> > >>>>
>> > >>>> * when done running tests/developing/etc…  the developer/beam
>CI
>> > server
>> > >>>> would run `helm delete -f mydb.yaml`
>> > >>>>
>> > >>>> Upsides:
>> > >>>>
>> > >>>> * Something like helm is pretty interesting - we talked about
>it as
>> an
>> > >>>> upside and something we wanted to do when we talked about
>using
>> > >>> kubernetes
>> > >>>>
>> > >>>> * We pick up a set of working kubernetes scripts this way. The
>full
>> > list
>> > >>> is
>> > >>>> at [2], but some ones that stood out: mongodb, memcached,
>mysql,
>> > >>> postgres,
>> > >>>> redis, elasticsearch (incubating), kafka (incubating),
>zookeeper
>> > >>>> (incubating) - this could speed development
>> > >>>>
>> > >>>> Downsides:
>> > >>>>
>> > >>>> * Adds an additional dependency to run our ITs (helm or
>another k8
>> > >>>> templating tool)
>> > >>>>
>> > >>>> * Requires people to build their own images and run a container registry
>> > >>>> if they don't already have one. It will not surprise you that there's a
>> > >>>> docker image for running the registry [0] - so it's not crazy. :) I
>> > >>>> *think* this will probably just be a simple one/two line command once we
>> > >>>> have it scripted.
>> > >>>>
>> > >>>> * Helm in particular is kind of heavyweight for what we really
>need
>> -
>> > it
>> > >>>> requires running a service in the k8 cluster and adds
>additional
>> > >>>> complexity.
>> > >>>>
>> > >>>> * Adds to the complexity of creating a new kubernetes script.
>Until
>> > I've
>> > >>>> tried it, I can't really speak to the complexity, but taking a
>look
>> at
>> > >>> the
>> > >>>> instructions [4], it doesn't seem too bad.
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> 2. Push images to docker hub
>> > >>>>
>> > >>>> =======================
>> > >>>>
>> > >>>> This requires that users push images that we want to use to
>docker
>> > hub,
>> > >>> and
>> > >>>> then our IO ITs will rely on that. I  think the developer of
>the
>> > >>> dockerfile
>> > >>>> should be responsible for the image - having the beam project
>> > responsible
>> > >>>> for a publicly available artifact (like the docker images)
>outside
>> of
>> > our
>> > >>>> core deliverables doesn't seem like the right move.
>> > >>>>
>> > >>>> We would still retain a copy of the source dockerfiles and
>could
>> > >>> regenerate
>> > >>>> the images at any time, so I'm not concerned about a scenario
>where
>> > >>> docker
>> > >>>> hub went away (it would be pretty simple to switch to another
>repo -
>> > just
>> > >>>> change some config files.)
>> > >>>>
>> > >>>> For someone running the k8 scripts (ie, running the IO ITs),
>this is
>> > >>> pretty
>> > >>>> easy - they just run the k8 script like they do today.
>> > >>>>
>> > >>>> For someone creating the k8 scripts (ie, creating the IO ITs),
>this
>> is
>> > >>> more
>> > >>>> complex - either they or we have to push this to docker hub
>and make
>> > sure
>> > >>>> it's up to date, etc..
>> > >>>>
>> > >>>>
>> > >>>> Upsides:
>> > >>>>
>> > >>>> * No additional complexity for IO IT runners.
>> > >>>>
>> > >>>> Downsides:
>> > >>>>
>> > >>>> * Higher bar for creating the image in the first place -
>someone has
>> > to
>> > >>>> maintain the publicly available docker hub image.
>> > >>>>
>> > >>>> * It seems weird to have a custom docker image up on docker
>hub -
>> > maybe
>> > >>>> that's common, but if we need specific changes to images for
>our
>> > needs,
>> > >>> I'd
>> > >>>> prefer it be private.
>> > >>>>
>> > >>>>
>> > >>>> 3. Run our own *public* container registry
>> > >>>>
>> > >>>> ==============================================
>> > >>>>
>> > >>>> We would run a beam-specific container registry service - it
>would
>> be
>> > >>> used
>> > >>>> by the apache beam CI servers, but it would also be available
>for
>> use
>> > by
>> > >>>> anyone running beam IO ITs on their local dev setup.
>> > >>>>
>> > >>>> From an IO IT creator's perspective, this would look pretty similar to
>> > >>>> how things are now - they just check in a dockerfile. For someone running
>> > >>>> the k8 scripts, they similarly don't need to think about it.
>> > >>>>
>> > >>>> Upsides:
>> > >>>>
>> > >>>> * we're not adding any additional complexity for end developer
>> > >>>>
>> > >>>> Downsides:
>> > >>>>
>> > >>>> * Have to keep docker registry software up to date
>> > >>>>
>> > >>>> * The service is a single point of failure for any beam devs running IO
>> > >>>> ITs
>> > >>>>
>> > >>>> * It can incur costs, etc… As an open source project, it
>doesn't
>> seem
>> > >>> great
>> > >>>> for us to be running a public service.
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> My thoughts on this
>> > >>>>
>> > >>>> ===============
>> > >>>>
>> > >>>> In spite of the additional complexity, I think using k8 helm
>is
>> > probably
>> > >>>> the best option. The general goal behind the IO ITs has been
>to keep
>> > >>>> ourselves self-contained: avoid having centralized
>infrastructure
>> for
>> > >>> those
>> > >>>> running the ITs. Helm is a good match for those criteria. I
>will
>> admit
>> > >>> that
>> > >>>> I find the additional dependencies/complexity to be worrisome.
>> > However, I
>> > >>>> really like the idea of picking up additional data store
>configs for
>> > >>> free -
>> > >>>> if we were doing this in 5 years, we'd say "we should just use
>the
>> > >>>> ecosystem of helm charts" and go from there.
>> > >>>>
>> > >>>> I do think that pushing images to docker hub is a viable
>option, and
>> > if
>> > >>> the
>> > >>>> community is more excited to do that/wants to push the images
>there,
>> > I'd
>> > >>>> support it. I can see how folks would be hesitant. I would
>like for
>> > the
>> > >>>> developer of the docker file to do
>> > >>>>
>> > >>>> Of the 3 options, I would strongly push back against running a
>> public
>> > >>>> container registry - I would not want to administer it, and I
>don't
>> > think
>> > >>>> we as a project want to be paying for the costs associated
>with it.
>> > >>>>
>> > >>>> Next steps
>> > >>>>
>> > >>>> =========
>> > >>>>
>> > >>>> Let me know what you think! This is definitely a topic where
>> > >>> understanding
>> > >>>> what the community of IO devs wants is helpful. As we discuss,
>I'll
>> > >>>> probably spend a little time exploring helm since I want to
>play
>> > around
>> > >>>> with it and understand if there are other drawbacks. I ran
>into this
>> > >>>> question while working on getting the HIFIO cassandra cluster
>> running,
>> > >>> so I
>> > >>>> might prototype with that.
>> > >>>>
>> > >>>> I'll create a JIRA for this in the next day or so.
>> > >>>>
>> > >>>> Stephen
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> [0] docker registry container -
>https://hub.docker.com/_/registry/
>> > >>>>
>> > >>>> [1] kubernetes issue open for supporting templates -
>> > >>>> https://github.com/kubernetes/kubernetes/issues/23896
>> > >>>>
>> > >>>> [2] set of available charts -
>https://github.com/kubernetes/charts
>> > >>>>
>> > >>>> [3] kubernetes helm introduction -
>> > >>>> https://deis.com/blog/2015/introducing-helm-for-kubernetes/
>> > >>>> [4] kubernetes charts instructions -
>> > >>>> https://github.com/kubernetes/helm/blob/master/docs/charts.md
>> > >
>> > > --
>> > > Jean-Baptiste Onofré
>> > > [email protected]
>> > > http://blog.nanthrax.net
>> > > Talend - http://www.talend.com
>> >
>>
