Well, just to close the loop....

I've dug deeper into this whole Hadoop-on-containers story. And
unfortunately, I have come to the conclusion that it doesn't make that
much sense. Building separate images per component and orchestrating
them through k8s or Swarm doesn't solve anything, but adds a lot of
hassle. Using this approach as a sort of packaging technique also
doesn't add much for a developer or an admin.

We already have a mechanism where one can create a Docker image
with an arbitrary set of components and deploy a cluster using
different images like this. It is good enough for most of the cases
where it makes sense to deploy the Hadoop stack from containers.
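
For reference, that existing path is driven roughly like this (recalling
the Docker provisioner scripts in the Bigtop tree; the script name, flags,
and config file below are from memory and may well differ between
releases, so treat them as assumptions):

    # assumes a Bigtop source checkout and a working Docker installation
    cd provisioner/docker
    # create a three-node cluster from the chosen config (placeholder name)
    ./docker-hadoop.sh -C config.yaml -c 3
    # tear the cluster down when done
    ./docker-hadoop.sh -C config.yaml -d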

Hence, I decided to pull out of this project. But if someone else can
think of a better way of doing this sort of thing, I would be happy to
join hands.

--
  With regards,
Konstantin (Cos) Boudnik
2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622

Disclaimer: Opinions expressed in this email are those of the author,
and do not necessarily represent the views of any company the author
might be affiliated with at the moment of writing.


On Thu, Oct 18, 2018 at 2:26 PM, Konstantin Boudnik <c...@apache.org> wrote:
> Indeed, to be heard you don't need to be a committer: they aren't some sort of
> privileged class here ;)
>
> Anyway, back to this discussion and answering some of the concerns raised by
> Evans. Tarballs aren't a key requirement for this approach: I was using
> tarballs built from debs to cut some corners and avoid changing any of the
> Puppet recipes until I was certain my experiment showed something viable. My
> first intention was, of course, to use our packages. But I couldn't think of
> any clever way to avoid pulling in all the install-time dependencies without
> a massive rewrite of the packages. E.g. it won't be possible to install just
> Spark without pulling in the YARN or HDFS dependencies, and that would
> undermine the idea of component-specific images (or layers, as I've called
> them in the OP).
>
> In fact, you know my stance on the whole tarball thing: I've been pushing back
> on the parcel-like approach for as long as I can remember. I still think it's
> a horrible idea to produce tarballs as first-class artifacts. There are plenty
> of reasons for this, which are out of the scope of this conversation.
>
> Speaking of use cases: as both Mikhail and Evans pointed out, it is intended
> for something like Swarm or K8S (basically, anything that can orchestrate
> containers into something meaningful at scale).
>
> Much like Mikhail suggested, mixing base layers would achieve my idea of
> piling up components on top of each other in order to create different
> special-purpose or functional roles. I guess it is much like the Sandbox
> you've mentioned, but without the hassle of creating a whole stack for each
> new combination of components. I will look more closely at the Swarm thing
> in the next day or so.
>
> Thanks guys!
>   Cos
>
>
> On Tue, Oct 16, 2018 at 01:38AM, Evans Ye wrote:
>> To Mikhail:
>> You don't have to be a committer to join the discussion. You're welcome to
>> share any ideas you have :)
>>
>> To reply to all:
>> This might be tangential, but I just want to bring in more information.
>> Currently we have Docker Provisioner and Docker Sandbox (experimental)
>> features inside Bigtop, which work as follows:
>> 1. Provisioner: installs RPM/DEB packages via Puppet on the fly when
>> creating a cluster
>> 2. Sandbox: pre-installs RPM/DEB packages via Puppet as a special-purpose
>> stack (say HDFS+Spark) and saves it as an image
>>
>> Neither of the above goes for tarballs, because they're built around the
>> Bigtop RPM/DEB packages, which might be the most valuable thing we produce.
>> I don't mean we can't ditch packages, but we have to come up with
>> considerations that cover the whole picture, say:
>>
>> 1. Where does the tarball come from? Is it taken from upstream directly, or
>> produced by Bigtop with its own patches for compatibility fixes?
>> 2. If we'd like to support installing from tarballs, how will the
>> orchestration tool (Puppet) be shaped? Is it going to support both RPM/DEB
>> and tarballs, or just the new one?
>> 3. What's the purpose of producing Docker images in this new way? If we can
>> make them runnable on K8S, that's a perfect use case!
>>
>> Overall, I'd champion having this new feature in Bigtop, but I just want to
>> bring up a few more points for discussion :)
>>
>> Evans
>>
>> Mikhail Epikhin <mikh...@epikhin.net> wrote on Mon, Oct 15, 2018 at 5:04 PM:
>>
>> > Looks very interesting!
>> >
>> > Sorry for breaking into the discussion; I'm not a committer, just yet
>> > another user, but...
>> >
>> > As you wrote, Docker doesn't fit this approach well.
>> > The problem is that you tried to push all the components into one
>> > container, and you lost the immutability of the image.
>> > I fully understand the appeal of this layout for production, for more
>> > local connectivity, but with Docker containers there is no big difference
>> > between running Hive, Spark, and HDFS in one container or in many separate
>> > ones. Either way they use the network for connectivity, so you are
>> > comparing connectivity inside one container against connectivity between
>> > many containers on one local machine.
>> >
>> > They all run on one single machine anyway, and if you create a separate
>> > container for each component (HDFS, Hive, Spark, HBase, YARN), it fits the
>> > Docker model nicely.
>> >
>> > Further, you can create an environment using docker-compose, mixing these
>> > base layers [hdfs, hive, spark, hbase, ignite] as you wish.
>> >
>> > Just create a set of base images and a templating script that generates a
>> > docker-compose.yml to connect them.
>> >
>> > Further, if you want to simulate a multi-node cluster, you can do it just
>> > by writing a new docker-compose.yaml. You can test High Availability, HDFS
>> > decommissioning, or anything you want just by writing your own
>> > docker-compose.yaml.
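>> >
>> > A minimal sketch of what such a compose file could look like (the image
>> > names here are made up for illustration; they are not existing Bigtop
>> > images):
>> >
>> >     version: "3"
>> >     services:
>> >       namenode:
>> >         image: bigtop/hdfs:latest      # hypothetical HDFS base image
>> >         hostname: namenode
>> >       datanode:
>> >         image: bigtop/hdfs:latest      # same image, acting as a worker
>> >         hostname: datanode
>> >         depends_on:
>> >           - namenode
>> >       spark:
>> >         image: bigtop/spark:latest     # hypothetical Spark base image
>> >         hostname: spark
>> >         depends_on:
>> >           - namenode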
>> >
>> > --
>> > Mikhail Epikhin
>> >
>> > On Thu, Oct 11, 2018, at 20:25, Konstantin Boudnik wrote:
>> > > Well, finally I came around and started working on the long-awaited
>> > > feature for Bigtop, where one would be able to quickly build a container
>> > > with an arbitrary set of components in it for further orchestration.
>> > >
>> > > The idea was to have components in different layers, so they could be
>> > > combined together for the desired effect. Say there are layers with:
>> > >   1 hdfs
>> > >   2 hive
>> > >   3 spark
>> > >   4 hbase
>> > >   5 ignite
>> > >   6 yarn
>> > > and so on....
>> > >
>> > > If one wants to assemble a Spark-only cluster, there would be a way to
>> > > layer up 3 and 1 (ideally, 3's dependency on 1 would be automatically
>> > > calculated) and boom - there's an image ready to be put to use. The
>> > > number of combinations might be greater, of course, e.g. 3-6-1, or
>> > > 4-2-1-6, and so forth.
>> > >
>> > > It turned out that I can't "prebuild" those layers, as Docker won't
>> > > allow you to combine separate images into one ;( However, there's still
>> > > a way to achieve a similar effect. All I need to do is create a set of
>> > > tarballs, each containing all the bits of a particular component, i.e.
>> > > all the bits of Spark or Hive. When an image needs to be built, these
>> > > tarballs would be used to layer the software on top of the base image
>> > > and on top of each other. In the above example, the Dockerfile would
>> > > look something like:
>> > >
>> > >     FROM ubuntu:16.04
>> > >     # use COPY rather than ADD: ADD would auto-extract a local tar
>> > >     # archive into /tmp, leaving nothing for the RUN step to unpack
>> > >     COPY hdfs-all.tar /tmp/
>> > >     RUN tar xf /tmp/hdfs-all.tar -C / && rm /tmp/hdfs-all.tar
>> > >     COPY spark-all.tar /tmp/
>> > >     RUN tar xf /tmp/spark-all.tar -C / && rm /tmp/spark-all.tar
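>> > >
>> > > Building the image would then be the usual invocation (the tag below is
>> > > just an illustrative name, not any established convention):
>> > >
>> > >     docker build -t bigtop/hdfs-spark .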
>> > >
>> > > Once the image is generated, the orchestration and configuration phases
>> > > will kick in, at which point a Docker-based cluster would be all ready
>> > > to go.
>> > >
>> > > Do you guys see any value in this approach compared to the current
>> > > package-based way of managing things?
>> > >
>> > > Appreciate any thoughts!
>> > > --
>> > >   Cos
>> > >
>> > > P.S. BTW, I guess I have a decent answer to all those asking for tarball
>> > > installation artifacts. It is as easy as running
>> > >     dpkg-deb -x <package>.deb <target-dir>
>> > > on all the packages and then tar'ing up the resulting set of files.
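>> > >
>> > > A rough sketch of that, assuming all the .deb files of a component are
>> > > collected in one directory (the paths and names below are made up for
>> > > illustration):
>> > >
>> > >     # unpack every package of the component into a staging root
>> > >     mkdir -p /tmp/spark-root
>> > >     for deb in /path/to/spark-debs/*.deb; do
>> > >         dpkg-deb -x "$deb" /tmp/spark-root
>> > >     done
>> > >     # roll the staged tree up into a single per-component tarball
>> > >     tar cf spark-all.tar -C /tmp/spark-root .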
>> > >
>> >
