Awesome.  Thanks.

On Sun, Nov 29, 2015 at 3:25 PM, Elias Levy <fearsome.lucid...@gmail.com>
wrote:

> Roger,
>
> You are welcomed.  If you want to experiment, you can use my hello samza
> <https://hub.docker.com/r/elevy/hello-samza/> Docker image.
>
> On Sun, Nov 29, 2015 at 12:19 PM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > Elias,
> >
> > I would also love to be able to deploy Samza on Kubernetes with dynamic
> > task management.  Thanks for sharing this.  It may be a good interim
> > solution.
> >
> > Roger
> >
> > On Sun, Nov 29, 2015 at 11:18 AM, Elias Levy <
> fearsome.lucid...@gmail.com>
> > wrote:
> >
> > > I've been exploring Samza for stream processing as well as Kubernetes
> as
> > a
> > > container orchestration system and I wanted to be able to use one with
> > the
> > > other.  The prospect of having to execute YARN either along side or on
> > top
> > > of Kubernetes did not appeal to me, so I developed a KubernetesJob
> > > implementation of SamzaJob.
> > >
> > > You can find the details at
> > https://github.com/eliaslevy/samza_kubernetes,
> > > but in summary KubernetesJob executes and generates a serialized
> > JobModel.
> > > Instead of interacting with Kubernetes directly to create the
> > > SamzaContainers (as the YarnJob's SamzaApplicationMaster may do with
> the
> > > YARN RM), it output a config YAML file that can be used to create the
> > > SamzaContainers in Kubernetes by using Resource Controllers.  For this
> > you
> > > require to package your job as a Docker image.  You can reach the
> README
> > at
> > > the above repo for details.
> > >
> > > A few observations:
> > >
> > > It would be useful if SamzaContainer accepted the JobModel via an
> > > environment variable.  Right not it expects a URL to download it
> from.  I
> > > get around this by using a entry point script that copies the model
> from
> > an
> > > environment variable into a file, then passes a file URL to
> > SamzaContainer.
> > >
> > > SamzaContainer doesn't allow you to configure the JMX port.  It
> selects a
> > > port at random from the ephemeral range as it expects to execute in
> YARN
> > > where a static port could result in a conflict.  This is not the case
> in
> > > Kubernetes where each Pod (i.e. SamzaContainer) is given its own IP
> > > address.
> > >
> > > This implementation doesn't provide a Samza dashboard, which in the
> YARN
> > > implementation is hosted in the Application Master.  There didn't seem
> to
> > > be much value provided by the dashboard that is not already provided by
> > the
> > > Kubernetes tools for monitoring pods.
> > >
> > > I've successfully executed the hello-samza jobs in Kubernetes:
> > >
> > > $ kubectl get po
> > > NAME                       READY     STATUS    RESTARTS   AGE
> > > kafka-1-jjh8n              1/1       Running   0          2d
> > > kafka-2-buycp              1/1       Running   0          2d
> > > kafka-3-tghkp              1/1       Running   0          2d
> > > wikipedia-feed-0-4its2     1/1       Running   0          1d
> > > wikipedia-parser-0-l0onv   1/1       Running   0          17h
> > > wikipedia-parser-1-crrxh   1/1       Running   0          17h
> > > wikipedia-parser-2-1c5nn   1/1       Running   0          17h
> > > wikipedia-stats-0-3gaiu    1/1       Running   0          16h
> > > wikipedia-stats-1-j5qlk    1/1       Running   0          16h
> > > wikipedia-stats-2-2laos    1/1       Running   0          16h
> > > zookeeper-1-1sb4a          1/1       Running   0          2d
> > > zookeeper-2-dndk7          1/1       Running   0          2d
> > > zookeeper-3-46n09          1/1       Running   0          2d
> > >
> > >
> > > Finally, accessing services within the Kubernetes cluster from the
> > outside
> > > is quite cumbersome unless one uses an external load balancer.  This
> > makes
> > > it difficult to bootstrap a job, as SamzaJob must connect to Zookeeper
> > and
> > > Kafka to find out the number of partitions on the topics it will
> > subscribe
> > > to, so it can assign them statically among the number of containers
> > > requested.
> > >
> > > Ideally Samza would operate along the lines of the Kafka high-level
> > > consumer, which dynamically coordinate to allocate work among members
> of
> > a
> > > consumer group.  This would do away with the new to execute SamzaJob a
> > > priori to generate the JobModel to pass to the SamzaContainers.  It
> would
> > > also allow for dynamically changing the number of containers without
> > having
> > > the shutdown the job.
> > >
> >
>

Reply via email to