I've been exploring Samza for stream processing as well as Kubernetes as a container orchestration system and I wanted to be able to use one with the other. The prospect of having to execute YARN either along side or on top of Kubernetes did not appeal to me, so I developed a KubernetesJob implementation of SamzaJob.
You can find the details at https://github.com/eliaslevy/samza_kubernetes, but in summary KubernetesJob executes and generates a serialized JobModel. Instead of interacting with Kubernetes directly to create the SamzaContainers (as the YarnJob's SamzaApplicationMaster may do with the YARN RM), it output a config YAML file that can be used to create the SamzaContainers in Kubernetes by using Resource Controllers. For this you require to package your job as a Docker image. You can reach the README at the above repo for details. A few observations: It would be useful if SamzaContainer accepted the JobModel via an environment variable. Right not it expects a URL to download it from. I get around this by using a entry point script that copies the model from an environment variable into a file, then passes a file URL to SamzaContainer. SamzaContainer doesn't allow you to configure the JMX port. It selects a port at random from the ephemeral range as it expects to execute in YARN where a static port could result in a conflict. This is not the case in Kubernetes where each Pod (i.e. SamzaContainer) is given its own IP address. This implementation doesn't provide a Samza dashboard, which in the YARN implementation is hosted in the Application Master. There didn't seem to be much value provided by the dashboard that is not already provided by the Kubernetes tools for monitoring pods. I've successfully executed the hello-samza jobs in Kubernetes: $ kubectl get po NAME READY STATUS RESTARTS AGE kafka-1-jjh8n 1/1 Running 0 2d kafka-2-buycp 1/1 Running 0 2d kafka-3-tghkp 1/1 Running 0 2d wikipedia-feed-0-4its2 1/1 Running 0 1d wikipedia-parser-0-l0onv 1/1 Running 0 17h wikipedia-parser-1-crrxh 1/1 Running 0 17h wikipedia-parser-2-1c5nn 1/1 Running 0 17h wikipedia-stats-0-3gaiu 1/1 Running 0 16h wikipedia-stats-1-j5qlk 1/1 Running 0 16h wikipedia-stats-2-2laos 1/1 Running 0 16h zookeeper-1-1sb4a 1/1 Running 0 2d zookeeper-2-dndk7 1/1 Running 0 2d zookeeper-3-46n09 1/1 Running 0 2d Finally, accessing services within the Kubernetes cluster from the outside is quite cumbersome unless one uses an external load balancer. This makes it difficult to bootstrap a job, as SamzaJob must connect to Zookeeper and Kafka to find out the number of partitions on the topics it will subscribe to, so it can assign them statically among the number of containers requested. Ideally Samza would operate along the lines of the Kafka high-level consumer, which dynamically coordinate to allocate work among members of a consumer group. This would do away with the new to execute SamzaJob a priori to generate the JobModel to pass to the SamzaContainers. It would also allow for dynamically changing the number of containers without having the shutdown the job.