We are finding YARN and AWS EC2 too costly for us. We keep having to scale the cluster to support more jobs, and we plan to write more. We are scaling because the cluster doesn’t have enough VCores to support all the containers, doesn’t have enough RAM for the jobs, and so on.
Has anyone had luck running Samza jobs on an alternative scheduler, say Nomad, Kubernetes, or something else? Similarly, has anyone had luck with Samza on something like Kafka Streams, where I don’t need the overhead of YARN and a scheduler at all?

Also, for a small-scale shop, what is the minimum number of partitions I can get away with? Any advice on determining the appropriate number of partitions? Kafka, ZooKeeper, and Secor are also costs we could potentially reduce by lowering the partition count.

Thanks for any input.

Jeremiah Adams
Software Engineer
www.helixeducation.com<http://www.helixeducation.com/>
Blog<http://www.helixeducation.com/blog/> | Twitter<https://twitter.com/HelixEducation> | Facebook<https://www.facebook.com/HelixEducation> | LinkedIn<http://www.linkedin.com/company/3609946>