Thanks Darion and Garry, this is helpful. I have read that Zookeeper is very latency-sensitive.
I'll definitely try a YARN NM on all 3 hosts.

I'd be happy to contribute our findings to a FAQ or wiki page. One so far is that YARN is the most complicated part of the setup process, since there is scant documentation on how to set up YARN without dragging in the rest of Hadoop.

By the way, I did come across an excellent presentation by Philip O'Toole of Loggly (video <https://www.youtube.com/watch?v=LpNbjXFPyZ0>, slides <http://www.slideshare.net/AmazonWebServices/infrastructure-at-scale-apache-kafka-twitter-storm-elastic-search-arc303-aws-reinvent-2013>) that discusses how they use Kafka and Storm on EC2. No Samza. O'Toole mentions using EBS volumes for Kafka and says they create daily volume snapshots for disaster recovery purposes. I haven't found any mention of disaster recovery for Kafka or Samza, and I wondered if that even makes sense given the replication/partition approach.

On Thu, Apr 24, 2014 at 11:15 PM, darion <[email protected]> wrote:

> Samza is based on JVM and Ubuntu maybe ok
>
> Samza I haven't used but Spark and Storm is working well on EC2
> both seems similar
>
> On 14-4-25 3:18 AM, Oshoma Momoh wrote:
>
>> Hi all,
>>
>> I am setting up a Samza cluster for the first time, and am now at the
>> point of deploying on EC2. Hopefully this is the correct place to ask a
>> few newbie questions. I'm impressed and excited by what I've seen so far,
>> eager to get going with a real deployment.
>>
>> 1. Does anyone have good or bad experiences to report in running Samza
>> atop Ubuntu 14.04 LTS? (Versus 12.04.)
>>
>> 2. Any best practices to recommend in terms of setup on EC2? E.g. instance
>> types to use, EBS volumes versus non-EBS, and so on. I've found several
>> threads with conflicting opinions on all of this. Our current plan is...
>> (a) Use EBS volumes, separating Zookeeper from Kafka.
>> (b) Start with three m3.large instances and upgrade later as needed,
>> since our initial data volume will be low
>> (c) Kafka + Zookeeper + Yarn Node Manager on two worker nodes, and Kafka +
>> Zookeeper + Yarn Resource Manager on the third node.
>>
>> Regards,
>>
>> osh
>>
>> Oshoma Momoh
>> http://pcglab.com
