I'm looking to deploy Samza on AWS infrastructure in an HA configuration. I have a clear picture of how to configure all the components so that none of them contains a single point of failure.
I'm stuck, however, when it comes to the YARN architecture. YARN appears to rely on a single-master / multi-slave pattern, as described in the YARN documentation. This introduces a single point of failure at the ResourceManager level: a failed ResourceManager takes down the entire YARN cluster.

How does LinkedIn architect an HA configuration for Samza on YARN so that the YARN cluster fails over when the ResourceManager instance fails completely?

Thanks for your help.

Best,
Ethan

--
Ethan Setnik
MobileAware
m: +1 617 513 2052
e: [email protected]
