We are in the early evaluation period for Samza in a relatively resource-constrained environment. One thing we cannot currently expect is more than a 1 gigabit local network, which our models indicate we will saturate in the naïve case.
One solution we are considering is to co-locate our highest-throughput jobs, the ones that consume directly from and filter high-throughput topics, on the same nodes as the brokers hosting the applicable partitions of those topics. The idea is that messages would be delivered over loopback without ever reaching the network, and that the output bandwidth of those jobs would be significantly smaller and more manageable. It seems like this is something the ApplicationMaster would have to coordinate with YARN, and it closely resembles how YARN allocates compute resources near data stored in HDFS.

Is there anything in the ApplicationMaster that would allow us to do this today? Or would the proper approach be to run those jobs directly, outside of a YARN grid, and have the YARN jobs read from the output of those direct jobs?

-Bart
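
P.S. To make the placement idea concrete, here is a rough sketch of the mapping we have in mind. It is plain Python and does not use any Samza or YARN API; the function name, topic name, host names, and task names are all made up for illustration. It assumes we can obtain the current leader host for each partition from Kafka's topic metadata and that each task reads exactly one partition.

```python
from collections import defaultdict


def preferred_hosts(partition_leaders, task_partitions):
    """Group tasks by the host that leads their input partition.

    partition_leaders: {(topic, partition): host}, a hypothetical input,
        e.g. collected from Kafka topic metadata.
    task_partitions:   {task_name: (topic, partition)}, also hypothetical.

    Returns {host: [task_name, ...]}. Tasks whose partition has no known
    leader are grouped under None, meaning "no locality preference".
    """
    placement = defaultdict(list)
    for task, topic_partition in sorted(task_partitions.items()):
        host = partition_leaders.get(topic_partition)
        placement[host].append(task)
    return dict(placement)


# Made-up example: three partitions of one topic spread over two brokers.
leaders = {
    ("raw-events", 0): "broker-a",
    ("raw-events", 1): "broker-b",
    ("raw-events", 2): "broker-a",
}
tasks = {
    "filter-0": ("raw-events", 0),
    "filter-1": ("raw-events", 1),
    "filter-2": ("raw-events", 2),
}
plan = preferred_hosts(leaders, tasks)
```

The resulting per-host task lists are what we would want to hand to whatever requests containers, as a host preference rather than a hard requirement. Partition leadership can move when a broker fails, so a mapping like this would only be a hint computed at submission time.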
