Hi Samza Devs

My most significant concern recently is container leaks. The data
pipeline based on Samza can guarantee at-least-once delivery, but the
duplicate rate is over 1.0% and I am getting alerts right now. Container
leaks will push a lot of alerts to me.

So we need to find out whether running Samza on Mesos would avoid that
problem, or whether Spark Streaming would be free of the issue. In the
worst case, building our own distributed coordination might be more
predictable than running YARN on EMR.

What about standalone Samza? If it is a plausible option and the best
solution in the near future, I would like to contribute. Could you
share your thoughts or plans?

I would really appreciate some guidelines on implementing a custom
cluster management interface for Samza. If possible, I would like to look
into replacing the YARN support with EC2 Auto Scaling groups (ASG).

Thank you
Best, Jae
