Hi Tom,

This depends on what your k8s container orchestration logic looks
like. For example, in YARN, 'status' returns 'not running' after 'start'
until all the containers requested from the AM (ApplicationMaster) are
'running'. We also leverage YARN to restart containers/the job
automatically on failures (within some bounds). Additionally, we set up a
monitoring alert that fires if the number of running containers stays
lower than the number of expected containers for an extended period
(~5 minutes).

Are you saying that you noticed that the LocalApplicationRunner status
returns 'running' even if its stream processor / SamzaContainer has stopped
processing?
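
For reference, here's a rough sketch of the kind of status-to-liveness
mapping I have in mind for a k8s probe. This is not Samza API code: the
`Status` enum and `isLive` helper below are hypothetical stand-ins for
the runner's reported ApplicationStatus, just to illustrate treating
'new/starting' as live within a grace window (mirroring the YARN
behavior above) and anything 'finished' as dead so k8s restarts the pod:

```java
// Hypothetical sketch of a liveness decision for a Kubernetes probe.
// "Status" stands in for the status reported by the application runner;
// the real Samza class/enum names may differ.
public class LivenessCheck {

    // Stand-in for the runner's reported status.
    enum Status { NEW, RUNNING, SUCCESSFUL_FINISH, UNSUCCESSFUL_FINISH }

    // Treat NEW as live only within a startup grace window, mirroring
    // YARN's 'not running' phase between 'start' and all containers
    // reaching 'running'. Any finished state is reported as not live so
    // the orchestrator can restart (or stop) the pod.
    static boolean isLive(Status status, long uptimeMillis, long graceMillis) {
        switch (status) {
            case RUNNING:
                return true;
            case NEW:
                return uptimeMillis < graceMillis; // still starting up
            default:
                return false; // finished, successfully or not
        }
    }

    public static void main(String[] args) {
        System.out.println(isLive(Status.RUNNING, 0, 60_000));
        System.out.println(isLive(Status.NEW, 10_000, 60_000));
        System.out.println(isLive(Status.UNSUCCESSFUL_FINISH, 90_000, 60_000));
    }
}
```

The grace window matters because a probe that fails while containers are
still being requested would restart-loop the app before it ever starts.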

- Prateek

On Fri, Mar 15, 2019 at 7:26 AM Tom Davis <t...@recursivedream.com> wrote:

> I'm using the LocalApplicationRunner and had added a liveness check
> around the `status` method. The app is running in Kubernetes so, in
> theory, it could be restarted if exceptions happened during processing.
> However, it seems that "container failure" is divorced from "app
> failure" because the app continues to run even after all the task
> containers have shut down. Is there a better way to check for
> application health? Is there a way to shut down the application if all
> containers have failed? Should I simply ensure exceptions never escape
> operators? Thanks!
>
