Hi Tom,

This would depend on what your k8s container orchestration logic looks like. For example, in YARN, 'status' returns 'not running' after 'start' until all the containers requested from the AM are 'running'. We also leverage YARN to restart containers/jobs automatically on failures (within some bounds). Additionally, we set up a monitoring alert that fires if the number of running containers stays below the number of expected containers for an extended period of time (~5 minutes).
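On the Kubernetes side, the rough equivalent of that alert would be a liveness probe against your health check. A minimal sketch, assuming the Samza process exposes a hypothetical `/health` HTTP endpoint (you'd have to wire that up yourself, e.g. a small HTTP server in the same JVM that reports whether the runner's status is still Running):

```yaml
# Hypothetical pod spec fragment -- the /health endpoint and port are
# assumptions, not something Samza provides out of the box.
livenessProbe:
  httpGet:
    path: /health        # should return non-200 once the processor has stopped
    port: 8080
  initialDelaySeconds: 60  # give the containers time to start up
  periodSeconds: 30
  failureThreshold: 10     # ~5 minutes of failures before k8s restarts the pod
```

With `periodSeconds: 30` and `failureThreshold: 10`, the pod is restarted after roughly the same ~5-minute window as the YARN alert above.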
Are you saying that you noticed that the LocalApplicationRunner status returns 'running' even if its stream processor / SamzaContainer has stopped processing?

- Prateek

On Fri, Mar 15, 2019 at 7:26 AM Tom Davis <t...@recursivedream.com> wrote:

> I'm using the LocalApplicationRunner and had added a liveness check
> around the `status` method. The app is running in Kubernetes so, in
> theory, it could be restarted if exceptions happened during processing.
> However, it seems that "container failure" is divorced from "app
> failure" because the app continues to run even after all the task
> containers have shut down. Is there a better way to check for
> application health? Is there a way to shut down the application if all
> containers have failed? Should I simply ensure exceptions never escape
> operators? Thanks!