Sudarshan Vasudevan created GOBBLIN-1162:
--------------------------------------------

             Summary: Provide an option to allow slow containers to commit 
suicide when unhealthy
                 Key: GOBBLIN-1162
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1162
             Project: Apache Gobblin
          Issue Type: Improvement
          Components: gobblin-cluster
    Affects Versions: 0.15.0
            Reporter: Sudarshan Vasudevan
            Assignee: Hung Tran
             Fix For: 0.15.0


In execution environments such as Gobblin-on-Yarn, where a Gobblin worker can be 
re-assigned when it dies or is killed, it is useful to add a mode where 
each Gobblin task running inside a Gobblin worker can perform application-level 
health checks and report results of the health checks back to the worker 
hosting the tasks. On receiving a health check failure event, the worker can be 
configured to exit the JVM. In Gobblin-on-Yarn mode, this results in the worker 
being re-assigned to a different node. The proposed 
behavior is similar in flavor to speculative execution modes provided in other 
execution frameworks such as MapReduce. 
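
A minimal sketch of the flow described above, written in plain Java. It is not 
the Gobblin API: the class names (HealthCheckSketch, HealthCheckFailureEvent, 
Worker, Task), the exitOnFailure flag, and the injected exit action are all 
hypothetical names chosen for illustration. A real worker would presumably 
pass something like () -> System.exit(1) as the exit action.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class HealthCheckSketch {

    /** Hypothetical event a task reports to its hosting worker when a check fails. */
    static final class HealthCheckFailureEvent {
        final String taskId;
        final String reason;

        HealthCheckFailureEvent(String taskId, String reason) {
            this.taskId = taskId;
            this.reason = reason;
        }
    }

    /** Hypothetical worker that can be configured to exit on a failure event. */
    static final class Worker {
        private final boolean exitOnFailure;   // assumed config switch
        private final Runnable exitAction;     // e.g. () -> System.exit(1)
        private final AtomicBoolean exitRequested = new AtomicBoolean(false);

        Worker(boolean exitOnFailure, Runnable exitAction) {
            this.exitOnFailure = exitOnFailure;
            this.exitAction = exitAction;
        }

        void onHealthCheckFailure(HealthCheckFailureEvent event) {
            System.err.println("Health check failed for " + event.taskId + ": " + event.reason);
            // Run the exit action at most once, and only when the mode is enabled.
            if (exitOnFailure && exitRequested.compareAndSet(false, true)) {
                exitAction.run();
            }
        }

        boolean isExitRequested() {
            return exitRequested.get();
        }
    }

    /** Hypothetical task that runs an application-level check and reports failures. */
    static final class Task {
        private final String taskId;
        private final BooleanSupplier healthCheck;  // returns true when healthy
        private final Worker worker;

        Task(String taskId, BooleanSupplier healthCheck, Worker worker) {
            this.taskId = taskId;
            this.healthCheck = healthCheck;
            this.worker = worker;
        }

        void runHealthCheck() {
            if (!healthCheck.getAsBoolean()) {
                worker.onHealthCheckFailure(
                    new HealthCheckFailureEvent(taskId, "application-level check returned unhealthy"));
            }
        }
    }

    public static void main(String[] args) {
        // The exit action only logs here so the sketch can run without killing the JVM.
        Worker worker = new Worker(true, () -> System.out.println("worker would exit the JVM here"));
        new Task("task-0", () -> true, worker).runHealthCheck();   // healthy, nothing reported
        new Task("task-1", () -> false, worker).runHealthCheck();  // unhealthy, triggers exit action
    }
}
```

Injecting the exit action keeps the decision to terminate separate from the 
check itself, so the behavior can stay off by default and be exercised in 
tests without terminating the process.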

This change also provides an example of such an application-level health check 
that arises in the case of Kafka ingestion. 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
