Attila Magyar created HIVE-23500:
------------------------------------

             Summary: [Kubernetes] Use Extend NodeId for LLAP registration
                 Key: HIVE-23500
                 URL: https://issues.apache.org/jira/browse/HIVE-23500
             Project: Hive
          Issue Type: Bug
          Components: llap
            Reporter: Attila Magyar
            Assignee: Attila Magyar
             Fix For: 4.0.0


In a Kubernetes environment, where pods can have the same host name and port, 
node trackers can retain an old instance of a pod in their cache. In Hive LLAP, 
the LLAP Tez task scheduler maintains node membership based on ZooKeeper 
registry events. Because the host name and service port are stable, a 
NODE_ADDED event followed by a NODE_REMOVED event can end up removing the 
node/host from the node trackers. The NODE_REMOVED event in this case is a 
stale event for the already dead pod, which ZK sends only after the session 
timeout (in case of a non-graceful shutdown). If this sequence of events 
happens, a node/host is completely lost from the scheduler's perspective. 

To support this scenario, Tez can extend YARN's NodeId to include a 
uniqueIdentifier. The LLAP task scheduler can then construct the container 
object with this new NodeId, so that stale events like the one above only 
remove the host/node that matches the old uniqueIdentifier. 
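
A minimal sketch of the idea, not the actual Tez or Hive change: the class 
names ExtendedNodeId and NodeTracker and the methods onNodeAdded, 
onNodeRemoved and isLive are hypothetical, and the sketch does not extend 
YARN's real NodeId class so that it stays self-contained. It only illustrates 
how a removal event carrying an old uniqueIdentifier can be ignored when a 
newer pod already owns the same host and port.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for an extended NodeId: host and port plus a per-instance identifier. */
final class ExtendedNodeId {
    private final String host;
    private final int port;
    private final String uniqueIdentifier;

    ExtendedNodeId(String host, int port, String uniqueIdentifier) {
        this.host = Objects.requireNonNull(host);
        this.port = port;
        this.uniqueIdentifier = Objects.requireNonNull(uniqueIdentifier);
    }

    /** Address shared by every pod instance that reuses the same host name and port. */
    String hostPort() {
        return host + ":" + port;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ExtendedNodeId)) {
            return false;
        }
        ExtendedNodeId other = (ExtendedNodeId) o;
        // Two ids are equal only if the instance identifier matches as well.
        return port == other.port
            && host.equals(other.host)
            && uniqueIdentifier.equals(other.uniqueIdentifier);
    }

    @Override
    public int hashCode() {
        return Objects.hash(host, port, uniqueIdentifier);
    }
}

/** Hypothetical node tracker that remembers which instance currently owns each address. */
final class NodeTracker {
    private final Map<String, ExtendedNodeId> liveNodes = new HashMap<>();

    /** NODE_ADDED: the newest instance replaces any older one at the same address. */
    void onNodeAdded(ExtendedNodeId id) {
        liveNodes.put(id.hostPort(), id);
    }

    /**
     * NODE_REMOVED: removes the entry only if the tracked instance equals the
     * one named in the event. Returns false for a stale event.
     */
    boolean onNodeRemoved(ExtendedNodeId id) {
        return liveNodes.remove(id.hostPort(), id);
    }

    boolean isLive(String host, int port) {
        return liveNodes.containsKey(host + ":" + port);
    }
}
```

With this shape, a late NODE_REMOVED for the dead pod does not match the 
tracked entry once the replacement pod has registered, so the host stays 
visible to the scheduler. A tracker keyed on host and port alone would drop 
it. 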



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
