[ https://issues.apache.org/jira/browse/HIVE-23500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Magyar reassigned HIVE-23500:
------------------------------------

> [Kubernetes] Use Extend NodeId for LLAP registration
> ----------------------------------------------------
>
>                 Key: HIVE-23500
>                 URL: https://issues.apache.org/jira/browse/HIVE-23500
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>            Reporter: Attila Magyar
>            Assignee: Attila Magyar
>            Priority: Major
>             Fix For: 4.0.0
>
>
> In a Kubernetes environment, where pods can have the same host name and port,
> node trackers can end up retaining an old instance of a pod in their cache.
> In Hive LLAP, where the LLAP Tez task scheduler maintains node membership
> based on ZooKeeper registry events, a NODE_ADDED event followed by a
> NODE_REMOVED event can end up removing the node/host from the node trackers
> because the hostname and service port are stable. The NODE_REMOVED event in
> this case is a stale event for the already dead pod, but ZK sends it only
> after the session timeout (in the case of a non-graceful shutdown). If this
> sequence of events happens, a node/host is completely lost from the
> scheduler's perspective.
> To support this scenario, Tez can extend YARN's NodeId to include a
> uniqueIdentifier. The LLAP task scheduler can construct the container object
> with this new NodeId, which includes the uniqueIdentifier, so that stale
> events like the one above remove only the host/node that matches the old
> uniqueIdentifier.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
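
The idea in the description can be illustrated with a minimal Java sketch. It does not use the real Tez, YARN, or Hive classes: ExtendedNodeId, NodeTracker, the host name, the port, and the identifier strings are all hypothetical names chosen for illustration. It only shows why including a unique identifier in the node identity lets a stale NODE_REMOVED event be ignored when a replacement pod has already registered under the same host and port.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class ExtendedNodeIdSketch {

    /** Hypothetical stand-in for a NodeId extended with a unique identifier. */
    static final class ExtendedNodeId {
        final String host;
        final int port;
        final String uniqueIdentifier;

        ExtendedNodeId(String host, int port, String uniqueIdentifier) {
            this.host = host;
            this.port = port;
            this.uniqueIdentifier = uniqueIdentifier;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ExtendedNodeId)) return false;
            ExtendedNodeId other = (ExtendedNodeId) o;
            // Two ids are equal only if the unique identifier matches as well.
            return port == other.port
                && host.equals(other.host)
                && uniqueIdentifier.equals(other.uniqueIdentifier);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port, uniqueIdentifier);
        }
    }

    /** Hypothetical tracker keyed by host:port that remembers the live instance. */
    static final class NodeTracker {
        private final Map<String, ExtendedNodeId> liveNodes = new HashMap<>();

        void onNodeAdded(ExtendedNodeId id) {
            liveNodes.put(key(id.host, id.port), id);
        }

        /** Removes the entry only if the event refers to the tracked instance. */
        boolean onNodeRemoved(ExtendedNodeId id) {
            // Map.remove(key, value) removes only when the stored value equals id.
            return liveNodes.remove(key(id.host, id.port), id);
        }

        boolean isTracked(String host, int port) {
            return liveNodes.containsKey(key(host, port));
        }

        private static String key(String host, int port) {
            return host + ":" + port;
        }
    }

    public static void main(String[] args) {
        NodeTracker tracker = new NodeTracker();
        ExtendedNodeId oldPod = new ExtendedNodeId("llap-0", 15001, "uid-old");
        ExtendedNodeId newPod = new ExtendedNodeId("llap-0", 15001, "uid-new");

        tracker.onNodeAdded(oldPod);
        tracker.onNodeAdded(newPod); // replacement pod, same host and port
        // Stale event for the dead pod, delivered after the session timeout.
        boolean removed = tracker.onNodeRemoved(oldPod);

        System.out.println("stale removal applied: " + removed);
        System.out.println("node still tracked: " + tracker.isTracked("llap-0", 15001));
    }
}
```

In this sketch the stale removal is rejected and the replacement pod stays tracked. If the identity were only host and port, the same event would have removed the live node.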