[ https://issues.apache.org/jira/browse/IMPALA-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476755#comment-16476755 ]
ASF subversion and git services commented on IMPALA-6907:
---------------------------------------------------------

Commit f40dc5dd4d5e4b6e7c01f078940778fc23e33a8b in impala's branch refs/heads/2.x from [~kwho]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=f40dc5d ]

IMPALA-6907: Close stale connections to removed cluster members

Each Impala backend node uses ImpalaServer::MembershipCallback() to update
cluster membership and to remove stale connections to nodes which are no
longer members of the cluster. However, the way it detected removed members
was flawed: it relied on query_locations_ to determine whether stale
connections may exist to the removed members. query_locations_ is a map of
host name to the set of queries running on that host. An entry for a remote
node only exists in query_locations_ if the local Impalad node has acted as
coordinator of a query with fragment instances scheduled to run on that
remote node.

This change fixes the problem by closing connections to remote hosts which
are removed from the cluster, regardless of whether they can be found in
query_locations_. A new test is added to exercise this path by restarting
Impalad backend nodes between queries.

Also change impala_cluster.py to use bin/start-impala.sh to start the Impala
daemon instead of directly forking and exec'ing Impalad. This is needed as
start-impala.sh sets up the proper Java-related environment variables.
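
The idea behind the fix can be illustrated with a minimal, self-contained sketch. This is not the code from the commit: FakeClientCache, CloseStaleConnections, and the use of plain strings for host addresses are all hypothetical stand-ins chosen for illustration. The sketch only shows the principle described above, which is to compare the previously known members against the current membership and close connections to every host that disappeared, without consulting any per-query bookkeeping.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a connection cache. It only records which hosts
// had their connections closed, so the behavior can be inspected.
struct FakeClientCache {
  std::vector<std::string> closed;
  void CloseConnections(const std::string& host) { closed.push_back(host); }
};

// Closes connections to every host in 'known' that is absent from 'current'.
// The decision depends only on membership, not on whether any query is
// tracked for the host. Returns the hosts that were found to be removed.
std::vector<std::string> CloseStaleConnections(
    const std::set<std::string>& known,
    const std::set<std::string>& current,
    FakeClientCache* cache) {
  std::vector<std::string> removed;
  for (const std::string& host : known) {
    if (current.find(host) == current.end()) {
      cache->CloseConnections(host);
      removed.push_back(host);
    }
  }
  return removed;
}
```

With known members {host-a, host-b, host-c} and current members {host-a, host-c}, the sketch closes connections to host-b only, even if this node never coordinated a query that ran there.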

Change-Id: I41b7297cf665bf291b09b23524d19b1d10ab281d
Reviewed-on: http://gerrit.cloudera.org:8080/10327
Reviewed-by: Michael Ho <k...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

> ImpalaServer::MembershipCallback() may not remove all stale connections to disconnected Impalad nodes
> -----------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-6907
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6907
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>   Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0
>            Reporter: Michael Ho
>            Assignee: Michael Ho
>            Priority: Major
>             Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> Currently, {{ImpalaServer::MembershipCallback()}} will remove stale
> connections to hosts which were removed from the cluster membership.
> {noformat}
>   while (loc_entry != query_locations_.end()) {
>     if (current_membership.find(loc_entry->first) == current_membership.end()) {
>       unordered_set<TUniqueId>::const_iterator query_id = loc_entry->second.begin();
>       // Add failed backend locations to all queries that ran on that backend.
>       for(; query_id != loc_entry->second.end(); ++query_id) {
>         vector<TNetworkAddress>& failed_hosts = queries_to_cancel[*query_id];
>         failed_hosts.push_back(loc_entry->first);
>       }
>       exec_env_->impalad_client_cache()->CloseConnections(loc_entry->first); <<<-----
> {noformat}
> However, it relies on checking against {{query_locations_}}, which is
> populated only when the Impalad node acts as a coordinator and is currently
> running queries that use the disconnected backend. So
> {{ImpalaServer::MembershipCallback()}} will not reliably remove stale
> connections to hosts removed from the cluster.
> This may cause stale connections to stay in the connection cache for an
> extended period of time, leading to query failures after the removed hosts
> rejoin the cluster, as the stale connections are used.
> Instead, we should remove stale connections regardless of whether this node
> happens to be currently coordinating a query using that backend.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org