[ 
https://issues.apache.org/jira/browse/IMPALA-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476755#comment-16476755
 ] 

ASF subversion and git services commented on IMPALA-6907:
---------------------------------------------------------

Commit f40dc5dd4d5e4b6e7c01f078940778fc23e33a8b in impala's branch 
refs/heads/2.x from [~kwho]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=f40dc5d ]

IMPALA-6907: Close stale connections to removed cluster members

ImpalaServer::MembershipCallback() is used by each Impala backend
node to update cluster membership. It also removes stale connections
to nodes which are no longer members of the cluster. Previously,
the way it detected removed members was flawed, as it relied on
query_locations_ to determine whether stale connections may exist
to the removed members. query_locations_ is a map from host name
to the set of queries running on that host. An entry for a remote
node only exists in query_locations_ if an Impalad node has acted
as coordinator of a query with fragment instances scheduled to run
on that remote node.

This change fixes the problem by closing connections to remote
hosts which are removed from the cluster regardless of whether
they can be found in query_locations_. A new test is added to
exercise this path by restarting Impalad backend nodes between
queries. Also change impala_cluster.py to use bin/start-impala.sh
to start the Impala daemon instead of directly forking and exec'ing
Impalad. This is needed as start-impala.sh sets up the proper
Java-related environment variables.
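
The difference between the two detection strategies can be sketched
as below. This is a minimal illustration, not the code in the commit:
FakeClientCache, CloseStaleViaQueryLocations and
CloseStaleViaMembershipDiff are hypothetical names, hosts are plain
strings, and diffing the previous membership against the current one
is an assumed way of finding removed hosts.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the client cache: it only tracks which
// hosts still have cached connections.
struct FakeClientCache {
  std::set<std::string> open_hosts;
  void CloseConnections(const std::string& host) { open_hosts.erase(host); }
};

using Membership = std::set<std::string>;
// Host name -> ids of queries this node coordinates on that host.
using QueryLocations = std::map<std::string, std::set<int>>;

// Old approach: only hosts that appear in query_locations are checked,
// so a removed host with no coordinated query on it is never seen.
std::vector<std::string> CloseStaleViaQueryLocations(
    const QueryLocations& query_locations, const Membership& current,
    FakeClientCache* cache) {
  std::vector<std::string> closed;
  for (const auto& entry : query_locations) {
    if (current.find(entry.first) == current.end()) {
      cache->CloseConnections(entry.first);
      closed.push_back(entry.first);
    }
  }
  return closed;
}

// Sketch of the fixed approach (assumed mechanism): every host that was
// a member before and is absent now gets its connections closed,
// whether or not it appears in query_locations.
std::vector<std::string> CloseStaleViaMembershipDiff(
    const Membership& previous, const Membership& current,
    FakeClientCache* cache) {
  std::vector<std::string> closed;
  for (const auto& host : previous) {
    if (current.find(host) == current.end()) {
      cache->CloseConnections(host);
      closed.push_back(host);
    }
  }
  return closed;
}
```

With three cached hosts, two of them removed, and only one of the
removed hosts present in the query-location map, the first function
leaves a stale connection behind while the second closes both.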

Change-Id: I41b7297cf665bf291b09b23524d19b1d10ab281d
Reviewed-on: http://gerrit.cloudera.org:8080/10327
Reviewed-by: Michael Ho <k...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> ImpalaServer::MembershipCallback() may not remove all stale connections to 
> disconnected Impalad nodes
> -----------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-6907
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6907
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, 
> Impala 2.12.0
>            Reporter: Michael Ho
>            Assignee: Michael Ho
>            Priority: Major
>             Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> Currently, {{ImpalaServer::MembershipCallback()}} will remove stale 
> connections to hosts which were removed from the cluster membership.
> {noformat}
>       while (loc_entry != query_locations_.end()) {
>         if (current_membership.find(loc_entry->first) == current_membership.end()) {
>           unordered_set<TUniqueId>::const_iterator query_id = loc_entry->second.begin();
>           // Add failed backend locations to all queries that ran on that backend.
>           for(; query_id != loc_entry->second.end(); ++query_id) {
>             vector<TNetworkAddress>& failed_hosts = queries_to_cancel[*query_id];
>             failed_hosts.push_back(loc_entry->first);
>           }
>           exec_env_->impalad_client_cache()->CloseConnections(loc_entry->first); <<<-----
> {noformat}
> However, it relies on checking against {{query_locations_}}, which is 
> populated only when the Impalad node acts as a coordinator and is currently 
> running queries on the disconnected backend. So 
> {{ImpalaServer::MembershipCallback()}} will not reliably remove stale 
> connections to hosts removed from the cluster. This may cause stale 
> connections to stay in the connection cache for an extended period of time, 
> leading to query failures once the removed hosts rejoin the cluster and the 
> stale connections are used.
> Instead, we should remove stale connections regardless of whether this node 
> happens to be currently coordinating a query using that backend.


