[ https://issues.apache.org/jira/browse/HBASE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andor Molnar resolved HBASE-28339.
----------------------------------
    Resolution: Invalid

> HBaseReplicationEndpoint creates new ZooKeeper client every time it tries to reconnect
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-28339
>                 URL: https://issues.apache.org/jira/browse/HBASE-28339
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 2.6.0, 2.4.17, 3.0.0-beta-1, 2.5.7, 2.7.0
>            Reporter: Andor Molnar
>            Assignee: Andor Molnar
>            Priority: Major
>
> The abstract base class {{HBaseReplicationEndpoint}}, and therefore
> {{HBaseInterClusterReplicationEndpoint}}, creates a new ZooKeeper client
> instance every time a communication error occurs and it tries to reconnect.
> This was not a problem with ZooKeeper 3.4.x versions, because the TGT Login
> thread was a static reference and was created only once for all clients in
> the same JVM. With the upgrade to ZooKeeper 3.5.x the login thread is
> dedicated to the client instance, hence we get a new login thread every time
> the replication endpoint reconnects.
> {code:java}
>   /**
>    * A private method used to re-establish a zookeeper session with a peer cluster.
>    */
>   protected void reconnect(KeeperException ke) {
>     if (
>       ke instanceof ConnectionLossException || ke instanceof SessionExpiredException
>         || ke instanceof AuthFailedException
>     ) {
>       String clusterKey = ctx.getPeerConfig().getClusterKey();
>       LOG.warn("Lost the ZooKeeper connection for peer " + clusterKey, ke);
>       try {
>         reloadZkWatcher();
>       } catch (IOException io) {
>         LOG.warn("Creation of ZookeeperWatcher failed for peer " + clusterKey, io);
>       }
>     }
>   }
> {code}
> {code:java}
>   /**
>    * Closes the current ZKW (if not null) and creates a new one
>    * @throws IOException If anything goes wrong connecting
>    */
>   synchronized void reloadZkWatcher() throws IOException {
>     if (zkw != null) zkw.close();
>     zkw = new ZKWatcher(ctx.getConfiguration(),
>       "connection to cluster: " + ctx.getPeerId(), this);
>     getZkw().registerListener(new PeerRegionServerListener(this));
>   }
> {code}
> If the target cluster of replication is unavailable for some reason, the
> replication endpoint keeps trying to reconnect to ZooKeeper, constantly
> destroying and creating Login threads, which will carpet bomb the KDC host
> with login requests.
>
> I'm not sure how to fix this yet; I'm trying to create a unit test first.
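
To make the mechanism in the description concrete, here is a minimal toy model of it. It is only a sketch: {{LoginThreadDemo}}, {{FakeJvm}}, {{FakeLogin}} and {{loginsAfterReconnects}} are hypothetical names invented for illustration, and nothing here calls ZooKeeper, HBase or Kerberos. It simply counts how many login objects get created when a client is recreated on every reconnect, once with a single login shared JVM-wide (the 3.4.x behaviour as described) and once with a login owned by each client (the 3.5.x behaviour as described).

```java
/** Toy model only: not ZooKeeper or HBase code. All names are hypothetical. */
public class LoginThreadDemo {

  /** Stand-in for a Kerberos login; creating one represents one login request to the KDC. */
  static final class FakeLogin {}

  /** Simulates one JVM and counts the logins created for the clients inside it. */
  static final class FakeJvm {
    private final boolean shareLogin;
    private FakeLogin shared;
    int loginsCreated;

    FakeJvm(boolean shareLogin) {
      this.shareLogin = shareLogin;
    }

    /** Returns the login a newly created client would use. */
    FakeLogin loginForNewClient() {
      if (shareLogin) {
        // Shared model: the login is created once and reused by every later client.
        if (shared == null) {
          shared = new FakeLogin();
          loginsCreated++;
        }
        return shared;
      }
      // Per-client model: every new client brings its own login.
      loginsCreated++;
      return new FakeLogin();
    }
  }

  /** Creates one initial client, then one replacement client per reconnect attempt. */
  static int loginsAfterReconnects(boolean shareLogin, int reconnects) {
    FakeJvm jvm = new FakeJvm(shareLogin);
    jvm.loginForNewClient(); // initial connection
    for (int i = 0; i < reconnects; i++) {
      jvm.loginForNewClient(); // models reloadZkWatcher(): old client closed, new one created
    }
    return jvm.loginsCreated;
  }

  public static void main(String[] args) {
    System.out.println("shared login:     " + loginsAfterReconnects(true, 100));
    System.out.println("per-client login: " + loginsAfterReconnects(false, 100));
  }
}
```

In this model the number of logins stays at one in the shared case and grows linearly with the number of reconnect attempts in the per-client case, which is the pattern the report is worried about when the peer cluster stays unreachable.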