[ https://issues.apache.org/jira/browse/HADOOP-18324?focusedWorklogId=796822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796822 ]
ASF GitHub Bot logged work on HADOOP-18324:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Aug/22 08:48
            Start Date: 01/Aug/22 08:48
    Worklog Time Spent: 10m
      Work Description: ZanderXu commented on code in PR #4527:
URL: https://github.com/apache/hadoop/pull/4527#discussion_r934277370


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java:
##########
@@ -1153,9 +1087,51 @@ public void run() {
             + connections.size());
     }
 
+    /**
+     * A thread to write rpc requests to the socket.
+     */
+    private class RpcRequestSender implements Runnable {
+      @Override
+      public void run() {
+        while (!shouldCloseConnection.get()) {
+          ResponseBuffer buf = null;
+          try {
+            Pair<Call, ResponseBuffer> pair = rpcRequestQueue.take();
+            if (shouldCloseConnection.get()) {
+              return;
+            }
+            buf = pair.getRight();
+            synchronized (ipcStreams.out) {
+              if (LOG.isDebugEnabled()) {
+                Call call = pair.getLeft();
+                LOG.debug(getName() + " sending #" + call.id
+                    + " " + call.rpcRequest);

Review Comment:
   Maybe we can use `{}` placeholders instead of string concatenation, for example:
   ```
   LOG.debug("{} sending #{} {}.", getName(), call.id, call.rpcRequest);
   ```


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java:
##########
@@ -1420,6 +1350,7 @@ public void stop() {
     // wake up all connections
     for (Connection conn : connections.values()) {
       conn.interrupt();
+      conn.rpcRequestThread.interrupt();

Review Comment:
   Why don't we call `conn.close()` first? Is it because `close()` might block in `IOUtils.closeStream(ipcStreams)` or `socket.close()`?

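The `RpcRequestSender` under review replaces per-call writer threads with one thread per connection that drains a queue, and `stop()` now interrupts that thread as well. The following is a simplified, hypothetical Java sketch of that shape, not the patch itself: `SenderSketch`, `enqueue` and `stop` are invented names, plain `byte[]` payloads stand in for the `Pair<Call, ResponseBuffer>` queue entries, and a `ByteArrayOutputStream` stands in for `ipcStreams.out`. It shows why the interrupt matters on shutdown: the sender spends most of its life blocked in `take()`, and only an interrupt wakes it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical, simplified analogue of a connection with one dedicated sender thread. */
public class SenderSketch {
  private final BlockingQueue<byte[]> requestQueue = new LinkedBlockingQueue<>();
  private final AtomicBoolean shouldClose = new AtomicBoolean(false);
  private final OutputStream out;
  private final Thread senderThread;

  public SenderSketch(OutputStream out, String name) {
    this.out = out;
    this.senderThread = new Thread(this::sendLoop, "Request sender for " + name);
    this.senderThread.setDaemon(true);
  }

  public void start() {
    senderThread.start();
  }

  /** Callers hand off serialized requests instead of spawning a writer thread each. */
  public void enqueue(byte[] request) {
    requestQueue.add(request);
  }

  private void sendLoop() {
    while (!shouldClose.get()) {
      try {
        byte[] request = requestQueue.take(); // blocks until a request arrives
        if (shouldClose.get()) {
          return; // closed while we were waiting
        }
        synchronized (out) { // serialize writes, like synchronized (ipcStreams.out)
          out.write(request);
          out.flush();
        }
      } catch (InterruptedException e) {
        return; // stop() interrupts the thread while it is blocked in take()
      } catch (IOException e) {
        shouldClose.set(true); // a failed write marks the connection closed
      }
    }
  }

  /** Set the flag first, then interrupt, so the loop exits whether it was waiting or writing. */
  public void stop() throws InterruptedException {
    shouldClose.set(true);
    senderThread.interrupt();
    senderThread.join();
  }

  public static void main(String[] args) throws Exception {
    ByteArrayOutputStream socket = new ByteArrayOutputStream(); // stands in for the socket stream
    SenderSketch conn = new SenderSketch(socket, "demo");
    conn.start();
    conn.enqueue("a;".getBytes(StandardCharsets.UTF_8));
    conn.enqueue("b;".getBytes(StandardCharsets.UTF_8));
    while (socket.size() < 4) {
      Thread.sleep(1); // wait until the sender has drained both requests
    }
    conn.stop();
    System.out.println(new String(socket.toByteArray(), StandardCharsets.UTF_8)); // prints a;b;
  }
}
```

Setting the flag before interrupting covers both states the sender can be in: if it is blocked in `take()`, the interrupt wakes it; if it is mid-write, the interrupt is not observed by the stream, but the flag is checked on the next loop iteration.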
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java:
##########
@@ -455,6 +385,10 @@ private class Connection extends Thread {
         Consumer<Connection> removeMethod) {
       this.remoteId = remoteId;
       this.server = remoteId.getAddress();
+      this.rpcRequestThread = new Thread(new RpcRequestSender(),
+          "IPC Parameter Sending Thread for " + remoteId);
+      this.rpcRequestThread.setDaemon(true);
+      this.rpcRequestThread.start();

Review Comment:
   Maybe we should start `rpcRequestThread` after `setupConnection()`: if it is started here, in the constructor, the socket might not be available yet.




Issue Time Tracking
-------------------

    Worklog Id:     (was: 796822)
    Time Spent: 2h 50m  (was: 2h 40m)

> Interrupting RPC Client calls can lead to thread exhaustion
> -----------------------------------------------------------
>
>                 Key: HADOOP-18324
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18324
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 3.4.0, 2.10.2, 3.3.3
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently the IPC client creates an unbounded number of threads to write
> RPC requests to the socket. The NameNode uses timeouts on its RPC calls to
> the JournalNode, and a stuck JN will cause the NN to create an unbounded
> number of threads.
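The thread-exhaustion claim in the issue description can be made concrete with a small experiment. This hypothetical sketch assumes a cached thread pool as a stand-in for per-call writer threads (the description does not say how those threads are created) and an invented `StuckStream` whose writes block until released, playing the role of a stuck JournalNode. Each caller waits 1 ms and then gives up, as a timed-out RPC would, and the sketch counts how many threads each strategy creates.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.OutputStream;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical experiment: how many writer threads pile up behind a stuck peer? */
public class ThreadGrowthSketch {
  /** Stand-in for a socket to a stuck peer: every write blocks until released. */
  static class StuckStream extends OutputStream {
    final CountDownLatch release = new CountDownLatch(1);

    @Override
    public void write(int b) throws IOException {
      try {
        release.await();
      } catch (InterruptedException e) {
        throw new InterruptedIOException("write interrupted");
      }
    }
  }

  /** Issues {@code calls} writes that each time out after 1 ms; returns how many threads were created. */
  static int threadsCreated(int calls, Function<ThreadFactory, ExecutorService> makePool)
      throws Exception {
    StuckStream out = new StuckStream();
    AtomicInteger created = new AtomicInteger();
    ThreadFactory countingFactory = r -> {
      created.incrementAndGet();
      Thread t = new Thread(r);
      t.setDaemon(true);
      return t;
    };
    ExecutorService pool = makePool.apply(countingFactory);
    for (int i = 0; i < calls; i++) {
      Future<?> f = pool.submit(() -> {
        out.write(1);
        return null;
      });
      try {
        f.get(1, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        // The caller gives up, but the write task is still blocked (or still queued).
      }
    }
    int result = created.get();
    out.release.countDown(); // unstick the peer so the pool can drain and exit
    pool.shutdown();
    return result;
  }

  public static void main(String[] args) throws Exception {
    // Per-call style: every stuck write forces the cached pool to create a new thread.
    int perCall = threadsCreated(20, Executors::newCachedThreadPool);
    // Queue style: one sender thread; later writes wait in the executor's queue.
    int single = threadsCreated(20, Executors::newSingleThreadExecutor);
    System.out.println("per-call: " + perCall + ", single sender: " + single);
    // prints "per-call: 20, single sender: 1"
  }
}
```

With a real stuck peer there is no `release`, so in the per-call style the thread count keeps growing with every timed-out call, which is the exhaustion the issue reports. The single-sender style trades that for a backlog of queued requests behind one thread.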