[ 
https://issues.apache.org/jira/browse/HADOOP-18324?focusedWorklogId=796822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796822
 ]

ASF GitHub Bot logged work on HADOOP-18324:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Aug/22 08:48
            Start Date: 01/Aug/22 08:48
    Worklog Time Spent: 10m 
      Work Description: ZanderXu commented on code in PR #4527:
URL: https://github.com/apache/hadoop/pull/4527#discussion_r934277370


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java:
##########
@@ -1153,9 +1087,51 @@ public void run() {
             + connections.size());
     }
 
+    /**
+     * A thread to write rpc requests to the socket.
+     */
+    private class RpcRequestSender implements Runnable {
+      @Override
+      public void run() {
+        while (!shouldCloseConnection.get()) {
+          ResponseBuffer buf = null;
+          try {
+            Pair<Call, ResponseBuffer> pair = rpcRequestQueue.take();
+            if (shouldCloseConnection.get()) {
+              return;
+            }
+            buf = pair.getRight();
+            synchronized (ipcStreams.out) {
+              if (LOG.isDebugEnabled()) {
+                Call call = pair.getLeft();
+                LOG.debug(getName() + " sending #" + call.id
+                    + " " + call.rpcRequest);

Review Comment:
   Maybe we can use `{}` placeholders here instead of string concatenation, such as:
   ```
   LOG.debug("{} sending #{} {}.", getName(), call.id, call.rpcRequest);
   ```
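To illustrate why the placeholder form is usually preferred: with `{}` placeholders the message is assembled only when the level is enabled, so the arguments are never stringified for a disabled logger. The sketch below is a stand-in written for this note, not the SLF4J API; `MiniLogger` and `CountingRequest` are hypothetical names, and the formatter handles only the simple `{}` case.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LazyLogSketch {
  /** Hypothetical stand-in for a logger with {} placeholders (not SLF4J itself). */
  static final class MiniLogger {
    private final boolean debugEnabled;
    private final StringBuilder sink = new StringBuilder();

    MiniLogger(boolean debugEnabled) {
      this.debugEnabled = debugEnabled;
    }

    void debug(String format, Object... args) {
      if (!debugEnabled) {
        return; // arguments are never converted to strings
      }
      StringBuilder sb = new StringBuilder();
      int from = 0;
      for (Object arg : args) {
        int at = format.indexOf("{}", from);
        if (at < 0) {
          break; // more arguments than placeholders: ignore the rest
        }
        sb.append(format, from, at).append(arg); // append(Object) calls toString()
        from = at + 2;
      }
      sb.append(format.substring(from));
      sink.append(sb).append('\n');
    }

    String output() {
      return sink.toString();
    }
  }

  /** Hypothetical request object that counts how often it is stringified. */
  static final class CountingRequest {
    final AtomicInteger toStringCalls = new AtomicInteger();

    @Override
    public String toString() {
      toStringCalls.incrementAndGet();
      return "getBlockLocations";
    }
  }

  public static void main(String[] args) {
    CountingRequest request = new CountingRequest();
    MiniLogger disabled = new MiniLogger(false);
    disabled.debug("{} sending #{} {}.", "IPC Client", 7, request);
    System.out.println("toString calls while disabled: " + request.toStringCalls.get());

    MiniLogger enabled = new MiniLogger(true);
    enabled.debug("{} sending #{} {}.", "IPC Client", 7, request);
    System.out.print(enabled.output());
  }
}
```

With concatenation, the string is built before the logger is even called, which is why the original code needs the `isDebugEnabled()` guard.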



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java:
##########
@@ -1420,6 +1350,7 @@ public void stop() {
     // wake up all connections
     for (Connection conn : connections.values()) {
       conn.interrupt();
+      conn.rpcRequestThread.interrupt();

Review Comment:
   Why don't we call `conn.close()` first? Is it because `close()` might block in 
`IOUtils.closeStream(ipcStreams)` or `socket.close()`?
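For context on the `interrupt()` call in this hunk: a thread parked in `BlockingQueue.take()` is woken by `Thread.interrupt()` with an `InterruptedException`, without touching any stream or socket. The following is a minimal sketch of that behavior only; the class and thread names are hypothetical, and it does not settle whether `conn.close()` itself could block.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class InterruptWakeSketch {
  /** Returns true if a thread blocked in take() leaves it after interrupt(). */
  static boolean interruptUnblocksTake() {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    CountDownLatch woke = new CountDownLatch(1);

    Thread sender = new Thread(() -> {
      try {
        queue.take(); // blocks: nothing is ever enqueued
      } catch (InterruptedException e) {
        woke.countDown(); // reached only via interrupt()
      }
    }, "hypothetical-sender");
    sender.setDaemon(true);
    sender.start();

    // No stream or socket is involved in waking the thread.
    sender.interrupt();

    try {
      return woke.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("woken by interrupt: " + interruptUnblocksTake());
  }
}
```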



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java:
##########
@@ -455,6 +385,10 @@ private class Connection extends Thread {
         Consumer<Connection> removeMethod) {
       this.remoteId = remoteId;
       this.server = remoteId.getAddress();
+      this.rpcRequestThread = new Thread(new RpcRequestSender(),
+          "IPC Parameter Sending Thread for " + remoteId);
+      this.rpcRequestThread.setDaemon(true);
+      this.rpcRequestThread.start();

Review Comment:
   Maybe we should start `rpcRequestThread` after `setupConnection()`, because 
if we start it here, the socket might not be available yet.
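One way to picture this suggestion is a connection whose constructor only creates the queue and whose sender thread is created once the output stream exists. This is a sketch under assumptions, not the code in the PR: `Connection`, `setupConnection(OutputStream)`, `enqueue`, and `demo` here are simplified, hypothetical stand-ins.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class DeferredSenderSketch {
  /** Hypothetical, simplified connection: the sender starts only after setup. */
  static final class Connection {
    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
    private Thread sender; // guarded by this; null until setupConnection()

    void enqueue(byte[] request) {
      queue.add(request); // safe before setup: requests simply wait in the queue
    }

    synchronized boolean senderStarted() {
      return sender != null;
    }

    synchronized void setupConnection(OutputStream stream) {
      if (sender != null) {
        return; // already set up
      }
      sender = new Thread(() -> drainTo(stream), "hypothetical-sender");
      sender.setDaemon(true);
      sender.start(); // the stream is guaranteed to exist from here on
    }

    private void drainTo(OutputStream target) {
      try {
        while (true) {
          byte[] request = queue.take();
          synchronized (target) {
            target.write(request);
            target.flush();
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // stop() was called
      } catch (IOException e) {
        // A real client would mark the connection as closed here.
      }
    }

    synchronized void stop() {
      if (sender != null) {
        sender.interrupt();
      }
    }
  }

  /** Enqueues before setup, then sets up; returns the number of bytes written. */
  static int demo() {
    Connection conn = new Connection();
    conn.enqueue(new byte[] {1, 2, 3});
    if (conn.senderStarted()) {
      throw new IllegalStateException("sender must not run before setup");
    }
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    conn.setupConnection(sink);
    long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
    try {
      while (sink.size() < 3 && System.nanoTime() < deadline) {
        Thread.sleep(10);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    conn.stop();
    return sink.size();
  }

  public static void main(String[] args) {
    System.out.println("bytes written after setup: " + demo());
  }
}
```

Requests enqueued before setup are not lost in this sketch; they wait in the queue until the sender starts.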





Issue Time Tracking
-------------------

    Worklog Id:     (was: 796822)
    Time Spent: 2h 50m  (was: 2h 40m)

> Interrupting RPC Client calls can lead to thread exhaustion
> -----------------------------------------------------------
>
>                 Key: HADOOP-18324
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18324
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 3.4.0, 2.10.2, 3.3.3
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently the IPC client creates an unbounded number of threads to write the 
> RPC request to the socket. The NameNode uses timeouts on its RPC calls to the 
> JournalNode, and a stuck JN will cause the NN to keep creating threads 
> without limit.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

