[ 
https://issues.apache.org/jira/browse/HADOOP-18024?focusedWorklogId=690744&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690744
 ]

ASF GitHub Bot logged work on HADOOP-18024:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Dec/21 00:39
            Start Date: 06/Dec/21 00:39
    Worklog Time Spent: 10m 
      Work Description: hadoop-yetus commented on pull request #3719:
URL: https://github.com/apache/hadoop/pull/3719#issuecomment-986338899


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 44s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 27s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 57s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  19m  2s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  9s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 40s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 13s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 44s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 30s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 10s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 58s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  21m 17s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  21m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 11s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  19m 11s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m  8s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3719/5/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) |  hadoop-common-project/hadoop-common: The patch generated 1 new + 227 unchanged - 0 fixed = 228 total (was 227)  |
   | +1 :green_heart: |  mvnsite  |   1m 39s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 41s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 36s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 21s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m 30s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 59s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 195m 31s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3719/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3719 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 8fc0cd86018e 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / f755f5748fb8f60b84e1b3f3b0277f05b433b54a |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3719/5/testReport/ |
   | Max. process+thread count | 1249 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3719/5/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 690744)
    Time Spent: 2.5h  (was: 2h 20m)

> SocketChannel is not closed when IOException happens in 
> Server$Listener.doAccept
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-18024
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18024
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 3.2.2
>            Reporter: Haoze Wu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This is a follow-up of HADOOP-17552.
> When the symptom described in HADOOP-17552 happens, the client may time out 
> after 2 minutes, according to the default RPC timeout configuration specified 
> in HADOOP-17552. Until that timeout fires, the client simply waits and has no 
> indication that the problem has occurred.
> However, we recently found that the client does not actually need to waste 
> these 2 minutes, and the server’s availability can also be improved. If the 
> IOException happens at line 1402, 1403, or 1404, we can simply close the 
> problematic `SocketChannel` and continue to accept new socket connections. 
> The client side then becomes aware of the closed socket immediately, instead 
> of waiting 2 minutes.
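> (For context on where the 2-minute figure comes from, here is a minimal, 
> purely illustrative sketch; it assumes the `ipc.client.rpc-timeout.ms` 
> default of 120000 ms introduced by HADOOP-17552.)
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> 
> public class RpcTimeoutDemo {
>   public static void main(String[] args) {
>     Configuration conf = new Configuration();
>     // Assumed default from HADOOP-17552: 120000 ms (2 minutes). Until this
>     // timeout fires, a client whose connection was accepted but never handed
>     // to a Reader simply blocks waiting for a response.
>     int timeoutMs = conf.getInt("ipc.client.rpc-timeout.ms", 120000);
>     System.out.println("RPC timeout: " + timeoutMs + " ms");
>   }
> }
> {code}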
> The old implementation:
> {code:java}
> //hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
>    public void run() {
>       while (running) {
>         // ...
>         try {
>           // ...
>           while (iter.hasNext()) {
>             // ...
>             try {
>               if (key.isValid()) {
>                 if (key.isAcceptable())
>                   doAccept(key);                              // line 1348
>               }
>             } catch (IOException e) {                         // line 1350
>             }
>             // ...
>           }
>         } catch (OutOfMemoryError e) {
>           // ...
>         } catch (Exception e) {
>           // ...
>         }
>       }
>     } {code}
> {code:java}
> //hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
>     void doAccept(SelectionKey key) throws InterruptedException, IOException, 
>         OutOfMemoryError {
>       ServerSocketChannel server = (ServerSocketChannel) key.channel();
>       SocketChannel channel;
>       while ((channel = server.accept()) != null) {           // line 1400
>         channel.configureBlocking(false);                     // line 1402
>         channel.socket().setTcpNoDelay(tcpNoDelay);           // line 1403
>         channel.socket().setKeepAlive(true);                  // line 1404
>         Reader reader = getReader();
>         Connection c = connectionManager.register(channel,
>             this.listenPort, this.isOnAuxiliaryPort);
>         // If the connectionManager can't take it, close the connection.
>         if (c == null) {
>           if (channel.isOpen()) {
>             IOUtils.cleanup(null, channel);
>           }
>           connectionManager.droppedConnections.getAndIncrement();
>           continue;
>         }
>         key.attach(c);  // so closeCurrentConnection can get the object
>         reader.addConnection(c);
>       }
>     } {code}
>  
> We propose the following implementation, which we believe is better:
> {code:java}
>     void doAccept(SelectionKey key) throws InterruptedException, IOException, 
>         OutOfMemoryError {
>       ServerSocketChannel server = (ServerSocketChannel) key.channel();
>       SocketChannel channel;
>       while ((channel = server.accept()) != null) {           // line 1400
>         try {
>           channel.configureBlocking(false);                   // line 1402
>           channel.socket().setTcpNoDelay(tcpNoDelay);         // line 1403
>           channel.socket().setKeepAlive(true);                // line 1404
>         } catch (IOException e) {
>           LOG.warn(...);
>           try {
>             channel.socket().close();
>             channel.close();
>           } catch (IOException ignored) { }
>           continue;
>         }
>         // ...
>       }
>     }{code}
> The advantages include:
>  # {*}In the old implementation{*}, the `ServerSocketChannel` was effectively 
> abandoned for the rest of that accept loop because of a single exception on a 
> single `SocketChannel`: the exception handler sits at line 1350. {*}In the 
> new implementation{*}, a try-catch handles the exception at line 1402, 1403, 
> or 1404, so the `ServerSocketChannel` can continue to accept new connections 
> without having to wait for the next pass through line 1348 in the while loop 
> of the run method.
>  # {*}In the old implementation{*}, the client (the other endpoint of this 
> `SocketChannel`) is not aware of the issue, because the `SocketChannel` is 
> accepted but never closed. {*}In the new implementation{*}, we close the 
> `SocketChannel` when the IOException happens, so the client immediately gets 
> EOF from the socket and can then, at its own discretion, either retry or 
> throw an exception (see the sketch below).
>  
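> As an illustration of the second point, here is a minimal, hypothetical 
> standalone sketch (plain JDK code, not part of the patch): once the server 
> closes the accepted channel, the client's read returns EOF right away instead 
> of blocking until an RPC timeout fires.
> {code:java}
> import java.io.IOException;
> import java.net.InetSocketAddress;
> import java.net.Socket;
> import java.nio.channels.ServerSocketChannel;
> import java.nio.channels.SocketChannel;
> 
> public class EofDemo {
>   public static void main(String[] args) throws IOException, InterruptedException {
>     ServerSocketChannel server =
>         ServerSocketChannel.open().bind(new InetSocketAddress("127.0.0.1", 0));
>     int port = ((InetSocketAddress) server.getLocalAddress()).getPort();
> 
>     Thread acceptor = new Thread(() -> {
>       try {
>         SocketChannel channel = server.accept();
>         // Simulate the setup-failure path of the patch: close the problematic
>         // channel immediately instead of leaving it half-open.
>         channel.close();
>       } catch (IOException ignored) {
>       }
>     });
>     acceptor.start();
> 
>     try (Socket client = new Socket("127.0.0.1", port)) {
>       // Returns -1 (EOF) as soon as the server closes the channel; the client
>       // does not have to sit out a 2-minute RPC timeout.
>       System.out.println("read() returned: " + client.getInputStream().read());
>     }
>     acceptor.join();
>     server.close();
>   }
> }
> {code}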
> We have confirmed on our local machine that this patch works as expected.
>  
> This code pattern has also been adopted by other communities. For example, in Kafka 
> [https://github.com/apache/kafka/blob/23e9818e625976c22fe6d4297a5ab76b01f92ef6/core/src/main/scala/kafka/network/SocketServer.scala#L714-L740]:
> {code:java}
>   /**
>    * Accept a new connection
>    */
>   private def accept(key: SelectionKey): Option[SocketChannel] = {
>     val serverSocketChannel = key.channel().asInstanceOf[ServerSocketChannel]
>     val socketChannel = serverSocketChannel.accept()
>     try {
>       connectionQuotas.inc(endPoint.listenerName, 
> socketChannel.socket.getInetAddress, blockedPercentMeter)
>       configureAcceptedSocketChannel(socketChannel)
>       Some(socketChannel)
>     } catch {
>       case e: TooManyConnectionsException =>
>         info(...)
>         close(endPoint.listenerName, socketChannel)
>         None
>       case e: ConnectionThrottledException =>
>         // ...
>         None
>       case e: IOException =>
>         error(...)
>         close(endPoint.listenerName, socketChannel)
>         None
>     }
>   }
>   /**
>    * Close `channel` and decrement the connection count.
>    */
>   def close(listenerName: ListenerName, channel: SocketChannel): Unit = {
>     if (channel != null) {
>       // ...
>       closeSocket(channel)
>     }
>   }
>   protected def closeSocket(channel: SocketChannel): Unit = {
>     CoreUtils.swallow(channel.socket().close(), this, Level.ERROR)
>     CoreUtils.swallow(channel.close(), this, Level.ERROR)
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
