[ https://issues.apache.org/jira/browse/HADOOP-19061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813827#comment-17813827 ]

ASF GitHub Bot commented on HADOOP-19061:
-----------------------------------------

hadoop-yetus commented on PR #6519:
URL: https://github.com/apache/hadoop/pull/6519#issuecomment-1924699634

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 50s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  48m 35s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  18m  9s |  |  trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |  16m 32s |  |  trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 13s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 36s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 13s |  |  trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 49s |  |  trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m 31s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m 42s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 53s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 50s |  |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |  19m 50s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 21s |  |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |  19m 21s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 16s |  |  hadoop-common-project/hadoop-common: The patch generated 0 new + 65 unchanged - 2 fixed = 65 total (was 67)  |
   | +1 :green_heart: |  mvnsite  |   1m 37s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  6s |  |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 49s |  |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m 41s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  40m  3s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m  6s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 58s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 242m  6s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6519/4/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6519 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 74c0a178fa52 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 805cdfa94572dac8f91ab11fb5684f992a5baf5a |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6519/4/testReport/ |
   | Max. process+thread count | 1263 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6519/4/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Capture exception in rpcRequestSender.start() in IPC.Connection.run()
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-19061
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19061
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 3.5.0
>            Reporter: Xing Lin
>            Assignee: Xing Lin
>            Priority: Major
>              Labels: pull-request-available
>
> rpcRequestThread.start() can fail with an OutOfMemoryError (OOM). When it 
> does, the Connection thread crashes immediately without removing itself from 
> the connections pool. Every subsequent getConnection(remoteid) call then 
> returns this bad connection object, and all RPC requests sent through it 
> hang, because neither the Connection thread nor the 
> Connection.rpcRequestSender thread is running.
> In this PR, we move rpcRequestThread.start() inside the try{}-catch{} block 
> so that an OOM thrown by rpcRequestThread.start() is caught and the 
> connection is cleaned up properly.
> {code:java}
> // IPC.Connection.run()
>     @Override
>     public void run() {
>       // Don't start the ipc parameter sending thread until we start this
>       // thread, because the shutdown logic only gets triggered if this
>       // thread is started.
>       rpcRequestThread.start();
>       if (LOG.isDebugEnabled())
>         LOG.debug(getName() + ": starting, having connections "
>             + connections.size());
>       try {
>         while (waitForWork()) {//wait here for work - read or close connection
>           receiveRpcResponse();
>         }
>       } catch (Throwable t) {
>         // This truly is unexpected, since we catch IOException in receiveResponse
>         // -- this is only to be really sure that we don't leave a client hanging
>         // forever.
>         LOG.warn("Unexpected error reading responses on connection " + this, t);
>         markClosed(new IOException("Error reading responses", t));
>       }{code}
> Because no rpcRequestSender thread is consuming the rpcRequestQueue, every 
> attempt to enqueue an RPC request on this connection blocks, and the caller 
> spins forever in the following while loop in sendRpcRequest().
> {code:java}
> while (!shouldCloseConnection.get()) {
>   if (rpcRequestQueue.offer(Pair.of(call, buf), 1, TimeUnit.SECONDS)) {
>     break;
>   }
> }{code}
> The OOM exception thrown when starting the rpcRequestSender thread:
> {code:java}
> Exception in thread "IPC Client (1664093259) connection to nn01.grid.linkedin.com/IP-Address:portNum from kafkaetl" java.lang.OutOfMemoryError: unable to create new native thread
>       at java.lang.Thread.start0(Native Method)
>       at java.lang.Thread.start(Thread.java:717)
>       at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1034)
> {code}
> Multiple threads are blocked in queue.offer(), and we did not find any "IPC 
> Client" or "IPC Parameter Sending Thread" threads in the thread dump.
> {code:java}
> Thread 2156123: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=215 (Compiled frame)
>  - java.util.concurrent.SynchronousQueue$TransferQueue.awaitFulfill(java.util.concurrent.SynchronousQueue$TransferQueue$QNode, java.lang.Object, boolean, long) @bci=156, line=764 (Compiled frame)
>  - java.util.concurrent.SynchronousQueue$TransferQueue.transfer(java.lang.Object, boolean, long) @bci=148, line=695 (Compiled frame)
>  - java.util.concurrent.SynchronousQueue.offer(java.lang.Object, long, java.util.concurrent.TimeUnit) @bci=24, line=895 (Compiled frame)
>  - org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(org.apache.hadoop.ipc.Client$Call) @bci=88, line=1134 (Compiled frame)
>  - org.apache.hadoop.ipc.Client.call(org.apache.hadoop.ipc.RPC$RpcKind, org.apache.hadoop.io.Writable, org.apache.hadoop.ipc.Client$ConnectionId, int, java.util.concurrent.atomic.AtomicBoolean, org.apache.hadoop.ipc.AlignmentContext) @bci=36, line=1402 (Interpreted frame)
>  - org.apache.hadoop.ipc.Client.call(org.apache.hadoop.ipc.RPC$RpcKind, org.apache.hadoop.io.Writable, org.apache.hadoop.ipc.Client$ConnectionId, java.util.concurrent.atomic.AtomicBoolean, org.apache.hadoop.ipc.AlignmentContext) @bci=9, line=1349 (Compiled frame)
>  - org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) @bci=248, line=230 (Compiled frame)
>  - org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) @bci=4, line=118 (Compiled frame)
>  - com.sun.proxy.$Proxy11.getBlockLocations({code}
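
The hang described in the issue above can be reproduced in isolation. What follows is only a rough sketch, not Hadoop code: it assumes, based on the thread dump, that the request queue is a java.util.concurrent.SynchronousQueue, and the names OfferLoopSketch, offerWithRetries and maxAttempts are hypothetical. The maxAttempts bound is added purely so the demo terminates; the loop quoted from sendRpcRequest() has no such bound, which is why callers spin forever when no sender thread exists.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class OfferLoopSketch {

  /**
   * Same shape as the quoted loop: keep offering until a consumer takes the
   * item or the close flag is set. maxAttempts is a hypothetical bound added
   * for this demo only. Returns true if the item was handed to a consumer.
   */
  static boolean offerWithRetries(SynchronousQueue<String> queue,
      AtomicBoolean shouldCloseConnection, int maxAttempts) {
    try {
      for (int attempt = 0;
           attempt < maxAttempts && !shouldCloseConnection.get();
           attempt++) {
        // A SynchronousQueue has no capacity: a timed offer only succeeds
        // if another thread takes the element before the timeout expires.
        if (queue.offer("rpc-request", 50, TimeUnit.MILLISECONDS)) {
          return true;
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return false;
  }

  public static void main(String[] args) throws InterruptedException {
    SynchronousQueue<String> queue = new SynchronousQueue<>();
    AtomicBoolean shouldClose = new AtomicBoolean(false);

    // No consumer thread, as in the failure scenario: every offer times out.
    System.out.println("handed off without consumer: "
        + offerWithRetries(queue, shouldClose, 3));

    // With a consumer thread standing in for the sender, the hand-off works.
    Thread consumer = new Thread(() -> {
      try {
        queue.take();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.start();
    System.out.println("handed off with consumer: "
        + offerWithRetries(queue, shouldClose, 100));
    consumer.join();
  }
}
```

In the sketch the only ways out of the loop are a consumer taking the item, the close flag being set, or the demo-only attempt bound. In the reported failure neither of the first two ever happens, since the thread that would consume the queue was never started and nothing marked the connection closed.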

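
The change described in the issue (starting the sender thread inside the try block so that a failure is caught and cleanup runs) might look roughly like the sketch below. This is a simplified illustration under stated assumptions, not the actual patch: StartInsideTrySketch, SketchConnection, CONNECTIONS and failingSender are hypothetical names, the OutOfMemoryError is simulated by overriding Thread.start(), and the use of a finally block to remove the connection from the pool is an assumption about what "proper cleaning" involves.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

public class StartInsideTrySketch {

  // Hypothetical stand-in for the client's connections pool.
  static final Map<String, SketchConnection> CONNECTIONS =
      new ConcurrentHashMap<>();

  static class SketchConnection implements Runnable {
    final String remoteId;
    final Thread senderThread;
    final AtomicBoolean shouldCloseConnection = new AtomicBoolean(false);
    volatile Throwable closeCause;

    SketchConnection(String remoteId, Thread senderThread) {
      this.remoteId = remoteId;
      this.senderThread = senderThread;
    }

    @Override
    public void run() {
      try {
        // Inside the try block, a failure to start the sender (for example
        // an OutOfMemoryError) reaches the catch clause below instead of
        // killing this thread before any cleanup can happen.
        senderThread.start();
        // The response-reading loop is omitted in this sketch.
      } catch (Throwable t) {
        markClosed(t);
      } finally {
        // Assumed cleanup: never leave a dead connection in the pool.
        CONNECTIONS.remove(remoteId, this);
      }
    }

    void markClosed(Throwable cause) {
      if (shouldCloseConnection.compareAndSet(false, true)) {
        closeCause = cause;
      }
    }
  }

  /** Simulates a sender thread whose start() fails with an OOM. */
  static Thread failingSender() {
    return new Thread() {
      @Override
      public void start() {
        throw new OutOfMemoryError("simulated: unable to create new native thread");
      }
    };
  }

  public static void main(String[] args) {
    SketchConnection conn =
        new SketchConnection("example-remote-id", failingSender());
    CONNECTIONS.put(conn.remoteId, conn);
    conn.run(); // run on the calling thread to keep the demo deterministic
    System.out.println("still pooled: " + CONNECTIONS.containsKey(conn.remoteId));
    System.out.println("marked closed: " + conn.shouldCloseConnection.get());
    System.out.println("close cause: " + conn.closeCause);
  }
}
```

With the close flag set and the pool entry removed, callers looping on the close flag would exit, and later lookups would not be handed the broken connection object.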


--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
