[ https://issues.apache.org/jira/browse/HADOOP-19061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813827#comment-17813827 ]
ASF GitHub Bot commented on HADOOP-19061:
-----------------------------------------

hadoop-yetus commented on PR #6519:
URL: https://github.com/apache/hadoop/pull/6519#issuecomment-1924699634

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 50s |  | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s |  | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s |  | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s |  | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s |  | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 48m 35s |  | trunk passed |
| +1 :green_heart: | compile | 18m 9s |  | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 16m 32s |  | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 13s |  | trunk passed |
| +1 :green_heart: | mvnsite | 1m 36s |  | trunk passed |
| +1 :green_heart: | javadoc | 1m 13s |  | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 49s |  | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 31s |  | trunk passed |
| +1 :green_heart: | shadedclient | 39m 42s |  | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 53s |  | the patch passed |
| +1 :green_heart: | compile | 19m 50s |  | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 19m 50s |  | the patch passed |
| +1 :green_heart: | compile | 19m 21s |  | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 19m 21s |  | the patch passed |
| +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 16s |  | hadoop-common-project/hadoop-common: The patch generated 0 new + 65 unchanged - 2 fixed = 65 total (was 67) |
| +1 :green_heart: | mvnsite | 1m 37s |  | the patch passed |
| +1 :green_heart: | javadoc | 1m 6s |  | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 49s |  | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 41s |  | the patch passed |
| +1 :green_heart: | shadedclient | 40m 3s |  | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 19m 6s |  | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 58s |  | The patch does not generate ASF License warnings. |
| | | 242m 6s |  |  |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6519/4/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6519 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 74c0a178fa52 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 805cdfa94572dac8f91ab11fb5684f992a5baf5a |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6519/4/testReport/ |
| Max. process+thread count | 1263 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6519/4/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

> Capture exception in rpcRequestSender.start() in IPC.Connection.run()
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-19061
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19061
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 3.5.0
>            Reporter: Xing Lin
>            Assignee: Xing Lin
>            Priority: Major
>              Labels: pull-request-available
>
> rpcRequestThread.start() can fail due to OOM. This immediately crashes the
> Connection thread without removing the connection from the connections pool.
> Every subsequent getConnection(remoteid) call then returns this bad
> connection object, and all RPC requests on it hang: neither the Connection
> thread nor the Connection.rpcRequestSender thread is running, because of
> the OOM.
> In this PR, we move rpcRequestThread.start() into the try{}-catch{} block,
> so an OOM thrown from rpcRequestThread.start() is captured and proper
> cleanup follows.
> {code:java}
> // IPC.Connection.run()
> @Override
> public void run() {
>   // Don't start the ipc parameter sending thread until we start this
>   // thread, because the shutdown logic only gets triggered if this
>   // thread is started.
>   rpcRequestThread.start();
>   if (LOG.isDebugEnabled())
>     LOG.debug(getName() + ": starting, having connections "
>         + connections.size());
>   try {
>     while (waitForWork()) { // wait here for work - read or close connection
>       receiveRpcResponse();
>     }
>   } catch (Throwable t) {
>     // This truly is unexpected, since we catch IOException in receiveResponse
>     // -- this is only to be really sure that we don't leave a client hanging
>     // forever.
>     LOG.warn("Unexpected error reading responses on connection " + this, t);
>     markClosed(new IOException("Error reading responses", t));
>   }
> }
> {code}
> Because no rpcRequestSender thread is consuming rpcRequestQueue, all RPC
> request enqueue operations for this connection block forever in this while
> loop inside sendRpcRequest():
> {code:java}
> while (!shouldCloseConnection.get()) {
>   if (rpcRequestQueue.offer(Pair.of(call, buf), 1, TimeUnit.SECONDS)) {
>     break;
>   }
> }
> {code}
> The OOM exception thrown while starting the rpcRequestSender thread:
> {code:java}
> Exception in thread "IPC Client (1664093259) connection to
> nn01.grid.linkedin.com/IP-Address:portNum from kafkaetl"
> java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method)
>     at java.lang.Thread.start(Thread.java:717)
>     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1034)
> {code}
> Multiple threads are blocked in queue.offer(), and no "IPC Client" or
> "IPC Parameter Sending Thread" threads are found in the thread dump.
> {code:java}
> Thread 2156123: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=215 (Compiled frame)
>  - java.util.concurrent.SynchronousQueue$TransferQueue.awaitFulfill(java.util.concurrent.SynchronousQueue$TransferQueue$QNode, java.lang.Object, boolean, long) @bci=156, line=764 (Compiled frame)
>  - java.util.concurrent.SynchronousQueue$TransferQueue.transfer(java.lang.Object, boolean, long) @bci=148, line=695 (Compiled frame)
>  - java.util.concurrent.SynchronousQueue.offer(java.lang.Object, long, java.util.concurrent.TimeUnit) @bci=24, line=895 (Compiled frame)
>  - org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(org.apache.hadoop.ipc.Client$Call) @bci=88, line=1134 (Compiled frame)
>  - org.apache.hadoop.ipc.Client.call(org.apache.hadoop.ipc.RPC$RpcKind, org.apache.hadoop.io.Writable, org.apache.hadoop.ipc.Client$ConnectionId, int, java.util.concurrent.atomic.AtomicBoolean, org.apache.hadoop.ipc.AlignmentContext) @bci=36, line=1402 (Interpreted frame)
>  - org.apache.hadoop.ipc.Client.call(org.apache.hadoop.ipc.RPC$RpcKind, org.apache.hadoop.io.Writable, org.apache.hadoop.ipc.Client$ConnectionId, java.util.concurrent.atomic.AtomicBoolean, org.apache.hadoop.ipc.AlignmentContext) @bci=9, line=1349 (Compiled frame)
>  - org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) @bci=248, line=230 (Compiled frame)
>  - org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) @bci=4, line=118 (Compiled frame)
>  - com.sun.proxy.$Proxy11.getBlockLocations(
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
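The failure mode and the fix described above can be sketched outside of Hadoop. The class below (`ConnectionSketch`, with illustrative field and method names, not the actual `Client` internals) models a connection whose sender thread fails to start: with `start()` inside the `try`/`catch (Throwable)` block, the connection marks itself closed, so the `offer()` loop in `sendRpcRequest()` exits instead of spinning forever. This is a minimal sketch under those assumptions, not the real patch.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the HADOOP-19061 fix: starting the request-sender thread inside
// the same try/catch (Throwable) that guards the receive loop, so a failed
// Thread.start() (e.g. OutOfMemoryError) closes the connection instead of
// leaving a half-initialized connection object behind.
public class ConnectionSketch {
  private final AtomicBoolean shouldCloseConnection = new AtomicBoolean(false);
  private final SynchronousQueue<String> rpcRequestQueue = new SynchronousQueue<>();
  private final Thread rpcRequestThread;

  ConnectionSketch(Thread sender) {
    this.rpcRequestThread = sender;
  }

  // Mirrors IPC.Connection.run() after the fix: start() moved into the try
  // block, so the Throwable handler also covers thread-creation failures.
  void run() {
    try {
      rpcRequestThread.start();   // may throw OutOfMemoryError
      // ... receiveRpcResponse() loop would go here ...
    } catch (Throwable t) {
      markClosed();               // cleanup now also covers start() failures
    }
  }

  void markClosed() {
    shouldCloseConnection.set(true);
  }

  // Mirrors the offer() loop in sendRpcRequest(): without the fix,
  // shouldCloseConnection stays false and this loop never terminates,
  // because no sender thread drains the SynchronousQueue.
  boolean sendRpcRequest(String call) throws InterruptedException {
    while (!shouldCloseConnection.get()) {
      if (rpcRequestQueue.offer(call, 10, TimeUnit.MILLISECONDS)) {
        return true;
      }
    }
    return false;                 // connection closed: caller can fail fast
  }

  public static void main(String[] args) throws Exception {
    // Simulate a sender thread whose start() fails, as under native-thread OOM.
    Thread failingSender = new Thread(() -> { }) {
      @Override
      public void start() {
        throw new OutOfMemoryError("unable to create new native thread");
      }
    };
    ConnectionSketch conn = new ConnectionSketch(failingSender);
    conn.run();                   // with the fix, this marks the connection closed
    System.out.println("sent=" + conn.sendRpcRequest("getBlockLocations"));
  }
}
```

With the pre-fix placement of `start()` outside the `try` block, the `OutOfMemoryError` would propagate out of `run()` with `shouldCloseConnection` still false, and `sendRpcRequest` would loop indefinitely, matching the `SynchronousQueue.offer` frames in the thread dump above.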