[ https://issues.apache.org/jira/browse/HDFS-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727322#comment-17727322 ]
ASF GitHub Bot commented on HDFS-17030:
---------------------------------------

hadoop-yetus commented on PR #5700:
URL: https://github.com/apache/hadoop/pull/5700#issuecomment-1567745460

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 35s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 18m 37s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 19m 45s | | trunk passed |
| +1 :green_heart: | compile | 5m 15s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | compile | 5m 5s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | checkstyle | 1m 19s | | trunk passed |
| +1 :green_heart: | mvnsite | 2m 8s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 49s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javadoc | 2m 19s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | spotbugs | 5m 44s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 11s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 55s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 48s | | the patch passed |
| +1 :green_heart: | compile | 5m 7s | | the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javac | 5m 7s | | the patch passed |
| +1 :green_heart: | compile | 4m 53s | | the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | javac | 4m 53s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 6s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5700/1/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 6 new + 1 unchanged - 0 fixed = 7 total (was 1) |
| +1 :green_heart: | mvnsite | 1m 56s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 27s | | the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javadoc | 2m 0s | | the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| -1 :x: | spotbugs | 2m 30s | [/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5700/1/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client.html) | hadoop-hdfs-project/hadoop-hdfs-client generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 :green_heart: | shadedclient | 22m 30s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 2m 22s | | hadoop-hdfs-client in the patch passed. |
| +1 :green_heart: | unit | 202m 16s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 50s | | The patch does not generate ASF License warnings. |
| | | 337m 44s | | |

| Reason | Tests |
|-------:|:------|
| SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs-client |
| | Write to static field org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.LOG from instance method new org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider(Configuration, URI, Class, HAProxyFactory, Logger) At ObserverReadProxyProvider.java:from instance method new org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider(Configuration, URI, Class, HAProxyFactory, Logger) At ObserverReadProxyProvider.java:[line 258] |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5700/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5700 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 78a46b74b35d 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 4fb97ce7da94b8ac92aba29d155970e3ff012e47 |
| Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5700/1/testReport/ |
| Max. process+thread count | 3015 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5700/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

> Limit wait time for getHAServiceState in ObserverReaderProxy
> ------------------------------------------------------------
>
>   Key: HDFS-17030
>   URL: https://issues.apache.org/jira/browse/HDFS-17030
>   Project: Hadoop HDFS
>   Issue Type: Improvement
>   Components: hdfs
>   Affects Versions: 3.4.0
>   Reporter: Xing Lin
>   Assignee: Xing Lin
>   Priority: Minor
>   Labels: pull-request-available
>
> When namenode HA is enabled and a standby NN is not responsive, we have
> observed that it takes a long time to serve a request, even though we have a
> healthy observer or active NN.
> Basically, when a standby is down, the RPC client would (re)try to connect
> to that standby for _ipc.client.connect.timeout_ *
> _ipc.client.connect.max.retries.on.timeouts_ before giving up. When we take a
> heap dump at a standby, the NN still accepts the socket connection but it
> won't send responses to these RPC requests, and we would time out after
> _ipc.client.rpc-timeout.ms_. This adds significant latency. For clusters
> at LinkedIn, we set _ipc.client.rpc-timeout.ms_ to 120 seconds, and thus a
> request takes more than 2 minutes to complete when we take a heap
> dump at a standby. This has been causing user job failures.
> We could set _ipc.client.rpc-timeout.ms_ to a smaller value when sending
> getHAServiceState requests in ObserverReaderProxy (for user RPC requests, we
> would still use the original value from the config). However, that would
> double the number of socket connections between clients and the NN.
> The proposal is to add a timeout on getHAServiceState() calls in
> ObserverReaderProxy, so that we wait at most that long for an NN to
> respond with its HA state. Once the timeout passes, we move on to the next
> NN.
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
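
The proposal in the issue description (wait a bounded time for each NN to report its HA state, then move on to the next NN) could look roughly like the sketch below. This is only an illustration and not the code from PR #5700: the class `BoundedStateProbe`, the `State` enum, and the `probe` and `firstObserver` helpers are hypothetical names, and each NN is modelled as a plain `Callable` rather than a real RPC proxy. The bounded wait uses the standard `Future.get(timeout, unit)` call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical illustration only; not the implementation from the PR. */
public class BoundedStateProbe {

  /** Stand-in for an NN's HA state. UNKNOWN is a placeholder used by this sketch. */
  enum State { ACTIVE, STANDBY, OBSERVER, UNKNOWN }

  /** Asks one NN for its state, waiting at most timeoutMs milliseconds. */
  static State probe(ExecutorService pool, Callable<State> call, long timeoutMs) {
    Future<State> future = pool.submit(call);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the stuck call and give up on this NN
      return State.UNKNOWN;
    } catch (ExecutionException e) {
      return State.UNKNOWN; // the call itself failed
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for callers
      future.cancel(true);
      return State.UNKNOWN;
    }
  }

  /** Returns the index of the first NN that reports OBSERVER in time, or -1. */
  static int firstObserver(List<Callable<State>> nns, long timeoutMs) {
    ExecutorService pool = Executors.newCachedThreadPool();
    try {
      for (int i = 0; i < nns.size(); i++) {
        if (probe(pool, nns.get(i), timeoutMs) == State.OBSERVER) {
          return i;
        }
      }
      return -1;
    } finally {
      pool.shutdownNow(); // interrupt any probe that is still hanging
    }
  }

  public static void main(String[] args) {
    // Simulates a standby that accepts the request but never answers in time.
    Callable<State> hung = () -> {
      Thread.sleep(60_000);
      return State.STANDBY;
    };
    Callable<State> observer = () -> State.OBSERVER;

    int idx = firstObserver(Arrays.asList(hung, observer), 200);
    System.out.println("picked NN index " + idx);
  }
}
```

With a short probe timeout, the hung NN costs the caller roughly that timeout rather than the full _ipc.client.rpc-timeout.ms_, and the probe reuses whatever connection the `Callable` wraps instead of opening a second one with a different RPC timeout.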