[ 
https://issues.apache.org/jira/browse/HDFS-14857?focusedWorklogId=716833&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-716833
 ]

ASF GitHub Bot logged work on HDFS-14857:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Jan/22 02:29
            Start Date: 28/Jan/22 02:29
    Worklog Time Spent: 10m 
      Work Description: hadoop-yetus commented on pull request #1480:
URL: https://github.com/apache/hadoop/pull/1480#issuecomment-1023827151


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m 10s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 42s |  |  Maven dependency ordering for branch  |
   | -1 :x: |  mvninstall  |  22m  7s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1480/1/artifact/out/branch-mvninstall-root.txt) |  root in trunk failed.  |
   | +1 :green_heart: |  compile  |   6m 14s |  |  trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   5m 37s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 17s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 27s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 46s |  |  trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   2m  9s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   5m 40s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 52s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 51s |  |  the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   5m 51s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 43s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | -1 :x: |  javac  |   5m 43s | [/results-compile-javac-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1480/1/artifact/out/results-compile-javac-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt) |  hadoop-hdfs-project-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 generated 1 new + 629 unchanged - 1 fixed = 630 total (was 630)  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m  4s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1480/1/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) |  hadoop-hdfs-project: The patch generated 7 new + 13 unchanged - 0 fixed = 20 total (was 13)  |
   | +1 :green_heart: |  mvnsite  |   2m  9s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   2m  1s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | -1 :x: |  spotbugs  |   2m 35s | [/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1480/1/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client.html) |  hadoop-hdfs-project/hadoop-hdfs-client generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  shadedclient  |  23m  7s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 22s |  |  hadoop-hdfs-client in the patch passed.  |
   | -1 :x: |  unit  | 417m 43s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1480/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 43s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 552m 52s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs-client |
   |  |  Inconsistent synchronization of org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.currentProxyIndex; locked 60% of time. Unsynchronized access at ConfiguredFailoverProxyProvider.java:[line 68] |
   | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestObserverReadProxyProvider |
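   
   For context, the SpotBugs finding above is the classic inconsistent-synchronization pattern: the field is written while holding a lock in one method but read without the lock elsewhere, so only a fraction of accesses are guarded ("locked 60% of time"). The sketch below is a hypothetical illustration of that pattern only; the class, field, and method names are invented and are not the actual ConfiguredFailoverProxyProvider source:
   
   ```java
   // Hypothetical illustration of the inconsistent-synchronization pattern
   // SpotBugs flags above; all names are invented, not Hadoop source.
   public class FailoverState {
       private int currentProxyIndex;  // the inconsistently guarded field
   
       public synchronized void failover(int proxyCount) {
           // The write holds the monitor lock...
           currentProxyIndex = (currentProxyIndex + 1) % proxyCount;
       }
   
       public int getCurrentIndex() {
           // ...but this read takes no lock, so it has no visibility
           // guarantee and races with failover(); that mix of guarded and
           // unguarded access is what "inconsistent synchronization" means.
           return currentProxyIndex;
       }
   
       // Typical fixes: synchronize every access to the field, or declare it
       // volatile when plain visibility (not compound atomicity) suffices.
       public synchronized int getCurrentIndexSafely() {
           return currentProxyIndex;
       }
   }
   ```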
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1480/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1480 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 4838bf31a4bd 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 97b5632841ca2e0cff5ac02851d667e9b97b6c76 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1480/1/testReport/ |
   | Max. process+thread count | 2948 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-1480/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




Issue Time Tracking
-------------------

            Worklog Id:     (was: 716833)
    Remaining Estimate: 0h
            Time Spent: 10m

> FS operations fail in HA mode: DataNode fails to connect to NameNode
> --------------------------------------------------------------------
>
>                 Key: HDFS-14857
>                 URL: https://issues.apache.org/jira/browse/HDFS-14857
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.1.0
>            Reporter: Jeff Saremi
>            Assignee: Jeff Saremi
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In an HA configuration, if the NameNodes are restarted and assigned new IP 
> addresses, any client FS operation such as copyFromLocal will fail with a 
> message like the following:
> {{2019-09-12 18:47:30,544 WARN hdfs.DataStreamer: DataStreamer Exception 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /tmp/init.sh._COPYING_ could only be written to 0 of the 1 minReplication 
> nodes. There are 2 datanode(s) running and 2 node(s) are excluded in this 
> operation.        at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2211)
>  ...}}
>  
> Looking at DataNode's stderr shows the following:
>  * The heartbeat service detects the IP change and recovers (almost)
>  * At this stage, an *hdfs dfsadmin -report* reports all datanodes correctly
>  * Once the write begins, the following exception shows up in the datanode 
> log: *no route to host*
> {{2019-09-12 01:35:11,251 WARN datanode.DataNode: IOException in 
> offerService java.io.EOFException: End of File Exception between local host 
> is: "storage-0-0.storage-0-svc.test.svc.cluster.local/10.244.0.211"; 
> destination host is: "nmnode-0-0.nmnode-0-svc.test.svc.cluster.local":9000; : 
> java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at 
> org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831) at 
> org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:789) at 
> org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1549) at 
> org.apache.hadoop.ipc.Client.call(Client.java:1491) at 
> org.apache.hadoop.ipc.Client.call(Client.java:1388) at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>  at com.sun.proxy.$Proxy17.sendHeartbeat(Unknown Source) at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:166)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:516)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:646)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:847)
>  at java.lang.Thread.run(Thread.java:748)Caused by: java.io.EOFException at 
> java.io.DataInputStream.readInt(DataInputStream.java:392) at 
> org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1850) at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1183) 
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1079)}}
> {{2019-09-12 01:41:12,273 WARN ipc.Client: Address change detected. Old: 
> nmnode-0-1.nmnode-0-svc.test.svc.cluster.local/10.244.0.217:9000 New: 
> nmnode-0-1.nmnode-0-svc.test.svc.cluster.local/10.244.0.220:9000}}{{...}}
>  
> {{2019-09-12 01:41:12,482 INFO datanode.DataNode: Block pool 
> BP-930210564-10.244.0.216-1568249865477 (Datanode Uuid 
> 7673ef28-957a-439f-a721-d47a4a6adb7b) service to 
> nmnode-0-1.nmnode-0-svc.test.svc.cluster.local/10.244.0.217:9000 beginning 
> handshake with NN}}
> {{2019-09-12 01:41:12,534 INFO datanode.DataNode: Block pool Block pool 
> BP-930210564-10.244.0.216-1568249865477 (Datanode Uuid 
> 7673ef28-957a-439f-a721-d47a4a6adb7b) service to 
> nmnode-0-1.nmnode-0-svc.test.svc.cluster.local/10.244.0.217:9000 successfully 
> registered with NN}}
>  
> *NOTE*: See how, when the '{{Address change detected}}' message shows up, 
> the printout correctly shows both the old and the new address 
> ({{10.244.0.220}}). However, when the registration with the NN completes, 
> the old IP address ({{10.244.0.217}}) is still printed, showing how cached 
> copies of the IP addresses linger on. (A minimal sketch of this 
> stale-address behavior follows this quoted description.)
>  
> And the following is where the actual error happens, preventing any writes 
> to FS:
>  
> {{2019-09-12 18:45:29,843 INFO retry.RetryInvocationHandler: 
> java.net.NoRouteToHostException: No Route to Host from 
> storage-0-0.storage-0-svc.test.svc.cluster.local/10.244.0.211 to 
> nmnode-0-1.nmnode-0-svc:50200 failed on socket timeout exception: 
> java.net.NoRouteToHostException: No route to host; For more details see: 
> http://wiki.apache.org/hadoop/NoRouteToHost, while invoking 
> InMemoryAliasMapProtocolClientSideTranslatorPB.read over 
> nmnode-0-1.nmnode-0-svc/10.244.0.217:50200 after 3 failover attempts. Trying 
> to failover after sleeping for 4452ms.}}
>  
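
As referenced in the NOTE above, the lingering old IP is consistent with standard JDK behavior: a java.net.InetSocketAddress resolves its hostname once at construction and never re-resolves, so any component that caches the object keeps dialing the old address. Below is a minimal sketch of just that JDK behavior; the class name is invented and the hostname merely echoes the one in the report for illustration, so this is not the HDFS client code itself:

```java
import java.net.InetSocketAddress;

// Minimal sketch: InetSocketAddress resolves DNS once at construction,
// so a cached instance pins the NameNode's old IP after a restart.
public class StaleAddressSketch {
    public static void main(String[] args) {
        InetSocketAddress cached =
            new InetSocketAddress("nmnode-0-1.nmnode-0-svc", 9000);
        // 'cached' keeps whatever IP DNS returned at construction time,
        // even if the NameNode pod later comes back with a new address.

        // Picking up the new A record requires building a fresh address,
        // i.e. re-resolving the hostname on connection failure:
        InetSocketAddress refreshed =
            new InetSocketAddress(cached.getHostName(), cached.getPort());

        System.out.println("cached:    " + cached.getAddress());
        System.out.println("refreshed: " + refreshed.getAddress());
    }
}
```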



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
