[ https://issues.apache.org/jira/browse/HDFS-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510716#comment-14510716 ]
Steve Loughran commented on HDFS-8160:
--------------------------------------

nothing obvious springs to mind. what happens if you kill that first DN?

> Long delays when calling hdfsOpenFile()
> ---------------------------------------
>
>                 Key: HDFS-8160
>                 URL: https://issues.apache.org/jira/browse/HDFS-8160
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: libhdfs
>    Affects Versions: 2.5.2
>         Environment: 3-node Apache Hadoop 2.5.2 cluster running on Ubuntu 14.04
> dfshealth overview:
> Security is off.
> Safemode is off.
> 8 files and directories, 9 blocks = 17 total filesystem object(s).
> Heap Memory used 45.78 MB of 90.5 MB Heap Memory. Max Heap Memory is 889 MB.
> Non Heap Memory used 36.3 MB of 70.44 MB Commited Non Heap Memory. Max Non Heap Memory is 130 MB.
> Configured Capacity: 118.02 GB
> DFS Used: 2.77 GB
> Non DFS Used: 12.19 GB
> DFS Remaining: 103.06 GB
> DFS Used%: 2.35%
> DFS Remaining%: 87.32%
> Block Pool Used: 2.77 GB
> Block Pool Used%: 2.35%
> DataNodes usages% (Min/Median/Max/stdDev): 2.35% / 2.35% / 2.35% / 0.00%
> Live Nodes: 3 (Decommissioned: 0)
> Dead Nodes: 0 (Decommissioned: 0)
> Decommissioning Nodes: 0
> Number of Under-Replicated Blocks: 0
> Number of Blocks Pending Deletion: 0
> Datanode Information
> In operation
> Node | Last contact | Admin State | Capacity | Used | Non DFS Used | Remaining | Blocks | Block pool used | Failed Volumes | Version
> hadoop252-3 (x.x.x.10:50010) | 1 | In Service | 39.34 GB | 944.85 MB | 3.63 GB | 34.79 GB | 9 | 944.85 MB (2.35%) | 0 | 2.5.2
> hadoop252-1 (x.x.x.8:50010) | 0 | In Service | 39.34 GB | 944.85 MB | 4.94 GB | 33.48 GB | 9 | 944.85 MB (2.35%) | 0 | 2.5.2
> hadoop252-2 (x.x.x.9:50010) | 1 | In Service | 39.34 GB | 944.85 MB | 3.63 GB | 34.79 GB | 9 | 944.85 MB (2.35%) | 0 | 2.5.2
> java version "1.7.0_76"
> Java(TM) SE Runtime Environment (build 1.7.0_76-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.76-b04, mixed mode)
>            Reporter: Rod
>
> Calling hdfsOpenFile() on a file residing on the target 3-node Hadoop cluster (described in detail in the Environment
section) blocks for a long time (several minutes). I've noticed that the delay is related to the size of the target file.
> For example, attempting to hdfsOpenFile() on a file of 852483361 bytes took 121 seconds, but a file of 15458 bytes took less than a second.
> Also, during the long delay, the following stack trace is routed to standard out:
> 2015-04-16 10:32:13,943 WARN [main] hdfs.BlockReaderFactory (BlockReaderFactory.java:getRemoteBlockReaderFromTcp(693)) - I/O error constructing remote block reader.
> org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.40.8.10:50010]
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
>         at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)
>         at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854)
>         at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:143)
> 2015-04-16 10:32:13,946 WARN [main] hdfs.DFSClient (DFSInputStream.java:blockSeekTo(612)) - Failed to connect to /10.40.8.10:50010 for block, add to deadNodes and continue. org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.40.8.10:50010]
> org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect.
ch : java.nio.channels.SocketChannel[connection-pending remote=/10.40.8.10:50010]
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
>         at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)
>         at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854)
>         at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:143)
> I have also seen similar delays and stack trace output when executing dfs CLI commands on those same files (dfs -cat, dfs -tail, etc.).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
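Separate from Steve's question about killing the first DN: as a stopgap while debugging, the 60000 ms figure can be lowered on the client side so each failed replica costs seconds rather than a minute. A sketch for the client's hdfs-site.xml, assuming dfs.client.socket-timeout is the property behind the 60000 ms default (treat the key name as an assumption; this only shortens the stall, it does not explain why /10.40.8.10:50010 is unreachable):

```xml
<!-- client-side hdfs-site.xml; property name is an assumption based on the
     60000 ms default seen in the stack trace -->
<property>
  <name>dfs.client.socket-timeout</name>
  <value>10000</value> <!-- 10 s instead of the 60 s default -->
</property>
```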