[ https://issues.apache.org/jira/browse/HDFS-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504623#comment-14504623 ]

Steve Loughran commented on HDFS-8160:
--------------------------------------

It ultimately worked because, after timing out, the DFS client tried a different host.
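
That also lines up with the ~121 second open reported below: roughly two of those 60000 ms connect timeouts before a reachable replica answered. If the bad address can't be fixed straight away, one thing that may soften the symptom -- purely a sketch against the standard Java client configuration, which a libhdfs program picks up from the hdfs-site.xml on its CLASSPATH -- is to shorten the connect timeout so the client gives up on the dead address sooner. I believe dfs.client.socket-timeout is the key behind the 60000 ms value in the stack trace; the 15 s value and the args[0] path here are just placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShortTimeoutRead {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumption: dfs.client.socket-timeout governs the 60000 ms connect
        // timeout shown in the stack trace; 15 s is only an illustration.
        conf.setInt("dfs.client.socket-timeout", 15000);
        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path(args[0]))) {
          // the first read is what triggers the datanode connect
          System.out.println("first byte: " + in.read());
        }
      }
    }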

What may be happening is that the datanodes are reporting in as healthy, but 
the address they publish for clients to fetch that data isn't reachable. A 
wrong hostname or a firewall are the common causes; network and routing 
problems are another.
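
For the wrong-hostname case specifically: if the datanodes register with addresses the client can't route to, but their hostnames (hadoop252-1..3 here) do resolve from the client side, one client-side workaround worth trying -- only a sketch, not a confirmed fix for this report -- is to make the DFS client dial datanodes by hostname rather than by the registered IP. The same property can go into the client-side hdfs-site.xml, which is also what a libhdfs program reads from its CLASSPATH:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class HostnameDial {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumption: the datanode hostnames resolve from the client, while
        // the IPs the namenode hands back do not.
        conf.setBoolean("dfs.client.use.datanode.hostname", true);
        FileSystem fs = FileSystem.get(conf);
        // reads through fs now connect to the datanodes by hostname
        fs.close();
      }
    }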

Try a telnet to the hostname and port listed, from the machine that isn't able 
to connect, and see what happens.
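
If telnet isn't installed on that machine, a small probe along these lines does the same check; the host and port are simply the ones from the stack trace below, so substitute whatever the namenode's datanode page lists:

    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class DataNodeProbe {
      public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "10.40.8.10";  // from the stack trace
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 50010;
        try (Socket s = new Socket()) {
          s.connect(new InetSocketAddress(host, port), 10000);  // 10 s timeout
          System.out.println("connected to " + host + ":" + port);
        } catch (Exception e) {
          System.out.println("cannot reach " + host + ":" + port + ": " + e);
        }
      }
    }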

> Long delays when calling hdfsOpenFile()
> ---------------------------------------
>
>                 Key: HDFS-8160
>                 URL: https://issues.apache.org/jira/browse/HDFS-8160
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: libhdfs
>    Affects Versions: 2.5.2
>         Environment: 3-node Apache Hadoop 2.5.2 cluster running on Ubuntu 
> 14.04 
> dfshealth overview:
> Security is off.
> Safemode is off.
> 8 files and directories, 9 blocks = 17 total filesystem object(s).
> Heap Memory used 45.78 MB of 90.5 MB Heap Memory. Max Heap Memory is 889 MB.
> Non Heap Memory used 36.3 MB of 70.44 MB Committed Non Heap Memory. Max Non Heap Memory is 130 MB.
> Configured Capacity:  118.02 GB
> DFS Used:     2.77 GB
> Non DFS Used: 12.19 GB
> DFS Remaining:        103.06 GB
> DFS Used%:    2.35%
> DFS Remaining%:       87.32%
> Block Pool Used:      2.77 GB
> Block Pool Used%:     2.35%
> DataNodes usages% (Min/Median/Max/stdDev):    2.35% / 2.35% / 2.35% / 0.00%
> Live Nodes    3 (Decommissioned: 0)
> Dead Nodes    0 (Decommissioned: 0)
> Decommissioning Nodes 0
> Number of Under-Replicated Blocks     0
> Number of Blocks Pending Deletion     0
> Datanode Information
> In operation
> Node                          Last contact  Admin State  Capacity  Used       Non DFS Used  Remaining  Blocks  Block pool used    Failed Volumes  Version
> hadoop252-3 (x.x.x.10:50010)  1             In Service   39.34 GB  944.85 MB  3.63 GB       34.79 GB   9       944.85 MB (2.35%)  0               2.5.2
> hadoop252-1 (x.x.x.8:50010)   0             In Service   39.34 GB  944.85 MB  4.94 GB       33.48 GB   9       944.85 MB (2.35%)  0               2.5.2
> hadoop252-2 (x.x.x.9:50010)   1             In Service   39.34 GB  944.85 MB  3.63 GB       34.79 GB   9       944.85 MB (2.35%)  0               2.5.2
> java version "1.7.0_76"
> Java(TM) SE Runtime Environment (build 1.7.0_76-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.76-b04, mixed mode)
>            Reporter: Rod
>
> Calling hdfsOpenFile on a file residing on target 3-node Hadoop cluster 
> (described in detail in Environment section) blocks for a long time (several 
> minutes).  I've noticed that the delay is related to the size of the target 
> file. 
> For example, attempting to hdfsOpenFile() on a file of filesize 852483361 
> took 121 seconds, but a file of 15458 took less than a second.
> Also, during the long delay, the following stacktrace is routed to standard 
> out:
> 2015-04-16 10:32:13,943 WARN  [main] hdfs.BlockReaderFactory 
> (BlockReaderFactory.java:getRemoteBlockReaderFromTcp(693)) - I/O error 
> constructing remote block reader.
> org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while 
> waiting for channel to be ready for connect. ch : 
> java.nio.channels.SocketChannel[connection-pending remote=/10.40.8.10:50010]
>       at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
>       at 
> org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)
>       at 
> org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)
>       at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)
>       at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)
>       at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
>       at 
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
>       at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854)
>       at 
> org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:143)
> 2015-04-16 10:32:13,946 WARN  [main] hdfs.DFSClient 
> (DFSInputStream.java:blockSeekTo(612)) - Failed to connect to 
> /10.40.8.10:50010 for block, add to deadNodes and continue. 
> org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while 
> waiting for channel to be ready for connect. ch : 
> java.nio.channels.SocketChannel[connection-pending remote=/10.40.8.10:50010]
> org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while 
> waiting for channel to be ready for connect. ch : 
> java.nio.channels.SocketChannel[connection-pending remote=/10.40.8.10:50010]
>       at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
>       at 
> org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)
>       at 
> org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)
>       at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)
>       at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)
>       at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
>       at 
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
>       at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854)
>       at 
> org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:143)
> I have also seen similar delays and the same stack trace printout when 
> executing dfs CLI commands on those same files (dfs -cat, dfs -tail, etc.).


