Have you checked the RS GC log and observed any long pauses?
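
For example, if GC logging is enabled on the RS JVM with something like
-Xloggc plus -XX:+PrintGCDetails and -XX:+PrintGCApplicationStoppedTime,
the long stop-the-world pauses are easy to spot (the log path below is
just an example):

  grep "Total time for which application threads were stopped" /path/to/rs-gc.log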

Please also check the load on your RS/DN machines when this
SocketTimeoutException happens; the pause may be caused by a system-level
stall rather than by JVM GC.
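
One way to check for host-level stalls (file paths below are
illustrative): both the RS and the DN should be running a
JvmPauseMonitor, so pauses noticed by the process itself show up in the
service logs, and iostat/vmstat can show whether the machine was under
pressure at that time:

  grep "Detected pause in JVM or host machine" /path/to/regionserver.log
  grep "Detected pause in JVM or host machine" /path/to/datanode.log
  iostat -x 1   # look at await and %util on the data disks
  vmstat 1      # look at run queue, swapping and iowait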

Hope this helps.

Best Regards,
Yu

On 17 January 2017 at 13:29, Stack <st...@duboce.net> wrote:

> Timeout after waiting ten seconds to read from HDFS is no fun. Tell us more
> about your HDFS setup. You collect system metrics on disks? Machines are
> healthy all around? How frequently does this occur?
>
> Thanks for the question,
> S
>
> On Thu, Jan 12, 2017 at 10:18 AM, Tulasi Paradarami <
> tulasi.krishn...@gmail.com> wrote:
>
> > Hi,
> >
> > I noticed that Regionservers are intermittently raising the following
> > exceptions, which manifest as request timeouts on the client side. HDFS
> > is in a healthy state and there are no corrupted blocks (per "hdfs fsck"
> > results). Datanodes were not out of service when these errors occurred,
> > and GC on the datanodes is usually around 0.3 sec.
> >
> > Also, when these exceptions occur, the HDFS metric "Send Data Packet
> > Blocked On Network Average Time" tends to go up.
> >
> > Here are the configured values for some of the relevant parameters:
> > dfs.client.socket-timeout: 10s
> > dfs.datanode.socket.write.timeout: 10s
> > dfs.namenode.avoid.read.stale.datanode: true
> > dfs.namenode.avoid.write.stale.datanode: true
> > dfs.datanode.max.xcievers: 8192
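> > For reference, the values the client actually picks up can be
> > double-checked with "hdfs getconf", e.g.:
> >   hdfs getconf -confKey dfs.client.socket-timeout
> >   hdfs getconf -confKey dfs.datanode.socket.write.timeout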
> >
> > Any pointers towards what could be causing these exceptions are
> > appreciated.
> > Thanks.
> >
> > CDH 5.7.2
> > HBase 1.2.0
> >
> > ---> Regionserver logs
> >
> > 2017-01-11 19:19:04,940 WARN  [PriorityRpcServer.handler=3,queue=1,port=60020] hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> > java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/datanode3:27094 remote=/datanode2:50010]
> >         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> >         ...
> >
> > 2017-01-11 19:19:04,995 WARN  [PriorityRpcServer.handler=11,queue=1,port=60020] hdfs.DFSClient: Connection failure: Failed to connect to /datanode2:50010 for file /hbase/data/default/<table_name>/ec9ca
> > java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/datanode3:27107 remote=/datanode2:50010]
> >         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> >         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> >         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
> >         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
> >         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
> >         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
> >         at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:207)
> >         at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:156)
> >         at org.apache.hadoop.hdfs.BlockReaderUtil.readAll(BlockReaderUtil.java:32)
> >         at org.apache.hadoop.hdfs.RemoteBlockReader2.readAll(RemoteBlockReader2.java:386)
> >         at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1193)
> >         at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1112)
> >         at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1473)
> >         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1432)
> >         at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:89)
> >         at org.apache.hadoop.hbase.io.hfile.HFileBlock.positionalReadWithExtra(HFileBlock.java:752)
> >         at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1448)
> >         at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1648)
> >         at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1532)
> >         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:445)
> >         at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:261)
> >         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:642)
> >         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:622)
> >         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:314)
> >         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:226)
> >         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:437)
> >         at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:340)
> >         at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:296)
> >         at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:261)
> >         at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:806)
> >         at org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:794)
> >         at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:617)
> >         at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
> >         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5696)
> >         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5855)
> >         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5633)
> >         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5619)
> >         at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$2.nextRaw(BaseScannerRegionObserver.java:330)
> >         at org.apache.phoenix.iterate.RegionScannerResultIterator.next(RegionScannerResultIterator.java:50)
> >         at org.apache.phoenix.iterate.OrderedResultIterator.getResultIterator(OrderedResultIterator.java:240)
> >         at org.apache.phoenix.iterate.OrderedResultIterator.next(OrderedResultIterator.java:193)
> >         at org.apache.phoenix.coprocessor.ScanRegionObserver.getTopNScanner(ScanRegionObserver.java:245)
> >         at org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScannerOpen(ScanRegionObserver.java:226)
> >         at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:202)
> >         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1334)
> >         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673)
> >         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1749)
> >         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1712)
> >         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1329)
> >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2408)
> >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33648)
> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
> >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
> >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> >         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
> >         at java.lang.Thread.run(Thread.java:745)
> >
> >
> > ---> Datanode logs
> >
> > 2017-01-11 19:19:16,171 INFO  [DataXceiver for client DFSClient_NONMAPREDUCE_-988481032_1 at /datanode4:13495 [Receiving block BP-684272099-namenode-1472795703360:blk_1075014094_1273639]] datanode.DataNode (BlockReceiver.java:receiveBlock(933)) - Exception for BP-684272099-namenode-1472795703360:blk_1075014094_1273639
> > java.io.IOException: Premature EOF from inputStream
> >         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
> >         ...
> >
> > 2017-01-11 19:19:16,171 INFO  [DataXceiver for client DFSClient_NONMAPREDUCE_-988481032_1 at /datanode4:46727 [Receiving block BP-684272099-namenode-1472795703360:blk_1075014093_1273638]] datanode.DataNode (BlockReceiver.java:receiveBlock(933)) - Exception for BP-684272099-namenode-1472795703360:blk_1075014093_1273638
> > java.io.IOException: Premature EOF from inputStream
> >         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
> >         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
> >         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
> >         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
> >         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:500)
> >         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:894)
> >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:798)
> >         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
> >         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
> >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
> >         at java.lang.Thread.run(Thread.java:745)
> >
>
