Anes Mukhametov created FLUME-3366:
--------------------------------------

             Summary: hdfs sink can't rename file after hdfs.DFSClient timeout
                 Key: FLUME-3366
                 URL: https://issues.apache.org/jira/browse/FLUME-3366
             Project: Flume
          Issue Type: Bug
          Components: Sinks+Sources
    Affects Versions: 1.6.0
         Environment: cdh 5.16.2
Sink configuration:

2hdfs.sinks.scribe2hdfsprd.type = hdfs
2hdfs.sinks.scribe2hdfsprd.channel = kafka
2hdfs.sinks.scribe2hdfsprd.hdfs.path = hdfs://clname/data/logs/
2hdfs.sinks.scribe2hdfsprd.hdfs.filePrefix = logone-%Y-%m-%d-%H
2hdfs.sinks.scribe2hdfsprd.hdfs.rollSize = 268435456
2hdfs.sinks.scribe2hdfsprd.hdfs.rollInterval = 3600
2hdfs.sinks.scribe2hdfsprd.hdfs.rollCount = 0
2hdfs.sinks.scribe2hdfsprd.hdfs.idleTimeout = 300
2hdfs.sinks.scribe2hdfsprd.hdfs.batchSize = 1000
2hdfs.sinks.scribe2hdfsprd.hdfs.minBlockReplicas = 3
2hdfs.sinks.scribe2hdfsprd.hdfs.threadsPoolSize = 600
2hdfs.sinks.scribe2hdfsprd.hdfs.rollTimerPoolSize = 10
2hdfs.sinks.scribe2hdfsprd.hdfs.callTimeout = 300000
2hdfs.sinks.scribe2hdfsprd.hdfs.fileType = DataStream
2hdfs.sinks.scribe2hdfsprd.hdfs.writeFormat = Text
2hdfs.sinks.scribe2hdfsprd.hdfs.proxyUser = stat
2hdfs.sinks.scribe2hdfsprd.sink.serializer = text
2hdfs.sinks.scribe2hdfsprd.sink.serializer.appendNewline = false

            Reporter: Anes Mukhametov


There seems to be a problem related to HDFS timeouts.

{{
2020-05-03 20:50:39,819 INFO org.apache.flume.sink.hdfs.BucketWriter: Creating hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp
2020-05-03 21:01:52,208 INFO org.apache.flume.sink.hdfs.BucketWriter: Closing idle bucketWriter hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp at 1588528912208
2020-05-03 21:01:52,208 INFO org.apache.flume.sink.hdfs.HDFSEventSink: Writer callback called.
2020-05-03 21:01:52,208 INFO org.apache.flume.sink.hdfs.BucketWriter: Closing hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp
2020-05-03 21:03:07,282 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1441209260-10.30.0.19-1387400921010:blk_5841169679_1121097069614
java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.30.3.187:40062 remote=/10.30.3.187:50010]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2354)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:235)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:1093)
2020-05-03 21:03:07,283 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-1441209260-10.30.0.19-1387400921010:blk_5841169679_1121097069614 in pipeline DatanodeInfoWithStorage[10.30.3.187:50010,DS-306b4cb3-ba0d-4970-80cd-894083759467,DISK], DatanodeInfoWithStorage[10.30.3.196:50010,DS-bdc91c20-e8d0-4d43-a1b0-41cabb8306b3,DISK], DatanodeInfoWithStorage[10.30.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK]: bad datanode DatanodeInfoWithStorage[10.30.3.187:50010,DS-306b4cb3-ba0d-4970-80cd-894083759467,DISK]
2020-05-03 21:05:22,463 WARN org.apache.flume.sink.hdfs.BucketWriter: Closing file: hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp failed. Will retry again in 180 seconds.
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.30.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK] are bad.
Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1512)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1254)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:739)
2020-05-03 21:05:22,474 INFO org.apache.flume.sink.hdfs.BucketWriter: Renaming hdfs://prdhdp/data/mt/stream-flume/gekko_bahromator/prdhdp881.g/gekko_bahromator-2020-05-03-20.1588528239807.tmp to hdfs://prdhdp/data/mt/stream-flume/gekko_bahromator/prdhdp881.g/gekko_bahromator-2020-05-03-20.1588528239807
2020-05-03 21:08:22,466 WARN org.apache.flume.sink.hdfs.BucketWriter: Closing file: hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp failed. Will retry again in 180 seconds.
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.30.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK] are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1512)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1254)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:739)
2020-05-03 21:11:22,467 WARN org.apache.flume.sink.hdfs.BucketWriter: Closing file: hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp failed. Will retry again in 180 seconds.
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.30.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK] are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1512)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1254)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:739)
...
2020-05-10 13:14:08,742 WARN org.apache.flume.sink.hdfs.BucketWriter: Closing file: hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp failed. Will retry again in 180 seconds.
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.121.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK] are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1512)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1254)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:739)
}}

Flume keeps trying to close the file indefinitely. Notably, the file has already been renamed from .tmp to its final name, yet the close attempts are still logged with the .tmp suffix:

{{
-rw-r--r--   3 flume flume   5054814 2020-05-03 20:50 hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807
}}

The file is also still in the open state:

{{
$ hdfs dfsadmin -listOpenFiles | grep 5-03-20.1588528239807
10.121.3.187    DFSClient_NONMAPREDUCE_833895359_50    /data/logs/logone-2020-05-03-20.1588528239807
}}

hdfs fsck -files -blocks -locations on this file completes without any error, but it lists no blocks at all. Could this be a problem?
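A possible mitigation, untested in this environment and not a fix for the underlying bug: the 180-second cadence in the log matches the HDFS sink's default hdfs.retryInterval, and hdfs.closeTries defaults to 0, which means the sink retries the close/rename with no limit. Bounding the retries should at least stop the endless close attempts. A minimal sketch, reusing the agent and sink names from the configuration above; the closeTries value is only illustrative:

{{
# Cap the number of close/rename attempts after the initial close fails.
# The default of 0 means retry forever, which is the loop seen in the log.
2hdfs.sinks.scribe2hdfsprd.hdfs.closeTries = 5
# Seconds between consecutive close attempts (180 is already the default).
2hdfs.sinks.scribe2hdfsprd.hdfs.retryInterval = 180
}}

This only limits the retry loop; the file may still be left open on HDFS. If the lease is still held after the writer gives up, running hdfs debug recoverLease -path /data/logs/logone-2020-05-03-20.1588528239807 (where that debug subcommand is available in this CDH release) might release it without restarting the agent.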