[ https://issues.apache.org/jira/browse/FLUME-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuyang Gong resolved FLUME-3200.
--------------------------------
Resolution: Not A Problem
> Flume leaves .tmp files in HDFS(AWS s3) when rename timeout.
> ------------------------------------------------------------
>
> Key: FLUME-3200
> URL: https://issues.apache.org/jira/browse/FLUME-3200
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: 1.8.0
> Environment: Ubuntu (AWS EC2)
> Reporter: Yuyang Gong
>
> I found some .tmp files in my S3 bucket.
> Sometimes I find two files with the same content, and I need to remove the
> .tmp file.
> For example:
> flume.1512259200732.txt.gz.tmp and flume.1512259200732.txt.gz
> Sometimes I find only a .tmp file, and I need to rename it manually.
> h3. This is the log:
> 03 Dec 2017 05:28:57,641 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:251) - Creating s3a://mylogs/2017-12-03/05/flume.1512277200060.txt.gz.tmp
> 03 Dec 2017 05:29:28,119 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.close:393) - Closing s3a://mylogs/2017-12-03/05/flume.1512277200060.txt.gz.tmp
> 03 Dec 2017 05:29:38,120 WARN [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.close:400) - failed to close() HDFSWriter for file (s3a://mylogs/2017-12-03/05/flume.1512277200060.txt.gz.tmp). Exception follows.
> java.io.IOException: Callable timed out after 10000 ms on file: s3a://mylogs/2017-12-03/05/flume.1512277200060.txt.gz.tmp
>     at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:715)
>     at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:397)
>     at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:319)
>     at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:566)
>     at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:401)
>     at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
>     at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.TimeoutException
>     at java.util.concurrent.FutureTask.get(FutureTask.java:205)
>     at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:708)
>     ... 7 more
> h3. This is my sink config:
> agent.sinks.k1.type = hdfs
> agent.sinks.k1.channel = c1
> agent.sinks.k1.hdfs.path = s3a://mylogs/%Y-%m-%d/%H
> agent.sinks.k1.hdfs.fileType = CompressedStream
> agent.sinks.k1.hdfs.codeC = gzip
> agent.sinks.k1.hdfs.filePrefix = flume
> agent.sinks.k1.hdfs.fileSuffix = .txt.gz
> agent.sinks.k1.hdfs.rollSize = 67108864
> agent.sinks.k1.hdfs.rollInterval = 300
> agent.sinks.k1.hdfs.rollCount = 100000
> agent.sinks.k1.hdfs.batchSize = 1000
> agent.sinks.k1.hdfs.useLocalTimeStamp = true
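
The log above shows the close() call giving up after 10000 ms, which matches
the default value of the HDFS sink's hdfs.callTimeout. The fragment below is
only a sketch of settings that might reduce leftover .tmp files, assuming the
S3 operations are slow rather than failing outright. The values are
illustrative assumptions, are not taken from the report, and have not been
tested against this setup.

```properties
# Illustrative values only (assumptions, not from the original report).

# Milliseconds allowed for HDFS operations such as open, write, flush, close.
# The default is 10000, the limit seen in the log.
agent.sinks.k1.hdfs.callTimeout = 60000

# Number of times the sink tries to rename a file after a close attempt.
# 0 (the default) means it keeps trying until the rename succeeds.
agent.sinks.k1.hdfs.closeTries = 0

# Seconds between consecutive attempts to close a file (default 180).
agent.sinks.k1.hdfs.retryInterval = 60
```

Whether a longer timeout is enough likely depends on file size, since a
rename on s3a is implemented as a copy and can take longer for large objects.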