[ https://issues.apache.org/jira/browse/FLUME-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuyang Gong updated FLUME-3200:
-------------------------------
Description:
I found some .tmp files in my S3 bucket. Sometimes both files exist with the same content, and I need to remove the .tmp file. For example:
flume.1512259200732.txt.gz.tmp and flume.1512259200732.txt.gz
Sometimes only the .tmp file exists, and I need to rename it manually.

h3. This is the log:

03 Dec 2017 05:28:57,641 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:251) - Creating s3a://mylogs/2017-12-03/05/flume.1512277200060.txt.gz.tmp
03 Dec 2017 05:29:28,119 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.close:393) - Closing s3a://mylogs/2017-12-03/05/flume.1512277200060.txt.gz.tmp
03 Dec 2017 05:29:38,120 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.close:400) - failed to close() HDFSWriter for file (s3a://mylogs/2017-12-03/05/flume.1512277200060.txt.gz.tmp). Exception follows.
java.io.IOException: Callable timed out after 10000 ms on file: s3a://mylogs/2017-12-03/05/flume.1512277200060.txt.gz.tmp
        at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:715)
        at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:397)
        at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:319)
        at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:566)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:401)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
        at java.util.concurrent.FutureTask.get(FutureTask.java:205)
        at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:708)
        ... 7 more
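The IOException above is the HDFS sink's call timeout expiring while the slow S3 close/rename is still in flight (the Flume HDFS sink's hdfs.callTimeout defaults to 10000 ms, matching the "timed out after 10000 ms" in the log). A possible mitigation, assuming S3 latency is what trips the timeout, is to give those calls more headroom; the value below is illustrative, not a verified fix:

```
# Illustrative only: raise the per-call timeout for slow S3 close/rename
# operations. hdfs.callTimeout defaults to 10000 ms in the HDFS sink.
agent.sinks.k1.hdfs.callTimeout = 60000
```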
h3. This is my sink config:

agent.sinks.k1.type = hdfs
agent.sinks.k1.channel = c1
agent.sinks.k1.hdfs.path = s3a://mylogs/%Y-%m-%d/%H
agent.sinks.k1.hdfs.fileType = CompressedStream
agent.sinks.k1.hdfs.codeC = gzip
agent.sinks.k1.hdfs.filePrefix = flume
agent.sinks.k1.hdfs.fileSuffix = .txt.gz
agent.sinks.k1.hdfs.rollSize = 67108864
agent.sinks.k1.hdfs.rollInterval = 300
agent.sinks.k1.hdfs.rollCount = 100000
agent.sinks.k1.hdfs.batchSize = 1000
agent.sinks.k1.hdfs.useLocalTimeStamp = true

> Flume leaves .tmp files in HDFS(AWS s3) when rename timeout.
> ------------------------------------------------------------
>
>                 Key: FLUME-3200
>                 URL: https://issues.apache.org/jira/browse/FLUME-3200
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: 1.8.0
>        Environment: Ubuntu (AWS EC2)
>            Reporter: Yuyang Gong
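The reporter's manual workaround (delete the leftover .tmp when the final file also exists, otherwise rename the .tmp) can be sketched as a small script. This is illustrative only: it operates on a local directory (e.g. a mirror of the bucket fetched with a tool like `aws s3 sync`), the file names are invented, and nothing here is part of Flume itself.

```shell
#!/bin/sh
# Sketch of the cleanup the reporter describes doing by hand.
# Run against a LOCAL mirror of the bucket prefix, not s3a:// directly.
cleanup_tmp() {
    dir="$1"
    for tmp in "$dir"/*.tmp; do
        [ -e "$tmp" ] || continue       # glob matched nothing; skip
        final="${tmp%.tmp}"
        if [ -e "$final" ]; then
            # Rename already succeeded; the .tmp is a duplicate copy.
            rm -- "$tmp"
        else
            # Rename never happened; promote the .tmp to its final name.
            mv -- "$tmp" "$final"
        fi
    done
}
```

After cleanup, the modified files would still need to be synced back to the bucket.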
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)