[ https://issues.apache.org/jira/browse/FLUME-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuyang Gong updated FLUME-3200:
-------------------------------
    Description: 
I found some .tmp files left behind in my S3 bucket.

Sometimes both files exist with the same content, and I have to remove the .tmp
file manually.
For example:
flume.1512259200732.txt.gz.tmp and flume.1512259200732.txt.gz

Sometimes only the .tmp file exists, and I have to rename it manually.
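Until the sink retries the rename itself, the manual cleanup described above can be decided per object key: a .tmp whose final name already exists is a duplicate to delete, and a .tmp with no final counterpart still needs the rename. A minimal sketch of that decision (a hypothetical helper, not part of Flume — it only classifies keys; the actual delete/rename against S3 is left out):

```python
def classify_tmp_keys(keys):
    """Split .tmp object keys into duplicates (the final copy already
    exists, so the .tmp can be deleted) and orphans (no final copy,
    so the .tmp still needs to be renamed)."""
    finals = {k for k in keys if not k.endswith(".tmp")}
    duplicates, orphans = [], []
    for key in keys:
        if not key.endswith(".tmp"):
            continue
        final = key[: -len(".tmp")]
        (duplicates if final in finals else orphans).append(key)
    return duplicates, orphans

keys = [
    "2017-12-03/05/flume.1512259200732.txt.gz.tmp",
    "2017-12-03/05/flume.1512259200732.txt.gz",
    "2017-12-03/05/flume.1512277200060.txt.gz.tmp",
]
dups, orphans = classify_tmp_keys(keys)
# dups    -> the .tmp that has a matching final file (safe to delete)
# orphans -> the .tmp with no final file (needs a rename)
```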

h3. This is the log:
03 Dec 2017 05:28:57,641 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:251)  - Creating s3a://mylogs/2017-12-03/05/flume.1512277200060.txt.gz.tmp
03 Dec 2017 05:29:28,119 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.close:393)  - Closing s3a://mylogs/2017-12-03/05/flume.1512277200060.txt.gz.tmp
03 Dec 2017 05:29:38,120 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.close:400)  - failed to close() HDFSWriter for file (s3a://mylogs/2017-12-03/05/flume.1512277200060.txt.gz.tmp). Exception follows.
java.io.IOException: Callable timed out after 10000 ms on file: s3a://mylogs/2017-12-03/05/flume.1512277200060.txt.gz.tmp
        at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:715)
        at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:397)
        at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:319)
        at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:566)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:401)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
        at java.util.concurrent.FutureTask.get(FutureTask.java:205)
        at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:708)
        ... 7 more

h3. This is my sink config:
agent.sinks.k1.type = hdfs
agent.sinks.k1.channel = c1
agent.sinks.k1.hdfs.path = s3a://mylogs/%Y-%m-%d/%H
agent.sinks.k1.hdfs.fileType = CompressedStream
agent.sinks.k1.hdfs.codeC = gzip
agent.sinks.k1.hdfs.filePrefix = flume
agent.sinks.k1.hdfs.fileSuffix = .txt.gz
agent.sinks.k1.hdfs.rollSize = 67108864
agent.sinks.k1.hdfs.rollInterval = 300
agent.sinks.k1.hdfs.rollCount = 100000
agent.sinks.k1.hdfs.batchSize = 1000
agent.sinks.k1.hdfs.useLocalTimeStamp = true
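Note that the "timed out after 10000 ms" in the exception matches the HDFS sink's default hdfs.callTimeout, which bounds each filesystem call including close(); on S3A a rename is implemented as copy-then-delete and can easily exceed 10 seconds for large files. Raising the timeout may reduce how often the close/rename is abandoned (the 60000 below is only an illustrative value, not a recommendation):

```properties
# hdfs.callTimeout bounds each HDFS call (open/write/flush/close);
# the default is 10000 ms. 60000 here is an illustrative value.
agent.sinks.k1.hdfs.callTimeout = 60000
# hdfs.closeTries / hdfs.retryInterval (close retry count and spacing)
# may also be worth checking for the failed-close case.
```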

  was:
I found some .tmp file on my s3.

Sometimes I can find both two file with same content.I need remove the .tmp 
file.
For example:
flume.1512259200732.txt.gz.tmp and flume.1512259200732.txt.gz

Sometimes I can only find a .tmp file, I need rename it manually.

This is my sink config:
agent.sinks.k1.type = hdfs
agent.sinks.k1.channel = c1
agent.sinks.k1.hdfs.path = s3a://mylogs/%Y-%m-%d/%H
agent.sinks.k1.hdfs.fileType = CompressedStream
agent.sinks.k1.hdfs.codeC = gzip
agent.sinks.k1.hdfs.filePrefix = flume
agent.sinks.k1.hdfs.fileSuffix = .txt.gz
agent.sinks.k1.hdfs.rollSize = 67108864
agent.sinks.k1.hdfs.rollInterval = 300
agent.sinks.k1.hdfs.rollCount = 100000
agent.sinks.k1.hdfs.batchSize = 1000
agent.sinks.k1.hdfs.useLocalTimeStamp = true


> Flume leaves .tmp files in HDFS (AWS S3) when the rename times out.
> ------------------------------------------------------------
>
>                 Key: FLUME-3200
>                 URL: https://issues.apache.org/jira/browse/FLUME-3200
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: 1.8.0
>         Environment: Ubuntu (AWS EC2)
>            Reporter: Yuyang Gong
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
