[ 
https://issues.apache.org/jira/browse/FLUME-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428433#comment-13428433
 ] 

Hari Shreedharan commented on FLUME-1417:
-----------------------------------------

Brock, 

Unfortunately, I don't have the log, but here is a brief explanation of the
issue:

A) Checkpoint 1 occurs: files 1, 2 and 3 are marked active in the checkpoint.
B) All events in file 1 have been taken and committed, so its reference count
drops to 0. This means file 1 is no longer relevant and is removed from the
queue.
C) The worker deletes file 1 because of (B).
D) Flume shuts down for any reason (the user kills it, the system dies, etc.).
E) Flume is restarted and the file channel tries to replay from the
checkpoint, but cannot find file 1 because the worker deleted it. Replay
throws an exception and the channel stays closed. Because the channel is
unable to start, events are blocked at the previous hop.
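The sequence A) to E) can be reproduced with a small in-memory model. This is a hypothetical sketch, not Flume code: the class `DeleteRaceSketch` and its fields and methods are invented for illustration, a map of file ID to reference count stands in for the data files on disk, and a set of file IDs stands in for the checkpoint.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, heavily simplified model of the file channel's data files,
// checkpoint and background worker. None of these names are Flume APIs.
public class DeleteRaceSketch {

    // Data file ID -> number of queued events it still holds.
    // The key set doubles as "files present on disk".
    final Map<Integer, Integer> refCounts = new HashMap<>();

    // Files recorded as active by the last checkpoint.
    Set<Integer> checkpointedFiles = new HashSet<>();

    void writeCheckpoint() {
        checkpointedFiles = new HashSet<>(refCounts.keySet());
    }

    // Original worker behavior: delete any file whose refcount reached 0,
    // regardless of whether the last checkpoint still lists it.
    void runWorker() {
        refCounts.values().removeIf(count -> count == 0);
    }

    // Replay from the checkpoint: every file it lists must still exist.
    void replayFromCheckpoint() {
        for (int file : checkpointedFiles) {
            if (!refCounts.containsKey(file)) {
                throw new IllegalStateException("missing data file " + file);
            }
        }
    }

    public static void main(String[] args) {
        DeleteRaceSketch channel = new DeleteRaceSketch();
        channel.refCounts.put(1, 5);
        channel.refCounts.put(2, 3);
        channel.refCounts.put(3, 7);

        channel.writeCheckpoint();   // A) files 1, 2, 3 marked active
        channel.refCounts.put(1, 0); // B) all events in file 1 taken and committed
        channel.runWorker();         // C) worker deletes file 1
        // D) crash: only the files on disk and the checkpoint survive,
        //    which is all this model tracks.
        try {
            channel.replayFromCheckpoint(); // E) replay needs file 1
            System.out.println("replay succeeded");
        } catch (IllegalStateException e) {
            System.out.println("replay failed: " + e.getMessage());
        }
    }
}
```

Running `main` prints `replay failed: missing data file 1`, mirroring the exception in E).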

To work around this, delete the checkpoint. All events from all logs are then
replayed, but this takes a very long time and is not always possible.

My proposal is that the worker must not delete a file unless a checkpoint has
happened since the file's reference count dropped to 0.
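The proposed rule can be sketched in the same kind of simplified model. Again, the names are hypothetical, this is not Flume's API, and it is not necessarily how the issue was eventually fixed: the checkpoint records only files that still hold queued events, and the worker deletes a file only when its reference count is 0 and the latest checkpoint no longer lists it, i.e. a checkpoint has been written since the file became irrelevant.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the proposed worker rule; not Flume's actual classes.
public class CheckpointGuardedWorkerSketch {

    // Data file ID -> number of queued events it still holds.
    // The key set doubles as "files present on disk".
    final Map<Integer, Integer> refCounts = new HashMap<>();

    // Files the most recent checkpoint depends on.
    Set<Integer> checkpointedFiles = new HashSet<>();

    // A checkpoint records only files that still hold queued events.
    void writeCheckpoint() {
        Set<Integer> active = new HashSet<>();
        for (Map.Entry<Integer, Integer> entry : refCounts.entrySet()) {
            if (entry.getValue() > 0) {
                active.add(entry.getKey());
            }
        }
        checkpointedFiles = active;
    }

    // Proposed worker: delete a file only if it is unreferenced AND the
    // latest checkpoint no longer lists it.
    void runWorker() {
        refCounts.entrySet().removeIf(entry ->
                entry.getValue() == 0 && !checkpointedFiles.contains(entry.getKey()));
    }

    // Replay from the checkpoint: every file it lists must still exist.
    void replayFromCheckpoint() {
        for (int file : checkpointedFiles) {
            if (!refCounts.containsKey(file)) {
                throw new IllegalStateException("missing data file " + file);
            }
        }
    }

    public static void main(String[] args) {
        CheckpointGuardedWorkerSketch channel = new CheckpointGuardedWorkerSketch();
        channel.refCounts.put(1, 5);
        channel.refCounts.put(2, 3);
        channel.refCounts.put(3, 7);

        channel.writeCheckpoint();   // checkpoint lists files 1, 2, 3
        channel.refCounts.put(1, 0); // file 1 fully taken and committed
        channel.runWorker();         // file 1 is kept: the checkpoint still lists it
        System.out.println("file 1 kept before next checkpoint: "
                + channel.refCounts.containsKey(1));

        channel.replayFromCheckpoint(); // simulated restart finds every listed file
        System.out.println("replay after crash: ok");

        channel.writeCheckpoint();   // new checkpoint lists only files 2, 3
        channel.runWorker();         // file 1 can now be deleted safely
        System.out.println("file 1 kept after next checkpoint: "
                + channel.refCounts.containsKey(1));
    }
}
```

The cost of this rule is that a fully consumed file lingers on disk until the next checkpoint; in exchange, every file a checkpoint depends on is guaranteed to exist at replay time.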
                
> File Channel checkpoint can be bad leading to the channel being unable to 
> start.
> --------------------------------------------------------------------------------
>
>                 Key: FLUME-1417
>                 URL: https://issues.apache.org/jira/browse/FLUME-1417
>             Project: Flume
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>
> ERROR file.Log: Failed to initialize Log on [channel=file-channel]
> java.lang.NullPointerException
>       at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:739)
>       at org.apache.flume.channel.file.Log.replay(Log.java:261)
>       at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:228)
>       at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:228)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>       at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:619)
> ERROR file.FileChannel: Failed to start the file channel [channel=file-channel]
> java.lang.NullPointerException
>       at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:739)
>       at org.apache.flume.channel.file.Log.replay(Log.java:261)
>       at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:228)
>       at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:228)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>       at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
