[
https://issues.apache.org/jira/browse/FLUME-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428433#comment-13428433
]
Hari Shreedharan commented on FLUME-1417:
-----------------------------------------
Brock,
Unfortunately I don't have the log - but here is a brief explanation of the
issue:
A) Checkpoint 1 occurs: files 1, 2 and 3 are marked active in the checkpoint.
B) All events in file 1 are then taken and committed, so its refcount drops to
0. The file is no longer relevant and is removed from the queue.
C) The worker deletes file 1 because of (B).
D) Flume shuts down (the user kills it, the system dies, or similar) before
another checkpoint is written.
E) Flume is restarted and the file channel tries to replay from the checkpoint,
but cannot find file 1 because the worker deleted it. It throws an exception
and keeps the channel closed. The channel is unable to start, and events are
blocked at the previous hop.
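Steps A-E can be reproduced with a toy model. This is only a sketch: the class
and method names are hypothetical and are not Flume's own classes, and file IDs
stand in for the real log files.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of steps A-E; all names here are hypothetical, not Flume classes. */
class StaleCheckpointDemo {

    /** Returns the file IDs the checkpoint references that no longer exist on disk. */
    static Set<Integer> missingOnReplay(Set<Integer> checkpointFiles,
                                        Set<Integer> filesOnDisk) {
        Set<Integer> missing = new TreeSet<>(checkpointFiles);
        missing.removeAll(filesOnDisk);
        return missing;
    }

    public static void main(String[] args) {
        // A) Checkpoint 1 records files 1, 2 and 3 as active.
        Set<Integer> checkpoint = new TreeSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> disk = new TreeSet<>(Arrays.asList(1, 2, 3));

        // B, C) File 1 is fully drained and the worker deletes it,
        // but no new checkpoint has been written yet.
        disk.remove(1);

        // D, E) After a restart, replay starts from the stale checkpoint.
        Set<Integer> missing = missingOnReplay(checkpoint, disk);
        System.out.println("Missing on replay: " + missing); // prints "Missing on replay: [1]"
    }
}
```

A non-empty result is the condition under which replay fails in step E.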
To work around this, delete the checkpoint. All events from all logs are then
replayed, but this takes a very long time and is not always possible.
My proposal is that the worker must not delete a file until a checkpoint has
been written after the file was removed from the queue.
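One way to picture the proposal is to keep drained files on disk in a pending
set and release them only when the next checkpoint is written. The sketch below
assumes that design; the class, its fields, and its methods are hypothetical and
do not reflect the actual file channel code or the eventual fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: defer file deletion until the next checkpoint. */
class DeferredDeleter {
    private final Set<Integer> activeFiles = new TreeSet<>();
    private final Set<Integer> pendingDeletes = new TreeSet<>();
    private Set<Integer> lastCheckpoint = new TreeSet<>();

    void addFile(int id) {
        activeFiles.add(id);
    }

    /** Refcount reached 0: drop the file from the queue, but keep it on disk. */
    void markDrained(int id) {
        if (activeFiles.remove(id)) {
            pendingDeletes.add(id);
        }
    }

    /** Records a checkpoint and returns the files that are now safe to delete. */
    List<Integer> checkpoint() {
        lastCheckpoint = new TreeSet<>(activeFiles);
        List<Integer> deletable = new ArrayList<>(pendingDeletes);
        pendingDeletes.clear();
        return deletable;
    }

    /** Files referenced by the most recent checkpoint. */
    Set<Integer> lastCheckpointFiles() {
        return new TreeSet<>(lastCheckpoint);
    }
}
```

With this ordering, a file is only deleted once the latest checkpoint no longer
references it, so a restart at any point finds every file the checkpoint names.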
> File Channel checkpoint can be bad leading to the channel being unable to
> start.
> --------------------------------------------------------------------------------
>
> Key: FLUME-1417
> URL: https://issues.apache.org/jira/browse/FLUME-1417
> Project: Flume
> Issue Type: Bug
> Reporter: Hari Shreedharan
>
> ERROR file.Log: Failed to initialize Log on [channel=file-channel]
> java.lang.NullPointerException
> at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:739)
> at org.apache.flume.channel.file.Log.replay(Log.java:261)
> at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:228)
> at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:228)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> ERROR file.FileChannel: Failed to start the file channel [channel=file-channel]
> java.lang.NullPointerException
> at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:739)
> at org.apache.flume.channel.file.Log.replay(Log.java:261)
> at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:228)
> at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:228)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)