[ 
https://issues.apache.org/jira/browse/FLUME-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289193#comment-13289193
 ] 

Juhani Connolly commented on FLUME-1232:
----------------------------------------

It doesn't. I'm pretty sure also the source you use is irrelevant. I switched 
back to using top(though any exec will work) and could still recreate this.

here's a portion of the logs:
2012-06-05 15:13:44,840 (lifecycleSupervisor-1-1) [INFO - 
org.apache.flume.channel.file.Log.lock(Log.java:573)] Cannot lock 
/home/share/juhani_connolly/flume-1.2.0-incubating-SNAPSHOT/check. The 
directory is already locked.
2012-06-05 15:13:44,840 (lifecycleSupervisor-1-1) [WARN - 
org.apache.flume.channel.file.FileChannel.start(FileChannel.java:182)] Starting 
FileChannel failed
java.io.IOException: Cannot lock 
/home/share/juhani_connolly/flume-1.2.0-incubating-SNAPSHOT/check. The 
directory is already locked.
        at org.apache.flume.channel.file.Log.lock(Log.java:574)
        at org.apache.flume.channel.file.Log.<init>(Log.java:95)
        at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:178)
        at 
org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:228)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
2012-06-05 15:13:44,841 (lifecycleSupervisor-1-1) [ERROR - 
org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:230)]
 Unable to start org.apache.flume.channel.file.FileChannel@3ed02b51 - Exception 
follows.
java.lang.RuntimeException: java.io.IOException: Cannot lock 
/home/share/juhani_connolly/flume-1.2.0-incubating-SNAPSHOT/check. The 
directory is already locked.
        at com.google.common.base.Throwables.propagate(Throwables.java:156)
        at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:183)
        at 
org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:228)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Cannot lock 
/home/share/juhani_connolly/flume-1.2.0-incubating-SNAPSHOT/check. The 
directory is already locked.
        at org.apache.flume.channel.file.Log.lock(Log.java:574)
        at org.apache.flume.channel.file.Log.<init>(Log.java:95)
        at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:178)
        ... 10 more


It looks though like one way to recreate this is to kill flume while a replay 
of logs is happening(I have not managed to recreate Will's 3 start pattern):
- Start flume with a file channel, let it write some data
- restart
- stop while it is replaying the data. This will eventually put things into a 
dead state where a kill -9 is necessary
- when you try to restart it, once the replay is done, it is unstartable with 
the above logs. More interestingly, even deleting the in_use lock files does 
not recover the state(deleting the checkpoint and data directories will of 
course work)

I'm pretty sure that when I first hit this it was with a different method but I 
can't figure it out, but this one I can recreate every time.

So any time flume is interrupted when trying to replay, something in the 
process breaks the file channel.
                
> Cannot start agent a 3rd time when using FileChannel
> ----------------------------------------------------
>
>                 Key: FLUME-1232
>                 URL: https://issues.apache.org/jira/browse/FLUME-1232
>             Project: Flume
>          Issue Type: Bug
>          Components: Channel
>    Affects Versions: v1.2.0
>         Environment: RHEL 5.6 64-bit
>            Reporter: Will McQueen
>            Assignee: Arvind Prabhakar
>            Priority: Blocker
>             Fix For: v1.2.0
>
>
> Steps:
> 1) Start clean by wiping-out FileChannel's existing checkpoint dir and data 
> dir
> 2) Configure the agent to use filechannel (type = FILE). THe config file I 
> used is at the end of this text.
> 3) Start the agent, confirm lock files exist in data and checkpoint dirs, 
> stop agent, confirm lock files are remove from data and checkpoint dirs.
> 4) Repeat step 3
> 5) Start the agent. The following exceptions are shown in the logs:
> 2012-05-29 03:15:36,453 DEBUG file.ReplayHandler: record.getTimestamp() = 
> 1338286275813, lastCheckpoint = 1338286279596, fileId = 1, offset = 1924, 
> type = Commit, transaction 1338285619250
> 2012-05-29 03:15:36,453 DEBUG file.ReplayHandler: record.getTimestamp() = 
> 1338286279783, lastCheckpoint = 1338286279596, fileId = 1, offset = 1949, 
> type = Take, transaction 1338285619251
> 2012-05-29 03:15:36,453 DEBUG file.ReplayHandler: record.getTimestamp() = 
> 1338286279784, lastCheckpoint = 1338286279596, fileId = 1, offset = 1980, 
> type = Commit, transaction 1338285619251
> 2012-05-29 03:15:36,453 DEBUG file.ReplayHandler: Processing commit of Take
> 2012-05-29 03:15:36,453 INFO file.ReplayHandler: Replayed 1 from 
> /var/run/flume-ng/.flume/file-channel/data/log-1
> 2012-05-29 03:15:36,453 INFO file.ReplayHandler: Replaying 
> /var/run/flume-ng/.flume/file-channel/data/log-2
> 2012-05-29 03:15:36,454 DEBUG file.ReplayHandler: record.getTimestamp() = 
> 1338286370280, lastCheckpoint = 1338286279596, fileId = 2, offset = 8, type = 
> Take, transaction 1338286369982
> 2012-05-29 03:15:36,454 DEBUG file.ReplayHandler: record.getTimestamp() = 
> 1338286370287, lastCheckpoint = 1338286279596, fileId = 2, offset = 39, type 
> = Commit, transaction 1338286369982
> 2012-05-29 03:15:36,454 DEBUG file.ReplayHandler: Processing commit of Take
> 2012-05-29 03:15:36,454 INFO file.ReplayHandler: Unable to remove 
> FlumeEventPointer [fileID=1, offset=1853] added to pending list
> 2012-05-29 03:15:36,454 INFO file.ReplayHandler: Replayed 1 from 
> /var/run/flume-ng/.flume/file-channel/data/log-2
> 2012-05-29 03:15:36,454 DEBUG file.ReplayHandler: Pending take 
> FlumeEventPointer [fileID=1, offset=1853]
> 2012-05-29 03:15:36,455 ERROR file.Log: Failed to initialize Log
> java.lang.IllegalStateException: Pending takes 1 exist after the end of replay
>         at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>         at 
> org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:137)
>         at org.apache.flume.channel.file.Log.replay(Log.java:205)
>         at 
> org.apache.flume.channel.file.FileChannel.start(FileChannel.java:180)
>         at 
> org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:228)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at 
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-05-29 03:15:36,457 ERROR lifecycle.LifecycleSupervisor: Unable to start 
> org.apache.flume.channel.file.FileChannel@1ac88440 - Exception follows.
> java.lang.IllegalStateException: Log is closed
>         at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>         at org.apache.flume.channel.file.Log.getFlumeEventQueue(Log.java:226)
>         at 
> org.apache.flume.channel.file.FileChannel.getDepth(FileChannel.java:253)
>         at 
> org.apache.flume.channel.file.FileChannel.start(FileChannel.java:187)
>         at 
> org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:228)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at 
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-05-29 03:15:36,458 ERROR flume.SinkRunner: Unhandled exception, logging 
> and sleeping for 5000ms
> java.lang.IllegalStateException: Channel closed
>         at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>         at 
> org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:237)
>         at 
> org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:118)
>         at org.apache.flume.sink.LoggerSink.process(LoggerSink.java:61)
>         at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-05-29 03:15:39,460 INFO file.FileChannel: Starting FileChannel with 
> dataDir [/var/run/flume-ng/.flume/file-channel/data]
> 2012-05-29 03:15:39,460 INFO file.Log: Cannot lock 
> /var/run/flume-ng/.flume/file-channel/checkpoint. The directory is already 
> locked.
> 2012-05-29 03:15:39,461 ERROR lifecycle.LifecycleSupervisor: Unable to start 
> org.apache.flume.channel.file.FileChannel@1ac88440 - Exception follows.
> java.lang.RuntimeException: java.io.IOException: Cannot lock 
> /var/run/flume-ng/.flume/file-channel/checkpoint. The directory is already 
> locked.
>         at com.google.common.base.Throwables.propagate(Throwables.java:156)
>         at 
> org.apache.flume.channel.file.FileChannel.start(FileChannel.java:182)
>         at 
> org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:228)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at 
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
> Config file I used (flume.conf):
> agent.channels = c1
> agent.sources = r1
> agent.sinks = k1
> #
> agent.channels.c1.type = FILE
> #
> agent.sources.r1.channels = c1
> agent.sources.r1.type = NETCAT
> agent.sources.r1.bind = 0.0.0.0
> agent.sources.r1.port = 41414
> #
> agent.sinks.k1.channel = c1
> agent.sinks.k1.type = LOGGER

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to