Collector failing but status ACTIVE
-----------------------------------
Key: FLUME-1010
URL: https://issues.apache.org/jira/browse/FLUME-1010
Project: Flume
Issue Type: Bug
Components: Build, Node
Affects Versions: v0.9.4
Environment: Debian Squeeze, ubuntu maverick, cloudera package
Reporter: Nicolas Cerveaux
After a certain time I received this error, nothing was written to the log
until I restart the collector:
2012-02-24 16:57:41,553 ERROR com.cloudera.flume.core.connector.DirectDriver:
Exiting driver logicalNode hdflume01.lamp.com-19 in error state CollectorSource
| Collector because null
My problem is the collector server is still marked as active and all the agents
wont failover to the next collector, because the collector itself is still up.
My configurations is the following:
collectorSource
collectorSink("hdfs://hadoop01.lamp.com:54310/logs/%Y/%m/%d/%H/", "-")
collectorSource
collectorSink("hdfs://hadoop01.lamp.com:54310/logs/%Y/%m/%d/%H/", "-")
collectorSource
collectorSink("hdfs://hadoop01.lamp.com:54310/logs/%Y/%m/%d/%H/", "-")
tail("/var/log/nginx/localhost.access.log")
agentE2EChain("hdflume01.lamp.com", "hdflume03.lamp.com")
tail("/var/log/nginx/localhost.access.log")
agentE2EChain("hdflume01.lamp.com", "hdflume03.lamp.com")
I think was able to reproduce this error with two virtual machine when I was
changing the same configuration of the collector to:
collectorSource console
2012-02-25 13:46:17,711 INFO com.cloudera.flume.collector.CollectorSource:
closed
2012-02-25 13:46:18,712 INFO
com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port
35853...
2012-02-25 13:46:18,713 INFO
com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 1
elements ...
2012-02-25 13:46:19,942 WARN com.cloudera.flume.agent.LivenessManager:
Heartbeats are backing up, currently behind by 1 heartbeats
2012-02-25 13:46:24,944 WARN com.cloudera.flume.agent.LivenessManager:
Heartbeats are backing up, currently behind by 2 heartbeats
2012-02-25 13:46:29,946 WARN com.cloudera.flume.agent.LivenessManager:
Heartbeats are backing up, currently behind by 3 heartbeats
2012-02-25 13:46:34,949 WARN com.cloudera.flume.agent.LivenessManager:
Heartbeats are backing up, currently behind by 4 heartbeats
2012-02-25 13:46:39,941 ERROR com.cloudera.flume.agent.LogicalNode: Forcing
driver to exit uncleanly
2012-02-25 13:46:39,942 ERROR
com.cloudera.flume.handlers.thrift.ThriftEventSource: Unexpected interrupt of
close sleep interrupted
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at
com.cloudera.flume.handlers.thrift.ThriftEventSource.close(ThriftEventSource.java:191)
at
com.cloudera.flume.collector.CollectorSource.close(CollectorSource.java:61)
at
com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:166)
2012-02-25 13:46:39,943 ERROR com.cloudera.flume.core.connector.DirectDriver:
Closing down due to exception during close calls
2012-02-25 13:46:39,943 INFO com.cloudera.flume.core.connector.DirectDriver:
Connector logicalNode gfiles02.lamp.com-18 exited with error: sleep interrupted
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at
com.cloudera.flume.handlers.thrift.ThriftEventSource.close(ThriftEventSource.java:191)
at
com.cloudera.flume.collector.CollectorSource.close(CollectorSource.java:61)
at
com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:166)
2012-02-25 13:46:39,944 INFO com.cloudera.flume.collector.CollectorSource:
closed
2012-02-25 13:46:39,944 INFO
com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port
35853...
2012-02-25 13:46:39,944 INFO
com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 2
elements ...
2012-02-25 13:46:39,944 INFO com.cloudera.flume.agent.LogicalNode: Node config
successfully set to FlumeConfigData: {srcVer:'Sat Feb 25 13:46:08 PST 2012'
snkVer:'Sat Feb 25 13:46:08 PST 2012' ts='Sat Feb 25 13:46:08 PST 2012'
flowId:'default-flow' source:'collectorSource' sink:'console' }
2012-02-25 13:46:39,957 INFO com.cloudera.flume.collector.CollectorSource:
opened
2012-02-25 13:46:39,988 INFO
com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread
pool server on port 35853...
2012-02-25 13:46:39,997 INFO
com.cloudera.flume.handlers.debug.ConsoleEventSink: ConsoleEventSink( debug )
opened
2012-02-25 13:46:49,978 WARN
com.cloudera.flume.handlers.thrift.ThriftEventSource: Close timed out due to no
progress. Closing despite having 2 values still enqueued
2012-02-25 13:46:49,978 INFO com.cloudera.flume.handlers.rolling.RollSink:
closing RollSink
'escapedCustomDfs("hdfs://hh01.hadoop.com:54310/logs/%Y/%m/%d/%H/","-%{rolltag}"
)'
2012-02-25 13:46:49,979 WARN
com.cloudera.flume.handlers.endtoend.AckChecksumChecker: partial acks
abandoned: {20120225-134554606-0800.212691888462127.00000021=1330206354608}
2012-02-25 13:46:49,979 ERROR com.cloudera.flume.core.connector.DirectDriver:
Exiting driver logicalNode gfiles02.lamp.com-18 in error state CollectorSource
| Collector because sleep interrupted
In both case the error state that the node is in error state but they really
are in an active state.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira