[ https://issues.apache.org/jira/browse/FLUME-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143896#comment-13143896 ]
Björn Edström commented on FLUME-839:
-------------------------------------
I haven't been able to investigate this much, but my gut feeling is that, in
the specific ERROR case of FLUME-798, the Thrift server is still running
(failover works when I shut down the broken collector, since that closes the
TCP connections). Is Flume sending messages asynchronously?
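
To make the hypothesis concrete, here is a minimal Python sketch of a failover chain that only switches collectors on transport errors. It is a toy model, not Flume's code or API: the `Collector` class, its `listening` and `healthy` flags, and `send_with_failover` are all hypothetical names invented for illustration. It assumes a collector in the ERROR state keeps accepting connections but no longer stores events.

```python
class Collector:
    """Toy stand-in for a collector (hypothetical, not Flume's API)."""

    def __init__(self, name, listening=True, healthy=True):
        self.name = name
        self.listening = listening  # is the server still accepting connections?
        self.healthy = healthy      # is the downstream sink actually writing?
        self.stored = []

    def append(self, event):
        if not self.listening:
            # Only a closed server produces a transport-level error.
            raise ConnectionError(f"{self.name}: connection refused")
        if self.healthy:
            self.stored.append(event)
        # Listening but unhealthy: the event is accepted and then lost.


def send_with_failover(event, chain):
    """Try collectors in order; fail over only on transport errors."""
    for collector in chain:
        try:
            collector.append(event)
            return collector.name
        except ConnectionError:
            continue
    raise RuntimeError("no collector reachable")


# Assumed ERROR state: collector1 still listens but stores nothing.
primary = Collector("collector1", listening=True, healthy=False)
backup = Collector("collector2")
accepted_by = send_with_failover("e1", [primary, backup])

# Shutting the broken collector down closes the connection.
primary.listening = False
accepted_by_after_shutdown = send_with_failover("e2", [primary, backup])
```

In this model the first event is accepted by collector1 and lost without any failover, while the second reaches collector2 only after collector1 stops listening, which matches the behaviour I observed.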
> Agent chain doesn't switch to a backup collector if the primary one gets into
> an ERROR state
> --------------------------------------------------------------------------------------
>
> Key: FLUME-839
> URL: https://issues.apache.org/jira/browse/FLUME-839
> Project: Flume
> Issue Type: Bug
> Affects Versions: v0.9.4
> Reporter: Björn Edström
>
> I have a setup like this:
> agent: source | agentDFOChain("collector1:35853", "collector2:35853")
> collector1: collectorSource(35853) | collectorSink("hdfs://namenode.company.net:54310/flume/", "%Y_%m_%d_%H-")
> collector2: collectorSource(35853) | collectorSink("hdfs://namenode.company.net:54310/flume/", "%Y_%m_%d_%H-")
> If both collectors are running and collector1 then gets into an ERROR state
> (for example because of FLUME-798), events are silently dropped. No failover
> takes place to the other node in the chain, and no events are written to disk.