[ 
https://issues.apache.org/jira/browse/FLUME-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143896#comment-13143896
 ] 

Björn Edström commented on FLUME-839:
-------------------------------------

I haven't been able to investigate this in depth, but my gut feeling is that, 
in the specific ERROR case of FLUME-798, the Thrift server is still running 
(failover does work when I shut down the broken collector, because that closes 
the TCP connections). Is Flume sending messages asynchronously?


                
> Agent chain doesn't switch to a backup collector if the primary one get to 
> ERROR state
> --------------------------------------------------------------------------------------
>
>                 Key: FLUME-839
>                 URL: https://issues.apache.org/jira/browse/FLUME-839
>             Project: Flume
>          Issue Type: Bug
>    Affects Versions: v0.9.4
>            Reporter: Björn Edström
>
> I have a setup like this:
> agent: source | agentDFOChain("collector1:35853", "collector2:35853")
> collector1: collectorSource(35853) | collectorSink("hdfs://namenode.company.net:54310/flume/", "%Y_%m_%d_%H-")
> collector2: collectorSource(35853) | collectorSink("hdfs://namenode.company.net:54310/flume/", "%Y_%m_%d_%H-")
> If both collectors are running and collector1 then gets into an ERROR state 
> (for example, because of FLUME-798), events are silently dropped. No failover 
> to the other node in the chain takes place, and no events are written to disk.


