[ 
https://issues.apache.org/jira/browse/FLUME-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197634#comment-13197634
 ] 

jeff commented on FLUME-948:
----------------------------

I've looked at the code in Cloudera's 0.9.3 release, and there is a bug in how 
the backoff retry condition is updated in BackOffFailOverSink.java.
When the primary collector is down, events are handled by a nested 
BackOffFailOverSink whose primary sink is the secondary collector and whose 
secondary sink is a null sink.
In this case, events are never sent to that primary sink (the secondary 
collector), because its backoff policy never reaches the retry condition: the 
backoff time is updated incorrectly in the code.
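The comment does not say exactly how the backoff time is mis-updated, so the following is only a hypothetical Java sketch of the invariant a retry condition needs. It is not the code in BackOffFailOverSink.java; the class CappedBackoff, its methods, and the doubling-with-cap schedule are illustrative assumptions. The point it shows: the retry deadline is recomputed only when an attempt actually fails, and it is measured from that failure, so checking the condition never pushes the deadline further away.

```java
// Hypothetical sketch of a capped exponential backoff retry condition.
// Not the backoff code used by Flume's BackOffFailOverSink; names are illustrative.
final class CappedBackoff {
    private final long initialMs;   // delay after the first consecutive failure
    private final long maxMs;       // upper bound on the delay
    private long currentMs;         // delay applied after the most recent failure
    private long retryAtMs;         // earliest time a retry is allowed
    private boolean failed;         // true while the sink is considered down

    CappedBackoff(long initialMs, long maxMs) {
        this.initialMs = initialMs;
        this.maxMs = maxMs;
        reset();
    }

    /** Call only when an open/append attempt actually fails. */
    void recordFailure(long nowMs) {
        // First failure uses the initial delay; later ones double, up to the cap.
        currentMs = failed ? Math.min(currentMs * 2, maxMs) : initialMs;
        failed = true;
        // Deadline is measured from this failure, so it is always reachable.
        retryAtMs = nowMs + currentMs;
    }

    /** Pure check: it must not move retryAtMs, or the condition may never hold. */
    boolean isRetryOk(long nowMs) {
        return !failed || nowMs >= retryAtMs;
    }

    /** Call after a successful attempt to clear the backoff state. */
    void reset() {
        failed = false;
        currentMs = 0;
        retryAtMs = 0;
    }

    long retryAtMs() {
        return retryAtMs;
    }
}
```

If a check such as isRetryOk also advanced the deadline, or the deadline were accumulated from a stale timestamp, the condition could stay false indefinitely and the sink would never be retried, which is the kind of symptom described above.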
                
> [Agent Reliability] In BEChain mode, if the primary collector is off but the 
> secondary collector is on, the agent's events are sent to the null sink instead 
> of the secondary collector
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLUME-948
>                 URL: https://issues.apache.org/jira/browse/FLUME-948
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v0.9.3
>            Reporter: jeff
>            Priority: Critical
>             Fix For: v0.9.5
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Here is my Flume config:
> agent: : rpcSource(333) | agentDFOChain("192.168.130.15:17876","192.168.130.14:17876")
> collector: collector(17876) | myCustomplugin
> Here is my test case:
> 1. Use an rpcClient to send one event to the agent every minute.
> 2. Shut down the primary collector and the secondary collector.
> 3. Wait about 1.5 h, then start the secondary collector.
> I expected that events received by the agent after the secondary collector 
> recovered would be sent to the secondary collector, but in fact they are 
> discarded, because they are sent to the null sink in the BEChain.
> Here is my log:
> [2012-01-20 10:19:00,098] [INFO ] [logicalNode ESC01_agent-24] 
> [com.cloudera.flume.handlers.debug.StubbornAppendSink 76] Append failed 
> java.net.SocketException: No route to host
> [2012-01-20 10:19:00,098] [INFO ] [logicalNode ESC01_agent-24] 
> [com.cloudera.flume.handlers.thrift.ThriftEventSink 89] ThriftEventSink on 
> port 17876 closed
> [2012-01-20 10:19:01,066] [WARN ] [Thread-2] 
> [com.cloudera.flume.agent.MultiMasterRPC 198] Could not connect to any master 
> nodes (tried 1: [192.168.130.13:17872])
> [2012-01-20 10:19:01,067] [INFO ] [Heartbeat] 
> [com.cloudera.flume.agent.MultiMasterRPC 194] MasterRPC called while 
> disconnected.
> [2012-01-20 10:19:03,102] [INFO ] [logicalNode ESC01_agent-24] 
> [com.cloudera.flume.core.BackOffFailOverSink 143] Failed to open thrift event 
> sink at 192.168.130.15:17876 : java.net.NoRouteToHostException: No route to 
> host
> [2012-01-20 10:19:04,070] [WARN ] [Heartbeat] 
> [com.cloudera.flume.agent.MultiMasterRPC 198] Could not connect to any master 
> nodes (tried 1: [192.168.130.13:17872])
> [2012-01-20 10:19:06,070] [INFO ] [Thread-2] 
> [com.cloudera.flume.agent.MultiMasterRPC 194] MasterRPC called while 
> disconnected.
> [2012-01-20 10:19:06,106] [INFO ] [logicalNode ESC01_agent-24] 
> [com.cloudera.flume.core.BackOffFailOverSink 143] Failed to open thrift event 
> sink at 192.168.130.14:17876 : java.net.NoRouteToHostException: No route to 
> host
> [2012-01-20 10:20:39,830] [INFO ] [logicalNode ESC01_agent-24] 
> [com.cloudera.flume.core.BackOffFailOverSink 143] Failed to open thrift event 
> sink at 192.168.130.15:17876 : java.net.NoRouteToHostException: No route to 
> host
> [2012-01-20 10:21:47,997] [INFO ] [Thread-2] 
> [com.cloudera.flume.agent.ThriftMasterRPC 78] Connected to master at 
> 192.168.130.13:17872
> [2012-01-20 10:22:44,874] [INFO ] [logicalNode ESC01_agent-24] 
> [com.cloudera.flume.core.BackOffFailOverSink 143] Failed to open thrift event 
> sink at 192.168.130.15:17876 : java.net.NoRouteToHostException: No route to 
> host
> [2012-01-20 10:23:35,951] [INFO ] [logicalNode ESC01_agent-24] 
> [com.cloudera.flume.core.BackOffFailOverSink 143] Failed to open thrift event 
> sink at 192.168.130.15:17876 : java.net.NoRouteToHostException: No route to 
> host
> [2012-01-20 10:24:40,049] [INFO ] [logicalNode ESC01_agent-24] 
> [com.cloudera.flume.core.BackOffFailOverSink 143] Failed to open thrift event 
> sink at 192.168.130.15:17876 : java.net.NoRouteToHostException: No route to 
> host
> [2012-01-20 10:25:44,139] [INFO ] [logicalNode ESC01_agent-24] 
> [com.cloudera.flume.core.BackOffFailOverSink 143] Failed to open thrift event 
> sink at 192.168.130.15:17876 : java.net.NoRouteToHostException: No route to 
> host
> [2012-01-20 10:27:39,987] [INFO ] [logicalNode ESC01_agent-24] 
> [com.cloudera.flume.core.BackOffFailOverSink 143] Failed to open thrift event 
> sink at 192.168.130.15:17876 : java.net.NoRouteToHostException: No route to 
> host
> [2012-01-20 10:29:39,914] [INFO ] [logicalNode ESC01_agent-24] 
> [com.cloudera.flume.core.BackOffFailOverSink 143] Failed to open thrift event 
> sink at 192.168.130.15:17876 : java.net.NoRouteToHostException: No route to 
> host
> [2012-01-20 10:31:43,054] [INFO ] [logicalNode ESC01_agent-24] 
> [com.cloudera.flume.core.BackOffFailOverSink 143] Failed to open thrift event 
> sink at 192.168.130.15:17876 : java.net.NoRouteToHostException: No route to 
> host
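The behaviour the reporter expected from the test case can be modelled with a small simulation. This is a hypothetical sketch, not Flume's agentDFOChain or BackOffFailOverSink: the class and method names, the 1 s initial delay and the 60 s cap are assumptions, host availability is stubbed with a predicate, and only the primary, secondary, null-sink ordering described in the report is modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical model of a failover chain (primary -> secondary -> null sink).
// Not Flume's agentDFOChain; names, delays and cap are illustrative assumptions.
final class FailoverChainModel {

    /** One collector in the chain, gated by a capped exponential backoff. */
    static final class Target {
        final String name;
        final LongPredicate isUp;   // stub: does the host accept events at time t?
        final long initialMs;       // delay after the first consecutive failure
        final long maxMs;           // upper bound on the delay
        long delayMs = 0;           // 0 means "healthy, no backoff in effect"
        long retryAtMs = 0;         // earliest time another attempt is allowed

        Target(String name, LongPredicate isUp, long initialMs, long maxMs) {
            this.name = name;
            this.isUp = isUp;
            this.initialMs = initialMs;
            this.maxMs = maxMs;
        }

        /** Returns true if the event was delivered to this target. */
        boolean tryAppend(long nowMs) {
            if (nowMs < retryAtMs) {
                return false;                       // still backing off: skip
            }
            if (isUp.test(nowMs)) {
                delayMs = 0;                        // success resets the backoff
                retryAtMs = 0;
                return true;
            }
            // Failure: grow the delay and measure the deadline from this failure.
            delayMs = (delayMs == 0) ? initialMs : Math.min(delayMs * 2, maxMs);
            retryAtMs = nowMs + delayMs;
            return false;
        }
    }

    final List<Target> targets = new ArrayList<>();

    /** Returns the name of the target that took the event, or "null" if dropped. */
    String append(long nowMs) {
        for (Target t : targets) {
            if (t.tryAppend(nowMs)) {
                return t.name;
            }
        }
        return "null";                              // end of chain: event discarded
    }
}
```

With the primary always down and the secondary coming back at minute 90, this model routes the minute-90 event and every later one to the secondary; the behaviour reported here corresponds to the chain continuing to return "null" after that point.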
