[ https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038525#comment-14038525 ]
chao.wu commented on SPARK-2201:
--------------------------------

Good idea.

> Improve FlumeInputDStream's stability
> -------------------------------------
>
>                 Key: SPARK-2201
>                 URL: https://issues.apache.org/jira/browse/SPARK-2201
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: sunshangchun
>
> Currently only one Flume receiver can work with FlumeInputDStream, and I am
> willing to do some work to improve it. My ideas are as follows:
> An IP and port denote a physical host, and a logical host consists of one or
> more physical hosts.
> In our case, Spark Flume receivers bind themselves to a logical host when
> they start, and a Flume agent looks up the physical hosts and pushes events
> to them.
> Two classes are introduced: LogicalHostRouter supplies the mapping between
> logical hosts and physical hosts, and LogicalHostRouterListener makes changes
> to that mapping watchable.
> Some work needs to be done here:
> 1. LogicalHostRouter and LogicalHostRouterListener can be implemented with
>    ZooKeeper. When a physical host starts, it creates a temporary (ephemeral)
>    node in ZooKeeper; listeners just watch those nodes.
> 2. When a Spark FlumeReceiver starts, it acquires a physical host
>    (localhost's IP and an idle port) and registers itself with ZooKeeper.
> 3. A new Flume sink. In its appendEvents method, it looks up the physical
>    hosts and pushes data to them in a round-robin manner.
> Is this a feasible plan? Thanks.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
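A minimal sketch of how the three proposed pieces could fit together, written in Python purely for illustration (a real implementation would be Scala/Java inside Spark and Flume). Only the names LogicalHostRouter and LogicalHostRouterListener come from the proposal; everything else (`register`, `unregister`, `add_listener`, `acquire_physical_host`, `RoundRobinSink`, `append_events`) is hypothetical. As an assumption, an in-memory dictionary stands in for ZooKeeper: `register` plays the role of creating an ephemeral node, `unregister` plays the role of that node vanishing when a receiver's session ends, and listener callbacks play the role of ZooKeeper watches. The sink returns `(host, event)` pairs instead of sending anything over the network.

```python
import itertools
import socket
import threading
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

PhysicalHost = Tuple[str, int]  # (ip, port)
Listener = Callable[[List[PhysicalHost]], None]


class LogicalHostRouter:
    """In-memory stand-in for the proposed ZooKeeper-backed router (hypothetical API)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._hosts: Dict[str, List[PhysicalHost]] = defaultdict(list)
        self._listeners: Dict[str, List[Listener]] = defaultdict(list)

    def register(self, logical: str, host: PhysicalHost) -> None:
        # Stands in for creating an ephemeral node, e.g. /<logical>/<ip>:<port>.
        with self._lock:
            if host not in self._hosts[logical]:
                self._hosts[logical].append(host)
        self._notify(logical)

    def unregister(self, logical: str, host: PhysicalHost) -> None:
        # Stands in for the ephemeral node disappearing when its session ends.
        with self._lock:
            if host in self._hosts[logical]:
                self._hosts[logical].remove(host)
        self._notify(logical)

    def physical_hosts(self, logical: str) -> List[PhysicalHost]:
        with self._lock:
            return list(self._hosts[logical])

    def add_listener(self, logical: str, callback: Listener) -> None:
        # Plays the role of LogicalHostRouterListener (a child watch in ZooKeeper).
        with self._lock:
            self._listeners[logical].append(callback)

    def _notify(self, logical: str) -> None:
        snapshot = self.physical_hosts(logical)
        with self._lock:
            callbacks = list(self._listeners[logical])
        for callback in callbacks:
            callback(snapshot)


def acquire_physical_host(ip: str = "127.0.0.1") -> PhysicalHost:
    # Step 2: bind to port 0 so the OS picks an idle port. The socket is closed
    # on return, so another process could take the port before the receiver
    # rebinds it; a real receiver would keep its own listening socket open.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((ip, 0))
        return (ip, s.getsockname()[1])


class RoundRobinSink:
    """Hypothetical sink for step 3: spreads events across the registered receivers."""

    def __init__(self, router: LogicalHostRouter, logical: str) -> None:
        self._counter = itertools.count()
        router.add_listener(logical, self._on_change)  # watch first, then read
        self._hosts = router.physical_hosts(logical)

    def _on_change(self, hosts: List[PhysicalHost]) -> None:
        self._hosts = hosts

    def append_events(self, events: List[str]) -> List[Tuple[PhysicalHost, str]]:
        hosts = self._hosts
        if not hosts:
            raise RuntimeError("no receivers registered for this logical host")
        # Pair each event with the next host in rotation instead of sending it.
        return [(hosts[next(self._counter) % len(hosts)], e) for e in events]


if __name__ == "__main__":
    router = LogicalHostRouter()
    for receiver in (acquire_physical_host(), acquire_physical_host()):
        router.register("spark-flume", receiver)
    sink = RoundRobinSink(router, "spark-flume")
    print(sink.append_events(["e1", "e2", "e3"]))
```

Because the sink subscribes to changes rather than reading the host list once, a receiver that dies (its ephemeral node disappears) drops out of the rotation on the next batch without restarting the Flume agent.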