[ https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038525#comment-14038525 ]

chao.wu commented on SPARK-2201:
--------------------------------

Good idea.

> Improve FlumeInputDStream's stability
> -------------------------------------
>
>                 Key: SPARK-2201
>                 URL: https://issues.apache.org/jira/browse/SPARK-2201
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: sunshangchun
>
> Currently only one Flume receiver can work with FlumeInputDStream, and I am
> willing to do some work to improve it. My ideas are as follows. An IP and
> port denote a physical host, and a logical host consists of one or more
> physical hosts.
> In our case, Spark Flume receivers bind themselves to a logical host when
> they start, and a Flume agent gets the physical hosts and pushes events to them.
> Two classes are introduced: LogicalHostRouter supplies a mapping between
> logical hosts and physical hosts, and LogicalHostRouterListener makes changes
> to that mapping watchable.
> Some work needs to be done here:
> 1. LogicalHostRouter and LogicalHostRouterListener can be implemented with
> ZooKeeper. When a physical host starts, it creates a temporary node in
> ZooKeeper; listeners just watch those temporary nodes.
> 2. When Spark FlumeReceivers start, each one acquires a physical host
> (the local host's IP and an idle port) and registers itself with ZooKeeper.
> 3. A new Flume sink: in its appendEvents method, it gets the physical hosts
> and pushes data to them in a round-robin manner.
> Is this a feasible plan? Thanks.
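
The two proposed classes (LogicalHostRouter and LogicalHostRouterListener) and step 1 can be illustrated with an in-memory sketch. This is a minimal sketch under stated assumptions, not the proposal's implementation: ZooKeeper is replaced by a plain dict, and the `PhysicalHost` type and the method names `register`, `unregister`, `physical_hosts` and `add_listener` are hypothetical. A ZooKeeper-backed version would presumably map each registration to an ephemeral node, which ZooKeeper deletes automatically when the session that created it ends, and would drive listeners from watches on those nodes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class PhysicalHost:
    """An IP address plus a port, as the proposal defines a physical host."""
    ip: str
    port: int


# Hypothetical listener signature: called with (logical_host, current hosts).
LogicalHostRouterListener = Callable[[str, List[PhysicalHost]], None]


class LogicalHostRouter:
    """In-memory stand-in for the proposed ZooKeeper-backed router.

    Assumption: a dict of sets plays the role of one ZooKeeper
    ephemeral node per registered physical host.
    """

    def __init__(self) -> None:
        self._hosts: Dict[str, Set[PhysicalHost]] = {}
        self._listeners: Dict[str, List[LogicalHostRouterListener]] = {}

    def physical_hosts(self, logical_host: str) -> List[PhysicalHost]:
        # Sorted so callers (e.g. a round-robin sink) see a stable order.
        hosts = self._hosts.get(logical_host, set())
        return sorted(hosts, key=lambda h: (h.ip, h.port))

    def add_listener(self, logical_host: str,
                     listener: LogicalHostRouterListener) -> None:
        self._listeners.setdefault(logical_host, []).append(listener)

    def register(self, logical_host: str, host: PhysicalHost) -> None:
        # Analogous to a receiver creating its ephemeral node on startup.
        self._hosts.setdefault(logical_host, set()).add(host)
        self._notify(logical_host)

    def unregister(self, logical_host: str, host: PhysicalHost) -> None:
        # Analogous to ZooKeeper removing the node when the session ends.
        self._hosts.get(logical_host, set()).discard(host)
        self._notify(logical_host)

    def _notify(self, logical_host: str) -> None:
        # Push the full, current host list to every watcher of this name.
        current = self.physical_hosts(logical_host)
        for listener in self._listeners.get(logical_host, []):
            listener(logical_host, current)
```

A listener registered with `add_listener` receives the complete host list after every `register` or `unregister`, which is the hook a sink could use to refresh its targets. Passing the whole list rather than a diff keeps listeners simple, at the cost of resending unchanged entries.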

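Steps 2 and 3 of the proposal (a receiver acquiring an idle port and registering, then a sink pushing batches round-robin) can be sketched the same way. Assumptions, all hypothetical: a plain dict stands in for ZooKeeper, the receiver binds to 127.0.0.1 for the demo (a real receiver would advertise an address the Flume agent can reach), and `RoundRobinSink.append_events` only loosely mirrors the proposal's appendEvents; it is not Flume's sink API. The `send` callback replaces the real network push.

```python
import socket
from typing import Callable, Dict, List, Tuple

# (ip, port) pair identifying a physical host.
Address = Tuple[str, int]
# Hypothetical stand-in for ZooKeeper: logical host name -> registered addresses.
Registry = Dict[str, List[Address]]


def acquire_physical_host(ip: str = "127.0.0.1") -> Tuple[socket.socket, Address]:
    """Bind to an idle port chosen by the OS and return the listening socket.

    Keeping the socket open keeps the port reserved for the receiver.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((ip, 0))  # port 0: let the OS pick any free port
    sock.listen()
    host, port = sock.getsockname()
    return sock, (host, port)


def register(registry: Registry, logical_host: str, addr: Address) -> None:
    """Record a receiver's address under a logical host (no duplicates)."""
    hosts = registry.setdefault(logical_host, [])
    if addr not in hosts:
        hosts.append(addr)


class RoundRobinSink:
    """Hypothetical sink that pushes each batch to the next physical host."""

    def __init__(self, registry: Registry, logical_host: str) -> None:
        self._registry = registry
        self._logical_host = logical_host
        self._counter = 0

    def append_events(
        self, events: List[str], send: Callable[[Address, List[str]], None]
    ) -> Address:
        # Re-read the host list on every call so membership changes are seen.
        hosts = self._registry.get(self._logical_host, [])
        if not hosts:
            raise RuntimeError(f"no physical hosts for {self._logical_host!r}")
        target = hosts[self._counter % len(hosts)]
        self._counter += 1
        send(target, events)  # the real transport is out of scope here
        return target
```

With two registered receivers, three successive `append_events` calls go to the first, second and first address. Using a counter modulo the current list length, instead of a fixed iterator, lets receivers that register or disappear between batches take effect without rebuilding the sink.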


--
This message was sent by Atlassian JIRA
(v6.2#6252)
