[ https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054442#comment-14054442 ]
sunshangchun edited comment on SPARK-2201 at 7/8/14 11:12 AM:
--------------------------------------------------------------

I don't think it's a problem.
1. It's an external module and has no effect on the spark core module.
2. The spark core module already uses ZooKeeper to elect the primary master.
3. The change is backward compatible: passing a host and port to the flume receiver still works.

Thanks

was (Author: joyyoj):
I don't think it's a problem. It's an external module and has no effect on the spark core module. Again, the spark core module already uses ZooKeeper to elect the leader master. Thanks

> Improve FlumeInputDStream's stability and make it scalable
> ----------------------------------------------------------
>
>                 Key: SPARK-2201
>                 URL: https://issues.apache.org/jira/browse/SPARK-2201
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: sunshangchun
>
> Currently:
> FlumeUtils.createStream(ssc, "localhost", port);
> This means that only one flume receiver can work with FlumeInputDStream, so the solution is not scalable.
> I use ZooKeeper to solve this problem: Spark flume receivers register themselves under a ZooKeeper path when started, and a flume agent gets the physical hosts and pushes events to them.
> Some work needs to be done here:
> 1. Receivers create temporary (ephemeral) nodes in ZooKeeper; listeners just watch those nodes.
> 2. When Spark FlumeReceivers start, they acquire a physical host (localhost's IP and an idle port) and register themselves with ZooKeeper.
> 3. A new flume sink: in its appendEvents method, it gets the physical hosts and pushes data to them in a round-robin manner.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
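The three steps in the issue description can be sketched as follows. This is a minimal, hypothetical Python model of the proposed protocol: a plain dict stands in for the ZooKeeper path holding ephemeral nodes (no real ZooKeeper client is used), and `FakeZkRegistry`, `RoundRobinSink`, and `append_events` are illustrative names, not identifiers from the actual patch.

```python
# Sketch of the proposed protocol: receivers register host:port under a
# "path" (here an in-memory dict standing in for ZooKeeper ephemeral
# nodes), and a sink pushes event batches to them round-robin.
import itertools


class FakeZkRegistry:
    """In-memory stand-in for a ZooKeeper path of ephemeral nodes."""

    def __init__(self):
        self.nodes = {}  # node name -> "host:port"

    def register(self, name, host, port):
        # Step 2: a started receiver registers its physical host
        # (its IP and an idle port) under the path.
        self.nodes[name] = f"{host}:{port}"

    def unregister(self, name):
        # Ephemeral nodes vanish when the receiver's session ends.
        self.nodes.pop(name, None)

    def list_hosts(self):
        # Step 1: listeners watch the path and read current children.
        return sorted(self.nodes.values())


class RoundRobinSink:
    """Step 3: a sink that picks registered hosts in round-robin order."""

    def __init__(self, registry):
        self.registry = registry
        self._counter = itertools.count()

    def append_events(self, batch):
        hosts = self.registry.list_hosts()
        if not hosts:
            raise RuntimeError("no receivers registered")
        target = hosts[next(self._counter) % len(hosts)]
        # A real sink would open an Avro connection and send `batch`
        # to `target`; here we just return the chosen host.
        return target


registry = FakeZkRegistry()
registry.register("receiver-0", "10.0.0.1", 4141)
registry.register("receiver-1", "10.0.0.2", 4141)
sink = RoundRobinSink(registry)
targets = [sink.append_events([b"event"]) for _ in range(4)]
print(targets)  # alternates between the two registered receivers
```

If a receiver dies, its ephemeral node disappears (`unregister` above), so subsequent batches are routed only to the surviving receivers, which is what makes the scheme both fault-tolerant and scalable.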