Cool. I'll look at making the code change in FlumeUtils and generating a pull request.
As far as the use case, the volume of messages we have is currently about 30 MB per second, which may grow beyond what a 1 Gbit network adapter can handle.

- Christophe

On Apr 7, 2014 1:51 PM, "Michael Ernest" <mfern...@cloudera.com> wrote:

> I don't see why not. If one were doing something similar with straight
> Flume, you'd start an agent on each node you care to receive Avro/RPC
> events. In the absence of clearer insight into your use case, I'm puzzling
> just a little over why it's necessary for each Worker to be its own
> receiver, but there's no real objection or concern fueling the puzzlement,
> just curiosity.
>
> On Mon, Apr 7, 2014 at 4:16 PM, Christophe Clapp
> <christo...@christophe.cc> wrote:
>
>> Could it be as simple as changing FlumeUtils to accept a list of
>> host/port pairs on which to start the RPC servers?
>>
>> On 4/7/14, 12:58 PM, Christophe Clapp wrote:
>>
>>> Based on the source code here:
>>> https://github.com/apache/spark/blob/master/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeUtils.scala
>>>
>>> It looks like in its current version, FlumeUtils does not support
>>> starting an Avro RPC server on more than one worker.
>>>
>>> - Christophe
>>>
>>> On 4/7/14, 12:23 PM, Michael Ernest wrote:
>>>
>>>> You can configure your sinks to write to one or more Avro sources in a
>>>> load-balanced configuration.
>>>>
>>>> https://flume.apache.org/FlumeUserGuide.html#flume-sink-processors
>>>>
>>>> mfe
>>>>
>>>> On Mon, Apr 7, 2014 at 3:19 PM, Christophe Clapp
>>>> <christo...@christophe.cc> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> From my testing of Spark Streaming with Flume, it seems that only one
>>>>> of the Spark worker nodes runs a Flume Avro RPC server to receive
>>>>> messages at any given time, as opposed to every Spark worker running
>>>>> an Avro RPC server. Is this the case? Our use case would benefit from
>>>>> balancing the load across Workers because of our volume of messages.
>>>>> We would be using a load balancer in front of the Spark workers
>>>>> running the Avro RPC servers, essentially round-robining the messages
>>>>> across all of them.
>>>>>
>>>>> If this is something that is currently not supported, I'd be
>>>>> interested in contributing to the code to make it happen.
>>>>>
>>>>> - Christophe
>
> --
> Michael Ernest
> Sr. Solutions Consultant
> West Coast
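For reference, the load-balancing sink processor Michael links to is configured on the Flume agent side. A minimal sketch of an agent properties file, assuming two Spark workers each running an Avro RPC receiver (the agent name `a1`, channel `c1`, sink names `k1`/`k2`, hostnames, and ports are all placeholders, not values from this thread):

```
# Two Avro sinks, each pointing at a different Spark worker's
# Avro RPC server (hostnames and ports are placeholders).
a1.sinks = k1 k2
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = spark-worker-1
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = spark-worker-2
a1.sinks.k2.port = 4141

# Sink group that round-robins events across the two sinks.
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.selector = round_robin
```

This covers the Flume side of the load balancing; the question in the thread is the Spark side, i.e. whether FlumeUtils can start a receiver on more than one worker so that such a sink group has multiple targets.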