Cool. I'll look at making the code change in FlumeUtils and generating a pull request.
As far as the use case, the volume of messages we have is currently about 30 MB per second, which may grow beyond what a 1 Gbit network adapter can handle.

- Christophe

On Apr 7, 2014 1:51 PM, "Michael Ernest" <mfern...@cloudera.com> wrote:

> I don't see why not. If one were doing something similar with straight
> Flume, you'd start an agent on each node you care to receive Avro/RPC
> events. In the absence of clearer insight into your use case, I'm puzzling
> just a little over why it's necessary for each Worker to be its own
> receiver, but there's no real objection or concern fueling the puzzlement,
> just curiosity.
>
> On Mon, Apr 7, 2014 at 4:16 PM, Christophe Clapp
> <christo...@christophe.cc> wrote:
>
>> Could it be as simple as changing FlumeUtils to accept a list of
>> host/port pairs on which to start the RPC servers?
>>
>> On 4/7/14, 12:58 PM, Christophe Clapp wrote:
>>
>>> Based on the source code here:
>>> https://github.com/apache/spark/blob/master/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeUtils.scala
>>>
>>> It looks like in its current version, FlumeUtils does not support
>>> starting an Avro RPC server on more than one worker.
>>>
>>> - Christophe
>>>
>>> On 4/7/14, 12:23 PM, Michael Ernest wrote:
>>>
>>>> You can configure your sinks to write to one or more Avro sources in a
>>>> load-balanced configuration.
>>>>
>>>> https://flume.apache.org/FlumeUserGuide.html#flume-sink-processors
>>>>
>>>> mfe
>>>>
>>>> On Mon, Apr 7, 2014 at 3:19 PM, Christophe Clapp
>>>> <christo...@christophe.cc> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> From my testing of Spark Streaming with Flume, it seems that only one
>>>>> of the Spark worker nodes runs a Flume Avro RPC server to receive
>>>>> messages at any given time, as opposed to every Spark worker running
>>>>> an Avro RPC server. Is this the case? Our use case would benefit from
>>>>> balancing the load across Workers because of our volume of messages.
>>>>> We would be using a load balancer in front of the Spark workers
>>>>> running the Avro RPC servers, essentially round-robining the messages
>>>>> across all of them.
>>>>>
>>>>> If this is something that is currently not supported, I'd be
>>>>> interested in contributing to the code to make it happen.
>>>>>
>>>>> - Christophe
>
> --
> Michael Ernest
> Sr. Solutions Consultant
> West Coast
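For reference, the load-balancing sink processor Michael links to is configured on the Flume agent side. A minimal sketch of an agent properties file, assuming two Spark workers each running an Avro RPC receiver (the agent name `a1`, channel `c1`, sink names `k1`/`k2`, hostnames, and ports are all placeholders, not values from this thread):

```
# Two Avro sinks, each pointing at a different Spark worker's
# Avro RPC server (hostnames and ports are placeholders).
a1.sinks = k1 k2
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = spark-worker-1
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = spark-worker-2
a1.sinks.k2.port = 4141

# Sink group that round-robins events across the two sinks.
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.selector = round_robin
```

This covers the Flume side of the load balancing; the question in the thread is the Spark side, i.e. whether FlumeUtils can start a receiver on more than one worker so that such a sink group has multiple targets.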