Re: Contributing to Spark
Ha ha! Nice try, sheepherder! ;-)

On Tue, Apr 8, 2014 at 12:37 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> Shh, maybe I really wanted people to fix that one issue.
>
> On Apr 8, 2014, at 9:34 AM, Aaron Davidson <ilike...@gmail.com> wrote:
>> Matei's link seems to point to a specific starter project as part of the
>> starter list, but here is the list itself:
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20labels%20%3D%20Starter%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)
>>
>> On Mon, Apr 7, 2014 at 10:22 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>> I'd suggest looking for the issues labeled Starter on JIRA. You can find
>>> them here:
>>> https://issues.apache.org/jira/browse/SPARK-1438?jql=project%20%3D%20SPARK%20AND%20labels%20%3D%20Starter%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)
>>>
>>> Matei
>>>
>>> On Apr 7, 2014, at 9:45 PM, Mukesh G <muk...@gmail.com> wrote:
>>>> Hi Sujeet,
>>>>
>>>> Thanks. I went through the website and it looks great. Is there a list
>>>> of items that I can choose from for contribution?
>>>>
>>>> Thanks,
>>>> Mukesh
>>>>
>>>> On Mon, Apr 7, 2014 at 10:14 PM, Sujeet Varakhedi <svarakh...@gopivotal.com> wrote:
>>>>> This is a good place to start:
>>>>> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
>>>>>
>>>>> Sujeet
>>>>>
>>>>> On Mon, Apr 7, 2014 at 9:20 AM, Mukesh G <muk...@gmail.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> How do I contribute to Spark and its associated projects?
>>>>>> Appreciate the help...
>>>>>>
>>>>>> Thanks,
>>>>>> Mukesh

--
Michael Ernest
Sr. Solutions Consultant
West Coast
Re: Spark Streaming and Flume Avro RPC Servers
You can configure your sinks to write to one or more Avro sources in a load-balanced configuration:
https://flume.apache.org/FlumeUserGuide.html#flume-sink-processors

mfe

On Mon, Apr 7, 2014 at 3:19 PM, Christophe Clapp <christo...@christophe.cc> wrote:
> Hi,
>
> From my testing of Spark Streaming with Flume, it seems that only one of
> the Spark worker nodes runs a Flume Avro RPC server to receive messages at
> any given time, as opposed to every Spark worker running an Avro RPC
> server. Is this the case?
>
> Our use case would benefit from balancing the load across workers because
> of our volume of messages. We would be using a load balancer in front of
> the Spark workers running the Avro RPC servers, essentially round-robining
> the messages across all of them.
>
> If this is something that is currently not supported, I'd be interested in
> contributing to the code to make it happen.
>
> - Christophe

--
Michael Ernest
Sr. Solutions Consultant
West Coast
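The sink-processor approach Michael points to can be sketched as a Flume agent configuration with a load-balancing sink group. This is a minimal sketch, not a drop-in config: the agent name (a1), channel, sink names, hostnames, and port are hypothetical placeholders, and each hostname would be a Spark worker running an Avro source/receiver.

```properties
# One agent, two Avro sinks in a round-robin load-balancing group.
# Hostnames, ports, and component names below are illustrative only.
a1.sinks = k1 k2
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.selector = round_robin
a1.sinkgroups.g1.processor.backoff = true

a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = spark-worker-1.example.com
a1.sinks.k1.port = 4141

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = spark-worker-2.example.com
a1.sinks.k2.port = 4141
```

With `backoff = true`, a sink whose downstream Avro source is unreachable is temporarily taken out of the rotation, which matters here because receiver placement on Spark workers is not fixed.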
Re: Spark Streaming and Flume Avro RPC Servers
I don't see why not. If one were doing something similar with straight Flume, you'd start an agent on each node where you care to receive Avro/RPC events. In the absence of clearer insight into your use case, I'm puzzling a little over why each Worker needs to be its own receiver, but there's no real objection or concern fueling the puzzlement, just curiosity.

On Mon, Apr 7, 2014 at 4:16 PM, Christophe Clapp <christo...@christophe.cc> wrote:
> Could it be as simple as just changing FlumeUtils to accept a list of
> host/port number pairs to start the RPC servers on?
>
> On 4/7/14, 12:58 PM, Christophe Clapp wrote:
>> Based on the source code here:
>> https://github.com/apache/spark/blob/master/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeUtils.scala
>>
>> It looks like in its current version, FlumeUtils does not support
>> starting an Avro RPC server on more than one worker.
>>
>> - Christophe

--
Michael Ernest
Sr. Solutions Consultant
West Coast
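One way to get a receiver on several workers without changing FlumeUtils is to call FlumeUtils.createStream once per host/port pair and union the resulting streams. The sketch below assumes a running cluster and that the listed hostnames resolve to Spark workers; the application name, batch interval, and endpoints are hypothetical.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object MultiFlumeReceivers {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MultiFlumeReceivers")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Hypothetical worker addresses. Each createStream call registers one
    // receiver, and Spark schedules each receiver onto a worker, where it
    // opens an Avro RPC server on the given port.
    val endpoints = Seq(("spark-worker-1", 4141), ("spark-worker-2", 4141))

    val streams = endpoints.map { case (host, port) =>
      FlumeUtils.createStream(ssc, host, port,
        StorageLevel.MEMORY_AND_DISK_SER_2)
    }

    // Union the per-receiver streams into a single DStream for processing.
    val events = ssc.union(streams)
    events.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Each Flume agent's load-balancing sink group would then list every (host, port) pair as a separate Avro sink, round-robining events across the receivers.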