Re: Hash Join in Spark

2014-02-03 Thread Aaron Davidson
This method is doing very little. Line 2 constructs the CoGroupedRDD, which will do all the real work. Note that while this cogroup function just groups 2 RDDs together, CoGroupedRDD allows general n-way cogrouping, so it takes a Seq[RDD[(K, _)]] rather than just 2 such key-value RDDs. The rest of
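The n-way cogroup semantics described above can be illustrated without a cluster. This is a minimal plain-Python sketch of what a CoGroupedRDD computes (for each key, one sequence of values per input dataset), not Spark's actual implementation:

```python
from collections import defaultdict

def cogroup(*datasets):
    """N-way cogroup sketch: for each key, collect one list of values
    per input dataset, mirroring the Seq-of-RDDs interface."""
    grouped = defaultdict(lambda: [[] for _ in datasets])
    for i, data in enumerate(datasets):
        for key, value in data:
            grouped[key][i].append(value)
    return dict(grouped)

a = [("x", 1), ("y", 2), ("x", 3)]
b = [("x", "p"), ("z", "q")]
print(cogroup(a, b))
# {'x': [[1, 3], ['p']], 'y': [[2], []], 'z': [[], ['q']]}
```

A key absent from one input still appears in the result with an empty list for that input, which is what makes cogroup a building block for outer joins.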

spark streaming questions

2014-02-03 Thread Liam Stewart
I'm looking at adding spark / shark to our analytics pipeline and would also like to use spark streaming for some incremental computations, but I have some questions about the suitability of spark streaming. Roughly, we have events that are generated by app servers based on user interactions with

writing SparkR reducer functions

2014-02-03 Thread Justin Lent
So I've been struggling with this for a bit now using SparkR. I can't even seem to write a basic mean/median function in R that works when passing it into reduceByKey() for my very simple dataset. I can pass in R's base function 'sum' and it works just fine. Looking in help shows that the
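The likely reason 'sum' works where mean does not: a reduceByKey-style fold applies its function pairwise in an arbitrary grouping, so the function must be associative and commutative. Addition is; a mean is not (the mean of partial means is not the overall mean). The standard workaround is to reduce (sum, count) pairs and divide once at the end. A plain-Python sketch of that pattern (not the SparkR API) follows:

```python
from functools import reduce

values = [4.0, 8.0, 6.0, 2.0]

# Naive pairwise "mean" is not associative, so folding it gives a
# grouping-dependent (wrong) answer in general.
# Associative alternative: carry (running_sum, running_count) pairs.
pairs = [(v, 1) for v in values]
total, count = reduce(lambda p, q: (p[0] + q[0], p[1] + q[1]), pairs)
print(total / count)  # 5.0
```

In Spark terms the same trick is: map each value to a (value, 1) pair, reduceByKey with pairwise addition of both components, then divide sum by count per key.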

ClassNotFoundException: PRCombiner

2014-02-03 Thread Tsai Li Ming
Hi, While running Bagel’s Wikipedia Page Rank example (org.apache.spark.examples.bagel.WikipediaPageRank), it fails at the end with this error: org.apache.spark.SparkException: Job aborted: Task 3.0:4 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException:

Re: ClassNotFoundException: PRCombiner

2014-02-03 Thread Tsai Li Ming
On 4 Feb, 2014, at 10:08 am, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, While running Bagel’s Wikipedia Page Rank example (org.apache.spark.examples.bagel.WikipediaPageRank), it fails at the end with this error: org.apache.spark.SparkException: Job aborted: Task 3.0:4 failed 4

Re: writing SparkR reducer functions

2014-02-03 Thread Justin Lent
after googling around I realize how ridiculous my question is :( being new to Spark, for some reason I thought all of the basic stats functions were implemented in a first-class way out of the box over the mapreduce framework... oops! sorry for the spam :) On Monday, February 3, 2014, Justin

spark errors: Executor X disconnected, so removing it

2014-02-03 Thread emeric
Hello, I am experiencing the following problem with Spark. My application runs properly for very small datasets (6 MB), but fails for datasets beyond 12 MB. With those larger datasets, the main log shows the following errors for all of my executors. The application (launched from sbt command)