Re: Spark Streaming - Collecting RDDs into array in the driver program

2015-02-25 Thread Tobias Pfeiffer
Hi,

On Thu, Feb 26, 2015 at 11:24 AM, Thanigai Vellore 
thanigai.vell...@gmail.com wrote:

 It appears that the function immediately returns even before the
 foreachrdd stage is executed. Is that possible?

Sure, that's exactly what happens. foreachRDD() schedules a computation, it
does not perform it. Maybe your streaming application would not ever
terminate, but still the function needs to return, right?

If you remove the toArray(), you will return a reference to the ArrayBuffer
that will be appended to over time. You can then, in a different thread,
check the contents of that ArrayBuffer as processing happens, or wait until
processing ends.

Tobias


Re: Spark Streaming - Collecting RDDs into array in the driver program

2015-02-25 Thread Thanigai Vellore
I didn't include the complete driver code but I do run the streaming
context from the main program which calls this function. Again, I can print
the red elements within the foreachrdd block but the array that is returned
is always empty. It appears that the function immediately returns even
before the foreachrdd stage is executed. Is that possible?
On Feb 25, 2015 5:41 PM, Tathagata Das t...@databricks.com wrote:

 You are just setting up the computation here using foreacRDD. You have not
 even run the streaming context to get any data.


 On Wed, Feb 25, 2015 at 2:21 PM, Thanigai Vellore 
 thanigai.vell...@gmail.com wrote:

 I have this function in the driver program which collects the result from
 rdds (in a stream) into an array and return. However, even though the RDDs
 (in the dstream) have data, the function is returning an empty array...What
 am I doing wrong?

 I can print the RDD values inside the foreachRDD call but the array is
 always empty.

 def runTopFunction() : Array[(String, Int)] = {
 val topSearches = some function
 val summary = new ArrayBuffer[(String,Int)]()
 topSearches.foreachRDD(rdd = {
 summary = summary.++(rdd.collect())
 })

 return summary.toArray
 }