Spark Streaming - Collecting RDDs into array in the driver program

2015-02-25 Thread Thanigai Vellore
I have this function in the driver program which collects the results from
the RDDs (in a stream) into an array and returns it. However, even though the
RDDs (in the DStream) have data, the function returns an empty array. What
am I doing wrong?

I can print the RDD values inside the foreachRDD call but the array is
always empty.

import scala.collection.mutable.ArrayBuffer

def runTopFunction(): Array[(String, Int)] = {
  val topSearches = some function
  val summary = new ArrayBuffer[(String, Int)]()
  // Append each batch's elements to the driver-side buffer
  topSearches.foreachRDD(rdd => {
    summary ++= rdd.collect()
  })

  return summary.toArray
}


Re: Spark Streaming - Collecting RDDs into array in the driver program

2015-02-25 Thread Thanigai Vellore
I didn't include the complete driver code, but I do run the streaming
context from the main program, which calls this function. Again, I can print
the RDD elements within the foreachRDD block, but the array that is returned
is always empty. It appears that the function returns immediately, even
before the foreachRDD stage is executed. Is that possible?
On Feb 25, 2015 5:41 PM, Tathagata Das t...@databricks.com wrote:

 You are just setting up the computation here using foreachRDD. You have not
 even run the streaming context to get any data.
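
To illustrate the point above, here is a minimal sketch of the same idea, not the original poster's code. It assumes a local master, a one-second batch interval, and a hypothetical test stream built with queueStream in place of the elided "some function". The names CollectInDriver and run are made up for the example. foreachRDD only registers the block; the buffer is filled after ssc.start(), so it can be read only once some batches have run.

```scala
import scala.collection.mutable.{ArrayBuffer, Queue}

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CollectInDriver {

  // Hypothetical stand-in for runTopFunction: starts the context itself,
  // waits a few seconds, then returns whatever was collected.
  def run(): Array[(String, Int)] = {
    // Assumed settings: local mode, 1-second batches.
    val conf = new SparkConf().setMaster("local[2]").setAppName("CollectInDriver")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Hypothetical input: one pre-built RDD served as the first batch.
    val queue = new Queue[RDD[(String, Int)]]()
    queue += ssc.sparkContext.parallelize(Seq(("spark", 3), ("scala", 2)))
    val topSearches = ssc.queueStream(queue)

    val summary = new ArrayBuffer[(String, Int)]()

    // This only registers an output operation; nothing executes yet.
    topSearches.foreachRDD { (rdd: RDD[(String, Int)]) =>
      val batch = rdd.collect() // runs on the driver, once per batch
      summary.synchronized { summary ++= batch }
      ()
    }

    // Still empty here: the streaming context has not been started.
    println(s"before start: ${summary.size}")

    ssc.start()                          // batches begin to run
    ssc.awaitTerminationOrTimeout(5000L) // let a few batches complete
    ssc.stop(stopSparkContext = true, stopGracefully = false)

    // Read the buffer only after batches have had a chance to run.
    summary.synchronized { summary.toArray }
  }

  def main(args: Array[String]): Unit = {
    val result = run()
    println(s"after stop: ${result.length}")
  }
}
```

In a long-running job the context would normally not be stopped like this. The usual alternatives are to act on each batch inside foreachRDD, or to keep the buffer in shared driver state that is read later rather than returned from the setup function.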

