subject:"Using rdd methods with Dstream"

Re: Using rdd methods with Dstream

2015-03-14 Thread Laeeq Ahmed

Thanks TD, this is what I was looking for. rdd.context.makeRDD worked. Laeeq On Friday, March 13, 2015 11:08 PM, Tathagata Das t...@databricks.com wrote: Is the number of top K elements you want to keep small? That is, is K small? In which case, you can1. either do it in the

Re: Using rdd methods with Dstream

2015-03-13 Thread Tathagata Das

Is the number of top K elements you want to keep small? That is, is K small? In which case, you can 1. either do it in the driver on the array DStream.foreachRDD ( rdd = { val topK = rdd.top(K) ; // use top K }) 2. Or, you can use the topK to create another RDD using sc.makeRDD

Using rdd methods with Dstream

2015-03-13 Thread Laeeq Ahmed

Hi, I normally use dstream.transform whenever I need to use methods which are available in RDD API but not in streaming API. e.g. dstream.transform(x = x.sortByKey(true)) But there are other RDD methods which return types other than RDD. e.g. dstream.transform(x = x.top(5)) top here returns

Re: Using rdd methods with Dstream

2015-03-13 Thread Sean Owen

Hm, aren't you able to use the SparkContext here? DStream operations happen on the driver. So you can parallelize() the result? take() won't work as it's not the same as top() On Fri, Mar 13, 2015 at 11:23 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Like this?

Re: Using rdd methods with Dstream

2015-03-13 Thread Akhil Das

Like this? dtream.repartition(1).mapPartitions(it = it.take(5)) Thanks Best Regards On Fri, Mar 13, 2015 at 4:11 PM, Laeeq Ahmed laeeqsp...@yahoo.com.invalid wrote: Hi, I normally use dstream.transform whenever I need to use methods which are available in RDD API but not in streaming

Re: Using rdd methods with Dstream

2015-03-13 Thread Laeeq Ahmed

Hi, repartition is expensive. Looking for an efficient to do this. Regards,Laeeq On Friday, March 13, 2015 12:24 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Like this? dtream.repartition(1).mapPartitions(it = it.take(5)) ThanksBest Regards On Fri, Mar 13, 2015 at 4:11 PM,

Re: Using rdd methods with Dstream

2015-03-13 Thread Laeeq Ahmed

Hi, Earlier my code was like follwing but slow due to repartition. I want top K of each window in a stream. val counts = keyAndValues.map(x = math.round(x._3.toDouble)).countByValueAndWindow(Seconds(4), Seconds(4))val topCounts = counts.repartition(1).map(_.swap).transform(rdd =

Re: Using rdd methods with Dstream

Re: Using rdd methods with Dstream

Using rdd methods with Dstream

Re: Using rdd methods with Dstream

Re: Using rdd methods with Dstream

Re: Using rdd methods with Dstream

Re: Using rdd methods with Dstream

7 matches

Site Navigation

Mail list logo

Footer information