Re: Saving partial (top 10) DStream windows to hdfs

2015-01-08 Thread Yana Kadiyska
I'm glad you solved this issue but have a followup question for you. Wouldn't Akhil's solution be better for you after all? I run similar computation where a large set of data gets reduced to a much smaller aggregate in an interval. If you do saveAsText without coalescing, I believe you'd get the

Re: Saving partial (top 10) DStream windows to hdfs

2015-01-07 Thread Laeeq Ahmed
Hi, I applied it as fallows:    eegStreams(a) = KafkaUtils.createStream(ssc, zkQuorum, group, Map(args(a) - 1),StorageLevel.MEMORY_AND_DISK_SER).map(_._2)val counts = eegStreams(a).map(x = math.round(x.toDouble)).countByValueAndWindow(Seconds(4), Seconds(4)) val sortedCounts =

Re: Saving partial (top 10) DStream windows to hdfs

2015-01-07 Thread Laeeq Ahmed
Hi, It worked out as this. val topCounts = sortedCounts.transform(rdd = rdd.zipWithIndex().filter(x=x._2 =10)) Regards,Laeeq On Wednesday, January 7, 2015 5:25 PM, Laeeq Ahmed laeeqsp...@yahoo.com.INVALID wrote: Hi Yana, I also think thatval top10 = your_stream.mapPartitions(rdd =

Re: Saving partial (top 10) DStream windows to hdfs

2015-01-07 Thread Akhil Das
Oh yeah. In that case you can simply repartition it into 1 and do mapPartition. val top10 = mysream.repartition(1).mapPartitions(rdd = rdd.take(10)) On 7 Jan 2015 21:08, Laeeq Ahmed laeeqsp...@yahoo.com wrote: Hi, I applied it as fallows: eegStreams(a) = KafkaUtils.createStream(ssc,

Re: Saving partial (top 10) DStream windows to hdfs

2015-01-07 Thread Laeeq Ahmed
Hi Yana, I also think thatval top10 = your_stream.mapPartitions(rdd = rdd.take(10)) will give top 10 from each partition. I will try your code. Regards,Laeeq On Wednesday, January 7, 2015 5:19 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote: My understanding is that  val top10 =

Re: Saving partial (top 10) DStream windows to hdfs

2015-01-06 Thread Akhil Das
You can try something like: *val top10 = your_stream.mapPartitions(rdd = rdd.take(10))* Thanks Best Regards On Mon, Jan 5, 2015 at 11:08 PM, Laeeq Ahmed laeeqsp...@yahoo.com.invalid wrote: Hi, I am counting values in each window and find the top values and want to save only the top 10

Saving partial (top 10) DStream windows to hdfs

2015-01-05 Thread Laeeq Ahmed
Hi, I am counting values in each window and find the top values and want to save only the top 10 frequent values of each window to hdfs rather than all the values. eegStreams(a) = KafkaUtils.createStream(ssc, zkQuorum, group, Map(args(a) - 1),StorageLevel.MEMORY_AND_DISK_SER).map(_._2)val