Re: word count aggregation
For windows that large (1 hour), you will probably also have to increase the batch interval for efficiency.

TD

On Mon, Dec 29, 2014 at 12:16 AM, Akhil Das wrote:
> You can use reduceByKeyAndWindow for that. Here's a pretty clean example
> https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/TwitterPopularTags.scala
>
> Thanks
> Best Regards
>
> On Mon, Dec 29, 2014 at 1:30 PM, Hoai-Thu Vuong wrote:
>> dear user of spark
>>
>> I've got a program, streaming a folder, when a new file is created in this
>> folder, I count a word, which appears in this document and update it (I used
>> StatefulNetworkWordCount to do it). And it work like charm. However, I would
>> like to know the different of top 10 word at now and at time (one hour
>> before). How could I do it? I try to use windowDuration, but it seem not
>> work.
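To make the batch-interval point concrete, here is a rough back-of-the-envelope sketch in plain Python (not Spark code). It only counts how many batches a one-hour window spans for a few candidate batch intervals. The helper name `batches_per_window` is made up for illustration. The check that the window is a whole multiple of the batch interval mirrors the constraint Spark Streaming places on window durations.

```python
def batches_per_window(window_seconds: int, batch_seconds: int) -> int:
    """Return how many batches make up one window (hypothetical helper)."""
    if window_seconds <= 0 or batch_seconds <= 0:
        raise ValueError("durations must be positive")
    # A window has to cover a whole number of batches.
    if window_seconds % batch_seconds != 0:
        raise ValueError("window must be a multiple of the batch interval")
    return window_seconds // batch_seconds


ONE_HOUR = 3600  # the window size discussed in this thread, in seconds

for batch in (1, 10, 60):
    n = batches_per_window(ONE_HOUR, batch)
    print(f"batch interval {batch:>2}s: {n} batches per one-hour window")
```

With a 1-second batch interval the window spans 3600 batches, while a 60-second interval brings that down to 60, which is the kind of saving a larger batch interval is presumably meant to buy here.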
Re: word count aggregation
You can use reduceByKeyAndWindow for that. Here's a pretty clean example
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/TwitterPopularTags.scala

Thanks
Best Regards

On Mon, Dec 29, 2014 at 1:30 PM, Hoai-Thu Vuong wrote:
> dear user of spark
>
> I've got a program, streaming a folder, when a new file is created in this
> folder, I count a word, which appears in this document and update it (I
> used StatefulNetworkWordCount to do it). And it work like charm. However, I
> would like to know the different of top 10 word at now and at time (one
> hour before). How could I do it? I try to use windowDuration, but it seem
> not work.
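For readers who want to see the idea without opening the linked example, here is a minimal plain-Python model of a windowed word count. It is not the Spark API: the class `WindowedWordCount` and the helpers `top_words` and `diff_top` are hypothetical names used only for illustration. The model adds each new batch to the running counts and subtracts the batch that falls out of the window. That is roughly the add-and-inverse pattern reduceByKeyAndWindow supports when it is given an inverse function. It then compares the current top words with an earlier snapshot, which is what the original question asks for.

```python
from collections import Counter, deque


class WindowedWordCount:
    """Word counts over the most recent `window_batches` batches (illustrative model)."""

    def __init__(self, window_batches: int):
        self.window_batches = window_batches
        self.batches = deque()   # per-batch counts still inside the window
        self.counts = Counter()  # aggregated counts for the whole window

    def add_batch(self, words):
        batch = Counter(words)
        self.batches.append(batch)
        self.counts.update(batch)           # "reduce": add the new batch
        if len(self.batches) > self.window_batches:
            expired = self.batches.popleft()
            self.counts.subtract(expired)   # "inverse reduce": remove the old batch
            self.counts = +self.counts      # drop words whose count fell to zero


def top_words(counts, n=10):
    """Top n words by count; ties are broken alphabetically for determinism."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def diff_top(now, before):
    """Words that entered and words that left the top list."""
    entered = [w for w in now if w not in before]
    dropped = [w for w in before if w not in now]
    return entered, dropped


# Tiny demonstration: a window of 2 batches and a top-2 list instead of top 10.
window = WindowedWordCount(window_batches=2)
window.add_batch("spark spark stream".split())
before = top_words(window.counts, n=2)

window.add_batch("kafka kafka kafka stream".split())
window.add_batch("kafka stream stream".split())  # the first batch expires here
now = top_words(window.counts, n=2)

entered, dropped = diff_top(now, before)
print("before:", before, "now:", now)
print("entered:", entered, "dropped:", dropped)
```

In a real job the `before` snapshot would be the top 10 taken one hour earlier, and the window would be expressed as a duration rather than a number of batches.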
word count aggregation
Dear Spark users,

I have a program that streams a folder: when a new file is created in the folder, I count the words that appear in it and update the running counts (I used StatefulNetworkWordCount for this). It works like a charm. However, I would like to know the difference between the top 10 words now and the top 10 words one hour earlier. How could I do that? I tried to use windowDuration, but it does not seem to work.