Re: Windowed Operations

2014-10-10 Thread julyfire
Hi Diego, I have the same problem.

// reduce by key in the first window
val w1 = one.reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
w1.count().print()
// reduce by key in the second window, based on the results of the first window
val w2 = w1.reduceByKeyAndWindow(_ + _,
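For reference, a minimal, self-contained sketch of chaining two reduceByKeyAndWindow calls like the snippet above. The socket source, app name, batch interval, and w2's window sizes are made up for illustration; one real constraint worth noting is that a windowed DStream's window and slide durations must be multiples of its parent DStream's slide duration.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object ChainedWindows {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("ChainedWindows").setMaster("local[2]")
        val ssc  = new StreamingContext(conf, Seconds(10))

        // hypothetical source standing in for the real key/value DStream `one`
        val one = ssc.socketTextStream("localhost", 9999)
          .flatMap(_.split(" "))
          .map(word => (word, 1))

        // first window: 30s window, sliding every 10s
        val w1 = one.reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
        w1.count().print()

        // second window built on w1; its window and slide durations must be
        // multiples of w1's slide duration (10s), or the DStream graph is rejected
        val w2 = w1.reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(20))
        w2.count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }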

Re: How to use FlumeInputDStream in spark cluster?

2014-09-24 Thread julyfire
I have tested the example code FlumeEventCount on a standalone cluster, and this is still a problem in Spark 1.1.0, the latest version as of now. Have you solved this issue on your side?

Spark streaming: size of DStream

2014-09-09 Thread julyfire
I want to implement the following logic:

val stream = getFlumeStream() // a DStream
if (size_of_stream > 0)       // if the DStream contains some RDD
  stream.someTransformation

stream.count() can figure out the number of records in each RDD of a DStream, but it returns a DStream[Long] and can't be compared with a
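One way to get that per-batch branch is to move the check inside foreachRDD, where each batch is available as an ordinary RDD. A sketch, reusing the `stream` DStream from the pseudocode above (the transformation is only a placeholder):

    stream.foreachRDD { rdd =>
      if (rdd.take(1).nonEmpty) {           // cheap emptiness check, avoids a full count
        val transformed = rdd.map(identity) // stand-in for someTransformation
        println("batch had " + transformed.count() + " records")
      } else {
        // nothing arrived in this batch duration
      }
    }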

Re: How to profile a spark application

2014-09-09 Thread julyfire
VisualVM is free and is enough in most situations.
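If remote executors need to be inspected as well, one common way to let VisualVM attach is to expose JMX through the executor JVM options. The settings below are only an illustrative, unauthenticated sketch (and a fixed port will collide when several executors share a host):

    import org.apache.spark.SparkConf

    // enable a JMX endpoint on each executor so VisualVM can connect to it
    val conf = new SparkConf()
      .setAppName("ProfiledApp")
      .set("spark.executor.extraJavaOptions",
        "-Dcom.sun.management.jmxremote " +
        "-Dcom.sun.management.jmxremote.port=9999 " +
        "-Dcom.sun.management.jmxremote.authenticate=false " +
        "-Dcom.sun.management.jmxremote.ssl=false")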

RE: Spark streaming: size of DStream

2014-09-09 Thread julyfire
Hi Jerry, thanks for your reply. I use Spark Streaming to receive the Flume stream, and in each batchDuration I need to make a judgement: if the received stream has data, I should do one thing; if there is no data, do the other thing. I thought count() could give me that measure, but it returns
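For what it's worth, the value inside that DStream[Long] can still be read on the driver: count() emits exactly one Long per batch interval, so it can be extracted inside foreachRDD. A sketch:

    stream.count().foreachRDD { rdd =>
      val n = rdd.first()   // record count of this batch, as a plain Long
      if (n > 0) {
        // the batch has data: do something
      } else {
        // empty batch: do the other thing
      }
    }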

RE: Spark streaming: size of DStream

2014-09-09 Thread julyfire
Thanks all. Yes, I did use foreachRDD; the following is my code:

var count = -1L // a global variable in the main object
val currentBatch = some_DStream
val countDStream = currentBatch.map(o => {
  count = 0L // reset the count variable in each batch
  o
})

RE: Spark streaming: size of DStream

2014-09-09 Thread julyfire
I'm sorry, there was an error in my code; here is the update:

var count = -1L // a global variable in the main object
val currentBatch = some_DStream
val countDStream = currentBatch.map(o => {
  count = 0L // reset the count variable in each batch
  o
})
countDStream.foreachRDD(rdd => count
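One caveat about the code above: the function passed to map() runs on the executors, so each task updates its own serialized copy of `count` and the driver-side variable never changes. The body of foreachRDD (outside of RDD operations) does run on the driver, so a per-batch count can be captured there instead; a sketch with an illustrative variable name:

    var lastBatchCount = -1L         // driver-side state
    currentBatch.foreachRDD { rdd =>
      lastBatchCount = rdd.count()   // action runs on the cluster, result returns to the driver
      if (lastBatchCount > 0) {
        // this batch contained data
      }
    }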

RE: Spark streaming: size of DStream

2014-09-09 Thread julyfire
Yes, I agree that transformations can be applied directly on the DStream even when no data is injected in a batch duration. But my situation is: Spark receives the Flume stream continuously, and I use the updateStateByKey function to collect data for a key across several batches; then I will handle the collected data after
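For completeness, a minimal sketch of updateStateByKey collecting values per key across batches; the state type (a Seq of values), the `pairs` DStream, and the checkpoint path are illustrative, but the checkpoint requirement itself is real for stateful streams:

    ssc.checkpoint("/tmp/stateful-checkpoint")   // required for updateStateByKey

    // keep every value seen so far for each key
    def collectValues(newValues: Seq[Int], state: Option[Seq[Int]]): Option[Seq[Int]] =
      Some(state.getOrElse(Seq.empty[Int]) ++ newValues)

    // `pairs` stands in for a DStream[(String, Int)] derived from the Flume stream
    val collected = pairs.updateStateByKey(collectValues _)
    collected.print()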

Spark groupByKey partition out of memory

2014-09-07 Thread julyfire
When a MappedRDD is handled by the groupByKey transformation, tuples with the same key distributed across different worker nodes are collected onto one worker node, say, (K, V1), (K, V2), ..., (K, Vn) -> (K, Seq(V1, V2, ..., Vn)). I want to know whether the value Seq(V1, V2, ..., Vn) of a tuple
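Regarding the memory question: the Seq(V1, V2, ..., Vn) for a key is materialized inside a single task, so a very hot key can exceed that executor's memory. When the downstream computation is an aggregation, reduceByKey (or aggregateByKey) combines values map-side and never builds the full sequence; a small sketch with made-up data:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("GroupVsReduce").setMaster("local[2]"))
    val pairs = sc.parallelize(Seq(("k", 1), ("k", 2), ("k", 3), ("j", 4)))

    // groupByKey: all values of a key end up in one in-memory collection on one node
    val grouped = pairs.groupByKey()        // RDD[(String, Iterable[Int])]

    // reduceByKey: values are combined map-side, so the full sequence is never built
    val summed = pairs.reduceByKey(_ + _)   // RDD[(String, Int)]

    grouped.collect().foreach(println)
    summed.collect().foreach(println)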