I think a few things in Spark Streaming need to be clarified:

1. The closure passed to map runs on the executors (the remote side), so modifying the count var only changes the executor-side copies. On the driver you will always see -1.

2. The code inside the foreachRDD closure is executed lazily, once per batch interval, while the if (...) check outside the closure is executed only once, immediately, and never again, so the logic does not behave the way you expect.

3. I don't think you need to check whether any data arrived before applying your transformations. You can apply them to the DStream directly; if no data is injected in a batch interval it is just an empty transformation, with no extra overhead.
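For example, something along these lines is what I mean (just a rough sketch; the socket source, the batch interval and the toUpperCase map are placeholders for your real input and transformation):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // placeholder source; any DStream[String] behaves the same way
    val currentBatch = ssc.socketTextStream("localhost", 9999)

    // point 3: transform unconditionally -- an empty batch just produces an
    // empty RDD, so there is no need for a driver-side "is there data?" guard
    val transformed = currentBatch.map(_.toUpperCase) // stand-in for someOtherTransformation

    // if you still want a per-batch count on the driver, compute it inside
    // foreachRDD: this block runs on the driver once per batch interval, and
    // rdd.count() is an action, so the value is usable right here (unlike a
    // var mutated inside map, which only changes the executor-side copies)
    transformed.foreachRDD { rdd =>
      val n = rdd.count()
      if (n > 0) {
        println(s"batch had $n records")
        // per-batch logic that should only run when data actually arrived
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}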
Thanks
Jerry

-----Original Message-----
From: julyfire [mailto:hellowe...@gmail.com]
Sent: Tuesday, September 09, 2014 4:20 PM
To: u...@spark.incubator.apache.org
Subject: RE: Spark streaming: size of DStream

I'm sorry, there were some errors in my code; here is the updated version:

var count = -1L // a global variable in the main object

val currentBatch = some_DStream
val countDStream = currentBatch.map(o => {
  count = 0L // reset the count variable in each batch
  o
})
countDStream.foreachRDD(rdd => count += rdd.count())

if (count > 0) {
  currentBatch.map(...).someOtherTransformation
}

Two problems:
1. The variable count just keeps accumulating and is never reset in each batch.
2. if (count > 0) is only evaluated once, at the beginning of the program, so the following statement never runs.

Can you all give me some suggestions? Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-size-of-DStream-tp13769p13781.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.