I think a few things in Spark Streaming need to be clarified:

1. The closure passed to map runs on the executors (the remote side), so modifying the count var only changes the executor-side copies. On the driver you will always see -1.

2. The code inside the foreachRDD closure is executed lazily, once per batch interval, while the if (...) check outside the closure is executed only once, immediately, and never again, so the logic does not behave the way you expect.

3. I don't think you need to check whether any data arrived before applying your transformations. You can apply them to the DStream directly; if no data is injected in a batch interval it is just an empty transformation, with no extra overhead.
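For example, something along these lines is what I mean (just a rough sketch; the socket source, the batch interval and the toUpperCase map are placeholders for your real input and transformation):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // placeholder source; any DStream[String] behaves the same way
    val currentBatch = ssc.socketTextStream("localhost", 9999)

    // point 3: transform unconditionally -- an empty batch just produces an
    // empty RDD, so there is no need for a driver-side "is there data?" guard
    val transformed = currentBatch.map(_.toUpperCase) // stand-in for someOtherTransformation

    // if you still want a per-batch count on the driver, compute it inside
    // foreachRDD: this block runs on the driver once per batch interval, and
    // rdd.count() is an action, so the value is usable right here (unlike a
    // var mutated inside map, which only changes the executor-side copies)
    transformed.foreachRDD { rdd =>
      val n = rdd.count()
      if (n > 0) {
        println(s"batch had $n records")
        // per-batch logic that should only run when data actually arrived
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}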
Thanks
Jerry

-----Original Message-----
From: julyfire [mailto:hellowe...@gmail.com]
Sent: Tuesday, September 09, 2014 4:20 PM
To: u...@spark.incubator.apache.org
Subject: RE: Spark streaming: size of DStream

I'm sorry, there were some errors in my code; here is the updated version:

var count = -1L // a global variable in the main object

val currentBatch = some_DStream
val countDStream = currentBatch.map(o => {
  count = 0L // reset the count variable in each batch
  o
})
countDStream.foreachRDD(rdd => count += rdd.count())

if (count > 0) {
  currentBatch.map(...).someOtherTransformation
}

Two problems:
1. The variable count just keeps accumulating and is never reset in each batch.
2. if (count > 0) is only evaluated once, at the beginning of the program, so the following statement never runs.

Can you all give me some suggestions? Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-size-of-DStream-tp13769p13781.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.