Can the distinct transform be applied on a DStream?

2015-03-20 Thread Darren Hoo
val aDstream = ...; val distinctStream = aDstream.transform(_.distinct()), but the elements in distinctStream are not distinct. Did I use it wrong?
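For reference, a minimal, self-contained sketch of the usage being described (the input source, batch interval, and master URL below are assumptions, not taken from the thread). Note that transform applies its function to each batch's RDD separately, so distinct() only deduplicates within a single batch interval:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Hypothetical setup; the thread does not show where aDstream comes from.
    val conf = new SparkConf().setAppName("distinct-on-dstream").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5))
    val aDstream = ssc.socketTextStream("localhost", 9999)

    // transform runs the supplied function on each batch's RDD independently,
    // so distinct() removes duplicates within one batch, not across batches.
    val distinctStream = aDstream.transform(_.distinct())
    distinctStream.print()

    ssc.start()
    ssc.awaitTermination()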

Re: [spark-streaming] can shuffle write to disk be disabled?

2015-03-18 Thread Darren Hoo
Thanks, Shao. On Wed, Mar 18, 2015 at 3:34 PM, Shao, Saisai saisai.s...@intel.com wrote: Yeah, as I said, your job's processing time is much larger than the sliding window, and streaming jobs are executed one by one in sequence, so the next job will wait until the first job is finished, so the
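For context, a minimal sketch of the window/slide setup being discussed (the batch interval, window length, and slide interval are assumptions, not values from the thread). Spark Streaming produces a job every slide interval but runs those jobs sequentially, so when a batch takes longer to process than the slide interval, the following batches queue up:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(
      new SparkConf().setAppName("window-sketch").setMaster("local[2]"),
      Seconds(10))                                          // batch interval (assumed)

    val lines = ssc.socketTextStream("localhost", 9999)
    // Jobs are generated every slide interval (10s here) but executed one at a time;
    // if a job takes longer than 10s to finish, the next one waits in the queue.
    val windowed = lines.window(Seconds(60), Seconds(10))    // window length, slide interval
    windowed.count().print()

    ssc.start()
    ssc.awaitTermination()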

Re: [spark-streaming] can shuffle write to disk be disabled?

2015-03-18 Thread Darren Hoo
On Wed, Mar 18, 2015 at 8:31 PM, Shao, Saisai saisai.s...@intel.com wrote: From the log you pasted, I think this (-rw-r--r-- 1 root root 80K Mar 18 16:54 shuffle_47_519_0.data) is not shuffle-spilled data, but the final shuffle result. Why is the shuffle result written to disk? As I
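For readers wondering what can actually be tuned here, a hedged sketch of Spark 1.x-era shuffle settings (the values and the local directory path are illustrative assumptions; the thread does not show the poster's configuration). Even with spilling disabled, the final map-side output files such as shuffle_*_*.data are still written under spark.local.dir; only the spilling of in-memory aggregation buffers is affected:

    import org.apache.spark.SparkConf

    // Illustrative Spark 1.x settings (values are assumptions, not from the thread).
    val conf = new SparkConf()
      .set("spark.shuffle.spill", "false")         // stop spilling aggregation buffers to disk
      .set("spark.shuffle.memoryFraction", "0.4")  // memory fraction for shuffle aggregation
      .set("spark.local.dir", "/mnt/fast-disk")    // hypothetical path where shuffle files land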

Re: [spark-streaming] can shuffle write to disk be disabled?

2015-03-18 Thread Darren Hoo
is larger than the sliding window, so maybe your computation power cannot reach the QPS you want. I think you need to identify the bottleneck first, and then try to tune your code, balance the data, and add more computation resources. Thanks, Jerry *From:* Darren Hoo
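To make the "balance the data, add more computation resources" advice concrete, a small sketch under assumed values (the partition count, memory, and core settings are not from the thread; the master URL would be supplied via spark-submit):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Illustrative resource settings (standalone mode; all values assumed).
    val conf = new SparkConf()
      .setAppName("tuning-sketch")
      .set("spark.executor.memory", "4g")   // more memory per executor
      .set("spark.cores.max", "16")         // more total cores for the application

    val ssc   = new StreamingContext(conf, Seconds(10))
    val lines = ssc.socketTextStream("localhost", 9999)

    // Repartitioning each batch spreads records over more cores, which can cut
    // per-batch processing time when a few partitions carry most of the data.
    lines.repartition(16).foreachRDD(rdd => println(s"records this batch: ${rdd.count()}"))

    ssc.start()
    ssc.awaitTermination()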