aDstream.transform(_.distinct()) only removes duplicates within each RDD
(that is, within each batch) of the DStream; it does not deduplicate across
the whole DStream. Is that what you're seeing?
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe
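To illustrate the per-batch point above, here is a minimal sketch that simulates the semantics with plain Scala collections, so it runs without Spark. The object name, the sample data, and the fold-based "global" variant are assumptions for illustration only, not code from this thread; each inner Seq stands in for the RDD of one batch interval.

```scala
// Simulation only: no Spark involved. Each inner Seq plays the role of
// the RDD produced for one batch interval (hypothetical sample data).
object DistinctPerBatch {
  val batches: Seq[Seq[Int]] = Seq(Seq(1, 1, 2), Seq(2, 3, 3), Seq(1, 4))

  // Conceptually what transform(_.distinct()) does: dedupe inside each batch.
  // The values 1 and 2 still appear in more than one batch.
  val perBatch: Seq[Seq[Int]] = batches.map(_.distinct)

  // A global dedupe has to carry state (the set of values already seen)
  // from one batch to the next.
  private val folded =
    batches.foldLeft((Set.empty[Int], Vector.empty[Seq[Int]])) {
      case ((seen, out), batch) =>
        val fresh = batch.distinct.filterNot(seen)
        (seen ++ fresh, out :+ fresh)
    }
  val globallyNew: Seq[Seq[Int]] = folded._2

  def main(args: Array[String]): Unit = {
    println(perBatch)
    println(globallyNew)
  }
}
```

In a real streaming job the "seen" set would have to live in Spark-managed state rather than a local fold, and it grows without bound unless old entries are expired.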
What do you mean not distinct?
It works for me:
Code:
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkContext, SparkConf}
val ssc = new StreamingContext(sc, Seconds(1))
val data =
val aDstream = ...
val distinctStream = aDstream.transform(_.distinct())
But the elements in distinctStream are not distinct.
Am I using it wrong?