What do you mean by not distinct? It works for me: [image: Inline image 1]
Code:

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkContext, SparkConf}

val ssc = new StreamingContext(sc, Seconds(1))
val data = ssc.textFileStream("/home/akhld/mobi/localcluster/spark-1/sigmoid/")
val dist = data.transform(_.distinct())
dist.print()
ssc.start()
ssc.awaitTermination()

Thanks
Best Regards

On Fri, Mar 20, 2015 at 11:07 PM, Darren Hoo <darren....@gmail.com> wrote:
> val aDstream = ...
>
> val distinctStream = aDstream.transform(_.distinct())
>
> but the elements in distinctStream are not distinct.
>
> Did I use it wrong?
>
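
One possible reason for the different results, offered as a guess since the original data is not shown: transform applies its function to each batch's RDD separately, so _.distinct() removes duplicates within a batch but not across batches. The sketch below models that in plain Scala without Spark. The object name, the batch contents, and the idea that duplicates span batches are all assumptions made for illustration.

```scala
// Plain-Scala model of a DStream: each inner Seq stands for one batch (one RDD).
// The sample data is hypothetical and does not come from the thread.
object PerBatchDistinct {
  val batches: Seq[Seq[String]] = Seq(
    Seq("a", "b", "a"), // batch 1
    Seq("b", "c", "c")  // batch 2
  )

  // Analogue of aDstream.transform(_.distinct()):
  // distinct runs on every batch independently.
  val perBatch: Seq[Seq[String]] = batches.map(_.distinct)

  // What a stream-wide distinct would give if all batches were seen together.
  val acrossBatches: Seq[String] = batches.flatten.distinct

  def main(args: Array[String]): Unit = {
    println(perBatch)      // "b" still shows up in both batches
    println(acrossBatches) // "b" shows up only once
  }
}
```

If this is what is happening, each printed batch is distinct on its own, while the same element can still appear again in a later batch. Removing duplicates across batches would need some form of state or windowing rather than a per-batch transform.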