Hi,

Thanks for the response. I understand that part. But I am asking why the internal implementation uses a subclass when it could use an existing API. Unless there is a real difference, it feels like a code smell to me.
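To make the comparison concrete, here is a minimal sketch of the two spellings side by side (it assumes Spark streaming on the classpath; the local master, object name, host, and port are placeholders I picked for illustration). As far as I can tell from the source, `transform` is itself backed by a subclass, TransformedDStream, so both spellings end in a DStream subclass whose compute applies the function to each batch RDD:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MapVsTransform {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("MapVsTransform")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Placeholder source; any DStream[String] would do.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Dedicated operator: internally builds a MappedDStream.
    val viaMap = lines.map(_.length)

    // General operator: internally builds a TransformedDStream.
    val viaTransform = lines.transform(rdd => rdd.map(_.length))

    // Both should print identical batches.
    viaMap.print()
    viaTransform.print()

    ssc.start()
    ssc.awaitTermination()
  }
}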
Regards,
Madhukara Phatak
http://datamantra.io/


On Mon, Mar 16, 2015 at 2:14 PM, Shao, Saisai <saisai.s...@intel.com> wrote:

> I think these two ways are both OK for writing a streaming job.
> `transform` is a more general way to go from one DStream to another when
> there is no related DStream API (but there is a related RDD API). Using
> map may be more straightforward and easier to understand, though.
>
> Thanks
> Jerry
>
> From: madhu phatak [mailto:phatak....@gmail.com]
> Sent: Monday, March 16, 2015 4:32 PM
> To: user@spark.apache.org
> Subject: MappedStream vs Transform API
>
> Hi,
>
> The current implementation of the map function in Spark Streaming looks
> like this:
>
>   def map[U: ClassTag](mapFunc: T => U): DStream[U] = {
>     new MappedDStream(this, context.sparkContext.clean(mapFunc))
>   }
>
> It creates an instance of MappedDStream, which is a subclass of DStream.
>
> The same function could also be implemented using the transform API:
>
>   def map[U: ClassTag](mapFunc: T => U): DStream[U] =
>     this.transform(rdd => {
>       rdd.map(mapFunc)
>     })
>
> Both implementations look the same. If they are the same, is there any
> advantage to having a subclass of DStream? Why can't we just use the
> transform API?
>
> Regards,
> Madhukara Phatak
> http://datamantra.io/
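P.S. To make the quoted comparison a bit more concrete, this is roughly what the MappedDStream created by map does per batch. I have paraphrased it from the 1.x source rather than copying verbatim; it lives in the org.apache.spark.streaming.dstream package and calls the package-private getOrCompute, so treat it as an illustration of the internals rather than code you could compile outside Spark:

package org.apache.spark.streaming.dstream

import scala.reflect.ClassTag

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Duration, Time}

class MappedDStream[T: ClassTag, U: ClassTag](
    parent: DStream[T],
    mapFunc: T => U
  ) extends DStream[U](parent.ssc) {

  // The only upstream dependency is the parent stream.
  override def dependencies: List[DStream[_]] = List(parent)

  // Slides exactly as often as the parent does.
  override def slideDuration: Duration = parent.slideDuration

  // Per batch: fetch (or compute) the parent's RDD and map over it.
  override def compute(validTime: Time): Option[RDD[U]] = {
    parent.getOrCompute(validTime).map(_.map[U](mapFunc))
  }
}

The dependencies and slideDuration boilerplate is exactly the sort of thing a generic TransformedDStream also has to supply, which is part of why the two spellings look redundant to me.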