It's mostly for legacy reasons. We first added MappedDStream and the other concrete DStream subclasses, and only later realized we needed to expose something more generic for arbitrary RDD-to-RDD transformations. MappedDStream could easily be replaced with a transform-based implementation. However, there is slight value in keeping MappedDStream: it is a small, readable example for developers who want to learn how DStreams work.
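For reference, here is roughly what MappedDStream looks like internally (a simplified sketch of the Spark source; exact modifiers and signatures vary by version):

    import scala.reflect.ClassTag
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Duration, Time}
    import org.apache.spark.streaming.dstream.DStream

    // Each concrete DStream subclass answers three questions: which
    // streams it depends on, how often it slides, and how to compute
    // the RDD for a given batch time.
    class MappedDStream[T: ClassTag, U: ClassTag](
        parent: DStream[T],
        mapFunc: T => U)
      extends DStream[U](parent.ssc) {

      override def dependencies: List[DStream[_]] = List(parent)

      override def slideDuration: Duration = parent.slideDuration

      override def compute(validTime: Time): Option[RDD[U]] = {
        // Ask the parent for its RDD for this batch, then map over it.
        parent.getOrCompute(validTime).map(_.map[U](mapFunc))
      }
    }

(In the real source this class is private[streaming], and ssc / getOrCompute are package-private, so a sketch like this only compiles inside the org.apache.spark.streaming package.)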
TD

On Mon, Mar 16, 2015 at 3:37 AM, madhu phatak <phatak....@gmail.com> wrote:
> Hi,
> Thanks for the response. I understand that part. But I am asking why the
> internal implementation uses a subclass when it can use an existing API.
> Unless there is a real difference, it feels like a code smell to me.
>
> Regards,
> Madhukara Phatak
> http://datamantra.io/
>
> On Mon, Mar 16, 2015 at 2:14 PM, Shao, Saisai <saisai.s...@intel.com> wrote:
>> I think these two ways are both OK for writing a streaming job.
>> `transform` is the more general way to turn one DStream into another
>> when there is no matching DStream API (but there is a matching RDD
>> API). But using map may be more straightforward and easier to
>> understand.
>>
>> Thanks
>> Jerry
>>
>> From: madhu phatak [mailto:phatak....@gmail.com]
>> Sent: Monday, March 16, 2015 4:32 PM
>> To: user@spark.apache.org
>> Subject: MappedStream vs Transform API
>>
>> Hi,
>>
>> The current implementation of the map function in Spark Streaming
>> looks as below:
>>
>>   def map[U: ClassTag](mapFunc: T => U): DStream[U] = {
>>     new MappedDStream(this, context.sparkContext.clean(mapFunc))
>>   }
>>
>> It creates an instance of MappedDStream, which is a subclass of
>> DStream.
>>
>> The same function can also be implemented using the transform API:
>>
>>   def map[U: ClassTag](mapFunc: T => U): DStream[U] =
>>     this.transform(rdd => rdd.map(mapFunc))
>>
>> Both implementations look the same. If they are the same, is there
>> any advantage to having a subclass of DStream? Why can't we just use
>> the transform API?
>>
>> Regards,
>> Madhukara Phatak
>> http://datamantra.io/
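A concrete illustration of Jerry's point above: sorting has an RDD API (sortBy) but no DStream counterpart, so transform is the natural way to apply it to each batch. A minimal sketch, assuming a hypothetical stream of (word, count) pairs:

    import org.apache.spark.streaming.dstream.DStream

    // sortBy exists on RDDs but not on DStreams; transform applies an
    // arbitrary RDD-to-RDD function to the RDD of each batch.
    def sortedByCount(counts: DStream[(String, Int)]): DStream[(String, Int)] =
      counts.transform(rdd => rdd.sortBy(_._2, ascending = false))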