Re: MappedStream vs Transform API

2015-03-17 Thread madhu phatak
Hi, Thank you for the response. Can I submit a PR to use transform for all the functions like map, flatMap, etc. so they are consistent with the other APIs? Regards, Madhukara Phatak http://datamantra.io/ On Mon, Mar 16, 2015 at 11:42 PM, Tathagata Das t...@databricks.com wrote: It's mostly for

Re: MappedStream vs Transform API

2015-03-17 Thread madhu phatak
Hi, Regards, Madhukara Phatak http://datamantra.io/ On Tue, Mar 17, 2015 at 2:31 PM, Tathagata Das t...@databricks.com wrote: That's not super essential, and hence hasn't been done till now. Even in core Spark there are MappedRDD, etc. even though all of them can be implemented by

Re: MappedStream vs Transform API

2015-03-17 Thread Tathagata Das
That's not super essential, and hence hasn't been done till now. Even in core Spark there are MappedRDD, etc. even though all of them can be implemented by MapPartitionedRDD (maybe the name is wrong). So it's nice to maintain the consistency: MappedDStream creates MappedRDDs. :) Though this does

Re: MappedStream vs Transform API

2015-03-17 Thread madhu phatak
Hi, Sorry for the wrong formatting in the earlier mail. On Tue, Mar 17, 2015 at 2:31 PM, Tathagata Das t...@databricks.com wrote: That's not super essential, and hence hasn't been done till now. Even in core Spark there are MappedRDD, etc. even though all of them can be implemented by

Re: MappedStream vs Transform API

2015-03-16 Thread madhu phatak
Hi, Thanks for the response. I understand that part. But I am asking why the internal implementation uses a subclass when it can use an existing API? Unless there is a real difference, it feels like a code smell to me. Regards, Madhukara Phatak http://datamantra.io/ On Mon, Mar 16, 2015 at 2:14

Re: MappedStream vs Transform API

2015-03-16 Thread Tathagata Das
It's mostly for legacy reasons. First we added all the MappedDStream, etc., and later we realized we needed to expose something more generic for arbitrary RDD-to-RDD transformations. It can be easily replaced. However, there is a slight value in having MappedDStream, for developers to

RE: MappedStream vs Transform API

2015-03-16 Thread Shao, Saisai
I think both ways are OK for writing a streaming job. `transform` is a more general way to transform one DStream into another when there is no related DStream API (but there is a related RDD API). But using map may be more straightforward and easier to understand. Thanks Jerry
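
The point discussed in this thread, that a per-element map can be expressed through a generic per-batch transform, can be sketched with a small toy model. This is only an illustration in plain Python and not Spark's actual implementation: ToyDStream and ToyMappedDStream are hypothetical names invented here, and a plain list stands in for each batch's RDD.

```python
class ToyDStream:
    """Hypothetical stand-in for a DStream: a sequence of batches.

    Each batch is a plain list playing the role of an RDD.
    """

    def __init__(self, batches):
        self.batches = [list(batch) for batch in batches]

    def transform(self, batch_fn):
        # Generic operation: apply an arbitrary batch-to-batch function
        # to every batch of the stream.
        return ToyDStream(batch_fn(batch) for batch in self.batches)

    def map(self, fn):
        # Per-element map written in terms of transform, the approach
        # proposed in the thread.
        return self.transform(lambda batch: [fn(x) for x in batch])


class ToyMappedDStream(ToyDStream):
    """Dedicated-subclass style: one class per operation."""

    def __init__(self, parent, fn):
        super().__init__([fn(x) for x in batch] for batch in parent.batches)


stream = ToyDStream([[1, 2], [3]])
via_transform = stream.map(lambda x: x * 2)
via_subclass = ToyMappedDStream(stream, lambda x: x * 2)
```

In this toy model both styles yield the same batches; the subclass only adds a named type per operation, which mirrors the consistency argument made in the thread (MappedDStream creating MappedRDDs).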