Hi,
Thank you for the response.
Can I submit a PR to use transform for all the functions like map, flatMap,
etc. so they are consistent with the other APIs?
Regards,
Madhukara Phatak
http://datamantra.io/
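[Editor's note: a minimal sketch of the proposal above, using a toy Python class rather than the actual Spark Scala source. `MiniDStream` and its method names are hypothetical; micro-batches are plain lists standing in for RDDs.]

```python
# Hypothetical sketch (NOT the actual Spark source): a minimal
# DStream-like class whose map/flatMap are defined in terms of a single
# generic transform, as the proposal suggests.
class MiniDStream:
    def __init__(self, batches):
        # batches: list of micro-batches (stand-ins for RDDs)
        self.batches = batches

    def transform(self, f):
        # apply an arbitrary batch-to-batch (RDD-to-RDD) function
        return MiniDStream([f(b) for b in self.batches])

    # map and flatMap expressed via transform, for consistency
    def map(self, f):
        return self.transform(lambda batch: [f(x) for x in batch])

    def flat_map(self, f):
        return self.transform(lambda batch: [y for x in batch for y in f(x)])

s = MiniDStream([[1, 2], [3]])
print(s.map(lambda x: x * 10).batches)       # [[10, 20], [30]]
print(s.flat_map(lambda x: [x, x]).batches)  # [[1, 1, 2, 2], [3, 3]]
```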
On Tue, Mar 17, 2015 at 2:31 PM, Tathagata Das t...@databricks.com wrote:
That's not super essential, and hence hasn't been done till now. Even in
core Spark there are MappedRDD, etc. even though all of them can be
implemented by MapPartitionedRDD (maybe the name is wrong). So it's nice to
maintain the consistency: MappedDStream creates MappedRDDs. :)
Though this does
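[Editor's note: a sketch of the point above in plain Python, not Spark code: a per-element map can always be expressed as a per-partition operation, the way a generic MapPartitions-style RDD can subsume MappedRDD. The function names here are illustrative.]

```python
# Toy illustration: partitions are plain lists standing in for RDD partitions.
def map_partitions(partitions, f):
    # f takes an iterator over one partition and returns an iterator
    return [list(f(iter(p))) for p in partitions]

def map_via_partitions(partitions, f):
    # a per-element map is just map_partitions with f lifted over the iterator
    return map_partitions(partitions, lambda it: (f(x) for x in it))

print(map_via_partitions([[1, 2], [3]], lambda x: x + 1))  # [[2, 3], [4]]
```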
Hi,
Sorry for the wrong formatting in the earlier mail.
Hi,
Thanks for the response. I understand that part. But I am asking why the
internal implementation uses a subclass when it could use an existing API.
Unless there is a real difference, it feels like a code smell to me.
Regards,
Madhukara Phatak
http://datamantra.io/
On Mon, Mar 16, 2015 at 2:14
It's mostly for legacy reasons. First we had added all the MappedDStream,
etc. and then later we realized we need to expose something that is more
generic for arbitrary RDD-RDD transformations. It can be easily replaced.
However, there is a slight value in having MappedDStream, for developers to
I think these two ways are both OK for writing your streaming job. `transform`
is a more general way to turn one DStream into another when there is no
corresponding DStream API (but there is a corresponding RDD API). But using map
may be more straightforward and easier to understand.
Thanks
Jerry
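[Editor's note: a toy illustration of the point above, not the Spark API. `MiniDStream` is hypothetical; micro-batches are plain lists standing in for RDDs.]

```python
# Toy model: map and transform agree where both exist, and transform also
# covers per-batch operations that have no direct DStream method.
class MiniDStream:
    def __init__(self, batches):
        self.batches = batches  # list of micro-batches (stand-ins for RDDs)

    def map(self, f):
        return MiniDStream([[f(x) for x in b] for b in self.batches])

    def transform(self, f):
        return MiniDStream([f(b) for b in self.batches])

s = MiniDStream([[3, 1], [2]])
# the two styles are equivalent for map:
assert (s.map(lambda x: x * 2).batches
        == s.transform(lambda b: [x * 2 for x in b]).batches)
# transform also handles operations with no direct DStream counterpart,
# e.g. sorting each batch:
print(s.transform(sorted).batches)  # [[1, 3], [2]]
```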