So after months and months, I finally started to try and tackle this, but
my scala ability isn't up to it.

The problem is that, of course, even with the common interface, we don't
want inter-operability between RDDs and DStreams.

I looked into Monads, as per Ashish's suggestion, and I think I understand
their relevance.  But when done processing, one would still have to pull
out the wrapped object, knowing what it was, and I don't see how to do that.

I'm guessing there is a way to do this in scala, but I'm not seeing it.

In detail, the requirement would be having something on the order of:

abstract class DistributedCollection[T] {
    def [U] map(fcn: T => U): DistributedCollection[U]
    ...
}

class RDD extends DistrubutedCollection[T] {
    // Note the return type that doesn't quite match the interface
    def [U] map(fcn: T => U): RDD[U]
    ...
}

class DStream extends DistrubutedCollection[T] {
    // Note the return type that doesn't quite match the interface
    def [U] map(fcn: T => U): DStreamU]
    ...
}

Can anyone point me at a way to do this?

Thanks,
                 -Nathan



On Thu, Dec 19, 2013 at 1:08 AM, Ashish Rangole <arang...@gmail.com> wrote:

> I wonder if it will help to have a generic Monad container that wraps
> either RDD or DStream and provides
> map, flatmap, foreach and filter methods.
>
> case class DataMonad[A](data: A) {
>     def map[B]( f : A => B ) : DataMonad[B] = {
>        DataMonad( f( data ) )
>     }
>
>     def flatMap[B]( f : A => DataMonad[B] ) : DataMonad[B] = {
>        f( data )
>     }
>
>     def foreach ...
>     def withFilter ...
>     :
>     :
>     etc, something like that
> }
>
> On Wed, Dec 18, 2013 at 10:42 PM, Reynold Xin <r...@apache.org> wrote:
>
>>
>> On Wed, Dec 18, 2013 at 12:17 PM, Nathan Kronenfeld <
>> nkronenf...@oculusinfo.com> wrote:
>>
>>>
>>>
>>> Since many of the functions exist in parallel between the two, I guess I
>>> would expect something like:
>>>
>>> trait BasicRDDFunctions {
>>> def map...
>>> def reduce...
>>> def filter...
>>> def foreach...
>>> }
>>>
>>> class RDD extends  BasicRDDFunctions...
>>> class DStream extends BasicRDDFunctions...
>>>
>>
>> I like this idea. We should discuss more about it on the dev list. It
>> would require refactoring some APIs, but does lead to better unification.
>>
>
>


-- 
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone:  +1-416-203-3003 x 238
Email:  nkronenf...@oculusinfo.com

Reply via email to