Folks, I want just to pint something out... I didn't had time yet to sort it out and to think enough to give valuable strict explanation of -- event though, intuitively I feel they are a lot ===> need spark people or time to move forward. But here is the thing regarding *flatMap*.
Actually, it looks like (and again intuitively makes sense) that RDD (and of course DStream) aren't monadic and it is reflected in the implementation (and signature) of flatMap. > > * def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U] = ** > new FlatMappedRDD(this, sc.clean(f))* There!? flatMap (or bind, >>=) should take a function that use the same Higher level abstraction in order to be considered as such right? In this case, it takes a function that returns a TraversableOnce which is the type of the content of the RDD, and what represent the output is more the content of the RDD than the RDD itself (still right?). This actually breaks the understand of map and flatMap > *def map[U: ClassTag](f: T => U): RDD[U] = new MappedRDD(this, > sc.clean(f))* Indeed, RDD is a functor and the underlying reason for flatMap to not take A => RDD[B] doesn't show up in map. This has a lot of consequence actually, because at first one might want to create for-comprehension over RDDs, of even Traversable[F[_]] functions like sequence -- and he will get stuck since the signature aren't compliant. More importantly, Scala uses convention on the structure of a type to allow for-comp... so where Traversable[F[_]] will fail on type, for-comp will failed weirdly. Again this signature sounds normal, because my intuitive feeling about RDDs is that they *only can* be monadic but the composition would depend on the use case and might have heavy consequences (unioning the RDDs for instance => this happening behind the sea can be a big pain, since it wouldn't be efficient at all). So Yes, RDD could be monadic but with care. So what exposes this signature is a way to flatMap over the inner value, like it is almost the case for Map (flatMapValues) So, wouldn't be better to rename flatMap as flatMapData (or whatever better name)? Or to have flatMap requiring a Monad instance of RDD? Sorry for the prose, just dropped my thoughts and feelings at once :-/ Cheers, andy PS: and my English maybe, although my name's Andy I'm a native Belgian ^^.