its definitely unfortunate that the current naming breaks scala's for
comprehension


On Sat, Mar 15, 2014 at 2:15 PM, andy petrella <andy.petre...@gmail.com>wrote:

> [Thanks a *lot* for your answers!]
>
> That's CoOl, a possible example would be to simply write a
> for-comprehension that would do this:
> >
> > val allEvents = for {
> >   deviceId      <- rddFromHdfsOfDeviceId
> >   deviceEvent <- rddFromHdfsOfDeviceEvent(deviceId)
> > } deviceEvent
> > val hist = computeHistOf(allEvents)
>
>
> It's just an example huh ( :-D )
>
> Also here is the answer I was preparing to your previous answer
>
> yes, it's what does
> > And it makes a lot of sense because RDD is """"just"""" an abstraction of
> > the whole sequence of data and distribute it in a resilient fashion.
> > So RDD#flatMap would mean: distribute in a resilient fashion a sequence
> of
> > RDDs before aggregate (or consolidate or what not verb ^^) them all.
> > A example to explain myself maybe ;-) (and maybe also to confirm my own
> > understanding)
> > * data (logical representation)
> >  > [A1, A2, A3, A4, A5, A6, A7, A8, A9, A10]
> > * rdd-ized
> > * > rdd1*[A1, A2, A3] *rdd2*[A4, A5] *rdd3*[A6, A7, A8, A9, A10]
> > * map with f:A=>B
> > * > rdd1'*[B1, B2, B3] *rdd2'*[B4, B5] *rdd3'*[B6, B7, B8, B9, B10]
> > * current flatMap with f: A => Traversable[B]  (*not redistributed I
> > suppose*)
> >  > (intermediate step) *rdd1'*[[B11, B12], [B21] , []] *rdd2'*[[],
> > [B41,B42,B43]] *rdd3'*[[B61], [B71, B72], [B81, B82, B83, B84], [], []]
> >  > (final result)           *rdd1"*[B11, B12, B21] *rdd2**"*[B41,B42,B43]
> > *rdd3**"*[B61, B71, B72, B81, B82, B83, B84]
> > * for expression compliant version of flatMap with f: A => RDD[B]
>  (*could
> > involve some redistributions I presume*)
> >  > (intermediate step) *rdd1'*[*rdd11*[B11] *rdd11*[B12] *rdd12*[B21]]
> > *rdd2'*[*rdd21*[B41] *rdd22*[B42, B43]] *rdd3'*[*rdd31*[B61] *rdd32*[B71,
> > B72], *rdd33*[B81] *rdd34*[B82, B83] *rdd35*[B84]]
> >  > (final result)           *rdd1**"*[B11, B12, B21] *rdd2**"*
> > [B41,B42,B43] *rdd3**"*[B61, B71, B72, B81, B82, B83, B84]
> >
> > As you can see in the example I tried to depict here, the final result
> > that would be seen is the same but in between there might have some
> > redistribution (have a quick look at B81, B82, B83, B84).
> > That's why, imho, it'd interesting to clear this out by either renaming
> > the function -- or even awesomely better making flatMap able to
> > redistribute in-between (which can have a big impact at the
> reconciliation
> > => my first mail :-D).
> > Tell me if I'm completely wrong ;-) -- or if I forget something in my
> > actual understanding -- or if there is something similar already existing
> > and I'm not aware of
> > Cheers,
> > andy
> >
> >
> >
>
> Andy Petrella
> Belgium (Liège)
>
> *       *********
>  Data Engineer in *NextLab <http://nextlab.be/> sprl* (owner)
>  Engaged Citizen Coder for *WAJUG <http://wajug.be/>* (co-founder)
>  Author of *Learning Play! Framework 2
> <http://www.packtpub.com/learning-play-framework-2/book>*
>  Bio: on visify <https://www.vizify.com/es/52c3feec2163aa0010001eaa>
> *       *********
> Mobile: *+32 495 99 11 04*
> Mails:
>
>    - andy.petre...@nextlab.be
>    - andy.petre...@gmail.com
>
> *       *********
> Socials:
>
>    - Twitter: https://twitter.com/#!/noootsab
>    - LinkedIn: http://be.linkedin.com/in/andypetrella
>    - Blogger: http://ska-la.blogspot.com/
>    - GitHub:  https://github.com/andypetrella
>    - Masterbranch: https://masterbranch.com/andy.petrella
>
>
>
> On Sat, Mar 15, 2014 at 7:06 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
> > just going head first without any thinking, it changed flatMap to
> > flatMapData and added a flatMap. for FlatMappedRDD my compute is:
> >
> > firstParent[T].iterator(split, context).flatMap(f andThen
> (_.compute(split,
> > context)))
> >
> >
> > scala> val x = sc.parallelize(1 to 100)
> > scala> x.flatMap _
> > res0: (Int => org.apache.spark.rdd.RDD[Nothing]) =>
> > org.apache.spark.rdd.RDD[Nothing] = <function1>
> >
> > my f for flatMap is now f: T => RDD[U], however, i am not sure how to
> write
> > a useful function for this :)
> >
> >
> >
> > On Sat, Mar 15, 2014 at 1:17 PM, Koert Kuipers <ko...@tresata.com>
> wrote:
> >
> > > MappedRDD does:
> > > firstParent[T].iterator(split, context).map(f)
> > >
> > > and FlatMappedRDD:
> > > firstParent[T].iterator(split, context).flatMap(f)
> > >
> > > do yeah seems like its a map or flatMap over the iterator inside, not
> the
> > > RDD itself, sort of...
> > >
> > >
> > > On Sat, Mar 15, 2014 at 9:08 AM, andy petrella <
> andy.petre...@gmail.com
> > >wrote:
> > >
> > >> Yep,
> > >> Regarding flatMap and an implicit parameter might work like in scala's
> > >> future for instance:
> > >>
> > >>
> >
> https://github.com/scala/scala/blob/master/src/library/scala/concurrent/Future.scala#L246
> > >>
> > >> Dunno, still waiting for some insights from the team ^^
> > >>
> > >> andy
> > >>
> > >> On Wed, Mar 12, 2014 at 3:23 PM, Pascal Voitot Dev <
> > >> pascal.voitot....@gmail.com> wrote:
> > >>
> > >> > On Wed, Mar 12, 2014 at 3:06 PM, andy petrella <
> > andy.petre...@gmail.com
> > >> > >wrote:
> > >> >
> > >> > > Folks,
> > >> > >
> > >> > > I want just to pint something out...
> > >> > > I didn't had time yet to sort it out and to think enough to give
> > >> valuable
> > >> > > strict explanation of -- event though, intuitively I feel they
> are a
> > >> lot
> > >> > > ===> need spark people or time to move forward.
> > >> > > But here is the thing regarding *flatMap*.
> > >> > >
> > >> > > Actually, it looks like (and again intuitively makes sense) that
> RDD
> > >> (and
> > >> > > of course DStream) aren't monadic and it is reflected in the
> > >> > implementation
> > >> > > (and signature) of flatMap.
> > >> > >
> > >> > > >
> > >> > > > *  def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U]
> =
> > **
> > >> > > > new FlatMappedRDD(this, sc.clean(f))*
> > >> > >
> > >> > >
> > >> > > There!? flatMap (or bind, >>=) should take a function that use the
> > >> same
> > >> > > Higher level abstraction in order to be considered as such right?
> > >> > >
> > >> > >
> > >> > I had remarked exactly the same thing and asked myself the same
> > >> question...
> > >> >
> > >> > In this case, it takes a function that returns a TraversableOnce
> which
> > >> is
> > >> > > the type of the content of the RDD, and what represent the output
> is
> > >> more
> > >> > > the content of the RDD than the RDD itself (still right?).
> > >> > >
> > >> > > This actually breaks the understand of map and flatMap
> > >> > >
> > >> > > > *def map[U: ClassTag](f: T => U): RDD[U] = new MappedRDD(this,
> > >> > > > sc.clean(f))*
> > >> > >
> > >> > >
> > >> > > Indeed, RDD is a functor and the underlying reason for flatMap to
> > not
> > >> > take
> > >> > > A => RDD[B] doesn't show up in map.
> > >> > >
> > >> > > This has a lot of consequence actually, because at first one might
> > >> want
> > >> > to
> > >> > > create for-comprehension over RDDs, of even Traversable[F[_]]
> > >> functions
> > >> > > like sequence -- and he will get stuck since the signature aren't
> > >> > > compliant.
> > >> > > More importantly, Scala uses convention on the structure of a type
> > to
> > >> > allow
> > >> > > for-comp... so where Traversable[F[_]] will fail on type, for-comp
> > >> will
> > >> > > failed weirdly.
> > >> > >
> > >> >
> > >> > +1
> > >> >
> > >> >
> > >> > >
> > >> > > Again this signature sounds normal, because my intuitive feeling
> > about
> > >> > RDDs
> > >> > > is that they *only can* be monadic but the composition would
> depend
> > on
> > >> > the
> > >> > > use case and might have heavy consequences (unioning the RDDs for
> > >> > instance
> > >> > > => this happening behind the sea can be a big pain, since it
> > wouldn't
> > >> be
> > >> > > efficient at all).
> > >> > >
> > >> > > So Yes, RDD could be monadic but with care.
> > >> > >
> > >> >
> > >> > At least we can say, it is a Functor...
> > >> > Actually, I had imagined studying the monadic aspect of RDDs but as
> > you
> > >> > said, it's not so easy...
> > >> > So for now, I consider them as pseudo-monadic ;)
> > >> >
> > >> >
> > >> >
> > >> > > So what exposes this signature is a way to flatMap over the inner
> > >> value,
> > >> > > like it is almost the case for Map (flatMapValues)
> > >> > >
> > >> > > So, wouldn't be better to rename flatMap as flatMapData (or
> whatever
> > >> > better
> > >> > > name)? Or to have flatMap requiring a Monad instance of RDD?
> > >> > >
> > >> > >
> > >> > renaming is to flatMapData or flatTraversableMap sounds good to me
> > >> (even if
> > >> > lots of people will hate it...)
> > >> > flatMap requiring a Monad would make it impossible to use with
> > >> > for-comprehension certainly no?
> > >> >
> > >> >
> > >> > > Sorry for the prose, just dropped my thoughts and feelings at once
> > :-/
> > >> > >
> > >> > >
> > >> > I agree with you in case it can help not to feel alone ;)
> > >> >
> > >> > Pascal
> > >> >
> > >> > Cheers,
> > >> > > andy
> > >> > >
> > >> > > PS: and my English maybe, although my name's Andy I'm a native
> > Belgian
> > >> > ^^.
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Reply via email to