On Sat, Mar 22, 2014 at 7:45 PM, andy petrella <andy.petre...@gmail.com> wrote:

> Dear,
> I'm pretty much following Pascal's advice, since I've encountered some
> problems with implicits myself (when playing the same kind of game with my
> Neo4J Scala API).
>
> Nevertheless, one remark regarding serialization: the loss of data
> shouldn't occur when implicit typeclasses aren't involved. Of course, using
> typeclasses means that the instance will be chosen at compile time. Without
> them, it will behave like the classical use cases, where the serializer does
> the dirty work at runtime using the object's current (runtime) class :/.
>
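
To make the compile-time point concrete, here is a minimal sketch. All names (Writer, Parent, Child, serialize) are hypothetical and are not taken from Spark, play-json or any other library. It only shows that an implicit typeclass instance is selected from the static type, so a Child seen as a Parent is written without its extra field.

```scala
// Hypothetical names throughout; this is not Spark or play-json API.
object TypeclassSketch {
  class Parent(val name: String)
  class Child(n: String, val toy: String) extends Parent(n)

  // An invariant typeclass: one instance per static type.
  trait Writer[A] { def write(a: A): String }

  implicit val parentWriter: Writer[Parent] = new Writer[Parent] {
    def write(p: Parent): String = s"Parent(${p.name})"
  }
  implicit val childWriter: Writer[Child] = new Writer[Child] {
    def write(c: Child): String = s"Child(${c.name}, ${c.toy})"
  }

  // The instance is resolved by the compiler from the static type A.
  def serialize[A](a: A)(implicit w: Writer[A]): String = w.write(a)

  val child: Child = new Child("rex", "ball")
  val asParent: Parent = child // same object, wider static type

  val viaChild: String = serialize(child)     // resolves childWriter
  val viaParent: String = serialize(asParent) // resolves parentWriter, `toy` is dropped
}
```

Here viaChild keeps the toy field while viaParent does not, although both calls receive the same object.
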
> Now, imho, I'd be interested in having RDD covariant in its content type,
> because I have an API, which I should be able to share with you soon or
> sooner, where we are trying to bind the two worlds (rdd+SparkCtx and
> dstream+StreamingCtx) and also to combine and chain job components.
> In a nutshell, it will be able to define Source, Process and Sink of
> Container of Wagons (Rdds and Dstreams themselves) to compose a Job using a
> (to be defined) DSL.
>

You can't give information like that and stop too soon :)
You know that I've been struggling for some time playing with spark &
scalaz-stream and I'm curious ;)


> So without covariance I cannot for now define a generic noop Sink.
>
> My 0.02c
> Andy
>
> Sent from Tab, sorry for the typos...
> On 22 March 2014 at 17:00, "Pascal Voitot Dev" <pascal.voitot....@gmail.com>
> wrote:
>
> > On Sat, Mar 22, 2014 at 3:45 PM, Michael Armbrust <mich...@databricks.com>
> > wrote:
> >
> > > >
> > > > From my experience, covariance often becomes a pain when dealing with
> > > > serialization/deserialization (I've experienced a few cases while
> > > > developing play-json & datomisca).
> > > > Moreover, if you have implicits, variance often becomes a headache...
> > >
> > >
> > > This is exactly the kind of feedback I was hoping for!  Can you be any
> > > more specific about the kinds of problems you ran into here?
> > >
> >
> > I've been thinking this topic over again since writing my first mail.
> >
> > The problem I was talking about arises when you try to use typeclass
> > converters and make them contravariant/covariant for input/output.
> > Something like:
> >
> > trait Reader[-I, +O] { def read(i: I): O }
> >
> > Doing this, you soon have implicit collisions and philosophical concerns
> > about what it means to serialize/deserialize a Parent class and a Child
> > class...
> >
> > For example, if you have a Reader[I, Dog], you also have a
> > Reader[I, Mammal] by covariance.
> > Then you use this Reader[I, Mammal] to read a Cat because it's a Mammal.
> > But it fails, as the original Reader expects the representation of a full
> > Dog, not only the part of it corresponding to the Mammal...
> >
> > So you see here that the problem lies in the serialization/deserialization
> > mechanism itself.
> >
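
The Dog/Mammal/Cat scenario above can be sketched as follows. This is an illustration under assumptions: the input type is a plain Map[String, String] and the field names ("name", "breed", "indoor") are made up for the example. It covers only the runtime failure, not the implicit collisions mentioned earlier.

```scala
// Hypothetical model and field names, for illustration only.
object ReaderSketch {
  class Mammal(val name: String)
  class Dog(n: String, val breed: String) extends Mammal(n)
  class Cat(n: String, val indoor: Boolean) extends Mammal(n)

  trait Reader[-I, +O] { def read(i: I): O }

  // Expects the full Dog representation: both "name" and "breed".
  val dogReader: Reader[Map[String, String], Dog] =
    new Reader[Map[String, String], Dog] {
      def read(i: Map[String, String]): Dog = new Dog(i("name"), i("breed"))
    }

  // Accepted by the compiler because Reader is covariant in O.
  val mammalReader: Reader[Map[String, String], Mammal] = dogReader

  // The representation of a Cat: it has no "breed" key.
  val catData: Map[String, String] = Map("name" -> "tom", "indoor" -> "true")

  // Type-checks, but the underlying Dog reader throws on the missing key.
  val result: scala.util.Try[Mammal] = scala.util.Try(mammalReader.read(catData))
}
```

The assignment to mammalReader compiles without complaint, and the mismatch only shows up when result turns out to be a Failure.
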
> > In your case, you don't have this kind of concern, as JavaSerializer and
> > KryoSerializer are more about basic marshalling that operates on the
> > low-level class representation, and you don't rely on implicit
> > typeclasses...
> >
> > So let's consider what you really want, RDD[+T] and see whether it will
> > have bad impacts.
> >
> > If you do:
> >
> > val rddChild: RDD[Child] = sc.parallelize(Seq(Child(...), Child(...), ...))
> >
> > If you perform map/reduce ops on this rddChild, when remoting objects,
> > the Spark context will serialize all sequence elements as Child.
> >
> > But if you do that:
> > val rddParent: RDD[Parent] = rddChild
> >
> > If you perform ops on rddParent, I believe that the serializer would
> > serialize elements as Parent elements, certainly losing some data from
> > Child.
> > On the remote node, they would be deserialized as Parent too, so they
> > would no longer be Child elements.
> >
> > So, here, if it works as I say (I'm not sure), it would mean the following:
> > you have created an RDD from some data and, just by invoking covariance,
> > you might have lost data through the remoting mechanism.
> >
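
One way to probe this concern outside Spark is a round trip through plain java.io serialization. This is a sketch only: it uses a covariant List in place of an RDD, the Parent/Child classes are hypothetical, and it assumes (without verifying it here) that Spark's JavaSerializer behaves like ObjectOutputStream in this respect.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical classes; List stands in for a covariant RDD[+T].
object SerializationSketch {
  class Parent(val name: String) extends Serializable
  class Child(n: String, val toy: String) extends Parent(n)

  // Round-trips a value through plain Java serialization.
  def roundTrip(a: AnyRef): AnyRef = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(a)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject() finally in.close()
  }

  val children: List[Child] = List(new Child("rex", "ball"))
  val parents: List[Parent] = children // allowed: List is covariant

  // The stream records each element's runtime class, not the static type.
  val restored: AnyRef = roundTrip(parents)
}
```

In this sketch the restored elements are still Child instances with their toy field, which matches the remark that a runtime-class serializer should not lose data when no typeclass is involved.
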
> > Is it something bad in your opinion? (I'm thinking aloud)
> >
> > Pascal
> >
>
