On Sat, Mar 22, 2014 at 7:45 PM, andy petrella <andy.petre...@gmail.com> wrote:
> Dear,
> I'm pretty much following Pascal's advice, since I've myself encountered
> some problems with implicits (when playing the same kind of game with my
> Neo4J Scala API).
>
> Nevertheless, one remark regarding serialization: the loss of data
> shouldn't happen in the case when implicit typeclasses aren't involved. Of
> course, using typeclasses means that the instance will be chosen at
> compile time. Without them it will behave like the classical use cases
> where the serializer will do the dirty work at runtime, using the current
> class :/.
>
> Now, imho, I'd be interested to have RDD covariant on the content type,
> because I have an API that I should be able to share with you soon or
> sooner, where we are trying to bind the two worlds (rdd+SparkCtx and
> dstream+StreamingCtx) and also to combine and chain job components.
> In a nutshell, it will be able to define Source, Process and Sink of
> Containers of Wagons (the RDDs/DStreams themselves) to compose a Job
> using a (to be defined) DSL.

You can't give information like that and stop too soon :) You know that
I've been struggling for some time playing with spark & scalaz-stream and
I'm curious ;) So without covariance I cannot for now define a generic
noop Sink.

> My 0.02c
> Andy
>
> Sent from Tab, sorry for the typos...
>
> On 22 Mar 2014 17:00, "Pascal Voitot Dev" <pascal.voitot....@gmail.com>
> wrote:
>
> > On Sat, Mar 22, 2014 at 3:45 PM, Michael Armbrust
> > <mich...@databricks.com> wrote:
> >
> > > > From my experience, covariance often becomes a pain when dealing with
> > > > serialization/deserialization (I've experienced a few cases while
> > > > developing play-json & datomisca).
> > > > Moreover, if you have implicits, variance often becomes a headache...
> > >
> > > This is exactly the kind of feedback I was hoping for! Can you be any
> > > more specific about the kinds of problems you ran into here?
> >
> > I've been rethinking about this topic after writing my first mail.
> >
> > The problem I was talking about is when you try to use typeclass
> > converters and make them contravariant/covariant for input/output.
> > Something like:
> >
> > trait Reader[-I, +O] { def read(i: I): O }
> >
> > Doing this, you soon have implicit collisions and philosophical concerns
> > about what it means to serialize/deserialize a Parent class and a Child
> > class...
> >
> > For example, if you have a Reader[I, Dog], you also have a
> > Reader[I, Mammal] by covariance.
> > Then you use this Reader[I, Mammal] to read a Cat because it's a Mammal.
> > But it fails, as the original Reader expects the representation of a
> > full Dog, not only the part of it corresponding to the Mammal...
> >
> > So you see here that the problem is in the serialization/deserialization
> > mechanism itself.
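> >
> > To make it concrete, here is a rough, self-contained sketch of the kind
> > of failure I mean. It only uses toy types (nothing here comes from a real
> > library), just to show how covariance interacts with implicit lookup:
> >
> > object CovariantReaderSketch extends App {
> >
> >   trait Reader[-I, +O] { def read(i: I): O }
> >
> >   class Mammal(val name: String)
> >   class Dog(name: String, val breed: String) extends Mammal(name)
> >   class Cat(name: String) extends Mammal(name)
> >
> >   // this reader expects the *full* representation of a Dog: "name;breed"
> >   implicit val dogReader: Reader[String, Dog] = new Reader[String, Dog] {
> >     def read(s: String): Dog = {
> >       val Array(name, breed) = s.split(";") // fails if the breed part is missing
> >       new Dog(name, breed)
> >     }
> >   }
> >
> >   // Reader is covariant in O, so dogReader is also a Reader[String, Mammal],
> >   // and implicit search happily uses it whenever a Mammal reader is wanted
> >   def readMammal(repr: String)(implicit r: Reader[String, Mammal]): Mammal =
> >     r.read(repr)
> >
> >   println(readMammal("rex;labrador").name) // ok, the input really is a full Dog
> >   println(readMammal("felix").name)        // MatchError: a Cat is not a full Dog
> >
> >   // and if a Reader[String, Cat] were also in scope, asking for a
> >   // Reader[String, Mammal] would become ambiguous: the implicit collision part
> > }
> >
> > The covariance silently widens what implicit search will accept, while
> > the reader underneath still assumes the representation of the narrower
> > type.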
> >
> > In your case, you don't have this kind of concern, as JavaSerializer and
> > KryoSerializer are more about basic marshalling that operates on the
> > low-level class representation, and you don't rely on implicit
> > typeclasses...
> >
> > So let's consider what you really want, RDD[+T], and see whether it will
> > have bad impacts.
> >
> > If you do:
> >
> > val rddChild: RDD[Child] = sc.parallelize(Seq(Child(...), Child(...), ...))
> >
> > and you perform map/reduce ops on this rddChild, then when remoting
> > objects, the spark context will serialize all sequence elements as Child.
> >
> > But if you do that:
> >
> > val rddParent: RDD[Parent] = rddChild
> >
> > and you perform ops on rddParent, I believe that the serializer should
> > serialize elements as Parent elements, certainly losing some data from
> > Child.
> > On the remote node, they will be deserialized as Parent too, but they
> > shouldn't be Child elements anymore.
> >
> > So, here, if it works as I say (I'm not sure), it would mean the
> > following: you have created an RDD from some data and, just by invoking
> > covariance, you might have lost data through the remoting mechanism.
> >
> > Is it something bad in your opinion? (I'm thinking aloud)
> >
> > Pascal
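
PS: here is a rough, self-contained sketch of the Parent/Child data-loss
scenario above, in case it helps. Everything in it is a toy stand-in:
Dataset plays the role of a hypothetical covariant RDD[+T] (today's RDD is
invariant), and Writer plays the role of a serializer chosen at compile time
from the static element type; none of it is real Spark code.

object CovariantDataLossSketch extends App {

  class Parent(val id: Int)
  class Child(id: Int, val extra: String) extends Parent(id)

  // toy covariant container, standing in for a covariant RDD[+T]
  case class Dataset[+T](elems: Seq[T])

  // toy serializer typeclass, resolved at compile time from the static type
  trait Writer[T] { def write(t: T): String }

  implicit val parentWriter: Writer[Parent] = new Writer[Parent] {
    def write(p: Parent) = "Parent(" + p.id + ")" // knows nothing about 'extra'
  }
  implicit val childWriter: Writer[Child] = new Writer[Child] {
    def write(c: Child) = "Child(" + c.id + ", " + c.extra + ")"
  }

  // the "remoting" step: the writer is picked from the static element type T
  def ship[T](ds: Dataset[T])(implicit w: Writer[T]): Seq[String] =
    ds.elems.map(e => w.write(e))

  val children: Dataset[Child] = Dataset(Seq(new Child(1, "a"), new Child(2, "b")))
  val parents: Dataset[Parent] = children // legal only because Dataset is covariant

  println(ship(children)) // List(Child(1, a), Child(2, b)) -> full data
  println(ship(parents))  // List(Parent(1), Parent(2))     -> the Child part is gone
}

Of course this only shows the typeclass-style case; if the serializer looks
at the runtime class of each element (as the default Java/Kryo serializers
probably do), the up-cast alone shouldn't lose anything.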