I believe Kryo serialization uses the runtime class, not the declared class.
We have no issues serializing covariant Scala lists.
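
A minimal sketch of the runtime-class point, under some assumptions: it uses plain Java serialization from the JDK as a stand-in (trying it with Kryo would need the Kryo library on the classpath), and Parent, Child and RuntimeClassDemo are hypothetical names made up for illustration, not anything from Spark.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical classes for illustration only; not part of Spark.
class Parent(val name: String) extends Serializable
class Child(name: String, val toy: String) extends Parent(name)

object RuntimeClassDemo {
  // Round-trip a value through Java serialization (stand-in for a real serializer).
  def roundTrip[A <: AnyRef](value: A): A = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val children: List[Child] = List(new Child("rex", "ball"))
    // Upcast allowed because List is covariant; the elements are untouched.
    val parents: List[Parent] = children
    val restored: List[Parent] = roundTrip(parents)
    restored.head match {
      case c: Child => println(s"still a Child, toy = ${c.toy}")
      case p        => println(s"only a Parent: ${p.name}")
    }
  }
}
```

The upcast to List[Parent] only changes the static type seen by the compiler; the serializer is handed the same objects and inspects their runtime class, so the declared element type should not decide which fields get written.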


On Sat, Mar 22, 2014 at 11:59 AM, Pascal Voitot Dev <
pascal.voitot....@gmail.com> wrote:

> On Sat, Mar 22, 2014 at 3:45 PM, Michael Armbrust <mich...@databricks.com
> >wrote:
>
> > >
> > > From my experience, covariance often becomes a pain when dealing with
> > > serialization/deserialization (I've experienced a few cases while
> > > developing play-json & datomisca).
> > > Moreover, if you have implicits, variance often becomes a headache...
> >
> >
> > This is exactly the kind of feedback I was hoping for!  Can you be any
> > more specific about the kinds of problems you ran into here?
> >
>
> I've been rethinking this topic since writing my first mail.
>
> The problem I was talking about is when you try to use typeclass converters
> and make them contravariant/covariant for input/output. Something like:
>
> trait Reader[-I, +O] { def read(i: I): O }
>
> Doing this, you soon have implicit collisions and philosophical concerns
> about what it means to serialize/deserialize a Parent class and a Child
> class...
>
> For example, if you have a Reader[I, Dog], you also have a Reader[I, Mammal]
> by covariance.
> Then you use this Reader[I, Mammal] to read a Cat because it's a Mammal.
> But it fails as the original Reader expects the representation of a full
> Dog, not only a part of it corresponding to the Mammal...
>
> So you see here that the problem lies in the serialization/deserialization
> mechanism itself.
>
> In your case, you don't have these kinds of concerns, as JavaSerializer and
> KryoSerializer are more about basic marshalling that operates on the
> low-level class representation, and you don't rely on implicit typeclasses...
>
> So let's consider what you really want, RDD[+T] and see whether it will
> have bad impacts.
>
> if you do:
>
> val rddChild: RDD[Child] = sc.parallelize(Seq(Child(...), Child(...), ...))
>
> If you perform map/reduce ops on this rddChild, when remoting objects,
> spark context will serialize all sequence elements as Child.
>
> But if you do that:
> val rddParent: RDD[Parent] = rddChild
>
> If you perform ops on rddParent, I believe that the serializer should
> serialize elements as Parent elements, certainly losing some data from
> Child.
> On the remote node, they will be deserialized as Parent too but they
> shouldn't be Child elements anymore.
>
> So, here, if it works as I say (I'm not sure), it would mean the following:
> you have created an RDD from some data and, just by invoking covariance, you
> might have lost data through the remoting mechanism.
>
> Is it something bad in your opinion? (I'm thinking aloud)
>
> Pascal
>
