Here are a few useful facts, mixed with my opinion:

(1) My take is that Mahout (at this point at least) doesn't need to support
serialization of any specific classes other than Matrix and Vector, because
anything else is not algebra.

(2) Most native Scala types, including Scala collections, are already
supported by Kryo by default.

(3) We don't want to use Java collections in Scala code as a serialization
envelope. Like, ever.

(4) Clearly, a Spark application working with RDDs outside of Mahout's
algebraic support may want to use a specific serialization envelope which
is neither a matrix nor a standard Scala type/collection. (not sure why it
would -- but ok). In this case the real solution is to provide a way for
the application to _decorate_ the default Mahout registrator, rather than
hack the registrator itself.
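
To illustrate what I mean by decorating in (4), here is a rough sketch, not
a tested implementation. MyEnvelope and MyAppKryoRegistrator are
hypothetical names, and the package of MahoutKryoRegistrator is my
assumption about the current Spark bindings layout, so adjust as needed.

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator
// Assumed package for Mahout's registrator; check it against your Mahout version.
import org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator

// Hypothetical application-specific envelope: neither a matrix nor a
// standard Scala type/collection.
case class MyEnvelope(id: Int, payload: Array[Double])

// Decorator: delegate to Mahout's registrations first, then add the
// application's own classes. The Mahout registrator itself is untouched.
class MyAppKryoRegistrator extends KryoRegistrator {
  private val mahoutRegistrator = new MahoutKryoRegistrator

  override def registerClasses(kryo: Kryo): Unit = {
    mahoutRegistrator.registerClasses(kryo) // Matrix/Vector support
    kryo.register(classOf[MyEnvelope])      // application-specific addition
  }
}
```

The application would then point Spark's "spark.kryo.registrator" setting
at its own class instead of Mahout's. How that setting is best passed
through Mahout's context creation is a separate question this sketch does
not answer.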



On Tue, Jul 29, 2014 at 10:18 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

> This time it doesn’t seem to be related to registering class serializers.
> Seems like the Scala collections work as well as the Java ones. It would
> still be nice to know when we have to add to that list in
> MahoutKryoRegistrator. When a job fails to serialize the message is not
> very helpful.
>
>
> On Jul 29, 2014, at 9:10 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
> I need to sort each vector inside an rdd.map. The last time I added a
> collection class, Guava’s HashBiMap, I had to add it to the
> MahoutKryoRegistrator.
>
> This time at first it wouldn't serialize when I used a Scala
> List[Vector.Element], but the problem is I can’t seem to add the Scala List
> to the MahoutKryoRegistrator because it doesn’t understand the classname. So
> I had to fall back to using Java’s ArrayList, which doesn’t require
> registering for some reason.
>
> What are the rules for when, why, and what we need to register with the
> MahoutKryoRegistrator? Is there a problem with just registering the Scala
> collection library?
>
>
