Here are a few facts worth knowing, mixed with some of my opinion: (1) My take is that Mahout (at this point, at least) doesn't need to support serialization of any specific classes other than Matrix and Vector, because anything else is not algebra.
(2) Most native Scala types, including Scala collections, are already supported by Kryo by default.
(3) We don't want to use Java collections in Scala code as a serialization envelope. Like, ever.
(4) Clearly, a Spark application working with RDDs outside of Mahout's algebraic support may want to use a specific serialization envelope that is neither a matrix nor a standard Scala type/collection. (I'm not sure why it would, but OK.) In this case, the real solution is to provide a way for the application to _decorate_ the default Mahout registrator, rather than hack the registrator itself.

On Tue, Jul 29, 2014 at 10:18 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

> This time it doesn’t seem to be related to registering class serializers.
> It seems like the Scala collections work as well as the Java ones. It would
> still be nice to know when we have to add to that list in
> MahoutKryoRegistrator. When a job fails to serialize, the error message is
> not very helpful.
>
> On Jul 29, 2014, at 9:10 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
> I need to sort each vector inside an rdd.map. The last time I added a
> collection class, Guava’s HashBiMap, I had to add it to the
> MahoutKryoRegistrator.
>
> This time, at first it wouldn't serialize when I used a Scala
> List[Vector.Element], but the problem is that I can’t seem to add the Scala
> List to the MahoutKryoRegistrator because it doesn’t understand the class
> name. So I had to fall back to using Java’s ArrayList, which for some reason
> doesn’t require registering.
>
> What are the rules for when, why, and what we need to register with the
> MahoutKryoRegistrator? Is there a problem with just registering the Scala
> collection library?
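The decoration idea in point (4) can be sketched as follows: the application supplies its own registrator, which first delegates to the default one and then registers its extra envelope classes, leaving the default registrator untouched. This is a minimal, self-contained sketch under stated assumptions. To avoid depending on Spark and Mahout, it models Spark's `KryoRegistrator` (whose method is `registerClasses(kryo: Kryo)`) with a local `Registrator` trait and uses an ordered set of classes in place of a `Kryo` instance. `MatrixStub`, `VectorStub`, `DefaultRegistrator`, `DecoratedRegistrator`, `ScoredItem` and `DecoratorDemo` are all hypothetical names; `DefaultRegistrator` merely stands in for `MahoutKryoRegistrator`.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's KryoRegistrator. In a real job the
// parameter would be a com.esotericsoftware.kryo.Kryo instance, not a set.
trait Registrator {
  def registerClasses(registry: mutable.LinkedHashSet[Class[_]]): Unit
}

// Hypothetical placeholders for Mahout's Matrix and Vector types.
final class MatrixStub
final class VectorStub

// Hypothetical stand-in for MahoutKryoRegistrator: registers only algebra types.
class DefaultRegistrator extends Registrator {
  override def registerClasses(registry: mutable.LinkedHashSet[Class[_]]): Unit = {
    registry += classOf[MatrixStub]
    registry += classOf[VectorStub]
  }
}

// Decorator: wraps an existing registrator and adds application-specific
// classes without modifying the wrapped one.
class DecoratedRegistrator(underlying: Registrator, extra: Seq[Class[_]])
    extends Registrator {
  override def registerClasses(registry: mutable.LinkedHashSet[Class[_]]): Unit = {
    underlying.registerClasses(registry) // keep everything the default registers
    registry ++= extra                   // then add the application's envelope types
  }
}

// Hypothetical application-specific serialization envelope.
final case class ScoredItem(id: Int, score: Double)

object DecoratorDemo {
  // Builds the registry the way an application's decorated registrator would.
  def buildRegistry(): mutable.LinkedHashSet[Class[_]] = {
    val registry = mutable.LinkedHashSet.empty[Class[_]]
    new DecoratedRegistrator(new DefaultRegistrator, Seq(classOf[ScoredItem]))
      .registerClasses(registry)
    registry
  }

  def main(args: Array[String]): Unit =
    println(buildRegistry().map(_.getName).mkString(", "))
}
```

In an actual Spark job, the decorator would instead extend `org.apache.spark.serializer.KryoRegistrator`, call the wrapped registrator's `registerClasses(kryo)` and then `kryo.register(...)` for its own classes, and be named in the `spark.kryo.registrator` setting. Because Spark instantiates registrators by class name, that real class would need a no-argument constructor and would create the wrapped registrator itself rather than take it as a parameter.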