What I mean in (4), among other things, is that even the fanciest possible bidirectional hash map is still a map, i.e. the simplest serialization envelope for this data is a bag of (key, value) pairs. Java-serializing the HashBiMap itself is probably the bulkiest thing to do here. A much faster way is to serialize a Scala iterator over those tuples (a collection wrapper), in a non-strict way. Obviously, it is also possible to map it to a strict Scala collection and serialize that (probably shorter notation, but bigger memory overhead).
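To make the "bag of (key, value) pairs" idea concrete, here is a minimal sketch. It is only an illustration under assumptions: BiMap and PairEnvelope are hypothetical names, the key and value types are fixed to String and Int, and a plain java.io data stream stands in for Kryo so the example stays self-contained.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical stand-in for a bidirectional map such as Guava's HashBiMap.
final case class BiMap(forward: Map[String, Int]) {
  // The inverse direction is derived data, so it never needs to be serialized.
  val inverse: Map[Int, String] = forward.map(_.swap)
  // Non-strict view: pairs are produced on demand, nothing is copied.
  def pairs: Iterator[(String, Int)] = forward.iterator
}

// Hypothetical envelope: the map travels as a stream of (key, value) pairs.
object PairEnvelope {
  // Write pairs one at a time; a trailing `false` marks the end, so the
  // number of pairs does not have to be known (or materialized) up front.
  def write(pairs: Iterator[(String, Int)], out: DataOutputStream): Unit = {
    pairs.foreach { case (k, v) =>
      out.writeBoolean(true)
      out.writeUTF(k)
      out.writeInt(v)
    }
    out.writeBoolean(false)
    out.flush()
  }

  // Read the pairs back lazily, stopping at the end marker.
  def read(in: DataInputStream): Iterator[(String, Int)] =
    Iterator
      .continually(in.readBoolean())
      .takeWhile(more => more)
      .map(_ => (in.readUTF(), in.readInt()))
}

object Demo {
  val bi: BiMap = BiMap(Map("a" -> 1, "b" -> 2))

  private val buf = new ByteArrayOutputStream()
  PairEnvelope.write(bi.pairs, new DataOutputStream(buf))

  // The receiving side rebuilds the bidirectional structure from the pairs.
  val restored: BiMap = BiMap(
    PairEnvelope.read(new DataInputStream(new ByteArrayInputStream(buf.toByteArray))).toMap
  )
}
```

The strict alternative would be to call bi.pairs.toVector first and serialize that collection: shorter to write, but it holds a second copy of the data in memory.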

On Tue, Jul 29, 2014 at 10:30 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> There are a few facts useful to know also mixed with my opinion:
>
> (1) My take is Mahout (at this point at least) doesn't need to support
> serialization of any specific classes other than Matrix and Vector, because
> anything else is not algebra.
>
> (2) Most native scala types, including scala collections, are already
> supported by kryo by default.
>
> (3) We don't want to use java collections in scala code as a serialization
> envelope. Like, ever.
>
> (4) Clearly, a Spark application working with RDD outside of Mahout
> algebraic support may want to use a specific serialization envelope which
> is neither matrix nor standard Scala type/collection. (not sure why it
> would -- but ok). In this case the real solution is to provide a way for
> application to _decorate_ default mahout registrator, rather than hack the
> registrator itself.
>
>
>
> On Tue, Jul 29, 2014 at 10:18 AM, Pat Ferrel <p...@occamsmachete.com>
> wrote:
>
>> This time it doesn’t seem to be related to registering class serializers.
>> Seems like the Scala collections work as well as the Java ones. It would
>> still be nice to know when we have to add to that list in
>> MahoutKryoRegistrator. When a job fails to serialize the message is not
>> very helpful.
>>
>>
>> On Jul 29, 2014, at 9:10 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>
>> I need to sort each vector inside an rdd.map. The last time I added
>> a collection class, Guava’s HashBiMap, I had to add it to the
>> MahoutKryoRegistrator.
>>
>> This time at first it wouldn't serialize when I used a Scala
>> List[Vector.Element], but the problem is I can’t seem to add the Scala List
>> to the MahoutKryoRegistrator because it doesn’t understand the classname. So
>> I had to fall back to using Java’s ArrayList, which doesn’t require
>> registering for some reason.
>>
>> What are the rules for when, why, and what we need to register with the
>> MahoutKryoRegistrator? Is there a problem with just registering the Scala
>> collection library?
>>