What I mean in (4), among other things, is that even the fanciest possible
bidirectional hash map is still a map, i.e. the simplest serialization
envelope for this data is a bag of (key, value) pairs. Java-serializing the
HashBiMap itself is probably the bulkiest thing to do here. A much faster
way is to serialize a Scala iterator over those tuples (a collection
wrapper), in a non-strict way. Obviously, it is also possible to map it to
a strict Scala collection and serialize that (probably shorter notation,
but bigger memory overhead).
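
A rough sketch of the idea, using only the Scala standard library. BiDict
and PairEnvelope are made-up names for illustration (BiDict is a stand-in
for something like Guava's HashBiMap, not a real Mahout or Guava class),
and the actual Kryo/Spark serialization call is left out:

```scala
// Hypothetical stand-in for a bidirectional map: a forward map plus its
// derived inverse. Only the forward pairs need to travel over the wire.
final case class BiDict(forward: Map[String, Int]) {
  val inverse: Map[Int, String] = forward.map(_.swap)
  require(inverse.size == forward.size, "values must be unique")
}

object PairEnvelope {
  // Non-strict envelope: pairs are produced on demand, no extra copy.
  def toPairs(d: BiDict): Iterator[(String, Int)] = d.forward.iterator

  // Strict envelope: shorter at the call site, but copies every pair.
  def toStrictPairs(d: BiDict): Vector[(String, Int)] = d.forward.toVector

  // Receiving side: rebuild the bidirectional structure from the pairs.
  def fromPairs(pairs: Iterator[(String, Int)]): BiDict = BiDict(pairs.toMap)
}
```

Either envelope carries the same information; the inverse direction is
rebuilt on the receiving side by fromPairs rather than being serialized.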


On Tue, Jul 29, 2014 at 10:30 AM, Dmitriy Lyubimov <dlie...@gmail.com>
wrote:

> There are a few facts useful to know also mixed with my opinion:
>
> (1) My take is that Mahout (at this point at least) doesn't need to
> support serialization of any specific classes other than Matrix and
> Vector, because anything else is not algebra.
>
> (2) Most native scala types, including scala collections, are already
> supported by kryo by default.
>
> (3) We don't want to use Java collections in Scala code as a serialization
> envelope. Like, ever.
>
> (4) Clearly, a Spark application working with RDDs outside of Mahout's
> algebraic support may want to use a specific serialization envelope which
> is neither a matrix nor a standard Scala type/collection (not sure why it
> would -- but ok). In this case the real solution is to provide a way for
> the application to _decorate_ the default Mahout registrator, rather than
> hack the registrator itself.
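>
> A rough sketch of such decoration, assuming Mahout's registrator lives at
> org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator. MyEnvelope and
> AppKryoRegistrator are made-up names for the application's own classes:
>
> ```scala
> import com.esotericsoftware.kryo.Kryo
> import org.apache.spark.serializer.KryoRegistrator
> // Assumed package path for the Mahout registrator.
> import org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator
>
> // Hypothetical application-specific envelope type.
> case class MyEnvelope(id: Int, payload: Array[Double])
>
> // Decorator: apply Mahout's registrations first, then add our own.
> class AppKryoRegistrator extends KryoRegistrator {
>   private val mahout = new MahoutKryoRegistrator
>
>   override def registerClasses(kryo: Kryo): Unit = {
>     mahout.registerClasses(kryo)
>     kryo.register(classOf[MyEnvelope])
>   }
> }
> ```
>
> The application would then point spark.kryo.registrator at
> AppKryoRegistrator instead of the Mahout one.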
>
>
>
> On Tue, Jul 29, 2014 at 10:18 AM, Pat Ferrel <p...@occamsmachete.com>
> wrote:
>
>> This time it doesn’t seem to be related to registering class serializers.
>> Seems like the Scala collections work as well as the Java ones. It would
>> still be nice to know when we have to add to that list in
>> MahoutKryoRegistrator. When a job fails to serialize the message is not
>> very helpful.
>>
>>
>> On Jul 29, 2014, at 9:10 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>
>> I need to sort each vector inside an rdd.map. The last time I added
>> a collection class, Guava’s HashBiMap, I had to add it to the
>> MahoutKryoRegistrator.
>>
>> This time, at first it wouldn't serialize when I used a Scala
>> List[Vector.Element], but the problem is I can’t seem to add the Scala List
>> to the MahoutKryoRegistrator because it doesn’t understand the classname. So
>> I had to fall back to using Java’s ArrayList, which doesn’t require
>> registering for some reason.
>>
>> What are the rules for when, why, and what we need to register with the
>> MahoutKryoRegistrator? Is there a problem with just registering the Scala
>> collection library?
>>
>>
>
