[ 
https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982278#action_12982278
 ] 

Sean Owen commented on MAHOUT-510:
----------------------------------

(BTW I'm not committing this for some time.)

I've managed to take out almost all the usages of GSON/JSON. The only real 
remaining usage is in the Dirichlet implementation, which uses it to serialize 
a ModelDistribution and pass it as a String to Hadoop workers via the 
Configuration object.

Now, per the issue description, we could re-do serialization here to use 
Writable. That's not hard, and it makes it possible to write these things out 
to HDFS later in a more Hadoop-ish way. But that gives you a serialization to 
bytes, not a String. I could Base64-encode it; it's not huge.
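
For illustration, the Writable-plus-Base64 route might look roughly like the 
sketch below. It is not taken from the patch. SimpleWritable is a local 
stand-in that mirrors the two methods of Hadoop's Writable so the example runs 
without Hadoop on the classpath, ToyModel is a hypothetical placeholder for a 
ModelDistribution, and java.util.Base64 assumes Java 8 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;

class WritableBase64Sketch {

  /** Stand-in with the same two methods as Hadoop's Writable (assumption, not the real interface). */
  interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
  }

  /** Hypothetical toy model; a real ModelDistribution would carry different fields. */
  static class ToyModel implements SimpleWritable {
    int clusters;
    double alpha;

    ToyModel() {}

    ToyModel(int clusters, double alpha) {
      this.clusters = clusters;
      this.alpha = alpha;
    }

    @Override
    public void write(DataOutput out) throws IOException {
      out.writeInt(clusters);
      out.writeDouble(alpha);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
      clusters = in.readInt();
      alpha = in.readDouble();
    }
  }

  /** Serializes to bytes, then Base64-encodes so the result fits in a String-valued setting. */
  static String encode(SimpleWritable w) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      w.write(out);
      out.flush();
      return Base64.getEncoder().encodeToString(bytes.toByteArray());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reverses encode(): Base64-decodes, then lets the target read its own fields. */
  static void decode(String encoded, SimpleWritable target) {
    try {
      byte[] raw = Base64.getDecoder().decode(encoded);
      target.readFields(new DataInputStream(new ByteArrayInputStream(raw)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String value = encode(new ToyModel(3, 0.5));
    ToyModel restored = new ToyModel();
    decode(value, restored);
    System.out.println(value + " -> " + restored.clusters + ", " + restored.alpha);
  }
}
```

With Hadoop available, the encoded string could be stored with 
Configuration.set(key, value) and read back on the worker with 
Configuration.get(key); the key name would be up to the job.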

That's starting to get a little weird. Is the better answer to write the 
ModelDistribution to HDFS, or just to leave this use of JSON?

> Standardize serialization mechanisms
> ------------------------------------
>
>                 Key: MAHOUT-510
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-510
>             Project: Mahout
>          Issue Type: Task
>    Affects Versions: 0.4
>            Reporter: Sean Owen
>             Fix For: 0.5
>
>         Attachments: MAHOUT-510.patch
>
>
> At the moment this is tracking a broader concern: to standardize as much as 
> possible how we approach serialization. The long-term goal is notionally to 
> use the following "encodings" as the input/output of Mahout stuff, and by 
> extension, probably internally too.
> - Text
> - Vector Writable
> - (maybe Avro)
> not
> - Serializable
> - GSON / JSON

