[ https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982278#action_12982278 ]
Sean Owen commented on MAHOUT-510:
----------------------------------
(BTW I'm not committing this for some time.)
I've managed to take out almost all of the JSON usages. The only real one left
is in the Dirichlet implementation, which uses it to serialize a
ModelDistribution and pass it as a String to Hadoop workers via the
Configuration object.
Now, per the issue description, we could redo serialization here to use
Writable. That's not hard, and it makes it possible to write these things out
to HDFS later in a more Hadoop-ish way. But that gives you a serialization
to bytes, not a String. I could Base64-encode it; it's not huge.
That's starting to get a little weird. Is the better answer to look at writing
the ModelDistribution to HDFS? Or just leave this use of JSON?
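For reference, the Writable-plus-Base64 option mentioned above might look
roughly like the sketch below. It is only an illustration under several
assumptions: SimpleWritable is a stand-in with the same two methods as
Hadoop's Writable so the code compiles without Hadoop on the classpath,
ToyModel is a hypothetical placeholder and not Mahout's ModelDistribution,
and java.util.Base64 assumes Java 8 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;

// Assumption: stand-in for org.apache.hadoop.io.Writable (same two methods),
// declared here only so the sketch has no Hadoop dependency.
interface SimpleWritable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical toy model with two fields; not Mahout's ModelDistribution.
final class ToyModel implements SimpleWritable {
  int clusters;
  double alpha;

  ToyModel() {}

  ToyModel(int clusters, double alpha) {
    this.clusters = clusters;
    this.alpha = alpha;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(clusters);
    out.writeDouble(alpha);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    clusters = in.readInt();
    alpha = in.readDouble();
  }
}

// Hypothetical helper: Writable-style object <-> Base64 String.
final class WritableStringCodec {
  private WritableStringCodec() {}

  // Serialize to bytes via write(), then Base64-encode the bytes.
  static String encode(SimpleWritable w) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      w.write(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return Base64.getEncoder().encodeToString(bytes.toByteArray());
  }

  // Base64-decode the String, then populate target via readFields().
  static void decode(String encoded, SimpleWritable target) {
    byte[] raw = Base64.getDecoder().decode(encoded);
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
      target.readFields(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The String returned by encode() is what would be stored with
Configuration.set(key, value) on the driver side and read back with
Configuration.get(key) in the worker before calling decode(). The key name
and where the decode call lives are left open here.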
> Standardize serialization mechanisms
> ------------------------------------
>
> Key: MAHOUT-510
> URL: https://issues.apache.org/jira/browse/MAHOUT-510
> Project: Mahout
> Issue Type: Task
> Affects Versions: 0.4
> Reporter: Sean Owen
> Fix For: 0.5
>
> Attachments: MAHOUT-510.patch
>
>
> At the moment this is tracking a broader concern: to standardize as much as
> possible how we approach serialization. The long-term goal is notionally to
> use the following "encodings" as the input/output of Mahout stuff, and by
> extension, probably internally too.
> - Text
> - Vector Writable
> - (maybe Avro)
> not
> - Serializable
> - GSON / JSON