[ https://issues.apache.org/jira/browse/MAHOUT-510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-510:
-----------------------------

    Attachment: MAHOUT-510.patch

This is my proposed final patch. Highlights:

It removes all use of JSON. That was a stated goal: to reduce the number of
serialization approaches, and this patch reduces it by one.

Dirichlet had used JSON to serialize ModelDistribution, when what it was really
doing was serializing a simple description of the instance as a string. So
there's now a DistributionDescription class, used throughout to encapsulate
these parameters and transport them around as a string.
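To illustrate the idea, here is a minimal sketch of a value class that carries a
distribution's construction parameters and round-trips them through a plain
string. The field names (modelFactory, modelPrototype, prototypeSize), the
comma-separated format and the example class names are assumptions for
illustration only; the DistributionDescription in the patch may look different.

```java
import java.util.Objects;

/**
 * Hypothetical sketch: holds the parameters needed to build a model
 * distribution and encodes them as a single string instead of JSON.
 * Field names and the "factory,prototype,size" format are assumptions.
 */
public final class DistributionDescription {
  private final String modelFactory;   // class name of the distribution (assumed field)
  private final String modelPrototype; // class name of the prototype vector (assumed field)
  private final int prototypeSize;     // dimension of the prototype vector (assumed field)

  public DistributionDescription(String modelFactory, String modelPrototype, int prototypeSize) {
    this.modelFactory = modelFactory;
    this.modelPrototype = modelPrototype;
    this.prototypeSize = prototypeSize;
  }

  public String getModelFactory() { return modelFactory; }
  public String getModelPrototype() { return modelPrototype; }
  public int getPrototypeSize() { return prototypeSize; }

  /** Encodes as "factory,prototype,size"; Java class names never contain commas. */
  @Override
  public String toString() {
    return modelFactory + ',' + modelPrototype + ',' + prototypeSize;
  }

  /** Parses a string previously produced by toString(). */
  public static DistributionDescription fromString(String s) {
    String[] parts = s.split(",");
    if (parts.length != 3) {
      throw new IllegalArgumentException("Expected 3 comma-separated fields: " + s);
    }
    return new DistributionDescription(parts[0], parts[1], Integer.parseInt(parts[2]));
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof DistributionDescription)) {
      return false;
    }
    DistributionDescription other = (DistributionDescription) o;
    return prototypeSize == other.prototypeSize
        && modelFactory.equals(other.modelFactory)
        && modelPrototype.equals(other.modelPrototype);
  }

  @Override
  public int hashCode() {
    return Objects.hash(modelFactory, modelPrototype, prototypeSize);
  }

  public static void main(String[] args) {
    // Hypothetical class names, used only to show the round trip.
    DistributionDescription d = new DistributionDescription(
        "com.example.GaussianModelDistribution", "com.example.DenseVector", 2);
    String encoded = d.toString();
    System.out.println(encoded); // prints: com.example.GaussianModelDistribution,com.example.DenseVector,2
    System.out.println(DistributionDescription.fromString(encoded).equals(d)); // prints: true
  }
}
```

A string like this can be passed through a job configuration or command-line
argument as-is, which is the main attraction over a JSON document for what is
really just a handful of parameters.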

LogisticModelParameters serialized itself with JSON too. This was the only piece
where the right approach wasn't obvious. Its main component,
OnlineLogisticRegression, was already Writable, so I made LMP Writable too and
used that serialization to write it to, and load it from, a stream. At least
it now uses an existing serialization mechanism.
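A rough sketch of that pattern, assuming nothing beyond the JDK: a parameters
object writes its own scalar fields and then delegates to a nested component
that already knows how to serialize itself. WritableLike is a local stand-in
declaring the same two methods as Hadoop's org.apache.hadoop.io.Writable
(write(DataOutput) and readFields(DataInput)); ModelStub stands in for
OnlineLogisticRegression; ModelParameters, its fields and the saveTo/loadFrom
helpers are all hypothetical names, not the patch's actual API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

// Local stand-in with the same methods as org.apache.hadoop.io.Writable,
// so the sketch compiles without Hadoop on the classpath.
interface WritableLike {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical stand-in for an already-Writable model (e.g. OnlineLogisticRegression).
final class ModelStub implements WritableLike {
  double[] weights = new double[0];

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(weights.length); // length prefix, then each weight
    for (double w : weights) {
      out.writeDouble(w);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    int n = in.readInt();
    weights = new double[n];
    for (int i = 0; i < n; i++) {
      weights[i] = in.readDouble();
    }
  }
}

// Hypothetical parameters class: scalar settings plus a nested Writable model.
public final class ModelParameters implements WritableLike {
  String targetVariable = "";
  int numFeatures;
  double lambda;
  ModelStub model = new ModelStub();

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(targetVariable);
    out.writeInt(numFeatures);
    out.writeDouble(lambda);
    model.write(out); // delegate to the nested component's own serialization
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // Fields must be read back in exactly the order they were written.
    targetVariable = in.readUTF();
    numFeatures = in.readInt();
    lambda = in.readDouble();
    model.readFields(in);
  }

  /** Writes this object to any stream by wrapping it in a DataOutputStream. */
  public void saveTo(OutputStream os) throws IOException {
    DataOutputStream out = new DataOutputStream(os);
    write(out);
    out.flush();
  }

  /** Reads an object previously written with saveTo. */
  public static ModelParameters loadFrom(InputStream is) throws IOException {
    ModelParameters p = new ModelParameters();
    p.readFields(new DataInputStream(is));
    return p;
  }

  public static void main(String[] args) throws IOException {
    ModelParameters p = new ModelParameters();
    p.targetVariable = "color";
    p.numFeatures = 20;
    p.lambda = 1.0e-4;
    p.model.weights = new double[] {0.5, -1.25};

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    p.saveTo(bytes);
    ModelParameters q = ModelParameters.loadFrom(new ByteArrayInputStream(bytes.toByteArray()));
    System.out.println(q.targetVariable + " " + q.numFeatures + " " + Arrays.toString(q.model.weights));
    // prints: color 20 [0.5, -1.25]
  }
}
```

The trade-off is that the binary format is not human-readable and is sensitive
to field order, but it reuses the Writable machinery the model already has
rather than adding a second serialization path.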

Tests pass, of course.

The only remaining concern is that some examples on the wiki (and in the book)
will be a little out of date. How big an issue is that, versus wanting to
commit this sometime soon?

Any thoughts on the original concept of this JIRA in the first place?

A nice side effect is that this results in a net decrease of over 1,600 lines of code!

> Standardize serialization mechanisms
> ------------------------------------
>
>                 Key: MAHOUT-510
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-510
>             Project: Mahout
>          Issue Type: Task
>    Affects Versions: 0.4
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>             Fix For: 0.5
>
>         Attachments: MAHOUT-510.patch, MAHOUT-510.patch, MAHOUT-510.patch
>
>
> At the moment this is tracking a broader concern: to standardize as much as 
> possible how we approach serialization. The long-term goal is notionally to 
> use the following "encodings" as the input/output of Mahout stuff, and by 
> extension, probably internally too.
> - Text
> - Vector Writable
> - (maybe Avro)
> and not:
> - Serializable
> - GSON / JSON

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
