[ 
https://issues.apache.org/jira/browse/MAHOUT-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost updated MAHOUT-85:
-------------------------------

    Attachment: MAHOUT-85.patch

The patch adds tests for the implementation and integrates the additional 
abstraction proposed earlier. The distance measure is not configurable; it 
corresponds to what was defined in the original algorithm formulations.
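
For readers who do not have the two algorithms in mind, here is a minimal 
sketch of the textbook update rules. It is not taken from the patch: the 
class name, the method names and the parameters (rate, threshold, alpha) are 
hypothetical, and the Winnow step shown is the Winnow2 variant, which divides 
weights on a false positive instead of zeroing them.

```java
/** Illustrative only; names and signatures are assumptions, not the patch's API. */
final class OnlineLinearSketch {

    /** Dot product of a weight vector and a feature vector of equal length. */
    static double dot(double[] w, double[] x) {
        double sum = 0.0;
        for (int i = 0; i < w.length; i++) {
            sum += w[i] * x[i];
        }
        return sum;
    }

    /**
     * One perceptron step. Labels are +1 or -1; the prediction is +1 when
     * w.x > 0. On a mistake the weights move additively towards the label.
     * Returns true if the weights were changed.
     */
    static boolean perceptronStep(double[] w, double[] x, int label, double rate) {
        int predicted = dot(w, x) > 0 ? 1 : -1;
        if (predicted == label) {
            return false;
        }
        for (int i = 0; i < w.length; i++) {
            w[i] += rate * label * x[i];
        }
        return true;
    }

    /**
     * One Winnow2 step for binary (0/1) features. The prediction is positive
     * when w.x > threshold. On a mistake only the weights of active features
     * change: multiplied by alpha on a false negative, divided by alpha on a
     * false positive. Returns true if the weights were changed.
     */
    static boolean winnowStep(double[] w, double[] x, boolean label,
                              double threshold, double alpha) {
        boolean predicted = dot(w, x) > threshold;
        if (predicted == label) {
            return false;
        }
        for (int i = 0; i < w.length; i++) {
            if (x[i] > 0) {
                w[i] = label ? w[i] * alpha : w[i] / alpha;
            }
        }
        return true;
    }
}
```

In this sketch perceptron weights would start at zero and Winnow weights at 
one, and a bias can be modelled as a constant feature set to 1. Both steps 
are purely sequential: each update depends on the weights left by the 
previous example.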

The implementation is currently sequential-only. I am still evaluating if and 
how it might be possible to parallelize it.

Missing so far: an example showing how to train, how to store the resulting 
model, and how to apply the model. That should probably be done in a new 
issue to keep this one focused on the algorithm itself. In addition, I still 
have to add links from our wiki to the Wikipedia pages on both algorithms.

(Had some time left during the past few days: Screws in my knee are out now ;) )

> Perceptron/Winnow Trainer
> -------------------------
>
>                 Key: MAHOUT-85
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-85
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>    Affects Versions: 0.1
>            Reporter: Isabel Drost
>            Assignee: Isabel Drost
>             Fix For: 0.3
>
>         Attachments: MAHOUT-85.patch, perceptronWinnowTrainer.diff
>
>
> Please find attached a first sketch for perceptron and winnow training. 
> Please look very, very carefully at the patch, as I added the heart of the 
> algorithms in the emergency room at Charite Berlin (after I broke my leg when 
> cycling to the Hadoop Get Together ;) ). 
> The patch does not yet feature unit tests nor is it parallelised. Currently 
> my plan is to set up an example with the webKb dataset, add unit tests to the 
> code and after that go parallel. I would like to get some feedback early on, 
> in addition I would feel a lot better, if a second and third pair of eyes had 
> a look at the code to make sure all obvious mistakes are out as early as 
> possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
