[ 
https://issues.apache.org/jira/browse/MAHOUT-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037468#comment-13037468
 ] 

Ted Dunning commented on MAHOUT-668:
------------------------------------

What is the overall use case here?  This looks like a loosely motivated 
collection of command-line programs with arbitrary choices of distances and 
methods.

Why doesn't it fit more closely into the standard classifier API?

Is there any roadmap document that describes how to use these classifiers?  
Could a Mahout user familiar with some other kind of classifier guess how to 
use these classes?

What I would much rather see is something that works on Vectors and has a 
well-defined on-disk format for a model.  Then it would be nice to have good, 
fast training code, both parallel and sequential.  The sequential training 
code should emulate on-line training and implement the standard APIs.  You 
should allow old state to be updated and then written back to disk by the 
close method.  Deployment should be possible in a way analogous to how the 
LogisticRegression code does it.  The ModelSerializer should be able to load 
and save this kind of model.  It would be even better if the model itself 
were a Writable.
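
As a rough illustration of the shape described above, here is a minimal 
sketch in plain Java.  Everything in it is an assumption made for 
illustration: the class name KnnModelSketch is hypothetical, plain double[] 
arrays stand in for Vectors, and the class implements no Mahout or Hadoop 
interface.  The write/readFields pair only mirrors the Writable convention 
of serializing to a DataOutput and restoring from a DataInput, and the 
binary layout is invented here.  Squared Euclidean distance and unweighted 
voting are placeholder choices, not a recommendation.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: not a Mahout class, and no Mahout/Hadoop interface is implemented.
final class KnnModelSketch {
  private int k;
  private int numCategories;
  private final List<double[]> examples = new ArrayList<>();
  private final List<Integer> labels = new ArrayList<>();

  KnnModelSketch(int k, int numCategories) {
    this.k = k;
    this.numCategories = numCategories;
  }

  // On-line training: absorb one labeled example at a time.
  void train(int actual, double[] instance) {
    if (actual < 0 || actual >= numCategories) {
      throw new IllegalArgumentException("bad category: " + actual);
    }
    examples.add(instance.clone());
    labels.add(actual);
  }

  // Per category, the fraction of the k nearest stored examples carrying that label.
  double[] classifyFull(double[] instance) {
    int n = examples.size();
    double[] dist = new double[n];
    Integer[] order = new Integer[n];
    for (int i = 0; i < n; i++) {
      dist[i] = squaredDistance(examples.get(i), instance);
      order[i] = i;
    }
    // Sort example indices by distance to the query, nearest first.
    Arrays.sort(order, Comparator.comparingDouble((Integer i) -> dist[i]));
    int used = Math.min(k, n);
    double[] votes = new double[numCategories];
    for (int j = 0; j < used; j++) {
      votes[labels.get(order[j])] += 1.0 / used;
    }
    return votes;
  }

  // Squared Euclidean distance; an arbitrary placeholder for a pluggable measure.
  private static double squaredDistance(double[] a, double[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  // Invented layout: k, numCategories, example count, then (label, length, values) per example.
  void write(DataOutput out) throws IOException {
    out.writeInt(k);
    out.writeInt(numCategories);
    out.writeInt(examples.size());
    for (int i = 0; i < examples.size(); i++) {
      double[] v = examples.get(i);
      out.writeInt(labels.get(i));
      out.writeInt(v.length);
      for (double x : v) {
        out.writeDouble(x);
      }
    }
  }

  // Replaces the current state with whatever write() produced.
  void readFields(DataInput in) throws IOException {
    k = in.readInt();
    numCategories = in.readInt();
    int n = in.readInt();
    examples.clear();
    labels.clear();
    for (int i = 0; i < n; i++) {
      labels.add(in.readInt());
      double[] v = new double[in.readInt()];
      for (int j = 0; j < v.length; j++) {
        v[j] = in.readDouble();
      }
      examples.add(v);
    }
  }
}
```

Because train() only appends to the stored examples, a model restored with 
readFields() can keep training and be written back afterwards, which is the 
update-then-save cycle asked for above.  A real implementation would 
presumably do that write-back from its close method and plug into the 
existing serialization code rather than defining its own layout.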


> Adding knn support to Mahout classifiers
> ----------------------------------------
>
>                 Key: MAHOUT-668
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-668
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.6
>            Reporter: Daniel McEnnis
>              Labels: classification, knn
>         Attachments: MAHOUT-668.pat, Mahout-668-2.patch, Mahout-668-3.patch, 
> Mahout-668.pat
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Initial implementation of the knn.  This is a minimum base set with many more 
> possible add-ons including support for text and weka input as well as a 
> classify only (no confusion matrix) back end.  The system was tested on the 
> 20 newsgroup data set.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
