[ 
https://issues.apache.org/jira/browse/MAHOUT-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037466#comment-13037466
 ] 

Ted Dunning commented on MAHOUT-668:
------------------------------------

More style: Please make instance variables private where ever possible.  Also, 
weaken types where possible.  Don't use a HashMap if you really only care about 
it being a Map.  Do use the Mahout standard indentation.

Content: Why did you not use the Dictionary class to manage a set of id's?

Content: Isn't there already a cosine distance available?  Likewise, isn't 
there a distance class available?

Content: Your dot product distance is a little odd.  Why use 1/(a \dot b)?  Why 
not use - (a \dot b)?  Are you assuming a and b are normalized?  If so, why not 
use euclidean distance (aka L2)?

Style: Can you provide a wikipedia link for your definition of KL distances?  
Do you mean KL as Kulback-Liebler or Karhonen-Loeve?

Style: Don't comment out code that you want to delete.  Just delete it.

Content: I think that your use of State variables is confused especially in, 
for instance, the MasterVector class.  Why do you do it?  What does it really 
mean?  What is the advantage?

Style: Isn't there a better name for MasterVector?  How about 
IdfCountDictionary or something?

Style: Spell check your javadoc

Style: The javadoc on TestClassifier seems very confusing.

Overall, I think that this needs a lot of work.

> Adding knn support to Mahout classifiers
> ----------------------------------------
>
>                 Key: MAHOUT-668
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-668
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.6
>            Reporter: Daniel McEnnis
>              Labels: classification, knn
>         Attachments: MAHOUT-668.pat, Mahout-668-2.patch, Mahout-668-3.patch, 
> Mahout-668.pat
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Initial implementation of the knn.  This is a minimum base set with many more 
> possible add-ons including support for text and weka input as well as a 
> classify only (no confusion matrix) back end.  The system was tested on the 
> 20 newsgroup data set.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to