[ 
https://issues.apache.org/jira/browse/MAHOUT-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898267#action_12898267
 ] 

Isabel Drost edited comment on MAHOUT-479 at 8/13/10 11:06 AM:
---------------------------------------------------------------

Some thoughts that come to my mind:

* Algorithm implementations should be able to rely on getting their input as 
vectors+.
* To make plug'n'play of algorithms easy we need to provide sensible default 
parameters for each implementation (for spectral clustering that would include 
adding a default strategy for computing an affinity matrix from a set of item 
vectors).
* Parameters must be easy to override.
* Integrate with the vector generation classes in mahout.util - should we move 
anything feature-related that is still in core there?
* Need a set of common interfaces for classification algorithms (methods train, 
classify etc. come to mind) so implementations of these can be exchanged easily.
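
To make the points above concrete, here is a minimal sketch of what such a 
common interface could look like. Everything in it is hypothetical and not 
taken from the Mahout code base: the names VectorClassifier, Parameters and 
NearestCentroidClassifier, the "distancePower" parameter, and the use of plain 
double[] arrays in place of Mahout's own vector type are all assumptions made 
for illustration. The nearest-centroid classifier is only a toy stand-in used 
to exercise the interface.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical common contract: every classifier consumes plain feature vectors. */
interface VectorClassifier {
    void train(List<double[]> vectors, List<String> labels);
    String classify(double[] vector);
}

/** Hypothetical parameter holder: sensible defaults that callers may override. */
final class Parameters {
    private final Map<String, Double> values = new HashMap<>();

    Parameters() {
        values.put("distancePower", 2.0); // assumed default: squared Euclidean distance
    }

    /** Overrides a single parameter and returns this object for chaining. */
    Parameters with(String name, double value) {
        values.put(name, value);
        return this;
    }

    double get(String name) {
        return values.get(name);
    }
}

/** Toy nearest-centroid classifier; exists only to show the interface in use. */
final class NearestCentroidClassifier implements VectorClassifier {
    private final Parameters params;
    private final Map<String, double[]> centroids = new HashMap<>();

    NearestCentroidClassifier() { this(new Parameters()); } // defaults only
    NearestCentroidClassifier(Parameters params) { this.params = params; }

    @Override
    public void train(List<double[]> vectors, List<String> labels) {
        centroids.clear();
        Map<String, Integer> counts = new HashMap<>();
        // Sum the vectors of each label, then divide by the per-label count.
        for (int i = 0; i < vectors.size(); i++) {
            double[] v = vectors.get(i);
            double[] sum = centroids.computeIfAbsent(labels.get(i), k -> new double[v.length]);
            for (int j = 0; j < v.length; j++) sum[j] += v[j];
            counts.merge(labels.get(i), 1, Integer::sum);
        }
        for (Map.Entry<String, double[]> e : centroids.entrySet()) {
            double[] c = e.getValue();
            int n = counts.get(e.getKey());
            for (int j = 0; j < c.length; j++) c[j] /= n;
        }
    }

    @Override
    public String classify(double[] vector) {
        double p = params.get("distancePower");
        String best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        // Pick the label whose centroid is closest under sum(|x - c|^p).
        for (Map.Entry<String, double[]> e : centroids.entrySet()) {
            double[] c = e.getValue();
            double d = 0;
            for (int j = 0; j < c.length; j++) d += Math.pow(Math.abs(vector[j] - c[j]), p);
            if (d < bestDist) { bestDist = d; best = e.getKey(); }
        }
        return best;
    }
}

class Demo {
    public static void main(String[] args) {
        List<double[]> x = Arrays.asList(
            new double[] {0, 0}, new double[] {0, 1},
            new double[] {5, 5}, new double[] {6, 5});
        List<String> y = Arrays.asList("a", "a", "b", "b");

        VectorClassifier withDefaults = new NearestCentroidClassifier();
        withDefaults.train(x, y);
        System.out.println(withDefaults.classify(new double[] {1, 1})); // prints "a"

        // Same interface, one parameter overridden.
        VectorClassifier overridden =
            new NearestCentroidClassifier(new Parameters().with("distancePower", 1.0));
        overridden.train(x, y);
        System.out.println(overridden.classify(new double[] {5, 4})); // prints "b"
    }
}
```

Because callers only see VectorClassifier, a different implementation could be 
swapped in without touching the calling code, and the no-argument constructor 
keeps the default case a one-liner.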

I have probably forgotten dozens of other open questions - any input is welcome.

+ This could potentially help with MAHOUT-287; however, I need some help 
understanding the existing code.

> Streamline classification/ clustering data structures
> -----------------------------------------------------
>
>                 Key: MAHOUT-479
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-479
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification, Clustering
>    Affects Versions: 0.1, 0.2, 0.3, 0.4
>            Reporter: Isabel Drost
>
> Opening this JIRA issue to collect ideas on how to streamline our 
> classification and clustering algorithms to make integration for users easier 
> as per mailing list thread http://markmail.org/message/pnzvrqpv5226twfs
> {quote}
> Jake and Robin and I were talking the other evening and a common lament was 
> that our classification (and clustering) stuff was all over the map in terms 
> of data structures.  Driving that to rest and getting those comments even 
> vaguely as plug and play as our much more advanced recommendation components 
> would be very, very helpful.
> {quote}
> This issue probably also relates to MAHOUT-287 (intention there is to make 
> naive bayes run on vectors as input).
> Ted, Jake, Robin: Would be great if one of you could add a comment on 
> some of the issues you discussed "the other evening" and (if applicable) any 
> minor or major changes you think could help solve this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
