Co-ask. Shannon: we'd be happy if you would help us!
Ted: what do you think about our (Yexi's and my) ideas? Shall we move on to the proposal?

On Fri, May 3, 2013 at 8:10 AM, 姜页希 <yexiji...@gmail.com> wrote:

> Are there any other comments about this issue?
>
>
> 2013/5/2 Shannon Quinn <squ...@gatech.edu>
>
> > This sounds excellent. I'd be happy to assist in unifying the interfaces
> > of the spectral methods in particular.
> >
> >
> > On 5/2/13 3:54 PM, Yu Lee (JIRA) wrote:
> >
> >> [ https://issues.apache.org/jira/browse/MAHOUT-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647841#comment-13647841 ]
> >>
> >> Yu Lee commented on MAHOUT-1177:
> >> --------------------------------
> >>
> >> Hello Robin Anil, Jeff Eastman, Dan Filimon, and Ted Dunning,
> >>
> >> Yexi and I (Yu Lee) are new to the Mahout community. We want to
> >> contribute to the improvement of Mahout by reforming and simplifying the
> >> clustering APIs, per the following link:
> >> https://issues.apache.org/jira/browse/MAHOUT-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644120#comment-13644120
> >>
> >> We have gone through the Mahout clustering code, and we have some
> >> ideas about improving it:
> >>
> >> ==========================================================================
> >> Addressing the problems in the current interface:
> >>
> >> Test cases are missing. For example, in spectral k-means clustering,
> >> the run methods of SpectralKmeansDriver and EigencutsDriver are not tested.
> >>
> >> Documentation is missing for some methods.
> >> For example, in the run method of DirichletDriver, the description of the
> >> parameter 'numModels' is missing; in the run method of SpectralKmeansDriver,
> >> the descriptions of some arguments are missing.
> >>
> >> Some methods lack a specific description of some of their arguments. For
> >> example, in the run method of FuzzyKmeansDriver, the description of the
> >> argument "m" (fuzzification factor) is missing. Although a wiki link
> >> regarding "Clustering Analysis" is given, it is not clear enough.
> >>
> >> --------------------------------------------------------------------------
> >>
> >> Implementing some new clustering algorithms
> >>
> >> Agglomerative hierarchical clustering, which clusters the data points
> >> into a dendrogram, so that users can choose whatever number of clusters
> >> they want. (http://en.wikipedia.org/wiki/Hierarchical_clustering)
> >>
> >> DBSCAN, a density-based clustering method that can identify clusters
> >> with arbitrary shapes and is useful in spatial clustering.
> >> (http://en.wikipedia.org/wiki/DBSCAN)
> >>
> >> --------------------------------------------------------------------------
> >>
> >> Providing a new unified interface
> >>
> >> Currently, each clustering algorithm has its own implementation class with
> >> a different interface (i.e., the run methods in different Drivers have
> >> different argument lists). It would be better to have a unified interface
> >> for executing all available clustering methods. An example interface:
> >>
> >> Clustering-run(input, output, methodClass, clusteringConfig)
> >>
> >> Here, "methodClass" indicates a specific clustering method, while
> >> "clusteringConfig" holds the configuration for that specific clustering
> >> method.
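The unified entry point proposed above, Clustering-run(input, output, methodClass, clusteringConfig), could be sketched in Java roughly as follows. This is a hypothetical illustration, not existing Mahout code: `Clustering`, `ClusteringMethod`, `ClusteringConfig` and `KMeansStub` are made-up names, paths are plain strings rather than Hadoop `Path`s, and having each method return a one-line diagnostics summary is an assumption made to keep the sketch self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the proposed unified entry point
 * Clustering-run(input, output, methodClass, clusteringConfig).
 * None of these types exist in Mahout; all names are illustrative.
 */
public final class Clustering {

  /** Method-specific settings, e.g. "k" for k-means or "m" for fuzzy k-means. */
  public static final class ClusteringConfig {
    private final Map<String, String> params = new HashMap<>();

    public ClusteringConfig set(String key, String value) {
      params.put(key, value);
      return this; // allow chained calls
    }

    public String get(String key) {
      String value = params.get(key);
      if (value == null) {
        // Fail fast with one uniform error instead of a per-driver one.
        throw new IllegalArgumentException("missing parameter: " + key);
      }
      return value;
    }
  }

  /** The single contract every algorithm implements; returns a short diagnostics summary. */
  public interface ClusteringMethod {
    String cluster(String input, String output, ClusteringConfig config);
  }

  /** Hypothetical stand-in for a real k-means driver: validates "k" and reports what it would do. */
  public static final class KMeansStub implements ClusteringMethod {
    @Override
    public String cluster(String input, String output, ClusteringConfig config) {
      int k = Integer.parseInt(config.get("k"));
      return "k-means: k=" + k + ", " + input + " -> " + output;
    }
  }

  private Clustering() {}

  /** Single entry point: instantiate the chosen method reflectively and delegate to it. */
  public static String run(String input, String output,
                           Class<? extends ClusteringMethod> methodClass,
                           ClusteringConfig config) {
    ClusteringMethod method;
    try {
      method = methodClass.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("cannot instantiate " + methodClass.getName(), e);
    }
    return method.cluster(input, output, config);
  }
}
```

For example, `Clustering.run("in/vectors", "out/clusters", Clustering.KMeansStub.class, new Clustering.ClusteringConfig().set("k", "3"))` returns `"k-means: k=3, in/vectors -> out/clusters"`. Instantiating the method from `methodClass` keeps the caller's code identical for every algorithm, while per-algorithm parameters such as `numModels` (Dirichlet) or `m` (fuzzy k-means) move into the config object, where they could be documented and validated in one place.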
> >>
> >> ==========================================================================
> >>
> >> Could you please let us know what you think about our ideas?
> >>
> >>
> >>> GSOC 2013: Reform and simplify the clustering APIs
> >>> --------------------------------------------------
> >>>
> >>> Key: MAHOUT-1177
> >>> URL: https://issues.apache.org/jira/browse/MAHOUT-1177
> >>> Project: Mahout
> >>> Issue Type: Improvement
> >>> Reporter: Dan Filimon
> >>> Labels: gsoc2013, mentor
> >>>
> >>> Clustering is one of the most used features in Mahout and has many
> >>> applications [http://en.wikipedia.org/wiki/Cluster_analysis#Applications].
> >>> We have lots of clustering algorithms. There's:
> >>> - basic k-means
> >>> - canopy clustering
> >>> - Dirichlet clustering
> >>> - Fuzzy k-means
> >>> - Spectral k-means
> >>> - Streaming k-means [coming soon]
> >>> We want to make them easier to use by updating the APIs and making sure
> >>> they all work in the same way and have consistent inputs, outputs,
> >>> diagnostics and documentation.
> >>> This is a great way to gain an in-depth understanding of clustering
> >>> algorithms and to familiarize yourself with Hadoop, Mahout clustering and
> >>> good software engineering principles.
> >>>
> >> --
> >> This message is automatically generated by JIRA.
> >> If you think it was sent incorrectly, please contact your JIRA
> >> administrators.
> >> For more information on JIRA, see: http://www.atlassian.com/software/jira
> >>
>
>
> --
> ------
> Yexi Jiang,
> ECS 251, yjian...@cs.fiu.edu
> School of Computer and Information Science,
> Florida International University
> Homepage: http://users.cis.fiu.edu/~yjian004/
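DBSCAN, one of the new algorithms proposed in the thread, is compact enough to sketch in a few dozen lines. The following is a minimal single-machine sketch, not a Mahout or MapReduce implementation: the class name `Dbscan` is hypothetical, `eps` and `minPts` follow the usual names from the original DBSCAN paper, a point's neighbourhood counts the point itself, and the brute-force O(n^2) neighbour search is an assumption chosen for brevity (a real version would use a spatial index).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical minimal in-memory DBSCAN with brute-force neighbour search. */
public final class Dbscan {
  public static final int NOISE = -1;
  private static final int UNVISITED = -2;

  private Dbscan() {}

  /** Returns one label per point: a cluster id starting at 0, or NOISE. */
  public static int[] cluster(double[][] points, double eps, int minPts) {
    int[] labels = new int[points.length];
    Arrays.fill(labels, UNVISITED);
    int clusterId = 0;
    for (int p = 0; p < points.length; p++) {
      if (labels[p] != UNVISITED) continue;
      List<Integer> neighbours = regionQuery(points, p, eps);
      if (neighbours.size() < minPts) {
        labels[p] = NOISE; // not a core point; may later become a border point
        continue;
      }
      labels[p] = clusterId;
      Deque<Integer> seeds = new ArrayDeque<>(neighbours);
      while (!seeds.isEmpty()) {
        int q = seeds.poll();
        if (labels[q] == NOISE) labels[q] = clusterId; // noise reached from a core point: border point
        if (labels[q] != UNVISITED) continue;          // already assigned: do not expand again
        labels[q] = clusterId;
        List<Integer> qNeighbours = regionQuery(points, q, eps);
        if (qNeighbours.size() >= minPts) seeds.addAll(qNeighbours); // q is a core point: expand
      }
      clusterId++;
    }
    return labels;
  }

  /** Indices of all points within distance eps of point p, including p itself. */
  private static List<Integer> regionQuery(double[][] points, int p, double eps) {
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < points.length; i++) {
      if (distance(points[p], points[i]) <= eps) result.add(i);
    }
    return result;
  }

  /** Euclidean distance between two points of equal dimension. */
  private static double distance(double[] a, double[] b) {
    double sum = 0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }
}
```

Unlike k-means, no number of clusters is given up front: the clusters emerge from the density parameters, and isolated points are labelled `NOISE` rather than forced into a cluster. As in standard DBSCAN, a border point reachable from two clusters joins whichever cluster reaches it first.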