Re: [jira] [Commented] (MAHOUT-1177) GSOC 2013: Reform and simplify the clustering APIs

yu lee Fri, 03 May 2013 09:57:17 -0700

I'd like to ask the same question.

Shannon: we'd be happy to have your help!


Ted: what do you think about our (Yexi's and my) ideas? Shall we move on to
the proposal?


On Fri, May 3, 2013 at 8:10 AM, 姜页希 <yexiji...@gmail.com> wrote:

> Are there any other comments on this issue?
>
>
>
> 2013/5/2 Shannon Quinn <squ...@gatech.edu>
>
> > This sounds excellent. I'd be happy to assist in unifying the interfaces
> > of the spectral methods in particular.
> >
> >
> > On 5/2/13 3:54 PM, Yu Lee (JIRA) wrote:
> >
> >>      [ https://issues.apache.org/jira/browse/MAHOUT-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647841#comment-13647841 ]
> >>
> >> Yu Lee commented on MAHOUT-1177:
> >> --------------------------------
> >>
> >> Hello Robin Anil, Jeff Eastman, Dan Filimon, and Ted Dunning,
> >>
> >> Yexi and I (Yu Lee) are new to the Mahout community. We want to
> >> contribute to the improvement of Mahout by reforming and simplifying the
> >> clustering APIs per the following link:
> >> https://issues.apache.org/jira/browse/MAHOUT-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644120#comment-13644120
> >>
> >> We have gone through the code of Mahout clustering. Now we have some
> >> ideas about improving it:
> >>
> >> ========================================================================
> >> Addressing the problems in the current interface:
> >>
> >> Test cases are missing. For example, in spectral k-means clustering,
> >> the run methods of SpectralKmeansDriver and EigencutsDriver are not
> >> tested.
> >>
> >> Documentation is missing for some methods. For example, in the run
> >> method of DirichletDriver, the description of the parameter 'numModels'
> >> is missing; in the run method of SpectralKmeansDriver, the descriptions
> >> of some arguments are missing.
> >>
> >> Some methods do not describe certain arguments specifically. For
> >> example, in the run method of FuzzyKmeansDriver, the description of the
> >> argument "m" (fuzzification factor) is missing. Although a wiki link
> >> to "Clustering Analysis" is given, it is not clear enough.
> >>
> >> ------------------------------------------------------------------------
> >>
> >> Implementing some new clustering algorithms
> >>
> >> Agglomerative hierarchical clustering, which clusters the data points
> >> into a dendrogram, so that users can choose whatever number of clusters
> >> they want. (http://en.wikipedia.org/wiki/Hierarchical_clustering)
> >>
> >> DBSCAN, a density-based clustering method that can identify clusters
> >> with arbitrary shapes and is useful in spatial clustering.
> >> (http://en.wikipedia.org/wiki/DBSCAN)
> >>
> >> ------------------------------------------------------------------------
> >>
> >> Providing a new unified interface
> >>
> >> Currently, each clustering algorithm has its own implementation class
> >> with a different interface (i.e., the run methods in different Drivers
> >> have different argument lists). It would be better to have a unified
> >> interface for executing all available clustering methods. An example
> >> interface is as follows:
> >>
> >> Clustering-run(input, output, methodClass, clusteringConfig)
> >>
> >> Here, "methodClass" indicates a specific clustering method, while
> >> "clusteringConfig" indicates the configuration for that specific
> >> clustering method.
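
To make the idea above concrete, here is a minimal Java sketch of how a
single entry point taking (input, output, methodClass, clusteringConfig)
might look. Everything in it is an assumption made for illustration: the
ClusteringMethod interface, the toy ThresholdMethod, the "threshold" key,
and the use of in-memory lists in place of real input/output locations are
all hypothetical, and none of them exist in Mahout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical contract that every clustering method would implement.
interface ClusteringMethod {
    // Returns one cluster id per input point, in input order.
    List<Integer> cluster(List<double[]> points, Map<String, String> config);
}

// Toy stand-in for a real algorithm: splits points on their first coordinate.
class ThresholdMethod implements ClusteringMethod {
    @Override
    public List<Integer> cluster(List<double[]> points, Map<String, String> config) {
        // "threshold" is an illustrative config key, not a Mahout option.
        double threshold = Double.parseDouble(config.getOrDefault("threshold", "0.0"));
        List<Integer> ids = new ArrayList<>();
        for (double[] p : points) {
            ids.add(p[0] < threshold ? 0 : 1);
        }
        return ids;
    }
}

class Clustering {
    // Single entry point: the caller selects the method by class and
    // passes that method's settings as a generic key/value configuration.
    static void run(List<double[]> input, List<Integer> output,
                    Class<? extends ClusteringMethod> methodClass,
                    Map<String, String> clusteringConfig) {
        ClusteringMethod method;
        try {
            // Requires the method class to have a no-argument constructor.
            method = methodClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + methodClass, e);
        }
        output.addAll(method.cluster(input, clusteringConfig));
    }

    public static void main(String[] args) {
        List<double[]> input = new ArrayList<>();
        input.add(new double[] {1.0});
        input.add(new double[] {9.0});

        Map<String, String> config = new HashMap<>();
        config.put("threshold", "5.0");

        List<Integer> output = new ArrayList<>();
        Clustering.run(input, output, ThresholdMethod.class, config);
        System.out.println(output); // prints [0, 1]
    }
}
```

With this shape, adding a new algorithm would only mean adding another
ClusteringMethod implementation; the caller-facing signature stays the
same. A real design would still have to decide how per-method settings
are validated and documented.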
> >>
> >> ========================================================================
> >>
> >> Could you please let us know what you think about our ideas?
> >>
> >>
> >>
> >>
> >>> GSOC 2013: Reform and simplify the clustering APIs
> >>> --------------------------------------------------
> >>>
> >>>                  Key: MAHOUT-1177
> >>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-1177
> >>>              Project: Mahout
> >>>           Issue Type: Improvement
> >>>             Reporter: Dan Filimon
> >>>               Labels: gsoc2013, mentor
> >>>
> >>> Clustering is one of the most used features in Mahout and has many
> >>> applications [http://en.wikipedia.org/wiki/Cluster_analysis#Applications].
> >>> We have lots of clustering algorithms. There's:
> >>> - basic k-means
> >>> - canopy clustering
> >>> - Dirichlet clustering
> >>> - Fuzzy k-means
> >>> - Spectral k-means
> >>> - Streaming k-means [coming soon]
> >>> We want to make them easier to use by updating the APIs and making sure
> >>> they all work in the same way and have consistent inputs, outputs,
> >>> diagnostics and documentation.
> >>> This is a great way to gain an in-depth understanding of clustering
> >>> algorithms and to familiarize yourself with Hadoop, Mahout clustering and
> >>> good software engineering principles.
> >>>
> >> --
> >> This message is automatically generated by JIRA.
> >> If you think it was sent incorrectly, please contact your JIRA
> >> administrators
> >> For more information on JIRA, see: http://www.atlassian.com/software/jira
> >>
> >
> >
>
>
> --
> ------
> Yexi Jiang,
> ECS 251,  yjian...@cs.fiu.edu
> School of Computer and Information Science,
> Florida International University
> Homepage: http://users.cis.fiu.edu/~yjian004/
>
