Density-based clustering algorithms are very important nowadays.

Many GIS applications need to group data points in a spatial setting.
For example, how should two buildings be clustered when they are quite
close to each other in Euclidean distance, yet have a river between them?

Considering the size of GIS data, implementing density-based clustering
algorithms with MapReduce is a must. In addition, some Mahout users have
asked whether there is a plan to include DBSCAN in Mahout
(http://comments.gmane.org/gmane.comp.apache.mahout.user/11638).

BTW, there are some publications online on DBSCAN algorithms for
MapReduce:
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6121313
http://www.scientific.net/AMR.301-303.1133
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=6253489
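
For reference, the core of DBSCAN on a single machine can be sketched
roughly as below. This is only a minimal in-memory illustration in Python.
It is not one of the MapReduce algorithms from the papers above and it is
not Mahout code. The names dbscan, region_query, eps, min_pts, and dist are
my own choices for the sketch, and the pluggable dist argument is an
assumption about how an obstacle such as a river might be handled.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps, dist):
    """Return indices of all points within eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts, dist=math.dist):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    dist is any distance function on two points; it defaults to the
    Euclidean distance. This naive version is O(n^2) and keeps all
    points in memory.
    """
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps, dist)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps, dist)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
    return labels
```

With the default Euclidean distance, two buildings on opposite banks of a
river would still end up in the same cluster. Passing a dist function that
returns a very large value for pairs separated by the obstacle is one
possible way to keep them apart.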

On Tue, May 7, 2013 at 6:33 PM, 姜页希 <yexiji...@gmail.com> wrote:

> There are several distinct implementations available online.
> Also, Yu Lee and I have some experience in developing hierarchical
> clustering (clustering data of arbitrary shape through
> connectivity-based clustering) on Hadoop.
>
> If you think this is OK, Yu Lee and I can take on the implementation,
> testing, and maintenance. We would also report our progress periodically.
>
> We can fork Mahout on GitHub and carry out the implementation
> independently.
>
>
>
> 2013/5/7 Ted Dunning (JIRA) <j...@apache.org>
>
> >
> >     [ https://issues.apache.org/jira/browse/MAHOUT-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651369#comment-13651369 ]
> >
> > Ted Dunning commented on MAHOUT-1206:
> > -------------------------------------
> >
> > Do you know of scalable algorithms for these other algorithms?
> >
> > Is there a demonstrated need?  Who will maintain the implementations?
> >
> >
> >
> > > Add density-based clustering algorithms to mahout
> > > -------------------------------------------------
> > >
> > >                 Key: MAHOUT-1206
> > >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1206
> > >             Project: Mahout
> > >          Issue Type: Improvement
> > >            Reporter: Yexi Jiang
> > >              Labels: clustering
> > >
> > > The existing clustering algorithms (kmeans, fuzzy kmeans, dirichlet
> > > clustering, and spectral clustering) cluster data by assuming that
> > > the data can be grouped into regular hyperspheres or ellipsoids.
> > > However, in practice, not all data can be clustered in this way.
> > > To enable data to be clustered in arbitrary shapes, clustering
> > > algorithms like DBSCAN, BIRCH, and CLARANS (
> > > http://en.wikipedia.org/wiki/Cluster_analysis#Density-based_clustering)
> > > have been proposed.
> > > It would be good to implement one or more of these clustering
> > > algorithms to enrich the clustering library.
> >
> > --
> > This message is automatically generated by JIRA.
> > If you think it was sent incorrectly, please contact your JIRA
> > administrators
> > For more information on JIRA, see: http://www.atlassian.com/software/jira
> >
>
>
>
> --
> ------
> Yexi Jiang,
> ECS 251,  yjian...@cs.fiu.edu
> School of Computer and Information Science,
> Florida International University
> Homepage: http://users.cis.fiu.edu/~yjian004/
>
