Density-based clustering algorithms are very important in practice. Many GIS applications need to group data points in a spatial setting. For example, how should we cluster two buildings that are close to each other in Euclidean distance but have a river between them? Given the size of GIS data, a MapReduce implementation of density-based clustering is a must.

In addition, some Mahout users have asked whether there is a plan to include DBSCAN in Mahout (http://comments.gmane.org/gmane.comp.apache.mahout.user/11638).

BTW, there are some publications online on DBSCAN in MapReduce:

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6121313
http://www.scientific.net/AMR.301-303.1133
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=6253489

On Tue, May 7, 2013 at 6:33 PM, 姜页希 <yexiji...@gmail.com> wrote:

> There are several distinct implementations available online.
> Also, Yu Lee and I have some experience on developing hierarchical
> clustering (clustering the data with arbitrary shape through
> connectivity-based clustering) on hadoop.
>
> If you think this is OK, Yu Lee and I can take this task of implementation,
> test, and maintenance. Also, we would periodically report the progress.
>
> We can fork the mahout from github and conduct the implementation
> independently.
>
>
> 2013/5/7 Ted Dunning (JIRA) <j...@apache.org>
>
> >
> > [ https://issues.apache.org/jira/browse/MAHOUT-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651369#comment-13651369 ]
> >
> > Ted Dunning commented on MAHOUT-1206:
> > -------------------------------------
> >
> > Do you know of scalable algorithms for these other algorithms?
> >
> > Is there a demonstrated need? Who will maintain the implementations?
> >
> > > Add density-based clustering algorithms to mahout
> > > -------------------------------------------------
> > >
> > > Key: MAHOUT-1206
> > > URL: https://issues.apache.org/jira/browse/MAHOUT-1206
> > > Project: Mahout
> > > Issue Type: Improvement
> > > Reporter: Yexi Jiang
> > > Labels: clustering
> > >
> > > The clustering algorithms (kmeans, fuzzy kmeans, dirichlet clustering,
> > > and spectral cluster) clustering data by assuming that the data can be
> > > clustered into the regular hyper sphere or ellipsoid. However, in
> > > practical, not all the data can be clustered in this way.
> > > To enable the data to be clustered in arbitrary shapes, clustering
> > > algorithms like DBSCAN, BIRCH, CLARANCE (
> > > http://en.wikipedia.org/wiki/Cluster_analysis#Density-based_clustering)
> > > are proposed.
> > > It is better that we can implement one or some of these clustering
> > > algorithm to enrich the clustering library.
> >
> > --
> > This message is automatically generated by JIRA.
> > If you think it was sent incorrectly, please contact your JIRA
> > administrators
> > For more information on JIRA, see: http://www.atlassian.com/software/jira
> >
>
>
> --
> ------
> Yexi Jiang,
> ECS 251, yjian...@cs.fiu.edu
> School of Computer and Information Science,
> Florida International University
> Homepage: http://users.cis.fiu.edu/~yjian004/
>