[jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout
[ https://issues.apache.org/jira/browse/MAHOUT-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816858#comment-13816858 ]

Chisomo Sakala commented on MAHOUT-1206:

I'm really excited about this prospect. The paper http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6253489 describes how to implement DBSCAN with MapReduce (DBSCAN-MR). I have a PDF copy and can email it to anyone interested in reading it. I emailed the author of that paper to ask whether they would be willing to contribute their implementation of DBSCAN-MR to Mahout, but I haven't received a response yet.

Here is another paper discussing parallelization of DBSCAN: http://conferences.computer.org/sc/2012/papers/1000a053.pdf

Add density-based clustering algorithms to mahout

Key: MAHOUT-1206
URL: https://issues.apache.org/jira/browse/MAHOUT-1206
Project: Mahout
Issue Type: Improvement
Reporter: Yexi Jiang
Labels: clustering
Fix For: Backlog

The existing clustering algorithms (kmeans, fuzzy kmeans, dirichlet clustering, and spectral clustering) cluster data by assuming that it can be grouped into regular hyperspheres or ellipsoids. In practice, however, not all data can be clustered this way. To allow data to be clustered into arbitrary shapes, clustering algorithms such as DBSCAN, BIRCH, and CLARANCE (http://en.wikipedia.org/wiki/Cluster_analysis#Density-based_clustering) have been proposed. It would be good to implement one or more of these algorithms to enrich the clustering library.

--
This message was sent by Atlassian JIRA (v6.1#6144)
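The claim in the issue description, that centroid-based algorithms assume roughly spherical clusters, can be illustrated with a rough sketch. This is not Mahout code: `ring`, `nearest`, and `kmeans` are hypothetical helper names, the data (two concentric rings) is made up for illustration, and the loop is a plain Lloyd-style k-means with a fixed starting point.

```python
import math

def ring(radius, n):
    """n evenly spaced points on a circle of the given radius around the origin."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def nearest(centroids, p):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(centroids[c], p))

def kmeans(points, centroids, iterations=20):
    """Plain Lloyd iterations: assign to nearest centroid, then recompute means."""
    centroids = list(centroids)
    for _ in range(iterations):
        labels = [nearest(centroids, p) for p in points]
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return [nearest(centroids, p) for p in points]

# Illustrative data: a small ring (12 points) surrounded by a large ring (36 points).
inner, outer = ring(1.0, 12), ring(3.0, 36)
points = inner + outer

# Start with one centroid on each ring, which is the most favorable guess.
labels = kmeans(points, centroids=[inner[0], outer[0]])
inner_labels, outer_labels = set(labels[:12]), set(labels[12:])
print("clusters used by inner ring:", inner_labels)
print("clusters used by outer ring:", outer_labels)
```

With two centroids, every point goes to the nearer one, so the clusters are divided by a straight line. No straight line can separate a ring from another ring that surrounds it, so the outer ring ends up split across both clusters regardless of where the centroids settle.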
[jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout
[ https://issues.apache.org/jira/browse/MAHOUT-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672489#comment-13672489 ]

Ted Dunning commented on MAHOUT-1206:

I put some questions on here. I am still somewhat dubious of these algorithms for large data. If you could suggest more concrete information about how you would implement them, that would help. The other questions about need and support are important as well.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout
[ https://issues.apache.org/jira/browse/MAHOUT-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672147#comment-13672147 ]

Yexi Jiang commented on MAHOUT-1206:

Are there still no comments?
Re: [jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout
Actually, density-based clustering algorithms are very important nowadays. There are tons of GIS applications that need to group data points in spatial settings. For example, how should two buildings be clustered when they are quite close to each other in terms of Euclidean distance, yet have a river between them? Considering the size of GIS data, implementing density-based clustering algorithms with MapReduce is a must.

In addition, some Mahout users have asked about plans to include DBSCAN in Mahout (http://comments.gmane.org/gmane.comp.apache.mahout.user/11638).

BTW, there are some publications online about DBSCAN algorithms for MapReduce:
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6121313
http://www.scientific.net/AMR.301-303.1133
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=6253489

On Tue, May 7, 2013 at 6:33 PM, 姜页希 yexiji...@gmail.com wrote:

There are several distinct implementations available online. Also, Yu Lee and I have some experience developing hierarchical clustering (clustering data of arbitrary shape through connectivity-based clustering) on Hadoop. If you think this is OK, Yu Lee and I can take on the implementation, testing, and maintenance. We would also report our progress periodically. We can fork Mahout on GitHub and carry out the implementation independently.

2013/5/7 Ted Dunning (JIRA) j...@apache.org

[ https://issues.apache.org/jira/browse/MAHOUT-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651369#comment-13651369 ]

Ted Dunning commented on MAHOUT-1206:

Do you know of scalable algorithms for these other algorithms? Is there a demonstrated need? Who will maintain the implementations?
--
Yexi Jiang, ECS 251, yjian...@cs.fiu.edu
School of Computer and Information Science, Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/
Re: [jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout
On Wed, May 8, 2013 at 2:33 PM, yu lee leeyufam...@gmail.com wrote:

e.g., how to cluster two buildings which are quite close to each other in terms of their euclidean distance, yet with a river in between them?

Define distance in SVD space based on how people move. Next problem?
Re: [jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout
Actually, that's not the motivation for density-based clustering. The example you gave can be solved by measuring distance in a suitable way.

One characteristic of density-based clustering is that it uses the local information around each data point, so it can produce clusters of arbitrary shape. Another characteristic is that it can identify data points that do not belong to any cluster and mark them as outliers. Spectral clustering, which clusters data based on the similarity matrix, can partially achieve the first characteristic, but the K-means step built into spectral clustering makes the second characteristic hard to achieve.

2013/5/8 Ted Dunning ted.dunn...@gmail.com

On Wed, May 8, 2013 at 2:33 PM, yu lee leeyufam...@gmail.com wrote:

e.g., how to cluster two buildings which are quite close to each other in terms of their euclidean distance, yet with a river in between them?

Define distance in SVD space based on how people move. Next problem?

--
Yexi Jiang, ECS 251, yjian...@cs.fiu.edu
School of Computer and Information Science, Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/
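The two characteristics described in the message above (clusters of arbitrary shape built from local neighborhoods, and points left unassigned as outliers) can be seen in a minimal single-machine DBSCAN sketch. This is only an illustration under stated assumptions, not a proposed Mahout implementation and not the MapReduce variant from the papers linked in this thread. The names `ring`, `region_query`, and `dbscan`, the sample data, and the parameter values `eps=0.6` and `min_pts=3` are all made up for the example.

```python
import math

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet examined

def ring(radius, n):
    """n evenly spaced points on a circle of the given radius around the origin."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Brute-force DBSCAN; returns one label per point (cluster id or NOISE)."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep growing the cluster
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels

# Illustrative data: two concentric rings plus one isolated point.
points = ring(1.0, 12) + ring(3.0, 36) + [(10.0, 10.0)]
labels = dbscan(points, eps=0.6, min_pts=3)
print("inner ring labels:", set(labels[:12]))
print("outer ring labels:", set(labels[12:48]))
print("isolated point label:", labels[48])
```

Each ring comes out as its own cluster because neighboring points on a ring are within `eps` of each other while the rings themselves are farther apart, and the isolated point is labeled as noise. The neighborhood search here compares every pair of points, so the sketch says nothing about scalability, which is the open question raised earlier in the thread.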
[jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout
[ https://issues.apache.org/jira/browse/MAHOUT-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651369#comment-13651369 ]

Ted Dunning commented on MAHOUT-1206:

Do you know of scalable algorithms for these other algorithms? Is there a demonstrated need? Who will maintain the implementations?
Re: [jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout
There are several distinct implementations available online. Also, Yu Lee and I have some experience developing hierarchical clustering (clustering data of arbitrary shape through connectivity-based clustering) on Hadoop. If you think this is OK, Yu Lee and I can take on the implementation, testing, and maintenance. We would also report our progress periodically. We can fork Mahout on GitHub and carry out the implementation independently.

2013/5/7 Ted Dunning (JIRA) j...@apache.org

[ https://issues.apache.org/jira/browse/MAHOUT-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651369#comment-13651369 ]

Ted Dunning commented on MAHOUT-1206:

Do you know of scalable algorithms for these other algorithms? Is there a demonstrated need? Who will maintain the implementations?

--
Yexi Jiang, ECS 251, yjian...@cs.fiu.edu
School of Computer and Information Science, Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/