[jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout

2013-11-07 Thread Chisomo Sakala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816858#comment-13816858
 ] 

Chisomo Sakala commented on MAHOUT-1206:


I'm really excited about this prospect.

The paper http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6253489 
describes a MapReduce implementation of DBSCAN (DBSCAN-MR). I have a PDF 
copy and can email it to anyone interested in reading it. 

I emailed the author of that paper to ask whether they'd be willing to 
contribute their implementation of DBSCAN-MR to Mahout, but I haven't 
received a response yet.

Here is another paper discussing parallelization of DBSCAN:
http://conferences.computer.org/sc/2012/papers/1000a053.pdf







 Add density-based clustering algorithms to mahout
 -

 Key: MAHOUT-1206
 URL: https://issues.apache.org/jira/browse/MAHOUT-1206
 Project: Mahout
  Issue Type: Improvement
Reporter: Yexi Jiang
  Labels: clustering
 Fix For: Backlog


 The existing clustering algorithms (k-means, fuzzy k-means, Dirichlet 
 clustering, and spectral clustering) assume that the data can be grouped 
 into regular hyperspheres or ellipsoids. In practice, however, not all 
 data can be clustered this way. 
 To cluster data into arbitrary shapes, algorithms such as DBSCAN, BIRCH, 
 and CLARANS 
 (http://en.wikipedia.org/wiki/Cluster_analysis#Density-based_clustering) 
 have been proposed.
 It would be good to implement one or more of these algorithms to enrich 
 the clustering library. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout

2013-06-02 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672489#comment-13672489
 ] 

Ted Dunning commented on MAHOUT-1206:
-

I put some questions on here.

I am still somewhat dubious of these algorithms for large data.

If you could provide more concrete details about how you would implement 
them, that would help.

The other questions about need and support are important as well.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout

2013-06-01 Thread Yexi Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672147#comment-13672147
 ] 

Yexi Jiang commented on MAHOUT-1206:


Are there still no comments?



Re: [jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout

2013-05-08 Thread yu lee
Actually, density-based clustering algorithms are very important
nowadays.

Many GIS applications need to group data points in spatial settings,
e.g., how to cluster two buildings
which are quite close to each other in terms of their Euclidean distance,
yet with a river in between them?

Given the size of GIS data, a MapReduce implementation of density-based
clustering algorithms is a must. In addition, some Mahout users have asked
about plans to include DBSCAN in Mahout (
http://comments.gmane.org/gmane.comp.apache.mahout.user/11638).

BTW, there are some publications online on DBSCAN algorithms for
MapReduce:
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6121313
http://www.scientific.net/AMR.301-303.1133
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=6253489






On Tue, May 7, 2013 at 6:33 PM, 姜页希 yexiji...@gmail.com wrote:

 There are several distinct implementations available online.
 Also, Yu Lee and I have some experience developing hierarchical
 clustering (clustering data of arbitrary shape through
 connectivity-based clustering) on Hadoop.

 If you think this is OK, Yu Lee and I can take on the implementation,
 testing, and maintenance. We would also report progress periodically.

 We can fork Mahout on GitHub and carry out the implementation
 independently.





Re: [jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout

2013-05-08 Thread Ted Dunning
On Wed, May 8, 2013 at 2:33 PM, yu lee leeyufam...@gmail.com wrote:

 e.g., how to cluster two buildings
 which are quite close to each other in terms of their euclidean distance,
 yet with a river in between them?


Define distance in SVD space based on how people move.

Next problem?


Re: [jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout

2013-05-08 Thread 姜页希
Actually, that's not the motivation for density-based clustering; the
example you gave can be solved by measuring distance in a suitable way.

One characteristic of density-based clustering is that it uses the local
information around each data point, which lets it produce clusters of
arbitrary shape.

Another characteristic is that density-based clustering can discover
data points that do not belong to any cluster and mark them as
outliers.

Although spectral clustering, which clusters the data based on a
similarity matrix, can partially achieve the first characteristic, the
k-means step built into spectral clustering makes the second
characteristic hard to achieve.
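
The two characteristics above can be illustrated with a minimal
single-machine sketch of textbook DBSCAN. This is not DBSCAN-MR and not
Mahout code; the names eps, min_pts, and region_query, and the sample
points, are assumptions chosen only for illustration.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # only core points expand the cluster
    return labels


# An L-shaped chain, a small compact group, and one isolated point.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

With these sample points the L-shaped chain ends up in a single
cluster even though it is neither spherical nor ellipsoidal, and the
isolated point at (50, 50) is labelled as noise. The neighbor search
here is quadratic, so the sketch says nothing about how to scale the
algorithm.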



-- 
--
Yexi Jiang,
ECS 251,  yjian...@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/


[jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout

2013-05-07 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651369#comment-13651369
 ] 

Ted Dunning commented on MAHOUT-1206:
-

Do you know of scalable versions of these other algorithms?

Is there a demonstrated need?  Who will maintain the implementations?





Re: [jira] [Commented] (MAHOUT-1206) Add density-based clustering algorithms to mahout

2013-05-07 Thread 姜页希
There are several distinct implementations available online.
Also, Yu Lee and I have some experience developing hierarchical
clustering (clustering data of arbitrary shape through
connectivity-based clustering) on Hadoop.

If you think this is OK, Yu Lee and I can take on the implementation,
testing, and maintenance. We would also report progress periodically.

We can fork Mahout on GitHub and carry out the implementation
independently.







-- 
--
Yexi Jiang,
ECS 251,  yjian...@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/