Clustering without Hadoop
Hi,

I am working on a very simple k-means clustering example. Is there a way to run clustering algorithms in Mahout without using Hadoop? I am reading the book Mahout in Action. In chapter 7, the hello world clustering code example uses

==
KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"), new Path("output"), new EuclideanDistanceMeasure(), 0.001, 10, true, false);
==

to run the k-means algorithm. How can I run the k-means algorithm without Hadoop?

Thanks!
Shan
Re: Clustering without Hadoop
When you say "without Hadoop", does that include local mode? You can run these examples in local mode, which doesn't require a cluster, for testing and poking around. Everything then runs in a single JVM.

In reply to Shan Lu shanlu...@gmail.com, Dec 1, 2013 9:18 PM.
Re: Clustering without Hadoop
Thanks for your reply. In the example code, they run the k-means algorithm using org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.FileSystem, and org.apache.hadoop.fs.Path parameters. Is there any algorithm that doesn't need any Configuration or Path parameters and just uses the data in memory? I mean, can I run the k-means algorithm without using the Hadoop API, just using Java?

Thanks.

--
Shan Lu
ECE Dept., NEU, Boston, MA 02115

In reply to Amit Nithian anith...@gmail.com, Sun, Dec 1, 2013 at 9:58 PM.
Re: Clustering without Hadoop
Shan,

All Mahout implementations use the Hadoop API, but if you are trying to run k-means in sequential (non-MapReduce) mode, pass runSequential = true instead of false as the last parameter to KMeansDriver.run(), or run them in LOCAL_MODE as Amit pointed out earlier.

In reply to Shan Lu shanlu...@gmail.com, Sunday, December 1, 2013 10:28 PM.
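The question in this thread asks for k-means that takes only in-memory data, with no Configuration or Path. As a minimal sketch of what that can look like in plain Java (this is not Mahout code and uses none of its classes), the following runs Lloyd's k-means on double[] points. The names InMemoryKMeans, cluster, nearest and distance, the choice of the first k points as initial centroids, and the sample points are all assumptions made for illustration; 0.001 and 10 simply mirror the convergence delta and iteration limit in the KMeansDriver.run call quoted above.

```java
import java.util.Arrays;

/** Plain-Java k-means (Lloyd's algorithm) on in-memory points; no Hadoop or Mahout types. */
class InMemoryKMeans {

    /** Euclidean distance between two vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Index of the centroid closest to the given point (ties go to the lower index). */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (distance(point, centroids[c]) < distance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /**
     * Iterates until no centroid moves more than convergenceDelta or
     * maxIterations is reached. Initial centroids are the first k points,
     * an assumption made only to keep this sketch deterministic.
     */
    static double[][] cluster(double[][] points, int k, double convergenceDelta, int maxIterations) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: accumulate each point into its nearest centroid.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] p : points) {
                int c = nearest(p, centroids);
                counts[c]++;
                for (int i = 0; i < dim; i++) {
                    sums[c][i] += p[i];
                }
            }
            // Update step: move each centroid to the mean of its points.
            double maxShift = 0.0;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int i = 0; i < dim; i++) {
                    sums[c][i] /= counts[c];
                }
                maxShift = Math.max(maxShift, distance(centroids[c], sums[c]));
                centroids[c] = sums[c];
            }
            if (maxShift <= convergenceDelta) {
                break;
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        // Made-up sample data: two well-separated groups of 2-D points.
        double[][] points = {
            {1, 1}, {2, 1}, {1, 2}, {2, 2}, {3, 3},
            {8, 8}, {9, 8}, {8, 9}, {9, 9}
        };
        for (double[] c : cluster(points, 2, 0.001, 10)) {
            System.out.println(Arrays.toString(c));
        }
    }
}
```

On this sample the two centroids settle at (1.8, 1.8) and (8.5, 8.5). A sketch like this trades away what Mahout provides (vector types, pluggable distance measures, scaling beyond one machine), so it is only a fit for small data that already lives in memory.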
Re: Clustering without Hadoop
Thanks, Suneel, I'll try this way. In this recommender example: https://github.com/ManuelB/facebook-recommender-demo/blob/master/src/main/java/de/apaxo/bedcon/AnimalFoodRecommender.java#L142 , they only use the Mahout API. So I am wondering whether I can do the clustering similarly.

In reply to Suneel Marthi suneel_mar...@yahoo.com, Sun, Dec 1, 2013 at 10:42 PM.
Re: Clustering without Hadoop
The new Ball k-means and streaming k-means implementations have non-Hadoop versions. The streaming k-means implementation also has a threaded version that runs without Hadoop, and that threaded version should be pretty fast.

In reply to Shan Lu shanlu...@gmail.com, Sun, Dec 1, 2013 at 7:55 PM.
Re: Clustering without Hadoop
Thanks, Ted. I went through some introductions to Ball k-means and streaming k-means, but it is still not clear to me how to run the algorithm without Hadoop. Do you know of any hello world example code that uses the non-Hadoop version of streaming k-means?

Thanks.

In reply to Ted Dunning ted.dunn...@gmail.com, Sun, Dec 1, 2013 at 11:12 PM.
Re: Clustering without hadoop
Hi Florents,

It works differently now, but it still runs without HDFS. I also had trouble getting the right classes together, but here is something that will hopefully work correctly:

    DistanceMeasure measure = new CosineDistanceMeasure();
    // ClusterUtils is not a Mahout class
    List<Cluster> initialClusters = ClusterUtils.getInitialClusters(points, k, measure);
    System.out.println("going ma d!");
    ClusterClassifier prior = new ClusterClassifier(initialClusters, new KMeansClusteringPolicy(0.01));
    ClusterClassifier clustered = ClusterIterator.iterate(points, prior, 10);
    System.out.println(clustered.getModels());
    List<Cluster> finalClusters = clustered.getModels();

Cheers,
Johannes

On Mon, Nov 12, 2012 at 11:55 PM, Florents Tselai florents.tse...@gmail.com wrote:

Hello,

I'm working on market basket data clustering and I'd like to know the fastest (quick and dirty) way to use Mahout. Specifically, is it possible to run k-means or EM without having HDFS configured? If I'm not wrong, the Mahout 0.5 k-means implementation didn't require HDFS, but the latest version does.
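The snippet in this message relies on ClusterUtils.getInitialClusters, which Johannes notes is not a Mahout class and which is not shown. As a hedged illustration of what such a helper might do, here is a plain-Java sketch that picks k starting centers by farthest-first selection under cosine distance. Everything in it is an assumption: the names InitialCenters, cosineDistance and getInitialCenters, the use of double[] instead of Mahout vectors and Cluster objects, the farthest-first strategy itself, and the made-up basket vectors. It says nothing about how the original helper was written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the unshown, non-Mahout ClusterUtils helper; plain arrays only. */
class InitialCenters {

    /** Cosine distance: 1 - (a . b) / (|a| * |b|). Assumes neither vector is all zeros. */
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /**
     * Farthest-first selection: start from the first point, then repeatedly add
     * the point whose distance to its closest already-chosen center is largest.
     * Assumes points is non-empty and k is at least 1.
     */
    static List<double[]> getInitialCenters(List<double[]> points, int k) {
        List<double[]> centers = new ArrayList<>();
        centers.add(points.get(0));
        while (centers.size() < k) {
            double[] farthest = null;
            double farthestDistance = -1.0;
            for (double[] p : points) {
                double closest = Double.MAX_VALUE;
                for (double[] c : centers) {
                    closest = Math.min(closest, cosineDistance(p, c));
                }
                if (closest > farthestDistance) {
                    farthestDistance = closest;
                    farthest = p;
                }
            }
            centers.add(farthest);
        }
        return centers;
    }

    public static void main(String[] args) {
        // Made-up market baskets: one column per item, 1 means the item was bought.
        List<double[]> baskets = new ArrayList<>();
        baskets.add(new double[] {1, 1, 0, 0});
        baskets.add(new double[] {1, 1, 1, 0});
        baskets.add(new double[] {0, 0, 1, 1});
        baskets.add(new double[] {0, 0, 0, 1});
        for (double[] c : getInitialCenters(baskets, 2)) {
            System.out.println(Arrays.toString(c));
        }
    }
}
```

With these four baskets and k = 2, the second center chosen is {0, 0, 1, 1}, the first basket that shares no items with the starting one. To feed the result into the snippet above, each center would still have to be wrapped in whatever Cluster implementation the installed Mahout version expects, which this sketch does not attempt.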