Clustering without Hadoop

2013-12-01 Thread Shan Lu
Hi,

I am working on a very simple k-means clustering example. Is there a way to
run clustering algorithms in Mahout without using Hadoop? I am reading the
book Mahout in Action. In chapter 7, the hello world clustering code
example uses
==

KMeansDriver.run(conf, new Path("testdata/points"),
  new Path("testdata/clusters"), new Path("output"),
  new EuclideanDistanceMeasure(), 0.001, 10, true, false);

==
to run the k-means algorithm. How can I run the k-means algorithm without
Hadoop?

Thanks!

Shan


Re: Clustering without Hadoop

2013-12-01 Thread Amit Nithian
When you say "without Hadoop", does that include local mode? You can run these
examples in local mode, which doesn't require a cluster, for testing and
poking around. Everything then runs in a single JVM.


Re: Clustering without Hadoop

2013-12-01 Thread Shan Lu
Thanks for your reply. In the example code, they run the k-means algorithm
using org.apache.hadoop.conf.Configuration,
org.apache.hadoop.fs.FileSystem, and org.apache.hadoop.fs.Path parameters.
Is there any algorithm that doesn't need Configuration and Path parameters
and just uses the data in memory? I mean, can I run the k-means algorithm
without using the Hadoop API, just using Java? Thanks.
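
For reference, the question above (k-means on in-memory data, just using
Java) can be sketched without any Hadoop or Mahout classes. The following is
a minimal plain-Java version of Lloyd's k-means, not Mahout's implementation.
The class name PlainKMeans, its method names, and the toy data set are
assumptions made for illustration; the convergence delta (0.001) and the
iteration limit (10) simply mirror the values passed to KMeansDriver.run()
in the book example.

```java
import java.util.Arrays;

/** Plain-Java k-means sketch on in-memory points (hypothetical class, not part of Mahout). */
final class PlainKMeans {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the centroid closest to the given point. */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /**
     * Runs k-means from the given initial centroids and returns the final centroids.
     * Stops after maxIterations, or earlier once no centroid moves farther than convergenceDelta.
     */
    static double[][] cluster(double[][] points, double[][] initial,
                              double convergenceDelta, int maxIterations) {
        int k = initial.length;
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = initial[c].clone();
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            // Assignment step: add each point to the running sum of its nearest centroid.
            for (double[] p : points) {
                int c = nearest(p, centroids);
                counts[c]++;
                for (int i = 0; i < dim; i++) {
                    sums[c][i] += p[i];
                }
            }
            // Update step: move each non-empty centroid to the mean of its points.
            boolean converged = true;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                double[] updated = new double[dim];
                for (int i = 0; i < dim; i++) {
                    updated[i] = sums[c][i] / counts[c];
                }
                if (Math.sqrt(squaredDistance(updated, centroids[c])) > convergenceDelta) {
                    converged = false;
                }
                centroids[c] = updated;
            }
            if (converged) {
                break;
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        // Toy data: two visually obvious groups.
        double[][] points = {{1, 1}, {2, 1}, {1, 2}, {2, 2}, {3, 3},
                             {8, 8}, {9, 8}, {8, 9}, {9, 9}};
        // Use the first two points as initial centroids (k = 2).
        double[][] initial = {points[0], points[1]};
        for (double[] c : cluster(points, initial, 0.001, 10)) {
            System.out.println(Arrays.toString(c));
        }
    }
}
```

On this toy data the sketch settles on centroids at about (1.8, 1.8) and
(8.5, 8.5). It keeps everything in arrays, so it only suits data that fits in
memory.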


-- 
Shan Lu
ECE Dept., NEU, Boston, MA 02115


Re: Clustering without Hadoop

2013-12-01 Thread Suneel Marthi
Shan,

All of Mahout's implementations use the Hadoop API, but if you are trying to
run k-means in sequential (non-MapReduce) mode, pass runSequential = true
instead of false as the last parameter to KMeansDriver.run(), or run them in
LOCAL_MODE as Amit pointed out earlier.


Re: Clustering without Hadoop

2013-12-01 Thread Shan Lu
Thanks, Suneel, I'll try it this way.

In this recommender example:
https://github.com/ManuelB/facebook-recommender-demo/blob/master/src/main/java/de/apaxo/bedcon/AnimalFoodRecommender.java#L142

they use only the Mahout API. So I am wondering whether I can do the
clustering in a similar way.




Re: Clustering without Hadoop

2013-12-01 Thread Ted Dunning
The new Ball k-means and streaming k-means implementations have non-Hadoop
versions.  The streaming k-means implementation also has a threaded
implementation that runs without Hadoop.

The threaded streaming k-means implementation should be pretty fast.





Re: Clustering without Hadoop

2013-12-01 Thread Shan Lu
Thanks, Ted. I went through some introductions to Ball k-means and
streaming k-means, but it is still not clear to me how to run the algorithm
without Hadoop. Do you know of any hello world example code that uses the
non-Hadoop version of streaming k-means? Thanks.
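
To make the idea concrete, here is a rough plain-Java sketch of single-pass
clustering in the spirit of streaming k-means. It does not use Mahout's
streaming k-means classes and is not their algorithm line for line: the class
StreamingSketch, the rule "open a new centroid with probability
distance / cutoff", and the growth factor 1.5 are simplifying assumptions
chosen for illustration. It only shows the general shape: each point is seen
once, far-away points tend to start new centroids, nearby points are merged
into an existing one, and the sketch is collapsed whenever it grows too large.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Simplified single-pass clustering sketch (hypothetical class, not Mahout code). */
final class StreamingSketch {

    /** A running mean together with the total weight of the points merged into it. */
    static final class Centroid {
        final double[] mean;
        double weight;

        Centroid(double[] point, double weight) {
            this.mean = point.clone();
            this.weight = weight;
        }

        /** Folds a weighted point into the running weighted mean. */
        void merge(double[] point, double w) {
            weight += w;
            for (int i = 0; i < mean.length; i++) {
                mean[i] += (point[i] - mean[i]) * w / weight;
            }
        }
    }

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Adds one weighted point: it either opens a new centroid or is merged into the nearest one. */
    static void add(List<Centroid> centroids, double[] point, double weight,
                    double cutoff, Random rnd) {
        Centroid nearest = null;
        double best = Double.POSITIVE_INFINITY;
        for (Centroid c : centroids) {
            double d = distance(point, c.mean);
            if (d < best) {
                best = d;
                nearest = c;
            }
        }
        // Assumed rule: the farther the point, the likelier it starts a new centroid.
        if (nearest == null || rnd.nextDouble() < best / cutoff) {
            centroids.add(new Centroid(point, weight));
        } else {
            nearest.merge(point, weight);
        }
    }

    /**
     * Clusters the points in one pass. Requires maxCentroids >= 1 and cutoff > 0.
     * When the sketch exceeds maxCentroids, the cutoff grows and the centroids are re-clustered.
     */
    static List<Centroid> cluster(Iterable<double[]> points, int maxCentroids,
                                  double cutoff, Random rnd) {
        List<Centroid> centroids = new ArrayList<>();
        for (double[] p : points) {
            add(centroids, p, 1.0, cutoff, rnd);
            while (centroids.size() > maxCentroids) {
                cutoff *= 1.5;
                List<Centroid> old = centroids;
                centroids = new ArrayList<>();
                for (Centroid c : old) {
                    add(centroids, c.mean, c.weight, cutoff, rnd);
                }
            }
        }
        return centroids;
    }
}
```

A call such as StreamingSketch.cluster(points, 20, 1.0, new Random(42))
returns at most 20 weighted centroids whose weights add up to the number of
input points. A final in-memory k-means pass over those centroids would then
reduce them to the k clusters actually wanted.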




Re: Clustering without hadoop

2012-11-12 Thread Johannes Schulte
Hi Florents,

It just became different, but it still works without HDFS. I also had trouble
getting the right classes together, but here is something that will
hopefully work correctly:

  DistanceMeasure measure = new CosineDistanceMeasure();

  // ClusterUtils is not a Mahout class
  List<Cluster> initialClusters =
      ClusterUtils.getInitialClusters(points, k, measure);

  System.out.println("going mad!");

  // Wrap the initial clusters in a classifier that uses the k-means policy
  ClusterClassifier prior =
      new ClusterClassifier(initialClusters, new KMeansClusteringPolicy(0.01));

  // Iterate up to 10 times over the points
  ClusterClassifier clustered = ClusterIterator.iterate(points, prior, 10);

  System.out.println(clustered.getModels());

  List<Cluster> finalClusters = clustered.getModels();
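
The snippet above clusters with a cosine distance measure. As a rough
illustration of what such a measure computes, here is a plain-Java sketch
that assumes the usual definition, one minus the cosine similarity of the two
vectors. The class name CosineDistanceSketch is hypothetical, and returning 1
for a zero-length vector is an assumed convention, not something taken from
Mahout.

```java
/** Plain-Java cosine distance sketch (hypothetical class, not Mahout's CosineDistanceMeasure). */
final class CosineDistanceSketch {

    /** Returns 1 - cos(angle between a and b); the vectors must have the same dimension. */
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denominator = Math.sqrt(normA) * Math.sqrt(normB);
        if (denominator == 0.0) {
            return 1.0; // assumed convention: a zero vector is treated as unrelated to everything
        }
        return 1.0 - dot / denominator;
    }
}
```

Because only the angle between the vectors matters, two baskets with the same
item proportions but different sizes end up at a distance close to zero.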


Cheers,


Johannes


On Mon, Nov 12, 2012 at 11:55 PM, Florents Tselai florents.tse...@gmail.com
wrote:

 Hello,

 I'm working on market basket data clustering and I'd like to know the
 fastest (quick and dirty) way to use Mahout.

 Specifically, is it possible to run k-means or EM without having HDFS
 configured?

 If I'm not mistaken, the Mahout 0.5 k-means implementation didn't require
 HDFS, but the latest version does.