Clustering without Hadoop

2013-12-01 Thread Shan Lu

Hi,

I am working on a very simple k-means clustering example. Is there a way to
run clustering algorithms in Mahout without using Hadoop? I am reading the
book Mahout in Action. In chapter 7, the hello world clustering code
example uses
==

KMeansDriver.run(conf, new Path("testdata/points"),
  new Path("testdata/clusters"), new Path("output"),
  new EuclideanDistanceMeasure(), 0.001, 10,
  true, false);

==
to run the k-means algorithm. How can I run the k-means algorithm without
Hadoop?

Thanks!

Shan

Re: Clustering without Hadoop

2013-12-01 Thread Amit Nithian

When you say "without Hadoop", does that include local mode? You can run these
examples in local mode, which doesn't require a cluster, for testing and
poking around. Everything then runs in a single JVM.

Re: Clustering without Hadoop

2013-12-01 Thread Shan Lu

Thanks for your reply. The example code runs the k-means algorithm using
org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.FileSystem, and
org.apache.hadoop.fs.Path parameters. Is there any implementation that
doesn't need Configuration and Path parameters and just uses the data in
memory? I mean, can I run the k-means algorithm without using the Hadoop
API, just plain Java? Thanks.


-- 
Shan Lu
ECE Dept., NEU, Boston, MA 02115
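
For the plain-Java, in-memory case asked about above, a minimal sketch of Lloyd's k-means might look like the following. It is not Mahout code and does not use any Mahout or Hadoop class. The class name InMemoryKMeans and its method names are hypothetical, and the convergenceDelta and maxIterations parameters only mirror the 0.001 and 10 arguments of the KMeansDriver.run() call quoted earlier.

```java
import java.util.Arrays;

/** Illustrative only: plain-Java Lloyd's k-means on in-memory points; not Mahout code. */
class InMemoryKMeans {

    /** Squared Euclidean distance between two equal-length vectors. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the centroid closest to the given point. */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /**
     * Runs Lloyd's iterations from the given initial centroids and returns the final ones.
     * Stops after maxIterations, or earlier once no centroid moves more than convergenceDelta.
     */
    static double[][] cluster(double[][] points, double[][] initial,
                              double convergenceDelta, int maxIterations) {
        int k = initial.length;
        int dim = initial[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = initial[c].clone(); // leave the caller's array untouched
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: add each point to the running sum of its nearest centroid.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] p : points) {
                int c = nearest(p, centroids);
                counts[c]++;
                for (int i = 0; i < dim; i++) {
                    sums[c][i] += p[i];
                }
            }
            // Update step: move each centroid to the mean of its assigned points.
            double maxShift = 0.0;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // empty cluster: keep its previous centroid
                }
                double[] updated = new double[dim];
                for (int i = 0; i < dim; i++) {
                    updated[i] = sums[c][i] / counts[c];
                }
                maxShift = Math.max(maxShift, Math.sqrt(squaredDistance(updated, centroids[c])));
                centroids[c] = updated;
            }
            if (maxShift <= convergenceDelta) {
                break;
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {2, 1}, {1, 2}, {8, 8}, {9, 8}, {8, 9}};
        double[][] initial = {{1, 1}, {9, 8}};
        System.out.println(Arrays.deepToString(cluster(points, initial, 0.001, 10)));
    }
}
```

The initial centroids are passed in by the caller, much as the Mahout call takes a clusters path, so how they are chosen is left open here.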

Re: Clustering without Hadoop

2013-12-01 Thread Suneel Marthi

Shan,

All of Mahout's implementations use the Hadoop API, but if you are trying to
run k-means in sequential (non-MapReduce) mode, pass runSequential = true
instead of false as the last parameter to KMeansDriver.run(), or run them in
LOCAL_MODE as Amit pointed out earlier.


Re: Clustering without Hadoop

2013-12-01 Thread Shan Lu

Thanks, Suneel, I'll try that.

This recommender example uses only the Mahout API:
https://github.com/ManuelB/facebook-recommender-demo/blob/master/src/main/java/de/apaxo/bedcon/AnimalFoodRecommender.java#L142

So I am wondering whether I can do the clustering in a similar way.



Re: Clustering without Hadoop

2013-12-01 Thread Ted Dunning

The new Ball k-means and streaming k-means implementations have non-Hadoop
versions. The streaming k-means implementation also has a threaded
implementation that runs without Hadoop.

The threaded streaming k-means implementation should be pretty fast.
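
To give a feel for what a one-pass, non-Hadoop clusterer can look like, here is a simplified plain-Java sketch. It is loosely modelled on the general idea behind streaming k-means: each point either merges into its nearest centroid or, with a probability that grows with its distance from that centroid, starts a new one. It is not Mahout's StreamingKMeans, it is single-threaded, and it leaves out refinements such as adapting the cutoff or collapsing centroids when there are too many. The class name OnePassSketch and the distanceCutoff parameter are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Illustrative only: a simplified one-pass clustering sketch; not Mahout's StreamingKMeans. */
class OnePassSketch {

    /** A centroid together with the number of points merged into it so far. */
    static final class Centroid {
        final double[] center;
        long weight;

        Centroid(double[] point) {
            this.center = point.clone();
            this.weight = 1;
        }

        /** Moves the center to the running mean of all points merged so far. */
        void merge(double[] point) {
            weight++;
            for (int i = 0; i < center.length; i++) {
                center[i] += (point[i] - center[i]) / weight;
            }
        }
    }

    /** Euclidean distance between two equal-length vectors. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Reads the points once, in order, and returns the centroids built along the way. */
    static List<Centroid> cluster(Iterable<double[]> points, double distanceCutoff, Random random) {
        List<Centroid> centroids = new ArrayList<>();
        for (double[] p : points) {
            Centroid closest = null;
            double closestDistance = Double.POSITIVE_INFINITY;
            for (Centroid c : centroids) {
                double d = distance(p, c.center);
                if (d < closestDistance) {
                    closestDistance = d;
                    closest = c;
                }
            }
            // The farther a point is from every centroid, the likelier it starts a new one.
            if (closest == null || random.nextDouble() < closestDistance / distanceCutoff) {
                centroids.add(new Centroid(p));
            } else {
                closest.merge(p);
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
            new double[]{0, 0}, new double[]{0.1, 0}, new double[]{10, 10}, new double[]{10, 10.1});
        for (Centroid c : cluster(points, 1.0, new Random(42))) {
            System.out.println(Arrays.toString(c.center) + " weight=" + c.weight);
        }
    }
}
```

Because every point is seen only once, the result depends on the input order and on the random seed, which is the trade-off such a sketch makes for speed.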


Re: Clustering without Hadoop

2013-12-01 Thread Shan Lu

Thanks, Ted. I went through some introductions to Ball k-means and
streaming k-means, but it is still not clear to me how to run the algorithm
without Hadoop. Do you know of any hello world example code that uses the
non-Hadoop version of streaming k-means? Thanks.


Re: Clustering without hadoop

2012-11-12 Thread Johannes Schulte

Hi Florents,

The API has changed, but it still works without HDFS. I also had trouble
getting the right classes together, but here is something that will
hopefully work correctly:

  DistanceMeasure measure = new CosineDistanceMeasure();

  // ClusterUtils is not a Mahout class
  List<Cluster> initialClusters =
      ClusterUtils.getInitialClusters(points, k, measure);

  System.out.println("going mad!");

  ClusterClassifier prior =
      new ClusterClassifier(initialClusters, new KMeansClusteringPolicy(0.01));

  ClusterClassifier clustered = ClusterIterator.iterate(points, prior, 10);

  System.out.println(clustered.getModels());

  List<Cluster> finalClusters = clustered.getModels();


Cheers,


Johannes
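
ClusterUtils in the snippet above is the poster's own helper, and its source is not shown in the thread. One common way to seed k-means is to sample k of the input points as starting centers. The plain-Java sketch below shows that idea with double[] in place of Mahout's Vector and Cluster types. The class and method names are hypothetical, and this is an assumption about what such a helper could do, not a reconstruction of the original.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative only: a stand-in for the kind of helper the post calls ClusterUtils. */
class InitialCenters {

    /** Picks k input points at distinct positions as starting centers, copying each one. */
    static List<double[]> pick(List<double[]> points, int k, Random random) {
        if (k < 1 || k > points.size()) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        // Shuffle a copy so the caller's list keeps its order.
        List<double[]> shuffled = new ArrayList<>(points);
        Collections.shuffle(shuffled, random);
        List<double[]> centers = new ArrayList<>(k);
        for (double[] p : shuffled.subList(0, k)) {
            centers.add(p.clone());
        }
        return centers;
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        points.add(new double[]{1, 1});
        points.add(new double[]{2, 1});
        points.add(new double[]{8, 8});
        points.add(new double[]{9, 8});
        for (double[] center : pick(points, 2, new Random(1))) {
            System.out.println(center[0] + ", " + center[1]);
        }
    }
}
```

Passing a seeded Random keeps a run repeatable, which helps when comparing clustering results between experiments.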


On Mon, Nov 12, 2012 at 11:55 PM, Florents Tselai florents.tse...@gmail.com
 wrote:

 Hello,

 I'm working on market basket data clustering and I'd like to know the
 fastest (quick and dirty) way to use Mahout.

 Specifically, is it possible to run k-means or EM without having HDFS
 configured?

 If I'm not mistaken, the Mahout 0.5 k-means implementation didn't require
 HDFS, but the latest version does.
