Re: Clustering options

2016-05-24 Thread FRANCISCO XAVIER SUMBA TORAL

Actually, I had confused MLlib with Samsara. Thanks, Pat. I was already 
reading the MLlib documentation. 

Cheers.


> On May 24, 2016, at 10:48, Pat Ferrel  wrote:
> 
> Mahout Samsara is more about rolling your own algo, though it has already 
> implemented several as examples. If you want to build your own clustering you 
> will find a lot of what you need in the R-like DSL. 
> 
> But if you want something already built you may want to look at Spark’s MLlib 
> kmeans.
> 
> People often ask: what is the difference between Mahout and MLlib? MLlib is a 
> collection of algos; Mahout is an optimized tensor math engine with many 
> extensions and several algos. You can’t do the matrix product A’B in MLlib 
> because it’s not an algo, it’s a bit of math, and a very useful bit.



Re: Clustering options

2016-05-24 Thread Pat Ferrel
Mahout Samsara is more about rolling your own algo, though it has already 
implemented several as examples. If you want to build your own clustering you 
will find a lot of what you need in the R-like DSL. 

But if you want something already built you may want to look at Spark’s MLlib 
kmeans.

People often ask: what is the difference between Mahout and MLlib? MLlib is a 
collection of algos; Mahout is an optimized tensor math engine with many 
extensions and several algos. You can’t do the matrix product A’B in MLlib 
because it’s not an algo, it’s a bit of math, and a very useful bit.
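
To make the A’B remark concrete: A’B is the transpose of A multiplied by B, which Samsara's DSL writes roughly as `A.t %*% B`. The sketch below is not Mahout code. It is a plain-Python illustration, with a made-up helper name `at_b` and toy matrices, of computing the product by accumulating one outer product per pair of rows, a form that suits row-partitioned data.

```python
def at_b(a, b):
    """Return A'B for two matrices given as lists of rows (same row count)."""
    if len(a) != len(b):
        raise ValueError("A and B must have the same number of rows")
    n_cols_a, n_cols_b = len(a[0]), len(b[0])
    result = [[0.0] * n_cols_b for _ in range(n_cols_a)]
    # A'B equals the sum, over matching rows, of the outer product
    # (row of A) x (row of B), so each row pair can be handled independently.
    for row_a, row_b in zip(a, b):
        for i, x in enumerate(row_a):
            for j, y in enumerate(row_b):
                result[i][j] += x * y
    return result


# Toy data (assumed for illustration): 3 rows, 2 columns each.
A = [[1, 0], [1, 1], [0, 1]]
B = [[1, 1], [0, 1], [1, 0]]
C = at_b(A, B)
print(C)
```

With B set to A, the same helper gives A’A, whose entry (i, j) counts how often columns i and j are non-zero in the same row when A is a 0/1 matrix.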


On May 23, 2016, at 8:10 PM, FRANCISCO XAVIER SUMBA TORAL 
 wrote:

Hi Dmitriy,

Thanks for your clarification.

Cheers.






Re: Clustering options

2016-05-23 Thread FRANCISCO XAVIER SUMBA TORAL
Hi Dmitriy,

Thanks for your clarification.

Cheers.


> On May 23, 2016, at 12:00, Dmitriy Lyubimov  wrote:
> 
> Xavier,
> there are no exact equivalents in the public domain yet for the algorithms
> that existed for MapReduce (MR) clustering. My understanding is that some of
> them are on the roadmap, though.
> 
> Depending on the level of sophistication you require, some of them are very
> easy to build.



Re: Clustering options

2016-05-23 Thread Dmitriy Lyubimov
Xavier,
there are no exact equivalents in the public domain yet for the algorithms
that existed for MapReduce (MR) clustering. My understanding is that some of
them are on the roadmap, though.

Depending on the level of sophistication you require, some of them are very
easy to build.
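
As a rough idea of how little is needed for the simplest case, here is a minimal k-means (Lloyd's algorithm) sketch. It is plain Python on in-memory tuples, not the Samsara DSL and not the old MR implementation; the function name `kmeans`, the fixed starting centroids, and the toy points are all assumptions for illustration.

```python
def kmeans(points, centroids, max_iter=20):
    """Lloyd's algorithm on tuples of floats; returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point,
        # measured by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its old centroid
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels


# Toy data (assumed): two well-separated groups in 2-D.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centers, labels)
```

For keyword clustering, the points would be some vector representation of the keywords; choosing that representation and the initial centroids is where most of the real sophistication goes.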

On Sat, May 21, 2016 at 8:46 PM, FRANCISCO XAVIER SUMBA TORAL <
xavier.sumb...@ucuenca.ec> wrote:

> Hi,
>
> Since clustering algorithms are deprecated in Mahout Samsara, how can I
> use Mahout to run a clustering algorithm? Basically, I use Mahout
> to cluster papers' keywords: I take a bunch of keywords and cluster them
> to find groups of related keywords. How can I update my code to Mahout
> Samsara? Any suggestions?
>
> Cheers
>


Clustering options

2016-05-21 Thread FRANCISCO XAVIER SUMBA TORAL
Hi,

Since clustering algorithms are deprecated in Mahout Samsara, how can I
use Mahout to run a clustering algorithm? Basically, I use Mahout
to cluster papers' keywords: I take a bunch of keywords and cluster them
to find groups of related keywords. How can I update my code to Mahout
Samsara? Any suggestions?

Cheers


Kmeans clustering options

2011-04-06 Thread Kate Ericson
Hi all,

I just got the latest update to the first 6 chapters of Mahout In
Action, and it still says that '-r' is an option to k-means
clustering.  I'm working with 0.4, and I'm not seeing it as an option
off of -h.
I'm just looking for a sanity check - can you set the number of
reducers for k-means?

Thanks,

Kate


RE: Kmeans clustering options

2011-04-06 Thread Jeff Eastman
Only using a -Dmapred.reduce.tasks=n parameter. The explicit CLI argument was 
dropped in 0.4. Looks like the book has a typo.
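
A small sketch of what that looks like when launching the job from a script. It only assembles the argument list; the helper name `kmeans_command` is made up, and placing the `-D` property directly after the `kmeans` subcommand follows the usual Hadoop convention that generic options come before job-specific ones, which is an assumption here rather than something checked against 0.4.

```python
def kmeans_command(num_reducers, job_args=()):
    """Build an argv list for `mahout kmeans` with a reducer-count override."""
    if num_reducers < 1:
        raise ValueError("num_reducers must be a positive integer")
    return [
        "mahout", "kmeans",
        f"-Dmapred.reduce.tasks={num_reducers}",  # generic Hadoop property
        *job_args,  # caller-supplied k-means options, passed through unchecked
    ]


# "-h" is used only because it is the one k-means option named in this thread.
command = kmeans_command(4, ["-h"])
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run` as-is once the real input and output options are filled in.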

-----Original Message-----
From: moving...@gmail.com [mailto:moving...@gmail.com] On Behalf Of Kate Ericson
Sent: Wednesday, April 06, 2011 5:45 PM
To: user@mahout.apache.org
Subject: Kmeans clustering options

Hi all,

I just got the latest update to the first 6 chapters of Mahout In
Action, and it still says that '-r' is an option to k-means
clustering.  I'm working with 0.4, and I'm not seeing it as an option
off of -h.
I'm just looking for a sanity check - can you set the number of
reducers for k-means?

Thanks,

Kate


Re: Kmeans clustering options

2011-04-06 Thread Kate Ericson
Thanks for the quick reply!

-Kate

On Wed, Apr 6, 2011 at 6:58 PM, Jeff Eastman jeast...@narus.com wrote:
> Only using a -Dmapred.reduce.tasks=n parameter. The explicit CLI argument was 
> dropped in 0.4. Looks like the book has a typo.
