date:20131125

Fwd: Algorithms in Mahout

2013-11-25 Thread unmesha sreeveni

I have gone through http://mahout.apache.org looking for data mining
algorithms already implemented on the Hadoop platform.

From that, I understood that

1. K-means
2. Decision Tree
3. Naive Bayes
have implementations on the Hadoop platform,

and that
4. DBSCAN
5. k-nearest neighbor
6. SVM
7. Logistic Regression
8. Neural networks
9. Apriori
are not in Mahout.
Is that inference right?


-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*

Re: Fwd: Algorithms in Mahout

2013-11-25 Thread Pavan K Narayanan

k-nearest neighbor, SVM, logistic regression and neural nets exist in Mahout.
Just type mahout and press Enter, and you'll see the list of available
algorithms; type mahout algo-name -h to get detailed information about how
to use and configure them.

Pavan
On Nov 25, 2013 2:44 PM, unmesha sreeveni unmeshab...@gmail.com wrote:


Re: Fwd: Algorithms in Mahout

2013-11-25 Thread Sebastian Schelter

Of the algorithms listed, only logistic regression (non-distributed)
is implemented.

Sorry for the confusion; we are currently reworking the wiki.

On 25.11.2013 10:24, Pavan K Narayanan wrote:

Re: Fwd: Algorithms in Mahout

2013-11-25 Thread unmesha sreeveni

So we currently don't have a Decision Tree in the Mahout 0.6 release.


On Mon, Nov 25, 2013 at 2:59 PM, Sebastian Schelter ssc.o...@googlemail.com
 wrote:

-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*

Re: Algorithms in Mahout

2013-11-25 Thread Manuel Blechschmidt

Hi Unmesha,
please also consult JIRA as a source for algorithms; there you will find
implementations or discussions:

e.g. for neural networks a.k.a. multilayer perceptrons:
https://issues.apache.org/jira/browse/MAHOUT-1265
https://issues.apache.org/jira/browse/MAHOUT-976

SVM:
https://issues.apache.org/jira/browse/MAHOUT-334
https://issues.apache.org/jira/browse/MAHOUT-232
https://issues.apache.org/jira/browse/MAHOUT-14
https://issues.apache.org/jira/browse/MAHOUT-227

As an alternative to Apriori, Mahout offered Parallel Frequent Pattern Mining.
This will be retired after 0.8:
https://cwiki.apache.org/confluence/display/MAHOUT/Parallel+Frequent+Pattern+Mining

There are/were multiple kNN implementations in Mahout:
Recommender kNN
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.6/org/apache/mahout/cf/taste/impl/recommender/knn/Optimizer.java
(will be removed for 0.9)
Stream kNN
https://github.com/tdunning/knn/blob/master/src/main/java/org/apache/mahout/knn/cluster/StreamingKMeans.java
Normal kNN

Hope that helps
Manuel

On 25.11.2013, at 10:14, unmesha sreeveni wrote:

--
Manuel Blechschmidt
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B

Re: HELP for implicit data feed back - beginner

2013-11-25 Thread Antony Adopo

Hello, I discovered an ebook and an article which helped me with my problem:
The article: http://www.csulb.edu/web/journals/jecr/issues/20044/Paper1.pdf
The ebook:
http://www.amazon.fr/gp/product/B00BEQ82FY/ref=oh_d__o00_details_o00__i00?ie=UTF8&psc=1

Very interesting.

2013/11/23 Manuel Blechschmidt manuel.blechschm...@gmx.de

Hello Pavan,
the following project is preconfigured using Maven, m2eclipse and a normal
Eclipse project layout:

https://github.com/ManuelB/facebook-recommender-demo

https://raw.github.com/ManuelB/facebook-recommender-demo/master/docs/EclipseWorkspace.png

When you execute the Maven goal mvn install followed by mvn
embedded-glassfish:run, it will generate a WAR and deploy it on an embedded
GlassFish.

If you have a lot of data, you should build a model (e.g. similarities or a
matrix factorization) on Hadoop and then deploy this model in a live
environment.

Here is an excellent blog post by Sebastian:

http://ssc.io/deploying-a-massively-scalable-recommender-system-with-apache-mahout/

Hope that helps
Manuel

On 23.11.2013, at 07:49, Sebastian Schelter wrote:

You can use it in a standard Java program; there is no need for Java EE.
There is no special perspective for Mahout in Eclipse.

The easiest way to set up a project is to configure a Maven project
and use mahout-core as a dependency.

On 23.11.2013 13:43, Pavan K Narayanan wrote:
Hi Sebastian

Pardon my ignorance, but how do you suggest we use this
o.a.m.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender?
Can we use it by coding in Java? If yes, do we need Java EE? Is there a
Mahout perspective for the Eclipse IDE? Is it possible to use these in the
Mahout CLI? There are mentions of Java programs in MiA, but I am unsure how
to set up Mahout in Java. Could you please clarify this part?

Sincerely,
Pavan

On 23 November 2013 04:59, Sebastian Schelter ssc.o...@googlemail.com
wrote:

Antony,

You don't need numeric ratings or preferences for your recommender. I
would suggest you start by using

o.a.m.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender

which has been explicitly built to support scenarios without ratings. I
would further suggest using

o.a.m.cf.taste.impl.similarity.LogLikelihoodSimilarity

as the similarity measure.

Best,
Sebastian
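For readers wondering what a log-likelihood similarity computes on data without ratings, below is a minimal stand-alone sketch of Dunning's log-likelihood ratio over a 2x2 co-occurrence table (users who have both items, only one of them, or neither). It is written from the published formula, not copied from Mahout, so details may differ from the real LogLikelihoodSimilarity class; the class name and the 1 - 1/(1 + LLR) mapping to a similarity in [0, 1) are assumptions of this sketch.

```java
public class LlrSimilaritySketch {

    // x * ln(x), using the convention 0 * ln(0) = 0.
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy of a list of counts: N ln N - sum(k ln k).
    static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long c : counts) {
            result += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - result;
    }

    // G^2 statistic of a 2x2 table:
    // k11 = users with both items, k12 = only item B, k21 = only item A, k22 = neither.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    // Item-item similarity from boolean (purchased / not purchased) data.
    static double similarity(long both, long prefA, long prefB, long numUsers) {
        double llr = logLikelihoodRatio(both, prefB - both, prefA - both,
                numUsers - prefA - prefB + both);
        return 1.0 - 1.0 / (1.0 + llr);
    }

    public static void main(String[] args) {
        // 20 users: 10 bought both A and B, the other 10 bought neither.
        System.out.println(similarity(10, 10, 10, 20));
        // 40 users: A and B co-occur exactly as often as chance predicts.
        System.out.println(similarity(10, 20, 20, 40));
    }
}
```

The first call yields a similarity close to 1 (strong association), the second yields 0 because the co-occurrence is exactly what independence would predict. This is why the measure needs only counts, not preference values.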

On 22.11.2013 22:37, Antony Adopo wrote:
OK, thank you so much. I will start like this and afterwards do some
tricks to increase accuracy.

2013/11/22 Manuel Blechschmidt manuel.blechschm...@gmx.de

Hello Antony,
you can use the following project as a starting point:
https://github.com/ManuelB/facebook-recommender-demo

Further, you can purchase support for Mahout from many companies, e.g.
MapR, Apaxo or Cloudera.

For implicit feedback, just use 1 as the preference value together with the
LogLikelihoodSimilarity.

Hope that helps
Manuel
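To illustrate how an item-based recommender can work when every preference is simply 1, here is a minimal, dependency-free sketch: each item the user does not have yet is scored by summing its similarity to the items the user already has. As far as I know this sum-of-similarities rule is what Mahout's boolean item-based recommender uses, but treat that as an assumption. The class name, item IDs and hard-coded similarity values are hypothetical; in practice the similarities would come from a measure such as the log-likelihood similarity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BooleanItemScoringSketch {

    // Hypothetical precomputed, symmetric item-item similarities.
    static final Map<String, Double> SIMILARITY = new HashMap<>();

    static String key(long a, long b) {
        return a + ":" + b;
    }

    static void putSimilarity(long a, long b, double s) {
        SIMILARITY.put(key(a, b), s);
        SIMILARITY.put(key(b, a), s);
    }

    // Score each candidate item the user does not own by summing its similarities to owned items.
    static Map<Long, Double> score(Set<Long> userItems, Set<Long> allItems) {
        Map<Long, Double> scores = new HashMap<>();
        for (long candidate : allItems) {
            if (userItems.contains(candidate)) {
                continue; // never recommend something the user already bought
            }
            double sum = 0.0;
            for (long owned : userItems) {
                sum += SIMILARITY.getOrDefault(key(candidate, owned), 0.0);
            }
            if (sum > 0.0) {
                scores.put(candidate, sum);
            }
        }
        return scores;
    }

    // Return up to n candidate items, highest score first.
    static List<Long> recommend(Set<Long> userItems, Set<Long> allItems, int n) {
        List<Map.Entry<Long, Double>> entries = new ArrayList<>(score(userItems, allItems).entrySet());
        entries.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        putSimilarity(1, 3, 0.9);
        putSimilarity(2, 3, 0.5);
        putSimilarity(1, 4, 0.7);
        // The user bought items 1 and 2; items 3 and 4 are candidates.
        System.out.println(recommend(Set.of(1L, 2L), Set.of(1L, 2L, 3L, 4L), 2)); // [3, 4]
    }
}
```

Item 3 scores 0.9 + 0.5 = 1.4 and item 4 scores 0.7, so item 3 is ranked first. No numeric rating is involved anywhere: only which items the user has.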

On 22.11.2013, at 16:22, Antony Adopo wrote:

Thanks.
I've already seen this, but my question is: does Mahout offer some
collaborative filtering functions that are not based on preferences? Or how
should I model these with purchases?

Thanks

2013/11/22 Smith, Dan dan.sm...@disney.com

Hi Anthony,

I would suggest looking into the collaborative filtering functions. It
will work best if you have your customers segmented into similar groups,
such as those that buy high-end goods vs. low-end.

_Dan

On 11/22/13 11:04 AM, Antony Adopo saius...@gmail.com wrote:

OK, thanks for answering so quickly.

I forgot to mention that the customer table has a job variable, and I
implicitly assumed this variable would also be needed for accurate
recommendations. Anyway:

I have around 200,000 customers.
My order table has around 12,000,000 orders,
and I have around 2,000,000 distinct (customerID,itemID) tuples.
About the (customerID,itemID) tuples: when I read the Mahout or recommender
system literature, they use (customerID,itemID,*preference*), and I don't
have a *preference*.
So is there a Mahout method or class that handles only (customerID,itemID)
data?
And is it possible to use external data, such as the job variable or an RFM
analysis, to get something more accurate?

Sorry (for about 2 weeks now I have had a headache over how to organize all
of this to build a great system). Propose your solutions and afterwards
we'll see.
2013/11/22 Sebastian Schelter ssc.o...@googlemail.com

Hi Antony,

I would start with a simple approach: extract all customerID,itemID
tuples from the orders table and use them as your input data. How many
of those do you have? The data size will dictate whether you need to
employ a distributed approach to recommendation mining or not.

--sebastian
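Sebastian's first step, reducing the orders table to distinct (customerID,itemID) pairs, can be sketched in plain Java as below. Repeated purchases of the same item by the same customer collapse into one pair, which is how 12,000,000 orders can shrink to roughly 2,000,000 tuples. The Order class and the "customerID,itemID" output lines are assumptions of this sketch; Mahout's file-based data model should accept lines without a preference value, but check this against the version you use.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DistinctPairsSketch {

    // Hypothetical row of the orders table: one entry per order.
    static final class Order {
        final long customerId;
        final long itemId;

        Order(long customerId, long itemId) {
            this.customerId = customerId;
            this.itemId = itemId;
        }
    }

    // Collapse repeated purchases into distinct "customerID,itemID" lines,
    // keeping first-seen order so the output is deterministic.
    static List<String> distinctPairs(List<Order> orders) {
        Set<String> pairs = new LinkedHashSet<>();
        for (Order o : orders) {
            pairs.add(o.customerId + "," + o.itemId);
        }
        return new ArrayList<>(pairs);
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order(1, 100), new Order(1, 100), // same item bought twice
                new Order(1, 200), new Order(2, 100));
        List<String> lines = distinctPairs(orders);
        lines.forEach(System.out::println);
        System.out.println(lines.size() + " distinct pairs from " + orders.size() + " orders");
    }
}
```

At 12,000,000 rows an in-memory set is still feasible on one machine, which matches Sebastian's point that the data size decides whether a distributed approach is needed; on much larger tables the same deduplication would be done in the database or in a MapReduce job.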

On 22.11.2013 19:21, Antony Adopo wrote:
Morning,

My name is Antony and I have a great recommender system to build.

I'm totally new to recommender systems. After reading all the scientific
papers, I didn't find
Re: Canopy threshold limitation

2013-11-25 Thread Chih-Hsien Wu

Hey Suneel, thanks for the reply. I'm trying to create hierarchical
clusters via a top-down approach. I'm caught in the trade-off between a
lower canopy threshold and running out of heap memory. Streaming k-means
sounds ideal for the top-level clustering. What are the major differences
between streaming k-means and k-means, other than being faster and using
less memory? In other words, what are the pros and cons?


On Fri, Nov 22, 2013 at 5:30 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:

 The threshold is based on the user's preference for inter-cluster distances.
 If you are running out of memory, I suggest increasing the JVM memory
 settings.

 Not sure what you are trying to accomplish, but if you are looking
 to get a first cut at clustering, I suggest you look at the new streaming
 k-means that's part of Mahout 0.8.

 See
 http://stackoverflow.com/questions/17272296/how-to-use-mahout-streaming-k-means
 for the steps.

 On Friday, November 22, 2013 4:45 PM, Chih-Hsien Wu chjaso...@gmail.com
 wrote:

 Just out of curiosity: is there a threshold limitation for the canopy
 algorithm? Is it just defined by the user's preference based on the
 inter-cluster distances? Or perhaps it is just limited by how much memory
 is allowed to execute it?
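To see why a lower threshold costs memory, here is a simplified single-machine sketch of canopy generation on 1-D points, following the usual T1/T2 scheme: each point joins every canopy whose center lies within T1, and it starts a new canopy only if no center lies within T2. The smaller T2 is, the more canopies are created and held in memory. This illustration assumes absolute distance and fixed canopy centers; it is not Mahout's implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class CanopySketch {

    // A canopy with a fixed center and the number of points it loosely covers.
    static final class Canopy {
        final double center;
        int size;

        Canopy(double center) {
            this.center = center;
        }
    }

    // Single-pass canopy generation; T1 (loose) must be larger than T2 (tight).
    static List<Canopy> generate(double[] points, double t1, double t2) {
        if (t1 <= t2) {
            throw new IllegalArgumentException("T1 must be larger than T2");
        }
        List<Canopy> canopies = new ArrayList<>();
        for (double p : points) {
            boolean stronglyBound = false;
            for (Canopy c : canopies) {
                double d = Math.abs(p - c.center);
                if (d < t1) {
                    c.size++; // within T1: the point also belongs to this canopy
                }
                if (d < t2) {
                    stronglyBound = true; // within T2: no new canopy needed
                }
            }
            if (!stronglyBound) {
                Canopy fresh = new Canopy(p);
                fresh.size = 1;
                canopies.add(fresh);
            }
        }
        return canopies;
    }

    public static void main(String[] args) {
        double[] points = {0.0, 0.5, 1.0, 5.0, 5.4, 10.0};
        System.out.println(generate(points, 3.0, 1.5).size()); // 3
        System.out.println(generate(points, 3.0, 0.3).size()); // 6
    }
}
```

Halving or further lowering T2 can multiply the number of canopies, and every canopy has to be kept until the pass ends, so the heap limit and the threshold choice are linked in the way described above.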

Re: Algorithms in Mahout

2013-11-25 Thread Ted Dunning

On Mon, Nov 25, 2013 at 3:14 AM, Manuel Blechschmidt 
manuel.blechschm...@gmx.de wrote:

 There are/were multiple kNN implementations in Mahout:
 Recommender kNN
 http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.6/org/apache/mahout/cf/taste/impl/recommender/knn/Optimizer.java
 (will be removed for 0.9)
 Stream kNN
 https://github.com/tdunning/knn/blob/master/src/main/java/org/apache/mahout/knn/cluster/StreamingKMeans.java
 Normal kNN


Streaming k-means isn't strictly a kNN implementation. It is a k-means
clustering application.
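To make the distinction concrete: a k-nearest-neighbor query returns the k stored points closest to a query point, with no training step and no centroids, whereas k-means produces k cluster centers. A brute-force kNN query might look like the sketch below; it assumes Euclidean distance and uses plain Java rather than any Mahout search class, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BruteForceKnnSketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Return the indices of the k stored points closest to the query, nearest first.
    static List<Integer> nearest(double[][] points, double[] query, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            indices.add(i);
        }
        indices.sort(Comparator.comparingDouble(i -> euclidean(points[i], query)));
        return indices.subList(0, Math.min(k, indices.size()));
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {1, 1}, {5, 5}, {0.5, 0}};
        System.out.println(nearest(points, new double[] {0, 0}, 2)); // [0, 3]
    }
}
```

The result is a set of neighbors for one query, not a summary of the whole data set; a clustering algorithm such as (streaming) k-means answers a different question.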

Recommender Streaming with EMR

2013-11-25 Thread Bryan Marble

Hello - 

If this isn't the best forum to ask, please let me know.

TL;DR;
Is there a way to stream preference/user data to an EMR recommender workflow 
without having to go through the pain of re-uploading all preference data, and 
starting brand new jobs over and over, etc?

I am trying to process large volumes of preference data using Amazon EMR.  It 
seems extremely unscalable to upload our entire preference set every time we 
run a job, as the vast majority of the preferences will never change.  It seems 
like the append files that Mahout can process would be perfect for this, but it 
doesn't appear that EMR supports it.

The brute force method appears to be:
1) Upload preference set
2) Run Recommender job
3) Download and process results
4) Go to step 1

Does anyone have some general advice for processing recommendations in as 
real-time a manner as possible using EMR?

Thank you for any help or references you could provide.

Bryan Marble

Re: Recommender Streaming with EMR

2013-11-25 Thread Manuel Blechschmidt

Hi Bryan,

On 25.11.2013, at 17:14, Bryan Marble wrote:

Hello -

If this isn't the best forum to ask, please let me know.

This is the correct forum to ask this question.

TL;DR;
Is there a way to stream preference/user data to an EMR recommender workflow
without having to go through the pain of re-uploading all preference data,
and starting brand new jobs over and over, etc?

No, not currently. Streaming machine learning is an active research topic.
At the moment you always train your model on all the data that you have and
use it afterwards. After some time you retrain.

I am trying to process large volumes of preference data using Amazon EMR. It
seems extremely unscalable to upload our entire preference set every time we
run a job

Why? Sending 1 TB to EMR will take about 3.7 hours according to the following
blog post:
http://www.rightscale.com/blog/cloud-industry-insights/network-performance-within-amazon-ec2-and-amazon-s3

If you use compression, you can transfer around 10 times the amount in the
same time.
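The 3.7-hour figure can be sanity-checked with a quick back-of-the-envelope calculation, taking 1 TB as 10^12 bytes; the factor of 10 for compression is the rough ratio quoted above, not a measured value.

```latex
\text{throughput} \approx \frac{10^{12}\ \text{B}}{3.7\ \text{h} \times 3600\ \text{s/h}}
  = \frac{10^{12}\ \text{B}}{13\,320\ \text{s}}
  \approx 7.5 \times 10^{7}\ \text{B/s}
  \approx 75\ \text{MB/s} \approx 600\ \text{Mbit/s}
```

With roughly 10:1 compression, the same 3.7 hours would carry on the order of 10 TB of uncompressed preference data.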

, as the vast majority of the preferences will never change.

Just append them.

It seems like the append files that Mahout can process would be perfect for
this, but it doesn't appear that EMR supports it.

The ItemSimilarityJob can already read multiple files:
https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html
--input (path): Directory containing one or more text files with the preference
data
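The "just append them" approach relies on the job reading every file in its --input directory. A minimal sketch of that pattern on the local filesystem: each new batch of preferences is written as its own file, existing files are never rewritten, and the job is pointed at the whole directory. The directory, the prefs-batch-NNNNN.csv naming scheme and the helper names are hypothetical, and the "userID,itemID,preference" line format is an assumption; on EMR the directory would live in S3 or HDFS rather than on local disk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.stream.Stream;

public class AppendOnlyInputSketch {

    // Write one batch of preference lines as a new file; older batches stay untouched.
    static Path writeBatch(Path inputDir, int batchNumber, List<String> lines) throws IOException {
        Files.createDirectories(inputDir);
        Path batch = inputDir.resolve(String.format("prefs-batch-%05d.csv", batchNumber));
        // CREATE_NEW fails if the batch already exists, so a batch is never overwritten.
        return Files.write(batch, lines, StandardCharsets.UTF_8, StandardOpenOption.CREATE_NEW);
    }

    // Count the preference lines a job reading the whole directory would see.
    static long totalLines(Path inputDir) throws IOException {
        long total = 0;
        try (Stream<Path> files = Files.list(inputDir)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                total += Files.readAllLines(p, StandardCharsets.UTF_8).size();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path inputDir = Files.createTempDirectory("prefs-input");
        writeBatch(inputDir, 1, List.of("1,100,1.0", "2,100,1.0"));
        writeBatch(inputDir, 2, List.of("3,200,1.0")); // only the new preferences are uploaded
        System.out.println(totalLines(inputDir)); // 3
    }
}
```

Only the newest batch has to be transferred each time; the retraining itself still reads all files, which is consistent with the train-then-retrain cycle described above.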

The brute force method appears to be:
1) Upload preference set
2) Run Recommender job
3) Download and process results
4) Go to step 1

Does anyone have some general advice for processing recommendations in as
real-time a manner as possible using EMR?

For better advice you can contact companies like Cloudera, MapR or Apaxo (my
company).

Thank you for any help or references you could provide.

Bryan Marble

/Manuel

--
Manuel Blechschmidt
M.Sc. IT Systems Engineering
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B

java.io.ioexception: Failed to set permissions of path

2013-11-25 Thread Antony Adopo

Hello,
On my first install of Mahout, I get this error in Eclipse on many tests:
java.io.IOException: Failed to set permissions of path
Could someone please help me fix it? Thanks.

Only one reducer running on canopy generator

2013-11-25 Thread Chih-Hsien Wu

Hi all, I have been experiencing memory issues while running the Mahout
canopy algorithm on a big data set on Hadoop. I noticed that only one
reducer was running while the other nodes were idle. I was wondering if
increasing the number of reduce tasks would ease the memory usage and
speed up the procedure. However, I realized that configuring
mapred.reduce.tasks on Hadoop has no effect on the canopy reduce tasks; it's
still running with only one reducer. Now I'm wondering whether canopy is set
up that way, or whether I'm not configuring Hadoop correctly?

Re: Only one reducer running on canopy generator

2013-11-25 Thread Suneel Marthi

Canopy clustering is a two-step process: Canopy Generation followed by Canopy
Clustering.

For Canopy Generation, it uses a single reducer (and this cannot be
overridden), while the Clustering task uses multiple reducers.

You seem to be hitting OOM during the Canopy Generation phase.
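One way to picture why generation ends in a single reducer is the simplified model below, which assumes (as I understand the design) that each mapper runs canopy generation over its own input split and emits only its canopy centers, and that one reducer then runs canopy generation again over all emitted centers to produce a single global set. Because that final step must see every mapper's centers together, in this model it cannot be split across reducers, and a small T2 sends many centers into that one reducer's memory. Plain Java, no Hadoop; only the T2 test is modeled, and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseCanopySketch {

    // Canopy centers for 1-D points: a point starts a new canopy unless it lies within t2 of an existing center.
    static List<Double> centers(List<Double> points, double t2) {
        List<Double> centers = new ArrayList<>();
        for (double p : points) {
            boolean bound = false;
            for (double c : centers) {
                if (Math.abs(p - c) < t2) {
                    bound = true;
                    break;
                }
            }
            if (!bound) {
                centers.add(p);
            }
        }
        return centers;
    }

    // "Map" phase: every input split produces its own local centers independently.
    static List<Double> mapPhase(List<List<Double>> splits, double t2) {
        List<Double> emitted = new ArrayList<>();
        for (List<Double> split : splits) {
            emitted.addAll(centers(split, t2));
        }
        return emitted;
    }

    // "Reduce" phase: a single reducer merges near-duplicate centers coming from different splits.
    static List<Double> reducePhase(List<Double> allMapperCenters, double t2) {
        return centers(allMapperCenters, t2);
    }

    public static void main(String[] args) {
        List<List<Double>> splits = List.of(
                List.of(0.0, 0.4, 5.0),   // split handled by mapper 1
                List.of(0.2, 5.3, 9.0));  // split handled by mapper 2
        List<Double> emitted = mapPhase(splits, 1.0);
        System.out.println("centers emitted by mappers: " + emitted);
        System.out.println("global centers after the single reducer: " + reducePhase(emitted, 1.0));
    }
}
```

Here the mappers emit five centers ([0.0, 5.0, 0.2, 5.3, 9.0]) and the reducer merges them into three ([0.0, 5.0, 9.0]); the merge only works because one process sees the centers from both splits.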





On Monday, November 25, 2013 6:09 PM, Chih-Hsien Wu chjaso...@gmail.com wrote:
 

Re: Algorithms in Mahout

2013-11-25 Thread unmesha sreeveni

Thanks for the replies. I will go through those links. Thanks for spending
time on me :)



On Mon, Nov 25, 2013 at 11:59 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:

 Dhruv,

 Could you update the patch to the current trunk codebase and also create a
 Wiki page for this?





 On Monday, November 25, 2013 1:04 PM, Dhruv dhru...@gmail.com wrote:

 A distributed Hidden Markov Model trainer using the Baum-Welch algorithm is
 also available as a patch. Please see the JIRA issue MAHOUT-627.



 On Mon, Nov 25, 2013 at 8:07 AM, Ted Dunning ted.dunn...@gmail.com
 wrote:





-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*
