Fwd: Algorithms in Mahout

2013-11-25 Thread unmesha sreeveni
I have gone through http://mahout.apache.org looking for data mining
algorithms already implemented on the Hadoop platform.

From that I understood that

1. K-means
2. Decision Tree
3. Naive Bayes
have implementations on the Hadoop platform,

and that

4. DBSCAN
5. k-nearest neighbor
6. SVM
7. Logistic Regression
8. Neural networks
9. Apriori
are not in Mahout.
Is that inference right?


-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*


Re: Fwd: Algorithms in Mahout

2013-11-25 Thread Pavan K Narayanan
k-nearest neighbor, SVM, logistic regression, and neural nets exist in Mahout.
Just type mahout and press Enter and you'll see the list of available
algorithms; type mahout algo-name -h to get detailed information about how to
use and configure them.

Pavan


Re: Fwd: Algorithms in Mahout

2013-11-25 Thread Sebastian Schelter
Of the algorithms listed, only logistic regression (non-distributed)
is implemented.

Sorry for the confusion; we are currently reworking the wiki.




Re: Fwd: Algorithms in Mahout

2013-11-25 Thread unmesha sreeveni
So currently we don't have Decision Tree in the Mahout 0.6 release.




Re: Algorithms in Mahout

2013-11-25 Thread Manuel Blechschmidt
Hi Unmesha,
please also consult JIRA as a source for algorithms; there you will find 
implementations or discussions:

e.g. for neural networks, a.k.a. multilayer perceptrons:
https://issues.apache.org/jira/browse/MAHOUT-1265
https://issues.apache.org/jira/browse/MAHOUT-976

SVM:
https://issues.apache.org/jira/browse/MAHOUT-334
https://issues.apache.org/jira/browse/MAHOUT-232
https://issues.apache.org/jira/browse/MAHOUT-14
https://issues.apache.org/jira/browse/MAHOUT-227

For Apriori, Mahout offered an alternative, Parallel Frequent Pattern Mining. This 
will be retired after 0.8:
https://cwiki.apache.org/confluence/display/MAHOUT/Parallel+Frequent+Pattern+Mining

There are/were multiple kNN implementations in Mahout:
Recommender knn 
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.6/org/apache/mahout/cf/taste/impl/recommender/knn/Optimizer.java
 (will be removed for 0.9)
stream knn 
https://github.com/tdunning/knn/blob/master/src/main/java/org/apache/mahout/knn/cluster/StreamingKMeans.java
normal knn

Hope that helps
Manuel



-- 
Manuel Blechschmidt
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B



Re: HELP for implicit data feed back - beginner

2013-11-25 Thread Antony Adopo
Hello, I discovered an ebook and an article that helped me with my problem:
the article: http://www.csulb.edu/web/journals/jecr/issues/20044/Paper1.pdf
the ebook:
http://www.amazon.fr/gp/product/B00BEQ82FY/ref=oh_d__o00_details_o00__i00?ie=UTF8psc=1

very interesting


2013/11/23 Manuel Blechschmidt manuel.blechschm...@gmx.de

 Hello Pavan,
 the following project is preconfigured using maven, m2eclipse and a normal
 eclipse project layout:

 https://github.com/ManuelB/facebook-recommender-demo


 https://raw.github.com/ManuelB/facebook-recommender-demo/master/docs/EclipseWorkspace.png

 When you execute the Maven goal mvn install followed by mvn
 embedded-glassfish:run, it will generate a WAR and deploy it on an embedded
 GlassFish.

 If you have a lot of data, you should build a model (e.g. similarities or a
 matrix factorization) on Hadoop and then deploy this model in a live
 environment.

 Here is an excellent blog post by Sebastian:

 http://ssc.io/deploying-a-massively-scalable-recommender-system-with-apache-mahout/

 Hope that helps
 Manuel


 On 23.11.2013, at 07:49, Sebastian Schelter wrote:

  You can use it in a standard Java program, no need for JavaEE. There is
  no special perspective for Mahout in Eclipse.
 
  The easiest way to set up a project is to configure a Maven project
  and use mahout-core as a dependency.
 
 
  On 23.11.2013 13:43, Pavan K Narayanan wrote:
  Hi Sebastian
 
  Pardon my ignorance, but how do you suggest we use
  o.a.m.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender?
  Can we use it by coding in Java? If yes, do we need Java EE? Is there a
  Mahout perspective for the Eclipse IDE? Is it possible to use these in the
  Mahout CLI? There are mentions of Java programs in MiA, but I am unsure how
  to set up Mahout in Java. Could you please clarify this part?
 
  Sincerely,
  Pavan
 
 
 
 
  On 23 November 2013 04:59, Sebastian Schelter ssc.o...@googlemail.com
 wrote:
 
  Antony,
 
  You don't need numeric ratings or preferences for your recommender. I
  would suggest you start by using
 
  o.a.m.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender
 
  which has explicitly been built to support scenarios without ratings. I
  would further suggest using
 
  o.a.m.cf.taste.impl.similarity.LogLikelihoodSimilarity
 
  as similarity measure.
 
  Best,
  Sebastian
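
To illustrate the advice above, here is a minimal sketch in plain Python (not Mahout code) of how purchases without ratings can be turned into boolean preferences and compared with a log-likelihood ratio. The function names and the sample orders are hypothetical. The ratio follows the usual entropy form for a 2x2 contingency table, and mapping it to a score with 1 - 1/(1 + LLR) is an assumption made for this illustration; check the Mahout source before relying on the exact formula.

```python
import math
from collections import defaultdict


def x_log_x(x):
    """Return x * ln(x), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """LLR for a 2x2 table: k11 = users with both items,
    k12 / k21 = users with only one of them, k22 = users with neither."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def item_similarity(users_by_item, item_a, item_b, num_users):
    """Map the LLR of two items' buyer sets to a score in [0, 1)."""
    a, b = users_by_item[item_a], users_by_item[item_b]
    both = len(a & b)
    llr = log_likelihood_ratio(both, len(a) - both, len(b) - both,
                               num_users - len(a | b))
    return 1.0 - 1.0 / (1.0 + llr)


# Hypothetical order rows (customerID, itemID). Repeated purchases collapse
# into a single boolean preference because each item keeps a set of buyers.
orders = [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (2, "A"),
          (3, "C"), (4, "C"), (4, "A")]
users_by_item = defaultdict(set)
for customer_id, item_id in orders:
    users_by_item[item_id].add(customer_id)
num_users = len({customer_id for customer_id, _ in orders})

print(item_similarity(users_by_item, "A", "B", num_users))
```

Note that the ratio measures how strongly two items are associated, not whether the association is positive, so it is worth inspecting the co-occurrence counts as well as the score.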
 
 
  On 22.11.2013 22:37, Antony Adopo wrote:
  OK, thank you so much. I will start like this and afterwards try some
  tricks to increase accuracy.
 
 
  2013/11/22 Manuel Blechschmidt manuel.blechschm...@gmx.de
 
  Hello Antony,
  you can use the following project as a starting point:
  https://github.com/ManuelB/facebook-recommender-demo
 
  Further, you can purchase support for Mahout from many companies, e.g.
  MapR, Apaxo or Cloudera.
 
  For implicit feedback just use a 1 as preference and the
  LogLikelihoodSimilarity.
 
  Hope that helps
 Manuel
 
  On 22.11.2013, at 16:22, Antony Adopo wrote:
 
  Thanks.
  I've already seen this, but my question is: does Mahout offer a
  collaborative filtering function that is not based on preferences? Or how
  can this be modeled with purchases?
 
  Thanks
 
 
  2013/11/22 Smith, Dan dan.sm...@disney.com
 
  Hi Anthony,
 
  I would suggest looking into the collaborative filtering functions. It
  will work best if you have your customers segmented into similar groups,
  such as those that buy high-end goods vs. low-end.
 
  _Dan
 
  On 11/22/13 11:04 AM, Antony Adopo saius...@gmail.com wrote:
 
  OK, thanks for answering so quickly.
 
  I forgot to mention that the customer table has a job variable, and I
  assumed this variable would also be needed for accurate recommendations.
  Anyway:
 
  I have around 200 000 customers.
  My order table has around 12 000 000 orders,
  and I have around 2 000 000 distinct (customerID,itemID) tuples.
  About the (customerID,itemID) tuples: the Mahout and recommender system
  literature I have read uses (customerID,itemID,*preference*), and I don't
  have a *preference*.
  So is there a Mahout method or class that handles only (customerID,itemID)
  data?
  And is it possible to use external data such as job or RFM analysis to get
  something more accurate?
 
  Sorry (for about 2 weeks I have had a headache over how to organize all of
  this to build a great system). Propose your solutions and after, we'll see
 
 
 
  about
 
 
  2013/11/22 Sebastian Schelter ssc.o...@googlemail.com
 
  Hi Antony,
 
  I would start with a simple approach: extract all customerID,itemID
  tuples from the orders table and use them as your input data. How many
  of those do you have? The data size will dictate whether you need to
  employ a distributed approach to recommendation mining or not.
 
  --sebastian
 
  On 22.11.2013 19:21, Antony Adopo wrote:
  Morning,
 
  My name is Antony and I have a great recommender system to build
 
  I'm totally new to recommender systems. After reading all the scientific
  files,
  I didn't find 

Re: Canopy threshold limitation

2013-11-25 Thread Chih-Hsien Wu
Hey Suneel, thanks for the reply. I'm trying to create hierarchical
clusters via a top-down approach. I'm caught in the trade-off between a
lower canopy threshold and running out of heap memory. Streaming k-means
sounds ideal for the top-level clustering. What are the major differences
between streaming k-means and k-means, other than being faster and using
less memory? In other words, what are the pros and cons?


On Fri, Nov 22, 2013 at 5:30 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:

 The threshold is based on the user's preference for inter-cluster distances.
 If you are running out of memory, I suggest increasing the JVM memory
 settings.

 I'm not sure what you are trying to accomplish, but if you are looking
 to get a first cut at clustering, I suggest you look at the new streaming
 k-means that's part of Mahout 0.8.

 See
 http://stackoverflow.com/questions/17272296/how-to-use-mahout-streaming-k-means
 for the steps.






 On Friday, November 22, 2013 4:45 PM, Chih-Hsien Wu chjaso...@gmail.com
 wrote:

 Just out of curiosity: is there a threshold limitation for the canopy
 algorithm? Is it just defined by the user's preference based on the
 inter-cluster distances? Or perhaps it is just limited by how much memory
 is allowed to execute it?



Re: Algorithms in Mahout

2013-11-25 Thread Ted Dunning
On Mon, Nov 25, 2013 at 3:14 AM, Manuel Blechschmidt 
manuel.blechschm...@gmx.de wrote:

 There are/were multiple kNN implementations in Mahout:
 Recommender knn
 http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.6/org/apache/mahout/cf/taste/impl/recommender/knn/Optimizer.java
 (will be removed for 0.9)
 stream knn
 https://github.com/tdunning/knn/blob/master/src/main/java/org/apache/mahout/knn/cluster/StreamingKMeans.java
 normal knn


Streaming k-means isn't strictly a knn implementation.  It is a k-means
clustering application.


Recommender Streaming with EMR

2013-11-25 Thread Bryan Marble
Hello - 

If this isn't the best forum to ask, please let me know.

TL;DR;
Is there a way to stream preference/user data to an EMR recommender workflow 
without having to go through the pain of re-uploading all preference data, and 
starting brand new jobs over and over, etc?

I am trying to process large volumes of preference data using Amazon EMR.  It 
seems extremely unscalable to upload our entire preference set every time we 
run a job, as the vast majority of the preferences will never change.  It seems 
like the append files that Mahout can process would be perfect for this, but it 
doesn't appear that EMR supports it.

The brute force method appears to be:
1) Upload preference set
2) Run Recommender job
3) Download and process results
4) Go to step 1

Does anyone have some general advice for processing recommendations in as 
real-time a manner as possible using EMR?

Thank you for any help or references you could provide.

Bryan Marble



Re: Recommender Streaming with EMR

2013-11-25 Thread Manuel Blechschmidt
Hi Bryan,

On 25.11.2013, at 17:14, Bryan Marble wrote:

 Hello - 
 
 If this isn't the best forum to ask, please let me know.

This is the correct forum to ask this question.

 
 TL;DR;
 Is there a way to stream preference/user data to an EMR recommender workflow 
 without having to go through the pain of re-uploading all preference data, 
 and starting brand new jobs over and over, etc?

No, currently not. Streaming machine learning is still an area of active 
research. For now, you always train your model on all the data that you have 
and use it afterwards. After some time you retrain.

 
 I am trying to process large volumes of preference data using Amazon EMR.  It 
 seems extremely unscalable to upload our entire preference set every time we 
 run a job

Why? Sending 1 TB to EMR will take about 3.7 hours according to the following 
blog post:
http://www.rightscale.com/blog/cloud-industry-insights/network-performance-within-amazon-ec2-and-amazon-s3

If you use compression you can stream around 10 times the amount.
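
As a rough check of these numbers, here is a small back-of-the-envelope calculation in Python. It assumes 1 TB means 10^12 bytes and that compression shrinks the data by the factor of about 10 mentioned above; the function names are made up for this sketch, and real throughput will vary.

```python
def implied_throughput_mb_per_s(total_bytes, hours):
    """Average rate in MB/s (1 MB = 10**6 bytes) needed to move the data in time."""
    return total_bytes / (hours * 3600) / 1e6


def upload_hours(total_bytes, throughput_mb_per_s, compression_ratio=1.0):
    """Hours needed to send the data at the given rate after compression."""
    compressed_bytes = total_bytes / compression_ratio
    return compressed_bytes / (throughput_mb_per_s * 1e6) / 3600


TB = 10**12  # assumption: decimal terabyte

rate = implied_throughput_mb_per_s(1 * TB, 3.7)
print(f"implied rate: {rate:.1f} MB/s")  # about 75 MB/s
print(f"10 TB at 10:1 compression: {upload_hours(10 * TB, rate, 10.0):.1f} h")
```

In other words, the 3.7 hour figure corresponds to roughly 75 MB/s sustained, and at a 10:1 compression ratio about 10 TB of raw preferences would fit in the same window.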

 , as the vast majority of the preferences will never change.

Just append them.

 It seems like the append files that Mahout can process would be perfect for 
 this, but it doesn't appear that EMR supports it.

The ItemSimilarityJob can already read multiple files:
https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html
--input (path): Directory containing one or more text files with the preference 
data
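
To sketch what "just append them" could look like in practice: each batch of new preferences is written as its own file in the input directory, and the job is pointed at the directory as a whole. The Python below is only an illustration. A local temporary directory stands in for the S3 or HDFS input path, and the file naming scheme and helper names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def write_delta(input_dir, batch_name, prefs):
    """Write one batch of new (userID, itemID) rows as its own file.
    Files from earlier batches are never rewritten or re-uploaded."""
    path = Path(input_dir) / f"prefs-{batch_name}.csv"
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(prefs)
    return path


def read_all_prefs(input_dir):
    """Read every file in the directory, the way a job that takes a
    directory as --input would see the combined preference data."""
    rows = []
    for path in sorted(Path(input_dir).glob("*.csv")):
        with path.open(newline="") as f:
            rows.extend((int(user), int(item)) for user, item in csv.reader(f))
    return rows


input_dir = tempfile.mkdtemp()
write_delta(input_dir, "2013-11-24", [(1, 101), (2, 102)])
write_delta(input_dir, "2013-11-25", [(1, 103)])  # only the new rows are sent
print(read_all_prefs(input_dir))
```

The job still reprocesses the full data set on every run; what this avoids is re-uploading the preferences that have not changed.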

 
 The brute force method appears to be:
 1) Upload preference set
 2) Run Recommender job
 3) Download and process results
 4) Go to step 1
 
 Does anyone have some general advice for processing recommendations in as 
 real-time a manner as possible using EMR?

For better advice you can contact companies like Cloudera, MapR or Apaxo (my 
company).

 
 Thank you for any help or references you could provide.
 
 Bryan Marble
 

/Manuel

-- 
Manuel Blechschmidt
M.Sc. IT Systems Engineering
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B



java.io.ioexception: Failed to set permissions of path

2013-11-25 Thread Antony Adopo
Hello,
on my first install of Mahout, I get this error in Eclipse:
java.io.IOException: Failed to set permissions of path
on many tests. Could someone please help me fix it? Thanks.


Only one reducer running on canopy generator

2013-11-25 Thread Chih-Hsien Wu
Hi all, I have been experiencing memory issues while running the Mahout
canopy algorithm on a big data set on Hadoop. I noticed that only one
reducer was running while the other nodes were idle. I was wondering whether
increasing the number of reduce tasks would reduce the memory usage and
speed up the procedure. However, I found that configuring
mapred.reduce.tasks on Hadoop has no effect on the canopy reduce tasks; the
job still runs with only one reducer. Now I'm wondering whether canopy is
designed that way, or whether I am not configuring Hadoop correctly.


Re: Only one reducer running on canopy generator

2013-11-25 Thread Suneel Marthi
Canopy clustering is a two-step process: canopy generation followed by canopy 
clustering.

Canopy generation uses a single reducer (and this cannot be overridden), 
while the clustering task uses multiple reducers.

You seem to be hitting OOM during the canopy generation phase.
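
To give a feel for why the generation phase can run out of memory, here is a minimal single-pass canopy generation sketch in plain Python. It is not Mahout's implementation: the function name, the random test data and the choice of T1 = 2 * T2 are assumptions made for illustration. It shows that the number of canopies grows quickly as the thresholds shrink, which matters if all canopy centers have to be held by one reducer.

```python
import math
import random


def generate_canopies(points, t1, t2):
    """Single-pass canopy generation with thresholds t1 > t2.
    Returns a list of (center, members) pairs."""
    canopies = []
    for p in points:
        strongly_bound = False
        for center, members in canopies:
            d = math.dist(p, center)
            if d < t1:
                members.append(p)       # loosely bound: joins this canopy
            if d < t2:
                strongly_bound = True   # too close to a center to seed a new canopy
        if not strongly_bound:
            canopies.append((p, [p]))   # p becomes the center of a new canopy
    return canopies


random.seed(0)
points = [(random.random(), random.random()) for _ in range(2000)]

# Smaller thresholds produce many more canopies to keep in memory.
for t2 in (0.4, 0.2, 0.1, 0.05):
    print(t2, len(generate_canopies(points, t1=2 * t2, t2=t2)))
```

If memory is the limit, raising the thresholds or the heap size of the reduce task are the two obvious knobs to try.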






Re: Algorithms in Mahout

2013-11-25 Thread unmesha sreeveni
Thanks for the replies. I will go through those links. Thanks for spending
time on me :)



On Mon, Nov 25, 2013 at 11:59 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:

 Dhruv,

 Could you update the patch to the present trunk codebase and also create a
 wiki page for this?





 On Monday, November 25, 2013 1:04 PM, Dhruv dhru...@gmail.com wrote:

 A distributed Hidden Markov Model trainer using the Baum-Welch algorithm is
 also available as a patch. Please see the JIRA issue MAHOUT-627.



 



