Fwd: Algorithms in Mahout
I have gone through http://mahout.apache.org looking for data mining algorithms that are already implemented on the Hadoop platform. From that I understood that the following have implementations on the Hadoop platform:

1. K-means
2. Decision Tree
3. Naive Bayes

and that the following are not available in Mahout:

4. DBSCAN
5. k-nearest neighbor
6. SVM
7. Logistic Regression
8. Neural networks
9. Apriori

Is that inference right?

--
*Thanks & Regards*
Unmesha Sreeveni U.B
*Junior Developer*
Re: Fwd: Algorithms in Mahout
k-nearest neighbor, SVM, logistic regression, and neural nets exist in Mahout. Just type mahout and press Enter and you'll see the list of available algorithms; type mahout algo-name -h to get detailed information about how to use and configure them.

Pavan

On Nov 25, 2013 2:44 PM, unmesha sreeveni unmeshab...@gmail.com wrote:
> I have gone through http://mahout.apache.org for some data mining algorithms already implemented on the Hadoop platform. From that I understood that 1. K-means 2. Decision Tree 3. Naive Bayes have implementations on the Hadoop platform, and that 4. DBSCAN 5. k-nearest neighbor 6. SVM 7. Logistic Regression 8. Neural networks 9. Apriori are not in Mahout. Is that inference right?
Re: Fwd: Algorithms in Mahout
Of the algorithms listed, only logistic regression (non-distributed) is implemented. Sorry for the confusion; we are currently reworking the wiki.

On 25.11.2013 10:24, Pavan K Narayanan wrote:
> k-nearest neighbor, SVM, logistic regression, and neural nets exist in Mahout. Just type mahout and press Enter and you'll see the list of available algorithms; type mahout algo-name -h to get detailed information about how to use and configure them.
> Pavan
Re: Fwd: Algorithms in Mahout
So currently we don't have a decision tree implementation in the Mahout 0.6 release.

On Mon, Nov 25, 2013 at 2:59 PM, Sebastian Schelter ssc.o...@googlemail.com wrote:
> Of the algorithms listed, only logistic regression (non-distributed) is implemented. Sorry for the confusion; we are currently reworking the wiki.

--
*Thanks & Regards*
Unmesha Sreeveni U.B
*Junior Developer*
Re: Algorithms in Mahout
Hi Unmesha,

please also consult JIRA as a source for algorithms; there you will find implementations or discussions, e.g.:

Neural networks, a.k.a. multilayer perceptrons:
https://issues.apache.org/jira/browse/MAHOUT-1265
https://issues.apache.org/jira/browse/MAHOUT-976

SVM:
https://issues.apache.org/jira/browse/MAHOUT-334
https://issues.apache.org/jira/browse/MAHOUT-232
https://issues.apache.org/jira/browse/MAHOUT-14
https://issues.apache.org/jira/browse/MAHOUT-227

For Apriori, Mahout offered an alternative, Parallel Frequent Pattern Mining. This will be retired after 0.8:
https://cwiki.apache.org/confluence/display/MAHOUT/Parallel+Frequent+Pattern+Mining

There are/were multiple kNN implementations in Mahout:

Recommender kNN (will be removed for 0.9):
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.6/org/apache/mahout/cf/taste/impl/recommender/knn/Optimizer.java

Stream kNN:
https://github.com/tdunning/knn/blob/master/src/main/java/org/apache/mahout/knn/cluster/StreamingKMeans.java

Normal kNN

Hope that helps
Manuel

--
Manuel Blechschmidt
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B
Re: HELP for implicit data feed back - beginner
Hello,

I discovered an ebook and an article that helped me with my problem:

the article: http://www.csulb.edu/web/journals/jecr/issues/20044/Paper1.pdf
the ebook: http://www.amazon.fr/gp/product/B00BEQ82FY/ref=oh_d__o00_details_o00__i00?ie=UTF8psc=1

Very interesting.

2013/11/23 Manuel Blechschmidt manuel.blechschm...@gmx.de

Hello Pavan,

the following project is preconfigured using Maven, m2eclipse and a normal Eclipse project layout:
https://github.com/ManuelB/facebook-recommender-demo
https://raw.github.com/ManuelB/facebook-recommender-demo/master/docs/EclipseWorkspace.png

When you execute the Maven goal mvn install followed by mvn embedded-glassfish:run, it will generate a WAR and deploy it on an embedded GlassFish.

If you have a lot of data, you should build a model (e.g. similarities or a matrix factorization) on Hadoop and then deploy this model in a live environment. Here is an excellent blog post by Sebastian:
http://ssc.io/deploying-a-massively-scalable-recommender-system-with-apache-mahout/

Hope that helps
Manuel

On 23.11.2013, at 07:49, Sebastian Schelter wrote:

You can use it in a standard Java program; there is no need for Java EE. There is no special perspective for Mahout in Eclipse. The easiest way to set up a project is to configure a Maven project and use mahout-core as a dependency.

On 23.11.2013 13:43, Pavan K Narayanan wrote:

Hi Sebastian,

Pardon my ignorance, but how do you suggest we use o.a.m.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender? Can we use it by coding in Java? If yes, do we need Java EE? Is there a Mahout perspective for the Eclipse IDE? Is it possible to use these in the Mahout CLI? There are mentions of Java programs in MiA, but I am unsure how to set up Mahout in Java. Could you please clarify this part?

Sincerely,
Pavan

On 23 November 2013 04:59, Sebastian Schelter ssc.o...@googlemail.com wrote:

Antony,

You don't need numeric ratings or preferences for your recommender. I would suggest you start by using o.a.m.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender, which has been built explicitly to support scenarios without ratings. I would further suggest using o.a.m.cf.taste.impl.similarity.LogLikelihoodSimilarity as the similarity measure.

Best,
Sebastian

On 22.11.2013 22:37, Antony Adopo wrote:

OK, thank you so much. I will start like this and afterwards try some tricks to increase accuracy.

2013/11/22 Manuel Blechschmidt manuel.blechschm...@gmx.de

Hallo Antony,

you can use the following project as a starting point:
https://github.com/ManuelB/facebook-recommender-demo

Further, you can purchase support for Mahout from many companies, e.g. MapR, Apaxo or Cloudera.

For implicit feedback, just use a 1 as the preference and the LogLikelihoodSimilarity.

Hope that helps
Manuel

On 22.11.2013, at 16:22, Antony Adopo wrote:

Thanks. I've already seen this, but my question is: does Mahout offer a collaborative filtering function that is not based on preferences? Or how can these be modeled with purchases? Thanks

2013/11/22 Smith, Dan dan.sm...@disney.com

Hi Anthony,

I would suggest looking into the collaborative filtering functions. They will work best if you have your customers segmented into similar groups, such as those who buy high-end goods vs. low-end.

_Dan

On 11/22/13 11:04 AM, Antony Adopo saius...@gmail.com wrote:

OK, thanks for answering so quickly. I forgot to mention that in the customer table there is a job variable, and implicitly I thought that this variable would also be needed for accurate recommendations. Anyway, I have around 200 000 customers. My order table has around 12 000 000 orders, and I have around 2 000 000 distinct (customerID,itemID) tuples.

About (customerID,itemID) tuples: when I read Mahout or recommender system literature, they use (customerID,itemID,*preference*), and I don't have *preference*. So does a Mahout method or class exist that handles only (customerID,itemID) data? And is it possible to use external data such as job or RFM analysis to get something more accurate?

Sorry (for about 2 weeks I have had a headache over how to organize all of this to build a great system). Propose your solutions and after, we'll see about

2013/11/22 Sebastian Schelter ssc.o...@googlemail.com

Hi Antony,

I would start with a simple approach: extract all customerID,itemID tuples from the orders table and use them as your input data. How many of those do you have? The data size will dictate whether you need to employ a distributed approach to recommendation mining or not.

--sebastian

On 22.11.2013 19:21, Antony Adopo wrote:

Morning,

My name is Antony and I have a great recommender system to build. I'm totally new to recommender systems. After reading all scientific files, I didn't find
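To make the advice in this thread concrete (plain (customerID,itemID) tuples with no preference value, compared with a log-likelihood similarity), here is a minimal sketch in plain Python. It is not Mahout code and does not use Mahout's API: the function names, the toy purchase data, and the final mapping of the score into [0, 1) are all assumptions made for illustration. It computes Dunning's log-likelihood ratio from a 2x2 table of co-purchase counts.

```python
import math
from collections import defaultdict


def x_log_x(x):
    # By convention 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy of a list of counts: N*log(N) - sum(k*log(k)).
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    # k11: users who bought both items, k12: only the first item,
    # k21: only the second item, k22: neither item.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def item_similarity(purchases, item_a, item_b):
    # purchases: iterable of (customer_id, item_id) tuples, no ratings needed.
    users_by_item = defaultdict(set)
    all_users = set()
    for user, item in purchases:
        users_by_item[item].add(user)
        all_users.add(user)
    a, b = users_by_item[item_a], users_by_item[item_b]
    llr = log_likelihood_ratio(
        len(a & b), len(a - b), len(b - a), len(all_users) - len(a | b)
    )
    # Assumed mapping of the unbounded ratio into [0, 1) for easier comparison.
    return 1.0 - 1.0 / (1.0 + llr)


# Hypothetical toy data: X and Y are always bought together, X and Z are not.
purchases = [(1, "X"), (2, "X"), (1, "Y"), (2, "Y"), (1, "Z"), (3, "Z"), (4, "W")]
print(item_similarity(purchases, "X", "Y"))
print(item_similarity(purchases, "X", "Z"))
```

In this toy data, items X and Y get a clearly higher score than X and Z, even though no customer ever supplied a rating. That is the point of using 1 (or nothing) as the preference.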
Re: Canopy threshold limitation
Hey Suneel, thanks for the reply. I'm trying to create hierarchical clusters via a top-down approach. I'm caught in the trade-off between a lower canopy threshold and running out of heap memory. Streaming k-means sounds ideal for the top-level clustering. What are the major differences between Streaming k-means and k-means, other than being faster and using less memory? In other words, what are the pros and cons?

On Fri, Nov 22, 2013 at 5:30 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:
> The threshold is based on the user's preference for inter-cluster distances. If you are running out of memory, I suggest increasing the JVM memory settings. I'm not sure what you are trying to accomplish, but if you are looking to get a first cut at clustering, I suggest you look at the new Streaming k-means that's part of Mahout 0.8. See http://stackoverflow.com/questions/17272296/how-to-use-mahout-streaming-k-means for the steps.
>
> On Friday, November 22, 2013 4:45 PM, Chih-Hsien Wu chjaso...@gmail.com wrote:
>> Just out of curiosity: is there a threshold limitation for the canopy algorithm? Is it just defined by the user's preference based on the inter-cluster distances? Or is it perhaps limited by how much memory is available to execute it?
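The trade-off described in this thread (a lower canopy threshold against memory use) can be seen in a small sketch of the generic canopy algorithm. This is plain Python, not Mahout's implementation; the names t1 (loose threshold) and t2 (tight threshold), the Euclidean distance, and the sample points are assumptions for illustration.

```python
import math


def canopy_centers(points, t1, t2):
    # Greedy canopy generation with a loose threshold t1 and a tight
    # threshold t2 (t1 > t2). Returns a list of (center, members) pairs.
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for p in remaining:
            d = math.dist(center, p)
            if d < t1:
                # Loosely close: the point belongs to this canopy.
                members.append(p)
            if d >= t2:
                # Not tightly close: the point may still start its own canopy.
                still_remaining.append(p)
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


# Hypothetical data: two well-separated groups on a line.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
print(len(canopy_centers(points, t1=4.0, t2=3.0)))  # generous thresholds
print(len(canopy_centers(points, t1=1.0, t2=0.5)))  # much lower thresholds
```

With the generous thresholds the six points collapse into two canopies; with the lower thresholds every point becomes its own canopy. Each canopy has to be kept in memory while the data is scanned, so lowering the threshold on a large data set can grow the working set until the heap runs out.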
Re: Algorithms in Mahout
On Mon, Nov 25, 2013 at 3:14 AM, Manuel Blechschmidt manuel.blechschm...@gmx.de wrote:
> There are/were multiple kNN implementations in Mahout:
> Recommender kNN http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.6/org/apache/mahout/cf/taste/impl/recommender/knn/Optimizer.java (will be removed for 0.9)
> Stream kNN https://github.com/tdunning/knn/blob/master/src/main/java/org/apache/mahout/knn/cluster/StreamingKMeans.java
> Normal kNN

Streaming k-means isn't strictly a kNN implementation. It is a k-means clustering application.
Recommender Streaming with EMR
Hello - If this isn't the best forum to ask, please let me know.

TL;DR; Is there a way to stream preference/user data to an EMR recommender workflow without having to go through the pain of re-uploading all preference data, and starting brand new jobs over and over, etc?

I am trying to process large volumes of preference data using Amazon EMR. It seems extremely unscalable to upload our entire preference set every time we run a job, as the vast majority of the preferences will never change. It seems like the append files that Mahout can process would be perfect for this, but it doesn't appear that EMR supports it.

The brute force method appears to be:
1) Upload preference set
2) Run Recommender job
3) Download and process results
4) Go to step 1

Does anyone have some general advice for processing recommendations in as real-time a manner as possible using EMR?

Thank you for any help or references you could provide.

Bryan Marble
Re: Recommender Streaming with EMR
Hi Bryan,

On 25.11.2013, at 17:14, Bryan Marble wrote:
> Hello - If this isn't the best forum to ask, please let me know.

This is the correct forum to ask this question.

> TL;DR; Is there a way to stream preference/user data to an EMR recommender workflow without having to go through the pain of re-uploading all preference data, and starting brand new jobs over and over, etc?

No, currently not. Streaming machine learning is a topic of current research. Currently you always train your model on all the data that you have and use it afterwards. After some time you retrain.

> I am trying to process large volumes of preference data using Amazon EMR. It seems extremely unscalable to upload our entire preference set every time we run a job

Why? Sending 1 TB to EMR will take about 3.7 hours according to the following blog post:
http://www.rightscale.com/blog/cloud-industry-insights/network-performance-within-amazon-ec2-and-amazon-s3
If you use compression you can stream around 10 times the amount.

> , as the vast majority of the preferences will never change.

Just append them.

> It seems like the append files that Mahout can process would be perfect for this, but it doesn't appear that EMR supports it.

The ItemSimilarityJob can already read multiple files:
https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html

--input (path): Directory containing one or more text files with the preference data

> The brute force method appears to be: 1) Upload preference set 2) Run Recommender job 3) Download and process results 4) Go to step 1
> Does anyone have some general advice for processing recommendations in as real-time a manner as possible using EMR?

For better advice you can contact companies like Cloudera, MapR or Apaxo (my company).

> Thank you for any help or references you could provide.
> Bryan Marble

/Manuel

--
Manuel Blechschmidt
M.Sc. IT Systems Engineering
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B
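The upload estimate in the reply above can be turned into a quick back-of-the-envelope calculation. The sketch below only restates the two figures from the message (1 TB in about 3.7 hours, and roughly 10x from compression); decimal units (1 TB = 10^12 bytes) and the helper names are assumptions, and the real compression ratio depends on the data.

```python
def implied_throughput_mb_s(terabytes, hours):
    # Sustained throughput implied by moving `terabytes` in `hours`.
    # Assumes decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
    return terabytes * 10**12 / (hours * 3600) / 10**6


def upload_hours(gigabytes, throughput_mb_s, compression_ratio=1.0):
    # Hours needed to upload `gigabytes` of raw data at the given throughput.
    # compression_ratio=10 means the data shrinks to a tenth on the wire.
    bytes_on_wire = gigabytes * 10**9 / compression_ratio
    return bytes_on_wire / (throughput_mb_s * 10**6) / 3600


rate = implied_throughput_mb_s(1, 3.7)
print(round(rate, 1))                                             # MB/s
print(round(upload_hours(1000, rate), 2))                         # uncompressed
print(round(upload_hours(1000, rate, compression_ratio=10), 2))   # 10x compressed
```

Under these assumptions the quoted figure corresponds to roughly 75 MB/s sustained, and a 10x compression ratio brings a 1 TB upload down from about 3.7 hours to about 0.37 hours. The same helpers can be used to estimate how long a daily file of newly appended preferences would take, which is usually far less than a full re-upload.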
java.io.IOException: Failed to set permissions of path
Hello, on my first install of Mahout I get this error in Eclipse on many tests: java.io.IOException: Failed to set permissions of path. Could someone please help me fix it? Thanks
Only one reducer running on canopy generator
Hi all, I have been experiencing memory issues while running the Mahout canopy algorithm on a big data set on Hadoop. I noticed that only one reducer was running while the other nodes were idle. I was wondering whether increasing the number of reduce tasks would ease the memory usage and speed up the procedure. However, I found that configuring mapred.reduce.tasks on Hadoop has no effect on the canopy reduce tasks; it still runs with only one reducer. Now I'm wondering whether canopy is designed that way, or whether I am not configuring Hadoop correctly.
Re: Only one reducer running on canopy generator
Canopy clustering is a two-step process: canopy generation followed by canopy clustering. Canopy generation uses a single reducer (this cannot be overridden), while the clustering task uses multiple reducers. You seem to be hitting an OOM during the canopy generation phase.

On Monday, November 25, 2013 6:09 PM, Chih-Hsien Wu chjaso...@gmail.com wrote:
> Hi all, I have been experiencing memory issues while running the Mahout canopy algorithm on a big data set on Hadoop. I noticed that only one reducer was running while the other nodes were idle. I was wondering whether increasing the number of reduce tasks would ease the memory usage and speed up the procedure. However, I found that configuring mapred.reduce.tasks on Hadoop has no effect on the canopy reduce tasks; it still runs with only one reducer. Now I'm wondering whether canopy is designed that way, or whether I am not configuring Hadoop correctly.
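A rough sketch of why the generation step funnels through one reducer. It assumes that each mapper builds canopies over its own input split and that a single reducer then merges all of the resulting centers; that structure is an assumption here, since the reply above only states that one reducer is used. The code is plain Python, not Mahout or Hadoop code. It keeps only a tight threshold t2 and uses the center points themselves rather than centroids, both simplifications.

```python
import math


def canopies(points, t2):
    # Greedy pass: a point becomes a new center unless it lies within
    # t2 of a center that was already chosen.
    centers = []
    for p in points:
        if all(math.dist(p, c) >= t2 for c in centers):
            centers.append(p)
    return centers


def generate(partitions, t2):
    # "Map" side: every partition is summarized independently, so this
    # part could run in parallel on many nodes.
    local = [canopies(part, t2) for part in partitions]
    # "Reduce" side: all local centers must be compared with each other,
    # so they are collected in one place and merged in a single pass.
    reducer_input = [c for centers in local for c in centers]
    return canopies(reducer_input, t2), len(reducer_input)


# Hypothetical input split across two mappers.
partitions = [
    [(0, 0), (1, 0), (10, 0)],
    [(0.5, 0), (10.5, 0), (20, 0)],
]
final_centers, reducer_input_size = generate(partitions, t2=3.0)
print(final_centers)
print(reducer_input_size)
```

The merge has to see every local center to decide which ones overlap, which is why it cannot simply be split across several reducers. It also means the reducer's memory use grows with the number of local centers, so a lower threshold, which produces more centers per mapper, makes an out-of-memory error in this phase more likely.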
Re: Algorithms in Mahout
Thanks for the replies. I will go through those links. Thanks for spending time on me :)

On Mon, Nov 25, 2013 at 11:59 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:
> Dhruv, could you update the patch to the present trunk codebase and also create a wiki page for this?
>
> On Monday, November 25, 2013 1:04 PM, Dhruv dhru...@gmail.com wrote:
>> A distributed Hidden Markov Model trainer using the Baum-Welch algorithm is also available as a patch. Please see the JIRA issue MAHOUT-627.
>>
>> On Mon, Nov 25, 2013 at 8:07 AM, Ted Dunning ted.dunn...@gmail.com wrote:
>>> Streaming k-means isn't strictly a kNN implementation. It is a k-means clustering application.

--
*Thanks & Regards*
Unmesha Sreeveni U.B
*Junior Developer*