RE: Mahout v0.9 is not working with 2.2.0-cdh5.0.0-beta-1
Yes, I did rebuild it.

oracle@bpdevdmsdbs01: /ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9 - $ mvn clean install -Dhadoop2.version=2.2.0-cdh5.0.0-beta-1 -DskipTests=true
[INFO] Scanning for projects...
[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Mahout Build Tools ............. SUCCESS [  8.215 s]
[INFO] Apache Mahout .................. SUCCESS [  1.158 s]
[INFO] Mahout Math .................... SUCCESS [16:21 min]
[INFO] Mahout Core .................... SUCCESS [26:21 min]
[INFO] Mahout Integration ............. SUCCESS [03:55 min]
[INFO] Mahout Examples ................ SUCCESS [02:54 min]
[INFO] Mahout Release Package ......... SUCCESS [  0.084 s]
[INFO] Mahout Math/Scala wrappers ..... SUCCESS [01:16 min]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 50:59 min
[INFO] Finished at: 2014-03-31T14:25:27+10:00
[INFO] Final Memory: 47M/250M

Thanks and Regards,
Truong Phan
P + 61 2 8576 5771
M + 61 4 1463 7424
E troung.p...@team.telstra.com
W www.telstra.com

-----Original Message-----
From: Andrew Musselman [mailto:andrew.mussel...@gmail.com]
Sent: Monday, 31 March 2014 2:44 PM
To: user@mahout.apache.org
Subject: Re: Mahout v0.9 is not working with 2.2.0-cdh5.0.0-beta-1

Have you rebuilt Mahout for your version? We're not supporting Hadoop version two yet. See here for some direction:
http://mail-archives.us.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCANg8BGD8Cm_=ESecQQ5mDL+6ybbNrR1Ce7i=pkuimxmcktw...@mail.gmail.com%3E

On Mar 30, 2014, at 7:28 PM, Phan, Truong Q troung.p...@team.telstra.com wrote:

Hi,

Does Mahout v0.9 support Cloudera Hadoop v5 (2.2.0-cdh5.0.0-beta-1)? I have managed to install and run all test cases under Mahout v0.9 without any issue. Please see below for evidence of the test cases. However, I have not been able to run the example from http://girlincomputerscience.blogspot.com.au/2010/11/apache-mahout.html and got the following errors.

Note: I have set the CLASSPATH to point to all of Mahout's jar files.
snip

$ env | grep CLASS
CLASSPATH=:/usr/lib/hadoop-0.20-mapreduce/lib:/usr/lib/hadoop-0.20-mapreduce/lib:/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/core/target/mahout-core-0.9.jar:/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/core/target/mahout-core-0.9-job.jar:/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/core/target/mahout-core-0.9-sources.jar:/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/core/target/mahout-core-0.9-tests.jar:/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/math/target/mahout-math-0.9.jar:/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/math/target/mahout-math-0.9-sources.jar:/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/math/target/mahout-math-0.9-tests.jar:/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/integration/target/mahout-integration-0.9.jar:/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/integration/target/mahout-integration-0.9-sources.jar
$ export MAHOUT_HOME=/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9
$ export PATH=$MAHOUT_HOME/bin:$PATH

oracle@bpdevdmsdbs01:BDMSSI1D1 /ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/nem-dms - $ mahout recommenditembased --input mydata.dat --usersFile user.dat --numRecommendations 2 --output output/ --similarityClassname SIMILARITY_PEARSON_CORRELATION
Running on hadoop, using /usr/lib/hadoop-0.20-mapreduce/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at
Re: Profiling with visualvm
I tried with YourKit, and a CPU sampling analysis shows only three threads:

org.apache.hadoop.mapred.LocalJobRunner$Job.run()
org.apache.mahout.driver.MahoutDriver.main(String[])
java.lang.Thread.run()

I am trying to see something like http://www.yourkit.com/docs/yjp2013/help/cpu_intro.jsp

If anyone has tried Mahout/Hadoop profiling, please let us know.

Regards,
Mahmood

On Sunday, March 30, 2014 2:30 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote:

Profiled what exactly, a Hadoop job?

As soon as I run "mahout testclassifier -m wikipediamodel -d wikipediainput" I see an org.apache.mahout.driver.MahoutDriver entry in VisualVM, and then I open it.

Regards,
Mahmood
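One way to get a profiler attached to the local-mode driver, sketched below. This is an assumption, not a verified recipe: bin/mahout passes MAHOUT_OPTS through to the client JVM, so opening a JMX port should let VisualVM (or YourKit) attach to the MahoutDriver process. The port number is an arbitrary choice.

```shell
# Assumption: MAHOUT_OPTS reaches the driver JVM started by bin/mahout.
# Open an unauthenticated JMX port for a local profiler to attach to
# (do not leave this enabled on a shared machine).
export MAHOUT_OPTS="-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=9010 \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false"

# On a real cluster the work happens in separate task JVMs, so those must be
# profiled instead, e.g. via mapred.child.java.opts (agent path hypothetical):
#   -Dmapred.child.java.opts="-agentpath:/opt/yourkit/libyjpagent.so"
echo "$MAHOUT_OPTS"
```

Note this only helps in local mode (LocalJobRunner), which matches the three threads seen above; sampling the driver tells you nothing about remote task JVMs.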
RE: Mahout v0.9 is not working with 2.2.0-cdh5.0.0-beta-1
But you have a bunch of Hadoop 0.20 jars on your classpath! Definitely a problem. Those should not be there.

On Mar 31, 2014 7:09 AM, Phan, Truong Q troung.p...@team.telstra.com wrote:

snip - quoted message, repeated in full earlier in this thread
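A sketch of the fix Andrew is pointing at: clear the stale CLASSPATH so bin/mahout assembles its own against the rebuilt jars. The installation paths here are assumptions, not the poster's actual layout.

```shell
# The explicit CLASSPATH in the thread pins hadoop-0.20-mapreduce jars ahead
# of the CDH5 (Hadoop 2) classes, which is what produces
# NoClassDefFoundError: org/apache/hadoop/util/PlatformName.
# bin/mahout builds its own classpath from MAHOUT_HOME, so drop it entirely:
unset CLASSPATH

# Hypothetical locations -- adjust to the actual installation:
export MAHOUT_HOME="$HOME/mahout-distribution-0.9"
export HADOOP_HOME="/usr/lib/hadoop"   # the CDH5 Hadoop 2 install, not hadoop-0.20-mapreduce
export PATH="$MAHOUT_HOME/bin:$HADOOP_HOME/bin:$PATH"
```

The key point is that "Running on hadoop, using /usr/lib/hadoop-0.20-mapreduce/bin/hadoop" in the output shows the old launcher was still being picked up, so PATH needs fixing as well as CLASSPATH.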
Re: Fuzzy KMeans fails on reuters corpus with 4GB max heap size
What else could I do to avoid the problem? Another question: can this be resolved by using a later version of Mahout?

I ran the same example with Mahout 0.9 and it works fine for me.

Regards,
Saleem
Re: (help!) Can someone scan this
FYI, I eventually got this working. I'm not sure what the fix was, but here is all the stuff I tried (some combination below must have got it working):

- created log4j.properties files and made sure all the necessary properties were there
- exported some of the usual Hadoop HOME and HADOOP_CONF dir env properties
- exported MAHOUT_HOME

In any case, I think something about the way Mahout nests jobs, or else the way it logs, makes it tricky to debug when failures happen in local mode, but I was never able to put my finger on just what.

On Sat, Mar 29, 2014 at 11:34 AM, Jay Vyas jayunit...@gmail.com wrote:

0.9.0. What do you mean by explicitly setting the /tmp path? Thanks for the feedback.

FYI, after the job is run, I see that it fails IMMEDIATELY when starting the PreparePreferenceMatrix job, and I see this in my local Hadoop /tmp dir:

├── [102] local
│   └── [102] localRunner
│       └── [170] jay
│           ├── [ 68] job_local1531736937_0001
│           ├── [ 68] job_local218993552_0002
│           └── [136] jobcache
│               ├── [102] job_local1531736937_0001
│               │   └── [102] attempt_local1531736937_0001_m_00_0
│               │       └── [136] output
│               │           ├── [ 14] file.out
│               │           └── [ 32] file.out.index
│               └── [102] job_local218993552_0002
│                   └── [102] attempt_local218993552_0002_m_00_0
│                       └── [136] output
│                           ├── [ 14] file.out
│                           └── [ 32] file.out.index
└── [136] staging
    ├── [102] jay1531736937
    └── [102] jay218993552

On Sat, Mar 29, 2014 at 2:01 AM, Sebastian Schelter s...@apache.org wrote:

Jay, which version of Mahout are you using? Have you tried to explicitly set the temp path?

--sebastian

On 03/29/2014 01:52 AM, Jay Vyas wrote:

Hi again mahout: I'm wrapping a distributed recommender like this:
https://raw.githubusercontent.com/jayunit100/bigpetstore/master/src/main/java/org/bigtop/bigpetstore/clustering/BPSRecommnder.java

And it's not working. Any thoughts on why? The error message is simply that intermediate data sets don't exist (i.e. numUsers.bin or /tmp/preparePreferencesMatrix...). Basically it's clear that the intermediate jobs are failing, but I can't see any reason why they would fail, and I don't see any meaningful stack traces. I've found a lot of good whitepapers and material on how the algorithms work, but it's not clear what is really done for me by Mahout, and what I have to do on my own for the distributed recommender APIs.

--
Jay Vyas
http://jayunit100.blogspot.com
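The checklist at the top of this thread can be sketched as a setup script. The property values are a guess at a minimal working configuration, not the exact files Jay used, and the paths are hypothetical:

```shell
# Hypothetical paths -- adjust to your installation.
export MAHOUT_HOME="$HOME/mahout-distribution-0.9"
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
export PATH="$MAHOUT_HOME/bin:$PATH"

# A minimal log4j.properties so local-mode (nested) jobs at least log to the
# console instead of failing silently:
mkdir -p /tmp/mahout-conf
cat > /tmp/mahout-conf/log4j.properties <<'EOF'
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
EOF

# Point the driver JVM at it (assumes MAHOUT_OPTS reaches the driver):
export MAHOUT_OPTS="-Dlog4j.configuration=file:/tmp/mahout-conf/log4j.properties"
```

With logging visible, a failure in the first nested job (PreparePreferenceMatrix) should at least leave a stack trace instead of only the empty jobcache directories shown above.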
Recommendation thresholds
Hi again mahout! What is the lowest that we can set a threshold in the item recommender? I'd like to set it low enough to guarantee output, to confirm that my recommender actually worked structurally, and then start tightening it up. But with --threshold=.0001 I still get no results.
Using split without partitioning the data to train/test
Hi,

In an old Mahout, I used wikipediaDataSetCreator on an input to create the training data:

mahout wikipediaDataSetCreator -i wiki-tr/chunks -o tr-input -c labels.txt

and then fed tr-input to trainclassifier:

mahout trainclassifier -i tr-input -o wikimodel

Now, in Mahout 0.9, I see examples that use split to turn 80% of the input file into training data:

mahout split -i input-vectors --trainingOutput tr-vectors --testOutput ts-vectors --randomSelectionPct 20

My question is: how can I use split without partitioning the input into train and test parts? I want to use one file as the training input and another file as the test input.

Regards,
Mahmood
Re: Using split without partitioning the data to train/test
Sent from my iPhone

On Mar 31, 2014, at 4:20 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote:

snip - quoted message, repeated in full in the previous post

> My question is how can I use split to split the input without partitioning it to train and test parts? I want to use one file as training input and the other file as the test input.

So why use 'split'? Separate out the test and training files.
Re: Using split without partitioning the data to train/test
Yeah, you are right. I'll just skip that command.

Regards,
Mahmood

On Monday, March 31, 2014 6:56 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:

snip - quoted exchange, repeated in full in the previous post

So why use 'split'? Separate out the test and training files.
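Suneel's suggestion in shell form: skip split entirely and hand each pre-made set to the naive Bayes steps. The command names and flags below follow the Mahout 0.9 CLI as I recall them, and the paths are hypothetical, so treat this as a sketch rather than a verified invocation:

```shell
# tr-vectors and ts-vectors were vectorized separately, so no 'split' step
# is needed -- each directory feeds its own stage directly.
TRAIN_CMD="mahout trainnb -i tr-vectors -o wikimodel -li labelindex -ow"
TEST_CMD="mahout testnb -i ts-vectors -m wikimodel -l labelindex -ow -o test-results"

# Only run when mahout is actually on PATH; otherwise just show the commands:
if command -v mahout >/dev/null 2>&1; then
  $TRAIN_CMD && $TEST_CMD
else
  printf '%s\n%s\n' "$TRAIN_CMD" "$TEST_CMD"
fi
```

The point is that split exists only to carve one vector set into two; with two separate input files there is nothing for it to do.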
Difference between CIMapper and ClusterIterator
Hi all,

I noticed in the CIMapper that the policy.update() call is done in the setup of the mapper, while in the ClusterIterator it is called for every vector in the iteration. In the sequential version there is only a single policy, while in the MR version we get a policy per mapper. Which implementation is correct?

If I recall correctly from the previous k-means implementation, the update-centroids step was done at the end of each iteration, so I think the policy.update() call should be moved outside of the vector loop in ClusterIterator. Thoughts?

Cheers,
Frank
Amazon EMR updating Mahout
The EMR team told me that, as requested, they'll upgrade their default AMI to use Mahout 0.9 in their next release, scheduled for April 7.

Best
Andrew