RE: Mahout v0.9 is not working with 2.2.0-cdh5.0.0-beta-1

2014-03-31 Thread Phan, Truong Q
Yes, I did rebuild it.

oracle@bpdevdmsdbs01: /ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9 -
$ mvn clean install -Dhadoop2.version=2.2.0-cdh5.0.0-beta-1 -DskipTests=true
[INFO] Scanning for projects...

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Mahout Build Tools ................................ SUCCESS [  8.215 s]
[INFO] Apache Mahout ..................................... SUCCESS [  1.158 s]
[INFO] Mahout Math ....................................... SUCCESS [16:21 min]
[INFO] Mahout Core ....................................... SUCCESS [26:21 min]
[INFO] Mahout Integration ................................ SUCCESS [03:55 min]
[INFO] Mahout Examples ................................... SUCCESS [02:54 min]
[INFO] Mahout Release Package ............................ SUCCESS [  0.084 s]
[INFO] Mahout Math/Scala wrappers ........................ SUCCESS [01:16 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 50:59 min
[INFO] Finished at: 2014-03-31T14:25:27+10:00
[INFO] Final Memory: 47M/250M
[INFO] ------------------------------------------------------------------------


Thanks and Regards,
Truong Phan


P    + 61 2 8576 5771
M   + 61 4 1463 7424
E    troung.p...@team.telstra.com
W  www.telstra.com


-----Original Message-----
From: Andrew Musselman [mailto:andrew.mussel...@gmail.com] 
Sent: Monday, 31 March 2014 2:44 PM
To: user@mahout.apache.org
Subject: Re: Mahout v0.9 is not working with 2.2.0-cdh5.0.0-beta-1

Have you rebuilt Mahout for your version?  We're not supporting Hadoop version 
two yet.

See here for some direction:  
http://mail-archives.us.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCANg8BGD8Cm_=ESecQQ5mDL+6ybbNrR1Ce7i=pkuimxmcktw...@mail.gmail.com%3E

 On Mar 30, 2014, at 7:28 PM, Phan, Truong Q troung.p...@team.telstra.com wrote:
 
 Hi
  
 Does Mahout v0.9 support Cloudera Hadoop v5 (2.2.0-cdh5.0.0-beta-1)?
 I have managed to install and run all test cases under Mahout v0.9 without any issue.
 Please see below for the evidence of the test cases.
 However, I have had no success running the example from
 http://girlincomputerscience.blogspot.com.au/2010/11/apache-mahout.html and
 got the following errors.
 Note: I have set the CLASSPATH to point to all of Mahout’s jar files.
  
 snip
 $ env | grep CLASS
 CLASSPATH=:/usr/lib/hadoop-0.20-mapreduce/lib:/usr/lib/hadoop-0.20-mapreduce/lib:/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/core/target/mahout-core-0.9.jar:/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/core/target/mahout-core-0.9-job.jar:/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/core/target/mahout-core-0.9-sources.jar:/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/core/target/mahout-core-0.9-tests.jar:/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/math/target/mahout-math-0.9.jar:/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/math/target/mahout-math-0.9-sources.jar:/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/math/target/mahout-math-0.9-tests.jar:/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/integration/target/mahout-integration-0.9.jar:/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/integration/target/mahout-integration-0.9-sources.jar
  
 $ export MAHOUT_HOME=/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9
 $ export PATH=$MAHOUT_HOME/bin:$PATH
  
 oracle@bpdevdmsdbs01:BDMSSI1D1 /ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/nem-dms -
 $ mahout recommenditembased --input mydata.dat --usersFile user.dat --numRecommendations 2 --output output/ --similarityClassname SIMILARITY_PEARSON_CORRELATION
 Running on hadoop, using /usr/lib/hadoop-0.20-mapreduce/bin/hadoop and HADOOP_CONF_DIR=
 MAHOUT-JOB: /ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
 Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName
 Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at ...

Re: Profiling with visualvm

2014-03-31 Thread Mahmood Naderan
I tried with YourKit and a CPU sampling analysis shows only three threads!

org.apache.hadoop.mapred.LocalJobRunner$Job.run() 
org.apache.mahout.driver.MahoutDriver.main(String[])
java.lang.Thread.run()


I am trying to get a view something like 
http://www.yourkit.com/docs/yjp2013/help/cpu_intro.jsp
If anyone has tried profiling Mahout/Hadoop, please let us know.
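For reference, MRv1 also has a built-in HPROF hook that profiles the task JVMs themselves, which gives per-method samples rather than just the driver threads. A minimal sketch, assuming the driver accepts the generic Hadoop -D options (the property names are the MRv1 ones; the task ranges and HPROF settings are illustrative):

    # Sketch: HPROF sampling for the first couple of map and reduce tasks;
    # profile output ends up alongside the task logs.
    mahout testclassifier -m wikipediamodel -d wikipediainput \
      -Dmapred.task.profile=true \
      -Dmapred.task.profile.maps=0-1 \
      -Dmapred.task.profile.reduces=0-1 \
      -Dmapred.task.profile.params="-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s"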

 
Regards,
Mahmood
On Sunday, March 30, 2014 2:30 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote:
 
 Profiled what exactly, a Hadoop job?

As soon as I run

    mahout testclassifier -m wikipediamodel -d wikipediainput

I see an org.apache.mahout.driver.MahoutDriver process in VisualVM, and then I open it.
 
Regards,
Mahmood

RE: Mahout v0.9 is not working with 2.2.0-cdh5.0.0-beta-1

2014-03-31 Thread Sean Owen
But you have a bunch of Hadoop 0.20 jars on your classpath! Definitely a
problem. Those should not be there.
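A minimal sketch of that cleanup, assuming a typical CDH5 client layout (the Hadoop paths below are illustrative, not taken from this thread):

    # Drop the hand-built CLASSPATH that still points at the 0.20 jars
    unset CLASSPATH

    # Point at the CDH5 Hadoop client instead
    export HADOOP_HOME=/usr/lib/hadoop
    export HADOOP_CONF_DIR=/etc/hadoop/conf

    # bin/mahout assembles its own classpath from MAHOUT_HOME and HADOOP_HOME
    export MAHOUT_HOME=/ora/db002/stg001/BDMSL1D/hadoop/nem-dms/devices/mahout/mahout-distribution-0.9
    export PATH=$MAHOUT_HOME/bin:$PATH
    mahout recommenditembased --help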
On Mar 31, 2014 7:09 AM, Phan, Truong Q troung.p...@team.telstra.com wrote:

 Yes, I did rebuild it.

 snip

  $ env | grep CLASS
  CLASSPATH=:/usr/lib/hadoop-0.20-mapreduce/lib:/usr/lib/hadoop-0.20-mapreduce/lib:...

 snip

Re: Fuzzy KMeans fails on reuters corpus with 4GB max heap size

2014-03-31 Thread tuxdna

 What else could I do to avoid the problem?

 Another question: can this be resolved by using a later version of Mahout?


I ran the same example with Mahout 0.9 and it works fine for me.
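For anyone retrying this, a sketch of the 0.9 run (paths and parameters are illustrative and assume the Reuters corpus has already been vectorized with seq2sparse; MAHOUT_HEAPSIZE is the bin/mahout knob for the driver heap):

    # Give the driver JVM more headroom, then run fuzzy k-means
    export MAHOUT_HEAPSIZE=4096
    mahout fkmeans -i reuters-vectors/tfidf-vectors -c reuters-fkmeans-centroids \
      -o reuters-fkmeans-clusters \
      -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
      -k 20 -m 1.1 -x 10 -ow -cl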


Regards,
Saleem


Re: (help!) Can someone scan this

2014-03-31 Thread Jay Vyas
FYI, I eventually got this working.  I'm not sure what the fix was, but here
is all the stuff I tried (some combination of the below must have got it working):

- created log4j.properties files and made sure all the necessary properties
were there (a minimal sketch follows this list)
- exported some of the usual Hadoop HOME and HADOOP_CONF dir env properties
- exported MAHOUT_HOME
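A minimal log4j.properties of the sort described above (a sketch; the conf location and the pattern are illustrative, not what was actually used here):

    # Write a basic log4j 1.x config where bin/mahout's conf dir can see it
    printf '%s\n' \
      'log4j.rootLogger=INFO, console' \
      'log4j.appender.console=org.apache.log4j.ConsoleAppender' \
      'log4j.appender.console.layout=org.apache.log4j.PatternLayout' \
      'log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n' \
      > "$MAHOUT_HOME/conf/log4j.properties"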

In any case, I think something about the way Mahout nests jobs, or else the
way it logs, makes it tricky to debug when failures happen in local mode,
but I was never able to put my finger on just what.
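Sebastian's suggestion below about explicitly setting the temp path would look something like this (a sketch; the directory and input are illustrative, and --tempDir is the standard option on Mahout's AbstractJob-based drivers):

    # Keep the recommender's intermediate output in a known, writable place
    # instead of the default under /tmp
    mahout recommenditembased --input input/ratings.csv --output output \
      --similarityClassname SIMILARITY_COOCCURRENCE \
      --tempDir /scratch/recommender-work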




On Sat, Mar 29, 2014 at 11:34 AM, Jay Vyas jayunit...@gmail.com wrote:

 0.9.0.  What do you mean by explicitly setting the /tmp path?

 Thanks for the feedback.  FYI, after the job is run, I see that it fails
 IMMEDIATELY when starting the PreparePreferenceMatrix job, and I see this
 in my local hadoop /tmp dir:

 ├── [102]  local
 │   └── [102]  localRunner
 │   └── [170]  jay
 │   ├── [ 68]  job_local1531736937_0001
 │   ├── [ 68]  job_local218993552_0002
 │   └── [136]  jobcache
 │   ├── [102]  job_local1531736937_0001
 │   │   └── [102]  attempt_local1531736937_0001_m_00_0
 │   │   └── [136]  output
 │   │   ├── [ 14]  file.out
 │   │   └── [ 32]  file.out.index
 │   └── [102]  job_local218993552_0002
 │   └── [102]  attempt_local218993552_0002_m_00_0
 │   └── [136]  output
 │   ├── [ 14]  file.out
 │   └── [ 32]  file.out.index
 └── [136]  staging
 ├── [102]  jay1531736937
 └── [102]  jay218993552



 On Sat, Mar 29, 2014 at 2:01 AM, Sebastian Schelter s...@apache.org wrote:

 Jay,

 Which version of Mahout are you using? Have you tried to explicitly set
 the temp path?

 --sebastian


 On 03/29/2014 01:52 AM, Jay Vyas wrote:

 Hi again mahout:

 I'm wrapping a distributed recommender like this:

 https://raw.githubusercontent.com/jayunit100/bigpetstore/master/src/main/java/org/bigtop/bigpetstore/clustering/BPSRecommnder.java

 And it's not working.

 Any thoughts on why?  The error message is simply that intermediate data
 sets don't exist (i.e. numUsers.bin or /tmp/preparePreferencesMatrix...).

 Basically it's clear that the intermediate jobs are failing, but I can't see
 any reason why they would fail, and I don't see any meaningful stack
 traces.

 I've found a lot of good whitepapers and stuff on how the algorithms work,
 but it's not clear what is really done for me by Mahout, and what I have
 to do on my own for the distributed recommender APIs.





 --
 Jay Vyas
 http://jayunit100.blogspot.com




-- 
Jay Vyas
http://jayunit100.blogspot.com


Recommendation thresholds

2014-03-31 Thread Jay Vyas
Hi again mahout!

What is the lowest that we can set a threshold in the item recommender?

I'd like to set it low enough to guarantee output, to confirm that my 
recommender actually worked structurally, and then start tightening it up.
But with --threshold=.0001 I still get no results.
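One sanity check may be to drop the threshold entirely rather than hunt for a low enough value; --threshold is optional on recommenditembased, and omitting it means no similarity filtering at all (a sketch; the input names are illustrative):

    # No --threshold, so every computed similarity is kept
    mahout recommenditembased --input mydata.dat --output output \
      --numRecommendations 2 --similarityClassname SIMILARITY_COOCCURRENCE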

Using split without partitioning the data to train/test

2014-03-31 Thread Mahmood Naderan
Hi,
In an older Mahout, I used wikipediaDataSetCreator on an input to create the
training data:

    mahout wikipediaDataSetCreator -i wiki-tr/chunks -o tr-input -c labels.txt

and then fed the tr-input to the trainclassifier using:

    mahout trainclassifier -i tr-input -o wikimodel


Now, in Mahout 0.9, I see some examples that use split to hold out 80% of the
input file as training data:

    mahout split -i input-vectors --trainingOutput tr-vectors --testOutput ts-vectors --randomSelectionPct 20

My question is: how can I use split without partitioning the input into train
and test parts? I want to use one file as the training input and the other
file as the test input.


 
Regards,
Mahmood

Re: Using split without partitioning the data to train/test

2014-03-31 Thread Suneel Marthi


Sent from my iPhone

 On Mar 31, 2014, at 4:20 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote:
 
 snip

 My question is: how can I use split without partitioning the input into train
 and test parts? I want to use one file as the training input and the other
 file as the test input.

So why use 'split'?  Separate out the test and training files. 
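Concretely, with pre-separated corpora the split step can just be dropped and each file vectorized and fed in directly. A sketch with the 0.9 naive-bayes drivers (tr-vectors/ts-vectors are illustrative names and assume seq2sparse has already been run on each corpus):

    # Train on the designated training vectors...
    mahout trainnb -i tr-vectors -el -o wikimodel -li labelindex -ow -c
    # ...and evaluate on the separate test vectors
    mahout testnb -i ts-vectors -m wikimodel -l labelindex -ow -o wiki-testing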
 
 
  
 Regards,
 Mahmood


Re: Using split without partitioning the data to train/test

2014-03-31 Thread Mahmood Naderan
Yeah, you are right. I will skip that command.

 
Regards,
Mahmood
On Monday, March 31, 2014 6:56 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:
 
 snip

 So why use 'split'?  Separate out the test and training files.

Difference between CIMapper and ClusterIterator

2014-03-31 Thread Frank Scholten
Hi all,

I noticed in the CIMapper that the policy.update() call is done in the
setup of the mapper, while
in the ClusterIterator it is called for every vector in the iteration.

In the sequential version there is only a single policy, while in the MR
version we will get a policy per mapper. Which implementation is correct?
If I recall correctly, in the previous k-means implementation the
update-centroids step was done at the end of each iteration, so I think the
policy.update() call should be moved outside of the vector loop in
ClusterIterator.

Thoughts?

Cheers,

Frank


Amazon EMR updating Mahout

2014-03-31 Thread Andrew Musselman
The EMR team told me that, as requested, they'll upgrade their default AMI to
use Mahout 0.9 in their next release, scheduled for April 7.

Best
Andrew