Re: Welcome Andrew Musselman as new committer

2014-03-08 Thread Sebastian Schelter

Hi Pavan,

Committership is granted for engagement with the project, such as providing 
documentation, answering questions on the mailing list, and reviewing, 
testing, and submitting patches.


We currently have an ongoing discussion about the future of Mahout; feel 
free to participate.


--sebastian


On 03/07/2014 06:41 PM, Pavan Kumar N wrote:

Congratulations to Andrew. It would be nice to have some
information/background on how the PMC evaluated Andrew to become a committer.
It would also be nice to know which future aspects/algorithms of machine
learning Mahout is going to focus on.

I have been keen to maintain code for one of the projects, and I mistakenly
spent time developing a MapReduce version of a weighted linear regression
solution procedure. Only recently did I see that Mahout's web pages had been
updated. I would appreciate any advice from Andrew and other PMC members.

Pavan


On 7 March 2014 22:56, Frank Scholten fr...@frankscholten.nl wrote:


Congratulations Andrew!


On Fri, Mar 7, 2014 at 6:12 PM, Sebastian Schelter s...@apache.org wrote:


Hi,

this is to announce that the Project Management Committee (PMC) for Apache
Mahout has asked Andrew Musselman to become committer, and we are pleased to
announce that he has accepted.

Being a committer enables easier contribution to the project since, in
addition to posting patches on JIRA, it also gives write access to the code
repository. That also means that now we have yet another person who can
commit patches submitted by others to our repo *wink*

Andrew, we look forward to working with you in the future. Welcome! It
would be great if you could introduce yourself with a few words :)

Sebastian

Re: mahout command

2014-03-08 Thread Suneel Marthi
Not sure what's so disappointing here; it was never officially announced that 
Mahout 0.9 had Hadoop 2.x support.

From trunk, can you build Mahout for Hadoop 2 using this command:

mvn clean package -Dhadoop2.version=YOUR_HADOOP2_VERSION
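
For example, with Hadoop 2.3.0 (the version your logs appear to show), that 
would presumably be:

mvn clean package -Dhadoop2.version=2.3.0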



On Friday, March 7, 2014 12:12 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote:
 
That is rather disappointing.

b) Work off of present Head and build with Hadoop 2.x profile. 
Can you explain more? 


 
Regards,
Mahmood



On Friday, March 7, 2014 8:09 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:
 
The example as documented on the Wiki should work. The issue is that you seem 
to be running a Mahout 0.9 distro that was built with the Hadoop 1.2.1 profile 
on a Hadoop 2.3 environment. I don't think that's going to work.

Suggest that you either:

a) Switch to a Hadoop 1.2.1 environment
b) Work off of present Head and build with Hadoop 2.x profile (see the sketch below). 

Mahout 0.9 is not certified for Hadoop 2.x.
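
For option (b), a minimal sketch, assuming you build from the Apache git 
mirror and substitute your cluster's Hadoop version (2.3.0 here):

git clone https://github.com/apache/mahout.git
cd mahout
mvn clean package -DskipTests=true -Dhadoop2.version=2.3.0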

On Friday, March 7, 2014 11:16 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote:
 
FYI, I am trying to complete the wikipedia example from Apache's document
https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example

 
Regards,
Mahmood




On Friday, March 7, 2014 5:23 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote:

In fact, see this file:
    src/conf/driver.classes.default.props

which is not exactly what you said. Still, I have the same problem. Please 
see the complete log:

hadoop@solaris:~/mahout-distribution-0.9$ head -n 5 src/conf/driver.classes.default.props
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
#Utils
org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump cluster output to text
org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence File dumper



hadoop@solaris:~/mahout-distribution-0.9$ mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/03/07 17:19:04 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
java.lang.ClassNotFoundException: wikipediaXMLSplitter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:186)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/03/07 17:19:04 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
Unknown program 'wikipediaXMLSplitter' chosen.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
  lucene.vector: : Generate Vectors from a Lucene index
  lucene2seq: : Generate Text SequenceFiles from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  parallelALS: : ALS-WR factorization of a rating matrix
  qualcluster: : Runs clustering experiments and summarizes results in a CSV
  recommendfactorized: : Compute recommendations using the 

Re: Newbie question

2014-03-08 Thread Martin, Nick
+ Mahout user

Sent from my iPhone

On Mar 8, 2014, at 10:42 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote:

Hi
Maybe this is a newbie question, but I want to know: does Hadoop/Mahout use 
pthread models?

Regards,
Mahmood


Re: mahout command

2014-03-08 Thread Andrew Musselman
You have upper-case in your command but lower-case in your declaration in
the properties file; correct that and it should work.

Note:
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
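
That is, invoking the program name exactly as it is declared (lower-case "Xml") should work:

bin/mahout wikipediaXmlSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64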


On Sat, Mar 8, 2014 at 11:11 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote:

 No success Suneel...

 Please see the attachment which is the output of
  mvn clean package -Dhadoop2.version=2.3.0

 Additionally:

 hadoop@solaris:~/mahout-distribution-0.9$ head -n 5 src/conf/driver.classes.default.props
 org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
 #Utils
 org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
 org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump cluster output to text
 org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence File dumper

 hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64

 Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and HADOOP_CONF_DIR=
 MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
 14/03/08 22:37:03 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter

 java.lang.ClassNotFoundException: wikipediaXMLSplitter
     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
     at java.lang.Class.forName0(Native Method)
     at java.lang.Class.forName(Class.java:186)
     at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:601)
     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 14/03/08 22:37:03 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only

 Unknown program 'wikipediaXMLSplitter' chosen.
 Valid program names are:
   arff.vector: : Generate Vectors from an ARFF file or directory
   baumwelch: : Baum-Welch algorithm for unsupervised HMM training
   canopy: : Canopy clustering
   cat: : Print a file or resource as the logistic regression models would see it
   cleansvd: : Cleanup and verification of SVD output
   clusterdump: : Dump cluster output to text
   clusterpp: : Groups Clustering Output In Clusters
   cmdump: : Dump confusion matrix in HTML or text formats
   concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
   cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
   cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
   evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
   fkmeans: : Fuzzy K-means clustering
   hmmpredict: : Generate random sequence of observations by given HMM
   itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
   kmeans: : K-means clustering
   lucene.vector: : Generate Vectors from a Lucene index
   lucene2seq: : Generate Text SequenceFiles from a Lucene index
   matrixdump: : Dump matrix in CSV format
   matrixmult: : Take the product of two matrices
   parallelALS: : ALS-WR factorization of a rating matrix
   qualcluster: : Runs clustering experiments and summarizes results in a CSV
   recommendfactorized: : Compute recommendations using the factorization of a rating matrix
   recommenditembased: : Compute recommendations using item-based collaborative filtering
   regexconverter: : Convert text files on a per line basis based on regular expressions
   resplit: : Splits a set of SequenceFiles into a number of equal splits
   rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
   rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
   runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
   runlogistic: : Run a logistic regression model against CSV data
   seq2encoded: : Encoded Sparse Vector generation from Text sequence files
   seq2sparse: : 

Re: mahout command

2014-03-08 Thread Suneel Marthi
Thanks Andrew, that seems to have been the issue all the while.
Nevertheless, it is better to run from Head if running on Hadoop 2.3.0.

Re: mahout command

2014-03-08 Thread Mahmood Naderan
Oh yes... Thanks Andrew, you are right.
Meanwhile I see two warnings:

WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Is there any concern about them?


 
Regards,
Mahmood


Re: mahout command

2014-03-08 Thread Suneel Marthi
You can ignore the warnings. 

Re: mahout command

2014-03-08 Thread Mahmood Naderan
What a fast reply... Thanks a lot, Suneel.

 
Regards,
Mahmood

Re: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

2014-03-08 Thread Margusja

Hi, is there any information about the problem I submitted?

Best regards, Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja

On 05/03/14 10:30, Margusja wrote:

Hi

Here are my actions and the problematic result again:

[hduser@vm38 ~]$ git clone https://github.com/apache/mahout.git
remote: Reusing existing pack: 76099, done.
remote: Counting objects: 39, done.
remote: Compressing objects: 100% (32/32), done.
remote: Total 76138 (delta 2), reused 0 (delta 0)
Receiving objects: 100% (76138/76138), 49.04 MiB | 275 KiB/s, done.
Resolving deltas: 100% (34449/34449), done.
[hduser@vm38 ~]$ cd mahout
[hduser@vm38 ~]$ mvn clean package -DskipTests=true -Dhadoop2.version=2.2.0

...
...
...
[INFO] Reactor Summary:
[INFO]
[INFO] Mahout Build Tools  SUCCESS [15.529s]
[INFO] Apache Mahout . SUCCESS [1.657s]
[INFO] Mahout Math ... SUCCESS [1:00.891s]
[INFO] Mahout Core ... SUCCESS [2:44.617s]
[INFO] Mahout Integration  SUCCESS [38.195s]
[INFO] Mahout Examples ... SUCCESS [45.458s]
[INFO] Mahout Release Package  SUCCESS [0.012s]
[INFO] Mahout Math/Scala wrappers  SUCCESS [53.519s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6:27.763s
[INFO] Finished at: Wed Mar 05 10:22:51 EET 2014
[INFO] Final Memory: 57M/442M
[INFO] ------------------------------------------------------------------------
[hduser@vm38 mahout]$
[hduser@vm38 mahout]$ cd ../
[hduser@vm38 ~]$ /usr/lib/hadoop/bin/hadoop jar mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -d input/data666.noheader.data -ds input/data666.noheader.data.info -sl 5 -p -t 100 -o nsl-forest
14/03/05 10:26:39 INFO mapreduce.BuildForest: Partial Mapred implementation
14/03/05 10:26:39 INFO mapreduce.BuildForest: Building the forest...
14/03/05 10:26:39 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/03/05 10:26:51 INFO input.FileInputFormat: Total input paths to process : 1
14/03/05 10:26:51 INFO mapreduce.JobSubmitter: number of splits:1
14/03/05 10:26:51 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/03/05 10:26:51 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/03/05 10:26:51 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/03/05 10:26:51 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/03/05 10:26:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1393936067845_0018
14/03/05