Re: Welcome Andrew Musselman as new committer
Hi Pavan,

Committership is given for engagement with the project, such as providing documentation, answering questions on the mailing list, reviewing patches, testing patches, and submitting patches. We currently have a discussion ongoing about the future of Mahout; feel free to participate.

--sebastian

On 03/07/2014 06:41 PM, Pavan Kumar N wrote:
Congratulations to Andrew. It would be nice to have some information/background on how the PMC evaluated Andrew to become a committer. It would also be nice to know what future aspects/algorithms of machine learning Mahout is going to focus on. I have been keen to maintain code for one of the projects, and I mistakenly spent time developing a map-reduce version of a weighted linear regression solution procedure; only recently did I see that Mahout's web pages had been updated. I would appreciate any advice from Andrew and other PMC members.

Pavan

On 7 March 2014 22:56, Frank Scholten fr...@frankscholten.nl wrote:
Congratulations Andrew!

On Fri, Mar 7, 2014 at 6:12 PM, Sebastian Schelter s...@apache.org wrote:
Hi,

This is to announce that the Project Management Committee (PMC) for Apache Mahout has asked Andrew Musselman to become a committer, and we are pleased to announce that he has accepted. Being a committer enables easier contribution to the project since, in addition to posting patches on JIRA, it also gives write access to the code repository. That also means that we now have yet another person who can commit patches submitted by others to our repo *wink*

Andrew, we look forward to working with you in the future. Welcome! It would be great if you could introduce yourself with a few words :)

Sebastian
Re: mahout command
Not sure what's so disappointing here; it was never officially announced that Mahout 0.9 had Hadoop 2.x support. From trunk, can you build Mahout for Hadoop 2 using this command:

mvn clean package -Dhadoop2.version=YOUR_HADOOP2_VERSION

On Friday, March 7, 2014 12:12 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote:
That is rather disappointing.

b) Work off of present Head and build with Hadoop 2.x profile.

Can you explain more?

Regards,
Mahmood

On Friday, March 7, 2014 8:09 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:
The example as documented on the Wiki should work. The issue is that you seem to be running a Mahout 0.9 distro that was built with the Hadoop 1.2.1 profile on a Hadoop 2.3 environment. I don't think that's going to work. I suggest that you either:
a) Switch to a Hadoop 1.2.1 environment, or
b) Work off of present Head and build with the Hadoop 2.x profile.
Mahout 0.9 is not certified for Hadoop 2.x.

On Friday, March 7, 2014 11:16 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote:
FYI, I am trying to complete the Wikipedia example from Apache's documentation:
https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example

Regards,
Mahmood

On Friday, March 7, 2014 5:23 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote:
In fact, see the file src/conf/driver.classes.default.props, which is not exactly as you said. Still, I have the same problem.
Please see the complete log:

hadoop@solaris:~/mahout-distribution-0.9$ head -n 5 src/conf/driver.classes.default.props
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
#Utils
org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump cluster output to text
org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence File dumper

hadoop@solaris:~/mahout-distribution-0.9$ mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/03/07 17:19:04 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
java.lang.ClassNotFoundException: wikipediaXMLSplitter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:186)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/03/07 17:19:04 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
Unknown program 'wikipediaXMLSplitter' chosen.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
  lucene.vector: : Generate Vectors from a Lucene index
  lucene2seq: : Generate Text SequenceFiles from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  parallelALS: : ALS-WR factorization of a rating matrix
  qualcluster: : Runs clustering experiments and summarizes results in a CSV
  recommendfactorized: : Compute recommendations using the
Re: Newbie question
+ Mahout user

Sent from my iPhone

On Mar 8, 2014, at 10:42 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote:
Hi,
Maybe this is a newbie question, but I want to know: does Hadoop/Mahout use pthread models?

Regards,
Mahmood
Re: mahout command
You have upper-case in your command but lower-case in your declaration in the properties file; correct that and it should work.

Note:
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64

On Sat, Mar 8, 2014 at 11:11 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote:
No success Suneel... Please see the attachment, which is the output of mvn clean package -Dhadoop2.version=2.3.0

Additionally:
hadoop@solaris:~/mahout-distribution-0.9$ head -n 5 src/conf/driver.classes.default.props
[... same five property lines as earlier in the thread ...]
hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/03/08 22:37:03 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
java.lang.ClassNotFoundException: wikipediaXMLSplitter
[... same stack trace and 'Valid program names' list as earlier in the thread ...]
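The case-sensitivity point above can be sketched out. This is an illustrative Python model (an assumption about the behavior, not the actual MahoutDriver source) of why 'wikipediaXMLSplitter' reports "Unknown program" while 'wikipediaXmlSplitter' resolves: each props line maps a fully qualified class to a shortname, and the shortname is matched exactly, case included.

```python
def parse_driver_props(text):
    """Parse lines like 'fully.qualified.Class = shortname : description'
    into a shortname -> class mapping, skipping comments and blank lines."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        clazz, _, rest = line.partition("=")
        shortname = rest.split(":", 1)[0].strip()
        table[shortname] = clazz.strip()
    return table

# The first lines of driver.classes.default.props, as shown in the thread.
props = """\
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
#Utils
org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
"""

table = parse_driver_props(props)
print(table.get("wikipediaXmlSplitter"))  # resolves to the splitter class
print(table.get("wikipediaXMLSplitter"))  # wrong case: no match, hence 'Unknown program'
```

The lookup with the upper-case 'XML' spelling finds nothing, which matches the "Unknown program 'wikipediaXMLSplitter' chosen" error in the logs.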
Re: mahout command
Thanks Andrew, that seems to have been the issue all the while. Nevertheless, it is better to run from Head if running on Hadoop 2.3.0.

On Saturday, March 8, 2014 2:42 PM, Andrew Musselman andrew.mussel...@gmail.com wrote:
You have upper-case in your command but lower-case in your declaration in the properties file; correct that and it should work.

Note:
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64

On Sat, Mar 8, 2014 at 11:11 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote:
No success Suneel... Please see the attachment, which is the output of mvn clean package -Dhadoop2.version=2.3.0
[... quoted log snipped; same as earlier in the thread ...]
Re: mahout command
Oh yes... Thanks Andrew, you are right.

Meanwhile, I see two warnings:
WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Is there any concern about them?

Regards,
Mahmood

On Saturday, March 8, 2014 11:19 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:
Thanks Andrew, that seems to have been the issue all the while. Nevertheless, it is better to run from Head if running on Hadoop 2.3.0.

On Saturday, March 8, 2014 2:42 PM, Andrew Musselman andrew.mussel...@gmail.com wrote:
You have upper-case in your command but lower-case in your declaration in the properties file; correct that and it should work.
[... quoted log snipped; same as earlier in the thread ...]
Re: mahout command
You can ignore the warnings.

On Saturday, March 8, 2014 2:58 PM, Mahmood Naderan nt_mahm...@yahoo.com wrote:
Oh yes... Thanks Andrew, you are right.

Meanwhile, I see two warnings:
WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Is there any concern about them?

Regards,
Mahmood
[... earlier quoted messages snipped; same as above in the thread ...]
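As for why the first warning is ignorable, here is a hedged Python sketch (assumed behavior inferred from the warning text itself, not the actual MahoutDriver source): the driver looks for an optional '<program>.props' resource on the classpath that would supply default arguments, and when it is absent it simply proceeds with the command-line arguments alone.

```python
def resolve_arguments(program, cli_args, classpath_resources):
    """Return the effective argument list for a driver program.

    classpath_resources is a set of resource names visible on the classpath;
    a missing '<program>.props' only triggers a warning, never a failure.
    """
    props_name = program + ".props"
    if props_name in classpath_resources:
        # In the real driver, defaults loaded from the props file would be
        # merged here, with explicit command-line arguments taking precedence.
        pass
    else:
        print(f"WARN: No {props_name} found on classpath, "
              f"will use command-line arguments only")
    return list(cli_args)

args = resolve_arguments("wikipediaXmlSplitter",
                         ["-d", "pages.xml", "-o", "wikipedia/chunks", "-c", "64"],
                         set())
```

Either way, the caller gets a usable argument list, which is why the run continues normally after the warning.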
Re: mahout command
What a fast reply... Thanks a lot Suneel.

Regards,
Mahmood

On Saturday, March 8, 2014 11:29 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:
You can ignore the warnings.
[... earlier quoted messages snipped; same as above in the thread ...]
Re: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
Hi, is there any information about the problem I submitted?

Best regards,
Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)
-BEGIN PUBLIC KEY-
MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCvbeg7LwEC2SCpAEewwpC3ajxE
5ZsRMCB77L8bae9G7TslgLkoIzo9yOjPdx2NN6DllKbV65UjTay43uUDyql9g3tl
RhiJIcoAExkSTykWqAIPR88LfilLy1JlQ+0RD8OXiWOVVQfhOHpQ0R/jcAkM2lZa
BjM8j36yJvoBVsfOHQIDAQAB
-END PUBLIC KEY-

On 05/03/14 10:30, Margusja wrote:
Hi

Here are my actions and the problematic result again:

[hduser@vm38 ~]$ git clone https://github.com/apache/mahout.git
remote: Reusing existing pack: 76099, done.
remote: Counting objects: 39, done.
remote: Compressing objects: 100% (32/32), done.
remote: Total 76138 (delta 2), reused 0 (delta 0)
Receiving objects: 100% (76138/76138), 49.04 MiB | 275 KiB/s, done.
Resolving deltas: 100% (34449/34449), done.
[hduser@vm38 ~]$ cd mahout
[hduser@vm38 ~]$ mvn clean package -DskipTests=true -Dhadoop2.version=2.2.0
...
[INFO] Reactor Summary:
[INFO]
[INFO] Mahout Build Tools          SUCCESS [15.529s]
[INFO] Apache Mahout               SUCCESS [1.657s]
[INFO] Mahout Math                 SUCCESS [1:00.891s]
[INFO] Mahout Core                 SUCCESS [2:44.617s]
[INFO] Mahout Integration          SUCCESS [38.195s]
[INFO] Mahout Examples             SUCCESS [45.458s]
[INFO] Mahout Release Package      SUCCESS [0.012s]
[INFO] Mahout Math/Scala wrappers  SUCCESS [53.519s]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 6:27.763s
[INFO] Finished at: Wed Mar 05 10:22:51 EET 2014
[INFO] Final Memory: 57M/442M
[INFO]
[hduser@vm38 mahout]$ cd ../
[hduser@vm38 ~]$ /usr/lib/hadoop/bin/hadoop jar mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -d input/data666.noheader.data -ds input/data666.noheader.data.info -sl 5 -p -t 100 -o nsl-forest
14/03/05 10:26:39 INFO mapreduce.BuildForest: Partial Mapred implementation
14/03/05 10:26:39 INFO mapreduce.BuildForest: Building the forest...
14/03/05 10:26:39 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/03/05 10:26:51 INFO input.FileInputFormat: Total input paths to process : 1
14/03/05 10:26:51 INFO mapreduce.JobSubmitter: number of splits:1
14/03/05 10:26:51 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/03/05 10:26:51 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/03/05 10:26:51 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/03/05 10:26:51 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/03/05 10:26:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1393936067845_0018
14/03/05
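For context on the error in the subject line: "Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected" is the classic Hadoop 1.x/2.x binary incompatibility. JobContext was a concrete class in Hadoop 1.x and became an interface in 2.x, so bytecode compiled against one cannot run against the other; it is typically resolved by making sure every jar on the runtime classpath was built against the same Hadoop major version. The following Python analogy (not Hadoop code; the class names are illustrative only) gives a rough feel for why code written against a concrete class breaks when the same name becomes an abstract type:

```python
from abc import ABC, abstractmethod

class JobContextV1:
    """Hadoop 1.x style: a concrete class that callers may instantiate."""
    def get_job_name(self):
        return "job"

class JobContextV2(ABC):
    """Hadoop 2.x style: an abstract 'interface' that cannot be instantiated."""
    @abstractmethod
    def get_job_name(self): ...

def client_code(job_context_type):
    # "Compiled" against the 1.x assumption that the type is instantiable.
    return job_context_type().get_job_name()

print(client_code(JobContextV1))  # works against the 1.x-style class
try:
    client_code(JobContextV2)     # like running a Hadoop-1-built jar on Hadoop 2
except TypeError as e:
    print("incompatible:", e)
```

In the JVM the mismatch surfaces at class-link time as an IncompatibleClassChangeError with the message quoted in the subject, rather than as a call-site exception as in this sketch.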