Hi,

I've seen the commit from Robin this afternoon, so I gave it another try. Using the new shell script I still run into a few problems.

First, in order to satisfy a dependency on slf4j, I had to add the following to examples/pom.xml (once again, I'm not a Maven expert, so this may not be the correct way to do it):

<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-nop</artifactId>
  <version>1.5.8</version>
  <classifier>sources</classifier>
</dependency>

Then, after a successful mvn -B, I launched the script:

flor...@florent-laptop:~/workspace/mahout$ ./examples/bin/build-reuters.sh

It fails with the following error:

10/05/10 21:28:06 WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: The temporary job-output directory file:/tokenized-documents/_temporary doesn't exist!
        at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:204)
        at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:234)
        at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:48)
        at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:662)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:352)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
10/05/10 21:28:07 INFO mapred.JobClient:  map 0% reduce 0%
10/05/10 21:28:07 INFO mapred.JobClient: Job complete: job_local_0001
10/05/10 21:28:07 INFO mapred.JobClient: Counters: 0
10/05/10 21:28:07 ERROR driver.MahoutDriver: MahoutDriver failed with args: [-i, ./examples/bin/work/reuters-out-seqdir/, -o, ./examples/bin/work/reuters-out-seqdir-sparse, null]
Job failed!
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
        at org.apache.mahout.utils.vectors.text.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:97)
        at org.apache.mahout.text.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:215)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:172)

A find makes me think that the issue is in /utils/src/main/java/org/apache/mahout/utils/vectors/text/DocumentProcessor.java:

/utils/src/main/java/org/apache/mahout/utils/vectors/text/DocumentProcessor.java:  public static final String TOKENIZED_DOCUMENT_OUTPUT_FOLDER = "/tokenized-documents";

I tried changing this value, but it did not solve my problem, even though I ran mvn -B on utils afterwards. It looks like the mahout-utils used by the test comes from somewhere else; I guess there's something I'm missing.

2010/5/10 Jeff Eastman <[email protected]>

> I will commit once I verify it completes. It's running now...
> Jeff
>
> On 5/10/10 7:50 AM, Robin Anil wrote:
>
>> +1. Should be using bin/mahout script for all these.
>>
>> Robin
>>
>> On Mon, May 10, 2010 at 8:12 PM, Jeff Eastman <[email protected]> wrote:
>>
>>> Well, thanks for the info. Perhaps we should replace the script then.
>>> Leaving time bombs around like this is not good.
>>> Jeff
>>>
>>> On 5/10/10 7:32 AM, Robin Anil wrote:
>>>
>>>> That's been broken for a long time; it was used by David while he
>>>> developed LDA. It didn't get updated to work post 0.2. Use Sisir's
>>>> script to convert reuters to vectors; it's up on the wiki.
>>>>
>>>> Robin
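
A possible explanation for the file:/tokenized-documents path in the error above, offered as a guess rather than a confirmed diagnosis: if DocumentProcessor joins the output directory and the constant through URI-style resolution (which, as far as I know, is what Hadoop's Path(parent, child) constructor does), then a child that starts with "/" replaces the parent path instead of being appended to it. The sketch below shows that behaviour with plain java.net.URI only. The class name PathResolutionDemo, the resolve helper, and the /home/user/work directory are all made up for illustration, and I have not checked how DocumentProcessor actually builds the path.

```java
import java.net.URI;

public class PathResolutionDemo {

    // Hypothetical helper: resolves a child path against an output directory
    // URI using standard java.net.URI resolution rules.
    public static String resolve(String outputDir, String child) {
        return URI.create(outputDir).resolve(child).toString();
    }

    public static void main(String[] args) {
        // Made-up output directory; note the trailing slash.
        String outputDir = "file:/home/user/work/reuters-out-seqdir-sparse/";

        // Child with a leading slash: the parent path is discarded.
        System.out.println(resolve(outputDir, "/tokenized-documents"));
        // prints "file:/tokenized-documents"

        // Child without a leading slash: appended to the parent path.
        System.out.println(resolve(outputDir, "tokenized-documents"));
        // prints "file:/home/user/work/reuters-out-seqdir-sparse/tokenized-documents"
    }
}
```

If this is indeed the mechanism, dropping the leading slash from the constant would be the fix, provided the rebuilt mahout-utils jar is the one actually picked up by the script.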
