Sean posted something about that recently (5/5/10: Re: Installation
problem in utils) in which he claims to have fixed it. At least, all the
tests ran. But then, you are running the reuters script and that does
not get exercised in the build. I suspect there are some more issues
with the recent temp file allocation patch. Are you running trunk?
On 5/10/10 1:43 PM, Florent Empis wrote:
Hi,
It might help with the build part, but it probably won't fix the second
issue? / is not writable on most systems, so creating
/tokenized-documents/_temporary
will still fail.
2010/5/10 Jeff Eastman<[email protected]>
Hi Florent,
I successfully ran the new build-reuters.sh before I committed it this
morning, so I suspect you must have some other problem in your system. Have
you tried deleting your Maven repository (.m2) and doing a full mvn clean
install?
Jeff
On 5/10/10 12:50 PM, Florent Empis wrote:
Hi,
I saw Robin's commit this afternoon, so I gave it another try.
Using the new shell script, I still run into a few problems.
First, to satisfy a dependency on slf4j, I had to add the
following to examples/pom.xml (once again, I'm not a Maven expert, so this
may not be the correct way to do it):
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-nop</artifactId>
  <version>1.5.8</version>
  <classifier>sources</classifier>
</dependency>
Then, after a successful mvn -B,
I launched the script:
flor...@florent-laptop:~/workspace/mahout$
./examples/bin/build-reuters.sh
It fails with the following error:
10/05/10 21:28:06 WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: The temporary job-output directory file:/tokenized-documents/_temporary doesn't exist!
    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:204)
    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:234)
    at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:48)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:662)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:352)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
10/05/10 21:28:07 INFO mapred.JobClient: map 0% reduce 0%
10/05/10 21:28:07 INFO mapred.JobClient: Job complete: job_local_0001
10/05/10 21:28:07 INFO mapred.JobClient: Counters: 0
10/05/10 21:28:07 ERROR driver.MahoutDriver: MahoutDriver failed with args: [-i, ./examples/bin/work/reuters-out-seqdir/, -o, ./examples/bin/work/reuters-out-seqdir-sparse, null]
Job failed!
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.mahout.utils.vectors.text.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:97)
    at org.apache.mahout.text.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:215)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:172)
A find makes me think that the issue is in
/utils/src/main/java/org/apache/mahout/utils/vectors/text/DocumentProcessor.java:
  public static final String TOKENIZED_DOCUMENT_OUTPUT_FOLDER = "/tokenized-documents";
I tried changing this value, but it did not solve my problem, even though I
ran mvn -B on utils afterwards. It looks like the mahout-utils used by the
test comes from somewhere else; I guess there's something I'm missing.
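To illustrate why a leading slash in that constant could send the output to the filesystem root, here is a minimal sketch using plain java.net.URI resolution. It is an assumption that the job builds its tokenized output location by resolving the constant against the -o directory in a URI-like way; the class name, the helper resolveChild, and the /home/user/work/... directory are hypothetical and not taken from the Mahout or Hadoop sources.

```java
import java.net.URI;

public class OutputPathSketch {

    // Hypothetical helper: resolve a child folder name against a parent
    // output directory, using standard URI resolution rules.
    static URI resolveChild(URI parent, String child) {
        return parent.resolve(child);
    }

    public static void main(String[] args) {
        // Hypothetical output directory standing in for the -o argument.
        URI output = URI.create("file:/home/user/work/reuters-out-seqdir-sparse/");

        // A child starting with "/" is treated as absolute, so the parent
        // directory is discarded and the result sits at the root.
        System.out.println(resolveChild(output, "/tokenized-documents"));

        // Without the leading slash the child stays under the parent.
        System.out.println(resolveChild(output, "tokenized-documents"));
    }
}
```

Under that assumption, the first result matches the file:/tokenized-documents location reported in the exception, while the second stays inside the work directory, which is writable for a normal user.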
2010/5/10 Jeff Eastman<[email protected]>
I will commit once I verify it completes. It's running now...
Jeff
On 5/10/10 7:50 AM, Robin Anil wrote:
+1. Should be using bin/mahout script for all these.
Robin
On Mon, May 10, 2010 at 8:12 PM, Jeff Eastman <[email protected]> wrote:
Well, thanks for the info. Perhaps we should replace the script then.
Leaving time bombs around like this is not good.
Jeff
On 5/10/10 7:32 AM, Robin Anil wrote:
That's been broken for a long time. It was used by David while he
developed LDA, and it didn't get updated to work post 0.2. Use Sisir's
script to convert Reuters to vectors; it's up on the wiki.
Robin