Hi,

I saw Robin's commit this afternoon, so I gave it another try.
With the new shell script I still run into a few problems.
First, to satisfy a dependency on slf4j, I had to add the
following to examples/pom.xml (once again, I'm not a Maven expert, so this
may not be the correct way to do it):

<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-nop</artifactId>
  <version>1.5.8</version>
  <classifier>sources</classifier>
</dependency>

Then, after a successful mvn -B, I launched the script:
flor...@florent-laptop:~/workspace/mahout$ ./examples/bin/build-reuters.sh

It fails with the following error:
10/05/10 21:28:06 WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: The temporary job-output directory file:/tokenized-documents/_temporary doesn't exist!
at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:204)
at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:234)
at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:48)
at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:662)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:352)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
10/05/10 21:28:07 INFO mapred.JobClient:  map 0% reduce 0%
10/05/10 21:28:07 INFO mapred.JobClient: Job complete: job_local_0001
10/05/10 21:28:07 INFO mapred.JobClient: Counters: 0
10/05/10 21:28:07 ERROR driver.MahoutDriver: MahoutDriver failed with args: [-i, ./examples/bin/work/reuters-out-seqdir/, -o, ./examples/bin/work/reuters-out-seqdir-sparse, null]
Job failed!
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
at org.apache.mahout.utils.vectors.text.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:97)
at org.apache.mahout.text.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:215)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:172)

A find makes me think that the issue is in
/utils/src/main/java/org/apache/mahout/utils/vectors/text/DocumentProcessor.java,
which defines:
  public static final String TOKENIZED_DOCUMENT_OUTPUT_FOLDER = "/tokenized-documents";
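
One guess, which I have not checked against the Mahout code: if this constant is resolved against the output directory the way URI references are resolved, the leading slash would make it an absolute path and the output directory would be dropped. That would match the file:/tokenized-documents in the error. Here is a minimal sketch of that behaviour using only java.net.URI. The class name, the helper method and the base directory are made up for illustration, and Hadoop's own path handling may differ.

```java
import java.net.URI;

public class PathResolutionSketch {

    // Hypothetical helper: resolves a child path against a base directory URI
    // using standard URI reference resolution.
    public static String resolve(String baseDir, String child) {
        return URI.create(baseDir).resolve(child).toString();
    }

    public static void main(String[] args) {
        // Made-up base directory, for illustration only.
        String base = "file:/home/user/work/reuters-out-seqdir-sparse/";

        // Leading slash: the child is absolute, so the base path is discarded.
        System.out.println(resolve(base, "/tokenized-documents"));

        // No leading slash: the child is resolved under the base directory.
        System.out.println(resolve(base, "tokenized-documents"));
    }
}
```

The first call gives file:/tokenized-documents, while the second keeps the base directory in front of tokenized-documents.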

I tried changing this value, but that did not solve my problem, even though I
ran mvn -B on utils afterwards. It looks like the mahout-utils used by the
test comes from somewhere else; I guess there's something I'm missing.




2010/5/10 Jeff Eastman <[email protected]>

> I will commit once I verify it completes.  It's running now...
> Jeff
>
>
> On 5/10/10 7:50 AM, Robin Anil wrote:
>
>> +1. Should be using bin/mahout script for all these.
>>
>>
>> Robin
>>
>>
>> On Mon, May 10, 2010 at 8:12 PM, Jeff Eastman <[email protected]> wrote:
>>
>>
>>
>>> Well, thanks for the info. Perhaps we should replace the script then.
>>> Leaving time bombs around like this is not good.
>>> Jeff
>>>
>>>
>>> On 5/10/10 7:32 AM, Robin Anil wrote:
>>>
>>>
>>>
>>>> That's been broken for a long time. It was used by David while he
>>>> developed LDA, and it didn't get updated to work post 0.2. Use Sisir's
>>>> script to convert Reuters to vectors; it's up on the wiki.
>>>>
>>>> Robin
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
