Hi,

Here I'm developing a MapReduce web crawler which reads url lists and writes 
html to MongoDB.
So, each map read one url list file, get the html and insert to MongoDB. There 
is no reduce and no output of map. So, how to set the output directory in this 
case? If I do not set the output directory, it gives me following exception,

Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: 
Output directory not set.
        at 
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:872)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
        at 
com.ipinyou.data.preprocess.mapreduce.ExtractFeatureFromURLJob.main(ExtractFeatureFromURLJob.java:56)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)


Thank you ! 

Best,
Huanchen
  

2012-06-08 



huanchen.zhang 

Reply via email to