[ http://issues.apache.org/jira/browse/NUTCH-209?page=comments#action_12365794 ]

[EMAIL PROTECTED] commented on NUTCH-209:
-----------------------------------------

What is the 'simple way in Hadoop' to specify the job jar file?  Is it the 
'jar' argument to the hadoop script?

One use case I foresee is building a single jar and using it to run multiple 
jobs: e.g. one Nutch job jar is used first for the indexing job, and the same 
jar is used later for dedup, etc.  The hadoop 'jar' option just takes the jar 
name, then looks in the jar's MANIFEST.MF for the Main-Class, failing if it is 
not present.  This is grand, but for the scenario above it means I have to 
create a jar per job I want to run.  Can we pass the Main-Class on the hadoop 
command line as an (optional) argument to 'hadoop jar JAR_NAME'?  (I can make a 
patch if wanted.)
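
To make the proposal concrete, here is a minimal sketch of the lookup order 
being suggested: use the manifest's Main-Class when the jar has one, otherwise 
treat the first argument after the jar name as the class to run.  This is not 
the hadoop script's actual code; the class MainClassResolver, its methods, and 
the rule that the manifest entry wins are assumptions made for illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper, not part of Hadoop or Nutch.
class MainClassResolver {

    /** The class to run plus the arguments that remain for the job itself. */
    static final class Resolved {
        final String mainClass;
        final String[] jobArgs;

        Resolved(String mainClass, String[] jobArgs) {
            this.mainClass = mainClass;
            this.jobArgs = jobArgs;
        }
    }

    /** Returns the Main-Class attribute of the jar's manifest, or null if absent. */
    static String readManifestMainClass(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest(); // null when the jar has no manifest
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    /**
     * Assumed rule: a manifest Main-Class wins; otherwise the first argument
     * after the jar name is taken as the main class and removed from the args.
     */
    static Resolved resolve(String manifestMainClass, String[] argsAfterJar) {
        if (manifestMainClass != null && !manifestMainClass.trim().isEmpty()) {
            return new Resolved(manifestMainClass.trim(), argsAfterJar);
        }
        if (argsAfterJar.length == 0) {
            throw new IllegalArgumentException(
                "No Main-Class in manifest and no main class given on the command line");
        }
        String[] rest = Arrays.copyOfRange(argsAfterJar, 1, argsAfterJar.length);
        return new Resolved(argsAfterJar[0], rest);
    }
}
```

Under these assumptions, one job jar without a Main-Class entry could serve 
several jobs, with each invocation naming a different class (for example a 
hypothetical org.example.IndexJob for indexing and another class for dedup).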





> include nutch jar in mapred jobs
> --------------------------------
>
>          Key: NUTCH-209
>          URL: http://issues.apache.org/jira/browse/NUTCH-209
>      Project: Nutch
>         Type: Improvement
>     Versions: 0.8-dev
>     Reporter: Doug Cutting
>     Priority: Minor
>      Fix For: 0.8-dev

>
> I just added a simple way in Hadoop to specify the job jar file.  When 
> constructing a JobConf one can specify a class whose containing jar is set to 
> be the job's jar.  To take advantage of this in Nutch, we could add a util 
> class:
>
> public class NutchJob extends JobConf {
>   public NutchJob(Configuration conf) {
>     super(conf, NutchJob.class);
>   }
> }
>
> Then change all of the places where we construct a JobConf to instead 
> construct a NutchJob.
> Finally, we should add an ant target called 'job' that constructs a job jar, 
> containing all of the classes and the plugins, and make this the default 
> target.  This way all Nutch code can be distributed with each job as it is 
> submitted, and daemons would only need to be restarted when Hadoop code is 
> updated.
> Does this sound reasonable?
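
The quoted description relies on finding the jar that contains a given class.  
A minimal sketch of how that lookup might work using only the standard 
classloader API is shown below; it is not Hadoop's actual implementation, and 
the class ContainingJar and its methods are hypothetical names.

```java
import java.net.URL;

// Hypothetical helper, not part of Hadoop or Nutch.
class ContainingJar {

    /**
     * Extracts the jar path from a resource URL such as
     * jar:file:/opt/nutch/nutch.job!/org/apache/nutch/util/NutchJob.class
     * Returns null when the URL does not point inside a jar.
     * Percent-encoded characters in the path are left as-is.
     */
    static String jarPathOf(String resourceUrl) {
        if (resourceUrl == null || !resourceUrl.startsWith("jar:")) {
            return null;
        }
        String inner = resourceUrl.substring("jar:".length());
        int bang = inner.indexOf('!');
        if (bang >= 0) {
            inner = inner.substring(0, bang); // drop the entry name inside the jar
        }
        if (inner.startsWith("file:")) {
            inner = inner.substring("file:".length());
        }
        return inner;
    }

    /** Returns the path of the jar the class was loaded from, or null if none. */
    static String find(Class<?> clazz) {
        String resource = clazz.getName().replace('.', '/') + ".class";
        ClassLoader loader = clazz.getClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader(); // bootstrap-loaded classes
        }
        URL url = loader.getResource(resource);
        return url == null ? null : jarPathOf(url.toString());
    }
}
```

A class loaded from a plain directory rather than a jar yields null here, in 
which case a caller would have no job jar to ship.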



