Scott Green wrote:
Well, why do all resources need to be packed?

Because when you run Nutch on a Hadoop cluster, Hadoop requires that all job resources be packed into a job JAR, which is then submitted to each tasktracker as a part of the job. So, if you want to run in non-local mode you have to build the nutch-xxx.job JAR ("ant job" target).

Apparently you are running in so-called "local" mode, where these issues are quite muddy - but as soon as you try to run on a cluster your method will stop working.


The build result may look like:

xxx-plugin
 `--- conf
 `--- web
 `--- xxx-plugin.jar
 `--- deps.jar
 `--- plugin.xml

Again: in the "local" mode this may work, but these unpacked plugins are not available for jobs executing on a Hadoop cluster.


Now, you may have tested your method and found that it does indeed work
- but the reason is a bit obscure: the bin/nutch and bin/hadoop scripts
add your build/ directory to the classpath, so that you can locally test
the latest versions of the code without creating the *.job file.
However, when you run your code on a Hadoop cluster your local build/
directory is no longer accessible, and your method will mysteriously
fail - or even worse, you may get a different version of a resource from
an older version of the build/ directory found on Hadoop tasktracker
nodes ...
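
One way to see which copy of a resource you are actually getting is to ask the classloader where it found it. The following is only a minimal sketch using plain JDK classes; the class name ResourceOrigin and the default resource name plugin.xml are made up for illustration and are not part of Nutch:

```java
import java.net.URL;

public class ResourceOrigin {

    /**
     * Returns the URL the classloader resolves the resource to,
     * or null if the resource is not on the classpath at all.
     */
    public static URL locate(String resourceName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            // Fall back to the loader that loaded this class.
            cl = ResourceOrigin.class.getClassLoader();
        }
        return cl.getResource(resourceName);
    }

    public static void main(String[] args) {
        // "plugin.xml" is only a placeholder default for this sketch.
        String name = args.length > 0 ? args[0] : "plugin.xml";
        URL url = locate(name);
        System.out.println(name + " -> "
                + (url == null ? "not found on classpath" : url.toString()));
    }
}
```

If the printed URL points into your local build/ directory, the resource is coming from the classpath entry added by the scripts, not from the *.job file.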

If you pack everything into jar(s), it is possible that the jar on a
Hadoop tasktracker node is an old version, right?

No. The job jar is always up to date, because it is sent with every job.

But if you don't get the resources from this jar, and instead rely on using java.io.File-s, you may pick some old cruft from the local build/ directory that you may have accidentally deployed to your tasktrackers ...
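
To illustrate the difference, here is a rough sketch of reading a resource through the classloader instead of through java.io.File, so that the lookup goes through the classpath rather than a hard-coded local path. It uses only JDK classes; the class name ClasspathResource and the method readText are hypothetical helpers for this example, not Nutch or Hadoop API:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ClasspathResource {

    /** Reads a classpath resource fully and decodes it as UTF-8 text. */
    public static String readText(String resourceName) throws IOException {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            // Fall back to the loader that loaded this class.
            cl = ClasspathResource.class.getClassLoader();
        }
        try (InputStream in = cl.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException(
                        "Resource not found on classpath: " + resourceName);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the resource name as the first argument.
        if (args.length == 0) {
            System.err.println("usage: ClasspathResource <resource-name>");
            return;
        }
        System.out.println(readText(args[0]));
    }
}
```

A missing resource fails loudly with an IOException here, instead of silently picking up whatever file happens to sit at a local path.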

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

