Hi,
I'm interested in building a Nutch plugin, but I'm having trouble
getting the example "recommended" plugin to work. I followed all of
the steps in http://wiki.apache.org/nutch/WritingPluginExample-0%2e9,
confirmed after running the top-level ant build that
build/plugins/recommended contained the plugin.xml and jar file for
the 'recommended' plugin, and then crawled a single page from a local
web server that serves the example's test content (the page with the
name="recommended" meta tag). The page got crawled and indexed, and I
can search for it, but the "explain" link on the search results shows
no evidence of any rank boosting, and NUTCHDIR/logs/hadoop.log gives
no indication that the crawl loaded the recommended filter.
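For anyone checking the same thing: a plugin descriptor like the one
in build/plugins/recommended generally has the shape sketched below.
This is a sketch of the general Nutch plugin.xml layout, not a copy of
the wiki's file; the class names, extension ids and version shown are
assumptions for illustration only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch of build/plugins/recommended/plugin.xml.
     Class names and extension ids are placeholders, not the wiki's. -->
<plugin id="recommended" name="Recommended Parser/Filter"
        version="0.0.1" provider-name="nutch.org">
  <runtime>
    <!-- the jar produced by the plugin's own build -->
    <library name="recommended.jar">
      <export name="*"/>
    </library>
  </runtime>
  <requires>
    <!-- declares a dependency on the plugin that defines the extension points -->
    <import plugin="nutch-extensionpoints"/>
  </requires>
  <!-- parse-time filter that would read the "recommended" meta tag -->
  <extension id="org.apache.nutch.parse.recommended.RecommendedParser"
             name="Recommended Parser"
             point="org.apache.nutch.parse.HtmlParseFilter">
    <implementation id="RecommendedParser"
                    class="org.apache.nutch.parse.recommended.RecommendedParser"/>
  </extension>
  <!-- index-time filter that would store the value in a field -->
  <extension id="org.apache.nutch.parse.recommended.RecommendedIndexer"
             name="Recommended Indexer"
             point="org.apache.nutch.indexer.IndexingFilter">
    <implementation id="RecommendedIndexer"
                    class="org.apache.nutch.parse.recommended.RecommendedIndexer"/>
  </extension>
  <!-- query-time filter; this is where a boost would show up in "explain" -->
  <extension id="org.apache.nutch.parse.recommended.RecommendedQueryFilter"
             name="Recommended Query Filter"
             point="org.apache.nutch.searcher.QueryFilter">
    <implementation id="RecommendedQueryFilter"
                    class="org.apache.nutch.parse.recommended.RecommendedQueryFilter"/>
  </extension>
</plugin>
```

The plugin id at the top is what the plugin.includes regex has to
match for Nutch to load the plugin at all.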
If anyone has suggestions I'd appreciate hearing them.
Also, I noticed a couple of things on the example wiki page that I
didn't understand or that looked odd:
1. The section "Getting Ant to Compile Your Plugin" says to add this
line to NUTCHDIR/src/plugin/build.xml:
<ant dir="reccomended" target="deploy" />
There's an extra "c" in "reccomended" (a typo). I fixed my local copy
before running the crawl and am mentioning it in case you want to
update the wiki; I'd rather not edit it myself until I've actually
gotten the plugin working.
2. The section "Getting Nutch to Use Your Plugin" says to add the
plugin's id to the plugin.includes regex, using this example:
<value>recommended|protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
But the <description> just above that part says you need to include
at least the nutch-extensionpoints plugin, which this line doesn't.
From the wiki's edit history I see that nutch-extensionpoints used to
be in there and was removed, so I'm not sure which way it's supposed
to be. Which is correct?
(I tried it both with and without nutch-extensionpoints, and neither
worked for me.)
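For reference, with the spelling fixed, the line from item 1 sits
inside the deploy target of NUTCHDIR/src/plugin/build.xml roughly as
below. This is a sketch assuming the 0.9 layout, where that target
calls each plugin's own build, and assuming the plugin's sources live
in src/plugin/recommended; the other entries are elided.

```xml
<!-- NUTCHDIR/src/plugin/build.xml (excerpt, assumed 0.9 layout) -->
<target name="deploy">
  <!-- ... existing tasks and <ant> calls for the bundled plugins ... -->
  <!-- dir must match the plugin's source directory name exactly -->
  <ant dir="recommended" target="deploy" />
</target>
```

If dir is misspelled, ant fails because the directory doesn't exist,
which is why the typo is worth fixing on the wiki.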
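For reference, the "with nutch-extensionpoints" variant I tried for
item 2 looked roughly like this, assuming the override belongs in
NUTCHDIR/conf/nutch-site.xml (which takes precedence over
nutch-default.xml). It shows what I tested, not a known fix.

```xml
<?xml version="1.0"?>
<!-- NUTCHDIR/conf/nutch-site.xml: overrides plugin.includes
     from nutch-default.xml -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- plugins whose id matches this regex are loaded;
         nutch-extensionpoints added at the front -->
    <value>nutch-extensionpoints|recommended|protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```

The variant without it is identical except that the value starts at
"recommended|".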
Thanks
- Mike Schwartz