Thanks Alexander. I tried that, but again the plugin is registered and still
not used. The mime-type is html. I had not entered my other plugins in
parse-plugins.xml, but they were still running.

The other thing I don't get is that none of the URLs starting with
literature/article.do are being indexed by any of my plugins. Maybe the
fetching process is somehow scoring them and deciding that they are not
worth indexing.

I am using boost values, so could this be a possibility?

Again, these urls get fetched but never indexed.
hadoop.log file shows
2009-05-07 14:32:23,048 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=5966
2009-05-07 14:32:23,049 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=9196
2009-05-07 14:32:23,051 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=6247
2009-05-07 14:32:23,052 INFO  fetcher.Fetcher - fetching

Thanks, Kenan.

On Thu, May 7, 2009 at 11:12 PM, Alexander Aristov <[email protected]> wrote:

> Did you assign a mime type to this plugin? What is it?
>
> The mapping is in the parse-plugins.xml file. Unless you do that, Nutch
> won't know whether it should invoke your plugin for processing particular
> pages.
>
>
> Best Regards
> Alexander Aristov
>
>
> 2009/5/8 kazam <[email protected]>
>
> >
> > Hi there,
> > I am using nutch-0.8.1 with 5 custom plugins. According to the logs, all
> > of them get used except one. The urls that plugin was written for are
> > also skipped altogether.
> >
> > Here are some pieces from hadoop.log file
> > 2009-05-07 14:27:41,227 INFO  plugin.PluginRepository - Registered Plugins:
> > .....
> > .........
> > 2009-05-07 14:27:41,228 INFO  plugin.PluginRepository -         Xenbase Indexer (index-xenbase)
> > 2009-05-07 14:27:41,228 INFO  plugin.PluginRepository -         Article Display Page Parser (parse-articlePage)
> >
> > The last plugin, parse-articlePage, is never used.
> >
> > I wrote this plugin to index urls of the type
> >
> > http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=670
> >
> > Again, these urls get fetched but never indexed.
> > hadoop.log file shows
> > 2009-05-07 14:32:23,048 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=5966
> > 2009-05-07 14:32:23,049 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=9196
> > 2009-05-07 14:32:23,051 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=6247
> > 2009-05-07 14:32:23,052 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=6223
> >
> > Am I missing some configuration, or is there a bug in the plugin? I don't
> > see any exceptions being thrown.
> >
> > Thanks for any pointers.
> >
> >
> > --
> > View this message in context:
> > http://www.nabble.com/Registered-plugin-never-invoked-and-urls-skipped-tp23435093p23435093.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.
> >
> >
>
