Thanks Alexander. I tried that, but the plugin is still registered and never used. The mime type is html. I had not entered my other plugins in parse-plugins.xml either, yet they were still running.
The other thing I don't get is that none of the urls starting with literature/article.do are being indexed by any of my plugins. Maybe the fetching process is somehow scoring them and deciding that they are not worth indexing. I am using boost values, so could that be the cause?

Again, these urls get fetched but never indexed. The hadoop.log file shows

2009-05-07 14:32:23,048 INFO fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=5966
2009-05-07 14:32:23,049 INFO fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=9196
2009-05-07 14:32:23,051 INFO fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=6247
2009-05-07 14:32:23,052 INFO fetcher.Fetcher - fetching

Thanks,
Kenan.

On Thu, May 7, 2009 at 11:12 PM, Alexander Aristov <[email protected]> wrote:

> Did you assign mime type to this plugin. What is it?
>
> It's in the parse-plugins.xml file. Unless you do that Nutch won't know if
> it should invoke your plugin for processing particular pages.
>
> Best Regards
> Alexander Aristov
>
> 2009/5/8 kazam <[email protected]>
>
> > Hi there,
> > I am using nutch-0.8.1 and I have 5 custom plugins that I am using. All of
> > those plugins seem to get used from the logs but one of them is not being
> > used. Also, the urls it was written for are also skipped altogether.
> >
> > Here are some pieces from hadoop.log file
> > 2009-05-07 14:27:41,227 INFO plugin.PluginRepository - Registered Plugins:
> > .....
> > .........
> > 2009-05-07 14:27:41,228 INFO plugin.PluginRepository - Xenbase Indexer (index-xenbase)
> > 2009-05-07 14:27:41,228 INFO plugin.PluginRepository - Article Display Page Parser (parse-articlePage)
> >
> > The last plugin --> parse-articlePage is never used.
> >
> > I wrote this plugin to index urls of the type
> > http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=670
> >
> > Again, these urls get fetched but never indexed.
> > hadoop.log file shows
> > 2009-05-07 14:32:23,048 INFO fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=5966
> > 2009-05-07 14:32:23,049 INFO fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=9196
> > 2009-05-07 14:32:23,051 INFO fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=6247
> > 2009-05-07 14:32:23,052 INFO fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=6223
> >
> > Am I missing some configuration, or is there a bug in the plugin, I don't
> > see any exceptions being thrown.
> >
> > Thanks for any pointers.
> >
> > --
> > View this message in context:
> > http://www.nabble.com/Registered-plugin-never-invoked-and-urls-skipped-tp23435093p23435093.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.
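
For reference, the mime-type mapping Alexander mentions can be checked with a short script. This is only a rough sketch under stated assumptions: the XML sample is hypothetical (it assumes parse-plugins.xml uses mimeType elements that contain plugin elements, and that text/html is the type served for the article.do pages), and plugins_for_mime_type is a made-up helper, not part of Nutch. Point it at the contents of your own conf/parse-plugins.xml to see whether parse-articlePage is listed at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical parse-plugins.xml content, for illustration only.
# The plugin id parse-articlePage comes from the log in this thread;
# everything else here is an assumption to check against your own file.
SAMPLE = """
<parse-plugins>
  <mimeType name="text/html">
    <plugin id="parse-articlePage" />
    <plugin id="parse-html" />
  </mimeType>
</parse-plugins>
"""


def plugins_for_mime_type(xml_text, mime_type):
    """Return the plugin ids listed under mime_type, in file order."""
    root = ET.fromstring(xml_text)
    ids = []
    for mime in root.iter("mimeType"):
        if mime.get("name") == mime_type:
            # Collect the id attribute of each <plugin> child element.
            ids.extend(p.get("id") for p in mime.findall("plugin"))
    return ids


if __name__ == "__main__":
    print(plugins_for_mime_type(SAMPLE, "text/html"))
```

If the plugin id does not appear in the list for the mime type your pages are served with, that would be consistent with the parser being registered but never invoked.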
