Enzo Michelangeli wrote:
----- Original Message ----- From: Andrzej Bialecki [EMAIL PROTECTED]
Sent: Monday, June 04, 2007 1:31 AM
Enzo Michelangeli wrote:
In my case (with Nutch 0.8), it seems not: I set it to 500, and the
fetcher still saturates the 1.5 Mbit/s link... Is it supposed to
Hi Berlin,
Nutch needs a file called urls.txt inside the directory that you are
passing to the inject command. Try renaming the urls file to urls.txt.
Also, are you using the local FS or hadoop dfs? If it's the latter, you'll
have to put your dmoz directory on the hadoop fs.
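For reference, the usual sequence looks roughly like this (a sketch; "crawl/crawldb" and the seed directory name are just placeholders):

    # local filesystem: point inject at a directory containing urls.txt
    bin/nutch inject crawl/crawldb urls

    # with Hadoop DFS: copy the seed directory into DFS first
    bin/hadoop dfs -put urls urls
    bin/nutch inject crawl/crawldb urls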
-vishal.
While indexing about 600 sites with Nutch 0.9, I noticed that at least one of
them was showing fewer results than expected. That site was www.nrc.gov. As a
test I tried to index only the NRC site, allowing only internal links in the
site.xml conf file, using crawl-urlfilter.txt with
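(For reference, a crawl-urlfilter.txt restricted to the NRC site typically contains rules along these lines; this is a sketch in the standard regex-urlfilter syntax, not the exact rules from the message above:)

    # accept anything on nrc.gov
    +^http://([a-z0-9]*\.)*nrc.gov/
    # reject everything else
    -.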
Hi Lutz,
I have had problems with a Nutch-based robot during the last 12 hours,
which I have now solved by banning this particular bot from my server
(not Nutch as a whole, for the moment). The ilial bot, which created
considerable load on my server, was using the latest Nutch version -
v0.9 -
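(If the bot honors robots.txt, which a well-behaved Nutch-based crawler should, a user-agent ban there is the lightest-weight option. A sketch, assuming the bot identifies itself with the token "ilial", which is a guess on my part:)

    User-agent: ilial
    Disallow: /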
I have a question about the loading mechanism of plugin classes. I'm working
with a custom URLFilter, and I need a singleton object loaded and
initialized by the first instance of the URLFilter, and shared by other
instances (e.g., ones instantiated by other threads). I was assuming that the
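A minimal sketch of what I have in mind: a lazily initialized object held in a static field of the filter class (the class and helper names below are made up for illustration). Whether this survives Nutch's per-plugin classloaders is exactly what I'm unsure about:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.nutch.net.URLFilter;

    public class MyURLFilter implements URLFilter {

      // Shared state, created once by whichever thread gets here first.
      private static volatile SharedResource shared;

      private Configuration conf;

      private static SharedResource getShared(Configuration conf) {
        if (shared == null) {
          synchronized (MyURLFilter.class) {
            if (shared == null) {
              shared = new SharedResource(conf);  // expensive one-time init
            }
          }
        }
        return shared;
      }

      // Return the URL to keep it, or null to filter it out.
      public String filter(String url) {
        return getShared(conf).accepts(url) ? url : null;
      }

      public void setConf(Configuration conf) { this.conf = conf; }
      public Configuration getConf() { return conf; }

      // Hypothetical shared object; stands in for whatever needs one-time loading.
      static class SharedResource {
        SharedResource(Configuration conf) { /* load data once */ }
        boolean accepts(String url) { return true; }
      }
    }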
Some people over in Solr-land are developing generic field collapsing
https://issues.apache.org/jira/browse/SOLR-236
and I thought I should check whether you guys have any good ideas about it.
How does Nutch implement this for grouping results by site (like Google does)?
-Yonik
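(I can't speak to Nutch's exact code, but conceptually the grouping can be done while walking the ranked hits: keep at most N hits per site key and skip the rest. A rough sketch; the class and method names are illustrative, not Nutch's API:)

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Keeps at most maxPerSite hits per site while preserving rank order.
    public class SiteCollapser {

      // Each hit is {url, site}; the input list is assumed sorted by score.
      public static List<String[]> collapse(List<String[]> rankedHits, int maxPerSite) {
        Map<String, Integer> perSite = new HashMap<String, Integer>();
        List<String[]> out = new ArrayList<String[]>();
        for (String[] hit : rankedHits) {
          String site = hit[1];
          Integer seen = perSite.get(site);
          int count = (seen == null) ? 0 : seen.intValue();
          if (count < maxPerSite) {
            out.add(hit);
            perSite.put(site, count + 1);
          }
        }
        return out;
      }
    }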
Additionally, the shutDown() method (which overrides the one in
org.apache.nutch.plugin.Plugin) appears never to be called, even when
System.runFinalizersOnExit(true) (which is deprecated as dangerous) has
previously been invoked.
The only way of having my shutdown code executed seems to be to place it
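One alternative I have not fully tested: registering a JVM shutdown hook from the plugin's startUp(), roughly as below (a sketch; whether the hook fires on every Nutch exit path is an open question):

    public class PluginCleanup {
      // Call once, e.g. from the plugin's startUp(); the hook runs when the JVM exits.
      public static void register(final Runnable cleanup) {
        Runtime.getRuntime().addShutdownHook(new Thread() {
          public void run() {
            cleanup.run();  // e.g. close connections, flush caches
          }
        });
      }
    }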
Hi,
I am trying to solve a problem, but I cannot find any feature in Nutch
that addresses it.
Let's say in my intranet there are 1000 sites.
Sites 1 to 100 have pages that are never going to change, i.e. they
are static. So I don't need to crawl them again and again. But
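(One knob that may help here is the global re-fetch interval, which in Nutch 0.8/0.9 is, if memory serves, db.default.fetch.interval, expressed in days. A sketch of a nutch-site.xml override; the 90-day value is just an example, and note that it applies to every page, not per site:)

    <property>
      <name>db.default.fetch.interval</name>
      <value>90</value>
      <description>Default number of days between re-fetches of a page.</description>
    </property>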