Could it be that you have duplicate URLs once they are normalized?
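One quick way to check, assuming the standard crawl layout (e.g. crawl/crawldb), is to dump the crawldb and see how many distinct URLs actually made it in:

    # overall counts, including the total number of distinct URLs
    bin/nutch readdb crawl/crawldb -stats
    # full dump if you want to eyeball the normalized URLs
    bin/nutch readdb crawl/crawldb -dump crawldb-dump

If normalization collapsed duplicates, the URL count will come out lower than the 600 seeds.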
Dennis
Fadzi Ushewokunze wrote:
Yanky,
Thanks for that.
I am running on Linux, so I definitely don't have spaces in my file
names.
Having said that, I changed db.max.outlinks.per.page from 100 to 1000
and started getting exactly 500 documents instead of the 600 or so. So
I changed it to -1 and I am still getting 500 docs! Not sure what's going
on here.
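For reference, the change amounts to overriding the property in conf/nutch-site.xml, roughly like this (the -1 = unlimited semantics are as documented in nutch-default.xml):

    <property>
      <name>db.max.outlinks.per.page</name>
      <value>-1</value>
      <description>Maximum number of outlinks kept per page; -1 means no limit.</description>
    </property>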
On Mon, 2009-04-13 at 11:17 +0800, yanky young wrote:
Hi:
I have encountered a similar problem with local Windows file system search
with Nutch 0.9. You can see my post here:
http://www.nabble.com/nutch-0.9-protocol-file-plugin-break-with-windows-file-name-that--contains-space-td22903785.html
Hope it helps.
good luck
yanky
2009/4/13 Fadzi Ushewokunze <[email protected]>
Hi,
I am having a problem with a file system crawl: I have about 600 text
files in a folder, and only 100 of them are getting fetched and
indexed.
I have +^file://* in my regex-urlfilter.txt and crawl-urlfilter.txt, so
every file should be picked up. I have also created my own text parser and
IndexingFilter plugins. Not sure whether they could have something to
do with this problem; I don't think they do.
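For comparison, a minimal regex-urlfilter.txt that accepts file: URLs might look like this (a sketch; rules are evaluated top to bottom and the first match wins, so the catch-all reject goes last):

    # accept anything starting with file:
    +^file:
    # reject everything else
    -.

Note that +^file://* actually means "file:/ followed by zero or more slashes", which still matches file: URLs but is probably not the regex that was intended.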
I can see that the QueueFeeder contains only 100 records and doesn't
replenish.
Any leads?
Thanks,
Fadzi