yanky, thanks for that.
I am running on Linux, so I definitely don't have spaces in my file names. Having said that, I changed db.max.outlinks.per.page from 100 to 1000 and started getting exactly 500 documents instead of the 600 or so. So I changed it to -1 and I am still getting 500 docs! Not sure what's going on here.

On Mon, 2009-04-13 at 11:17 +0800, yanky young wrote:
> Hi:
>
> I have encountered a similar problem with local Windows file system search
> with Nutch 0.9. You can see my post here:
> http://www.nabble.com/nutch-0.9-protocol-file-plugin-break-with-windows-file-name-that--contains-space-td22903785.html
> Hope it helps.
>
> Good luck,
>
> yanky
>
>
> 2009/4/13 Fadzi Ushewokunze <[email protected]>
>
> > Hi,
> >
> > I am having a problem with a file system crawl where I have about 600
> > URL text files in a folder and only 100 of them are getting fetched and
> > indexed.
> >
> > I have +^file://* in my regex-urlfilter.txt and crawl-urlfilter.txt, so
> > every file should be picked up. I have created my own text parser and
> > IndexingFilter plugins as well. Not sure if this could have something to
> > do with the problem or not; I don't think it does.
> >
> > I can see that the QueueFeeder only contains 100 records but doesn't
> > replenish.
> >
> > Any leads?
> >
> > Thanks,
> >
> > Fadzi
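For reference, the db.max.outlinks.per.page change discussed above would normally go in conf/nutch-site.xml, which overrides nutch-default.xml. A minimal sketch of the property, assuming the standard override mechanism (in nutch-default.xml the default is 100, and a negative value is documented as "process all outlinks"):

<?xml version="1.0"?>
<configuration>
  <property>
    <name>db.max.outlinks.per.page</name>
    <!-- negative value = keep all outlinks; the shipped default is 100 -->
    <value>-1</value>
  </property>
</configuration>

If the override is picked up and the crawl is re-run from a fresh crawldb, the outlink cap at least should be out of the picture, which might help narrow down where the remaining 500-document ceiling is coming from.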
