Just to let you know that I fixed my problem. I found the following parameter in nutch-default.xml:

<property>
  <name>db.max.outlinks.per.page</name>
  <value>100</value>
  <description>The maximum number of outlinks that we'll process for a page.</description>
</property>

Since I have many links on each page, I need to increase the value of this parameter.

Francois

-----Original Message-----
From: Lacoursiere, Francois [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 19, 2005 9:50 AM
To: [email protected]
Subject: Missing files in fetchlist

Hello,

I have a small problem. I'm indexing the files of a web server on my intranet (Apache). One directory of the intranet contains 50 files. I run the generate and fetch commands and I see that the last 3 files are never fetched.

The following 2 workarounds work:
- If I create an index.html file that refers to all 50 files, then all 50 files are in the fetch list and they are indexed.
- If I create a subdirectory, keep 47 files in the parent directory and move 3 files into the subdirectory, then all 50 files are in the fetchlist and they are indexed.

Do you have any idea what's going wrong?

Thanks,
Francois

Here is the script I use to build the fetch list and index:

:
echo "** Nutch Index 1 iteration"
bin/nutch generate db segments
s1=`ls -d segments/2* | tail -1`
echo $s1
echo "** fetch list"
bin/nutch fetchlist db -local -dumpurls $s1
bin/nutch fetch -local -threads 1 $s1
bin/nutch updatedb db $s1

echo "** Nutch Index 2 iteration"
bin/nutch generate db segments
s2=`ls -d segments/2* | tail -1`
echo "** fetch list"
bin/nutch fetchlist db -local -dumpurls $s2
echo $s2
bin/nutch fetch -local -threads 1 $s2
bin/nutch updatedb db $s2

echo "** Nutch Index 3 iteration"
bin/nutch generate db segments
s3=`ls -d segments/2* | tail -1`
echo "** fetch list"
bin/nutch fetchlist db -local -dumpurls $s3
echo $s3
bin/nutch fetch -local -threads 1 $s3
bin/nutch updatedb db $s3

bin/nutch index $s1
bin/nutch index $s2
bin/nutch index $s3

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
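
The three iterations in the script above repeat the same commands, so they could be folded into a loop. What follows is only a sketch under stated assumptions, not a tested replacement: it reuses exactly the bin/nutch commands from the script, while the NUTCH and ITERATIONS variables, the function names, and the "stop on first failure" checks are additions made for illustration and are not part of the original script.

```shell
#!/bin/sh
# Sketch only: same commands as the original script, wrapped in a loop.
# NUTCH and ITERATIONS are hypothetical variables introduced for illustration.
NUTCH=${NUTCH:-bin/nutch}
ITERATIONS=${ITERATIONS:-3}

# Run one generate/fetch/updatedb round.
# $1 is the iteration number, used only in the progress message.
crawl_round() {
    echo "** Nutch Index $1 iteration"
    $NUTCH generate db segments || return 1
    # The newest directory under segments/ is the one just generated.
    segment=`ls -d segments/2* | tail -1`
    echo "** fetch list"
    $NUTCH fetchlist db -local -dumpurls "$segment" || return 1
    echo "$segment"
    $NUTCH fetch -local -threads 1 "$segment" || return 1
    $NUTCH updatedb db "$segment" || return 1
    # Remember the segment so it can be indexed after all rounds.
    segments_done="$segments_done $segment"
}

# Run all rounds, then index every segment that was fetched.
run_crawl() {
    segments_done=""
    i=1
    while [ "$i" -le "$ITERATIONS" ]; do
        crawl_round "$i" || return 1
        i=`expr "$i" + 1`
    done
    for segment in $segments_done; do
        $NUTCH index "$segment" || return 1
    done
}
```

To use it, call run_crawl at the end of the script from the Nutch installation directory. Setting ITERATIONS in the environment changes the number of rounds without editing the script.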
