Hello,

I am trying to build a local searchable index of the site www.dbmsmag.com, but htdig stops right after the first page, even though I am able to leech the whole site with wget.

I am using the following configuration with htdig 3.1.14. Is there anything wrong or missing in it, or could this be a bug in htdig?

start_url: http://www.dbmsmag.com/index.shtml
limit_urls: http://www.dbmsmag.com/
search_algorithm: exact:1 endings:0.5
exclude_urls: ?
valid_extensions: .html .htm .shtml
noindex_start: <html>
noindex_end: </html>
database_dir: /extra/dbms/db
bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi
max_head_length: 10000
max_doc_size: 200000
no_excerpt_show_top: true
valid_punctuation: : .-_/!#$%^&*+;
template_map: htdig htdig /extra/dbms/htdig_template.html
search_results_header: /extra/dbms/htdig_header.html
search_results_footer:
nothing_found_file: /extra/dbms/htdig_nomatch.html
syntax_error_file: /extra/dbms/htdig_syntaxerror.html

My dig, merge and fuzzy script outputs this:

2000-06-18 01:11:34 Starting htdig... (/usr/local/htdig/bin/htdig -v 3 -s -a -c /extra/dbms/conf/htdig.conf)
New server: www.dbmsmag.com, 80
0:0:0:http://www.dbmsmag.com/index.shtml: -------------------- size = 5040
htdig: Run complete
htdig: 1 server seen:
htdig: www.dbmsmag.com:80 1 document
2000-06-18 01:11:52 htdig done...
2000-06-18 01:11:52 Starting htmerge... (/usr/local/htdig/bin/htmerge -v -s -a -c /extra/dbms/conf/htdig.conf)
htmerge: Sorting...
htmerge: Merging...
htmerge: 100:orcommentsab
htmerge: Total word count: 149
htmerge: Total documents: 1
htmerge: Total doc db size (in K): 4
2000-06-18 01:11:52 htmerge done...
2000-06-18 01:11:52 Starting htfuzzy... (/usr/local/htdig/bin/htfuzzy -c /extra/dbms/conf/htdig.conf endings)
2000-06-18 01:12:12 htfuzzy done...
2000-06-18 01:12:12 Updating htdig database files
2000-06-18 01:12:12 Updated htdig database files

Regards,
Manuel Lemos

PS: I did a traceroute to the www.htdig.org site and the route seemed to be looping between contigo-gw.sndgca.pacific.verio.net (207.67.241.138) and a1-5-0-0-49.a02.sndgca02.us.ra.verio.net (207.67.241.137), so I suspect that this message will bounce unless the responsible carrier fixes its routers. If you are seeing this message, never mind! :-)

Web Programming Components using PHP Classes.
Look at: http://phpclasses.UpperDesign.com/?[EMAIL PROTECTED]
--
E-mail: [EMAIL PROTECTED]
URL: http://www.mlemos.e-na.net/
PGP key: http://www.mlemos.e-na.net/ManuelLemos.pgp
--
