The debug info indicates that you are successfully connecting to the server hosting the site, but that server is choosing to tell you that the pages do not exist. Is this a site that you control or have some agreement with? One explanation that fits the facts is that the server has been configured to deny you access and redirect your requests to a 404 error page (perhaps based on IP address or similar). One way to at least partially test this possibility would be to ssh back to your server and try requesting the page with a text browser. If there is a block based solely on IP address or on server/domain name, you should see the same 404 response.
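If no text browser is installed on the server, a few lines of Python can stand in for one. This is only a rough sketch: the probe function and the list of User-Agent strings are my own invention, not part of htdig, and the "htdig/3.1.6" string is just an approximation of what the log shows. Python also sends HTTP/1.1 requests where htdig sends HTTP/1.0, so a matching result is suggestive rather than conclusive.

```python
import urllib.error
import urllib.request


def probe(url, user_agents, timeout=10):
    """Request url once per User-Agent string and return {user_agent: status}.

    Hypothetical helper for illustration only; it is not part of htdig.
    """
    results = {}
    for ua in user_agents:
        req = urllib.request.Request(url, headers={"User-Agent": ua})
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                results[ua] = resp.status
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses are raised as exceptions; record the code.
            results[ua] = err.code
    return results
```

Run from the machine that runs htdig, something like probe("http://www.realtree.com/", ["htdig/3.1.6", "Mozilla/5.0"]) would show whether the response depends on the User-Agent. A 404 for every string would point more toward an IP-based block; a 404 only for the htdig string would suggest the server is filtering on the agent name instead.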
If you are sure that you aren't being blocked, you might try copying your config file, changing the start_url, and indexing some other site just to make sure all the settings are sane. I tried using htdig (3.1.6) to start indexing this site and had no problem retrieving pages with a nearly stock configuration.

Jim

On Mar 8, 2007, at 2:00 PM, Clint Davis wrote:

> I ran rundig from an ssh session to the server. I can pull up the first page
> from my desktop with no problem. I can also retrieve the robots.txt with no
> problem via my desktop browser.
>
> Any other ideas?
>
>
> On 3/8/07 2:51 PM, "Jim Cole" <[EMAIL PROTECTED]> wrote:
>
>> For some reason htdig was unable to retrieve the first page from the
>> site in question. The server is claiming that the file does not exist
>> (404 response). If this only happened at one time, or is always
>> happening at the same time, it might be due to a server problem,
>> server maintenance, etc. If it is happening all the time, a first
>> step would be to fire up a browser on the machine that runs htdig and
>> make sure you can load the page from there.
>>
>> The "DB2 problem..." message is just due to the fact there was
>> nothing in the database when htmerge ran.
>>
>> Jim
>>
>> On Mar 8, 2007, at 9:46 AM, Clint Davis wrote:
>>
>>
>>> After using Htdig for years, I just noticed that one of my sites
>>> hasn't been indexed properly in a while.
>>>
>> ...
>>
>>> pick: www.realtree.com, # servers = 1
>>> 0:0:0:http://www.realtree.com/: Retrieval command for
>>> http://www.realtree.com/: GET / HTTP/1.0
>>> User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
>>> Host: www.realtree.com
>>>
>>> Header line: HTTP/1.1 404 Not Found
>>>
>> ...
>>
>>> htmerge: Sorting...
>>> htmerge: Removing doc #0
>>> DB2 problem...: missing or empty key value specified
>>>
>>> Deleted, no excerpt: 0/http://www.realtree.com/

_______________________________________________
ht://Dig general mailing list: <[email protected]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general

