The debug info indicates that you are successfully connecting to the
server hosting the site, but that server is choosing to tell you that
the pages do not exist. Is this a site that you control or have some
agreement with? One explanation that fits the facts is that the
server has been configured to deny you access and answer your
requests with a 404 response (perhaps based on IP address or
similar). One way to at least partially test this possibility would
be to ssh back to your server and try requesting the page with a text
browser. If there is a block based solely on IP address or on server
or domain name, you should see the same 404 response.
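If there is no text browser on the server, a rough substitute is to
send the same HTTP/1.0 request that appears in the debug output below
and look only at the status line. The following is a minimal Python
sketch, not part of htdig; the function names are hypothetical. The
second request with a browser-like User-Agent is an extra check I am
adding on the assumption that the server might filter on User-Agent
rather than (or as well as) IP address.

```python
import socket
import sys

# User-Agent as logged by htdig 3.1.6. The real string also carries the
# maintainer address from your config, which the list archive elided;
# add your own if you want an exact match.
HTDIG_UA = "htdig/3.1.6"


def build_request(host: str, path: str = "/", user_agent: str = HTDIG_UA) -> bytes:
    """Build the same HTTP/1.0 GET request that htdig logged."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"User-Agent: {user_agent}",
        f"Host: {host}",
        "",  # these two empty entries produce the blank line
        "",  # that terminates the header block
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_status_line(host: str, path: str = "/", user_agent: str = HTDIG_UA,
                      port: int = 80, timeout: float = 10.0) -> str:
    """Send the request and return only the first line of the response."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path, user_agent))
        data = b""
        # Read until the status line is complete or the server closes.
        while b"\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.split(b"\r\n", 1)[0].decode("latin-1")


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]
    print("htdig UA:  ", fetch_status_line(target))
    print("browser UA:", fetch_status_line(target, user_agent="Mozilla/5.0"))
```

Run it on the indexing machine as, e.g., `python3 check_status.py
www.realtree.com`. If both lines show 404, an IP-based block (or a
genuinely missing page) is the likely story; if only the htdig line
does, the server is probably keying on User-Agent.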

If you are sure that you aren't being blocked, you might try copying  
your config file, changing the start_url, and indexing some other  
site just to make sure all the settings are sane.
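If you want to script the "copy the config and change start_url"
step, something like the following would do it. It is only a sketch:
it assumes the usual one-attribute-per-line `name: value` layout of
htdig.conf and does not handle values continued onto the next line
with a trailing backslash. Pointing database_dir at a scratch
directory as well is my own suggestion, so that the test run does not
touch your real databases.

```python
import re
import sys


def override_attributes(text: str, overrides: dict) -> str:
    """Return htdig.conf text with the given attributes replaced.

    Attributes not already present are appended at the end. Commented
    lines (starting with '#') are left alone because the pattern only
    matches an attribute name at the start of a line.
    """
    for name, value in overrides.items():
        pattern = re.compile(rf"^([ \t]*{re.escape(name)}[ \t]*:).*$", re.MULTILINE)
        # A function replacement avoids backslash escapes in the value.
        text, count = pattern.subn(lambda m, v=value: f"{m.group(1)}\t{v}", text)
        if count == 0:
            text = text.rstrip("\n") + f"\n{name}:\t{value}\n"
    return text


if __name__ == "__main__" and len(sys.argv) in (4, 5):
    # Usage: retarget_conf.py SRC_CONF DST_CONF NEW_START_URL [SCRATCH_DB_DIR]
    src, dst, url = sys.argv[1:4]
    overrides = {"start_url": url}
    if len(sys.argv) == 5:
        overrides["database_dir"] = sys.argv[4]
    with open(src) as f:
        text = f.read()
    with open(dst, "w") as f:
        f.write(override_attributes(text, overrides))
```

You would then run htdig against the copy with `-c /path/to/copy.conf`
and check whether pages come back from the other site.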

I tried using htdig (3.1.6) to start indexing this site and had no  
problem retrieving pages with a nearly stock configuration.

Jim

On Mar 8, 2007, at 2:00 PM, Clint Davis wrote:

> I ran rundig from an ssh session to the server. I can pull up the
> first page from my desktop with no problem. I can also retrieve the
> robots.txt with no problem via my desktop browser.
>
> Any other ideas?
>
>
> On 3/8/07 2:51 PM, "Jim Cole" <[EMAIL PROTECTED]> wrote:
>
>> For some reason htdig was unable to retrieve the first page from the
>> site in question. The server is claiming that the file does not exist
>> (404 response). If this only happened at one time, or is always
>> happening at the same time, it might be due to a server problem,
>> server maintenance, etc. If it is happening all the time, a first
>> step would be to fire up a browser on the machine that runs htdig and
>> make sure you can load the page from there.
>>
>> The "DB2 problem..." message is just due to the fact that there was
>> nothing in the database when htmerge ran.
>>
>> Jim
>>
>> On Mar 8, 2007, at 9:46 AM, Clint Davis wrote:
>>
>>
>>> After using Htdig for years, I just noticed that one of my sites
>>> hasn't been indexed properly in a while.
>>>
>> ...
>>
>>> pick: www.realtree.com, # servers = 1
>>> 0:0:0:http://www.realtree.com/: Retrieval command for
>>> http://www.realtree.com/: GET / HTTP/1.0
>>> User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
>>> Host: www.realtree.com
>>>
>>> Header line: HTTP/1.1 404 Not Found
>>>
>> ...
>>
>>> htmerge: Sorting...
>>> htmerge: Removing doc #0
>>> DB2 problem...: missing or empty key value specified
>>>
>>> Deleted, no excerpt: 0/http://www.realtree.com/
>


_______________________________________________
ht://Dig general mailing list: <[email protected]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general