On Friday, January 24, 2003, at 10:04  AM, Gilles Detillieux wrote:
According to Conrad Schilbe:
On Thursday, January 23, 2003, at 04:13  PM, Gilles Detillieux wrote:
According to Conrad Schilbe:
At some point I stopped receiving the "Errors to take note of"
information in my dig reports. I have run tests with the exact same
configuration against different sites, and the information was there.
The dig in question covers 3000+ documents. Could that be a problem, or
could it have something to do with the configuration of the Apache server?

I have tried many changes to the htdig configuration with no luck,
and again, the configuration I have works for a different, smaller
site: same installation, just digging a different site.

I turned to the mailing list expecting similar problems reported, with
no luck.

Any suggestions would be greatly appreciated.
Are you sure you're still running htdig with the -s option?
Yes. That was my first thought too.

I have since done some testing, digging the same server but a different
virtual host, and have found that it works there. So to me, it is either
a volume issue or an Apache configuration issue. I imagine that htdig
is in use for other sites with 3000+ active documents; can anyone
confirm this?

If so, that leaves only an Apache configuration issue... but I did note
that even on sites that display no errors, the text `Errors to take
note of:' is still in the report... this is not the case for this one
particular site.

Is it possible that somewhere along the way the htdig report is
becoming too large and that information is dropped? I notice that htdig
does not print anything while digging unless of course verbose is on.
It then stores the data in memory and dumps when done. This could be a
volume issue then.

More likely, what's happening is that, for whatever reason, htdig is no
longer getting any 404 errors on that site. That would happen if you're
doing update digs, as the documents that aren't found will be removed from
the database, so until one of the documents that's in the database or in
the start_url is updated, htdig won't even look at its links to get the
URLs again for the missing documents. Either update the documents that
link to the formerly missing URLs, or reindex the site from scratch.
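
The update-dig behaviour described above can be illustrated with a toy model. This is only a sketch of the reasoning, not htdig's actual code: the function `update_dig`, the `database` and `site` dictionaries, and the integer modification stamps are all hypothetical names invented for the illustration.

```python
def update_dig(database, site, start_url):
    """Toy model of an update dig (not htdig's real implementation).

    database: dict mapping url -> modification stamp seen on the last dig
    site:     dict mapping url -> (stamp, links) for pages that exist
    Returns the list of URLs that would be reported as not found.
    """
    errors = []
    # Revisit the start URL plus everything already in the database.
    queue = [start_url] + [url for url in database if url != start_url]
    seen = set()
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        if url not in site:
            # Missing document: report it and drop it from the database.
            errors.append(url)
            database.pop(url, None)
            continue
        stamp, links = site[url]
        if database.get(url) == stamp:
            # Unchanged document: its links are not looked at again.
            continue
        database[url] = stamp
        queue.extend(links)
    return errors
```

In this model, a first dig of a site whose front page links to a missing URL reports that URL. A second dig over the unchanged site reports nothing, because the missing URL is no longer in the database and the page linking to it is never re-parsed. The error only comes back once the linking page changes or the database is rebuilt from scratch.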
It isn't running in update mode. I even added `remove_bad_urls: false' to the configuration file.


If that doesn't help, have a look at how 404 errors are dealt with on
that site. It may be that htdig is never seeing that status code there,
but is instead getting some other document (e.g. an error page), with
a normal status code, for any unresolvable URL on that site.
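
One way to check how a site answers unresolvable URLs is to request a path that almost certainly does not exist and look at the status code that comes back. The following is a minimal sketch using only the Python standard library. The helper names (`probe_url`, `fetch_status`, `is_soft_404`) and the `htdig-probe-` path prefix are made up for this example and have nothing to do with htdig itself.

```python
import urllib.error
import urllib.request
import uuid
from urllib.parse import urljoin


def probe_url(base_url):
    """Build a URL on the same host that almost certainly does not exist."""
    return urljoin(base_url, "/htdig-probe-" + uuid.uuid4().hex + ".html")


def fetch_status(url, timeout=10):
    """Return the final HTTP status code for a GET of url.

    Redirects are followed, so a redirect to an error page that is
    served with a success status shows up as that success status.
    Connection failures (urllib.error.URLError) are not caught here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx answers are raised as HTTPError; the code is on the exception.
        return err.code


def is_soft_404(status):
    """True when a request for a missing page was answered with a success status."""
    return 200 <= status < 300
```

Calling `is_soft_404(fetch_status(probe_url(base)))`, where `base` stands for the root URL of the site being dug, returns True when the server hands back an ordinary page for a nonexistent URL. In that case htdig would have no 404 to report for broken links on that site.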
Even if it is not seeing any bad URLs, possibly because of the way 404s are handled, it should still output `Errors to take note of:' in the report. That text should be there even when there are no errors; I have seen it in my tests. That makes me believe that something is failing.

Conrad



_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

