I've been battling with the case_sensitive issue for a while now.  It
seems that declaring "case_sensitive: false" automatically lowercases
the URLs (performed in ../htlib/URL.cc).  This seems like a great
idea; however, I think a more logical procedure would be to leave the
URL's case alone from the get-go and lowercase it only temporarily
when comparing it against previously crawled/queued URLs.

Basically, what is happening is that the university's web server uses
Apache's mod_speling.  Upon a URL case mismatch (e.g.,
http://www.foo.com/DOCUMENT is the request, but http://www.foo.com/document
is the true document name), the module automatically sends a
301 Moved Permanently response -- a redirect that htdig does NOT follow,
regardless of the case_sensitive setting.

Long story short: where/how can the code be modified so that the actual
URL is NOT lowercased automatically, but rather, is only lowercased
temporarily when doing a comparison to other queued/crawled URLs
(which will also be temporarily lowercased during the comparison
process)?
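
To make the idea concrete, here is a rough sketch of what I have in
mind.  It is NOT htdig code: SeenUrls and lowercase_copy are
hypothetical names I made up for illustration, and I am assuming the
list of crawled/queued URLs can be treated as a set of strings.  The
original URL is never modified; only a lowercased copy is used as the
comparison key.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_set>

// Hypothetical helper: returns a lowercased copy and leaves the
// caller's URL string untouched.
std::string lowercase_copy(const std::string& url) {
    std::string key(url);
    std::transform(key.begin(), key.end(), key.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return key;
}

// Hypothetical stand-in for the crawler's record of crawled/queued
// URLs.  Only lowercased keys are stored, so lookups ignore case.
class SeenUrls {
public:
    // Returns true if the URL had not been seen before (ignoring case).
    bool add(const std::string& url) {
        return keys_.insert(lowercase_copy(url)).second;
    }

    // Returns true if a URL differing at most in case was already added.
    bool contains(const std::string& url) const {
        return keys_.count(lowercase_copy(url)) != 0;
    }

private:
    std::unordered_set<std::string> keys_;
};
```

With something like this, the request would still go out with the
URL's original case, while http://www.foo.com/DOCUMENT and
http://www.foo.com/document would count as the same entry for
duplicate detection.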

Thanks,
Patrick

