Can someone help me identify the file and subroutine that
is the FIRST to see and extract the HREFs?  I think that
is the best place to lowercase a URL, before it gets
any further in the spidering process.

At that location, we would check conf["case_sensitive"]
(or pass that value to the subroutine) and, when it is
false, IMMEDIATELY lowercase the URL(s).
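
To make the idea concrete, here is a rough sketch of the kind of helper
I have in mind. It is NOT existing htdig code: the function name
normalize_url_case is made up for illustration, it uses std::string
rather than htdig's own string class, and it assumes the
case_sensitive value has already been read from the configuration
and is passed in as a bool.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of ht://Dig): returns the URL unchanged
// when case_sensitive is true, otherwise a fully lowercased copy.
std::string normalize_url_case(std::string url, bool case_sensitive)
{
    if (!case_sensitive) {
        // Cast through unsigned char so std::tolower is well defined
        // for bytes outside the 7-bit ASCII range.
        std::transform(url.begin(), url.end(), url.begin(),
                       [](unsigned char c) {
                           return static_cast<char>(std::tolower(c));
                       });
    }
    return url;
}
```

Called right where the HREF is first extracted, this would mean that
nothing further down the spidering path ever sees a mixed-case URL
when case_sensitive is false.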

>> I found that if I uncommented each line in DocumentDB.cc
>> that contains "url.lowercase()", htdig's verbose report
>> still looks like this:
>
>I think this will still be necessary.
>
>> I would like to avoid any uppercase representation
>> altogether when case_sensitive is false.
>
>I think you'll need to do this in Retriever.cc, probably for the Need2Get
>portion.
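
If the "do we still need this URL?" check is the right spot, the point
of lowercasing there is that two spellings of the same URL collapse
into one entry before anything is fetched. A minimal sketch of that
behaviour follows. The class VisitedUrls and its need_to_fetch method
are invented for illustration only. They are not the actual Need2Get
code in Retriever.cc, and the real data structures there will differ.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a "have we already seen this URL?" check.
class VisitedUrls {
public:
    explicit VisitedUrls(bool case_sensitive)
        : case_sensitive_(case_sensitive) {}

    // Returns true the first time a URL is offered, false on repeats.
    // When case_sensitive_ is false the URL is lowercased before the
    // lookup, so differently cased spellings count as the same URL.
    bool need_to_fetch(std::string url)
    {
        if (!case_sensitive_) {
            std::transform(url.begin(), url.end(), url.begin(),
                           [](unsigned char c) {
                               return static_cast<char>(std::tolower(c));
                           });
        }
        // insert() reports in .second whether the key was newly added.
        return seen_.insert(url).second;
    }

private:
    bool case_sensitive_;
    std::unordered_set<std::string> seen_;
};
```

With case_sensitive set to false, offering "http://host/Page.html" and
then "http://host/PAGE.HTML" would queue only the first one. With it
set to true, both would be treated as distinct.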

------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.
