Can someone help me identify the file and subroutine that is the FIRST to see and strip out the HREFs? I think that is the best place to lowercase a URL, before it gets any further in the spidering process. At that location, we'll check conf["case_sensitive"] (or pass that value to that subroutine) and IMMEDIATELY lowercase the URL(s).

>> I found that if I uncommented each line in DocumentDB.cc
>> that contains "url.lowercase()", htdig's verbose report
>> still looks like this:
>
>I think this will still be necessary.
>
>> I would like to avoid any uppercase representation
>> altogether if case_sensitive = false;
>
>I think you'll need to do this in Retriever.cc, probably for the Need2Get
>portion.

------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.
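
To make the request concrete, here is a rough sketch of the lowercasing step I have in mind, applied as soon as an HREF is extracted. This is only an illustration under assumptions: `normalize_url` is a hypothetical helper, not an existing htdig function, it uses `std::string` rather than htdig's own string class, and the `case_sensitive` flag is assumed to be passed in by the caller after reading the configuration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of htdig): returns the URL in the form
// it should be stored and compared in for the rest of the spidering run.
std::string normalize_url(std::string url, bool case_sensitive)
{
    if (!case_sensitive) {
        // Lowercase every character in place. The cast to unsigned char
        // avoids undefined behavior in std::tolower for negative char values.
        std::transform(url.begin(), url.end(), url.begin(),
                       [](unsigned char c) {
                           return static_cast<char>(std::tolower(c));
                       });
    }
    return url;
}
```

If something like this ran at the first point where a link is seen, every later stage would only ever see the lowercased form when case_sensitive is false.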
