According to Gabriele Bartolini:

> I came up with a doubt (I don't know, but I never paid attention to this
> before). If I am not *wrong*, when requesting a URL through HTTP, we should
> encode the string (URL encoding) by using the encodeURL functions of the
> URL class (URLTrans.cc file).
>
> Essentially, two fields of the HTTP request IMHO need to be encoded
> prior to requesting the resource.
>
> The first case is the request line, for instance GET encoded_URL HTTP/1.0.
>
> The second one, when given, is the referer.
>
> Any suggestion? Do you think we can just ignore this and keep on sending
> the plain URL?
Well, htdig never actually tries to decode URLs, does it? Apart from when it
tries to match local_urls, I believe htdig always keeps URLs in hex-encoded
form. If it does get URLs that aren't properly encoded, it's because they
weren't properly encoded in the source HTML documents that it indexes. If you
were to add an extra encoding step, the danger is that you'd end up doubly
encoding URLs that were already properly encoded in the documents in which
they were found. Maybe someone can correct me if I'm wrong about what the
code is doing, or about what HTML documents are supposed to contain in their
hrefs, but I think htdig is behaving correctly.

-- 
Gilles R. Detillieux                 E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre          WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba     Winnipeg, MB  R3E 3J7  (Canada)
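As an illustration of the double-encoding concern (this is not the actual
encodeURL() in URLTrans.cc): if an encoding step were added, one way to keep
it safe is to make it idempotent, i.e. leave existing %XX escapes alone and
only escape bytes that are still raw. A rough C++ sketch, with hypothetical
names (url_safe, encode_once):

#include <cctype>
#include <cstdio>
#include <string>

// Bytes that can stay as-is in a URL (rough approximation).
static bool url_safe(unsigned char c)
{
    return std::isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~'
        || c == '/' || c == ':' || c == '?' || c == '&' || c == '=' || c == '#';
}

// Percent-encode unsafe bytes, but keep existing %XX escapes intact
// so an already-encoded URL passes through unchanged.
std::string encode_once(const std::string &url)
{
    std::string out;
    for (std::string::size_type i = 0; i < url.size(); ++i) {
        unsigned char c = url[i];
        if (c == '%' && i + 2 < url.size()
            && std::isxdigit((unsigned char)url[i + 1])
            && std::isxdigit((unsigned char)url[i + 2])) {
            out.append(url, i, 3);   // copy the %XX escape verbatim
            i += 2;
        } else if (url_safe(c)) {
            out += c;
        } else {
            char buf[4];
            std::snprintf(buf, sizeof(buf), "%%%02X", c);
            out += buf;
        }
    }
    return out;
}

With that, encode_once("http://x/a b") gives "http://x/a%20b", and running it
again on the result changes nothing, which is the behaviour you'd want when
some source documents encode their hrefs properly and some don't.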
