According to Gabriele Bartolini:
>     A doubt occurred to me (I don't know why, but I never paid attention 
> to this before). If I am not *wrong*, when requesting a URL through HTTP 
> we should encode the string (URL encoding) by using the encodeURL 
> functions of the URL class (URLTrans.cc file).
> 
>     Essentially, two fields of the HTTP request IMHO need to be encoded 
> prior to requesting the resource.
> 
>     The first case is the request line, for instance GET encoded_URL HTTP/1.0.
> 
>     The second one, when given, is the Referer.
> 
> Any suggestions? Do you think we can just ignore this and keep on sending 
> the plain URL?

Well, htdig never actually tries to decode URLs, does it?  Apart from
when it tries to match local_urls, I believe htdig always keeps URLs in
hex-encoded form.  If it does get URLs that aren't properly encoded,
it's because they weren't properly encoded in the source HTML documents
that it indexes.  If you were to add an extra encoding step, I think
the danger is that you'd end up double-encoding URLs that were already
properly encoded in the documents in which they were found.
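
To make the double-encoding hazard concrete, here is a rough sketch of
what happens when an already-encoded URL is pushed through a naive
encoder.  To be clear, this is not htdig's actual encodeURL() from
URLTrans.cc; the function and its unsafe-character list below are just
my own stand-in, but any encoder that escapes '%' will behave the same
way:

    // A minimal stand-in encoder, for illustration only; not htdig's
    // encodeURL() from URLTrans.cc.  It hex-escapes controls, spaces,
    // non-ASCII bytes and a few unsafe characters, including '%'.
    #include <cstdio>
    #include <cstring>
    #include <string>

    static std::string encode(const std::string &in)
    {
        static const char unsafe[] = " \"<>#%{}|\\^~[]`";
        static const char hex[] = "0123456789ABCDEF";
        std::string out;
        for (std::string::size_type i = 0; i < in.length(); i++)
        {
            unsigned char c = in[i];
            if (c <= 0x20 || c >= 0x7F || strchr(unsafe, c))
            {
                out += '%';
                out += hex[c >> 4];
                out += hex[c & 0x0F];
            }
            else
                out += c;
        }
        return out;
    }

    int main()
    {
        // The URL exactly as a well-formed href should already contain it:
        std::string url = "http://example.com/my%20docs/file.html";
        // Encoding it again escapes the '%' signs themselves, printing
        //   http://example.com/my%2520docs/file.html
        printf("%s\n", encode(url).c_str());
        return 0;
    }

A server that later decodes %2520 once ends up with the literal string
"my%20docs" as the path rather than "my docs", so the request points at
a different (and almost certainly nonexistent) resource.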

Maybe someone can correct me if I'm wrong about what the code is doing,
or about what HTML documents are supposed to contain in their hrefs,
but I think htdig is behaving correctly.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
