Hi all, I'm not too certain about all the details of character encodings in HttpClient, but it is on my list of docs to write, so I'd like to confirm a few things and gather any thoughts about it.

1. URLs should consist only of US-ASCII characters wherever possible, with anything else %-escaped, since that is what RFC 1738 allows. Escaped non-ASCII characters may cause compatibility issues with some servers (e.g. Windows Web Folders), mostly because there is no way to determine which encoding was used for the URL.
2. The headers of an HTTP request/response must always be ISO-8859-1 (or is this ASCII?) as per the HTTP standard.
3. The Content-Type header may specify a charset for the body of the HTTP request/response, e.g. Content-Type: text/html; charset=UTF-8
4. Is there a simple way to get the charset returned by the server from HttpClient? If not, we should probably add one. Obviously you could get the Content-Type header and parse it yourself, but since HttpClient already does this (I think), it would be better not to duplicate that work.
5. getResponseBodyAsString always uses the platform default encoding. Why doesn't it use the charset specified in the HTTP response?
6. Some document types specify the charset inside the document itself. You should consult the appropriate standards to determine whether the charset in the HTTP response or the one in the document takes precedence.

Any other things that should be documented would be good to know as well.

Thanks in advance.

Adrian "Doc Boy" Sutton
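
P.S. For reference, the "get the Content-Type header and parse it" route from point 4, combined with the decoding asked about in point 5, might look roughly like the sketch below. It uses only the JDK and does not touch HttpClient at all. The class ContentTypeCharset and its methods charsetOf and decode are hypothetical names made up for illustration. The ISO-8859-1 fallback is an assumption borrowed from the HTTP/1.1 default for text types, and splitting on ';' is a simplification that would break on a quoted parameter value containing a semicolon.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of HttpClient: pulls the charset parameter
// out of a Content-Type header value and decodes a body with it.
final class ContentTypeCharset {

    // Assumed fallback when the header carries no charset parameter.
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    private ContentTypeCharset() {}

    // Returns the charset parameter of a Content-Type value,
    // or DEFAULT_CHARSET if the value is null or has no such parameter.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return DEFAULT_CHARSET;
        }
        // Parameters follow the media type, separated by ';'.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            // Parameter names are case-insensitive.
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // The value may be wrapped in double quotes; strip them.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            if (!value.isEmpty()) {
                return value;
            }
        }
        return DEFAULT_CHARSET;
    }

    // Decodes the raw body bytes using the charset named in the header,
    // falling back to ISO-8859-1 if the JVM does not know that charset.
    static String decode(byte[] body, String contentType) {
        Charset charset;
        try {
            charset = Charset.forName(charsetOf(contentType));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            charset = StandardCharsets.ISO_8859_1;
        }
        return new String(body, charset);
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8")); // prints UTF-8
        System.out.println(charsetOf("text/html"));                // prints ISO-8859-1
    }
}
```

With something like this, a caller would pass the raw response bytes and the Content-Type header value to decode instead of relying on the platform default encoding. Whether HttpClient should expose the parsed charset directly is still the open question from point 4.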