On 17/3/03 18:23, "Dirk-Willem van Gulik" <[EMAIL PROTECTED]> wrote:

>> I am almost sure that it should be made all-the-way around: the client can
>> request a specific encoding to the server: See RFC 2616 section 14.2 page
>> 102: the Accept-Charset header.
>
> Or an _ordered_list_ of those as input. See also the Languages while you
> are at it; and the Accept: type as well - they are all dimensions of the
> same problem. And they are not orthogonal; i.e. there is an easy semantic
> coupling between languages and charset - and the Accept list may prompt
> you to send a gif or pdf in some cases.

Yes... You're absolutely right... I was re-reading that part of HTTP on the
tube today, and it gets pretty nasty at that point...

Basically, correct me if I'm wrong, from what I understand the client sends
a list of "preferred" character encodings, while the application should
"negotiate" charset, language and type... It gets quite complicated, because
for the same URL one client might request a Japanese, shift_jis, text/html
view, while another might request a simple image/jpeg... It basically
implies that the URL is a resource _for_real_ and that the client can decide
the way in which it wants to receive it.

>> On another thought... The cache should store unicode characters "as is", not
>> bytes, as those might change for the same request URL depending on the
>> different headers in the request...
>
> You'd have to track which Accept, Accept-Language and Accept-Charset you
> negotiated on. As applications may (also) do i18n and localizations
> optimizations such as swapping ',' into '.' or abusing charsets and doing
> locale specific normalizations of the unicode cast.

Yes yes yes... But there is a problem... Proxies and caches...

If, for example, in my corporation there are two guys, one using Windows in
jp and one using Linux in en_US, and the first guy requests
"http://www.vnunet.com/", I'll deliver the page the first time in jp,
encoded in shift_jis (let's not track content-type for a sec).

Now, when the second guy requests the same page, I'd have to send it in
en_US, maybe encoded in iso-8859-1... But my corporation's proxy (or the
Cocoon cache) will cache the first version it hits, so I'll end up serving
the same Japanese shift_jis content to both of them...

Not good... Needs more thinking indeed...

    Pier
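
To make the caching problem above concrete, here is a minimal sketch in Python of a cache that tracks Accept, Accept-Language and Accept-Charset alongside the URL, as Dirk-Willem suggests. Everything in it is hypothetical and for illustration only: `parse_preferences`, `cache_key`, `NegotiatingCache` and `NEGOTIATED_HEADERS` are made-up names, not Cocoon APIs. It also assumes the key is built from the client's ordered preference lists; a real cache would more likely key on the variant that was actually negotiated.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: these are the three request headers we negotiate on.
NEGOTIATED_HEADERS = ("accept", "accept-language", "accept-charset")


def parse_preferences(header_value: str) -> List[str]:
    """Return the values of an Accept-* header, most preferred first.

    Items are ordered by descending q-value (default 1.0); ties keep the
    order in which the client listed them. Items with q=0 are dropped.
    """
    ranked = []
    for position, item in enumerate(header_value.split(",")):
        parts = [p.strip() for p in item.split(";")]
        value = parts[0].lower()
        if not value:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(raw)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if q > 0:
            ranked.append((-q, position, value))
    return [value for _, _, value in sorted(ranked)]


def cache_key(url: str, headers: Dict[str, str]) -> Tuple:
    """Build a key from the URL plus the ordered preferences per header."""
    normalized = {name.lower(): value for name, value in headers.items()}
    return (url,) + tuple(
        tuple(parse_preferences(normalized.get(name, "")))
        for name in NEGOTIATED_HEADERS
    )


class NegotiatingCache:
    """Hypothetical in-memory cache: one entry per (URL, preferences)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple, bytes] = {}

    def get(self, url: str, headers: Dict[str, str]) -> Optional[bytes]:
        return self._entries.get(cache_key(url, headers))

    def put(self, url: str, headers: Dict[str, str], body: bytes) -> None:
        self._entries[cache_key(url, headers)] = body
```

With a key like this, the shift_jis page stored for the first guy would not be returned to the second one, because their Accept-Language and Accept-Charset lists differ. This only covers a cache we control; for the corporate proxy in between, HTTP/1.1 defines the Vary response header (RFC 2616 section 14.44), which names the request headers a response was selected on, though how well a given proxy honours it is a separate question.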