Pier Fumagalli wrote:
On 16/3/03 23:38, "Vadim Gritsenko" <[EMAIL PROTECTED]> wrote:

True. But you can't have Chinese text in US-ASCII, right?

Even if you could, it's not like anybody would be able to read it ;-) So yes, right.


Unicode specifies (somewhere) that any character not representable in the
current charset encoding should be replaced with a "?" (\u003f), which exists
in all representations...
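For what it's worth, the JDK behaves this way in practice: String.getBytes(Charset) is documented to replace unmappable characters with the charset's default replacement bytes, which is "?" for US-ASCII. The sketch below only shows that JDK behavior; I haven't checked the exact wording in the Unicode standard, and the class name is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows the JDK's handling of characters that the
// target charset cannot represent. String.getBytes(Charset) substitutes the
// charset's default replacement bytes, which is '?' (0x3F) for US-ASCII.
class UnmappableDemo {

    static String roundTripAscii(String text) {
        // Each unmappable character (here, each CJK character) becomes '?'.
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Two Chinese characters, neither representable in US-ASCII.
        System.out.println(roundTripAscii("Cocoon \u4e2d\u6587")); // prints "Cocoon ??"
    }
}
```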


But I am not convinced that it's sitemap's responsibility to worry
about encoding (from SoC POV).

I restate:


1) I want a way for serializers to indicate to the pipeline what is
the encoding they will be using, so that the pipeline can set the
right HTTP header for it.
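Roughly what I have in mind for (1), as a sketch only: the interface and method names below are hypothetical, not the current Cocoon API. The serializer reports its MIME type and encoding, and the pipeline builds the Content-Type value from them (leaving the charset off for binary output).

```java
// Hypothetical contract, not an existing Cocoon interface: a serializer
// tells the pipeline which encoding it will use for its output.
interface EncodingAwareSerializer {
    String getMimeType();  // e.g. "text/html"
    String getEncoding();  // e.g. "UTF-8"; null for binary output with no charset
}

// Hypothetical pipeline helper: builds the HTTP Content-Type header value.
class PipelineHeaders {
    static String contentType(EncodingAwareSerializer serializer) {
        String encoding = serializer.getEncoding();
        // Only text output carries a charset parameter.
        return encoding == null
                ? serializer.getMimeType()
                : serializer.getMimeType() + "; charset=" + encoding;
    }
}
```

The pipeline would then pass this string to whatever sets the response header, so the header and the bytes written by the serializer can't disagree.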

+-0, I'm not sure (yet) on this one...


I am almost sure that it should work the other way around: the client can
request a specific encoding from the server. See RFC 2616, section 14.2, page
102: the Accept-Charset header.

I believe that the TextSerializer should return what the client asked in its
request through the "Accept-Charset" header, if this is present.

If it isn't, it should default to what has been specified in the pipeline
(if we use <map:serialize charset="xxxx"/>) or fall back to the "cocoon
global" configuration...
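A minimal sketch of that precedence, assuming the pipeline charset and the global default are handed in as plain strings (the class and method names are hypothetical). It picks the supported charset with the highest q-value from Accept-Charset, treats q=0 as "not acceptable" and "*" as "anything is fine", and otherwise falls back to the pipeline charset, then the global one. It deliberately skips RFC 2616's special rule that ISO-8859-1 gets q=1 when it isn't mentioned.

```java
import java.nio.charset.Charset;

// Hypothetical helper: choose the response charset with the precedence
// client Accept-Charset > pipeline charset > global default.
class CharsetNegotiator {

    static String choose(String acceptCharset, String pipelineCharset, String globalDefault) {
        String fallback = pipelineCharset != null ? pipelineCharset : globalDefault;
        if (acceptCharset == null || acceptCharset.trim().isEmpty()) {
            return fallback;                        // client expressed no preference
        }
        String best = null;
        double bestQ = 0.0;                         // q=0 means "not acceptable"
        for (String part : acceptCharset.split(",")) {
            String[] fields = part.trim().split(";");
            String name = fields[0].trim();
            double q = 1.0;                         // default weight per RFC 2616
            for (int i = 1; i < fields.length; i++) {
                String param = fields[i].trim();
                if (param.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(param.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0.0;                    // malformed weight: ignore entry
                    }
                }
            }
            // "*" accepts any charset, so the server's own choice is fine.
            String candidate = name.equals("*") ? fallback : name;
            if (q > bestQ && isSupported(candidate)) {  // first entry wins on ties
                best = candidate;
                bestQ = q;
            }
        }
        return best != null ? best : fallback;
    }

    private static boolean isSupported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {      // null or illegal charset name
            return false;
        }
    }
}
```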

Oh, that's right, I forgot about the client 'forcing' a charset. Great point.


2) Also, I want a way to override the sitemap-wide behavior of every
single serializer, locally, such as

<map:serialize encoding="UTF-8"/>

when the global serializer configurations state they will be using
something else.

But this one is OK with me and, moreover, in line with an earlier decision: http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101826371615914&w=2


I'd say to use this only if the client didn't request a particular
encoding...

On second thought... The cache should store Unicode characters "as is", not
bytes, since the bytes might change for the same request URL depending on
the headers in the request...
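A small sketch of the idea, assuming a simple in-memory map keyed by URL (the class is hypothetical, not Cocoon's cache): the cached entry is the character output, and encoding to bytes happens per response, after the lookup, so one entry serves clients asking for different charsets.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical cache: stores serialized output as characters (a String),
// never as bytes, so the byte encoding can differ per request.
class CharacterCache {
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    byte[] serve(String url, Supplier<String> producer, Charset responseCharset) {
        // Run the (expensive) pipeline only on a cache miss.
        String text = entries.computeIfAbsent(url, key -> producer.get());
        // Encode after the lookup, once per response, in the negotiated charset.
        return text.getBytes(responseCharset);
    }

    int size() {
        return entries.size();
    }
}
```

With this layout, a request for "/page" in UTF-8 and a later one in ISO-8859-1 hit the same entry and only differ in the final getBytes call.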

Uh, another good point.


Stefano.


