On 14.08.2006, at 12:53, Michael Lex wrote:

I think you got Bernd wrong: the problem was that Bernd wanted
naviserver to return the content in ISO-8859-1 encoding, so the number
of bytes and the number of characters should be equal.
The Content-Length has to be the number of bytes returned, but naviserver
computed the value with string bytelength of a UTF-8 string, which
was, in Bernd's case, greater than the byte length of the ISO-8859-1
string.

I believe the best way is to peek at the standard (RFC 2616):

14.13 Content-Length

   The Content-Length entity-header field indicates the size of the
   entity-body, in decimal number of OCTETs, sent to the recipient or,
   in the case of the HEAD method, the size of the entity-body that
   would have been sent had the request been a GET.

       Content-Length    = "Content-Length" ":" 1*DIGIT

   An example is

       Content-Length: 3495

   Applications SHOULD use this field to indicate the transfer-length of
   the message-body, unless this is prohibited by the rules in section
   4.4.

This all means that Content-Length gives the total number of *bytes*
in the response body, regardless of which character encoding was
applied. It also means that for the UTF-8-encoded string "mü" it will
be 3 and not 2. If "mü" is sent as ISO-8859-1, then the Content-Length
would be 2. All right, I think I get it now.

If this is so, then we cannot possibly give the correct
Content-Length UNLESS we apply the encoding BEFORE sending any
headers and body: we would either have to put the correct value
in the Content-Length header OR OMIT the Content-Length and turn
off keepalive for that response.


So it seems that chunked encoding is the best possible solution. But
as Gustav said, chunked transfer-encoding is only part of HTTP/1.1 and
some clients don't understand it.

Yes, chunked encoding seems feasible there. For clients not supporting
chunked responses, we could convert the entire message beforehand,
burning some memory and cycles. As there are few such clients out
there, this may not be of much importance anyway.
OK, this makes sense.
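For reference, the chunked framing itself (RFC 2616, section 3.6.1) is simple: each chunk is preceded by its size in hex, and the body ends with a zero-size chunk, so no Content-Length is needed up front. A minimal Python sketch; `chunked()` is a hypothetical helper, not a naviserver API, and it assumes each piece is a complete run of characters:

```python
def chunked(pieces, encoding):
    """Frame text pieces as an HTTP/1.1 chunked body (RFC 2616, 3.6.1)."""
    out = bytearray()
    for piece in pieces:
        data = piece.encode(encoding)
        if not data:
            continue  # a zero-size chunk would terminate the body early
        out += b"%X\r\n" % len(data)  # chunk-size in hex, then CRLF
        out += data + b"\r\n"         # chunk-data, then CRLF
    out += b"0\r\n\r\n"  # last-chunk with no trailers
    return bytes(out)


print(chunked(["m", "ü"], "iso-8859-1"))
# b'1\r\nm\r\n1\r\n\xfc\r\n0\r\n\r\n'
```

Each chunk's size is known as soon as that piece is converted, which is exactly what the on-the-fly conversion can provide.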


Btw: AOLserver doesn't encode "on the fly" but in memory, so it
knows the Content-Length before the content is sent to the recipient.


By "on the fly" I mean that the message is not encoded in its *entirety*
beforehand; rather, it is converted piece by piece (hence "on the fly")
in Ns_ConnWriteVChars().
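A rough Python analogue of that piece-by-piece conversion (an assumption about the shape of the loop, not the actual C code in Ns_ConnWriteVChars()). `write_on_the_fly()` and its `send` callback are hypothetical; the point is that the total octet count is only known after the last piece, i.e. after the headers would already have gone out:

```python
import codecs


def write_on_the_fly(pieces, encoding, send):
    """Convert and send text piece by piece; return total octets sent.

    Hypothetical illustration: the return value is the number a
    Content-Length header would have needed *before* the first send().
    """
    encoder = codecs.getincrementalencoder(encoding)()
    total = 0
    for piece in pieces:
        data = encoder.encode(piece)
        if data:
            send(data)
            total += len(data)
    tail = encoder.encode("", final=True)  # flush any buffered encoder state
    if tail:
        send(tail)
        total += len(tail)
    return total


sent = []
n = write_on_the_fly(["m", "ü"], "iso-8859-1", sent.append)
print(n, b"".join(sent))  # 2 b'm\xfc'
```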

So, what do we have now?

A. For HTTP 1.0 clients only, we could/should/must either:

   a. omit Content-Length and turn keepalive off, leaving
      the browser to drain the connection until EOF, or
   b. calculate the Content-Length in advance by
      converting the entire message in memory
      using the given output encoding.

B. For HTTP 1.1 clients, we can turn on chunked encoding
   if an output encoding is specified and it is not UTF-8
   (basically, this is what Bernd's workaround does).
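The options above could be condensed into a decision like the following Python sketch. `choose_strategy()`, its parameters, and the returned labels are made up for illustration; treating a missing output encoding as the default UTF-8 output is an assumption:

```python
import codecs


def choose_strategy(http_version, output_encoding, buffer_whole_body=True):
    """Pick how to frame a response whose text is converted on the fly.

    http_version: (major, minor) tuple, e.g. (1, 0) or (1, 1).
    output_encoding: None means the default UTF-8 output (assumption).
    buffer_whole_body: hypothetical switch between options A.b and A.a.
    """
    needs_conversion = (output_encoding is not None
                        and codecs.lookup(output_encoding).name != "utf-8")
    if not needs_conversion:
        return "content-length"     # UTF-8 byte length is known up front
    if http_version >= (1, 1):
        return "chunked"            # option B
    if buffer_whole_body:
        return "convert-in-memory"  # option A.b: encode first, then count
    return "close-after-body"       # option A.a: no length, no keepalive


print(choose_strategy((1, 1), "iso-8859-1"))  # chunked
print(choose_strategy((1, 0), "iso-8859-1"))  # convert-in-memory
```

Whether A.a or A.b is preferable is the memory-versus-keepalive trade-off discussed above; the sketch simply turns it into a flag.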


Is this right? Are there any other options we may have?
Zoran


Michael

_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel

