Re: [5.0] content-type and charset issues
Remy Maucherat wrote:
> Kazuhiro Kazama wrote:
>> Remy,
>>
>>> Tomcat 5.0 always adds a charset=ISO-8859-1 to the content type. While this is I think relatively legal, it is rather risky (it causes problems with some clients, as I've read on tomcat-user), and very dubious when dealing with non-text data.
>>
>> I received a report from Japanese developers that the same(?) charset problem exists in Tomcat 4.1.29. They said that Tomcat 4.1.27 is OK. Could you check whether the same problem exists or not? If not, I will analyze Tomcat 4.1.29 and send the patch.
>
> This has been fixed already. However, I have to point out that the client is not compliant (not specifying a charset is equivalent to specifying charset=ISO-8859-1).
>
> Remy

This might be true, but the meta http-equiv tag in HTML does not work as expected when the encoding is also set in the HTTP response. If both the HTTP encoding and the meta HTML encoding are specified, the HTTP encoding takes precedence (according to section 5.2.2 of the HTML 4.01 spec), and users still need to set the encoding manually from the user agent's interface.

For example, imagine a set of HTML files served by Tomcat, all encoded in something other than ISO-8859-1. Even if every one of them has a meta tag with the correct encoding, Tomcat adds ISO-8859-1 to the HTTP response by default, and the user sees garbage.

Please note that the Apache server does the same in a default installation, but this is simply unacceptable in a multi-encoding installation. I have to remove the relevant directive from the httpd.conf file in all Apache installations, because otherwise my users are not able to write non-ISO-8859-1 HTML files.

Please also note from section 5.2.2 of HTML 4.01:

"The HTTP protocol ([RFC2616], section 3.7.1) mentions ISO-8859-1 as a default character encoding when the charset parameter is absent from the Content-Type header field. In practice, this recommendation has proved useless because some servers don't allow a charset parameter to be sent, and others may not be configured to send the parameter. Therefore, user agents must not assume any default value for the charset parameter."

I would add to this that servers which simply set a character encoding by default on ALL responses are at least as bad as servers that do not set one at all.

Stefanos Karasavvidis

--
== Stefanos Karasavvidis
Electronic Computer Engineer
e-mail : [EMAIL PROTECTED]

Multimedia Systems Center S.A.
Kissamou 178
73100 Chania - Crete - Hellas
http://www.msc.gr

Tel : +30 2821 0 88447
Fax : +30 2821 0 88427

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
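Stefanos's point about precedence can be illustrated with a minimal Java sketch, assuming a user agent that follows the HTML 4.01 section 5.2.2 order (the HTTP charset parameter first, then the meta http-equiv declaration) and assumes no default when neither is present. The class `HtmlCharsetPrecedence` and its methods are hypothetical and do not come from Tomcat or any browser; the charset parsing is deliberately simplified.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating HTML 4.01, section 5.2.2; not a real browser or Tomcat API.
final class HtmlCharsetPrecedence {

    // Matches a charset parameter such as "charset=windows-1253", optionally quoted.
    private static final Pattern CHARSET_PARAM =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)\"?", Pattern.CASE_INSENSITIVE);

    private HtmlCharsetPrecedence() {
    }

    // Returns the charset parameter of a Content-Type value, if one is present.
    static Optional<String> charsetOf(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        Matcher m = CHARSET_PARAM.matcher(contentType);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // The HTTP header wins; otherwise use the meta http-equiv content value; otherwise no default.
    static Optional<String> effectiveCharset(String httpContentType, String metaContent) {
        Optional<String> fromHttp = charsetOf(httpContentType);
        return fromHttp.isPresent() ? fromHttp : charsetOf(metaContent);
    }

    public static void main(String[] args) {
        String meta = "text/html; charset=windows-1253";
        // Server appends a default charset, so the page's own declaration loses.
        System.out.println(effectiveCharset("text/html;charset=ISO-8859-1", meta).orElse("(none)")); // prints ISO-8859-1
        // Server sends no charset, so the meta declaration is honored.
        System.out.println(effectiveCharset("text/html", meta).orElse("(none)")); // prints windows-1253
    }
}
```

The first call reproduces the situation described above: a windows-1253 page is decoded as ISO-8859-1 because the header added by the server overrides the meta tag.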
Re: [5.0] content-type and charset issues
Kazuhiro Kazama wrote:
> Remy,
>
>> Tomcat 5.0 always adds a charset=ISO-8859-1 to the content type. While this is I think relatively legal, it is rather risky (it causes problems with some clients, as I've read on tomcat-user), and very dubious when dealing with non-text data.
>
> I received a report from Japanese developers that the same(?) charset problem exists in Tomcat 4.1.29. They said that Tomcat 4.1.27 is OK. Could you check whether the same problem exists or not? If not, I will analyze Tomcat 4.1.29 and send the patch.

This has been fixed already. However, I have to point out that the client is not compliant (not specifying a charset is equivalent to specifying charset=ISO-8859-1).

Remy
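Remy's position rests on RFC 2616, section 3.7.1, under which media subtypes of the "text" type received over HTTP default to ISO-8859-1 when no charset parameter is sent. The following minimal Java sketch shows a client applying that rule; the class `Rfc2616DefaultCharset` is hypothetical, and the parameter parsing is simplified (for example, `Charset.forName` throws on an unknown charset name).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Optional;

// Hypothetical client-side helper; illustrates RFC 2616, section 3.7.1 only.
final class Rfc2616DefaultCharset {

    private Rfc2616DefaultCharset() {
    }

    static Optional<Charset> charsetFor(String contentType) {
        String lower = contentType.toLowerCase(Locale.ROOT);
        int idx = lower.indexOf("charset=");
        if (idx >= 0) {
            // An explicit charset parameter is always used as sent.
            String value = contentType.substring(idx + "charset=".length());
            int semi = value.indexOf(';');
            if (semi >= 0) {
                value = value.substring(0, semi);
            }
            return Optional.of(Charset.forName(value.trim().replace("\"", "")));
        }
        String mediaType = lower.split(";", 2)[0].trim();
        if (mediaType.startsWith("text/")) {
            // The RFC 2616 default applies to text subtypes only.
            return Optional.of(StandardCharsets.ISO_8859_1);
        }
        // For non-text types such as image/gif, no character set applies.
        return Optional.empty();
    }
}
```

Under this reading, omitting the charset on a text/html response and sending charset=ISO-8859-1 are equivalent, while a charset on image/gif carries no meaning for the image data.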
Re: [5.0] content-type and charset issues
Bill Barker wrote:
>> Hi,
>>
>> Tomcat 5.0 always adds a charset=ISO-8859-1 to the content type. While this is I think relatively legal, it is rather risky (it causes problems with some clients, as I've read on tomcat-user), and very dubious when dealing with non-text data.
>>
>> Example:
>>
>> GET /tomcat.gif HTTP/1.0
>> User-Agent: ApacheBench/1.3d
>> Host: 127.0.0.1
>> Accept: */*
>>
>> HTTP/1.1 200 OK
>> ETag: W/"1934-1068549702000"
>> Last-Modified: Tue, 11 Nov 2003 11:21:42 GMT
>> Content-Type: image/gif;charset=ISO-8859-1
>> Content-Length: 1934
>> Date: Tue, 11 Nov 2003 14:59:56 GMT
>> Server: Apache-Coyote/1.1
>> Connection: close
>>
>> (lol)
>>
>> Maybe Jan's changes to charset handling caused that. If charset is not explicitly added, I think it should not be added to content-type either.
>
> What about adding the charset only when contentType.startsWith("text")? A better choice would be when usingWriter is true, but that's not available here. Since Writer output is sent out as iso-latin-1 if the Servlet doesn't set the charset, I think that it would be better to add the charset to the header, rather than trust that the browser's default encoding is compatible.

If the charset is iso-latin-1, then it will be displayed. The startsWith could possibly be going too far, unless this is written in black and white in the HTTP spec (I didn't check). I think my patch restored the previous behavior (and saved some object allocation in the general case). People seemed to be fine with it, unlike the new behavior (for which I've seen complaints on tomcat-user).

Remy
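The two behaviors Remy contrasts can be sketched as two hypothetical Java helpers, assuming the media type and charset are tracked separately along with a flag recording whether the application set the charset explicitly. This illustrates the policy only; it is not Tomcat's actual code.

```java
// Hypothetical helpers contrasting the header-building policies discussed in this thread.
final class ContentTypeHeader {

    private ContentTypeHeader() {
    }

    // 5.0.14-style behavior: always append the (possibly default) charset.
    static String alwaysAppend(String mediaType, String charset) {
        return mediaType + ";charset=" + charset;
    }

    // Restored behavior: append the charset only when the application set it explicitly.
    static String appendIfExplicit(String mediaType, String charset, boolean charsetSetExplicitly) {
        return charsetSetExplicitly ? mediaType + ";charset=" + charset : mediaType;
    }

    public static void main(String[] args) {
        System.out.println(alwaysAppend("image/gif", "ISO-8859-1"));            // prints image/gif;charset=ISO-8859-1
        System.out.println(appendIfExplicit("image/gif", "ISO-8859-1", false)); // prints image/gif
    }
}
```

The second policy also avoids a String concatenation in the common case where no charset is set, which matches the remark about saved object allocation.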
Re: [5.0] content-type and charset issues
Remy,

From: Remy Maucherat [EMAIL PROTECTED]
Subject: [5.0] content-type and charset issues
Date: Tue, 11 Nov 2003 16:10:24 +0100
Message-ID: [EMAIL PROTECTED]

> Tomcat 5.0 always adds a charset=ISO-8859-1 to the content type. While this is I think relatively legal, it is rather risky (it causes problems with some clients, as I've read on tomcat-user), and very dubious when dealing with non-text data.

I received a report from Japanese developers that the same(?) charset problem exists in Tomcat 4.1.29. They said that Tomcat 4.1.27 is OK. Could you check whether the same problem exists or not? If not, I will analyze Tomcat 4.1.29 and send the patch.

Kazuhiro Kazama ([EMAIL PROTECTED])
NTT Network Innovation Laboratories
Re: [5.0] content-type and charset issues
Remy Maucherat wrote:
> Hi,
>
> Tomcat 5.0 always adds a charset=ISO-8859-1 to the content type. While this is I think relatively legal, it is rather risky (it causes problems with some clients, as I've read on tomcat-user), and very dubious when dealing with non-text data.
>
> Example:
>
> GET /tomcat.gif HTTP/1.0
> User-Agent: ApacheBench/1.3d
> Host: 127.0.0.1
> Accept: */*
>
> HTTP/1.1 200 OK
> ETag: W/"1934-1068549702000"
> Last-Modified: Tue, 11 Nov 2003 11:21:42 GMT
> Content-Type: image/gif;charset=ISO-8859-1
> Content-Length: 1934
> Date: Tue, 11 Nov 2003 14:59:56 GMT
> Server: Apache-Coyote/1.1
> Connection: close
>
> (lol)
>
> Maybe Jan's changes to charset handling caused that. If charset is not explicitly added, I think it should not be added to content-type either.
>
> Other than that, 5.0.14 looks quite good :)

I have fixed it, but the patch which caused it is not optimal, as it uses String concatenation. IMO, if the user sets the full String (including the charset) using setContentType, then we should probably use it as-is rather than reconstruct it. I'll do a performance analysis to make sure there are no regressions (after all, similar problems could have been introduced).

Rémy
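Remy's suggestion of reusing the String passed to setContentType can be sketched as a small cache. This is a minimal sketch under assumptions: the class `CachedContentType` is hypothetical, and although its method names echo `setContentType` and `setCharacterEncoding` from the servlet API, it neither implements that API nor reflects Tomcat's code. The header is rebuilt (with one concatenation) only after the charset is changed separately.

```java
import java.util.Locale;

// Hypothetical sketch; method names echo the servlet API, but this is not Tomcat's code.
final class CachedContentType {

    private String mediaType;    // e.g. "text/html"; null until setContentType is called
    private String charset;      // null if no charset has been set
    private String cachedHeader; // ready-to-send header value, or null if it must be rebuilt

    void setContentType(String type) {
        cachedHeader = type; // keep the caller's String as-is: no concatenation needed
        int semi = type.indexOf(';');
        mediaType = (semi >= 0 ? type.substring(0, semi) : type).trim();
        charset = extractCharset(type);
    }

    void setCharacterEncoding(String encoding) {
        charset = encoding;
        cachedHeader = null; // the charset changed, so the header must be rebuilt once
    }

    String headerValue() {
        if (cachedHeader == null && mediaType != null) {
            cachedHeader = (charset == null) ? mediaType : mediaType + ";charset=" + charset;
        }
        return cachedHeader;
    }

    // Simplified parsing: returns the value after "charset=" up to the next ';', or null.
    private static String extractCharset(String type) {
        int idx = type.toLowerCase(Locale.ROOT).indexOf("charset=");
        if (idx < 0) {
            return null;
        }
        String value = type.substring(idx + "charset=".length());
        int semi = value.indexOf(';');
        return (semi >= 0 ? value.substring(0, semi) : value).trim();
    }
}
```

In the common case (setContentType called once with the full value), headerValue() returns the same String object the application supplied, so no new String is allocated per response.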
Re: [5.0] content-type and charset issues
----- Original Message -----
From: Remy Maucherat [EMAIL PROTECTED]
To: Tomcat Developers List [EMAIL PROTECTED]
Sent: Tuesday, November 11, 2003 7:10 AM
Subject: [5.0] content-type and charset issues

> Hi,
>
> Tomcat 5.0 always adds a charset=ISO-8859-1 to the content type. While this is I think relatively legal, it is rather risky (it causes problems with some clients, as I've read on tomcat-user), and very dubious when dealing with non-text data.
>
> Example:
>
> GET /tomcat.gif HTTP/1.0
> User-Agent: ApacheBench/1.3d
> Host: 127.0.0.1
> Accept: */*
>
> HTTP/1.1 200 OK
> ETag: W/"1934-1068549702000"
> Last-Modified: Tue, 11 Nov 2003 11:21:42 GMT
> Content-Type: image/gif;charset=ISO-8859-1
> Content-Length: 1934
> Date: Tue, 11 Nov 2003 14:59:56 GMT
> Server: Apache-Coyote/1.1
> Connection: close
>
> (lol)
>
> Maybe Jan's changes to charset handling caused that. If charset is not explicitly added, I think it should not be added to content-type either.

What about adding the charset only when contentType.startsWith("text")? A better choice would be when usingWriter is true, but that's not available here. Since Writer output is sent out as iso-latin-1 if the Servlet doesn't set the charset, I think that it would be better to add the charset to the header, rather than trust that the browser's default encoding is compatible.

> Other than that, 5.0.14 looks quite good :)
>
> Rémy
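Bill's two alternatives can be compared side by side in a hedged sketch. The class `CharsetPolicy` is hypothetical; it assumes that an explicitly set charset is always sent, and that ISO-8859-1 is the encoding used for Writer output when the servlet sets none, as stated in the message above. The `startsWith("text")` test is kept exactly as Bill proposed it.

```java
// Hypothetical comparison of the two policies Bill suggests; not Tomcat's actual code.
final class CharsetPolicy {

    // Assumed default: Writer output is encoded as ISO-8859-1 when no charset is set.
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    private CharsetPolicy() {
    }

    // Option 1: add the default charset only to media types starting with "text".
    static String byMediaType(String mediaType, String explicitCharset) {
        if (explicitCharset != null) {
            return mediaType + ";charset=" + explicitCharset;
        }
        return mediaType.startsWith("text") ? mediaType + ";charset=" + DEFAULT_CHARSET : mediaType;
    }

    // Option 2: add the default charset whenever the body was produced through a Writer.
    static String byWriterUse(String mediaType, String explicitCharset, boolean usingWriter) {
        if (explicitCharset != null) {
            return mediaType + ";charset=" + explicitCharset;
        }
        return usingWriter ? mediaType + ";charset=" + DEFAULT_CHARSET : mediaType;
    }
}
```

Option 2 ties the header to the bytes actually produced: a GIF streamed through an OutputStream gets no charset, while an XML document written through a Writer gets the encoding the Writer really used, even though its media type does not start with "text".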