Re: [5.0] content-type and charset issues

2003-11-15 Thread Stefanos Karasavvidis


Remy Maucherat wrote:
Kazuhiro Kazama wrote:

Remy,

Tomcat 5.0 always adds a charset=ISO-8859-1 to the content type. 
While this is I think relatively legal, it is rather risky (it causes 
problems with some clients, as I've read on tomcat-user), and very 
dubious when dealing with non text data.


I received a report from Japanese developers that the same(?) charset
problem exists in Tomcat 4.1.29. They said that Tomcat 4.1.27
is OK.
Could you check whether the same problem exists or not?

If not, I will analyze Tomcat 4.1.29 and send the patch.


This has been fixed already. However, I have to point out that the 
client is not compliant (not specifying a charset is equivalent to 
specifying charset=ISO-8859-1).

Remy
This might be true, but the meta http-equiv tag in HTML does not work as
expected when the encoding is also set in the HTTP response. If both the
HTTP encoding and the HTML meta encoding are specified, the HTTP encoding
takes precedence (according to section 5.2.2 of the HTML 4.01 spec), and
users still have to set the encoding manually through the user agent's
interface.
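The precedence rule described here can be sketched in Java as follows. This is a hypothetical helper (class and method names are illustrative, not taken from any browser or from Tomcat) that covers only the first two levels of the HTML 4.01 section 5.2.2 order: the HTTP charset parameter first, then the meta http-equiv declaration. The third level (a charset attribute on a linking element) is left out. Returning an empty result when nothing is declared reflects the same section's rule that user agents must not assume a default.

```java
import java.util.Optional;

// Hypothetical sketch of the user-agent side of HTML 4.01 section 5.2.2:
// the charset from the HTTP Content-Type header wins over the one in
// <meta http-equiv="Content-Type">. With neither present, no default is
// assumed, so the caller (user or heuristic) has to decide.
final class Html401CharsetPrecedence {

    private Html401CharsetPrecedence() {}

    static Optional<String> documentCharset(String httpCharset, String metaCharset) {
        if (httpCharset != null && !httpCharset.isEmpty()) {
            return Optional.of(httpCharset);   // 1. HTTP charset parameter
        }
        if (metaCharset != null && !metaCharset.isEmpty()) {
            return Optional.of(metaCharset);   // 2. meta http-equiv declaration
        }
        return Optional.empty();               // no default assumed
    }
}
```

With a server adding charset=ISO-8859-1 to a page whose meta tag declares windows-1253, `documentCharset("ISO-8859-1", "windows-1253")` yields ISO-8859-1, so the correct meta declaration is never consulted.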

For example, imagine a set of HTML files served by Tomcat, all encoded in
something other than ISO-8859-1. Even if every one of them has a meta tag
declaring the correct encoding, Tomcat adds charset=ISO-8859-1 to the HTTP
response by default, and the user sees garbage.
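A small Java sketch of why the page turns into garbage: the bytes on disk are in one encoding, but the browser decodes them with the charset from the header. As an assumption, UTF-8 stands in here for the non-ISO-8859-1 encoding, because windows-1253 is not among the charsets every Java runtime is required to support; the effect is the same.

```java
import java.nio.charset.StandardCharsets;

// Illustrates the mismatch: text stored as UTF-8 bytes is decoded as
// ISO-8859-1, as a browser would do when the header says so.
final class MojibakeDemo {

    private MojibakeDemo() {}

    static String decodeAsLatin1(String text) {
        byte[] stored = text.getBytes(StandardCharsets.UTF_8);   // how the file is written
        return new String(stored, StandardCharsets.ISO_8859_1);  // how the header says to read it
    }

    public static void main(String[] args) {
        String greek = "\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1"; // Greek word "Kalimera"
        System.out.println(decodeAsLatin1(greek));
    }
}
```

Each Greek letter takes two bytes in UTF-8, so the eight-letter word comes out as sixteen Latin-1 characters. The bytes themselves are unchanged, which is why manually re-selecting the encoding in the browser restores the text.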

Please note that the Apache server does the same in a default installation,
but this is simply unacceptable in a multi-encoding installation. I have to
remove the relevant directive from httpd.conf in every Apache installation,
because otherwise my users cannot publish HTML files in encodings other
than ISO-8859-1.

Please also note the following from section 5.2.2 of HTML 4.01:
The HTTP protocol ([RFC2616], section 3.7.1) mentions ISO-8859-1 as a 
default character encoding when the charset parameter is absent from 
the Content-Type header field. In practice, this recommendation has 
proved useless because some servers don't allow a charset parameter to 
be sent, and others may not be configured to send the parameter. 
Therefore, user agents must not assume any default value for the 
charset parameter.

I would add that a server that sets a default character encoding on ALL
responses is at least as bad as one that sets none at all.

Stefanos Karasavvidis



--
==
Stefanos Karasavvidis
Electronic  Computer Engineer
e-mail : [EMAIL PROTECTED]
Multimedia Systems Center S.A.
Kissamou 178
73100 Chania - Crete - Hellas
http://www.msc.gr
Tel : +30 2821 0 88447
Fax : +30 2821 0 88427
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: [5.0] content-type and charset issues

2003-11-13 Thread Remy Maucherat
Kazuhiro Kazama wrote:
Remy,

Tomcat 5.0 always adds a charset=ISO-8859-1 to the content type. While 
this is I think relatively legal, it is rather risky (it causes problems 
with some clients, as I've read on tomcat-user), and very dubious when 
dealing with non text data.
I received a report from Japanese developers that the same(?) charset
problem exists in Tomcat 4.1.29. They said that Tomcat 4.1.27
is OK.
Could you check whether the same problem exists or not?

If not, I will analyze Tomcat 4.1.29 and send the patch.
This has been fixed already. However, I have to point out that the 
client is not compliant (not specifying a charset is equivalent to 
specifying charset=ISO-8859-1).
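A minimal Java sketch of the rule cited here, assuming RFC 2616 section 3.7.1, under which media subtypes of "text" default to ISO-8859-1 when the sender provides no charset parameter. The class and method names are hypothetical, not Tomcat code, and the parameter parsing is simplified.

```java
import java.util.Locale;

// Hypothetical helper: the charset a client should assume for a response
// under RFC 2616 section 3.7.1. An explicit charset parameter wins;
// otherwise only text/* gets the ISO-8859-1 default.
final class Rfc2616Charset {

    private Rfc2616Charset() {}

    /** Returns the effective charset, or null when none applies (e.g. image/gif). */
    static String effectiveCharset(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        String mediaType = parts[0].trim().toLowerCase(Locale.ROOT);
        // Look for an explicit charset parameter first.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional quotes around the value.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        // No explicit charset: only text/* gets the HTTP/1.1 default.
        return mediaType.startsWith("text/") ? "ISO-8859-1" : null;
    }
}
```

For `image/gif` the helper returns null: the ISO-8859-1 default only covers text/*, so a charset parameter on a GIF carries no meaning.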

Remy



Re: [5.0] content-type and charset issues

2003-11-12 Thread Remy Maucherat
Bill Barker wrote:
Hi,

Tomcat 5.0 always adds a charset=ISO-8859-1 to the content type. While
this is I think relatively legal, it is rather risky (it causes problems
with some clients, as I've read on tomcat-user), and very dubious when
dealing with non text data.
Example:
GET /tomcat.gif HTTP/1.0
User-Agent: ApacheBench/1.3d
Host: 127.0.0.1
Accept: */*
HTTP/1.1 200 OK
ETag: W/"1934-1068549702000"
Last-Modified: Tue, 11 Nov 2003 11:21:42 GMT
Content-Type: image/gif;charset=ISO-8859-1
Content-Length: 1934
Date: Tue, 11 Nov 2003 14:59:56 GMT
Server: Apache-Coyote/1.1
Connection: close
(lol)

Maybe Jan's changes to charset handling caused that.
If charset is not explicitly added, I think it should not be added to
content-type either.
What about adding the charset only when contentType.startsWith("text")? A
better choice would be when usingWriter is true, but that's not available
here.
Since Writer output is sent out as iso-latin-1 if the Servlet doesn't set the
charset, I think that it would be better to add the charset to the header,
rather than trust that the browser's default encoding is compatible.
If the charset is iso-latin-1, then it will be displayed.

The startsWith check could be going too far, unless this is written
in black and white in the HTTP spec (I didn't check).
I think my patch restored the previous behavior (and saved some object
allocations in the general case). People seemed to be fine with it,
unlike the new behavior (for which I've seen complaints on tomcat-user).

Remy





Re: [5.0] content-type and charset issues

2003-11-12 Thread Kazuhiro Kazama
Remy,

From: Remy Maucherat [EMAIL PROTECTED]
Subject: [5.0] content-type and charset issues
Date: Tue, 11 Nov 2003 16:10:24 +0100
Message-ID: [EMAIL PROTECTED]
 Tomcat 5.0 always adds a charset=ISO-8859-1 to the content type. While 
 this is I think relatively legal, it is rather risky (it causes problems 
 with some clients, as I've read on tomcat-user), and very dubious when 
 dealing with non text data.

I received a report from Japanese developers that the same(?) charset
problem exists in Tomcat 4.1.29. They said that Tomcat 4.1.27
is OK.

Could you check whether the same problem exists or not?

If not, I will analyze Tomcat 4.1.29 and send the patch.

Kazuhiro Kazama ([EMAIL PROTECTED]) NTT Network Innovation Laboratories





Re: [5.0] content-type and charset issues

2003-11-11 Thread Remy Maucherat
Remy Maucherat wrote:

Hi,

Tomcat 5.0 always adds a charset=ISO-8859-1 to the content type. While 
this is I think relatively legal, it is rather risky (it causes problems 
with some clients, as I've read on tomcat-user), and very dubious when 
dealing with non text data.

Example:
GET /tomcat.gif HTTP/1.0
User-Agent: ApacheBench/1.3d
Host: 127.0.0.1
Accept: */*
HTTP/1.1 200 OK
ETag: W/"1934-1068549702000"
Last-Modified: Tue, 11 Nov 2003 11:21:42 GMT
Content-Type: image/gif;charset=ISO-8859-1
Content-Length: 1934
Date: Tue, 11 Nov 2003 14:59:56 GMT
Server: Apache-Coyote/1.1
Connection: close
(lol)

Maybe Jan's changes to charset handling caused that.
If charset is not explicitly added, I think it should not be added to 
content-type either.
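The rule proposed here (only emit a charset that was explicitly set) might look like this as a standalone Java helper. It is a hypothetical sketch, not the actual Tomcat 5.0 fix: the `charsetSet` flag stands in for however the response records that the application chose a charset, and `contentType` is assumed to be the bare media type without parameters.

```java
// Hypothetical helper: builds the Content-Type header value, appending a
// charset only when the application explicitly set one. A static GIF
// served without any charset call is therefore sent as plain "image/gif".
final class ContentTypeHeader {

    private ContentTypeHeader() {}

    static String headerValue(String contentType, String charset, boolean charsetSet) {
        if (contentType == null) {
            return null;              // no Content-Type header at all
        }
        if (!charsetSet || charset == null) {
            return contentType;       // leave the media type untouched
        }
        return contentType + ";charset=" + charset;
    }
}
```

Under this rule the response in the example above would carry `Content-Type: image/gif`, while a servlet that explicitly chose UTF-8 for `text/html` would still get `text/html;charset=UTF-8`.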

Other than that, 5.0.14 looks quite good :)
I have fixed it, but the patch that caused it is not optimal, as it
uses String concatenation. IMO, if the user sets the full String
(including the charset) using setContentType, then we should probably
use it as is rather than reconstruct it.
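One way to follow this suggestion is to keep the string passed to `setContentType` verbatim and only parse the charset out of it, instead of rebuilding the header value by concatenation for every response. The class below is a hypothetical sketch with illustrative names, not Tomcat's `Response`; its parsing is deliberately simple (it does not handle quoted charset values) and assumes an ASCII content type.

```java
import java.util.Locale;

// Hypothetical response holder: setContentType stores the caller's full
// string as is and extracts the charset only so a Writer could use it.
// getContentType returns the stored string without reconstructing it.
final class ResponseContentType {

    private static final String CHARSET_KEY = "charset=";

    private String contentType;        // exactly what the caller passed
    private String characterEncoding;  // null unless the caller supplied one

    void setContentType(String type) {
        contentType = type;
        characterEncoding = null;
        if (type == null) {
            return;
        }
        int idx = type.toLowerCase(Locale.ROOT).indexOf(CHARSET_KEY);
        if (idx >= 0) {
            String value = type.substring(idx + CHARSET_KEY.length());
            int end = value.indexOf(';');
            characterEncoding = (end >= 0 ? value.substring(0, end) : value).trim();
        }
    }

    String getContentType() {
        return contentType;            // no concatenation on the way out
    }

    String getCharacterEncoding() {
        return characterEncoding;
    }
}
```

The design choice is to pay the parsing cost once in the setter, so the common case of a type without a charset returns the caller's own String object.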

I'll do a performance analysis to make sure there are no regressions 
(after all, similar problems could have been introduced).

Rémy





Re: [5.0] content-type and charset issues

2003-11-11 Thread Bill Barker

- Original Message -
From: Remy Maucherat [EMAIL PROTECTED]
To: Tomcat Developers List [EMAIL PROTECTED]
Sent: Tuesday, November 11, 2003 7:10 AM
Subject: [5.0] content-type and charset issues


 Hi,

 Tomcat 5.0 always adds a charset=ISO-8859-1 to the content type. While
 this is I think relatively legal, it is rather risky (it causes problems
 with some clients, as I've read on tomcat-user), and very dubious when
 dealing with non text data.

 Example:
 GET /tomcat.gif HTTP/1.0
 User-Agent: ApacheBench/1.3d
 Host: 127.0.0.1
 Accept: */*

 HTTP/1.1 200 OK
 ETag: W/"1934-1068549702000"
 Last-Modified: Tue, 11 Nov 2003 11:21:42 GMT
 Content-Type: image/gif;charset=ISO-8859-1
 Content-Length: 1934
 Date: Tue, 11 Nov 2003 14:59:56 GMT
 Server: Apache-Coyote/1.1
 Connection: close

 (lol)

 Maybe Jan's changes to charset handling caused that.
 If charset is not explicitly added, I think it should not be added to
 content-type either.


What about adding the charset only when contentType.startsWith("text")? A
better choice would be when usingWriter is true, but that's not available
here.

Since Writer output is sent out as iso-latin-1 if the Servlet doesn't set the
charset, I think that it would be better to add the charset to the header,
rather than trust that the browser's default encoding is compatible.
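The suggestion above can be sketched as a small Java policy: append the default charset when the body went through a Writer (the case where the container actually encodes characters), or, lacking that flag, when the media type starts with "text". The class, method, and parameter names are hypothetical; `usingWriter` only mirrors the flag mentioned in the message.

```java
import java.util.Locale;

// Hypothetical policy: add the ISO-8859-1 default only where characters
// were encoded by the container (Writer output) or the type is textual.
// A content type that already names a charset is returned unchanged.
final class DefaultCharsetPolicy {

    static final String DEFAULT_CHARSET = "ISO-8859-1";

    private DefaultCharsetPolicy() {}

    static String headerValue(String contentType, boolean usingWriter) {
        if (contentType == null
                || contentType.toLowerCase(Locale.ROOT).contains("charset=")) {
            return contentType;        // nothing to add, or already explicit
        }
        boolean isText = contentType.startsWith("text");
        if (usingWriter || isText) {
            return contentType + ";charset=" + DEFAULT_CHARSET;
        }
        return contentType;            // binary data written via a stream
    }
}
```

`headerValue("image/gif", false)` leaves the GIF from the original report untouched, while `headerValue("text/html", false)` gains `;charset=ISO-8859-1`, so the browser no longer has to guess for Writer or text output.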

 Other than that, 5.0.14 looks quite good :)

 Rémy





