RE: RFC-2047 Header Character Set Encoding JK + Tomcat 5
For others who might be interested (and the Tomcat developers should correct me if I'm wrong, since this goes into the archive): Tomcat 5.5.9 and earlier do not appear to support RFC-2047 for processing MIME headers that use character encodings other than ISO-8859-1.

Searching through thousands of lines of Tomcat code, as best I could tell, the code always assumes headers are ISO-8859-1, from the MimeHeaders class down to the ByteChunk class. While both appear to allow an encoding to be specified, they correctly assume the default to be ISO-8859-1, and from what I could tell, the code that parses headers from the Request does nothing to change this. I could find no provisions for processing RFC-2047 compliant headers in any of the connectors.

RFC-2047 is listed here: http://www.faqs.org/rfcs/rfc2047.html and is referenced from the HTTP 1.1 RFC listed here: http://www.faqs.org/rfcs/rfc2616.html (see section 2.2 on basic rules for TEXT, and the definition of headers in section 4.2), along with references in the JSR-154 Servlet 2.4 spec. Is Tomcat still considered a reference implementation?

I hope this helps all who run into similar issues and can find no information on them. Now on to the Apache 2 source code to see if it specifies the format required in the Header module API.

Byron

Keywords: International Headers UTF-8 ISO-8859-1 RFC-2047

-----Original Message-----
From: Guernsey, Byron (GE Consumer & Industrial)
Sent: Tuesday, July 12, 2005 4:16 PM
To: Tomcat Users List
Subject: RFC-2047 Header Character Set Encoding JK + Tomcat 5

Is there a FAQ on how Tomcat 5 and JK1 implement HTTP header character sets? (i.e., do they support RFC-2047?)

We use some single sign-on plugins at the web server (Apache 2) that set specific headers which may contain international characters. The headers are being returned by Tomcat to JSPs/servlets in such a way that the strings decode properly only if the browser is forced to view them as UTF-8.

This implies that the values are actually UTF-8 encoded, but improperly assumed to be ISO-8859-1 at some point.

I have not yet tracked down which component in the chain is at fault. It may very well be that the SSO plugin is calling the Apache API to set headers with UTF-8 values when the API accepts only ISO-8859-1 values, or values encoded per RFC-2047.

I'd like to find out what mod_jk expects the header values to be when it retrieves them from Apache, and whether Tomcat supports RFC-2047 decoding of header values.

If anyone has any experience with this, or can refer me to a discussion or thread about this very item, I'd greatly appreciate the tip. I'm not looking forward to the amount of inspection I'm going to have to do to find the culprit.

thanks,
Byron

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
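Given the finding above that the connectors pass header values through as ISO-8859-1 with no RFC-2047 step, one option is for the application to decode encoded-words (=?charset?B|Q?text?=) itself after calling getHeader(). The following is only a sketch under that assumption. The class Rfc2047Sketch and its decode method are hypothetical names, not Tomcat or Servlet API; it relies on java.util.Base64 (Java 8 or later); and it does not try to cover every corner of the RFC (for example, RFC 2231 language tags in the charset are simply left undecoded).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Tomcat or the Servlet API.
final class Rfc2047Sketch {
    // Matches one encoded-word: =?charset?B|Q?encoded-text?=
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?\\s]+)\\?([bBqQ])\\?([^?\\s]*)\\?=");

    static String decode(String headerValue) {
        Matcher m = ENCODED_WORD.matcher(headerValue);
        StringBuilder out = new StringBuilder();
        int last = 0;
        boolean previousWasEncoded = false;
        while (m.find()) {
            String gap = headerValue.substring(last, m.start());
            // Whitespace that only separates two encoded-words is dropped.
            if (!(previousWasEncoded && gap.trim().isEmpty())) {
                out.append(gap);
            }
            String decoded = decodeWord(m.group(1), m.group(2), m.group(3));
            if (decoded == null) {
                out.append(m.group()); // leave undecodable words untouched
                previousWasEncoded = false;
            } else {
                out.append(decoded);
                previousWasEncoded = true;
            }
            last = m.end();
        }
        return out.append(headerValue.substring(last)).toString();
    }

    // Returns null when the charset is unknown or the payload is malformed.
    private static String decodeWord(String charset, String encoding, String text) {
        try {
            byte[] raw = encoding.equalsIgnoreCase("B")
                    ? Base64.getDecoder().decode(text)
                    : decodeQ(text);
            return new String(raw, Charset.forName(charset));
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    // "Q" encoding: '_' stands for a space, "=XX" for the byte with hex value XX.
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                buf.write(' ');
            } else if (c == '=') {
                if (i + 2 >= text.length()) {
                    throw new IllegalArgumentException("truncated escape");
                }
                int hi = Character.digit(text.charAt(i + 1), 16);
                int lo = Character.digit(text.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad escape");
                }
                buf.write((hi << 4) | lo);
                i += 2;
            } else {
                buf.write(c);
            }
        }
        return buf.toByteArray();
    }
}
```

A value with no encoded-words comes back unchanged, so the helper could be applied to every header the SSO plugin sets without first checking whether the plugin actually encoded it.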
RE: RFC-2047 Header Character Set Encoding JK + Tomcat 5
Does URIEncoding affect all HTTP headers or only the URIs?

Thanks,
Byron

-----Original Message-----
From: Tim Funk [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 13, 2005 6:31 AM
To: Tomcat Users List
Subject: Re: RFC-2047 Header Character Set Encoding JK + Tomcat 5

You may need to add this to your Connector declaration:

URIEncoding="UTF-8"

-Tim
Re: RFC-2047 Header Character Set Encoding JK + Tomcat 5
You may need to add this to your Connector declaration:

URIEncoding="UTF-8"

-Tim
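For context on what this setting controls: the Tomcat Connector documentation describes URIEncoding as the character encoding used to decode the URI bytes after %xx decoding, so it would not be expected to change how header values are decoded. The sketch below only illustrates that idea with the JDK's java.net.URLDecoder; it is not Tomcat's own code, and UriDecodingSketch and decodeParam are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical illustration of charset-dependent percent-decoding.
final class UriDecodingSketch {
    // Percent-decodes raw and interprets the resulting bytes in the given charset.
    static String decodeParam(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String raw = "M%C3%BCller"; // the UTF-8 bytes of "Müller", percent-encoded
        // Same bytes, two interpretations: 6 characters vs. 7 characters.
        System.out.println(decodeParam(raw, "UTF-8").length());
        System.out.println(decodeParam(raw, "ISO-8859-1").length());
    }
}
```

The same percent-encoded bytes yield the intended name under UTF-8 and a two-character garble in place of the "ü" under ISO-8859-1, which is the kind of difference the connector setting is meant to resolve for URIs.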
RFC-2047 Header Character Set Encoding JK + Tomcat 5
Is there a FAQ on how Tomcat 5 and JK1 implement HTTP header character sets? (i.e., do they support RFC-2047?)

We use some single sign-on plugins at the web server (Apache 2) that set specific headers which may contain international characters. The headers are being returned by Tomcat to JSPs/servlets in such a way that the strings decode properly only if the browser is forced to view them as UTF-8.

This implies that the values are actually UTF-8 encoded, but improperly assumed to be ISO-8859-1 at some point.

I have not yet tracked down which component in the chain is at fault. It may very well be that the SSO plugin is calling the Apache API to set headers with UTF-8 values when the API accepts only ISO-8859-1 values, or values encoded per RFC-2047.

I'd like to find out what mod_jk expects the header values to be when it retrieves them from Apache, and whether Tomcat supports RFC-2047 decoding of header values.

If anyone has any experience with this, or can refer me to a discussion or thread about this very item, I'd greatly appreciate the tip. I'm not looking forward to the amount of inspection I'm going to have to do to find the culprit.

thanks,
Byron
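The symptom described above (text that only looks right when the browser is forced to UTF-8) is consistent with UTF-8 bytes being decoded one byte per character as ISO-8859-1. Because that decoding loses no information, a commonly used workaround is to turn the string back into ISO-8859-1 bytes and decode those as UTF-8. A minimal sketch, assuming the header really does carry raw UTF-8 bytes; HeaderRecode and latin1ToUtf8 are hypothetical names, and StandardCharsets needs Java 7 or later (older JVMs can pass the charset names as strings instead):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Tomcat or the Servlet API.
final class HeaderRecode {
    // Undoes an ISO-8859-1 decoding of bytes that were really UTF-8.
    static String latin1ToUtf8(String misdecoded) {
        if (misdecoded == null) {
            return null;
        }
        // ISO-8859-1 maps each char U+0000..U+00FF back to the same byte value.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "M\u00FCller";
        // Simulate the container: UTF-8 bytes read as ISO-8859-1.
        String seen = new String(original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(seen.equals(original));               // garbled copy differs
        System.out.println(latin1ToUtf8(seen).equals(original)); // recovered copy matches
    }
}
```

In a servlet this would be applied to the result of request.getHeader(...) for the headers the SSO plugin sets. It should only be used where the value is known to be UTF-8: a header that genuinely contains ISO-8859-1 characters above U+007F would be damaged by the conversion.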