RE: charset stuff
-----Original Message-----
From: Paul P Komkoff Jr [mailto:[EMAIL PROTECTED]
Sent: Sunday, February 22, 2004 11:28 PM
To: [EMAIL PROTECTED]
Subject: charset stuff

> I was jumping around like mad because some sites aren't working. Some
> digging showed that this is because we get a body in the wrong charset
> without a correct charset header. Deeper digging led to the Russian Apache
> default config (which says that mobile User-Agents are braindead and that
> any ;charset= header portion should be stripped regardless of the
> resulting body encoding).

I had the same trouble when I tried to update to the newest version of Kannel. Previously I used kannel-1.2.1 with the SAR patch.

In my case many content providers use any of the UTF-8, ISO-8859-5, WINDOWS-1251 and KOI8-R Cyrillic charsets, and indicate the charset either in the Content-Type HTTP reply header or in the XML preamble. All of this works with version 1.2.1, because 1.2.1 has no code in wap-appl.c and wml_compiler.c that adapts the content body's charset to the device; it always returns binary WMLC data in UTF-8. Given the content adaptation feature in the newest version, content in these charsets should work, but it doesn't.

Allow me to share my thoughts on why this may happen:

- First of all, device_headers == NULL for S_MethodInvoke_Ind in return_reply. For S_MethodInvoke_Ind, device_headers must be set to orig_event->u.S_MethodInvoke_Ind.session_headers, not to .request_headers as for S_Unit_MethodInvoke_Ind.

- There is an error in the comparison (octstr_case_compare(charset, octstr_imm("UTF-8")) < 0) in return_reply and wml_compile. It must be != 0, because for any of the WINDOWS-125X charsets octstr_case_compare returns 1.

- In return_reply, content charset adaptation must be based on both charset indications (HTTP header and XML preamble), and in the second case the indication must then be deleted.

I propose that the attached patch be examined; IMHO it corrects the above.

PS: Don't forget to update libxml for security reasons, according to http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2004-0110

charset.patch
Description: Binary data
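The comparison problem described in this message can be reproduced in isolation. The sketch below is not Kannel code: case_compare is a hypothetical stand-in for octstr_case_compare (assumed here to return -1, 0 or 1 on a case-insensitive comparison), and the two needs_conversion_* helpers are names invented for illustration. It shows that a "< 0" test only catches charset names that sort before "UTF-8", so WINDOWS-125X is never converted.

```c
#include <ctype.h>

/* Hypothetical stand-in for Kannel's octstr_case_compare(): compares two
 * C strings ignoring ASCII case and returns -1, 0 or 1. */
static int case_compare(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);
        if (ca != cb)
            return ca < cb ? -1 : 1;
        a++;
        b++;
    }
    if (*a == *b)               /* both strings ended together */
        return 0;
    return *a == '\0' ? -1 : 1; /* the shorter string sorts first */
}

/* The test as quoted in the message: true only when the charset name
 * sorts before "UTF-8". */
static int needs_conversion_buggy(const char *charset)
{
    return case_compare(charset, "UTF-8") < 0;
}

/* The proposed test: true for every charset that is not UTF-8. */
static int needs_conversion_fixed(const char *charset)
{
    return case_compare(charset, "UTF-8") != 0;
}
```

With these helpers, "KOI8-R" and "ISO-8859-5" are flagged by both variants because they sort before "UTF-8", while "WINDOWS-1251" sorts after it and is flagged only by the != 0 variant.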
Re: charset stuff
Hi Paul,

Paul P Komkoff Jr wrote:

> I was jumping around like mad because some sites aren't working. Some
> digging showed that this is because we get a body in the wrong charset
> without a correct charset header. Deeper digging led to the Russian Apache
> default config (which says that mobile User-Agents are braindead and that
> any ;charset= header portion should be stripped regardless of the
> resulting body encoding).
>
> And finally I found add_charset_headers in wap-appl.c and the
> corresponding part in return_reply. Who coded it _that_ way, and for what
> purpose? At first sight it produces a complete mess, which cannot be
> correctly XML-compiled.
>
> For now I did the attached patch and am happily watching the wapbox logs,
> with the count of failed XML compile messages drastically reduced. You can
> try this at home too.

Can you point out what the patch actually fixes, solves, or does better, please?

Of course we will be able to work through this ourselves, but it makes things easier if you shine the light directly into our eyes ;)

Stipe

mailto:[EMAIL PROTECTED]
---
Wapme Systems AG
Münsterstr. 248
40470 Düsseldorf, NRW, Germany
phone: +49.211.74845.0
fax: +49.211.74845.299
mailto:[EMAIL PROTECTED]
http://www.wapme-systems.de/
---
charset stuff
I was jumping around like mad because some sites aren't working. Some digging showed that this is because we get a body in the wrong charset without a correct charset header. Deeper digging led to the Russian Apache default config (which says that mobile User-Agents are braindead and that any ;charset= header portion should be stripped regardless of the resulting body encoding).

And finally I found add_charset_headers in wap-appl.c and the corresponding part in return_reply. Who coded it _that_ way, and for what purpose? At first sight it produces a complete mess, which cannot be correctly XML-compiled.

For now I did the attached patch and am happily watching the wapbox logs, with the count of failed XML compile messages drastically reduced. You can try this at home too.

Enjoy.

--
Paul P 'Stingray' Komkoff Jr // http://stingr.net/key - my pgp key
This message represents the official view of the voices in my head

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/22 20:47:20+03:00 [EMAIL PROTECTED]
#   Rework charset stuff, charset recoding and accept-charset headers part 0.
#
# gw/wap-appl.c
#   2004/02/22 20:47:18+03:00 [EMAIL PROTECTED] +25 -0
#   replace crappy charset header and recoding with simplier
#
# BitKeeper/etc/ignore
#   2004/02/22 20:47:18+03:00 [EMAIL PROTECTED] +6 -0
#   add couple ignores
#
diff -Nru a/gw/wap-appl.c b/gw/wap-appl.c
--- a/gw/wap-appl.c Mon Feb 23 00:19:56 2004
+++ b/gw/wap-appl.c Mon Feb 23 00:19:56 2004
@@ -103,6 +103,7 @@
 #endif
 
 #define ENABLE_NOT_ACCEPTED
+#define NEW_CHARSETS
 
 /*
  * Give the status the module:
@@ -668,6 +669,10 @@
  * to handle those charsets for all content types, just WML/XHTML.
  */
 static void add_charset_headers(List *headers) {
+#ifdef NEW_CHARSETS
+    if (!http_charset_accepted(headers, "utf-8"))
+        http_header_add(headers, "Accept-Charset", "utf-8");
+#else
     long i, len;
 
     gw_assert(charsets != NULL);
@@ -677,6 +682,7 @@
         if (!http_charset_accepted(headers, charset))
             http_header_add(headers, "Accept-Charset", charset);
     }
+#endif
 }
 
 
@@ -1005,11 +1011,29 @@
 
             /* get charset used in content body, default to utf-8 if not present */
             if ((charset = find_charset_encoding(content.body)) == NULL)
+#ifdef NEW_CHARSETS
+                if (octstr_len(content.charset) > 0) {
+                    charset = octstr_duplicate(content.charset);
+                } else {
+                    charset = octstr_imm("UTF-8");
+                }
+#else
                 charset = octstr_imm("UTF-8");
+#endif
 
             /* convert to utf-8 if original charset is not utf-8
              * and device supports it */
 
+#ifdef NEW_CHARSETS
+            if (octstr_case_compare(charset, octstr_imm("UTF-8")) != 0) {
+                debug("wsp",0,"Converting wml/xhtml from charset %s to UTF-8",
+                      octstr_get_cstr(charset));
+                if (charset_convert(content.body, octstr_get_cstr(charset), "UTF-8") >= 0) {
+                    octstr_destroy(content.charset);
+                    content.charset = octstr_create("UTF-8");
+                }
+            }
+#else
             if (octstr_case_compare(charset, octstr_imm("UTF-8")) < 0 &&
                 !http_charset_accepted(device_headers, octstr_get_cstr(charset))) {
                 if (!http_charset_accepted(device_headers, "UTF-8")) {
@@ -1047,6 +1071,7 @@
                     }
                 }
             }
+#endif
             octstr_destroy(charset);
         }
 
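The charset fallback order that the NEW_CHARSETS branch of the patch appears to implement (XML declaration first, then the charset from the Content-Type header, then UTF-8) can be sketched in plain C. This is only an illustration and does not use the Kannel API: xml_declared_encoding and pick_charset are hypothetical names, and the parsing is a simplified assumption about what find_charset_encoding does (for instance, it does not accept whitespace around the "=" sign).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value of encoding="..." from the XML
 * declaration at the start of body into out. Returns 1 if an encoding
 * was found and fits into out, 0 otherwise. */
static int xml_declared_encoding(const char *body, char *out, size_t outlen)
{
    const char *decl_end, *p, *end;
    char quote;
    size_t n;

    if (outlen == 0 || strncmp(body, "<?xml", 5) != 0)
        return 0;
    decl_end = strstr(body, "?>");
    if (decl_end == NULL)
        return 0;
    p = strstr(body, "encoding=");
    if (p == NULL || p > decl_end)      /* must be inside the declaration */
        return 0;
    p += strlen("encoding=");
    quote = *p;
    if (quote != '"' && quote != '\'')
        return 0;
    p++;
    end = strchr(p, quote);
    if (end == NULL || end > decl_end)
        return 0;
    n = (size_t) (end - p);
    if (n == 0 || n >= outlen)
        return 0;
    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}

/* Fallback order: XML declaration, then the Content-Type charset
 * (header_charset may be NULL or empty), then UTF-8. */
static const char *pick_charset(const char *body, const char *header_charset,
                                char *buf, size_t buflen)
{
    if (xml_declared_encoding(body, buf, buflen))
        return buf;
    if (header_charset != NULL && header_charset[0] != '\0')
        return header_charset;
    return "UTF-8";
}
```

A caller would pass the response body and whatever charset the HTTP layer extracted from Content-Type. If the returned name is not UTF-8, the body is converted, which is the step the patch performs with charset_convert.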