RE: charset stuff

2004-03-04 Thread Pogrebnoy Alexander


> -----Original Message-----
> From: Paul P Komkoff Jr [mailto:[EMAIL PROTECTED]
> Sent: Sunday, February 22, 2004 11:28 PM
> To: [EMAIL PROTECTED]
> Subject: charset stuff
>
>
> I was jumping around like mad because some sites weren't working.
> Some digging showed that this was because we were getting a body in the
> wrong charset without a correct charset header.
> Deeper digging led to the Russian Apache default config (which says
> that mobile User-Agents are braindead and that any ;charset= portion of
> the header should be stripped regardless of the body's actual encoding).

I ran into the same trouble when I tried to update to the newest version
of Kannel. Previously I used kannel-1.2.1 with the SAR patch.
In my case, many content providers use UTF-8 or one of the Cyrillic
charsets ISO-8859-5, WINDOWS-1251 and KOI8-R, and indicate it either in
the Content-Type HTTP reply header or in the XML preamble.
All of this works with version 1.2.1, because 1.2.1 has no code in
wap-appl.c and wml_compiler.c that adapts the content body's charset to
the device; it always returns binary WMLC data in UTF-8.
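To make the two places concrete, here is a minimal C sketch that extracts a declared charset from a Content-Type value and from an XML declaration. The function names are hypothetical (they are not Kannel's API), and the parsing is deliberately naive: it assumes the declaration sits at the very start of the body and ignores whitespace around `=` and other cases a real parser would handle.

```c
#include <ctype.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper (not a Kannel API): copy the charset= parameter of a
 * Content-Type value such as "text/vnd.wap.wml; charset=windows-1251" into
 * out. Returns 1 if a non-empty value was found, 0 otherwise. */
int charset_from_content_type(const char *ctype, char *out, size_t outlen)
{
    const char *p = ctype;

    if (outlen == 0)
        return 0;
    while ((p = strchr(p, ';')) != NULL) {
        p++;                                    /* step past ';' */
        while (*p == ' ' || *p == '\t')
            p++;
        if (strncasecmp(p, "charset=", 8) == 0) {
            size_t n = 0;

            p += 8;
            if (*p == '"')                      /* allow charset="..." */
                p++;
            while (p[n] != '\0' && p[n] != ';' && p[n] != '"' &&
                   !isspace((unsigned char)p[n]) && n + 1 < outlen) {
                out[n] = p[n];
                n++;
            }
            out[n] = '\0';
            return n > 0;
        }
    }
    return 0;                                   /* e.g. ;charset= stripped */
}

/* Hypothetical helper: copy the encoding="..." value of an XML declaration
 * at the very start of body into out. Whitespace around '=' is not handled.
 * Returns 1 if found, 0 otherwise. */
int charset_from_xml_preamble(const char *body, char *out, size_t outlen)
{
    const char *end, *enc;
    char quote;
    size_t n = 0;

    if (outlen == 0 || strncmp(body, "<?xml", 5) != 0)
        return 0;
    end = strstr(body, "?>");
    enc = strstr(body, "encoding=");
    if (end == NULL || enc == NULL || enc > end)
        return 0;                               /* no encoding in declaration */
    enc += 9;                                   /* skip "encoding=" */
    quote = *enc;
    if (quote != '"' && quote != '\'')
        return 0;
    enc++;
    while (enc[n] != '\0' && enc[n] != quote && n + 1 < outlen) {
        out[n] = enc[n];
        n++;
    }
    out[n] = '\0';
    return n > 0;
}
```

With these helpers, `charset_from_content_type("text/vnd.wap.wml; charset=windows-1251", buf, sizeof buf)` yields `windows-1251`, while a bare `text/vnd.wap.wml` (what remains once `;charset=` has been stripped, as in the Apache setup described above) yields nothing, leaving the XML preamble as the only hint.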

Given the content adaptation feature in the newest version, content in
these charsets should work fine, but it doesn't. Let me share some
thoughts about why this may happen...

* First of all, device_headers == NULL for S_MethodInvoke_Ind in
  return_reply. For S_MethodInvoke_Ind, device_headers must be set to
  orig_event->u.S_MethodInvoke_Ind.session_headers, not .request_headers
  as in S_Unit_MethodInvoke_Ind.

* The comparison (octstr_case_compare(charset, octstr_imm("UTF-8")) < 0)
  in return_reply and wml_compile is wrong. It must be != 0, because for
  any of the WINDOWS-125X charsets octstr_case_compare returns 1.

* In return_reply, the content charset adaptation must be based on both
  charset indications (the HTTP header and the XML preamble), and in the
  second case the charset indication in the preamble should be deleted.
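The effect of the sign test in the second point can be reproduced with the POSIX `strcasecmp`, assuming `octstr_case_compare` orders names the same way (negative, zero, positive). A `< 0` test only fires for names that sort before "UTF-8", so ISO-8859-5 and KOI8-R would be converted while WINDOWS-1251 ('W' sorts after 'U') would not. The two predicate names below are hypothetical.

```c
#include <strings.h>

/* The original test: true only for charset names that sort before "UTF-8"
 * (ignoring case), e.g. ISO-8859-5 and KOI8-R, but not WINDOWS-1251. */
int needs_conversion_buggy(const char *charset)
{
    return strcasecmp(charset, "UTF-8") < 0;
}

/* The proposed test: every charset other than UTF-8 needs conversion. */
int needs_conversion_fixed(const char *charset)
{
    return strcasecmp(charset, "UTF-8") != 0;
}
```

For instance, `needs_conversion_buggy("WINDOWS-1251")` is 0 whereas `needs_conversion_fixed("WINDOWS-1251")` is 1; both return 0 for `utf-8`.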

I propose examining the attached patch, which IMHO corrects the above.



PS:
Don't forget to update libxml for security reasons, as described in
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2004-0110

[Attachment: charset.patch (binary data)]


Re: charset stuff

2004-02-24 Thread Stipe Tolj
Hi Paul,

Paul P Komkoff Jr wrote:
>
> I was jumping around like mad because some sites weren't working.
> Some digging showed that this was because we were getting a body in the
> wrong charset without a correct charset header.
> Deeper digging led to the Russian Apache default config (which says
> that mobile User-Agents are braindead and that any ;charset= portion of
> the header should be stripped regardless of the body's actual encoding).
>
> And finally I found add_charset_headers in wap-appl.c and the
> corresponding part in return_reply.
>
> Who coded it _that_ way, and for what purpose?
> At first sight it produces a complete mess, which cannot be
> correctly parsed as XML.
>
> For now I made the attached patch, and I am happily watching the wapbox
> logs, where the number of failed XML compile messages has dropped
> drastically.
>
> You can try this at home too.

Can you point out what the patch actually fixes/solves/does better,
please?! Of course we will be able to work through this, but it
makes things easier if you shine the light directly into our eyes ;)

Stipe

mailto:[EMAIL PROTECTED]
---
Wapme Systems AG

Münsterstr. 248
40470 Düsseldorf, NRW, Germany

phone: +49.211.74845.0
fax: +49.211.74845.299

mailto:[EMAIL PROTECTED]
http://www.wapme-systems.de/
---




charset stuff

2004-02-22 Thread Paul P Komkoff Jr
I was jumping around like mad because some sites weren't working.
Some digging showed that this was because we were getting a body in the
wrong charset without a correct charset header.
Deeper digging led to the Russian Apache default config (which says
that mobile User-Agents are braindead and that any ;charset= portion of
the header should be stripped regardless of the body's actual encoding).

And finally I found add_charset_headers in wap-appl.c and the
corresponding part in return_reply.

Who coded it _that_ way, and for what purpose?
At first sight it produces a complete mess, which cannot be
correctly parsed as XML.

For now I made the attached patch, and I am happily watching the wapbox
logs, where the number of failed XML compile messages has dropped
drastically.

You can try this at home too.

Enjoy.
-- 
Paul P 'Stingray' Komkoff Jr // http://stingr.net/key - my pgp key
 This message represents the official view of the voices in my head
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/22 20:47:20+03:00 [EMAIL PROTECTED] 
#   Rework charset stuff, charset recoding and accept-charset headers part 0.
# 
# gw/wap-appl.c
#   2004/02/22 20:47:18+03:00 [EMAIL PROTECTED] +25 -0
#   replace crappy charset header and recoding with simpler
# 
# BitKeeper/etc/ignore
#   2004/02/22 20:47:18+03:00 [EMAIL PROTECTED] +6 -0
#   add couple ignores
# 
diff -Nru a/gw/wap-appl.c b/gw/wap-appl.c
--- a/gw/wap-appl.c Mon Feb 23 00:19:56 2004
+++ b/gw/wap-appl.c Mon Feb 23 00:19:56 2004
@@ -103,6 +103,7 @@
 #endif
 
 #define ENABLE_NOT_ACCEPTED 
+#define NEW_CHARSETS
 
 /*
  * Give the status the module:
@@ -668,6 +669,10 @@
   * to handle those charsets for all content types, just WML/XHTML. */
 static void add_charset_headers(List *headers) 
 {
+#ifdef NEW_CHARSETS
+    if (!http_charset_accepted(headers, "utf-8"))
+        http_header_add(headers, "Accept-Charset", "utf-8");
+#else
     long i, len;
 
     gw_assert(charsets != NULL);
@@ -677,6 +682,7 @@
         if (!http_charset_accepted(headers, charset))
             http_header_add(headers, "Accept-Charset", charset);
     }
+#endif
 }
 
 
@@ -1005,11 +1011,29 @@
 
         /* get charset used in content body, default to utf-8 if not present */
         if ((charset = find_charset_encoding(content.body)) == NULL)
+#ifdef NEW_CHARSETS
+            if (octstr_len(content.charset) > 0) {
+                charset = octstr_duplicate(content.charset);
+            } else {
+                charset = octstr_imm("UTF-8");
+            }
+#else
             charset = octstr_imm("UTF-8"); 
+#endif
 
         /* convert to utf-8 if original charset is not utf-8 
          * and device supports it */
 
+#ifdef NEW_CHARSETS
+        if (octstr_case_compare(charset, octstr_imm("UTF-8")) != 0) {
+            debug("wsp",0,"Converting wml/xhtml from charset %s to UTF-8",
+                  octstr_get_cstr(charset));
+            if (charset_convert(content.body, octstr_get_cstr(charset), "UTF-8") >= 0) {
+                octstr_destroy(content.charset);
+                content.charset = octstr_create("UTF-8");
+            }
+        }
+#else
         if (octstr_case_compare(charset, octstr_imm("UTF-8")) < 0 &&
             !http_charset_accepted(device_headers, octstr_get_cstr(charset))) {
             if (!http_charset_accepted(device_headers, "UTF-8")) {
@@ -1047,6 +1071,7 @@
                 }
             }
         }
+#endif
 
         octstr_destroy(charset);
     }
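The decision made by the NEW_CHARSETS branch of return_reply can be restated as a small C sketch: the encoding found in the body wins, then a non-empty charset from the reply headers (assuming that is what `content.charset` holds), then UTF-8, and conversion is attempted for anything that is not UTF-8. `pick_source_charset` and `conversion_source` are hypothetical names, NULL stands for a missing value, and the actual recoding (`charset_convert` in the patch) is left out.

```c
#include <stddef.h>
#include <strings.h>

/* Hypothetical restatement of the NEW_CHARSETS branch: the encoding found in
 * the body wins, then a non-empty Content-Type charset, then UTF-8. */
const char *pick_source_charset(const char *body_encoding,
                                const char *header_charset)
{
    if (body_encoding != NULL && body_encoding[0] != '\0')
        return body_encoding;
    if (header_charset != NULL && header_charset[0] != '\0')
        return header_charset;
    return "UTF-8";
}

/* Returns the charset to convert from, or NULL when the content is already
 * UTF-8 (the "!= 0" comparison used in the patch). */
const char *conversion_source(const char *body_encoding,
                              const char *header_charset)
{
    const char *cs = pick_source_charset(body_encoding, header_charset);

    return strcasecmp(cs, "UTF-8") != 0 ? cs : NULL;
}
```

One consequence of this ordering: if the `;charset=` parameter has been stripped and the body carries no XML encoding declaration, the sketch settles on UTF-8 and requests no conversion, so a WINDOWS-1251 body in that situation would still reach the WML compiler unconverted.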