Bug#291735: [PATCH] Bug#291735: w3m shouldn't "simplify" page's charset

Karsten Schoelzel Wed, 03 May 2006 18:52:04 -0700

package w3m
tag 291735 patch
thanks

On Sat, Jan 22, 2005 at 09:59:27PM +0100, Samuel Thibault wrote:
> Package: w3m
> Version: 0.5.1-3
> Severity: normal
> 
> 
> Hi,
> 
> Say I have a test.html.utf-8 page on some web server:
> <body>
> test
> </body>
> 
> The web server properly announces that it is a UTF-8 encoded page:
> Content-Type: text/html; charset=utf-8
> 
> But w3m simplifies this into US-ASCII, because the page indeed doesn't
> contain anything that can't be encoded in plain ASCII:
> 
> [snipped]
> 
> The problem comes if I put a form in my page. Since the page is
> announced as utf-8-encoded, w3m should default to using utf-8 to code
> the values. But since w3m simplifies charset into US-ASCII, it will
> default to that to code the values (and won't know how to code accents &
> co).
> 
> W3m should *not* simplify charset.
> 
Hi,


Here is a patch so that w3m no longer simplifies the charset to
US-ASCII when automatic charset detection is on. As a workaround
without the patch, set the option "Automatic charset detect when
loading" to OFF.

Regards,
-- 
Karsten Schölzel        | Email:  [EMAIL PROTECTED]
Väderleden 9 4:98       | Jabber: [EMAIL PROTECTED]
97633 Luleå             | VoIP:   sip:[EMAIL PROTECTED]
Sweden                  |         sip:[EMAIL PROTECTED]
                        | Tel:    +4918015855857712
                        | Mobile: +46706725974

Use the hint instead of US_ASCII in wc_auto_detect.
Fixes Debian bug #291735: w3m shouldn't "simplify" page's charset

---
commit 5ab3cec76b0514cc1cb333889ba34de5f82800c7
tree 249e2fcf17a83a378caeb7f829afe4e592723ccf
parent a3449ff39ec4a3cda629873f4e2fc37b026a9327
author Karsten Schoelzel <[EMAIL PROTECTED](none)> Sun, 06 Nov 2005 00:25:35 +0100
committer Karsten Schoelzel <[EMAIL PROTECTED](none)> Sun, 06 Nov 2005 00:25:35 +0100

 libwc/detect.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/libwc/detect.c b/libwc/detect.c
--- a/libwc/detect.c
+++ b/libwc/detect.c
@@ -99,7 +99,7 @@ wc_auto_detect(char *is, size_t len, wc_
     for (; p < ep && ! WC_DETECT_MAP[*p]; p++)
        ;
     if (p == ep)
-       return WC_CES_US_ASCII;
+       return hint;
 
     switch (hint) {
     case WC_CES_ISO_2022_JP:
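
The effect of the one-line change can be illustrated outside of w3m.
The following is only a minimal sketch, not the libwc code: the enum
values, needs_detection(), plain_prefix_len() and both detect_*
functions are hypothetical names, and the assumption that only ESC and
bytes with the high bit set trigger detection stands in for whatever
WC_DETECT_MAP really marks. Input that does need detection is not
modelled at all and just yields a marker value.

```c
#include <stddef.h>

/* Hypothetical stand-ins for libwc's charset identifiers. */
typedef enum {
    CES_US_ASCII,
    CES_UTF_8,
    CES_ISO_8859_1,
    CES_NEEDS_FULL_DETECTION   /* marker: the sketch does not model this path */
} ces_t;

/* Assumption: only ESC and high-bit bytes are "interesting".
 * The real code looks each byte up in WC_DETECT_MAP instead. */
static int needs_detection(unsigned char c)
{
    return c == 0x1b || c >= 0x80;
}

/* Number of leading bytes that do not require charset detection. */
static size_t plain_prefix_len(const char *is, size_t len)
{
    size_t i = 0;

    while (i < len && !needs_detection((unsigned char)is[i]))
        i++;
    return i;
}

/* Behaviour before the patch: plain input is reported as US-ASCII,
 * so the charset announced by the server (the hint) is lost. */
ces_t detect_before_patch(const char *is, size_t len, ces_t hint)
{
    (void)hint;   /* ignored on this path, which is the bug */
    if (plain_prefix_len(is, len) == len)
        return CES_US_ASCII;
    return CES_NEEDS_FULL_DETECTION;
}

/* Behaviour after the patch: plain input keeps the announced charset. */
ces_t detect_after_patch(const char *is, size_t len, ces_t hint)
{
    if (plain_prefix_len(is, len) == len)
        return hint;
    return CES_NEEDS_FULL_DETECTION;
}
```

With the page from the report ("test", announced as utf-8), the first
function reports US-ASCII while the second keeps UTF-8, so a form on
that page would be submitted in the charset the server announced.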
