package w3m
tag 291735 patch
thanks

On Sat, Jan 22, 2005 at 09:59:27PM +0100, Samuel Thibault wrote:
> Package: w3m
> Version: 0.5.1-3
> Severity: normal
>
>
> Hi,
>
> Say I have a test.html.utf-8 page on some web server:
> <body>
> test
> </body>
>
> The web server properly announces that it is an utf-8 encoded page:
> Content-Type: text/html; charset=utf-8
>
> But w3m simplify this into US-ASCII, because the page indeed doesn't
> contain anything than can't be coded in plain ascii:
>
> [snipped]
>
> The problem comes if I put a form in my page. Since the page is
> announced as utf-8-encoded, w3m should default to using utf-8 to code
> the values. But since w3m simplifies charset into US-ASCII, it will
> default to that to code the values (and won't know how to code accents &
> co).
>
> W3m should *not* simplify charset.
>

Hi,
here is a patch so that w3m no longer simplifies the charset when
automatic charset detection is on.

Another possibility is to set the option "Automatic charset detect when
loading" to OFF.

Regards,
-- 
Karsten Schölzel        | Email:  [EMAIL PROTECTED]
Väderleden 9 4:98       | Jabber: [EMAIL PROTECTED]
97633 Luleå             | VoIP:   sip:[EMAIL PROTECTED]
Sweden                  |         sip:[EMAIL PROTECTED]
                        | Tel:    +4918015855857712
                        | Mobile: +46706725974

Use the hint instead of US_ASCII in wc_auto_detect.

Fixes Debian bug #291735: w3m shouldn't "simplify" page's charset

---
commit 5ab3cec76b0514cc1cb333889ba34de5f82800c7
tree 249e2fcf17a83a378caeb7f829afe4e592723ccf
parent a3449ff39ec4a3cda629873f4e2fc37b026a9327
author Karsten Schoelzel <[EMAIL PROTECTED](none)> Sun, 06 Nov 2005 00:25:35 +0100
committer Karsten Schoelzel <[EMAIL PROTECTED](none)> Sun, 06 Nov 2005 00:25:35 +0100

 libwc/detect.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/libwc/detect.c b/libwc/detect.c
--- a/libwc/detect.c
+++ b/libwc/detect.c
@@ -99,7 +99,7 @@ wc_auto_detect(char *is, size_t len, wc_
     for (; p < ep && ! WC_DETECT_MAP[*p]; p++)
         ;
     if (p == ep)
-        return WC_CES_US_ASCII;
+        return hint;
 
     switch (hint) {
     case WC_CES_ISO_2022_JP:
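
For anyone who wants to see the effect without building w3m, here is a
rough standalone sketch of the changed branch. It is not the libwc code:
the enum, is_suspicious() and detect_sketch() are names made up for this
illustration, and the byte test is a simplified stand-in for
WC_DETECT_MAP. The keep_hint flag only exists to compare the old and new
behaviour side by side.

```c
#include <stddef.h>

/* Hypothetical stand-ins for libwc's charset identifiers. */
typedef enum { CES_US_ASCII, CES_UTF_8, CES_ISO_2022_JP } ces_t;

/* Assumption: a byte is "suspicious" if it has the high bit set or is
 * ESC (0x1b). The real WC_DETECT_MAP table may mark other bytes too. */
static int is_suspicious(unsigned char c)
{
    return c >= 0x80 || c == 0x1b;
}

/* keep_hint != 0 models the patched behaviour, 0 the old behaviour. */
static ces_t detect_sketch(const char *s, size_t len, ces_t hint,
                           int keep_hint)
{
    const unsigned char *p = (const unsigned char *)s;
    const unsigned char *ep = p + len;

    /* Skip over bytes that cannot distinguish one encoding from another. */
    for (; p < ep && !is_suspicious(*p); p++)
        ;
    if (p == ep)
        /* Nothing but plain ASCII bytes in the input. */
        return keep_hint ? hint : CES_US_ASCII;

    /* The real function inspects the remaining bytes per encoding here;
     * this sketch simply falls back to the hint. */
    return hint;
}
```

With the "test" page from the report and a hint of CES_UTF_8,
detect_sketch("test", 4, CES_UTF_8, 0) gives CES_US_ASCII (old
behaviour), while detect_sketch("test", 4, CES_UTF_8, 1) keeps CES_UTF_8,
so form values would be encoded in the charset the server announced.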