Package: lynx
Version: 2.8.8pre3-1
Severity: normal

If I run "lynx -dump" on this HTML:

    <html>
    <body>
    This ( ) is a UTF-8 unbreakable space.
    </body>
    </html>

I get this output:

   This (Â ) is a UTF-8 unbreakable space.

Note the "capital A with circumflex" (U+00C2).  This seems to be because
the byte sequence C2 A0, which is the UTF-8 encoding of U+00A0, is being
interpreted as two ISO-8859-1 characters rather than as a single UTF-8
character.
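
The suspected misinterpretation can be reproduced outside lynx. The short Python sketch below is only an illustration of the byte-level effect, not a statement about lynx's internals: it decodes the same two bytes once as UTF-8 and once as ISO-8859-1.

```python
# The UTF-8 encoding of U+00A0 NO-BREAK SPACE is the two bytes C2 A0.
raw = "\u00a0".encode("utf-8")
assert raw == b"\xc2\xa0"

# Decoded as UTF-8, the two bytes form a single character.
as_utf8 = raw.decode("utf-8")

# Decoded as ISO-8859-1, each byte becomes its own character:
# C2 -> U+00C2 (capital A with circumflex), A0 -> U+00A0 (no-break space).
as_latin1 = raw.decode("iso-8859-1")

print(len(as_utf8), len(as_latin1))               # 1 2
print([f"U+{ord(c):04X}" for c in as_latin1])     # ['U+00C2', 'U+00A0']
```

The second decoding yields exactly the extra character seen in the dump output above.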

If I add the "-assume_charset=utf8" option, it does what I expect, but
I believe that should be the default (especially since I have
LANG=en.utf8 as my locale).
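
To illustrate what "default from the locale" could mean, here is a minimal Python sketch that extracts the codeset from a POSIX-style locale name. The helper names codeset_from_locale and locale_codeset are hypothetical, and this is not how lynx is known to choose its charset; it only shows that the information is available in the environment.

```python
import os

def codeset_from_locale(name):
    """Return the codeset part of a POSIX-style locale name, or None.

    Hypothetical helper. Expected format:
    language[_territory][.codeset][@modifier]
    """
    name = name.split("@", 1)[0]      # drop any @modifier
    if "." not in name:
        return None                   # e.g. "C" or "en_US" carry no codeset
    return name.split(".", 1)[1]

def locale_codeset(env=os.environ):
    """Use the first non-empty variable among LC_ALL, LC_CTYPE, LANG."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return codeset_from_locale(value)
    return None

print(codeset_from_locale("en.utf8"))   # utf8
```

With the locale from this report, the sketch returns "utf8", which a program could then use as its assumed document charset when the document itself declares none.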

