Package: mutt
Version: 1.5.13-1
Severity: important
Tags: l10n

(The following is in the Big5 charset)

Consider the following byte sequence, a GB18030 message incorrectly
tagged as GB2312 (see wishlist item #402027):

0000000 c2 a0 d6 76 d9 46 cc 8e b7 ea df 4c b6 fe d3 d0
0000020 c6 b9 c5 d2 9a c2 bb ee 84 d3 a3 ac b5 ab be 57
0000040 ed 93 c9 cf 93 68 df ed b5 bd
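To check that premise, the bytes can be fed to Python's built-in `gb18030` and `gb2312` codecs. In the minimal sketch below, the names `RAW` and `strict_gb2312_error_offset` are illustrative only. The sequence decodes cleanly as GB18030 into 21 two-byte characters. A strict GB2312 decode fails on the very first pair, c2 a0, because GB2312 trail bytes must lie in a1-fe.

```python
# The 42 bytes from the od dump above (od offsets 0000020/0000040 are octal).
RAW = bytes.fromhex(
    "c2 a0 d6 76 d9 46 cc 8e b7 ea df 4c b6 fe d3 d0"
    " c6 b9 c5 d2 9a c2 bb ee 84 d3 a3 ac b5 ab be 57"
    " ed 93 c9 cf 93 68 df ed b5 bd"
)

# Every pair is a valid two-byte GBK/GB18030 code, so this succeeds.
text = RAW.decode("gb18030")
print(len(text))  # 21


def strict_gb2312_error_offset(data: bytes):
    """Return the offset of the first byte strict GB2312 rejects, or None."""
    try:
        data.decode("gb2312")
    except UnicodeDecodeError as err:
        return err.start
    return None


print(strict_gb2312_error_offset(RAW))  # 0: trail byte a0 is outside a1-fe
```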

When displayed on a Big5 terminal, mutt's internal viewer renders this
as mojibake:

???v?F??逢?L二有乒乓?祿??櫻?但?W??上?h唔到

(i.e.,
0000000 3f 3f 3f 76 3f 46 3f 3f b3 7b 3f 4c a4 47 a6 b3
0000020 a5 e2 a5 e3 3f b8 53 3f 3f c4 e5 3f a6 fd 3f 57
0000040 3f 3f a4 57 3f 68 ad f8 a8 ec
in Big5)

There are a couple of problems with this output:

1. In a string that is allegedly GB2312, any high byte following pure
   ASCII should be treated as the lead byte of a presumed double-byte
   character, even if the pair is invalid GB2312. The sequences "?v",
   "?F", "?L", "?W", and "?h" are all meaningless and should each be
   simply "??" (because they are "unknown kanji", not pairs of "unknown
   8-bit character followed by valid ASCII").
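Rule 1 can be sketched as follows. This is an illustration of the rule, not mutt's code. The function name `decode_gb2312_pairwise` is hypothetical, and the pair-validity test simply defers to Python's `gb2312` codec. Once a byte with the high bit set is seen, it and the next byte are consumed together. A pair that is not valid GB2312 becomes a single "??" placeholder, so no trail byte can leak out as a stray ASCII letter.

```python
RAW = bytes.fromhex(
    "c2 a0 d6 76 d9 46 cc 8e b7 ea df 4c b6 fe d3 d0"
    " c6 b9 c5 d2 9a c2 bb ee 84 d3 a3 ac b5 ab be 57"
    " ed 93 c9 cf 93 68 df ed b5 bd"
)


def decode_gb2312_pairwise(data: bytes) -> str:
    """Hypothetical decoder: high bytes are always consumed in pairs."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:  # plain ASCII: one byte, one character
            out.append(chr(b))
            i += 1
            continue
        pair = data[i:i + 2]  # high byte: treat it as a lead byte
        try:
            out.append(pair.decode("gb2312"))
        except UnicodeDecodeError:  # invalid or truncated pair
            out.append("??")  # one unknown double-byte character
        i += 2  # never resynchronize in the middle of a pair
    return "".join(out)


print(decode_gb2312_pairwise(RAW))  # no stray "v", "F", "L", "W" or "h"
```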

2. The characters "祿" [b8 53] and "櫻" [c4 e5] in the output cannot be
   explained; they do not seem to be related to the original GB18030
   in any way.

(While the real problem is #2, the cause of #2 is quite possibly #1.)
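That hypothesis can be tested with a sketch of a naive decoder. On an invalid pair, it replaces only the lead byte with "?" and resumes at the very next byte. This is an assumed mechanism, not mutt's actual code, and the name `decode_gb2312_resync` is hypothetical.

Run over the bytes above, it re-pairs 9a c2 bb as "?" followed by c2 bb. It re-pairs ee 84 d3 a3 as "??" followed by d3 a3. In GB2312, c2 bb and d3 a3 should be 禄 and 樱, and their traditional forms are the 祿 and 櫻 in mutt's Big5 output. Traced by hand, the sketch matches mutt's mojibake line character for character, apart from those two simplified/traditional forms.

```python
RAW = bytes.fromhex(
    "c2 a0 d6 76 d9 46 cc 8e b7 ea df 4c b6 fe d3 d0"
    " c6 b9 c5 d2 9a c2 bb ee 84 d3 a3 ac b5 ab be 57"
    " ed 93 c9 cf 93 68 df ed b5 bd"
)


def decode_gb2312_resync(data: bytes) -> str:
    """Hypothetical naive decoder: on a bad pair, emit '?' and skip ONE byte."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:  # plain ASCII
            out.append(chr(b))
            i += 1
            continue
        try:
            out.append(data[i:i + 2].decode("gb2312"))
            i += 2
        except UnicodeDecodeError:
            out.append("?")  # replace only the lead byte...
            i += 1  # ...so the trail byte is reused as a new lead byte
    return "".join(out)


print(decode_gb2312_resync(RAW))
```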

When the same message is piped to w3m with an argument explicitly
telling it that the input is GB2312 (as incorrectly tagged), it
correctly renders it as:

聽講貴處逢週二有乒乓毬活動,但網頁上??唔到

(i.e.,
0000000 c5 a5 c1 bf b6 51 b3 42 b3 7b b6 67 a4 47 a6 b3
0000020 a5 e2 a5 e3 b2 41 ac a1 b0 ca a1 41 a6 fd ba f4
0000040 ad b6 a4 57 3f 3f ad f8 a8 ec
in Big5)

This implies that mutt's internal viewer is confused by invalid GB2312
and fails to correctly replace out-of-range byte sequences with the "??"
string.

(When the same message is piped to w3m without any optional arguments,
w3m correctly detects GB18030 and renders it as intended. But this is
off-topic.)

This *might* be the same bug as, or a bug related to, #249626, since the
symptoms seem identical even though the character set is different. If
it is the same bug, it probably affects all other CJK encodings as well.

-- System Information:
Debian Release: 4.0
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'stable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.4.28-ow1
Locale: LANG=zh_TW.Big5, LC_CTYPE=zh_TW.Big5 (charmap=BIG5)

Versions of packages mutt depends on:
ii  libc6                   2.3.6.ds1-8      GNU C Library: Shared libraries
ii  libdb4.4                4.4.20-8         Berkeley v4.4 Database Libraries [
ii  libgnutls13             1.4.4-3          the GNU TLS library - runtime libr
ii  libidn11                0.6.5-1          GNU libidn library, implementation
ii  libncursesw5            5.5-5            Shared libraries for terminal hand
ii  libsasl2                2.1.19.dfsg1-0.5 Authentication abstraction library
ii  zmailer [mail-transport 2.99.56-2        Mailer for Extreme Performance Dem

Versions of packages mutt recommends:
ii  locales                      2.3.6.ds1-8 GNU C Library: National Language (
ii  mime-support                 3.37-1      MIME files 'mime.types' & 'mailcap

-- no debconf information

  • Bug#402035: mutt: Mutt's internal text viewer confused by out-o... Ambrose Li
