LC_CTYPE vs. charset (il)logic (was: Re: overriding the charset for headers)

Baurjan Ismagulov Wed, 28 Aug 2002 02:58:42 -0700

On Tue, Aug 27, 2002 at 01:37:30PM -0700, Sam Peterson wrote:
> > I'm using 1.4. I think LC_CTYPE and charset are set correctly in both
> > cases. BTW, why does Mutt use both of these variables? I would find
> > logical if charset have overridden LC_CTYPE. I can't see the rationale
> > behind the current implementation, and find it counter-intuitive. Could
> > anyone please shed some light on this, too?
> 
> Hmm, from my experience, charset does override LC_CTYPE.  I'll have to
> double check that.  To my knowledge, charset is set based on your
> locale if you don't explicitly set it in your muttrc.  Anyone?


Example: let's take a message encoded in iso-8859-1 and containing the
character 'ç' (LATIN SMALL LETTER C WITH CEDILLA, octal code 347). The
following table outlines how the message is displayed with different
LC_CTYPE and charset values on a system with (hopefully) properly
configured locales (mutt 1.4.0-2 on Debian unstable):

LC_CTYPE           charset     user wants   mutt shows  user expects
--------           -------     ----------   ----------  ------------
""                 ""          unknown      \347        ?
""                 iso-8859-1  iso-8859-1   \347        ccedilla
""                 us-ascii    us-ascii     ?           ?
en_US.US-ASCII     ""          us-ascii     \347        ?
en_US.US-ASCII     iso-8859-1  iso-8859-1   \347        ccedilla
en_US.US-ASCII     us-ascii    us-ascii     ?           ?
en_US.ISO-8859-1   ""          iso-8859-1   ccedilla    ccedilla
en_US.ISO-8859-1   iso-8859-1  iso-8859-1   ccedilla    ccedilla
en_US.ISO-8859-1   us-ascii    us-ascii     ?           ?


Here LC_CTYPE and charset are the environment and mutt variables,
respectively. The "mutt shows" column lists what mutt actually displays.
The "user wants" and "user expects" columns show what I call "override":
the charset mutt should assume given the two variable values, and the
character that, IMHO, mutt should display according to that assumed
charset.

Apparently, mutt does the following:
* display ccedilla if it can be displayed both with LC_CTYPE and
  charset;
* display ? if char can't be displayed with charset, regardless of
  LC_CTYPE value;
* display \347 if char can be displayed with charset, but can't be
  displayed with LC_CTYPE.
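
The three rules above can be sketched in Python. This only models the
table; it is not mutt's actual code. The function names are hypothetical,
and treating an unset charset as "no restriction" is an assumption
inferred from the rows where charset is "".

```python
# Charsets that can represent c-cedilla; only the two from the table matter.
CAN_SHOW_CCEDILLA = {"iso-8859-1"}

def locale_charset(lc_ctype):
    """Charset part of a value like 'en_US.ISO-8859-1', lowercased ('' if none)."""
    return lc_ctype.partition(".")[2].lower()

def observed_display(lc_ctype, charset):
    """Model of what mutt 1.4 appears to show for the byte \\347."""
    # Assumption: an empty charset never blocks the character.
    charset_ok = charset == "" or charset.lower() in CAN_SHOW_CCEDILLA
    locale_ok = locale_charset(lc_ctype) in CAN_SHOW_CCEDILLA
    if not charset_ok:
        return "?"          # charset cannot represent it, whatever LC_CTYPE is
    if not locale_ok:
        return "\\347"      # charset can, the locale cannot
    return "ccedilla"       # both can
```

Under these assumptions the function reproduces the "mutt shows" column
for all nine rows of the table.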

At the moment, I can see neither a useful application of the distinction
between ? and \347, nor a clear relation between LC_CTYPE and charset --
I certainly cannot call it "override". It seems to me to be a side effect
of code layering inside mutt.

And the logic I propose is:
1. if charset is set, assumed_charset = charset;
   otherwise, if LC_CTYPE is set, assumed_charset = LC_CTYPE;
   otherwise, assumed_charset = "us-ascii".
2. * display ccedilla if it can be displayed with assumed_charset;
   * display ? if it can't.
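
As a rough illustration, the proposed logic could look like this in
Python. The function names are hypothetical. Reading "LC_CTYPE is set" as
"LC_CTYPE carries a charset after the dot" is my assumption, and so is
using Python's own codecs to decide whether a character can be displayed.

```python
def assumed_charset(lc_ctype, charset):
    """Step 1: charset wins, then the charset part of LC_CTYPE, then us-ascii."""
    if charset:
        return charset.lower()
    codeset = lc_ctype.partition(".")[2]   # 'en_US.ISO-8859-1' -> 'ISO-8859-1'
    if codeset:
        return codeset.lower()
    return "us-ascii"

def proposed_display(char, lc_ctype, charset):
    """Step 2: show the character if the assumed charset can encode it, else '?'."""
    try:
        char.encode(assumed_charset(lc_ctype, charset))
    except (UnicodeEncodeError, LookupError):
        return "?"
    return char
```

For example, proposed_display("\u00e7", "en_US.US-ASCII", "iso-8859-1")
returns the c-cedilla itself, while
proposed_display("\u00e7", "en_US.ISO-8859-1", "us-ascii") returns "?",
matching the "user expects" column.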

I don't insist my scheme is "better" since I don't know the rationale
behind the current design. However, I see much user confusion over this
issue and think it could be made more simple, more stupid the way I've
described.

What do you think?


With kind regards,
Baurjan.

P.S. I can see the full address in index mode using "@". Can I see the
     full subject?
