Re: Off-topic request

Mark Crispin Thu, 25 Mar 2004 12:51:03 -0800

On Thu, 25 Mar 2004, Pete Maclean wrote:

- what charsets are commonly used for email?

The ones that I see most commonly are: US-ASCII, UTF-8, ISO-8859-1, ISO-8859-2, ISO-8859-15, KOI8-R, ISO-2022-JP, GB2312, BIG5, EUC-KR, WINDOWS-1251. However, others do appear.

- can most clients handle text received in UTF-8?

Not yet. However, the number is growing; and the fact that Outlook supports it means that it is likely that a majority of email users (as opposed to clients) can deal with UTF-8.

The forthcoming release of Pine (Pine 4.60) cannot send UTF-8 mail; however, it will render UTF-8 mail into the user's local character set.

- will client developers be happy if they can send text only in US_ASCII, ISO-8859-1 or UTF-8?

ISO-8859-1 is dead, or at least the death warrant has been signed. If you want to cater to the 8-bit western European market, support ISO-8859-15, not ISO-8859-1. The Europeans really need the Euro character...
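To make the Euro point concrete, here is a minimal Python sketch. It uses only Python's built-in codecs; the helper name `can_encode` is made up for illustration and is not from any mail library.

```python
EURO = "\u20ac"  # EURO SIGN

def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# ISO-8859-15 assigns byte 0xA4 to the euro sign.
assert EURO.encode("iso-8859-15") == b"\xa4"

# ISO-8859-1 uses that same byte for the generic currency sign (U+00A4)
# and has no euro sign anywhere in its table.
assert b"\xa4".decode("iso-8859-1") == "\u00a4"

print(can_encode("10 " + EURO, "iso-8859-1"))   # False
print(can_encode("10 " + EURO, "iso-8859-15"))  # True
```

The same byte means two different characters depending on the label, which is why a message declared as ISO-8859-1 cannot carry a price in euros.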

I *think* that a client which does only US-ASCII (not US_ASCII) and UTF-8 will be viable in the future, but not yet. On the other hand, if I were specifying the development of a new client today, I would go UTF-8 only on the grounds that by release time UTF-8 will have hit critical mass.

- when providing resources for MUA developers, what charset-related facilities should be supplied?

The UW c-client library's utf8 section has functions to transfer from lots of local character sets into UTF-8. For some time to come, a client will need to recognize messages in other character sets and render them *into* UTF-8, if not necessarily generate UTF-8.

Some more modern UNIX systems have the iconv() function which is useful if you're not using c-client.
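The "recognize other character sets and render them into UTF-8" step can be sketched roughly as follows. This is not c-client or iconv(); Python's built-in codecs stand in for them, and the function name `to_utf8` and the fallback policy for unknown labels are assumptions made for the example.

```python
import codecs

def to_utf8(raw: bytes, charset: str) -> bytes:
    """Decode raw using the declared charset and re-encode it as UTF-8."""
    try:
        codecs.lookup(charset)
    except LookupError:
        # Assumption: treat an unrecognized label as US-ASCII so that
        # at least the 7-bit portion of the text survives.
        charset = "us-ascii"
    # Undecodable bytes become U+FFFD instead of aborting the render.
    text = raw.decode(charset, errors="replace")
    return text.encode("utf-8")

# A KOI8-R body, as it might arrive with "charset=KOI8-R".
sample = "Привет".encode("koi8-r")
print(to_utf8(sample, "KOI8-R").decode("utf-8"))
```

Replacing undecodable bytes rather than raising is a design choice for display purposes; a client that must preserve the original bytes would keep the raw message alongside the rendered text.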

I would not use Pine's approach in a new client. Pine was designed to use local character sets, and as a result is approaching the UTF-8 transition backwards (the hard way). I know because I've written much of that code. We have to do it that way because of our legacy base (and a *lot* of code), but that doesn't mean that you should (do what I say, not what I do).

It's much easier to start off using UTF-8 internally for everything, convert all other character sets into UTF-8, and possibly have a mechanism to generate other character sets from UTF-8.
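A minimal sketch of that layout, under stated assumptions: Python's `str` stands in for the UTF-8 internal representation, and the names `decode_incoming`, `encode_outgoing`, and the preference order in `PREFERRED` are hypothetical choices for illustration, not anything Pine or c-client does.

```python
# Assumed preference order: narrowest charset first, UTF-8 as the catch-all.
PREFERRED = ("us-ascii", "iso-8859-15", "utf-8")

def decode_incoming(raw: bytes, charset: str) -> str:
    """Convert an incoming body from its declared charset to the internal form."""
    return raw.decode(charset, errors="replace")

def encode_outgoing(text: str, preferred=PREFERRED):
    """Return (charset, bytes) using the first charset that can represent text."""
    for charset in preferred:
        try:
            return charset, text.encode(charset)
        except UnicodeEncodeError:
            continue
    raise ValueError("no listed charset can represent this text")

# Everything between these two calls works on one internal representation.
body = decode_incoming("Grüße".encode("iso-8859-1"), "iso-8859-1")
charset, wire = encode_outgoing(body)
print(charset)  # iso-8859-15
```

Only the two edge functions know about legacy character sets; a UTF-8-only client would simply shrink `PREFERRED` to `("utf-8",)`.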

- And generally, what does one need to know when providing client-oriented facilities for developers targeting foreign-language/character-set users?

Yeegs. That's a lot to ask. Your best option IMHO is to ask people questions as they come in -- and be sure to include questions about what they think they did *wrong* or could have done *better*. This is something in which a lot of learning can come from the mistakes of others.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.
