Re: Documenting in Tamil Computing
On Mon, Dec 16, 2002 at 10:29:14AM -0800, Barry Caplan [EMAIL PROTECTED] wrote a message of 23 lines which said:

> Actually, it is not Unicode which is not mature enough. It is SMTP, the core mail transport protocol. It is not 8 bit clean. It is very clear in the RFCs that only 7bit data is allowed over the wire.

I have to correct this because it may cast serious doubt on the ability of Internet email to send Unicode files.

> There are various extensions and kluges described in various RFCs (ESMTP, MIME, etc.)

All these extensions are referenced in the same RFC, 2821, which is the authoritative one about SMTP. I do not know of any mainstream SMTP server which does not implement them. The most important for us is 8BITMIME:

| Eight-bit message content transmission MAY be requested of the server by a client using extended SMTP facilities, notably the 8BITMIME extension [20]. 8BITMIME SHOULD be supported by SMTP servers.

> but they are not universally implemented at the server transport layer,

This is absolutely wrong. sendmail, Postfix and qmail have allowed 8-bit transport for a *very* long time.

> But for arbitrary email from one address to another, you can't rely on it.

I have been sending Latin-1 (ISO 8859-1) emails for more than ten years (and without using quoted-printable or other similar hacks) to French-speaking people in various parts of the world and I'm still waiting for an actual problem.
Re: Documenting in Tamil Computing
> I don't understand what you meant by Unicode not being mature enough to support multilingual emails.

Maybe the argument is simply that not enough email agents can render Tamil properly from Unicode-encoded text, and that email rarely has a useful life long enough to justify the pain today.

Eric.
8-bit MIME (was: Documenting in Tamil Computing)
Dear all,

Barry Caplan had written:
> SMTP [...] is not 8 bit clean. It is very clear in the RFCs that only 7bit data is allowed over the wire.

Stephane Bortzmeyer wrote:
> All these extensions are referenced in the same RFC, 2821, which is the authoritative one about SMTP.

As of November 2002, RFC 2821 is still a Proposed Standard, and RFC 821 is the Standard Protocol (cf. http://rfc.sunsite.dk/rfc/rfc3300.html).

> The most important for us is 8BITMIME:

Section 2.3.1 of RFC 2821, the proposed standard, says:
| The content is textual in nature, expressed using the US-ASCII
| repertoire [1]. Although SMTP extensions (such as 8BITMIME [20])
| may relax this restriction for the content body,

Stephane Bortzmeyer quoted section 2.4 of RFC 2821:
> Eight-bit message content transmission MAY be requested of the server by a client using extended SMTP facilities, notably the 8BITMIME extension [20]. 8BITMIME SHOULD be supported by SMTP servers.

SHOULD definitely does not mean the same thing as MUST. An SMTP server does not have to support 8-bit MIME mail. And the remainder of the quoted paragraph requires proper MIME headers for 8-bit text:
| However, it MUST not be construed as authorization to transmit
| unrestricted eight bit material. 8BITMIME MUST NOT be requested
| by senders for material with the high bit on that is not in MIME
| format with an appropriate content-transfer encoding; servers
| MAY reject such messages.

Barry Caplan had written:
> But for arbitrary email from one address to another, you can't rely on it.

Stephane Bortzmeyer wrote:
> I have been sending Latin-1 (ISO 8859-1) emails for more than ten years (and without using quoted-printable or other similar hacks) to French-speaking people in various parts of the world and I'm still waiting for an actual problem.

Mere luck, I'd say, but no proof at all. I have seen many messages, originally in ISO-8859-1-encoded French, that got the high bit of every accented character chopped off, thus replacing é with i, î with n, and so forth. And even more mail in German, distorted in a similar way. This has provoked an entry in my E-Mail FAQ: http://www.systems.uni-konstanz.de/EMAIL/FAQ.php#SMTP-73.

Of course, more and more SMTP servers support 8-bit MIME, and many take the pains to transform 8-bit MIME to some transfer-encoding supported by the receiving server. If you are located behind a server that recodes your 8-bit mail, you cannot claim that 8-bit mail is supported everywhere; you can only claim that your server compensates for the incompatibility of your MUA and the world at large.

Best wishes,
Otto Stolz
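The substitutions reported above (é becoming i, î becoming n) are exactly what results from clearing the top bit of each ISO-8859-1 byte. The following is a minimal Python sketch of that effect, not code from the thread; the helper name strip_high_bit is hypothetical, and it assumes the 7-bit-only hop simply zeroes the most significant bit of every byte.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear the most significant bit of every byte (assumed behaviour of a 7-bit-only hop)."""
    return bytes(b & 0x7F for b in data)


# The two characters mentioned in the message, encoded as ISO-8859-1.
original = "é î".encode("iso-8859-1")    # bytes 0xE9 0x20 0xEE
damaged = strip_high_bit(original).decode("ascii")

print(damaged)  # i n
```

0xE9 with the top bit cleared is 0x69 (the letter i), and 0xEE becomes 0x6E (the letter n), which matches the damage described.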
Re: 8-bit MIME (was: Documenting in Tamil Computing)
On Tue, Dec 17, 2002 at 01:28:00PM +0100, Otto Stolz [EMAIL PROTECTED] wrote a message of 65 lines which said:

> As of November 2002, RFC 2821 is still a Proposed Standard, and RFC 821 is the Standard Protocol (cf. http://rfc.sunsite.dk/rfc/rfc3300.html).

For those on the mailing list not versed in IETF language, let us add that most Internet protocols are just Proposed Standards: it takes a lot of time to move to a higher level. (RFC 2821 is more than 18 months old.) Anyway, 8-bit MIME was already possible with RFC 821; the difference was just editorial (RFC 2821 is easier to read since you do not need to patch it with many following RFCs).

> SHOULD definitely does not mean the same thing as MUST. An SMTP server does not have to support 8-bit MIME mail.

You're playing with words. In real life, all SMTP servers support 8-bit mail because all SMTP server authors are aware of the issue (true, it was long and difficult to convince them all but it worked). Any counter-example?

> I have seen many messages, originally in ISO-8859-1-encoded French, that got the high bit of every accented character chopped off, thus replacing é with i, î with n, and so forth.

Last time I saw such problems was something like ten years ago. It was almost never the fault of the SMTP server, but of some programs on the destination machine (or sometimes the fault of funny gateways like X400 servers, something you cannot blame on the Internet).

> Of course, more and more SMTP servers support 8-bit MIME,

All implementations already support 8-bit MIME. Some servers have not been upgraded yet but it is uncommon. (Remember we are talking about a move which occurred many years ago: even if many system administrators do not upgrade their software, in the long term, machines are replaced and new software catches on.)

> take the pains to transform 8-bit MIME to some transfer-encoding supported by the receiving server.

Very bad idea, BTW, since it mangles the mail, which can be a problem with applications like cryptographic signatures. I always turn it off and it was never a problem. In practice (do note I refer to the real world), all SMTP servers accept 8-bit data EVEN IF THEY DO NOT ADVERTISE IT PROPERLY with the 8BITMIME option.

Back to Unicode: why does nobody use UTF-7? Precisely because it is no longer necessary.
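For readers who have not met UTF-7: it represents Unicode text using only 7-bit bytes, which is why it was once attractive for mail paths that were not 8-bit clean. The following is a small Python sketch using the standard "utf-7" codec; the sample string and the helper name is_seven_bit are made up for illustration and do not come from the thread.

```python
def is_seven_bit(data: bytes) -> bool:
    """Return True if no byte has its high bit set."""
    return all(b < 0x80 for b in data)


text = "é î"                      # sample characters only
as_utf7 = text.encode("utf-7")    # 7-bit representation
as_utf8 = text.encode("utf-8")    # uses bytes >= 0x80 for non-ASCII characters

print(as_utf7)
print(is_seven_bit(as_utf7), is_seven_bit(as_utf8))
```

The UTF-7 form survives a 7-bit path untouched, while the UTF-8 form depends on an 8-bit clean path or on a transfer encoding.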
Re: 8-bit MIME (was: Documenting in Tamil Computing)
On Tue, 17 Dec 2002, Stephane Bortzmeyer wrote:
> On Tue, Dec 17, 2002 at 01:28:00PM +0100, Otto Stolz [EMAIL PROTECTED] wrote
>> I have seen many messages, originally in ISO-8859-1-encoded French, that got the high bit of every accented character chopped off, thus replacing é with i, î with n, and so forth.

When was the last time you saw this?

> Last time I saw such problems was something like ten years ago. It was almost never the fault of the SMTP server, but of some programs on the destination machine (or sometimes the fault of funny gateways like X400 servers, something you cannot blame on the Internet).

Although I agree that 8BITMIME is implemented and deployed very widely these days (it has been more than two years since I received garbled emails due to a 7bit-only path, and I receive tens of emails in 8bit encodings every day), I'm afraid it's your unique experience that the last time you received emails with the MSB stripped off was 10 years ago. While trying to counter the exaggeration made against the ability of Internet email to transport UTF-8 emails, you may have gone to the other extreme.

In 1992, sendmail 4.x/5.x transported more than half of the Internet email and they were not 8bit clean. That's why RFC 1468 and RFC 1557 were written circa 1992 for Japanese and Korean email exchanges in 7bit ISO-2022-JP and ISO-2022-KR, respectively. (In the case of ISO-2022-JP, there's another important reason: there are two major encodings used for Japanese, Shift_JIS on DOS/Windows/Mac and EUC-JP on Unix.)

As late as 1999, I did receive MSB-stripped emails which didn't go through a non-SMTP gateway (e.g. X400). Back then, some mail servers still used 7bit-only sendmail 4.x, 5.x (on old Sun OS 4.x, AIX 3.x, 4.x, HP/UX 8.x, IRIX, etc. machines), old versions of PMDF (old VMS machines) and smail (on some Unix machines), while 8bit clean sendmail 8.6.x or later had been around since the mid-1990s. Besides, some email servers still don't abide by the ESMTP standard and don't include '8BITMIME' in their response when queried with 'EHLO' although they support 8bit clean transport (as you wrote).

Nonetheless, I agree that these days most mail transport paths are 8bit clean. Even if not, Base64 and QP (I don't regard them as hacks as you do) are well supported by most modern MUAs, so that end-users have little problem exchanging emails in UTF-8 (or other legacy 8bit encodings). Most of them don't have to care whether 8BITMIME is used in transit or which C-T-E is used: 8bit, QP, or Base64.

>> take the pains to transform 8-bit MIME to some transfer-encoding supported by the receiving server.

> Very bad idea, BTW, since it mangles the mail, which can be a problem with applications like cryptographic signatures. I always turn it off and it was never a problem. In practice (do note I refer to the real world), all SMTP servers accept 8-bit data EVEN IF THEY DO NOT ADVERTISE IT PROPERLY with the 8BITMIME option.

Doing this type of C-T-E change (from 8bit to QP/Base64) automatically at the MTA level may be a bad idea, but doing it in MUAs should not be a problem (that's what end-users choose). With most modern MUAs supporting the MIME standard very well (with notable exceptions being Eudora and some popular web mail services), the 8bit-cleanness of the transport path doesn't matter much for UTF-8 email exchange, as I wrote above.

IMHO, the biggest obstacle to email exchange in UTF-8 is not 7bit-only SMTP but the fact that people don't feel a strong need to switch because they think legacy encodings work just fine for them. (Not many people need to exchange emails in languages other than their native ones, let alone multilingual emails that cross the boundary of legacy encodings.) Another obstacle is that popular web mail services don't support UTF-8 well, incorrectly assuming that there's 'the' invariant mapping between languages and MIME charsets/encodings (e.g. for French, use ISO-8859-15/1 or Windows-1252, for Japanese ISO-2022-JP). Therefore, even though major MUAs have no problem with UTF-8 emails, some people are reluctant to send all their outgoing emails in UTF-8 for fear that their correspondents with web mail accounts won't be able to read them without some 'user-intervention'.

Jungshik Shin
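To make the point about Base64 and QP concrete: both content-transfer-encodings turn UTF-8 bytes into pure 7-bit data that can be decoded losslessly at the other end. The following is a minimal Python sketch using only the standard base64 and quopri modules; the sample Tamil word and the helper name is_seven_bit are assumptions for illustration, not something taken from the thread.

```python
import base64
import quopri


def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte fits in 7 bits."""
    return all(b < 0x80 for b in data)


text = "தமிழ்"                     # sample Tamil word, for illustration only
raw = text.encode("utf-8")          # every byte here has the high bit set

as_base64 = base64.b64encode(raw)   # Content-Transfer-Encoding: base64
as_qp = quopri.encodestring(raw)    # Content-Transfer-Encoding: quoted-printable

print(is_seven_bit(raw))                               # False
print(is_seven_bit(as_base64), is_seven_bit(as_qp))    # True True

# Both encodings are reversible, so the receiving MUA recovers the same text.
print(base64.b64decode(as_base64).decode("utf-8") == text)
print(quopri.decodestring(as_qp).decode("utf-8") == text)
```

Either encoded form passes unharmed through a hop that strips the eighth bit, at the cost of a larger message.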
Re: Documenting in Tamil Computing
At 10:34 AM 12/17/2002 +0100, Stephane Bortzmeyer wrote:

>> There are various extensions and kluges described in various RFCs (ESMTP, MIME, etc.)

> All these extensions are referenced in the same RFC, 2821, which is the authoritative one about SMTP. I do not know of any mainstream SMTP server which does not implement them. The most important for us is 8BITMIME:
>
> | Eight-bit message content transmission MAY be requested of the server by a client using extended SMTP facilities, notably the 8BITMIME extension [20]. 8BITMIME SHOULD be supported by SMTP servers.

There is another RFC, whose number I forget, that defines SHOULD. Essentially it says you must not rely on anyone else actually implementing this feature.

>> but they are not universally implemented at the server transport layer,

> This is absolutely wrong. sendmail, Postfix and qmail have allowed 8-bit transport for a *very* long time.

Well, aside from the fact that those are not the only three pieces of mail transport software by a long shot, this feature is a configurable option, and may not always be turned on.

>> But for arbitrary email from one address to another, you can't rely on it.

> I have been sending Latin-1 (ISO 8859-1) emails for more than ten years (and without using quoted-printable or other similar hacks) to French-speaking people in various parts of the world and I'm still waiting for an actual problem.

> You're playing with words.

Not really - this is very clearly dealt with in an RFC that defines SHOULD and MUST.

> In real life, all SMTP servers support 8-bit mail because all SMTP server authors are aware of the issue (true, it was long and difficult to convince them all but it worked). Any counter-example?

Jungshik Shin wrote:
> Besides, some email servers still don't abide by the ESMTP standard and don't include '8BITMIME' in their response when queried with 'EHLO' although they support 8bit clean transport (as you wrote).

I did a quick survey of mail servers in the .com top level domain about 18 months ago to see which servers implemented 8BITMIME and which didn't. IIRC, about 20% or more did not. As I said earlier, that does not mean 8 bits wouldn't go through anyway if they are modern servers, but you can't rely on that.

I would like to do a wider survey if someone could donate some bandwidth, or maybe someone at W3 who was going to look into this at the time can bring this back to the top of the to-do list (no names, but I am pretty sure he is on this list...:)

Barry Caplan
www.i18n.com
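A survey of this kind comes down to reading each server's EHLO reply and looking for the 8BITMIME keyword. The following is a hedged Python sketch of that check, not the script used for the survey described above; the function names and the sample reply (including the example.com host names) are invented for illustration.

```python
def ehlo_keywords(response: str) -> set:
    """Collect the extension keywords from a multi-line EHLO reply.

    The first line carries the server's greeting, so only the
    following "250-KEYWORD ..." / "250 KEYWORD ..." lines are parsed.
    """
    keywords = set()
    for line in response.splitlines()[1:]:
        if len(line) < 5 or not line.startswith("250"):
            continue
        fields = line[4:].split()
        if fields:
            keywords.add(fields[0].upper())   # keywords are case-insensitive
    return keywords


def advertises_8bitmime(response: str) -> bool:
    """True if the reply lists the 8BITMIME extension."""
    return "8BITMIME" in ehlo_keywords(response)


# Made-up reply from a hypothetical server.
SAMPLE_REPLY = (
    "250-mx.example.com Hello client.example.org\r\n"
    "250-SIZE 10240000\r\n"
    "250-8BITMIME\r\n"
    "250 PIPELINING\r\n"
)

print(advertises_8bitmime(SAMPLE_REPLY))           # True
print(advertises_8bitmime("250 mx.example.com"))   # False
```

Against a live server, Python's smtplib offers the same check through SMTP.ehlo() followed by SMTP.has_extn("8bitmime"). As noted in the thread, a server that does not advertise the extension may still pass 8-bit data, so a negative result shows only what is advertised.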
Re: Documenting in Tamil Computing
At 08:32 PM 12/15/2002 -0500, Jungshik Shin wrote:

>> because Unicode is not mature enough to be used in multilingual email yet. You just have to make do with the 8bit TSCII encoding for Tamil eMail.

> I don't understand what you meant by Unicode not being mature enough to support multilingual emails. Modern email clients like Netscape7/Mozilla, MS Outlook (Express), and Mutt support UTF-8 very well.

Actually, it is not Unicode which is not mature enough. It is SMTP, the core mail transport protocol. It is not 8 bit clean. It is very clear in the RFCs that only 7bit data is allowed over the wire.

There are various extensions and kluges described in various RFCs (ESMTP, MIME, etc.) but they are not universally implemented at the server transport layer, let alone at the client layer.

So Unicode falls into a (very large) class of encodings that are not safe to pass over SMTP because they use 8 bits for the encoding of at least some characters.

This is a well-known problem, and some mail servers do not follow the SMTP RFC exactly in that they do not specifically strip the 8th bit of all data and turn it to 0. If you are lucky and all the mail servers on the path between you and your recipient act this way, then 8 bit data will go through. But for arbitrary email from one address to another, you can't rely on it.

Barry Caplan
www.i18n.com
Documenting in Tamil Computing
fwd: fyi: Below is a copy of a mail I circulated on the subject of Documenting in Tamil Computing

From: "sisrivas [EMAIL PROTECTED]" [EMAIL PROTECTED]
Date: Sun Dec 15, 2002 11:24pm
Subject: Documenting in Tamil Computing

We need to be clear as to the direction that Tamil is going with regard to Tamil computing. I'm writing this again and again as there is some misunderstanding about what font encodings are doing to Tamil computing. (TSC is temporary, TAB is temporary, OldType (alias Bamini) is temporary.)

1/ If you are preparing a Tamil document intended for long term use, you must use Unicode encoding. Any other approach you take can be considered a waste of time if your content is intended for long term use. So do yourself and others a favour: prepare your documents using Tamil Unicode. See item 7 at the URL http://www.gbizg.com/Tamilfonts/ekalappai.htm on how to get Unicode keyboard drivers.

Unfortunately Windows 95 and Windows 98 can only read Unicode pages. You can write in Unicode using Windows NT, 2000, XP and Linux. So what can you do if you only have Windows 95, 98 or 3.1? Well, sorry, you need to use TSC or TAB or even OldType (alias Bamini) encoding. You can assume that the documents you make this way will not be usable in the near future.

Are you going to write a book, are you going to publish some research materials, etc.? Do yourself a favour: use Unicode and nothing else. DO NOT WASTE YOUR TIME. TIME IS PRECIOUS.

2/ Catch-22: You know we all use Tamil eMail and for that we cannot use Unicode. For Tamil eMail we use an 8bit encoding called TSCII. I'm sorry to say that you still need to use this 8 bit encoding (which is not Unicode), because Unicode is not mature enough to be used in multilingual email yet. You just have to make do with the 8bit TSCII encoding for Tamil eMail.

For more info: http://www.geocities.com/avarangal/

Sinnathuirai Srivas
Re: Documenting in Tamil Computing
On Sun, 15 Dec 2002, Avarangal wrote:

> If you are preparing a Tamil document, intended for long term use you must use Unicode Encoding. Any other approach you take can be considered a

Absolutely.

> Unfortunately Windows 95 and Windows 98 can only read Unicode pages. You can write in Unicode using Windows NT, 2000, XP and linux.

Even under Win 9x/ME, there are free and commercial word processors and editors that let you create files in UTF-8 or UTF-16. For instance, yudit (http://www.yudit.org) has supported Tamil (both UTF-8 and TSCII) for over a year now.

> 2/ You know we all use Tamil eMail and for that we can not use Unicode. For Tamil eMail we use 8bit encoding called TSCii. I'm sorry to say that you still need to use this 8 bit encoding (which is not Unicode),

What is 'Tamil eMail'? Is it a web mail service for Tamil?

> because Unicode is not mature enough to be used in multilingual email yet. You just have to make do with the 8bit TSCII encoding for Tamil eMail.

I don't understand what you meant by Unicode not being mature enough to support multilingual emails. Modern email clients like Netscape7/Mozilla, MS Outlook (Express), and Mutt support UTF-8 very well. If you believe in Unicode, there's no reason not to promote UTF-8 right now for email exchange.

Of course, some people relying on **broken** Web mail services that assume a one-to-one relationship between languages and encodings would have trouble reading UTF-8 messages, but that's not the fault of Unicode but of those web mail services. Unfortunately, most web mail services (Hotmail, Yahoo, Lycos, etc.) are broken in that respect. (BTW, I have made a patch to a popular open source web mail program, IMP, to make it support multilingual emails better, but there are still rough edges in my patch.)

Jungshik