Re: Documenting in Tamil Computing

2002-12-17 Thread Stephane Bortzmeyer
On Mon, Dec 16, 2002 at 10:29:14AM -0800,
 Barry Caplan [EMAIL PROTECTED] wrote 
 a message of 23 lines which said:

 Actually, it is not Unicode which is not mature enough. It is SMTP,
 the core mail transport protocol. It is not 8 bit clean. It is very
 clear in the RFCs that only 7bit data is allowed over the wire.

I have to correct this because it may cast serious doubt on the
ability of Internet email to send Unicode files.
 
 There are various extensions and kluges described in various RFCs
 (ESMTP, MIME, etc. )

All these extensions are referenced in the same RFC, 2821, which is
the authoritative one about SMTP. I do not know of any mainstream SMTP
server that does not implement them.

The most important for us is 8BITMIME:

   Eight-bit message content transmission MAY be requested of the server
   by a client using extended SMTP facilities, notably the 8BITMIME
   extension [20].  8BITMIME SHOULD be supported by SMTP servers.
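How a client learns whether a server offers 8BITMIME can be sketched in Python (an illustration, not part of the original thread). The server lists its extensions in the multi-line 250 reply to EHLO; the sample reply and host names below are made up. In practice Python's smtplib does this parsing itself, via SMTP.ehlo() followed by SMTP.has_extn("8bitmime").

```python
def parse_ehlo_keywords(ehlo_reply: str) -> set:
    """Return the upper-cased extension keywords from a multi-line 250 EHLO reply.

    The first line is the server's greeting (its domain name), so it is skipped.
    Each later line looks like '250-KEYWORD params', or '250 KEYWORD params'
    for the final line.
    """
    lines = [line for line in ehlo_reply.splitlines() if line.strip()]
    keywords = set()
    for line in lines[1:]:
        body = line[4:].strip()  # drop the '250-' / '250 ' prefix
        if body:
            keywords.add(body.split()[0].upper())
    return keywords


def advertises_8bitmime(ehlo_reply: str) -> bool:
    return "8BITMIME" in parse_ehlo_keywords(ehlo_reply)


# Hypothetical reply; the host names are placeholders.
SAMPLE_REPLY = """250-mail.example.org Hello client.example.net
250-SIZE 10240000
250-8BITMIME
250 HELP"""

print(sorted(parse_ehlo_keywords(SAMPLE_REPLY)))  # ['8BITMIME', 'HELP', 'SIZE']
print(advertises_8bitmime(SAMPLE_REPLY))           # True
```

A missing keyword only means the client has not been promised 8-bit transport; it does not by itself prove that the server will damage 8-bit data.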

 but they are not universally implemented at the server transport
 layer,

This is absolutely wrong. sendmail, Postfix and qmail have allowed 8-bit
transport for a *very* long time.

 But for arbitrary email from one address to another, you can't rely on it.

I have been sending Latin-1 (ISO 8859-1) email for more than ten years
(without using quoted-printable or other similar hacks) to
French-speaking people in various parts of the world, and I'm still
waiting for an actual problem.




Re: Documenting in Tamil Computing

2002-12-17 Thread Eric Muller


I don't understand what you meant by Unicode not being
mature enough to support multilingual emails. 

Maybe the argument is simply that there are not enough email agents that 
can render Tamil properly from Unicode-encoded text, and that email 
rarely has a useful life that justifies the pain today.

Eric.





8-bit MIME (was: Documenting in Tamil Computing)

2002-12-17 Thread Otto Stolz
Dear all,

Barry Caplan had written:

SMTP [...] is not 8 bit clean. It is very
clear in the RFCs that only 7bit data is allowed over the wire.


Stephane Bortzmeyer wrote:

All these extensions are referenced in the same RFC, 2821, which is
the authoritative one about SMTP.



As of November 2002, RFC 2821 is still a Proposed Standard, and RFC 821
is the Standard Protocol (cf. http://rfc.sunsite.dk/rfc/rfc3300.html).


The most important for us is 8BITMIME:



Section 2.3.1 of RFC 2821, the proposed standard, says:
| The content is textual in nature, expressed using the US-ASCII
| repertoire [1]. Although SMTP extensions (such as 8BITMIME [20])
| may relax this restriction for the content body,

Stephane Bortzmeyer quoted section 2.4 of RFC 2821:
 Eight-bit message content transmission MAY be requested of the server
 by a client using extended SMTP facilities, notably the 8BITMIME
 extension [20].  8BITMIME SHOULD be supported by SMTP servers.

SHOULD definitely does not mean the same thing as MUST.
An SMTP server does not have to support 8-bit MIME mail.

And the remainder of the quoted paragraph requires proper MIME
headers for 8-bit text:
| However, it MUST NOT be construed as authorization to transmit
| unrestricted eight bit material.  8BITMIME MUST NOT be requested
| by senders for material with the high bit on that is not in MIME
| format with an appropriate content-transfer encoding; servers
| MAY reject such messages.
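The rule in that paragraph (request BODY=8BITMIME only for MIME content labelled with a suitable transfer encoding) can be sketched with Python's email package. mail_options_for is a hypothetical helper: it looks only at the top-level headers of a single-part message, and an empty result means the caller should re-encode the body as quoted-printable or Base64 before relaying. The returned list has the shape accepted by the mail_options argument of smtplib.SMTP.sendmail() and send_message().

```python
from email.message import EmailMessage


def mail_options_for(msg: EmailMessage, server_has_8bitmime: bool) -> list:
    """Hypothetical helper: ESMTP MAIL FROM options for a single-part message.

    BODY=8BITMIME is requested only if the message is MIME (has MIME-Version),
    declares Content-Transfer-Encoding: 8bit, and the server advertised the
    8BITMIME extension.
    """
    is_mime = msg.get("MIME-Version") is not None
    cte = str(msg.get("Content-Transfer-Encoding", "7bit")).strip().lower()
    if is_mime and cte == "8bit" and server_has_8bitmime:
        return ["BODY=8BITMIME"]
    return []


msg = EmailMessage()
msg["Subject"] = "Essai"
msg.set_content("Un élève français\n", cte="8bit")  # also adds MIME-Version: 1.0

print(msg["Content-Transfer-Encoding"])                  # 8bit
print(mail_options_for(msg, server_has_8bitmime=True))   # ['BODY=8BITMIME']
print(mail_options_for(msg, server_has_8bitmime=False))  # []
```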

Barry Caplan had written:

But for arbitrary email from one address to another, you can't rely on it.


Stephane Bortzmeyer wrote:

I have been sending Latin-1 (ISO 8859-1) email for more than ten years
(without using quoted-printable or other similar hacks) to
French-speaking people in various parts of the world, and I'm still
waiting for an actual problem.


Mere luck, I'd say, but no proof at all.

I have seen many messages, originally in ISO-8859-1-encoded French,
that got the high-bit of every accented character chopped off, thus
replacing é with i, î with n, and so forth. And even more mail
in German, distorted in a similar way. This prompted an entry in
my E-Mail FAQ: http://www.systems.uni-konstanz.de/EMAIL/FAQ.php#SMTP-73.
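The damage described above is easy to reproduce: a relay that is not 8-bit clean clears the top bit of every byte. A small Python simulation (the function names are illustrative only):

```python
def strip_high_bit(data: bytes) -> bytes:
    """Simulate a 7-bit-only relay that clears bit 7 of every byte."""
    return bytes(b & 0x7F for b in data)


def mangle_latin1(text: str) -> str:
    """Encode as ISO-8859-1, pass through the 7-bit relay, read back as ASCII."""
    return strip_high_bit(text.encode("iso-8859-1")).decode("ascii")


print(mangle_latin1("é"), mangle_latin1("î"))  # i n   (0xE9 -> 0x69, 0xEE -> 0x6E)
print(mangle_latin1("élève"))                   # ilhve
```

Because each accented Latin-1 letter sits at 0x80 plus an ASCII code, the result is a plausible-looking wrong letter rather than an obvious error.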

Of course, more and more SMTP servers support 8-bit MIME, and many
take the pains to transform 8-bit MIME to some transfer-encoding
supported by the receiving server. If you are located behind a server
that recodes your 8-bit mail, you cannot claim that 8-bit mail is
supported everywhere; you can only claim that your server compensates
for the incompatibility of your MUA and the world at large.
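The conversion mentioned above (a server re-encoding an 8-bit body into a 7-bit transfer encoding for a next hop that did not offer 8BITMIME) can be sketched with Python's standard quopri module. A real MTA would also rewrite the Content-Transfer-Encoding header to quoted-printable; that step is omitted in this sketch.

```python
import quopri


def downgrade_to_qp(body_8bit: bytes) -> bytes:
    """Re-encode an 8-bit body as quoted-printable so every byte is 7-bit."""
    return quopri.encodestring(body_8bit)


original = "Un élève à l'école\n".encode("iso-8859-1")
converted = downgrade_to_qp(original)

print(converted)                                   # accented bytes become =XX escapes
print(all(b < 0x80 for b in converted))            # True: safe for a 7-bit hop
print(quopri.decodestring(converted) == original)  # True: the receiver recovers the bytes
```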

Best wishes,
  Otto Stolz





Re: 8-bit MIME (was: Documenting in Tamil Computing)

2002-12-17 Thread Stephane Bortzmeyer
On Tue, Dec 17, 2002 at 01:28:00PM +0100,
 Otto Stolz [EMAIL PROTECTED] wrote 
 a message of 65 lines which said:

 As of November 2002, RFC 2821 is still a Proposed Standard, and RFC 821
 is the Standard Protocol (cf. http://rfc.sunsite.dk/rfc/rfc3300.html).

For those on the mailing list not versed in IETF language, let me add
that most Internet protocols are just Proposed Standards: it takes a
lot of time to move to a higher level. (RFC 2821 is more than 18
months old.) Anyway, 8-bit MIME was already possible with RFC 821; the
difference was just editorial (RFC 2821 is easier to read since you do
not need to patch it with the many RFCs that followed it.)

 SHOULD definitely does not mean the same thing as MUST.
 An SMTP server does not have to support 8-bit MIME mail.

You're playing with words. In real life, all SMTP servers support
8-bit mail because all SMTP server authors are aware of the issue
(true, it was long and difficult to convince them all, but it
worked). Any counter-example?
 
 I have seen many messages, originally in ISO-8859-1-encoded French,
 that got the high-bit of every accented character chopped off, thus
 replacing é with i, î with n, and so forth. 

Last time I saw such problems was something like ten years ago. It was
almost never the fault of the SMTP server, but of some programs on the
destination machine (or sometimes the faults of funny gateways like
X400 servers, something you cannot blame on the Internet).

 Of course, more and more SMTP servers support 8-bit MIME, 

All implementations already support 8-bit MIME. Some servers have
not been upgraded yet but it is uncommon. (Remember we are talking
about a move which occurred many years ago: even if many system
administrators do not upgrade their software, in the long term,
machines are replaced and new software catches on.)

 take the pains to transform 8-bit MIME to some transfer-encoding
 supported by the receiving server. 

Very bad idea, BTW, since it mangles the mail, which can be a problem
with applications like cryptographic signatures. I always turn it off
and it was never a problem. In practice (do note I refer to the real
world), all SMTP servers accept 8-bit mail EVEN IF THEY DO NOT ADVERTISE
IT PROPERLY with the 8BITMIME option.
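The objection about signatures can be illustrated in a few lines of Python. A plain SHA-256 digest stands in for a real signature here (an assumption made to keep the sketch self-contained): after a relay re-encodes the body, the bytes differ, so anything computed over the original bytes no longer matches, even though the decoded text is unchanged.

```python
import hashlib
import quopri

body = "Signé: Stéphane\n".encode("iso-8859-1")
relayed = quopri.encodestring(body)  # what a converting relay might pass on

same_digest = hashlib.sha256(body).digest() == hashlib.sha256(relayed).digest()
same_text = quopri.decodestring(relayed) == body

print(same_digest)  # False: a check over the raw body would fail
print(same_text)    # True: only the transfer encoding changed
```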

Back to Unicode: why does nobody use UTF-7? Precisely because it is no
longer necessary.
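UTF-7 (RFC 2152) encodes Unicode using only 7-bit ASCII bytes, which is why it once looked attractive for mail. Python ships a utf-7 codec, so the contrast with UTF-8 can be checked directly; the Tamil sample word is chosen only for illustration.

```python
def is_7bit(data: bytes) -> bool:
    return all(b < 0x80 for b in data)


word = "தமிழ்"  # "Tamil", written in Tamil script

as_utf7 = word.encode("utf-7")
as_utf8 = word.encode("utf-8")

print(as_utf7, is_7bit(as_utf7))  # pure ASCII: survives a 7-bit path untouched
print(as_utf8, is_7bit(as_utf8))  # high bytes: needs 8BITMIME, QP, or Base64
```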






Re: 8-bit MIME (was: Documenting in Tamil Computing)

2002-12-17 Thread Jungshik Shin

On Tue, 17 Dec 2002, Stephane Bortzmeyer wrote:

 On Tue, Dec 17, 2002 at 01:28:00PM +0100,
  Otto Stolz [EMAIL PROTECTED] wrote

  I have seen many messages, originally in ISO-8859-1-encoded French,
  that got the high-bit of every accented character chopped off, thus
  replacing é with i, î with n, and so forth.

  When was the last time you saw this?

 Last time I saw such problems was something like ten years ago. It was
 almost never the fault of the SMTP server, but of some programs on the
 destination machine (or sometimes the faults of funny gateways like
 X400 servers, something you cannot blame on the Internet).

  Although I agree that 8BITMIME is implemented and deployed
very widely these days (it's been more than two years since I received
garbled email due to a 7-bit-only path, and I receive tens of emails in
8-bit encodings every day), I'm afraid your experience is unusual if the
last time you received email with the MSB stripped off was ten years ago.
While trying to counter the exaggeration made against the ability of
Internet email to transport UTF-8, you may have gone to the other
extreme. In 1992, sendmail 4.x/5.x transported more than half of the
Internet email, and those versions were not 8-bit clean. That's why
RFC 1468 and RFC 1557 were written in 1993 for Japanese and Korean email
exchange in 7-bit ISO-2022-JP and ISO-2022-KR, respectively. (In the case
of ISO-2022-JP, there was another important reason: there are two major
encodings used for Japanese, Shift_JIS on DOS/Windows/Mac and EUC-JP on
Unix.) As late as 1999, I did receive MSB-stripped email that had not
gone through any non-SMTP gateway (e.g. X400). Back then, some mail
servers still used 7-bit-only sendmail 4.x/5.x (on old SunOS 4.x,
AIX 3.x/4.x, HP-UX 8.x, IRIX, etc. machines), old versions of PMDF (on
old VMS machines) and smail (on some Unix machines), while 8-bit-clean
sendmail 8.6.x and later had been around since the mid-1990s.
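The 7-bit property of the ISO-2022 encodings mentioned above, and the fact that Shift_JIS and EUC-JP produce different bytes for the same Japanese text, can be checked with Python's built-in CJK codecs. The sample words are arbitrary.

```python
def is_7bit(data: bytes) -> bool:
    return all(b < 0x80 for b in data)


SAMPLES = {
    "日本語": ["iso2022_jp", "shift_jis", "euc_jp"],  # "Japanese language"
    "한국어": ["iso2022_kr", "euc_kr"],               # "Korean language"
}

encoded = {}
for text, codecs_to_try in SAMPLES.items():
    for codec in codecs_to_try:
        data = text.encode(codec)
        encoded[codec] = data
        print(f"{codec:10} 7-bit={is_7bit(data)!s:5} {data!r}")
```

Only the ISO-2022 forms stay below 0x80, which is what made them safe for the 7-bit sendmail paths of the time.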

Besides, some email servers still don't
abide by the ESMTP standard and don't include '8BITMIME' in their response
when queried with 'EHLO', although they support 8-bit clean transport
(as you wrote).

Nonetheless, I agree that these days most mail transport paths are 8-bit
clean. Even if not, Base64 and QP (I don't regard them as hacks as you do)
are well supported by most modern MUAs, so end users have little
trouble exchanging email in UTF-8 (or other legacy 8-bit encodings).
Most of them don't have to care whether 8BITMIME is used in transit
or which C-T-E is used: 8bit, QP, or Base64.
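The point that the choice of transfer encoding is invisible to the user can be sketched with Python's email package: the same UTF-8 Tamil text is packaged with each C-T-E, serialized as it would travel, parsed back as a receiving MUA would, and decoded. The roundtrip helper is illustrative and not part of any mail client.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

TEXT = "வணக்கம்"  # "Hello" in Tamil


def roundtrip(cte: str):
    """Return (wire bytes, decoded text) for TEXT sent with the given C-T-E."""
    msg = EmailMessage()
    msg.set_content(TEXT, charset="utf-8", cte=cte)  # set_content() appends a newline
    wire = msg.as_bytes()  # serialized form that would cross SMTP
    parsed = message_from_bytes(wire, policy=policy.default)
    return wire, parsed.get_content()


for cte in ("8bit", "quoted-printable", "base64"):
    wire, text = roundtrip(cte)
    seven_bit = all(b < 0x80 for b in wire)
    print(f"{cte:16} 7-bit on the wire: {seven_bit!s:5} decoded OK: {text.rstrip() == TEXT}")
```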


  take the pains to transform 8-bit MIME to some transfer-encoding
  supported by the receiving server.

 Very bad idea, BTW, since it mangles the mail, which can be a problem
 with applications like cryptographic signatures. I always turn it off
 and it was never a problem. In practice (do note I refer to the real
 world), all SMTP servers accept 8-bit mail EVEN IF THEY DO NOT ADVERTISE
 IT PROPERLY with the 8BITMIME option.

  Doing this type of C-T-E change (from 8bit to QP/Base64)
automatically at the MTA level may be a bad idea, but doing it in
MUAs should not be a problem (that's what end users choose). With most
modern MUAs supporting the MIME standard very well (notable exceptions
being Eudora and some popular web mail services), the 8-bit cleanness
of the transport path doesn't matter much for UTF-8 email exchange,
as I wrote above.

 IMHO, the biggest obstacle to email exchange in UTF-8 is not
7-bit-only SMTP but the fact that people don't feel a strong need to
switch because they think legacy encodings work just fine for them
(not many people need to exchange email in languages other than their
native ones, let alone multilingual email that crosses the boundaries of
legacy encodings). Another obstacle is that popular web mail services don't
support UTF-8 well, incorrectly assuming that there's 'the' invariant
mapping between a language and a MIME charset/encoding (e.g. for French,
ISO-8859-15/1 or Windows-1252; for Japanese, ISO-2022-JP). Therefore,
even though major MUAs have no problem with UTF-8 email, some people
are reluctant to send all their outgoing email in UTF-8 for fear that
their correspondents with web mail accounts won't be able to read it
without some 'user intervention'.
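What happens when a web mail front end ignores the declared charset and assumes a fixed legacy one for the language can be shown in Python. The assumption of ISO-8859-1 here is hypothetical, picked because it decodes any byte sequence without error.

```python
word = "தமிழ்"                     # sent with Content-Type: text/plain; charset=utf-8
body = word.encode("utf-8")

wrong = body.decode("iso-8859-1")  # charset guessed from the "language"
right = body.decode("utf-8")       # charset taken from the MIME header

print(repr(wrong))    # mojibake: Latin-1 letters and control characters
print(right == word)  # True
```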

 Jungshik Shin





Re: Documenting in Tamil Computing

2002-12-17 Thread Barry Caplan
At 10:34 AM 12/17/2002 +0100, Stephane Bortzmeyer wrote:
 There are various extensions and kluges described in various RFCs
 (ESMTP, MIME, etc. )

All these extensions are referenced in the same RFC, 2821, which is
the authoritative one about SMTP. I do not know of any mainstream SMTP
server that does not implement them.

The most important for us is 8BITMIME:

   Eight-bit message content transmission MAY be requested of the server
   by a client using extended SMTP facilities, notably the 8BITMIME
   extension [20].  8BITMIME SHOULD be supported by SMTP servers.


There is another RFC, whose number I forget, that defines SHOULD. Essentially it 
says you must not rely on anyone else actually implementing this feature.


 but they are not universally implemented at the server transport
 layer,

This is absolutely wrong. sendmail, Postfix and qmail have allowed 8-bit
transport for a *very* long time.

Well, aside from the fact that those are not the only pieces of mail transport software by 
a long shot, this feature is a configurable option, and may not always be turned 
on.


 But for arbitrary email from one address to another, you can't rely on it.

I have been sending Latin-1 (ISO 8859-1) email for more than ten years
(without using quoted-printable or other similar hacks) to
French-speaking people in various parts of the world, and I'm still
waiting for an actual problem.

You're playing with words. 

Not really - this is very clearly dealt with in an RFC that defines SHOULD and 
MUST.


In real life, all SMTP servers support 
8-bit mail because all SMTP server authors are aware of the issue 
(true, it was long and difficult to convince them all, but it 
worked). Any counter-example?

Jungshik Shin wrote: 
Besides, some email servers still don't 
abide by the ESMTP standard and don't include '8BITMIME' in their response 
when queried with 'EHLO', although they support 8-bit clean transport 
(as you wrote).

I did a quick survey of mail servers in the .com top level domain about 18 months ago 
to see which servers implemented 8bitmime and which didn't.  IIRC, about 20% or more 
did not. As I said earlier, that does not mean 8 bits wouldn't go through anyway if 
they are modern servers, but you can't rely on that.
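A survey like the one described above could be scripted with Python's smtplib. This is a sketch under several assumptions: the list of hosts (the MX names for each domain) must be obtained separately, since the standard library has no MX lookup; outbound port 25 may be blocked on many networks; and a server that does not advertise 8BITMIME may still pass 8-bit data. probe_8bitmime and tally are hypothetical helper names.

```python
import smtplib


def probe_8bitmime(mx_host: str, timeout: float = 10.0):
    """Return True/False for whether mx_host advertises 8BITMIME, or None on error."""
    try:
        with smtplib.SMTP(mx_host, 25, timeout=timeout) as server:
            server.ehlo()  # populates the server's ESMTP feature list
            return server.has_extn("8bitmime")
    except (OSError, smtplib.SMTPException):
        return None


def tally(results: dict) -> dict:
    """Count hosts that advertised 8BITMIME, did not, or could not be reached."""
    summary = {"advertised": 0, "not_advertised": 0, "unreachable": 0}
    for outcome in results.values():
        if outcome is None:
            summary["unreachable"] += 1
        elif outcome:
            summary["advertised"] += 1
        else:
            summary["not_advertised"] += 1
    return summary


# Live use (hypothetical host names; needs network access to port 25):
# results = {h: probe_8bitmime(h) for h in ["mx1.example.com", "mx2.example.net"]}
print(tally({"mx1.example.com": True, "mx2.example.net": False, "mx3.example.org": None}))
# {'advertised': 1, 'not_advertised': 1, 'unreachable': 1}
```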

I would like to do a wider survey if someone could donate some bandwidth, or maybe 
someone at W3 who was going to look into this at the time can bring this back to the top 
of the to-do list (no names, but I am pretty sure he is on this list...:)

Barry Caplan
www.i18n.com





Re: Documenting in Tamil Computing

2002-12-16 Thread Barry Caplan
At 08:32 PM 12/15/2002 -0500, Jungshik Shin wrote:
 because
 Unicode is not mature enough to be used in multilingual email yet.
 You just have to make do with the 8bit TSCII encoding for Tamil eMail.

  I don't understand what you meant by Unicode not being
mature enough to support multilingual emails. Modern email clients like
Netscape7/Mozilla, MS Outlook (Express), and Mutt support UTF-8 very well.


Actually, it is not Unicode which is not mature enough. It is SMTP, the core mail 
transport protocol. It is not 8 bit clean. It is very clear in the RFCs that only 7bit 
data is allowed over the wire. 

There are various extensions and kluges described in various RFCs (ESMTP, MIME, etc. ) 
but they are not universally implemented at the server transport layer, let alone at 
the client layer.

So Unicode falls into a (very large) class of encodings that are not safe to pass over 
SMTP because they use 8 bits for the encoding of at least some characters.
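That Unicode text needs 8-bit bytes is easy to quantify for Tamil in UTF-8: every Tamil code point (block U+0B80 to U+0BFF) becomes three bytes, all with the high bit set. A short Python check:

```python
word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # தமிழ் ("Tamil"), 5 code points
data = word.encode("utf-8")

high = sum(1 for b in data if b & 0x80)
print(len(word), len(data), high)  # 5 15 15: every byte needs an 8-bit-clean path
```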

This is a well-known problem, and some mail servers do not follow the SMTP RFC exactly, 
in that they do not specifically strip the 8th bit of all data and set it to 0. If 
you are lucky and all the mail servers on the path between you and your recipient act 
this way, then 8-bit data will go through.

But for arbitrary email from one address to another, you can't rely on it.

Barry Caplan
www.i18n.com





Documenting in Tamil Computing

2002-12-15 Thread Avarangal



fwd:
FYI: below is a copy of a mail I circulated on the subject of
Documenting in Tamil Computing.

From: "sisrivas [EMAIL PROTECTED]" [EMAIL PROTECTED]
Date: Sun Dec 15, 2002 11:24pm
Subject: Documenting in Tamil Computing

We need to be clear about the direction Tamil is taking with regard
to Tamil computing. I'm writing this again and again because there is some
misunderstanding about what font encodings are doing to Tamil computing.
(TSC is temporary, TAB is temporary, OldType (alias Bamini) is temporary.)

1/ If you are preparing a Tamil document intended for long-term use, you
must use Unicode encoding. Any other approach can be considered a
waste of time if your content is intended for long-term use. So do yourself
and others a favour: prepare your documents using Tamil Unicode.

see item 7 at the URL http://www.gbizg.com/Tamilfonts/ekalappai.htm 
on how to get Unicode keyboard drivers.

Unfortunately Windows 95 and Windows 98 can only read Unicode pages. You 
can write in Unicode using Windows NT, 2000, XP and linux.

So what can you do if you only have Windows 95, 98 or 3.1? Well, sorry,
you need to use TSC or TAB or even OldType (alias Bamini) encoding. You can
assume that the documents you make will not be usable in the near
future.

Are you going to write a book? Are you going to publish some research
materials, etc.? Do yourself a favour: use Unicode and nothing else.


DO NOT WASTE YOUR TIME. TIME IS PRECIOUS.

2/ Catch-22

You know we all use Tamil eMail, and for that we cannot use Unicode. For
Tamil eMail we use an 8-bit encoding called TSCII. I'm sorry to say that you
still need to use this 8-bit encoding (which is not Unicode), because
Unicode is not mature enough to be used in multilingual email yet.

You just have to make do with the 8bit TSCII encoding for Tamil 
eMail.

For more info: http://www.geocities.com/avarangal/

Sinnathuirai Srivas



Re: Documenting in Tamil Computing

2002-12-15 Thread Jungshik Shin



On Sun, 15 Dec 2002, Avarangal wrote:

 If you are preparing a Tamil document, intended for long term use you must
 use Unicode Encoding. Any other approach you take can be considered a

  Absolutely.

 Unfortunately Windows 95 and Windows 98 can only read Unicode pages.
 You can write in Unicode using Windows NT, 2000, XP and linux.

  Even under Win 9x/ME, there are free and commercial word processors
and editors that let you make files in UTF-8 or UTF-16. For instance,
yudit (http://www.yudit.org) has supported Tamil (both UTF-8 and TSCII)
for over a year now.
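Saving Tamil text in UTF-8 or UTF-16 can be sketched in Python; the file names are placeholders, and a temporary directory is used so the sketch leaves nothing behind.

```python
import tempfile
from pathlib import Path

word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # தமிழ் ("Tamil")

utf8 = word.encode("utf-8")
utf16 = word.encode("utf-16")  # Python's "utf-16" codec prepends a byte order mark
print(len(utf8), len(utf16))   # 15 12

with tempfile.TemporaryDirectory() as folder:
    for encoding in ("utf-8", "utf-16"):
        path = Path(folder) / f"tamil-{encoding}.txt"
        path.write_text(word, encoding=encoding)
        assert path.read_text(encoding=encoding) == word  # round-trips unchanged
```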


 2/

 You know we all use Tamil eMail and for that we can not use Unicode.
 For Tamil eMail we use 8bit encoding called TSCii. I'm sorry to say that
 you still need to use this 8 bit encoding (which is not Unicode),

 What is 'Tamil eMail'? Is it a web mail service for Tamil?

 because
 Unicode is not mature enough to be used in multilingual email yet.
 You just have to make do with the 8bit TSCII encoding for Tamil eMail.

  I don't understand what you meant by Unicode not being
mature enough to support multilingual emails. Modern email clients like
Netscape7/Mozilla, MS Outlook (Express), and Mutt support UTF-8 very well.
If you believe in Unicode, there's no reason not to promote UTF-8 right
now for email exchange. Of course, some people relying on **broken**
web mail services that assume a one-to-one relationship
between languages and encodings would have trouble reading UTF-8
messages, but that's not the fault of Unicode but of those web mail
services. Unfortunately, most web mail services (Hotmail, Yahoo, Lycos,
etc.) are broken in that respect. (BTW, I have made a patch to a popular
open-source web mail program, IMP, to make it better support multilingual
emails, but there are still rough edges in my patch.)

  Jungshik