Re: Is there Unicode mail out there?

2001-07-12 Thread DougEwell2

In a message dated 2001-07-11 15:03:27 Pacific Daylight Time, 
[EMAIL PROTECTED] writes:

  One exception to this should be US-ASCII, because not only is the repertoire
  of US-ASCII a subset of the repertoire of UTF-8, but the
  representation of every US-ASCII character is also identical in UTF-8.
  A smart mail client would notice that all characters
  are in the US-ASCII repertoire and label outgoing messages as
  US-ASCII EVEN if it's configured to label outgoing messages
  as UTF-8
[...]

I thought this might even be enshrined in an RFC.  It certainly makes sense.  
If you are using a mailer that sends CP1252 down the wire (not that this is a 
good idea, but some mailers do this), the mailer should examine the message 
and if it only contains US-ASCII characters, the message should be tagged as 
US-ASCII.  Otherwise, if it only contains ISO 8859-1, it should be tagged as 
ISO 8859-1.  Only if it actually contains CP1252 characters, like smart 
quotes or long dashes, should it be tagged as CP1252.  As Jungshik observed, 
the same goes for UTF-8.
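Doug's rule amounts to trying candidate charsets in order of increasing repertoire and labelling the message with the first one that can encode every character. A minimal Python sketch, assuming Python's built-in codecs stand in for a mailer's charset tables; the function name `minimal_charset` and the candidate list are illustrative, not taken from any RFC or product:

```python
# Candidate charsets, smallest repertoire first (an illustrative choice).
CANDIDATES = ["us-ascii", "iso-8859-1", "windows-1252", "utf-8"]


def minimal_charset(body: str, candidates=CANDIDATES) -> str:
    """Return the first charset in `candidates` that can encode all of `body`."""
    for charset in candidates:
        try:
            body.encode(charset)
        except UnicodeEncodeError:
            continue  # at least one character lies outside this repertoire
        return charset
    raise ValueError("no candidate charset can represent the message")


print(minimal_charset("Hello"))                 # us-ascii
print(minimal_charset("caf\u00e9"))             # iso-8859-1
print(minimal_charset("\u201cquoted\u201d"))    # windows-1252 (smart quotes)
print(minimal_charset("\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35"))  # utf-8 (Thai)
```

Checking ISO 8859-1 before Windows-1252 means a message is labelled CP1252 only when it really uses characters such as smart quotes or the Euro sign, which is the order Doug gives.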

-Doug Ewell
 Fullerton, California




Re: Is there Unicode mail out there?

2001-07-12 Thread James Kass

Please disregard my previous message about a work-around
for the Outlook Express problem.

Although it works, non-UTF-8 messages are no longer being
properly displayed, an unacceptable trade-off.

Another possibility which was tested was to add an innocuous
character which isn't included in any code page to the
signature.  Tried the zero-width space.  When copying the
zero-width space into the signature of a message being sent
in reply to a message encoded as Thai (Windows), Outlook
Express prompted "Send as Unicode..." when the letter
was tagged to be sent later.  So far, so good.

Figured it would be possible to set up a signature with
ZWS to eliminate the need to manually change the
encoding to UTF-8 every time a message is sent.
Unfortunately, on Windows Me, the
signature information is stored in the Registry, and it's ASCII.
So, the ZWS got converted to a question mark and doesn't
get switched back when it's added to a message.

So, tried setting up a signature file to be added to each
outgoing message including the ZWS.  In this case, MS OE
displays the UTF-8 ZWS as mojibake (gibberish) when the
signature is added to the outgoing message.

Perhaps a future version of Outlook will correct the
problem.

Best regards,

James Kass.






Re: Is there Unicode mail out there?

2001-07-12 Thread James

[EMAIL PROTECTED] wrote:
 
 In a message dated 2001-07-11 15:03:27 Pacific Daylight Time,
 [EMAIL PROTECTED] writes:
 
   One exception to this should be US-ASCII, because not only is the repertoire
   of US-ASCII a subset of the repertoire of UTF-8, but the
   representation of every US-ASCII character is also identical in UTF-8.
   A smart mail client would notice that all characters
   are in the US-ASCII repertoire and label outgoing messages as
   US-ASCII EVEN if it's configured to label outgoing messages
   as UTF-8
 [...]
 
 I thought this might even be enshrined in an RFC.  It certainly makes sense.
 If you are using a mailer that sends CP1252 down the wire (not that this is a
 good idea, but some mailers do this), the mailer should examine the message
 and if it only contains US-ASCII characters, the message should be tagged as
 US-ASCII. 

The RFCs/BCPs do encourage using as minimal a charset as possible.

Anyway, UTF-8 email is nowhere right now. Kat Momoi of Netscape has suggested
that about the only way this could change is if email client vendors turn it
on by default in new product releases. I won't be the first!

Having done a lot of email client programming using the RFCs as a basis,
let me say that in general RFCs are vague, and not always the best practice
for interoperability when it comes to email.

For example, CRLF in message bodies is recommended, but it actually reduces
interoperability, particularly with some sub-versions of IE 5. So I don't know
of any email client that does it. And quoted-printable is way too
complicated to expect conforming implementations.
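For readers who haven't met it, here is what quoted-printable does to 8-bit text, using Python's standard `quopri` module as a reference encoder. It only illustrates the format; it says nothing about how any particular mail client implements it:

```python
import quopri

# UTF-8 bytes for a short line: each byte above 0x7E becomes an =XX escape,
# and a literal "=" must itself be escaped as =3D.
raw = "caf\u00e9 costs 1+1=2".encode("utf-8")
encoded = quopri.encodestring(raw)
print(encoded)  # b'caf=C3=A9 costs 1+1=3D2'

# Decoding restores the original bytes exactly.
assert quopri.decodestring(encoded) == raw
```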

And don't get me started about all the random charsets that RFCs promote that
nobody adopts!

James.




Re: Is there Unicode mail out there?

2001-07-12 Thread James Kass

Here's a work-around that seems to work.

Added the ZWS after the signature in a signature file.
Because the mojibake for ZWS includes the Euro
currency symbol, OE prompts to 'send as Unicode'
when replying to a non-UTF-8 sender.
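The Euro sign can be reproduced directly: U+200B ZERO WIDTH SPACE is the three bytes E2 80 8B in UTF-8, and when those bytes are misread as Windows-1252 the middle byte 0x80 becomes the Euro sign. A minimal Python check:

```python
zws = "\u200b"                          # ZERO WIDTH SPACE
utf8_bytes = zws.encode("utf-8")        # b'\xe2\x80\x8b'
mojibake = utf8_bytes.decode("cp1252")  # the same bytes misread as Windows-1252
print(mojibake)                         # â€‹
print("\u20ac" in mojibake)             # True: the Euro sign is in there
```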

Of course, the time saved by not having to manually
change the encoding will probably be less than the
time lost explaining what the junk is under my name.

Best regards,

James Kass.
​






Re:

2001-07-12 Thread akerbeltz.alba


 大家好!!! ← ふりがなください (furigana, please)
 
 Is that like だいすき? No, だいすき is 大好き.
 
 Something like da jia hao in Mandarin but with appropriate Chinese
 tones. こんにちは in Japanese, I gather.

Dunno about Mandarin, but it's Daaih Gā Hóu in Cantonese. Bit formal though
...How about 你哋點阿? ("How are you all?") ... just kidding  : )

Michael





Re: Is there Unicode mail out there?

2001-07-12 Thread Jungshik Shin




On Thu, 12 Jul 2001, James Kass wrote:

 Here's a work-around that seems to work.

 Added the ZWS after the signature in a signature file.
 Because the mojibake for ZWS includes the Euro
 currency symbol, OE prompts to 'send as Unicode'
 when replying to a non-UTF-8 sender.

  What's mysterious is why this prompt (from MS OE) did not appear for Mike
Ayers when he replied to Peter's message containing a Thai string in
Windows-874 and added some Chinese characters, whereas the MS OE (5.50.x)
I tried certainly prompted me to pick one of three options (1. send as
Unicode, 2. send as is, in Windows-874, risking loss of information,
3. cancel) when I did the same thing. ZWS and Chinese characters have no
reason to be treated differently when added to a Windows-874 encoded message.

  BTW, Mozilla/Netscape 6 also uses, by default, the encoding of the message
you're replying to (or its closest match among IANA-registered MIME
charsets; in place of Windows-874, for example, Mozilla/Netscape 6 uses
TIS-620). When one adds characters outside the repertoire of that
encoding, it warns that some characters cannot be represented in the
current encoding and that the encoding must be changed to one that can
represent all of them (it does not suggest Unicode). It offers two
options: go ahead despite the potential loss of some characters, or
cancel and change the encoding.
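Both prompts come down to one check: can every character of the reply be encoded in the charset it is about to be sent in? A minimal Python sketch of that check, using Python's `cp874` codec for Windows-874; the helper name `unrepresentable` is hypothetical. Note that it treats the ZWS and the Chinese characters identically:

```python
def unrepresentable(text: str, charset: str) -> list[str]:
    """List the distinct characters of `text` that `charset` cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# Thai text, then two Chinese characters and a zero-width space.
reply = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35 \u4e2d\u6587\u200b"
print(unrepresentable(reply, "cp874"))  # the two Chinese characters and the ZWS
```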

  Perhaps both Mozilla/Netscape 6 and MS OE should have an option
('toggle-switchable') to let users specify that their preferred encoding
(set in preferences) be used by default regardless of the encoding of
the messages they're replying to.
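Such a toggle could sit on top of the current default (reply in the original charset) and the fallback both clients already have. A sketch in Python; the function and parameter names are hypothetical, not settings that exist in either product, and where this returns UTF-8 a real client would warn or prompt:

```python
def reply_charset(text: str, original: str, preferred: str = "utf-8",
                  always_use_preferred: bool = False) -> str:
    """Pick the charset for an outgoing reply.

    `always_use_preferred` models the proposed toggle: when it is set, the
    user's preferred charset wins regardless of the message being replied to.
    """
    if always_use_preferred:
        return preferred
    try:
        text.encode(original)  # default: keep the original message's charset
        return original
    except UnicodeEncodeError:
        return "utf-8"         # fall back rather than silently losing characters


thai = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35"
print(reply_charset(thai, "cp874"))                              # cp874
print(reply_charset(thai + " \u4e2d\u6587", "cp874"))            # utf-8
print(reply_charset(thai, "cp874", always_use_preferred=True))   # utf-8
```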

   Jungshik Shin





Re: Is there Unicode mail out there?

2001-07-12 Thread Peter_Constable


  Hmm, it didn't work either.
OK, one more try -- Thai test, take 3: กลัปมาอยู่แล้ว


- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485​
E-mail: [EMAIL PROTECTED]




Re: ふりがなください

2001-07-12 Thread foster . feng

'大家好!!!' is the most common Chinese greeting!

'大家' means 'EVERYBODY', '好' means 'GOOD', and '大家好!!!' means 'HOPE
EVERYBODY'S DOING GOOD'.

It is something like '皆さん、ご機嫌いかがでしょうか' ('How is everyone doing?') in Japanese.

Re: A UTF-8 based News Service

2001-07-12 Thread DougEwell2

In a message dated 2001-07-12 8:27:20 Pacific Daylight Time, 
[EMAIL PROTECTED] writes:

  As someone involved in the service I often wish there was some
  form of compressed Unicode encoding.  The 3-byte penalty that
  Ethiopic bears under UTF-8 turns into higher bandwidth that web
  hosting services meter and charge for by the megabyte.  For a
  popular site this soon makes UTF-8 a costly option to support.

  A system analogous to iso-8859-x whereby Ethiopic and other scripts
  in the 3-byte range could be shifted back into the 2-byte range
  might help (generally only English and Ethiopic are desired together).

Today is your lucky day.  Check out Unicode Technical Standard #6, A 
Standard Compression Scheme for Unicode:

http://www.unicode.org/unicode/reports/tr6/

SCSU uses 128-character windows to compress small alphabetic scripts to almost 1
byte per character.  Since Ethiopic occupies three 128-character half-blocks,
SCSU must use three windows and switch between them, but the output is
still much smaller than in UTF-8.  In the worst case (each character belongs to a
different half-block than the one before), you will still use only 2 bytes
per character.
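These figures can be checked with a rough byte-count model of SCSU's single-byte mode. This is a simplified estimate under stated assumptions, not an SCSU encoder: ASCII printables and TAB/LF/CR cost one byte; every other character belongs to the 128-character window starting at a multiple of 0x80; defining a window (SDn tag plus offset byte) costs two bytes the first time; switching to an already-defined window (SCn) costs one byte; a character in the active window costs one byte. It ignores SCSU's predefined windows, the eight-window limit, other control characters, and Unicode mode.

```python
# Bytes that pass through unchanged in SCSU single-byte mode (simplified).
ASCII_ONE_BYTE = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def scsu_estimate(text: str) -> int:
    """Rough SCSU size in bytes under the simplified model described above."""
    defined = set()  # window indices (code point // 0x80) defined so far
    active = None    # index of the currently active dynamic window
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp in ASCII_ONE_BYTE:
            total += 1
            continue
        window = cp // 0x80
        if window != active:
            # SCn (1 byte) if already defined, else SDn + offset byte (2 bytes)
            total += 1 if window in defined else 2
            defined.add(window)
            active = window
        total += 1  # the character itself, as a byte in 0x80-0xFF
    return total


selam = "\u1230\u120b\u121d"  # three Ethiopic characters in one half-block
print(scsu_estimate(selam), len(selam.encode("utf-8")))  # 5 9
```

Alternating between two Ethiopic half-blocks costs 2 bytes per character once both windows are defined, which matches the worst case described above.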

SCSU is fully supported by SC UniPad, a Unicode text editor that is currently 
available for free.  For more information, visit:

http://www.unipad.org/

-Doug Ewell
 Fullerton, California




Re: Is there Unicode mail out there?

2001-07-12 Thread Tex Texin

(I didn't read all of the thread, so maybe I missed a step.)

So the proposal is that minimizing the charset is a good thing?

This means that you and I start out in a conversation about a
product I am trying to sell you; it happens to be all in ASCII,
and we exchange several mails successfully. Then I quote you
a price in euros, and my 1252 message gets corrupted by your
reader, which can handle only 8859-1 or ASCII, and
you miss the fact that the Euro sign is corrupted and think we
are talking dollars or some other currency.

Although I understand why you would want a minimal charset in order
not to prevent communication needlessly, the sense of
reliability and trust built up by some early success is
a problem. You think you are communicating successfully, but when it
is critical you may not be...

Perhaps this would make more sense if a harder line were taken when
characters that cannot be converted are used
(i.e., give a very clear, recognizable indication of corruption or
conversion failures).
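The scenario is easy to reproduce: the Euro sign is byte 0x80 in Windows-1252, and a reader that assumes ISO 8859-1 decodes that byte without complaint into an invisible C1 control character. Decoding strictly against a charset that lacks the byte (US-ASCII here) at least fails loudly. A minimal Python illustration of that harder line:

```python
price = "\u20ac100"                # "€100"
wire = price.encode("cp1252")      # b'\x80100'

# Silent corruption: every byte is valid ISO 8859-1, so no error is raised.
misread = wire.decode("iso-8859-1")
print(repr(misread))               # '\x80100' (U+0080, not the Euro sign)

# Harder line: a strict decode refuses the byte instead of guessing.
try:
    wire.decode("us-ascii")
except UnicodeDecodeError as err:
    print("cannot decode:", err.reason)
```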

tex



[EMAIL PROTECTED] wrote:
 
 In a message dated 2001-07-11 15:03:27 Pacific Daylight Time,
 [EMAIL PROTECTED] writes:
 
   One exception to this should be US-ASCII, because not only is the repertoire
   of US-ASCII a subset of the repertoire of UTF-8, but the
   representation of every US-ASCII character is also identical in UTF-8.
   A smart mail client would notice that all characters
   are in the US-ASCII repertoire and label outgoing messages as
   US-ASCII EVEN if it's configured to label outgoing messages
   as UTF-8
 [...]
 
 I thought this might even be enshrined in an RFC.  It certainly makes sense.
 If you are using a mailer that sends CP1252 down the wire (not that this is a
 good idea, but some mailers do this), the mailer should examine the message
 and if it only contains US-ASCII characters, the message should be tagged as
 US-ASCII.  Otherwise, if it only contains ISO 8859-1, it should be tagged as
 ISO 8859-1.  Only if it actually contains CP1252 characters, like smart
 quotes or long dashes, should it be tagged as CP1252.  As Jungshik observed,
 the same goes for UTF-8.
 
 -Doug Ewell
  Fullerton, California

-- 
---
Tex Texin  Director, International Business
mailto:[EMAIL PROTECTED]  +1-781-280-4271
Fax:+1-781-280-4655
the Progress Company   14 Oak Park, Bedford, MA 01730
---




A UTF-8 based News Service

2001-07-12 Thread Daniel Yacob

Greetings,

I thought this would be of interest to people here who might be
involved in multilingual news services:


The Ethiopian News Headlines has relocated to a new server at
http://www.ethiozena.net/ and is making it easier than ever to
read news headlines in Unicode.  A companion Unicode-only server
has been launched at http://unicode.ethiozena.net/ and serves
articles in UTF-8 encoding only.

Other new features include localization in three languages and daily
article links are packaged in XML for other news services to link to
(see http://www.ethiozena.net/zena.xml and a demonstration parsing
script in Perl http://www.ethiozena.net/zena.pl.txt).


As someone involved in the service I often wish there was some
form of compressed Unicode encoding.  The 3-byte penalty that
Ethiopic bears under UTF-8 turns into higher bandwidth that web
hosting services meter and charge for by the megabyte.  For a
popular site this soon makes UTF-8 a costly option to support.

A system analogous to iso-8859-x whereby Ethiopic and other scripts
in the 3-byte range could be shifted back into the 2-byte range
might help (generally only English and Ethiopic are desired together).

Fortunately there is mod_gzip for Apache.  I would appreciate any
information about other options.
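mod_gzip compresses each response on the fly, and gzip tends to recover much of UTF-8's overhead for single-script text, since the lead bytes shared by characters of one script repeat constantly. A quick way to measure this for a real page is Python's standard `gzip` module; the repeated sample below is a placeholder, so its ratio will overstate what real articles achieve:

```python
import gzip


def gzip_sizes(text: str) -> tuple[int, int]:
    """Return (raw UTF-8 size, gzip-compressed size) in bytes."""
    raw = text.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# Placeholder text: the Ethiopic word "selam" repeated, not real article content.
sample = "\u1230\u120b\u121d " * 200
raw_size, packed_size = gzip_sizes(sample)
print(raw_size, packed_size)
```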

thanks,

/Daniel




More about SCSU (was: Re: A UTF-8 based News Service)

2001-07-12 Thread DougEwell2

I should have also mentioned that SCSU is fully supported by the programming 
toolkit ICU (International Components for Unicode), found at:

http://oss.software.ibm.com/icu/

An Open Source project, ICU is available for free and comes with voluminous 
documentation.

SCSU is also registered as an IANA charset, although you are unlikely to find 
raw SCSU text on the Internet, due to its use of control characters (bytes 
below 0x20).

Hope this helps.

-Doug Ewell
 Fullerton, California




Re: Is there Unicode mail out there?

2001-07-12 Thread てんどうりゅうじ

My other e-mail was a real "moji-baka", I'd say. That would be a good term, 
文字馬





Re: Is there Unicode mail out there?

2001-07-12 Thread Jungshik Shin


On Thu, 12 Jul 2001 [EMAIL PROTECTED] wrote:

   Hmm, it didn't work either.
 OK, one more try -- Thai test, take 3: กลัปมาอยู่แล้ว

   Finally, you succeeded! Congratulations :-). Could you
explain what you did differently this time so that other Lotus
Notes users can benefit from your experience/experiment?

  Jungshik Shin





RE: Is there Unicode mail out there?

2001-07-12 Thread Ayers, Mike


 From: Jungshik Shin [mailto:[EMAIL PROTECTED]] 

   What's mysterious is why this prompt (from MS OE) did not appear for Mike
 Ayers when he replied to Peter's message containing a Thai string in
 Windows-874 and added some Chinese characters, whereas the MS OE (5.50.x)
 I tried certainly prompted me to pick one of three options (1. send as
 Unicode, 2. send as is, in Windows-874, risking loss of information,
 3. cancel) when I did the same thing. ZWS and Chinese characters have no
 reason to be treated differently when added to a Windows-874 encoded message.

Not mysterious really, I'm using Outlook, not Outlook Express.
Despite the similarity of names, the differences seem to be considerable.
It is disturbing, though, that the premium product has less desirable
behavior than the free one in this case.


/|/|ike




Re: Is there Unicode mail out there?

2001-07-12 Thread James Kass


Jungshik Shin wrote:

   Perhaps both Mozilla/Netscape 6 and MS OE should have an option
 ('toggle-switchable') to let users specify that their preferred encoding
 (set in preference) be used by default regardless of the encoding of
 messages they're replying to.


It would be nice...

MS OE appeared to already have the option.  Under Tools-Options-Send,
there's a check-box for "Reply to messages using the format
in which they were sent."  Under Tools-Options-Send-International
Settings, there's a provision for the user to choose a default
encoding and a check-box to "Use the following default encoding for
outgoing messages:".  Even though this system was set up
accordingly, outgoing messages which were replies to messages
in non-UTF-8 encodings weren't being sent in UTF-8, to my
surprise, chagrin, and dismay.

Best regards,

James Kass.
​





RE: Is there Unicode mail out there?

2001-07-12 Thread Chris Wendt

In any case, whether it is a new message, a reply, or a forward, you can force
OE to use a specific encoding using the Format.Encoding menu. There is
no option to ALWAYS use a specific encoding in replies and forwards; you
will have to choose manually each time. OE itself has no option to
automatically determine the best outbound encoding (and I agree that
generally the encoding with the smallest repertoire is the best). If the
chosen encoding cannot hold the characters used, OE will suggest UTF-8
and no other charset.

Note: an HTML message to an HTML4 capable recipient will transport any
character regardless of the chosen encoding. That might explain the
different results you are seeing when sending to differently enabled
recipients.
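The mechanism isn't spelled out here; one standard HTML 4 way to carry any character regardless of the declared charset is the numeric character reference (`&#NNNN;`), which names a character by its code point using only ASCII. Assuming that is what happens in these HTML messages, Python's `xmlcharrefreplace` error handler shows the effect:

```python
thai = "\u0e01"                 # THAI CHARACTER KO KAI, U+0E01 = 3585 decimal
body = f"<p>{thai}</p>"

# Characters the target charset lacks become &#NNNN; references.
encoded = body.encode("us-ascii", errors="xmlcharrefreplace")
print(encoded)                  # b'<p>&#3585;</p>'
```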

Replying in the charset of the original message is, in my view, reasonable
behavior: the recipient of your reply has the best chance of reading the
message in the encoding in which the original message was sent. Changing the
encoding decreases the chance that the replyee will be able to read your
message.


-----Original Message-----
From: James Kass [mailto:[EMAIL PROTECTED]] 
Sent: Thursday, July 12, 2001 1:18 PM
To: Jungshik Shin
Cc: Unicode List
Subject: Re: Is there Unicode mail out there?



Jungshik Shin wrote:

   Perhaps both Mozilla/Netscape 6 and MS OE should have an option
 ('toggle-switchable') to let users specify that their preferred
 encoding (set in preference) be used by default regardless of the
 encoding of messages they're replying to.


It would be nice...

MS OE appeared to already have the option.  Under Tools-Options-Send,
there's a check-box for "Reply to messages using the format in which
they were sent."  Under Tools-Options-Send-International Settings,
there's a provision for the user to choose a default encoding and a
check-box to "Use the following default encoding for outgoing
messages:".  Even though this system was set up accordingly, outgoing
messages which were replies to messages in non-UTF-8 encodings weren't
being sent in UTF-8, to my surprise, chagrin, and dismay.

Best regards,

James Kass.
​






RE: Is there Unicode mail out there?

2001-07-12 Thread Ayers, Mike


 From: Chris Wendt [mailto:[EMAIL PROTECTED]] 

 Replying in the charset of the original message is in my view
 reasonable behavior: the recipient of your reply has the best chance
 to read the message in the encoding the original message was sent.
 Changing the encoding decreases the chance the replyee will be able
 to read your message.

For person-to-person emails, this makes sense.  It does not hold up
for mailing lists, however. It's not necessarily unreasonable behavior, but
on a mailing list the odds that readers can read a message depend only on the
character set it is sent in, not on the character set of the message it
replies to (note that many people could not view the Windows Thai character
set, but once it was changed to UTF-8, almost everyone could read it).  For
this reason, I would really like to see option-controlled behavior (with the
current behavior as the default).


/|/|ike




Re: Is there Unicode mail out there?

2001-07-12 Thread James Kass

Chris Wendt wrote:

 Replying in the charset of the original message
 is in my view reasonable behavior: the recipient
 of your reply has the best chance to read the
 message in the encoding the original message
 was sent. Changing the encoding decreases the
 chance the replyee will be able to read your
 message.

When a user issues an instruction to a computer, it
is a command rather than a request.  If a user selects
the option "Use the following default encoding for
outgoing messages:", then the expected behavior is
compliance.

Of course, you are quite right in that the recipient
is more likely to be able to read a message sent in the
recipient's default.  As we move towards a World encoding
standard, perhaps more applications will use the standard
as default.

This message is being sent in Arabic (Windows) because
it is in response to a message sent in that encoding.  The
author of the original message has noted my work-around
and has cleverly prevented it by selecting a code-page
which includes the special character I'm using for the
kludge.

Best regards,

James Kass.
​






Re: Wordprocessors in Korean

2001-07-12 Thread Edward Cherlin

At 06:50 AM 2001-07-10, Genenz wrote:
...Now a teacher from Korea told me that MS Word has
some shortcomings concerning Korean
and that there is another word processor
much more frequently used than Word2000
in his country. (He also complained about Win2000; there
are better choices for multilanguage apps and the net,
but that is another story not to be discussed here.)

Word 2000 on Windows 2000 supports Korean well enough for my needs, but I 
am not an authority on what Koreans want in a word processor. Can you get 
an explanation for us of its shortcomings?

As for better multilingual apps, indeed there are some that support more 
languages, but I have not found any fully multilingual products for Mac or 
PC capable enough and reliable enough for my needs. Not that Word is 
totally reliable, of course, but I manage.



Edward Cherlin
Generalist
A knot! Oh, do let me help to undo it.
Alice in Wonderland





Re: A UTF-8 based News Service

2001-07-12 Thread David Starner

 As someone involved in the service I often wish there was some
 form of compressed Unicode encoding.  The 3-byte penalty that
 Ethiopic bears under UTF-8 turns into higher bandwidth that web
 hosting services meter and charge for by the megabyte.  For a
 popular site this soon makes UTF-8 a costly option to support.

 A system analogous to iso-8859-x whereby Ethiopic and other scripts
 in the 3-byte range could be shifted back into the 2-byte range
 might help (generally only English and Ethiopic are desired together).

 Fortunately there is mod_gzip for Apache.  I would appreciate any
 information about other options.

What about UTF-16? Every character in the BMP, Ethiopic included, takes 2
bytes, so your problem is solved, and UTF-16 should be supported by all
recent Unicode-supporting web browsers.
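The arithmetic behind the suggestion: Ethiopic characters take 3 bytes in UTF-8 but 2 in UTF-16, while ASCII takes 1 byte in UTF-8 and 2 in UTF-16, so for mixed English and Ethiopic pages the better choice depends on the proportions. A small Python comparison (the sample strings are placeholders; the counts exclude a byte order mark):

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes), excluding any byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


print(encoded_sizes("\u1230\u120b\u121d"))  # (9, 6): Ethiopic favors UTF-16
print(encoded_sizes("news"))                # (4, 8): ASCII favors UTF-8
```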

--
David Starner - [EMAIL PROTECTED]