Re: Acronyms (off-topic)

2000-08-03 Thread Robert A. Rosenberg

At 12:04 PM 08/02/2000 -0800, Geoffrey Waigh wrote:
>On Wed, 2 Aug 2000, Alain LaBonté  wrote:
>
> > À 07:12 2000-07-11 -0800, Doug Ewell a écrit:
> > >Many English speakers also think ISO is an abbreviation or initialism
> > >(not "acronym"; that term is correct only when the resulting "word"
> > >is actually pronounced, like "AIDS" or "SIDA") of the English name
> > >"International Standards (or Standardization) Organization."  Of course,
> > >this is wrong.
> >
> > [Alain]  ISO is not pronounced as a word in English but it is in French
>
>Um, I and quite a few people I know pronounce ISO as if it were a word in
>English.

So do I. Most of the spoken usage I hear is I-SO (EYE-SEW), not I-S-O 
(EYE-ESS-OH).

>  I have no idea on the origins of the name, but given that an
>uncynical view of the function of that body is to produce international
>standards, I doubt the correct etymology will win through.
>
>Geoffrey




RE: Subject lines in UTF-8 mssgs? [was: Proposal to make ...]

2000-07-28 Thread Robert A. Rosenberg

At 01:41 AM 07/13/2000 -0800, [EMAIL PROTECTED] wrote:
>As far as I can understand, the choice of the outgoing charset is highly
>automatic in MS Outlook 2000. I suspect it depends on the combination of
>characters that I (or the system) used in the various fields of the e-mail.

The problem is that the heuristics are not correct for ISO-8859-1/CP-1252. 
The selection SHOULD be:

   1) Only x00-x7F - US-ASCII
   2) x00-x7F + xA0-xFF - ISO-8859-1 [Western European(ISO)]
   3) x00-x7F + xA0-xFF + a character in the x80-x9F code point range -
      CP-1252/Windows-1252 [Western European(Windows)]

If you check the Encoding list, you will note that Western European(ISO) 
and Western European(Windows) are both listed and the selection controls 
whether a message with xA0-xFF characters gets ID'ed as ISO-8859-1 or 
CP-1252. The problem is that selecting Western European(ISO) does not 
correct the message's CHARSET to CP-1252 if a x80-x9F character is found in 
the message.
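
The three-way selection described above can be sketched roughly as follows. This is only an illustration: the function name pick_charset is hypothetical, and it assumes the bytes are already known to be Western European 8-bit text, which a real mail client would have to establish some other way.

```python
def pick_charset(data: bytes) -> str:
    """Return the narrowest charset label for 8-bit Western European text."""
    # A byte in x80-x9F is a graphic character only in CP-1252.
    has_c1_range = any(0x80 <= b <= 0x9F for b in data)
    # A byte in xA0-xFF means the same thing in ISO-8859-1 and CP-1252.
    has_high_range = any(0xA0 <= b <= 0xFF for b in data)
    if has_c1_range:
        return "windows-1252"
    if has_high_range:
        return "ISO-8859-1"
    return "US-ASCII"
```

The point of the ordering is that the x80-x9F test wins over the xA0-xFF test, so a message is never labelled ISO-8859-1 while carrying CP-1252-only characters.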




Re: Euro character in ISO

2000-07-12 Thread Robert A. Rosenberg

At 09:27 PM 07/11/2000 -0800, Michael \(michka\) Kaplan wrote:
>Robert,
>
>I am a big fan of the Windows code pages, they often make my life easier.
>However, there is a disadvantage to the fact that even over the course of a
>few service packs (let alone a few operating systems!)

So? What does that have to do with ISO issuing C1-less 8859 codes that have 
an extra 32 glyphs? They issue it, MS adds the codes, and everyone has a 
constant set of codes. The fact that MS's CP125x codes are changed with no 
name change is a separate issue.

>  the code pages have
>changed, and there is simply no good documentation that will tell you when
>(for example) Farsi characters U+06A9 and U+06AF were added to Windows
>CP1256 (Arabic) . All that one knows for certain is that it was before
>Windows 98 SE and before NT4 SP5 (although it did not ship with NT4).
>
>When you cannot figure out why an application works on one platform and not
>another, it can make you pine for a more stationary standard! :-)
>
>My ISP moved to Windows 2000 so I do not have to worry about making them
>install things like newer code page files on the web server, but for a long
>time these differences plagued me heavily.
>
>michka
>
>
>- Original Message -
>From: "Robert A. Rosenberg" <[EMAIL PROTECTED]>
>To: "Unicode List" <[EMAIL PROTECTED]>
>Cc: "Unicode List" <[EMAIL PROTECTED]>
>Sent: Tuesday, July 11, 2000 7:19 PM
>Subject: Re: Euro character in ISO
>
>
> > At 15:30 -0800 on 07/11/00, Asmus Freytag wrote about Re: Euro
> > character in ISO:
> >
> > >There has been an attempt to create a series of 'touched up' 8859
> > >standards. The problem with these is that you get all the issues of
> > >character set confusion that abound today with e.g. Windows CP 1252
> > >mistaken for 8859-1 with a vengeance:
> >
> > The problem would go away if the ISO would get their heads out of
> > their a$$ and drop the C1 junk from the NEW 'TOUCHED UP' 8859s and
> > put the CP125x codes there.
> > Then when you said you used 8859-21 you'd get CP-1252 and Windows
> > would no longer need to lie (or tell the truth by admitting it is
> > CP-1252).
> >




Re: Euro character in ISO

2000-07-12 Thread Robert A. Rosenberg

At 08:56 PM 07/11/2000 -0800, Geoffrey Waigh wrote:
>On Tue, 11 Jul 2000, Robert A. Rosenberg wrote:
>
> > At 15:30 -0800 on 07/11/00, Asmus Freytag wrote about Re: Euro
> > character in ISO:
> >
> > >There has been an attempt to create a series of 'touched up' 8859
> > >standards. The problem with these is that you get all the issues of
> > >character set confusion that abound today with e.g. Windows CP 1252
> > >mistaken for 8859-1 with a vengeance:
> >
> > The problem would go away if the ISO would get their heads out of
> > their a$$ and drop the C1 junk from the NEW 'TOUCHED UP' 8859s and
> > put the CP125x codes there.
>
>Except that would break all the systems that understand that C1 "junk,"
>and a number of systems do so because they are adhering to other
>ISO standards.  If you are going to force someone to change their
>datastreams to something new, they might as well go to some flavour
>of Unicode anyways.

Who is going to get broken if I say in my MIME header (or HTML) that my 
CHARSET is (for example) ISO-8859-21? You are talking about uses where the 
computer is talking to a device, not to another computer, and needs the C1 
range to tell the device what to do (between computers it is just passing 
a text stream). The C1 codes are DEVICE CONTROL and have no purpose (except 
to occupy slots that are better used for extra GLYPHS) in EMAIL or HTML 
transfer. I am NOT asking anyone to change their mode of operation - only 
for ISO-8859-x codes that are designed for transfer of printable data. 
UNICODE is not a viable option since all we are talking about is the 
ability to select from a number of 256-codepoint 8-bit tables, not a move 
to UTF-8 or UTF-16 (which would require changes to the program code).


>Geoffrey
>"tilting at terminal emulators, err windmills."




Re: Euro character in ISO

2000-07-12 Thread Robert A. Rosenberg

At 04:27 AM 07/12/2000 -0800, Michael Everson wrote:
>Ar 18:19 -0800 2000-07-11, scríobh Robert A. Rosenberg:
>
> >The problem would go away if the ISO would get their heads out of
> >their a$$ and drop the C1 junk from the NEW 'TOUCHED UP' 8859s and
> >put the CP125x codes there.
>
>Excuse me, but that is not appropriate. The ISO/IEC 8859 series is
>conformant with ISO/IEC 2022, and protocols which adhere to that standard
>should not be compromised by what you suggest.
>
> >Then when you said you used 8859-21 you'd get CP-1252 and Windows
> >would no longer need to lie (or tell the truth by admitting it is
> >CP-1252).
>
>The problem is that some companies do/did not correctly identify their code
>pages. The world can live with Latin-1 and CP-1252. It shouldn't have to
>live with CP-1252 being identified as Latin-1.

Which is what I am saying when I talk about admitting that you are using 
CP-1252, not ISO-8859-1 (in your MIME/HTML headers), at least in the case 
where there are glyphs in the x80-x9F range in use. If a system can claim 
US-ASCII if no codes in the x80-xFF range appear and ISO-8859-1 otherwise 
(as many MUAs do), it should have the smarts to claim CP-1252 if its scan 
finds a x80-x9F glyph.


>Michael Everson  **  Everson Gunn Teoranta  **   http://www.egt.ie
>15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
>Vox +353 1 478 2597 ** Fax +353 1 478 2597 ** Mob +353 86 807 9169
>27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire




Re: Euro character in ISO

2000-07-12 Thread Robert A. Rosenberg

At 04:49 AM 07/12/2000 -0800, Antoine Leca wrote:
> > The problem would go away if the ISO would get their heads out of
> > their a$$ and drop the C1 junk from the NEW 'TOUCHED UP' 8859s and
> > put the CP125x codes there.
>
>
>Sorry. It may work for CP1252/iso-8859-1, and CP1254/iso-8859-9,
>but won't for the others.

Note - I am not saying that my suggested C1-less 8859s MUST use the CP-125x 
glyphs - only that doing so would make life much easier. I'd love for ISO to 
also issue one for the MacRoman/ISO translation so that Apple gets an 
Officially Supported x80-x9F mapping like the one I'm suggesting for 
Win-Latinx.

>  Since Windows starts with the same letter as
>Word --or is the reason that they both come from the same company.
>No! I cannot believe that-- there are a couple of requirements
>that effectively make the "other" codepages slightly incompatible,
>such as the necessary presence for · at position B5 (because this
>is the character Word uses when you ask it to "display" the spaces,
>and this is hard-coded in the product).

Last time I looked, B5 was not in the x80-x9F C1 range where CP125x differs 
from ISO-8859-x. Thus this is a non-issue.
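
The claim about where the two tables differ is easy to check with Python's built-in codecs. The sketch below is only an illustration (the helper name differing_bytes is made up here); it treats the handful of slots that Python's cp1252 codec leaves undefined as "different" by decoding them to U+FFFD.

```python
def differing_bytes() -> list[int]:
    """List the byte values that decode differently in CP-1252 and ISO-8859-1."""
    diff = []
    for b in range(256):
        raw = bytes([b])
        # errors="replace" turns the undefined cp1252 slots into U+FFFD
        # instead of raising, so they show up as differences too.
        if raw.decode("cp1252", errors="replace") != raw.decode("latin-1"):
            diff.append(b)
    return diff
```

With Python's codec tables the result is exactly the x80-x9F range, so position B5 is not among the differences.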


>- Windows HTML-tools/MAs are reluctant to add the test for presence
>of non-Latin1 characters to either tag as iso-8859-1 or
>windows-1252. Apparently they are too lazy (because they already
>did such a test for ASCII).

I agree with this statement. Since the scan is there it is not that hard to 
change it to test for the x80-x9F and xA0-xFF ranges separately and have 
separate flags for the two cases.




Re: Euro character in ISO

2000-07-12 Thread Robert A. Rosenberg

At 09:55 PM 07/11/2000 -0800, Doug Ewell wrote:
>Oh no, here comes a flame war.
>
>Robert A. Rosenberg <[EMAIL PROTECTED]> wrote:
>
> > The problem would go away if the ISO would get their heads out of
> > their a$$ and drop the C1 junk from the NEW 'TOUCHED UP' 8859s and
> > put the CP125x codes there.
>
>The problem would be just beginning for all the users of terminals and
>terminal emulators that apparently rely on C1 control codes.  These
>folks argue that UTF-8 is defective because it sends bytes in the range
>0x80 to 0x9F to their terminal when no C1 code was intended, and/or
>because real C1 sequences are prefaced by 0xC2.  Some even go so far
>as to claim that an 8-bit character set is not "legitimate" unless it
>supports ISO 2022 and therefore C1.  Try telling these people that C1
>characters are "junk."

That is an issue with the software that is driving the Terminal or Terminal 
Emulator, NOT with the input to the program. In an Email message (or in an 
HTML Page Source) C1 characters have no purpose (i.e., they represent 
neither Glyphs nor the few C0 functions [CR/LF/TAB?] that are used for these 
types of data transport), so the fact that some HARDWARE DEVICE needs them 
is not a reason for leaving them in codes that are used for inter-computer 
data transfer.
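
For readers unfamiliar with the terminal complaint quoted above, here is a small illustration of the two byte-level facts it rests on: a real C1 control is encoded in UTF-8 with a 0xC2 prefix, and ordinary letters can produce trail bytes in the 0x80-0x9F range. The helper name c1_bytes is hypothetical.

```python
C1_RANGE = range(0x80, 0xA0)

def c1_bytes(encoded: bytes) -> list[int]:
    """Return the bytes of an encoded stream that fall in the C1 range."""
    return [b for b in encoded if b in C1_RANGE]

# U+0085 (NEL) is a genuine C1 control; in UTF-8 it is prefixed by 0xC2.
nel = "\u0085".encode("utf-8")

# U+00C0 (A with grave) is an ordinary letter, yet its UTF-8 form
# contains the byte 0x80, which a C1-aware terminal may act on.
a_grave = "\u00c0".encode("utf-8")
```

A terminal that interprets raw bytes before decoding UTF-8 would see a C1 code in both cases, which is the behavior being argued about.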


>-Doug Ewell
>  Fullerton, California




Re: Subset of Unicode to represent Japanese Kanji?

2000-07-12 Thread Robert A. Rosenberg

At 04:41 AM 07/12/2000 -0800, Otto Stolz wrote:
>If I am not mistaken, Kanji is ideographic characters, which would take
>the lion's share of memory to implement. Probably, you have to support
>kana (hiragana or katakana).
>
>I do not know Japanese, so others may jump in.


In case of a major memory constraint, go for Romaji (Japanese written in 
Latin letters; the names of the writing systems mentioned above are 
themselves examples). That is how we often see Japanese written here in 
the US. It is the text converted phonetically and needs some accents but 
nothing more. For another example with accents, check out the name of the 
popular children's show "Pocket Monsters", AKA Pokémon.






Re: Euro character in ISO

2000-07-11 Thread Robert A. Rosenberg

At 15:30 -0800 on 07/11/00, Asmus Freytag wrote about Re: Euro 
character in ISO:

>There has been an attempt to create a series of 'touched up' 8859
>standards. The problem with these is that you get all the issues of
>character set confusion that abound today with e.g. Windows CP 1252
>mistaken for 8859-1 with a vengeance:

The problem would go away if the ISO would get their heads out of 
their a$$ and drop the C1 junk from the NEW 'TOUCHED UP' 8859s and 
put the CP125x codes there.
Then when you said you used 8859-21 you'd get CP-1252 and Windows 
would no longer need to lie (or tell the truth by admitting it is 
CP-1252).



Re: What is this "case folding"?

2000-07-11 Thread Robert A. Rosenberg

At 08:27 PM 07/10/2000 -0800, Mark Davis wrote:

>While an interesting trivia question, there are enough homographs in
>English that the very small percentage of them that can (sometimes) be
>distinguished according to case is completely insignificant.

I first ran into the Polish/polish pair in a book of short mystery stories 
that each turned on trivia of this type. In this case it was on the order 
of "What string of letters represents a word whose pronunciation is 
undefined if printed in all capital letters?". Since the pronunciation of 
the string depends on whether the "p" is a Capital Letter or not, this 
differs from words that have more than one pronunciation even when they are 
spelled and cased the same.




Re: What is this "case folding"?

2000-07-10 Thread Robert A. Rosenberg

At 06:43 AM 07/10/2000 -0800, [EMAIL PROTECTED] wrote:
>If it is what I think it is, I don't want it in English.
>How could it tell "aids" from "AIDS", for instance?
>Or "joy" from "Joy"(name)?

Or Polish (nationality) from polish (shine).


>--
>Robert Lozyniak
>Accusplit pedometer, purchased about 2000a07l01d19h45mZ,
>has NOT FLIPPED
>My page: http://walk.to/11
>[EMAIL PROTECTED] - email
>(917) 421-3909 x1133 - voicemail/fax
>
>
>
> [EMAIL PROTECTED] wrote:
> > [EMAIL PROTECTED] wrote:
> > > - Can these mutations only occur after a determinative, or can they
> > > also be at the beginning of a sentence?
> > I don't believe they can occur at the beginning of a sentence. The most
> > common construct occurs after "na" (meaning "of"); "Ambasáid na hÉireann"
> > (Embassy of Ireland) is an example commonly encountered outside Ireland.
> > However, they can occur after other words.
> >
> > > - Is this automatically implemented in the case folding function of
> > > localized word processors?
> > No, not unless some new word processor has been launched in the past
> > year or so.
> > B=
>




Re: UTF-8N?

2000-06-23 Thread Robert A. Rosenberg

At 10:54 PM 06/22/2000 -0800, Doug Ewell wrote:
>Now that Unicode plans to deprecate the use of U+FEFF as ZWNBSP,
>programs that *expect* UTF-8 instead of SBCS will be able to throw away
>an initial U+FEFF with even greater confidence.  It may even be possible
>for operating system developers to build this in at the OS level: open
>a UTF-8 text file; read characters; if the very first character in the
>file was U+FEFF then eat it.  Applications would never even see it.
>How cool would that be?

It would be very UNCool unless the application can tell the operating 
system that it wants this done for it. Otherwise it will have no way of 
KNOWING that the edited stream that the operating system is passing it IS 
UTF-8 (and was so identified by the deleted BOM) and not some other 
character-set that the program will fail on if it tries to parse it as 
UTF-8. Letting the application SEE the BOM acts as a sanity check.
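
One way to get both behaviors is for the layer that removes the BOM to also report that it did so. The following is a minimal sketch under that assumption; strip_utf8_bom is a hypothetical helper, not an existing OS or library call (only codecs.BOM_UTF8 is a real constant).

```python
import codecs

def strip_utf8_bom(data: bytes) -> tuple[bytes, bool]:
    """Remove a leading UTF-8 BOM and report whether one was present."""
    if data.startswith(codecs.BOM_UTF8):
        # The flag preserves the information the BOM carried, so the
        # application still has its sanity check.
        return data[len(codecs.BOM_UTF8):], True
    return data, False
```

An application that receives False knows it cannot assume UTF-8 on the strength of a signature and has to fall back on some other evidence.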




RE: UTF-8 BOM Nonsense

2000-06-23 Thread Robert A. Rosenberg

At 11:31 AM 06/22/2000 -0800, Michael Kaplan (Trigeminal Inc.) wrote:
>I do not believe that this will require it to be added to a standard, and
>this is a non-standard usage, but life is about dealing with things as they
>are (and this is how they are!).

I assume that you also feel that the charset parm on a MIME Email Header 
(or HTML/XML header) is not needed and thus should be discouraged. The use 
of the BOM character at the start of a TEXT file serves the same purpose as 
the charset tag - It says "I am in UTF-8 format" (so you do not try to 
treat it as ISO-8859-x, CP1252, or some other encoding format).




RE: How to distinguish UTF-8 from Latin-* ?

2000-06-23 Thread Robert A. Rosenberg

At 09:41 AM 06/22/2000 -0800, Karlsson Kent - keka wrote:
>
>"Be liberal with what you accept and conservative with what you create"]).
>
>
>Well, there is a security aspect to this: sometimes given texts
>need to be scanned to try to determine if they are "harmless"
>or may trigger some undesirable interpretation (as interpreted
>program code, like shell-script, for instance).  A hacker may
>try to hide characters that trigger the undesired, and potentially
>dangerous, interpretation, by using overlong UTF-8 sequences.
>If the security scanner program does not "decode" overlong
>UTF-8 sequences,

Since the interpreter will only see it if the security system has "signed 
off" on its harmlessness, there is nothing to say that the security system 
can not normalize the overlong strings prior to doing its scan and act as 
if that were the form they were supplied in. This would let the interpreter 
accept the overlong data (or the normalized copy the security system 
checked and then passed to it).

>but the interpreter accepts them as if nothing
>was wrong, things you would not like to happen might happen.
>So overlong UTF-8 sequences should be regarded as errors, and
>not as a coding for any character at all.  Yes, you may regard
>systems that at all have "escapes" into "execute this" mode
>as ill-designed.  But they are around.
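
The mismatch being discussed can be illustrated with a deliberately naive two-byte decoder next to a strict one. Both function names are made up for this sketch; the strict check simply relies on Python's built-in UTF-8 decoder, which rejects overlong forms.

```python
def naive_decode_2byte(lead: int, trail: int) -> str:
    """Decode a 2-byte sequence WITHOUT a shortest-form check (unsafe)."""
    return chr(((lead & 0x1F) << 6) | (trail & 0x3F))

def strict_accepts(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Here naive_decode_2byte(0xC0, 0xAF) yields "/", a character a scanner looking only for the single byte 0x2F would never see, while strict_accepts(b"\xc0\xaf") is False. Either of the two approaches in the thread (normalizing before the scan, or rejecting outright) closes the gap as long as scanner and interpreter agree.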




RE: How to distinguish UTF-8 from Latin-* ?

2000-06-22 Thread Robert A. Rosenberg

At 12:12 PM 06/20/2000 -0800, Kenneth Whistler wrote:
>Bob Rosenberg wrote:
>
> > >
> > >This was my concern, there is no way to distinguish UTF-8 from Latin-1 in
> > >case of upper ASCII characters here.
> >
> > Yes there is - it's called a "Sanity Check". You parse the file looking for
> > High-ASCII. If you find none - you are US-ASCII (or ISO-8859-1). Once you
> > find one, you use the UTF-8 Suffix method to see how long the string should
> > be IF it is UTF-8. Look at the next x characters to see if they have the
> > correct suffix. If not, count as a Bad-UTF-8. If so, count as one
> > Good-UTF-8. Once you roll off the end of the string resume scanning for
> > another High-ASCII and do the check again. After finding 12 strings that
> > start with High-ASCII (or bopping off the end of the file) check your
> > GOOD/BAD counts. All BAD means ISO-8859-1. All GOOD means UTF-8.
>
>Well, not necessarily. Granted, the distribution of precedent bytes and
>successor bytes in UTF-8, when interpreted as ISO 8859-1, mostly results
>in gibberish that is unlikely to appear in real text. The first byte of
>a two-byte UTF-8 sequence consists essentially of an accented capital
>letter in 8859-1 (0xC0..0xDF). And the successor bytes are either C1
>controls or come from the set of miscellaneous symbols, currency signs,
>punctuation, etc., that are rather unlikely to occur directly following
>an uppercase accented Latin letter.
>
>But if I invented a hoity-toity company name with extra accents for
>"class", such as, L·DÏ·DÀ® Productions, Inc. and sent this to you in
>ISO 8859-1, as I am currently doing, your sanity check will fail in
>this case and identify this file as UTF-8, with 3 characters misinterpreted.
>(i.e., LDD. Productions, Inc.) Of course, a further check
>for irregular sequence UTF-8 would discover that 0xC0 0xAE ==> U+002E is
>not shortest form UTF-8, and might, therefore, not actually be UTF-8,
>but even that cannot really be relied on.

True, you can FAKE an incorrect evaluation by plugging a trick string into 
an otherwise low-ASCII file/message. My comment was aimed at normal (not 
faked) files. I agree that I missed the extra sanity check of looking for 
the shortest form, but if I remember the rules correctly, there is no 
requirement that the shortest form be emitted - only a strong suggestion to 
do so (with a stronger suggestion to accept it [ie: "Be liberal with what 
you accept and conservative with what you create"]). I doubt that a real 
ISO-8859-1 file could be mistaken for a UTF-8 one without it being 
specially constructed to trick the sanity check. Note that the 12 string 
"universe" is just an attempt to check for false positives and could be 
adjusted for circumstances.

> > Mixed
> > (with most being BAD) is ISO-8859-1 (the Goods are "noise"). Mostly Good
> > with a few Bad are either malformed UTF-8 or ISO-8859-1 (with the bad luck
> > of finding 2 byte strings that LOOK LIKE UTF-8).
>
>Even entirely GOOD can have that bad luck, as this email itself
>demonstrates.

Since this is a special message that was designed to spoof not a real 
message, I do not regard it as bad luck. If you can supply a set of normal 
text that would give a false reading, I'd be much more willing to say that 
my claim of just doing a sanity check was overly simplistic.


>--Ken




RE: How to distinguish UTF-8 from Latin-* ?

2000-06-20 Thread Robert A. Rosenberg

At 02:01 PM 06/19/2000 -0800, Vinod Balakrishnan wrote:


>[snip]
> >2) No encoding information... UTF-8  can be assumed (often it is just ASCII
> >so this works)
>
>This was my concern, there is no way to distinguish UTF-8 from Latin-1 in
>case of upper ASCII characters here.

Yes there is - it's called a "Sanity Check". You parse the file looking for 
High-ASCII. If you find none - you are US-ASCII (or ISO-8859-1). Once you 
find one, you use the UTF-8 Suffix method to see how long the string should 
be IF it is UTF-8. Look at the next x characters to see if they have the 
correct suffix. If not, count as a Bad-UTF-8. If so, count as one 
Good-UTF-8. Once you roll off the end of the string resume scanning for 
another High-ASCII and do the check again. After finding 12 strings that 
start with High-ASCII (or bopping off the end of the file) check your 
GOOD/BAD counts. All BAD means ISO-8859-1. All GOOD means UTF-8. Mixed 
(with most being BAD) is ISO-8859-1 (the Goods are "noise"). Mostly Good 
with a few Bad are either malformed UTF-8 or ISO-8859-1 (with the bad luck 
of finding 2 byte strings that LOOK LIKE UTF-8). 
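
The scan described above might look roughly like this in code. It is a sketch, not a reference implementation: the function names, the lead-byte ranges (2- to 4-byte sequences only), and the treatment of "mostly good" as simply "ambiguous" are assumptions made for illustration, and it performs no shortest-form check.

```python
def expected_length(lead: int) -> int:
    """Sequence length implied by a lead byte, or 0 if it cannot start one."""
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    return 0  # stray continuation byte or unsupported lead byte

def sanity_check(data: bytes, limit: int = 12) -> tuple[int, int]:
    """Count (good, bad) candidate sequences among the first `limit` found."""
    good = bad = 0
    i = 0
    while i < len(data) and good + bad < limit:
        if data[i] < 0x80:
            i += 1  # plain ASCII, keep scanning
            continue
        n = expected_length(data[i])
        tail = data[i + 1:i + n]
        if n and len(tail) == n - 1 and all(0x80 <= b <= 0xBF for b in tail):
            good += 1
            i += n  # roll off the end of the sequence
        else:
            bad += 1
            i += 1
    return good, bad

def guess(data: bytes) -> str:
    """Map the good/bad counts to a verdict as outlined in the message."""
    good, bad = sanity_check(data)
    if good == 0 and bad == 0:
        return "US-ASCII"
    if bad == 0:
        return "UTF-8"
    if bad > good:
        return "ISO-8859-1"
    return "ambiguous"  # mostly good: malformed UTF-8 or unlucky ISO-8859-1
```

Ordinary accented text encoded as ISO-8859-1 comes out all BAD because a letter such as 0xE9 is almost never followed by bytes in 0x80-0xBF, whereas a constructed pair such as 0xC3 0xA9 sent as ISO-8859-1 would be counted as GOOD.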




Re: Unicode and multilingual support in Macintosh Web browsers

2000-06-16 Thread Robert A. Rosenberg

At 10:07 AM 06/16/2000 -0800, Deborah Goldsmith wrote:
>on 6/16/2000 10:22 AM, Robert A. Rosenberg <[EMAIL PROTECTED]>
>wrote:
>
> > Adobe Indesign has Unicode Support. So does Outlook Express (just
> > send/receive a message in UTF-7/UTF-8).
>
>Outlook Express only supports the subset of Unicode which can be displayed
>using Mac OS legacy character sets. It does not support all of Unicode.
>
>I haven't tried Indesign, but I believe it supports a subset of Unicode as
>well.

InDesign's support may be a Subset, but I think it is more than the Mac's 
192 characters. I think that it covers Eastern & Western Europe (and 
Cyrillic & Hebrew). Missing are Arabic and the Far East Scripts (CJK).


>Deborah Goldsmith
>Manager, International Toolbox Group
>Apple Computer, Inc.
>[EMAIL PROTECTED]




RE: Linguistic precedence

2000-06-16 Thread Robert A. Rosenberg

At 02:37 AM 06/16/2000 -0800, Michael Everson wrote:
>software that insists ... that all letters be capitalized is utterly evil. 
>:-)


It sure makes it hard to tell the difference between polish and 
Polish (as well as how to pronounce the word "POLISH", since you first must 
figure out which word it is).





Re: The mother of all collation schemes

2000-06-16 Thread Robert A. Rosenberg

At 12:11 PM 06/15/2000 -0800, [EMAIL PROTECTED] wrote:

>2) My alphabetical order: (digits are treated as letters):
>[sp] [other punc.] 0 1 2 3 4 5 6 7 8 9 A Á Ä À B C Ç D E É Ë È F G H Í Ï Ì J K
>L M N Ñ O Ó Ö Ò P Q R S T U Ú Ü Ù V W X Y ÿ(why couldn't I find this in
>uppercase?)

Ÿ=Alt+0159 (on a WinTel Machine).

>Z
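
As a side note on why the uppercase form is hard to find: the capital of ÿ is outside ISO-8859-1 and exists only in the Windows code page, at the position the Alt code refers to. A small check, with the helper name encodable invented for this sketch:

```python
def encodable(ch: str, charset: str) -> bool:
    """Return True if the character can be encoded in the given charset."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# Uppercasing U+00FF gives U+0178, which has no slot in ISO-8859-1
# but sits at byte 159 (x9F) in CP-1252.
upper_y = "\u00ff".upper()
```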




Re: Unicode and multilingual support in Macintosh Web browsers

2000-06-16 Thread Robert A. Rosenberg

At 01:26 PM 06/15/2000 -0800, John Jenkins wrote:
>on 6/15/00 6:00 AM, Alan Wood at [EMAIL PROTECTED] wrote:
>
> > I have tried without success to find information on how to view multilingual
> > Web pages with a Macintosh and which multilingual fonts are available, so I
> > have documented the things I have discovered by a process of trial and
> > error, and produced a new page in my collection of Unicode information at:
> >
> > http://www.hclrss.demon.co.uk/unicode/macbrowsers.html
> >
> > I will appreciate being advised of any errors or of further sources of
> > information.
> >
>
>Well done.  There is one clarification I would suggest, however.
>
>Apple has provided support for direct Unicode rendering since Mac OS 8.5.
>This includes the ability to use large, data-fork TrueType fonts such as
>Arial Unicode. The OS is perfectly capable of handling these fonts and
>applications have the ability extended to them to do all of Unicode. None,
>however, are taking advantage of that ability as of yet.

Adobe Indesign has Unicode Support. So does Outlook Express (just 
send/receive a message in UTF-7/UTF-8).


>=
>John H. Jenkins
>[EMAIL PROTECTED]
>[EMAIL PROTECTED]
>http://www.blueneptune.com/~tseng




Re: Pictograms

2000-06-15 Thread Robert A. Rosenberg

At 12:09 AM 06/15/2000 -0800, William Overington wrote:
>I am unsure what happens to an
>International Standard Book Number when a second, altered or corrected,
>edition of a book is published, whether a new ISBN is assigned to the new
>edition or whether the old ISBN is recycled

It is a new book and a new number is assigned. A price change on the book 
(even if it is just a new printing not a new edition) normally gets a new 
number since the number keys back to the book and the publisher must track 
the $4.95 copies separately from the $5.95 ones. A new printing without a 
new price keeps the same ISBN. In some cases when a book is printed with 
multiple covers, each cover sometimes gets its own ISBN number.




RE: Linguistic precedence [was: (TC304.2313) AND/OR:

2000-06-15 Thread Robert A. Rosenberg

At 07:53 AM 06/15/2000 -0800, Michael Kaplan (Trigeminal Inc.) wrote:
>Eventually someone will have a language name that does not fit
>or a language like German will insist on sorting sooner, under Deutsch rather
>than under German, etc. (which I personally think makes more sense than
>making a locale take someone's translation of their language name, FWIW).

Since it was stated that Greek was displayed between German and Spanish, 
I'd assume that German was listed as Deutsch, since Spanish is Espanol (I am 
not sure whether that "n" is "n" or "ñ", or whether my spelling is correct).




Re: French encoding [Was: Chapter on character sets]

2000-06-15 Thread Robert A. Rosenberg

At 09:52 AM 06/15/2000 -0800, [EMAIL PROTECTED] wrote:
>Since Latin-1 was the
>encoding of choice prior to Latin-9 (and still is in many situations),
>there really are much more data encoded as 8859-1 than as 8859-9.


Latin9 = ISO-8859-15 not -9.





Re: Chapter on character sets

2000-06-15 Thread Robert A. Rosenberg

At 05:01 AM 06/15/2000 -0800, [EMAIL PROTECTED] wrote:
>- The C1 range wasn't empty: Microsoft simply took advantage of the fact
>that this range isn't needed on PC's, and filled it with graphic
>characters in the Windows codepages.

Apple (or rather the MUA/WEB Publishers for Apple Programs) did the same 
for the MacRoman<->ISO-8859-1 translations. Unfortunately the codes that are 
in the Windows C1 range and MacRoman (the Macintosh native mapping for 
x80-xFF) do not have the same "ISO-8859-1" mappings, so unless Windows usage 
of this range is marked as CP1252 (in lieu of the inaccurate ISO-8859-1) 
there will be display problems when transferring content containing glyphs 
in the C1 range between the two platforms. Of course this is due to the 
"Head in the Sand" behavior of the ISO Ivory Tower types in refusing to 
issue standards which put usable Glyphs in the C1 range instead of the 
useless control codes. Note: I am not saying that ISO-8859-1 should not have 
the junk there, only that there should be a parallel set of Standards to 
ISO-8859-x with the Glyphs there.




Re: Pictograms (was: (TC304.2313) AND/OR: antediluvian views)

2000-06-14 Thread Robert A. Rosenberg

At 11:34 AM 06/13/2000 -0800, Alain wrote:
>[Alain]  In my example of this morning, it was not mainly because French 
>was in 5th position that I was the most upset, it is because I was in a 
>hurry -- that was last Tuesday -- and that I had to wait for the vocal 
>explanations for many minutes while French was the second most-used 
>language in this hotel (the others in 2nd, 3rd and 4th position were, for 
>those interested [remember that we are in Toronto, not in Tokyo nor 
>Cairo]: Japanese [nihon-go], Spanish [español], Arabic [arabiya]; sorry if 
>I made mistakes in spelling, that is what I heard, and I was as 
>attentive as I could be). Even Spanish should have come before Japanese in 
>North America. It is a matter of common sense. But I was under the 
>impression that those who took the decision for the order in languages at 
>this hotel were vicious. Not very good indeed for their customer base... 
>The guy who did that should be reprimanded... Anyway they risk losing me...

As I noted in a prior comment, since I assume you were in your room at the 
time, it should have offered YOU French as option 1 (since that could be 
set at check in time). For a Voice Mail system in a HOTEL, the lack of a 
way to flag preferred language on a per-room basis is poor design. I do 
computer program design for a living and as an example of this principle, I 
designed an ATM system that could display in a number of different 
languages BUT based on the card used would default to the correct language 
prior to offering the "What Language do you want" screen. 




Re: [unicode] Re: (TC304.2313) AND/OR: antediluvian views

2000-06-14 Thread Robert A. Rosenberg

At 07:29 AM 06/13/2000 -0800, Alain wrote:
>With more than 2 languages, precedence becomes problematic. As an example 
>of language precedence, an actual case: at the Toronto Airport Radisson 
>Suite Hotels, my preferred hotel in Toronto (so far! but it could 
>change...), they recently introduced a multilingual voice mail system. In 
>Canada, French and English are the two official languages of the country 
>(and most probably at this hotel the majority of the customers speak 
>English and French, with a high concentration of French speakers). In 
>general in Canada you are presented with a choice of language where you 
>indicate your option by pressing a specific key on the telephone keypad (1 
>English 2 French -- or the reverse in Québec). At this hotel, French is 
>the 5th choice. It is offensive, I can assure you (I would not have been 
>offended in Taiwan, of course).

It is also a bad design (for a Hotel). When you check into your room the 
system should be told what language is to be the default FOR THAT ROOM and 
you should get a list where that language is #1 (with the others listed as 
#2-x). I agree that in Canada E&F should be the first 2 offered in ALL cases.
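
The per-room default amounts to a small reordering step before the menu is read out. A minimal sketch, assuming the front desk records one preferred language per room at check-in (the function name and the language list are illustrative only):

```python
from typing import Optional

def language_menu(offered: list[str], room_default: Optional[str] = None) -> list[str]:
    """Return the menu order with the room's preferred language first."""
    if room_default in offered:
        # Preferred language becomes option 1; the rest keep their order.
        return [room_default] + [lang for lang in offered if lang != room_default]
    return list(offered)  # no usable preference: fall back to the house order
```

The same idea applies to the card-based default: the key is looked up from the card instead of the room record.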




Re: [unicode] Re: (TC304.2313) AND/OR: antediluvian views

2000-06-14 Thread Robert A. Rosenberg

At 09:57 AM 06/13/2000 -0800, Otto Stolz wrote:
>
>
>Am 2000-06-13 um 17:49 h hat Alain geschrieben:
> > [Having pictograms everywhere] is much lighter than having to provide
> > indications, say, in 12 languages (most common example: toilets).

Watch out when you go to the bathroom in Scotland. The Dress/Slacks 
Female/Male outlines might not fly there, since the Female outline looks 
Male to them (kilts, you know).