Re: Acronyms (off-topic)
At 12:04 PM 08/02/2000 -0800, Geoffrey Waigh wrote: >On Wed, 2 Aug 2000, Alain LaBonté wrote: > > > At 07:12 2000-07-11 -0800, Doug Ewell wrote: > > >Many English speakers also think ISO is an abbreviation or initialism > > >(not "acronym"; that term is correct only when the resulting "word" > > >is actually pronounced, like "AIDS" or "SIDA") of the English name > > >"International Standards (or Standardization) Organization." Of course, > > >this is wrong. > > > > [Alain] ISO is not pronounced as a word in English but it is in French > >Um, I and quite a few people I know pronounce ISO as if it were a word in >English. So do I. Most of the spoken uses of it are I-SO (EYE-SEW), not I-S-O (EYE-ESS-OH). > I have no idea on the origins of the name, but given that an >uncynical view of the function of that body is to produce international >standards, I doubt the correct etymology will win through. > >Geoffrey
RE: Subject lines in UTF-8 mssgs? [was: Proposal to make ...]
At 01:41 AM 07/13/2000 -0800, [EMAIL PROTECTED] wrote:
>As far as I can understand, the choice of the outgoing charset is highly
>automatic in MS Outlook 2000. I suspect it depends on the combination of
>characters that I (or the system) used in the various fields of the e-mail.

The problem is that the heuristics are not correct for ISO-8859-1/CP-1252. The selection SHOULD be:

1) Only x00-x7F - US-ASCII
2) x00-x7F + xA0-xFF - ISO-8859-1 [Western European(ISO)]
3) x00-x7F + xA0-xFF + a character in the x80-x9F code point range - CP-1252/Windows-1252 [Western European(Windows)]

If you check the Encoding list, you will note that Western European(ISO) and Western European(Windows) are both listed, and the selection controls whether a message with xA0-xFF characters gets ID'ed as ISO-8859-1 or CP-1252. The problem is that selecting Western European(ISO) does not correct the message's CHARSET to CP-1252 if an x80-x9F character is found in the message.
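The three-way selection above can be sketched in a few lines. This is only an illustration of the rule as stated in this message, not a description of what Outlook actually does; the function name pick_charset is made up for the example, and the body is assumed to be available as 8-bit bytes in the sender's Windows code page.

```python
def pick_charset(body: bytes) -> str:
    """Choose the narrowest charset label that is still truthful."""
    # Any byte in x80-x9F means Windows-only glyphs are present.
    has_c1_range = any(0x80 <= b <= 0x9F for b in body)
    # Bytes in xA0-xFF are shared by ISO-8859-1 and CP-1252.
    has_high_range = any(b >= 0xA0 for b in body)

    if has_c1_range:
        return "windows-1252"
    if has_high_range:
        return "iso-8859-1"
    return "us-ascii"


if __name__ == "__main__":
    print(pick_charset(b"plain text"))
    print(pick_charset("caf\u00e9".encode("cp1252")))
    print(pick_charset("\u20ac5".encode("cp1252")))
```

The x80-x9F test comes first so that a user's choice of Western European(ISO) can never produce a mislabeled message.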
Re: Euro character in ISO
At 09:27 PM 07/11/2000 -0800, Michael \(michka\) Kaplan wrote: >Robert, > >I am a big fan of the Windows code pages, they often make my life easier. >However, there is a disadvantage to the fact that even over the course of a >few service packs (let alone a few operating systems!) So? What does that have to do with ISO issuing C1-less 8859 codes that have an extra 32 glyphs? They issue it, MS adds the codes, and everyone has a constant set of codes. The fact that MS's CP125x codes are changed with no name change is a separate issue. > the code pages have >changed, and there is simply no good documentation that will tell you when >(for example) Farsi characters U+06A9 and U+06AF were added to Windows >CP1256 (Arabic) . All that one knows for certain is that it was before >Windows 98 SE and before NT4 SP5 (although it did no ship with NT4). > >When you cannot figure out why an application works on one platform and not >another, it can make you pine for a more stationary standard! :-) > >My ISP moved to Windows 2000 so I do not have to worry about making them >install things like newer code page files on the web server, but for a long >time thse differences plagued me heavily. > >michka > > >- Original Message - >From: "Robert A. Rosenberg" <[EMAIL PROTECTED]> >To: "Unicode List" <[EMAIL PROTECTED]> >Cc: "Unicode List" <[EMAIL PROTECTED]> >Sent: Tuesday, July 11, 2000 7:19 PM >Subject: Re: Euro character in ISO > > > > At 15:30 -0800 on 07/11/00, Asmus Freytag wrote about Re: Euro > > character in ISO: > > > > >There has been an attempt to create a series of 'touched up' 8859 > > >standards. The problem with these is that you get all the issues of > > >character set confusion that abound today with e.g. Windows CP 1252 > > >mistaken for 8895-1 with a vengeance: > > > > The problem would go away if the ISO would get their heads out of > > their a$$ and drop the C1 junk from the NEW 'TOUCHED UP" 8859s and > > put the CP125x codes there. 
> > Then when you said you used 8859-21 you'd get CP-1252 and Windows > > would no longer need to lie (or tell the truth by admitting it is > > CP-1252). > >
Re: Euro character in ISO
At 08:56 PM 07/11/2000 -0800, Geoffrey Waigh wrote: >On Tue, 11 Jul 2000, Robert A. Rosenberg wrote: > > > At 15:30 -0800 on 07/11/00, Asmus Freytag wrote about Re: Euro > > character in ISO: > > > > >There has been an attempt to create a series of 'touched up' 8859 > > >standards. The problem with these is that you get all the issues of > > >character set confusion that abound today with e.g. Windows CP 1252 > > >mistaken for 8895-1 with a vengeance: > > > > The problem would go away if the ISO would get their heads out of > > their a$$ and drop the C1 junk from the NEW 'TOUCHED UP" 8859s and > > put the CP125x codes there. > >Except that would break all the systems that understand that C1 "junk," >and a number of systems do so because they are adhering to other >ISO standards. If you are going to force someone to change their >datastreams to something new, they might as well go to some flavour >of Unicode anyways. Who is going to get broken if I say on my MIME header (or HTML) that my CHARSET is (example) ISO-8859-21? You are talking about uses where the computer is talking to a device and needs the C1 range to tell it what to do not another computer (where it is just passing a text stream). The C1 codes are DEVICE CONTROL and have no purpose (except to occupy slots that are better used for extra GLYPHS) in EMAIL or HTML transfer. I am NOT asking for anyone to change their mode of operation - only for ISO-8859-x codes that are designed for transfer of printable data. UNICODE is not a viable option since all we are talking about is the ability to select from a number of 256 codepoint 8-bit tables not go over to UTF-8 or UTF-16 (which would require changes to the program code). >Geoffrey >"tilting at terminal emulators, err windmills."
Re: Euro character in ISO
At 04:27 AM 07/12/2000 -0800, Michael Everson wrote: >At 18:19 -0800 2000-07-11, Robert A. Rosenberg wrote: > > >The problem would go away if the ISO would get their heads out of > >their a$$ and drop the C1 junk from the NEW 'TOUCHED UP" 8859s and > >put the CP125x codes there. > >Excuse me, but that is not appropriate. The ISO/IEC 8859 series is >conformant with ISO/IEC 2022, and protocols which adhere to that standard >should not be compromised by what you suggest. > > >Then when you said you used 8859-21 you'd get CP-1252 and Windows > >would no longer need to lie (or tell the truth by admitting it is > >CP-1252). > >The problem is that some companies do/did not correctly identify their code >pages. The world can live with Latin-1 and CP-1252. It shouldn't have to >live with CP-1252 being identified as Latin-1. Which is what I am saying when I talk about admitting that you are using CP-1252, not ISO-8859-1 (in your MIME/HTML headers), at least in the case where there are glyphs in the x80-x9F range in use. If a system can claim US-ASCII when no codes in the x80-xFF range appear and ISO-8859-1 otherwise (as many MUAs do), it should have the smarts to claim CP-1252 if its scan found an x80-x9F glyph. >Michael Everson ** Everson Gunn Teoranta ** http://www.egt.ie >15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland >Vox +353 1 478 2597 ** Fax +353 1 478 2597 ** Mob +353 86 807 9169 >27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire
Re: Euro character in ISO
At 04:49 AM 07/12/2000 -0800, Antoine Leca wrote: > > The problem would go away if the ISO would get their heads out of > > their a$$ and drop the C1 junk from the NEW 'TOUCHED UP" 8859s and > > put the CP125x codes there. > > >Sorry. It may work for CP1252/iso-8859-1, and CP1254/iso-8859-9, >but won't for the others. Note - I am not SAYING that the CP-125x glyphs must be used in my suggested C1-less 8859s - only that doing so would make life much easier. I'd love for ISO to also issue one for the MacRoman/ISO translation so that Apple gets an Officially Supported x80-x9F mapping like I'm suggesting for Win-Latinx. > Since Windows starts with the same letter as >Word --or is the reason that they both come from the same company. >No! I cannot believe that-- there are a couple of requirements >that makes effectively the "other" codepages slightly incompatible, >such as the necessary presence for · at position B5 (because this >is the character Word uses when you ask it to "display" the spaces, >and this is hard-coded in the product). Last time I looked, B5 was not in the x80-x9F C1 range where CP125x differs from ISO-8859-x. Thus this is a non-issue. >- Windows HTML-tools/MAs are reluctant to add the test for presence >of non-Latin1 characters to either tag as iso-8859-1 or >windows-1252. Apparently they are too lazy (because they already >did such a test for ASCII). I agree with this statement. Since the scan is already there, it is not that hard to change it to test for the x80-x9F and xA0-xFF ranges separately and keep separate flags for the two cases.
Re: Euro character in ISO
At 09:55 PM 07/11/2000 -0800, Doug Ewell wrote: >Oh no, here comes a flame war. > >Robert A. Rosenberg <[EMAIL PROTECTED]> wrote: > > > The problem would go away if the ISO would get their heads out of > > their a$$ and drop the C1 junk from the NEW 'TOUCHED UP" 8859s and > > put the CP125x codes there. > >The problem would be just beginning for all the users of terminals and >terminal emulators that apparently rely on C1 control codes. These >folks argue that UTF-8 is defective because it sends bytes in the range >0x80 to 0x9F to their terminal when no C1 code was intended, and/or >because real C1 sequences are prefaced by 0xC2. Some even go so far >as to claim that an 8-bit character set is not "legitimate" unless it >supports ISO 2022 and therefore C1. Try telling these people that C1 >characters are "junk." That is an issue with the software that is driving the Terminal or Terminal Emulator NOT with the input to the program. In an Email message (or in an HTML Page Source) C1 characters have no purpose (ie: they do not represent Glyphs or the few C0 functions [CR/LF/TAB?]that are used for these types of data transport) so the fact that some HARDWARE DEVICE needs them is not a reason for leaving them in codes that are used for inter-computer data transfer. >-Doug Ewell > Fullerton, California
Re: Subset of Unicode to represent Japanese Kanji?
At 04:41 AM 07/12/2000 -0800, Otto Stolz wrote: >If I am not mistaken, Kanji is ideographic characters, which would take >the lion's share of memory to implement. Probably, you have to support >kana (hiragana or katakana). > >I do not know Japanese, so others may jump in. In case of a major memory constraint, go for Romaji (Japanese written in Latin letters; the names of the writing systems above are examples). That is how we often see Japanese written here in the US. It is the text converted phonetically and needs some accents but nothing more. For another example with accents, check out the name of the popular children's show "Pocket Monsters" AKA Pokémon.
Re: Euro character in ISO
At 15:30 -0800 on 07/11/00, Asmus Freytag wrote about Re: Euro character in ISO: >There has been an attempt to create a series of 'touched up' 8859 >standards. The problem with these is that you get all the issues of >character set confusion that abound today with e.g. Windows CP 1252 >mistaken for 8859-1 with a vengeance: The problem would go away if the ISO would get their heads out of their a$$ and drop the C1 junk from the NEW 'TOUCHED UP' 8859s and put the CP125x codes there. Then when you said you used 8859-21 you'd get CP-1252 and Windows would no longer need to lie (or tell the truth by admitting it is CP-1252).
Re: What is this "case folding"?
At 08:27 PM 07/10/2000 -0800, Mark Davis wrote: >While an interesting trivia question, there are enough homographs in >English that the very small percentage of them that can (sometimes) be >distinguished according to case is completely insignificant. I first ran into the Polish/polish pair in a book of short mystery stories that each turned on trivia of this type. In this case it was on the order of "What string of letters represents a word whose pronunciation is undefined if printed in all capital letters?". Since the pronunciation of the string depends on whether the "p" is a capital letter or not, this differs from words that have more than one pronunciation even when they are spelled and cased the same.
Re: What is this "case folding"?
At 06:43 AM 07/10/2000 -0800, [EMAIL PROTECTED] wrote: >If it is what I think it is, I don't want it in English. >How could it tell "aids" from "AIDS", for instance? >Or "joy" from "Joy"(name)? Or Polish (nationality) from polish (shine) . >-- >Robert Lozyniak >Accusplit pedometer, purchased about 2000a07l01d19h45mZ, >has NOT FLIPPED >My page: http://walk.to/11 >[EMAIL PROTECTED] - email >(917) 421-3909 x1133 - voicemail/fax > > > > [EMAIL PROTECTED] wrote: > > [EMAIL PROTECTED] wrote: > > > - Can these mutations only occur after a determinative, > > or can they also > > be > > > at the beginning of a sentence? > > I don't believe they can occur at the beginning > > of a sentence. The most > > common construct occurs after "na" (meaning "of"); > > "Ambasáid na hÉireann" > > (Embassy of Ireland) is an example commonly encountered > > outside Ireland. > > However, they can occur after other words. > > > > > - Is this automatically implemented in the case > > folding function of > > > localized word processors? > > No, not unless some new word processor has been > > launched in the past year > > or so. > > > > B= > >___ >Get your own FREE Bolt Onebox - FREE voicemail, email, and >fax, all in one place - sign up at http://www.bolt.com
Re: UTF-8N?
At 10:54 PM 06/22/2000 -0800, Doug Ewell wrote: >Now that Unicode plans to deprecate the use of U+FEFF as ZWNBSP, >programs that *expect* UTF-8 instead of SBCS will be able to throw away >an initial U+FEFF with even greater confidence. It may even be possible >for operating system developers to build this in at the OS level: open >a UTF-8 text file; read characters; if the very first character in the >file was U+FEFF then eat it. Applications would never even see it. >How cool would that be? It would be very UNCool unless the application can tell the operating system that it wants this done for it. Otherwise it will have no way of KNOWING that the edited stream that the operating system is passing it IS UTF-8 (and was so identified by the deleted BOM) and not some other character-set that the program will fail on if it tries to parse it as UTF-8. Letting the application SEE the BOM acts as a sanity check.
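A minimal sketch of the opt-in behaviour argued for above, assuming the "operating system" layer is just a helper function. The name open_stream and its strip_bom parameter are hypothetical; only codecs.BOM_UTF8 comes from the standard library. The point is that the caller always learns whether a BOM was present, and it is removed only on request.

```python
import codecs
from typing import Tuple


def open_stream(data: bytes, strip_bom: bool = False) -> Tuple[bool, bytes]:
    """Return (saw_bom, payload).

    The BOM is removed only when the caller asks for it, so an
    application that did not opt in still sees it as a sanity check.
    """
    saw_bom = data.startswith(codecs.BOM_UTF8)
    if saw_bom and strip_bom:
        data = data[len(codecs.BOM_UTF8):]
    return saw_bom, data


if __name__ == "__main__":
    raw = codecs.BOM_UTF8 + b"hello"
    print(open_stream(raw))
    print(open_stream(raw, strip_bom=True))
```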
RE: UTF-8 BOM Nonsense
At 11:31 AM 06/22/2000 -0800, Michael Kaplan (Trigeminal Inc.) wrote: >I do not believe that this will require it to be added to a standard, and >this is a non-standard usage, but life is about dealing with things as they >are (and this is how they are!). I assume that you also feel that the charset parm on a MIME Email Header (or HTML/XML header) is not needed and thus should be discouraged. The use of the BOM character at the start of a TEXT file serves the same purpose as the charset tag - It says "I am in UTF-8 format" (so you do not try to treat it as ISO-8859-x, CP1252, or some other encoding format).
RE: How to distinguish UTF-8 from Latin-* ?
At 09:41 AM 06/22/2000 -0800, Karlsson Kent - keka wrote: > >"Be liberal with what you accept and conservative with what you create"]). > > >Well, there is a security aspect to this: sometimes given texts >need to be scanned to try to determine if they are "harmless" >or may trigger some undesirable interpretation (as interpreted >program code, like shell-script, for instance). A hacker may >try to hide characters that trigger the undesired, and potentially >dangerous, interpretation, by using overlong UTF-8 sequences. >If the security scanner program does not "decode" overlong >UTF-8 sequences, Since the interpreter will only see it if the security system has "signed off" on its harmlessness, there is nothing to say that the security system can not normalize the overlong strings prior to doing its scan and act as if that were the form they were supplied in. This would let the interpreter accept the overlong data (or the normalized copy the security system checked and then passed to it). >but the interpreter accepts them as if nothing >was wrong, things you would not like to happen might happen. >So overlong UTF-8 sequences should be regarded as errors, and >not as a coding for any character at all. Yes, you may regard >systems that at all have "escapes" into "execute this" mode >as ill-designed. But they are around.
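For comparison, here is a sketch of the stricter position quoted above (treat overlong sequences as errors before any scanning happens). It leans on Python's built-in UTF-8 decoder, which rejects overlong forms under its default strict error handling. The function names and the list of forbidden shell characters are assumptions made for the illustration, not part of any real scanner.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """True only if the bytes decode as strict UTF-8 (no overlong forms)."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


def safe_to_interpret(data: bytes, forbidden: str = ";|&`$") -> bool:
    """Reject malformed input first, then scan the decoded text."""
    if not is_well_formed_utf8(data):
        return False
    text = data.decode("utf-8")
    return not any(ch in forbidden for ch in text)


if __name__ == "__main__":
    # 0xC0 0xBB is an overlong two-byte spelling of ";" (0x3B).
    print(safe_to_interpret(b"ls \xc0\xbb rm"))
    print(safe_to_interpret(b"ls -l"))
```

Either approach works only if the scanner and the interpreter agree on how to decode, which is the real issue in this exchange.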
RE: How to distinguish UTF-8 from Latin-* ?
At 12:12 PM 06/20/2000 -0800, Kenneth Whistler wrote: >Bob Rosenberg wrote: > > > > > > >This was my concern, there is no way to distinguish UTF-8 from Latin-1 in > > >case of upper ASCII characters here. > > > > Yes there is - its called a "Sanity Check". You parse the file looking for > > High-ASCII. If you find none - you are US-ASCII (or ISO-8859-1). Once you > > find one, you use the UTF-8 Suffix method to see how long the string > should > > be IF it is UTF-8. Look at the next x characters to see if they have the > > correct suffix. If not, count as a Bad-UTF-8. If so, count as one > > Good-UTF-8. Once you roll off the end of the string resume scanning for > > another High-ASCII and do the check again. After finding 12 strings that > > start with High-ASCII (or bopping off the end of the file) check your > > GOOD/BAD counts. All BAD means ISO-8859-1. All GOOD means UTF-8. > >Well, not necessarily. Granted, the distribution of precedent bytes and >successor bytes in UTF-8, when interpreted as ISO 8859-1, mostly results >in gibberish that is unlikely to appear in real text. The first byte of >a two-byte UTF-8 sequence consists essentially of an accented capital >letter in 8859-1 (0xC0..0xDF). And the successor bytes are either C1 >controls or come from the set of miscellaneous symbols, currency signs, >punctuation, etc., that are rather unlikely to occur directly following >an uppercase accented Latin letter. > >But if I invented a hoity-toity company name with extra accents for >"class", such as, L·DÏ·DÀ® Productions, Inc. and sent this to you in >ISO 8859-1, as I am currently doing, your sanity check will fail in >this case and identify this file as UTF-8, with 3 characters misinterpreted. >(i.e., LDD. Productions, Inc.) Of course, a >further check >for irregular sequence UTF-8 would discover that 0xC0 0xAE ==> U+002E is >not shortest form UTF-8, and might, therefore, not actually be UTF-8, >but even that cannot really be relied on. 
True, you can FAKE an incorrect evaluation by plugging a trick string into an otherwise low-ASCII file/message. My comment was aimed at normal (not faked) files. I agree that I missed the extra sanity check of looking for the shortest form, but if I remember the rules correctly, there is no requirement that the shortest form be emitted - only a strong suggestion to do so (with a stronger suggestion to accept it [ie: "Be liberal with what you accept and conservative with what you create"]). I doubt that a real ISO-8859-1 file could be mistaken for a UTF-8 one without it being specially constructed to trick the sanity check. Note that the 12 string "universe" is just an attempt to check for false positives and could be adjusted for circumstances. > > Mixed > > (with most being BAD) is ISO-8859-1 (the Goods are "noise"). Mostly Good > > with a few Bad are either malformed UTF-8 or ISO-8859-1 (with the bad luck > > of finding 2 byte strings that LOOK LIKE UTF-8). > >Even entirely GOOD can have that bad luck, as this email itself >demonstrates. Since this is a special message that was designed to spoof, not a real message, I do not regard it as bad luck. If you can supply a set of normal text that would give a false reading, I'd be much more willing to say that my claim of just doing a sanity check was overly simplistic. >--Ken
RE: How to distinguish UTF-8 from Latin-* ?
At 02:01 PM 06/19/2000 -0800, Vinod Balakrishnan wrote: >[snip] > >2) No encoding information... UTF-8 can be assumed (often it is just ASCII > >so this works) > >This was my concern, there is no way to distinguish UTF-8 from Latin-1 in >case of upper ASCII characters here. Yes there is - it's called a "Sanity Check". You parse the file looking for High-ASCII. If you find none - you are US-ASCII (or ISO-8859-1). Once you find one, you use the UTF-8 Suffix method to see how long the string should be IF it is UTF-8. Look at the next x characters to see if they have the correct suffix. If not, count it as one Bad-UTF-8. If so, count it as one Good-UTF-8. Once you roll off the end of the string, resume scanning for another High-ASCII and do the check again. After finding 12 strings that start with High-ASCII (or bopping off the end of the file), check your GOOD/BAD counts. All BAD means ISO-8859-1. All GOOD means UTF-8. Mixed (with most being BAD) is ISO-8859-1 (the Goods are "noise"). Mostly Good with a few Bad is either malformed UTF-8 or ISO-8859-1 (with the bad luck of finding 2 byte strings that LOOK LIKE UTF-8).
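The sanity check described above can be sketched roughly as follows. This is one reading of the message, not a tested detector: the names expected_length and sniff, the verdict strings, and the handling of the "mostly good" case as "ambiguous" are all assumptions, and the sketch deliberately does not check for shortest-form sequences.

```python
def expected_length(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte, or 0 if not a lead."""
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    return 0  # x80-xBF (a bare continuation byte) or xF8-xFF


def sniff(data: bytes, max_samples: int = 12) -> str:
    good = bad = 0
    i = 0
    while i < len(data) and good + bad < max_samples:
        byte = data[i]
        if byte < 0x80:
            i += 1
            continue
        n = expected_length(byte)
        tail = data[i + 1:i + n]
        # Every following byte must look like 10xxxxxx.
        if n and len(tail) == n - 1 and all(0x80 <= t <= 0xBF for t in tail):
            good += 1
            i += n
        else:
            bad += 1
            i += 1
    if good == 0 and bad == 0:
        return "us-ascii"
    if bad == 0:
        return "utf-8"
    if good == 0 or bad > good:
        return "iso-8859-1"
    return "ambiguous"  # mostly good: malformed UTF-8 or unlucky Latin-1


if __name__ == "__main__":
    sample = "caf\u00e9 na\u00efve"
    print(sniff(sample.encode("utf-8")))
    print(sniff(sample.encode("latin-1")))
```

The sample limit of 12 is taken from the message and is exposed as a parameter so it can be adjusted.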
Re: Unicode and multilingual support in Macintosh Web browsers
At 10:07 AM 06/16/2000 -0800, Deborah Goldsmith wrote: >on 6/16/2000 10:22 AM, Robert A. Rosenberg <[EMAIL PROTECTED]> >wrote: > > > Adobe Indesign has Unicode Support. So does Outlook Express (just > > send/receive a message in UTF-7/UTF-8). > >Outlook Express only supports the subset of Unicode which can be displayed >using Mac OS legacy character sets. It does not support all of Unicode. > >I haven't tried Indesign, but I believe it supports a subset of Unicode as >well. InDesign's support may be a subset, but I think it is more than the Mac 192 characters. I think that it covers Eastern & Western Europe (and Cyrillic & Hebrew). Missing are Arabic and the Far East scripts (CJK). >Deborah Goldsmith >Manager, International Toolbox Group >Apple Computer, Inc. >[EMAIL PROTECTED]
RE: Linguistic precedence
At 02:37 AM 06/16/2000 -0800, Michael Everson wrote: >software that insists ... that all letters be capitalized is utterly evil. >:-) It sure makes it hard to tell the difference between polish and Polish (as well as how to pronounce the word "POLISH", since you first must figure out which word it is).
Re: The mother of all collation schemes
At 12:11 PM 06/15/2000 -0800, [EMAIL PROTECTED] wrote: >2) My alphabetical order: (digits are treated as letters): >[sp] [other punc.] 0 1 2 3 4 5 6 7 8 9 A Á Ä À B C Ç D E É Ë È F G H Í Ï Ì J K >L M N Ñ O Ó Ö Ò P Q R S T U Ú Ü Ù V W X Y ÿ(why couldn't I find this in >uppercase?) =Alt+0159 (on a WinTel Machine). >Z
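A quick way to check that claim, assuming the machine's ANSI code page is CP-1252 (which is what Alt+0nnn entry uses on a Western-European Windows setup). The helper name alt_code_char is made up for the example.

```python
def alt_code_char(code: int) -> str:
    """Character produced by Alt+0<code>, assuming the CP-1252 code page."""
    return bytes([code]).decode("cp1252")


if __name__ == "__main__":
    # 159 is x9F, inside the x80-x9F range where CP-1252 adds glyphs.
    print(alt_code_char(159))
    print(alt_code_char(255))
```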
Re: Unicode and multilingual support in Macintosh Web browsers
At 01:26 PM 06/15/2000 -0800, John Jenkins wrote: >on 6/15/00 6:00 AM, Alan Wood at [EMAIL PROTECTED] wrote: > > > I have tried without success to find information on how to view > multilingual > > Web pages with a Macintosh and which multilingual fonts are available, so I > > have documented the things I have discovered by a process of trial and > > error, and produced a new page in my collection of Unicode information at: > > > > http://www.hclrss.demon.co.uk/unicode/macbrowsers.html > > > > I will appreciate being advised of any errors or of further sources of > > information. > > > >Well done. There is one clarification I would suggest, however. > >Apple has provided support for direct Unicode rendering since Mac OS 8.5. >This includes the ability to use large, data-fork TrueType fonts such as >Arial Unicode. The OS is perfectly capable of handling these fonts and >applications have the ability extended to them to do all of Unicode. None, >however, are taking advantage of that ability as of yet. Adobe Indesign has Unicode Support. So does Outlook Express (just send/receive a message in UTF-7/UTF-8). >= >John H. Jenkins >[EMAIL PROTECTED] >[EMAIL PROTECTED] >http://www.blueneptune.com/~tseng
Re: Pictograms
At 12:09 AM 06/15/2000 -0800, William Overington wrote: >I am unsure what happens to an >International Standard Book Number when a second, altered or corrected, >edition of a book is published, whether a new ISBN is assigned to the new >edition or whether the old ISBN is recycled It is a new book and a new number is assigned. A price change on the book (even if it is just a new printing not a new edition) normally gets a new number since the number keys back to the book and the publisher must track the $4.95 copies separately from the $5.95 ones. A new printing without a new price keeps the same ISBN. In some cases when a book is printed with multiple covers, each cover sometimes gets its own ISBN number.
RE: Linguistic precedence [was: (TC304.2313) AND/OR:
At 07:53 AM 06/15/2000 -0800, Michael Kaplan (Trigeminal Inc.) wrote: >Eventually someone will have a language name that does not fit >or a language like German will insist on sorting sooner, under Deutsch rather >than under German, etc. (which I personally think makes more sense than >making a locale take someone's translation of their language name, FWIW). Since it was stated that Greek was displayed between German and Spanish, I'd assume that German was Deutsch since Spanish is Espanol (not sure if that "n" is "n" or "ñ" as well as if my spelling is correct).
Re: French encoding [Was: Chapter on character sets]
At 09:52 AM 06/15/2000 -0800, [EMAIL PROTECTED] wrote: >Since Latin-1 was the >encoding of choice prior to Latin-9 (and still is in many situations), >there really are much more data encoded as 8859-1 than as 8859-9. Latin9 = ISO-8859-15 not -9.
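The difference is easy to see with the Euro sign, which is the character that motivated Latin-9 in the first place. A small check using Python's standard codecs (the helper name encodable is made up here):

```python
def encodable(ch: str, charset: str) -> bool:
    """True if the character exists in the given 8-bit charset."""
    try:
        ch.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # Latin-9 is ISO-8859-15; ISO-8859-9 is Latin-5 (Turkish).
    for charset in ("iso8859-1", "iso8859-9", "iso8859-15"):
        print(charset, encodable("\u20ac", charset))
```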
Re: Chapter on character sets
At 05:01 AM 06/15/2000 -0800, [EMAIL PROTECTED] wrote: >- The C1 range wasn't empty: Microsoft simply took advantage of the fact >that this range isn't needed on PC's, and filled it with graphic >characters in the Windows codepages. Apple (or rather the MUA/WEB Publishers for Apple Programs) did the same for the MacRoman<->ISO-8859-1 translations. Unfortunately the codes that are in the Windows C1 range and MacRoman (the Macintosh native mapping for x80-xFF) do not have the same "ISO-8859-1" mappings, so unless Windows usage of this range is marked as CP1252 (in lieu of the inaccurate ISO-8859-1) there will be display problems when transferring content containing glyphs in the C1 range between the two platforms. Of course this is due to the "Head in the Sand" behavior of the ISO Ivory Tower types in refusing to issue standards which put usable Glyphs in the C1 range instead of the useless control codes. Note: I am not saying that ISO-8859-1 should not have the junk there but only that there should be a parallel set of Standards to ISO-8859-x with the Glyphs there.
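To make the mismatch concrete, the sketch below decodes one byte from the x80-x9F range under each of the three interpretations discussed here, using the codec names Python ships with. The helper name decode_three_ways is made up for the example.

```python
def decode_three_ways(byte_value: int) -> dict:
    """Decode a single byte as CP-1252, MacRoman and ISO-8859-1."""
    raw = bytes([byte_value])
    # Note: a few CP-1252 slots (x81, x8D, x8F, x90, x9D) are unassigned
    # in Python's codec and would raise UnicodeDecodeError here.
    return {name: raw.decode(name)
            for name in ("cp1252", "mac_roman", "iso8859-1")}


if __name__ == "__main__":
    for name, ch in decode_three_ways(0x80).items():
        print(name, repr(ch))
```

The same byte is a Euro sign on Windows, a capital A with diaeresis on the Mac, and an invisible control code under a truthful ISO-8859-1 label.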
Re: Pictograms (was: (TC304.2313) AND/OR: antediluvian views)
At 11:34 AM 06/13/2000 -0800, Alain wrote: >[Alain] In my example of this morning, it was not mainly because French >was in 5th position that I was the most upset, it is because I was in a >hurry -- that was last Tuesday -- and that I had to wait for the vocal >explanations for many minutes while French was the second most-used >language in this hotel (the others in 2nd, 3rd and 4th position were, for >those interested [remember that we are in Toronto, not in Tokyo nor >Cairo]: Japanese [nihon-go], Spanish [español], Arabic [arabiya] [sorry if >I made a mistakes in spelling, that is what I heard, and I was as >attentive as I could). Even Spanish should have come before Japanese in >North America. It is a matter of common sense. But I was under the >impression that those who took the decision for the order in languages at >this hotel were vicious. Not very good indeed for their customer base... >The guy who did that should be reprimanded... Anyway they risk to lose me... As I noted in a prior comment, since I assume you were in your room at the time, it should have offered YOU French as option 1 (since that could be set at check in time). For a Voice Mail system in a HOTEL, the lack of a way to flag preferred language on a per-room basis is poor design. I do computer program design for a living and as an example of this principle, I designed an ATM system that could display in a number of different languages BUT based on the card used would default to the correct language prior to offering the "What Language do you want" screen.
Re: [unicode] Re: (TC304.2313) AND/OR: antediluvian views
At 07:29 AM 06/13/2000 -0800, Alain wrote: >With more than 2 languages, precedence becomes problematic. As an example >of language precedence, an actual case: at the Toronto Airport Radisson >Suite Hotels, my preferred hotel in Toronto (so far! but it could >change...), they recently introduced a multilingual voice mail system. In >Canada, French and English are the two official languages of the country >(and most probably at this hotel the majority of the customers speak >English and French, with a high concentration of French speakers). In >general in Canada you are presented with a choice of language where you >indicate your option by pressing a specific key on the telephone keypad (1 >English 2 French -- or the reverse in Québec). At this hotel, French is >the 5th choice. It is offensive, I can assure you (I would not have been >offended in Taiwan, of course). It is also a bad design (for a Hotel). When you check into your room the system should be told what language is to be the default FOR THAT ROOM and you should get a list where that language is #1 (with the others listed as #2-x). I agree that in Canada E&F should be the first 2 offered in ALL cases.
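The per-room default suggested above amounts to a very small piece of logic. A sketch, with the function name menu_order and the sample language list invented for the illustration:

```python
def menu_order(languages, preferred=None):
    """Return the menu with the room's preferred language first.

    If no preference was recorded at check-in, or it is not one of the
    offered languages, the site-wide order is used unchanged.
    """
    if preferred in languages:
        return [preferred] + [lang for lang in languages if lang != preferred]
    return list(languages)


if __name__ == "__main__":
    site_order = ["English", "French", "Japanese", "Spanish", "Arabic"]
    print(menu_order(site_order, preferred="French"))
    print(menu_order(site_order))
```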
Re: [unicode] Re: (TC304.2313) AND/OR: antediluvian views
At 09:57 AM 06/13/2000 -0800, Otto Stolz wrote: > > >On 2000-06-13 at 17:49, Alain wrote: > > [Having pictograms everywhere] is much lighter than having to provide > > indications, say, in 12 languages (most common example: toilets). Watch out when you go to the bathroom in Scotland. The Dress/Slacks Female/Male outlines might not fly there since the Female outline looks Male to them (Kilts, you know).