Re: Unicode and Security

2002-02-09 Thread John Cowan

[EMAIL PROTECTED] scripsit:

> Let's keep going.  Latin Y, Greek Upsilon, Cyrillic U.  Wait a minute, that 
> Cyrillic U doesn't look *quite* the same.  Oh well, it's close enough, right? 

And then there's the Cyrillic U with the straight descender, which
actually does look just like its Latin and Greek counterparts.
I guess we just can't afford to have two kinds of Cyrillic U around:
off with their heads (or tails)!

Unfortunately, there go all those Turkic languages written in Cyrillic.
Well, they should Romanize anyway.  In fact all languages should Romanize:
it simplifies everything so much, and if we get rid of diacritics
while we're at it, well, the ASCII Consortium
(off-net, but cached in part at 
http://www.google.com/search?q=cache:IRueJQ1bA-4C:www.wholehog.fsnet.co.uk/robert/ascii/+ASCII+Consortium&hl=en)
will find it a dream come true.  And there was much rejoicing.

-- 
John Cowan   http://www.ccil.org/~cowan  [EMAIL PROTECTED]
To say that Bilbo's breath was taken away is no description at all.  There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
--_The Hobbit_




Re: IPA keyboard

2002-02-09 Thread DougEwell2

I apologize in advance for replying in public to Michka's private message, 
but he asked a good question.

>> What I don't want:
>> - Anything that requires me to install Keyman
>
> How come? (Just curious).

I really don't want to install a Big Pre-Packaged Solution and have it do 
everything for me, Windows-wide.  What I'm looking for is a technical spec 
that I can use to build something from scratch for one application.

I'd rather not go through the effort of installing Keyman, especially on the 
memory- and disk-challenged machine I'm using here at home, just so I can 
load a single keyboard layout, reverse-engineer it, and uninstall Keyman.  
That's all I'd really be doing with it, at least for now.

I know there are a lot of Keyman devotees on the list, so if I am greatly 
exaggerating the effort vs. payoff, please let me know in a gentle, 
flame-free way.  Also, if nobody is able to come up with a text description 
of the type I wanted, I may have to resort to the Keyman approach anyway.

James Kass wrote:

> Here is an actual layout for IPA UTF-8 entry:
> http://www.elgin.free-online.co.uk/ipa_kb_det.htm

This is weird.  Quoting from the page, "Since Unicode UTF-8 encoding codes 
each IPA symbol as two characters (bytes), you will have to type two keys for 
each letter."  Eventually it is revealed that the two-character sequences the 
user must type are "based on the SAMPA guidelines for typing IPA using 
ASCII."  No, this one isn't what I want.
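
The page's conflation of bytes with keystrokes can be checked directly. A minimal Python sketch, using U+0279 purely as a sample IPA character: it is one code point that happens to serialize as two bytes in UTF-8, which by itself says nothing about how many keys have to be pressed.

```python
# One IPA character is one code point; UTF-8 merely serializes it as two bytes.
ch = "\u0279"  # sample character, chosen only for illustration

encoded = ch.encode("utf-8")

print(len(ch))        # number of characters (code points)
print(len(encoded))   # number of bytes in the UTF-8 form
print(encoded.hex())  # the bytes themselves, in hex
```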

> This page has graphics showing the Mac-IPA layout:
> http://www.matchfonts.com/pages/m-ipa.html

I had seen this page before.  This is more like what I had in mind, but it 
requires five keyboard states.  I know that full IPA support may require more 
than 47 * 4 = 188 characters, but I really have to stick to this limit.  I 
could tolerate supporting only the 188 "most common" IPA characters (whatever 
that means) if necessary.

Also, the Mac-IPA layout is presented as a bitmap only, without Unicode code 
points or even character names.  I'm not familiar enough with IPA to be able 
to distinguish, say, U+0279 from U+027A by looking at smallish bitmaps.
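
A text listing of the kind that would avoid the bitmap problem can be produced from the Unicode character database. The following is only a sketch in Python; the helper name `describe` is made up for this example, and the names printed come from whatever Unicode version the local Python ships with.

```python
import unicodedata

def describe(first, last):
    """Return 'U+XXXX  NAME' lines for an inclusive range of code points."""
    lines = []
    for cp in range(first, last + 1):
        # Fall back to a placeholder for unassigned code points.
        name = unicodedata.name(chr(cp), "<unnamed>")
        lines.append(f"U+{cp:04X}  {name}")
    return lines

# The two characters that are hard to tell apart in a small bitmap.
for line in describe(0x0279, 0x027A):
    print(line)
```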

But I do appreciate James's effort in looking up these two resources and 
letting me know about them.

-Doug Ewell
 Fullerton, California
 (address will soon change to dewell at adelphia dot net)




Re: Unicode and Security

2002-02-09 Thread DougEwell2

In a message dated 2002-02-09 13:00:59 Pacific Standard Time, 
[EMAIL PROTECTED] writes:

> It seems to me that this problem really needs some other fix than the
> merging of all similar-looking characters in all character sets. I
> just can't see that working. 

Even the "merging" part wouldn't work.  Let's say that I, like Ken Sakamura 
or Bernard Miller before me, have decided that I know much more about 
character encoding than the Unicode Consortium or WG2, and I am going to 
develop my own character encoding that will solve the problem of confusables 
once and for all.

OK, we start with the easy ones.  Latin A, Greek Alpha, and Cyrillic A all 
get unified.  Latin E, Greek Epsilon, Cyrillic E, unified.  Hey, this is 
easier than I thought.  Latin B, Greek Beta, Cyrillic Ve.  Ha!  I'm smart 
enough to know that Ve gets unified with B and Beta, even though it 
represents a different sound.  Just like Han unification!  Boy, those Unicode 
dolts really missed something there.

Let's keep going.  Latin Y, Greek Upsilon, Cyrillic U.  Wait a minute, that 
Cyrillic U doesn't look *quite* the same.  Oh well, it's close enough, right? 
Let's try some lower-case letters.  Latin a, Greek alpha, Cyrillic a.  That 
Greek alpha looks kinda cursive, doesn't it?  Should we unify it or not?  
Hmmm...

How about Latin n and Greek eta?  Is that descender on the eta significant or 
not?  Hey, you could stick an eta in the middle of a Web address and really 
fool somebody.  Better unify.  How about Latin v and Greek nu?  Different 
glyphs or not?  In 9-point MS Sans Serif, they're pretty close, aren't they?  
(And don't forget Armenian vo!)  Same goes for Latin y and Greek gamma.

Well, you get the point.  The world of alphabetic confusables is just not 
that simple or that 1-to-1.  There are more edge cases, in fact, than obvious 
cases such as the a/alpha or o/omicron that we keep hearing about.  And if I 
were trying to design this hypothetical "Uniglyph" encoding to get rid of 
those pesky confusables, and still provide support for alphabetic scripts 
besides Latin, I would eventually have to face the fact that it *can't be 
done*.  Oh, sure, it can be done for a/alpha and o/omicron, so I can make a 
sales presentation or a picket sign.  But a complete technical solution, uh, 
no.
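
For reference, the look-alike groups walked through above can be listed with their code points and character names. This is a rough Python sketch; the `GROUPS` table is assembled by hand from the examples in this message and is not any official confusables list. It does not decide what a hypothetical "Uniglyph" should merge; it only shows that each look-alike is a separately encoded, separately named character.

```python
import unicodedata

# Look-alike groups taken from the examples in this message (hand-picked).
GROUPS = {
    "A": "\u0041\u0391\u0410",  # Latin A, Greek Alpha, Cyrillic A
    "B": "\u0042\u0392\u0412",  # Latin B, Greek Beta, Cyrillic Ve
    "Y": "\u0059\u03A5\u0423",  # Latin Y, Greek Upsilon, Cyrillic U
    "n": "\u006E\u03B7",        # Latin n, Greek eta
    "v": "\u0076\u03BD",        # Latin v, Greek nu
    "y": "\u0079\u03B3",        # Latin y, Greek gamma
}

for label, chars in GROUPS.items():
    print(label)
    for ch in chars:
        print(f"  U+{ord(ch):04X}  {unicodedata.name(ch)}")
```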

-Doug Ewell
 Fullerton, California
 (address will soon change to dewell at adelphia dot net)




Re: ICU website

2002-02-09 Thread Mark Davis

Wait until we can find out what happened. It was not supposed to
change.

Mark
—

Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο 
πάντα — Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

- Original Message -
From: "Addison Phillips [wM]" <[EMAIL PROTECTED]>
To: "Roozbeh Pournader" <[EMAIL PROTECTED]>; "Unicode List"
<[EMAIL PROTECTED]>
Sent: Saturday, February 09, 2002 09:08
Subject: RE: ICU website


> The server changed to www-124.ibm.com
>
> Best Regards,
>
> Addison
>
> Addison P. Phillips
> Globalization Architect / Manager, Globalization Engineering
> webMethods, Inc.  432 Lakeside Drive, Sunnyvale, CA
> +1 408.962.5487 (phone)  +1 408.210.3659 (mobile)
> -
> Internationalization is an architecture. It is not a feature.
>
>
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED]]On Behalf Of Roozbeh Pournader
> > Sent: 2002年2月9日 8:15
> > To: Unicode List
> > Subject: ICU website
> >
> >
> >
> > Does anyone know if the web address for ICU has changed? My URL,
> > , gives me a name lookup error.
> >
> > roozbeh
> >
> >
> >
>
>
>





Re: ISO 8859-11 Latin/Thai + Euro

2002-02-09 Thread Kenneth Whistler

> Date: Sat, 09 Feb 2002 22:42:43 +0100
> Subject: Re: ISO 8859-11 Latin/Thai + Euro
> To: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED],
> [EMAIL PROTECTED]
> 
> * Markus Kuhn
> | 
> | [I don't have a copy of either ISO 8859-11 or TIS 602, so I can't
> | compare the two myself. I suspect they are the same 

Yes, as Lars said.

> | and that it
> | will be the first part of ISO 8859 that has combining characters.]

No, that distinction belongs to ISO 8859-6 Latin/Arabic.
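
One way to see this is to decode the upper half of each code page and keep the characters that Unicode classifies as nonspacing marks (general category Mn). The sketch below assumes Python's bundled `iso8859_6` and `iso8859_11` codecs as a stand-in for the standards themselves; the helper name `combining_marks` is made up for this example.

```python
import unicodedata

def combining_marks(codec):
    """Return the nonspacing marks (category Mn) in bytes 0xA0-0xFF of a codec."""
    marks = []
    for byte in range(0xA0, 0x100):
        # Undefined byte values decode to an empty string with errors="ignore".
        text = bytes([byte]).decode(codec, errors="ignore")
        if text and unicodedata.category(text) == "Mn":
            marks.append(text)
    return marks

for codec in ("iso8859_6", "iso8859_11"):
    marks = combining_marks(codec)
    print(codec, [f"U+{ord(m):04X}" for m in marks])
```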

--Ken

> 
> You mean TIS 620, and, yes, they are the same.
>  
> -- 
> Lars Marius Garshol, Ontopian http://www.ontopia.net
> ISO SC34/WG3, OASIS GeoLang TC http://www.garshol.priv.no




Re: ISO 8859-11 Latin/Thai + Euro

2002-02-09 Thread Lars Marius Garshol


* Markus Kuhn
| 
| [I don't have a copy of either ISO 8859-11 or TIS 602, so I can't
| compare the two myself. I suspect they are the same and that it
| will be the first part of ISO 8859 that has combining characters.]

You mean TIS 620, and, yes, they are the same.
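
For anyone without a copy of either standard, a rough cross-check is possible with the codec tables that ship with Python (`tis_620` and `iso8859_11`). This is a sketch that trusts one implementation's mapping tables, not the standards documents, and it compares only the byte ranges that hold the Thai repertoire; other code positions are not examined here.

```python
# Thai letters, vowels, signs and digits occupy 0xA1-0xDA and 0xDF-0xFB.
THAI_BYTES = bytes(list(range(0xA1, 0xDB)) + list(range(0xDF, 0xFC)))

as_tis = THAI_BYTES.decode("tis_620")
as_iso = THAI_BYTES.decode("iso8859_11")

print(as_tis == as_iso)  # do both codecs agree on every Thai position?
print(f"U+{ord(as_iso[0]):04X} .. U+{ord(as_iso[-1]):04X}")
```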
 
-- 
Lars Marius Garshol, Ontopian http://www.ontopia.net
ISO SC34/WG3, OASIS GeoLang TC http://www.garshol.priv.no





Re: Unicode and Security

2002-02-09 Thread Lars Marius Garshol


* Elliotte Rusty Harold
|
| Let's say I register microsoft.com, only the fifth letter isn't a
| lower-case Latin o. It's actually a lower case Greek omicron.

I'll grant you that this is possible, perhaps even likely, and that it
may cause problems, but I'm far from convinced that this in any way
supports the "there are security problems in Unicode" thesis.
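
For concreteness, the substitution in the quoted example can be reproduced and detected in a few lines of Python. This is a heuristic sketch only: `scripts_in` is a made-up helper that guesses the script from the first word of each character name, which is not the same thing as the Unicode Script property.

```python
import unicodedata

def scripts_in(label):
    """Crude script guess: the first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

genuine = "microsoft"
spoofed = "micr\u03bfsoft"  # fifth letter is GREEK SMALL LETTER OMICRON

print(scripts_in(genuine))  # scripts found in the all-Latin label
print(scripts_in(spoofed))  # scripts found in the look-alike label
print(genuine == spoofed)   # the two strings compare as different
```

The two labels may render identically, yet they are different strings, and a mixed-script label is something software can flag without any change to the encoding.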

There are many characters which look alike, and yet are different,
which can cause problems of this kind. There are for example already
viruses which exploit the visual similarity between 1 and l in the
Windows system font to keep themselves from being discovered in file
listings.

So if this really is considered a problem it would seem to me that you
would need to deal with the problem of [EMAIL PROTECTED],
[EMAIL PROTECTED], and [EMAIL PROTECTED] looking very similar to
[EMAIL PROTECTED] in lots of fonts. To exploit this, all you need to
know is what email client someone uses, and usually every email they
write will have that information in its headers.

It seems to me that this problem really needs some other fix than the
merging of all similar-looking characters in all character sets. I
just can't see that working. 

Similarly, the "security problems" caused by using Unicode encoding
tricks to hide or mangle text in, say, contracts, is no different from
using HTML or CSS (or whatever) tricks to achieve the same effect, and
yet nobody is talking about security problems with HTML or CSS. See
[1] for one way of dealing with it that is now being worked on.

So while I accept that there is a problem it does not seem to me that
Unicode is the problem. To me the problem seems to be the complexity
of the relationship between the bytes sent to the user and what the
user actually sees and reacts to. That complexity is not going to
disappear, and aspects of the same "problem" exist with just about any
information representation, so clearly the solution must be something
other than changing all of these syntaxes/formats/encodings.

In the specific case you cite, for example, a better solution might be
for the user's email client to keep track of all the user's contacts
and for it to indicate in some clearly visible way whether the current
email comes from one of them or not. Whether it uses string matching
of email addresses or digital signatures to do that doesn't really
matter; it solves the problem in your example either way.
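
A minimal sketch of the string-matching variant, in Python. The addresses, the `KNOWN_CONTACTS` set and the `sender_label` helper are all hypothetical, and a real mail client would have to get the sender address from somewhere trustworthy first.

```python
# Hypothetical address book; a real client would load this from user data.
KNOWN_CONTACTS = {"alice@example.org", "bob@example.org"}

def sender_label(address, contacts=KNOWN_CONTACTS):
    """Label a sender by exact string match against the address book."""
    if address.strip() in contacts:
        return "known contact"
    return "UNKNOWN SENDER"

print(sender_label("alice@example.org"))
print(sender_label("a1ice@example.org"))  # digit one in place of the letter l
```

Because the comparison is on the underlying string and not on its rendering, a look-alike address falls through to the unknown case no matter which font is in use.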

[1] http://www.w3.org/TR/xmldsig-core/#sec-Seen

-- 
Lars Marius Garshol, Ontopian http://www.ontopia.net
ISO SC34/WG3, OASIS GeoLang TC http://www.garshol.priv.no





RE: ICU website

2002-02-09 Thread Addison Phillips [wM]

The server changed to www-124.ibm.com

Best Regards,

Addison

Addison P. Phillips
Globalization Architect / Manager, Globalization Engineering
webMethods, Inc.  432 Lakeside Drive, Sunnyvale, CA
+1 408.962.5487 (phone)  +1 408.210.3659 (mobile)
-
Internationalization is an architecture. It is not a feature.


> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED]]On Behalf Of Roozbeh Pournader
> Sent: 2002年2月9日 8:15
> To: Unicode List
> Subject: ICU website
> 
> 
> 
> Does anyone know if the web address for ICU has changed? My URL, 
> , gives me a name lookup error.
> 
> roozbeh
> 
> 
> 





ICU website

2002-02-09 Thread Roozbeh Pournader


Does anyone know if the web address for ICU has changed? My URL, 
, gives me a name lookup error.

roozbeh





Re: IPA keyboard

2002-02-09 Thread James Kass


A couple of articles in *.DOC format linked at:
http://www.phon.ucl.ac.uk/home/wells/
on John Wells' web site are interesting.  They are about using 
the auto-correct feature of Word to input IPA.  See the links 
under "Research" for Eureka and Eureka-IPA.

Here is an actual layout for IPA UTF-8 entry:
http://www.elgin.free-online.co.uk/ipa_kb_det.htm

This page has graphics showing the Mac-IPA layout:
http://www.matchfonts.com/pages/m-ipa.html

Best regards,

James Kass.

- Original Message - 
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; 
<[EMAIL PROTECTED]>
Sent: Friday, February 08, 2002 11:40 PM
Subject: IPA keyboard 


> I am looking for information on IPA keyboards.  I would like to build a 
> keyboard for SC UniPad that would allow the user to type IPA characters 
> directly.
> 
> What I want:
> - Text.  (Could be plain text, PDF, Word, Excel, etc.)
> - Keys referenced by ISO 9995, scan codes, or U.S. English assignment
> - Characters referenced by Unicode values, or at least SGML entities
> - Preferably no more than four (4) discrete keyboard states
> 
> Linux keymaps are fine if they meet the above requirements.
> 
> What I don't want:
> - Graphic images without a corresponding text description
> - Anything that requires me to install Keyman
> - Anything related to "ASCII IPA"
> 
> Any such information would be appreciated.
> 
> Thank you,
> 
> -Doug Ewell
>  Fullerton, California
>  (address will soon change to dewell at adelphia dot net)
> 
> 





IPA keyboard

2002-02-09 Thread DougEwell2

I am looking for information on IPA keyboards.  I would like to build a 
keyboard for SC UniPad that would allow the user to type IPA characters 
directly.

What I want:
- Text.  (Could be plain text, PDF, Word, Excel, etc.)
- Keys referenced by ISO 9995, scan codes, or U.S. English assignment
- Characters referenced by Unicode values, or at least SGML entities
- Preferably no more than four (4) discrete keyboard states

Linux keymaps are fine if they meet the above requirements.
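
To illustrate the shape of the description being asked for, here is a sketch in Python of a layout table keyed by U.S. English key assignment and keyboard state, with characters given as Unicode values. The key placements are invented for this example and do not come from any existing IPA layout; the names `STATES`, `LAYOUT` and `dump` are likewise hypothetical.

```python
import unicodedata

STATES = ("base", "shift", "altgr", "shift+altgr")  # four keyboard states

# (key by U.S. English assignment, state) -> Unicode code point.
# These placements are invented for illustration only.
LAYOUT = {
    ("e", "altgr"): 0x0259,  # schwa
    ("s", "altgr"): 0x0283,  # esh
    ("n", "altgr"): 0x014B,  # eng
    ("r", "altgr"): 0x0279,  # turned r
}

def dump(layout):
    """Render the layout as tab-separated text: key, state, code point, name."""
    lines = []
    for (key, state), cp in sorted(layout.items()):
        if state not in STATES:
            raise ValueError(f"unknown keyboard state: {state}")
        lines.append(f"{key}\t{state}\tU+{cp:04X}\t{unicodedata.name(chr(cp))}")
    return lines

print("\n".join(dump(LAYOUT)))
```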

What I don't want:
- Graphic images without a corresponding text description
- Anything that requires me to install Keyman
- Anything related to "ASCII IPA"

Any such information would be appreciated.

Thank you,

-Doug Ewell
 Fullerton, California
 (address will soon change to dewell at adelphia dot net)