RE: the Unicode range and code page range bits in the TrueType OS/2 table

2002-02-08 Thread Chris Pratley

Microsoft applications use both of these to try to determine if a font
is likely to support a certain range. Some fonts do not properly set
those values but most do, especially common ones.

Chris Pratley
Group Program Manager
Microsoft Office

Sent with OfficeXP on WindowsXP


-Original Message-
From: Yung-Fong Tang [mailto:[EMAIL PROTECTED]] 
Sent: February 7, 2002 6:45 PM
To: Brian Stell; Deborah Goldsmith; [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: the Unicode range and code page range bits in the TrueType OS/2
table

Dear i18n folks in Unicode.org

Do we know of ANY application on ANY platform that uses the Unicode range or
code page range fields in the TrueType OS/2 table to support different
languages?

Do Microsoft applications depend on that?

Deborah:
How about Mac OS and Mac OS apps?

Does any Linux application use that?

Ken:
Do you know of any Adobe software that depends on that?

I heard a rumor that those bits are usually left unset and kept as 0.
But I found that some fonts do have them set when I use ttfdump to look at
them.

Thanks








RE: the Unicode range and code page range bits in the TrueType OS/2 table

2002-02-08 Thread John Hudson

At 00:19 2/8/2002, Chris Pratley wrote:

Microsoft applications use both of these to try to determine if a font
is likely to support a certain range. Some fonts do not properly set
those values but most do, especially common ones.

Chris, how do you define a 'properly set' Unicode range in the OS/2 table?

Correct codepage support is self-evident: a font should indicate codepage 
support only if its cmap table includes *all* the characters in that codepage.

Our current production tool (FontLab 4.0) indicates support for a Unicode 
range if *any* of the characters in that range are supported. This seems to 
me, on analysis, to be the best approach, since few fonts will support all 
the characters in a Unicode range, the definition of a Unicode range may 
change over time as new characters are added, and arbitrarily insisting on 
a certain percentage of the characters in a Unicode range is, well, arbitrary.

I seem to recall that this approach is approved by your colleagues in the 
MS type group, but would be interested to know if your opinion, as an MS 
app developer, differed.
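
To make the difference between the two policies concrete, here is a minimal Python sketch. It is an illustration only, not FontLab's actual logic: the function name `unicode_range_bits` and the sample font are hypothetical, and the table lists just four of the OS/2 Unicode range bits.

```python
# Illustrative subset of OS/2 ulUnicodeRange bit assignments; the real
# table defines many more bits, and block contents can change over time.
UNICODE_RANGES = {
    0: (0x0000, 0x007F),  # Basic Latin
    1: (0x0080, 0x00FF),  # Latin-1 Supplement
    7: (0x0370, 0x03FF),  # Greek and Coptic
    9: (0x0400, 0x04FF),  # Cyrillic
}


def unicode_range_bits(cmap_codepoints, policy="any"):
    """Return a bitmask of the ranges a font would claim (hypothetical helper).

    policy="any": set the bit if at least one code point of the range is mapped.
    policy="all": set the bit only if every code point of the range is mapped.
    """
    supported = set(cmap_codepoints)
    mask = 0
    for bit, (first, last) in UNICODE_RANGES.items():
        block = range(first, last + 1)
        if policy == "any":
            claimed = any(cp in supported for cp in block)
        elif policy == "all":
            claimed = all(cp in supported for cp in block)
        else:
            raise ValueError("policy must be 'any' or 'all'")
        if claimed:
            mask |= 1 << bit
    return mask


# Hypothetical font: all of Basic Latin plus a single Greek capital Alpha.
font_cmap = list(range(0x0000, 0x0080)) + [0x0391]
print(bin(unicode_range_bits(font_cmap, "any")))  # 0b10000001 (bits 0 and 7)
print(bin(unicode_range_bits(font_cmap, "all")))  # 0b1 (bit 0 only)
```

Under the "all" policy the Greek bit can never be set by a font that skips even one code point of the block, including code points that are still unassigned, which is one way to see why the "any" policy is attractive for Unicode ranges.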

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

... es ist ein unwiederbringliches Bild der Vergangenheit,
das mit jeder Gegenwart zu verschwinden droht, die sich
nicht in ihm gemeint erkannte.

... every image of the past that is not recognized by the
present as one of its own concerns threatens to disappear
irretrievably.
   Walter Benjamin





Re: Unicode and Security

2002-02-08 Thread Otto Stolz

Elliotte Rusty Harold wrote:

 The problem is that all of these or any other client-based solution you 
 come up with is only going to be implemented in some clients. Many, and 
 at least initially most, users are not going to have any such 
 protections. This needs to be cut off at the protocol level.


Rather, the problem is that replacing just one of the many existing
character encodings with an allegedly secure one would serve only
some (rather few!) users. Finding a solution that works with all
character encodings alike is much more efficient (and is probably
feasible, in contrast to the solution advocated by ERH). One possible
solution for the e-mail spoofing problem is cryptographic authentication.
This is independent of the underlying character encoding, and it is
already widely available.

I said 'allegedly secure', because no character encoding standard can
really prevent this sort of spoofing (we had enough examples in this
thread, based on bare ASCII). Trying to find a spoofing-proof character
encoding is comparable to the task of finding an alphabet that does not
allow anyone to spell any insults.

Best wishes,
Otto Stolz





Re: Unicode and Security

2002-02-08 Thread Michael Everson

At 17:42 -0500 2002-02-07, John Cowan wrote:

The only widely-deployed alternative approach I know of is
ETSI GSM 03.38 (used in mobile telephony),

A truly bizarre character set:  it supports English, French,
mainland Scandinavian languages, Italian, Spanish with Graves, and
GREEK SHOUTING.

On my Nokia I am forced to write SMS messages in Irish with graves 
for ÀàÌìÒòÙù but I am awarded Éé. The Nokia does have ÁáÍíÓóÚú 
available for spelling names in the phone book, but the accents are 
stripped off if they are sent in a text message. :-(
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Unicode and Security

2002-02-08 Thread Michael Everson

At 15:53 -0500 2002-02-07, Elliotte Rusty Harold wrote:
For text files, probably not. But for the domain name system the 
world very well might. Indeed, maybe it should unless this problem 
can be dealt with. I suspect it can be dealt with by prohibiting 
script mixing in domain names (e.g. each component of the name must 
be entirely Greek or entirely Cyrillic or entirely Latin etc. Note: 
something_Cyrillic.something_greek.com is OK.)  Does anybody really 
need mixed Latin and Greek domain names?

Certainly. Some years ago the European Court upheld the right of a 
Belgian man whose father was Belgian and mother was Greek to spell 
his hyphenated last name in both scripts. Why should he not be 
allowed to register a domain based on his own name?

I don't think this has anything to do with Unicode. In Unicode, we 
wish to make all the world's writing systems available to everyone. 
Thieves and cheats will use it if they wish, but this detracts not 
one whit from the nobility of our enterprise.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




RE: Unicode and Security: Domain Names

2002-02-08 Thread Suzanne M. Topping



 -Original Message-
 From: Tom Gewecke [mailto:[EMAIL PROTECTED]]
 Sent: Thursday, February 07, 2002 6:20 PM
 To: [EMAIL PROTECTED]
 Subject: Re: Unicode and Security: Domain Names
 
 
 I note that companies like Verisign already claim to offer 
 domain names
 in dozens of languages and scripts.  Apparently these are converted by
 something called RACE encoding to ASCII for actual use on the 
 internet.
 
 Does anyone know anything about RACE encoding and its properties?

I wrote an article on IDNS in December of 2000 which discusses the
approaches which were being debated at that time, including RACE. RACE
is briefly described in that article. You can find it at:

http://www-106.ibm.com/developerworks/library/u-domains.html

I tried to find an updated internet draft on RACE, but it looks like
nothing exists after version 4, which has been archived. I'm guessing
that draft names which include the text BRACE, TRACE, and GRACE are
probably RACE variations, however. Check them out at:
http://www.ietf.org/internet-drafts/ 

Suzanne Topping
BizWonk Inc.
[EMAIL PROTECTED]




RE: the Unicode range and code page range bits in the TrueType OS/2 table

2002-02-08 Thread Peter_Constable

On 02/08/2002 03:01:31 AM John Hudson wrote:

Chris, how do you define a 'properly set' Unicode range in the OS/2 
table?

Correct codepage support is self-evident: a font should indicate codepage
support only if its cmap table includes *all* the characters in that 
codepage.

Well, there are some gray areas. There are fonts out there that have the 
bit for cp1252 set but that don't have the euro or the upper/lower 
z-caron.
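
A coverage check of this kind can be sketched in a few lines of Python using the standard `cp1252` codec. The helper names and the sample font below are hypothetical; a real tool would read the code points from the font's cmap table.

```python
def codepage_chars(codec_name):
    """Collect the characters a single-byte code page assigns to bytes 0x20-0xFF."""
    chars = set()
    for byte in range(0x20, 0x100):
        try:
            chars.add(bytes([byte]).decode(codec_name))
        except UnicodeDecodeError:
            continue  # this byte is undefined in the code page
    return chars


def missing_from_font(codec_name, cmap_codepoints):
    """List code page characters that the font's cmap does not map."""
    supported = set(cmap_codepoints)
    return sorted(ch for ch in codepage_chars(codec_name)
                  if ord(ch) not in supported)


# Hypothetical font: everything in cp1252 except the euro and the z-carons.
full = {ord(ch) for ch in codepage_chars("cp1252")}
font_cmap = full - {0x20AC, 0x017D, 0x017E}

missing = missing_from_font("cp1252", font_cmap)
print([f"U+{ord(ch):04X}" for ch in missing])  # ['U+017D', 'U+017E', 'U+20AC']
```

An empty result would correspond to the strict reading of "supports the codepage"; anything else is one of the gray areas described above.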

And, I will confess, there are fonts out there that really stretch their 
claims to supporting cp X. For example, when we were completing our Yi 
font a couple of years ago, we wanted it to work in Word 97 and Word 2000. 
There was a problem in that Word 2000 had a bunch of font-linking things 
going on to try to keep the user from seeing boxes, but the algorithms 
were completely unaware of Yi. I ended up having to set codepage bits for 
Japanese and (I think) Central European (some Latin codepage other than 
cp1252) in order to make Word 2000 actually use the font -- if I didn't, 
then Word would quietly substitute Times New Roman or a Far East font for 
characters that the font really did support, including about half the Yi 
range. The claims of supporting those two codepages were very tenuous: 
there were many of the cp1250 characters not supported by the font, and I 
think there was exactly one character from cp932 that we actually 
supported -- 30FB. I'm sure we're not the only ones who have ever 
stretched things like this.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]








Re: Unicode and Security: Domain Names

2002-02-08 Thread DougEwell2

In a message dated 2002-02-08 8:23:22 Pacific Standard Time, 
[EMAIL PROTECTED] writes:

 Does anyone know anything about RACE encoding and its properties?

 I wrote an article on IDNS in December of 2000 which discusses the
 approaches which were being debated at that time, including RACE. RACE
 is briefly described in that article. You can find it at:

 http://www-106.ibm.com/developerworks/library/u-domains.html

 I tried to find an updated internet draft on RACE, but it looks like
 nothing exists after version 4, which has been archived. I'm guessing
 that draft names which include the text BRACE, TRACE, and GRACE are
 probably RACE variations, however. Check them out at:
 http://www.ietf.org/internet-drafts/ 

An ACE (ASCII-Compatible Encoding) has been chosen for IDN, and it is neither 
RACE nor DUDE.  Its working name was AMC-ACE-Z, and it has since been renamed 
Punycode.  (No, I don't like the name either.)

A search for punycode in the internet-drafts directory that Suzanne 
mentioned will reveal the details you are looking for.

Beware that in addition to Punycode, there is another step in the IDN process 
called nameprep, which is basically an extended form of normalization to 
keep compatibility characters, non-spacing marks, directional overrides, and 
such out of domain names.  Converting an arbitrary string through Punycode 
does not necessarily make it IDN-ready.
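
The difference between raw Punycode and the full IDN conversion can be illustrated with Python's standard codecs. This is only a sketch: Python's `idna` codec implements the later published form of these specifications (including nameprep and the `xn--` prefix), which may differ in detail from the drafts under discussion here, and the sample name is made up.

```python
# Raw Punycode: encodes the label exactly as given, with no preparation step.
print("bücher".encode("punycode"))  # b'bcher-kva'

# Without nameprep, upper- and lower-case spellings give different output.
raw_upper = "BÜCHER".encode("punycode")
raw_lower = "bücher".encode("punycode")
print(raw_upper != raw_lower)  # True

# The idna codec applies nameprep first, so both spellings converge.
ace_upper = "BÜCHER.example".encode("idna")
ace_lower = "bücher.example".encode("idna")
print(ace_upper)               # b'xn--bcher-kva.example'
print(ace_upper == ace_lower)  # True

# The ASCII-compatible form converts back to the prepared name.
print(ace_upper.decode("idna"))  # bücher.example
```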

-Doug Ewell
 Fullerton, California
 (address will soon change to dewell at adelphia dot net)




Re: the Unicode range and code page range bits in the TrueType OS/2 table

2002-02-08 Thread Yung-Fong Tang




OK. Let me ask again, since my original question was not good enough.
Do font vendors set the ulCharRange bits in the OS/2 table? Do applications or the OS depend on ulCharRange, and if so, for what purpose?


Ken Lunde wrote:

Frank,

You wrote:

  Ken:
  Do you know of any Adobe software that depends on that?

  I heard a rumor that those bits are usually left unset and kept as 0.
  But I found that some fonts do have them set when I use ttfdump to look
  at them.

Our OpenType fonts include 'OS/2' tables, and we populate these fields with
meaningful information. To what extent our applications actually make use
of it, I don't know.

Regards...

-- Ken






Re: Unicode and Security

2002-02-08 Thread Philipp Reichmuth

Hi Elliotte and others,

ERH Does anybody really need mixed Latin and Greek domain names?

This is the wrong approach altogether. If we want to be universal, we
can't exclude cases on a heuristic basis of "no one is probably going
to need this".

BTW, people will certainly want mixed Han and Latin characters where
the problem arises with fullwidth forms to some extent, and people
will probably want mixed Cyrillic and Latin domain names as well (one
starts seeing mixed scripts in business names, for instance).
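
For reference, the per-label "no script mixing" rule proposed earlier in the thread could be approximated as in the Python sketch below. The standard library exposes no script property, so the script is guessed from the first word of each character's Unicode name; that heuristic and the function names are assumptions made for illustration, not a real registry rule.

```python
import unicodedata


def scripts_in_label(label):
    """Guess the scripts used in a label from the Unicode character names."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # e.g. 'CYRILLIC SMALL LETTER A' -> 'CYRILLIC'
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_single_script(label):
    """True if the letters of the label appear to come from one script."""
    return len(scripts_in_label(label)) <= 1


print(is_single_script("paypal"))                # True
print(is_single_script("p\u0430ypal"))           # False (Cyrillic 'a' inside)
print(sorted(scripts_in_label("p\u0430ypal")))   # ['CYRILLIC', 'LATIN']
```

Such a check would also reject the legitimate mixed-script names mentioned in this thread, which is exactly the objection being raised.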

  Philipp  mailto:[EMAIL PROTECTED]
___
Hal, open the file / Hal, open the damn file, Hal / open the, please Hal





Re[2]: Unicode and Security

2002-02-08 Thread Philipp Reichmuth

Hello Asmus and others,

I'm not sure Unicode can be fixed at this point. The flaws may be
too deeply embedded. The real solution may involve waiting until
companies and  people start losing significant amounts of money as a
result of the flaws  in Unicode, and then throwing it away and
replacing it with something else.

AF This sounds nice and dramatic, but misses the point that the kinds of 
AF issues you highlighted are absolutely common to *all* character sets 
AF containing Latin and Greek, or Latin and Cyrillic characters, suggesting 
AF that you are simply grandstanding here, instead of trying to find real 
AF solutions to your problem.

Oh, it is very well possible to design a character set that supports
all of Latin, Cyrillic and Greek without being susceptible to this
problem beyond the familiar 1-l-|, 0-O dimension. The main premise is
to encode glyphs instead of characters so that one glyph A is used
in all three of these alphabets. Roundtrip compatibility with legacy
character sets would be a problem, though. It looks like there is the
decision between kludge A (roundtrip compatibility missing) and kludge
B (easier spoofability). However, for URLs etc., roundtrip
compatibility is not really necessary, I think.

AF Earlier, you accused Unicode of being in denial about security
AF issues: It is you who is in denial about some underlying
AF realities, among which is  that there are security issues that
AF cannot be fixed by designing a  'better' character set.

I am sure they can be fixed by designing a better character set that
is better suited to a given problem. A lot of problems can be avoided
by regarding a character set as an application-specific entity to some
extent.

This is not what we want, of course; we want a universal encoding
across all applications. This being our premise, the resulting
problems which you cannot possibly deny will have to be dealt with in
one way or the other. To me, it seems a better idea to fix problems
that arise directly from the way we encode our characters already on
the character set level as far as possible, even if it just means
notifying people that mixing characters from different alphabets may
lead to misinterpretations and to denote common glyph similarities in
the standard, such as the glyph A or for that part the character A
being indiscernible in several alphabets.

  Philipp  mailto:[EMAIL PROTECTED]
___
Seeing my great fault / Through darkening blue windows / I begin again





Re: Unicode and Security

2002-02-08 Thread Barry Caplan


At 15:53 -0500 2002-02-07, Elliotte Rusty Harold wrote:
For text files, probably not. But for the domain name system the world 
very well might. Indeed, maybe it should unless this problem can be dealt 
with. I suspect it can be dealt with by prohibiting script mixing in 
domain names (e.g. each component of the name must be entirely Greek or 
entirely Cyrillic or entirely Latin etc. Note: 
something_Cyrillic.something_greek.com is OK.)  Does anybody really need 
mixed Latin and Greek domain names?



Not only that, why limit the alleged security risks to domain names? Why 
not the part of an email address before the @? The allowed characters for 
that are specified in a different RFC than that for domain names, and have 
nothing to do at all with DNS.

And how many variations of numerals are there in Unicode? After all, every 
place you could use a domain name, you could use the actual IP address too. 
How many ways might that be spoofed?
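
As a small illustration of the numeral question, the Python sketch below lists a handful of the decimal digit "ones" in Unicode (the selection is arbitrary, and there are many more) and shows that a parser which accepts any Unicode decimal digit reads a fullwidth string as an ordinary number.

```python
import unicodedata

# A few of the characters that all carry the decimal value 1.
ones = ["1", "\u0661", "\u06F1", "\u0967", "\uFF11"]
for ch in ones:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {unicodedata.decimal(ch)}")

# Python's int() accepts any Unicode decimal digits, so the fullwidth
# string below parses to the same value as the ASCII string "192".
fullwidth_192 = "\uFF11\uFF19\uFF12"
print(int(fullwidth_192))  # 192
```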

Barry






Re[2]: Unicode and Security

2002-02-08 Thread Asmus Freytag

At 06:18 PM 2/8/02 +0100, Philipp Reichmuth wrote:
Oh, it is very well possible to design a character set that supports
all of Latin, Cyrillic and Greek without being susceptible to this
problem beyond the familiar 1-l-|, 0-O dimension. The main premise is
to encode glyphs instead of characters so that one glyph A is used
in all three of these alphabets. Roundtrip compatibility with legacy
character sets would be a problem, though. It looks like there is the
decision between kludge A (roundtrip compatibility missing) and kludge
B (easier spoofability).

If your statement was phrased differently, i.e. saying that domain name
registration and resolution should not allow a distinction between
A.com and A.com where one uses the Greek and one the Latin A, that
would be a different matter. Such action would close this spoofing
loophole very effectively w/o restricting the registration of
meaningful names. However, there may be subtle issues with such an
approach. But the important thing is that it does not fiddle with the
character set as such.
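
A rough Python sketch of that idea follows: names are stored unchanged, but registration and lookup compare a folded key in which look-alike letters collapse to one representative. The five-entry table, the `Registry` class and the sample names are all hypothetical; a real registry would need a far larger and carefully reviewed mapping.

```python
# Hypothetical, hand-picked look-alike table for illustration only.
LOOKALIKES = {
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u03BF": "o",  # GREEK SMALL LETTER OMICRON
    "\u043E": "o",  # CYRILLIC SMALL LETTER O
}


def registration_key(name):
    """Fold look-alike letters so visually identical names compare equal."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in name)


class Registry:
    """Toy registry that refuses names colliding with an existing one."""

    def __init__(self):
        self._owners = {}  # folded key -> owner

    def register(self, name, owner):
        key = registration_key(name)
        if key in self._owners:
            return False  # a look-alike name is already taken
        self._owners[key] = owner
        return True

    def resolve(self, name):
        return self._owners.get(registration_key(name))


registry = Registry()
print(registry.register("A.com", "alice"))         # True  (Latin A)
print(registry.register("\u0391.com", "mallory"))  # False (Greek Alpha)
print(registry.resolve("\u0410.com"))              # alice (Cyrillic A)
```

The character encoding itself is untouched; only the comparison used by the registry changes.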

However, for URLs etc., roundtrip
compatibility is not really necessary, I think.

I beg to differ. Roundtrip convertibility is very important since URLs live
in documents encoded in Unicode, ISO/IEC 8859-7, even Shift-JIS etc. that
are all not 'glyph' encodings. Whatever specialized 'character set' gets used
transiently in resolving the domain name is one issue, but it better be
easily possible to convert between it and the form URLs are actually stored
in hypertext.

I am sure they can be fixed by designing a better character set that
is better suited to a given problem. A lot of problems can be avoided
by regarding a character set as an application-specific entity to some
extent.

This is not what we want, of course; we want a universal encoding
across all applications. This being our premise, the resulting
problems which you cannot possibly deny will have to be dealt with in
one way or the other.

Nobody argues that spoofing and other security issues shouldn't get
addressed.

To me, it seems a better idea to fix problems
that arise directly from the way we encode our characters already on
the character set level as far as possible, even if it just means
notifying people that mixing characters from different alphabets may
lead to misinterpretations and to denote common glyph similarities in
the standard, such as the glyph A or for that part the character A
being indiscernible in several alphabets.

And we are certainly doing that. But, while A is an important character,
there are nearly 70,000 Han characters out there, some with distinctions
so subtle that many fonts will not show them and many users will not
recognize them. This has not featured in this discussion so far, nicely
showing how our perception of issues is colored by our personal experience
with scripts and languages. For Han characters even my simple suggestion
above is probably not practical.

A./




RE: Unicode and Security: Domain Names

2002-02-08 Thread Yves Arrouye

Moreover, the IDN WG documents are in final call, so if you have comments to
make on them, now is the time. Visit http://www.i-d-n.net/ and sub-scribe
(with a hyphen here so that listar does not interpret my post as a command!)
to their mailing list (and read their archives) before doing so.

The documents in last call are:

1. Internationalizing Domain Names in Applications (IDNA)
http://www.ietf.org/internet-drafts/draft-ietf-idn-idna-06.txt

2. Stringprep Profile for Internationalized Host Names
http://www.ietf.org/internet-drafts/draft-ietf-idn-nameprep-07.txt

3. Punycode version 0.3.3
http://www.ietf.org/internet-drafts/draft-ietf-idn-punycode-00.txt

4. Preparation of Internationalized Strings (stringprep)
http://www.ietf.org/internet-drafts/draft-hoffman-stringprep-00.txt

and the last call will end on Feb 11th 2002, 23h59m GMT-5. There is little
time left.

YA





RE: Unicode and Security: Domain Names

2002-02-08 Thread Barry Caplan

I want to review these documents, but since time is short, maybe someone 
can answer my question...

Are the actual domain names as stored in the DB going to be canonical 
normalized Unicode strings? It seems this would go a long way towards 
preventing spoofing ... no one would be allowed to register a non-canonical 
normalized domain name. Then, a resolver would be required to normalize any 
request string before the actual resolve.

So my questions are:

1 - Am I way off base here? If so, why?
2 - If not, is it already addressed in these docs?
3 - If it is not in the docs, and the request makes sense, then I will make 
the effort to beat the deadline, which is next Monday.


Thanks!

Barry

At 10:37 AM 2/8/2002 -0800, Yves Arrouye wrote:
Moreover, the IDN WG documents are in final call, so if you have comments to
make on them, now is the time. Visit http://www.i-d-n.net/ and sub-scribe
(with a hyphen here so that listar does not interpret my post as a command!)
to their mailing list (and read their archives) before doing so.

The documents in last call are:

1. Internationalizing Domain Names in Applications (IDNA)
http://www.ietf.org/internet-drafts/draft-ietf-idn-idna-06.txt

2. Stringprep Profile for Internationalized Host Names
http://www.ietf.org/internet-drafts/draft-ietf-idn-nameprep-07.txt

3. Punycode version 0.3.3
http://www.ietf.org/internet-drafts/draft-ietf-idn-punycode-00.txt

4. Preparation of Internationalized Strings (stringprep)
http://www.ietf.org/internet-drafts/draft-hoffman-stringprep-00.txt

and the last call will end on Feb 11th 2002, 23h59m GMT-5. There is little
time left.

YA





21st Unicode Conference, May 2002, Dublin, Ireland

2002-02-08 Thread Misha . Wolf

 First European IUC in two years! 

 Twenty-first International Unicode Conference (IUC21)
Unicode, Localization and the Web: The Global Connection
http://www.unicode.org/iuc/iuc21
May 14-17, 2002
Dublin, Ireland

 Just 13 weeks to go! 

The Unicode Standard has become the foundation for all modern text
processing.  It is used on large machines, tiny portable devices, and
for distributed processing across the Internet.  The standard brings
cost-reducing efficiency to international applications and enables the
exchange of text in an ever increasing list of natural languages.

New technologies and innovative Internet applications, as well as the
evolving Unicode Standard, bring new challenges along with their new
capabilities.  The Twenty-first International Unicode Conference (IUC21)
will explore the opportunities created by the latest advances and how to
leverage them, as well as potential pitfalls to be aware of, and problem
areas that need further research.

Conference attendees will include managers, software engineers, systems
analysts, font designers, graphic designers, content developers,
technical writers, and product marketing personnel, involved in the
development, deployment or use of Unicode software or content, and the
globalization of software and the Internet.

CONFERENCE WEB SITE, PROGRAM and REGISTRATION

   The Conference Program and Registration form will be available soon
   at the Conference Web site:
  http://www.unicode.org/iuc/iuc21

CONFERENCE SPONSORS

   Agfa Monotype Corporation
   Basis Technology Corporation
   Localisation Research Centre
   Microsoft Corporation
   Reuters Ltd.
   Sun Microsystems, Inc.
   World Wide Web Consortium (W3C)

GLOBAL COMPUTING SHOWCASE

   Visit the Showcase to find out more about products supporting the
   Unicode Standard, and products and services that can help you
   globalize/localize your software, documentation and Internet content.
   For details, visit the Conference Web site.

CONFERENCE VENUE

The Conference will take place at:

   The Burlington Hotel
   Upper Leeson Street
   Dublin 4, Ireland

   Tel: (+353 1) 660 5222
   Fax: (+353 1) 660 8496

CONFERENCE MANAGEMENT

   Global Meeting Services Inc.
   8949 Lombard Place, #416
   San Diego, CA 92122, USA

   Tel: +1 858 638 0206 (voice)
        +1 858 638 0504 (fax)

   Email: [EMAIL PROTECTED]
      or: [EMAIL PROTECTED]

THE UNICODE CONSORTIUM

The Unicode Consortium was founded as a non-profit organization in 1991.
It is dedicated to the development, maintenance and promotion of The
Unicode Standard, a worldwide character encoding.  The Unicode Standard
encodes the characters of the world's principal scripts and languages,
and is code-for-code identical to the international standard ISO/IEC
10646.  In addition to cooperating with ISO on the future development of
ISO/IEC 10646, the Consortium is responsible for providing character
properties and algorithms for use in implementations.  Today the
membership base of the Unicode Consortium includes major computer
corporations, software producers, database vendors, research
institutions, international agencies and various user groups.

For further information on the Unicode Standard, visit the Unicode Web
site at http://www.unicode.org or e-mail [EMAIL PROTECTED]

   *  *  *  *  *

Unicode(r) and the Unicode logo are registered trademarks of Unicode,
Inc.  Used with permission.













RE: Unicode and Security: Domain Names

2002-02-08 Thread Yves Arrouye

 Are the actual domain names as stored in the DB going to be canonical
 normalized Unicode strings? It seems this would go a long way towards
 preventing spoofing ... 

Names will be stored according to a normalization called Nameprep. Read the
Stringprep (general framework) and Nameprep (IDN application, or Stringprep
profile) for details. This normalization includes a step of normalizing
using NFKC, but it does more than that.
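
The NFKC step on its own can be tried with Python's `unicodedata` module. The sketch below covers only that step, not the rest of Nameprep (case mapping, prohibited characters and so on), and the sample strings are made up.

```python
import unicodedata

# Fullwidth Latin letters fold to their ordinary ASCII counterparts.
fullwidth = "\uFF45\uFF58\uFF41\uFF4D\uFF50\uFF4C\uFF45"  # fullwidth "example"
print(unicodedata.normalize("NFKC", fullwidth))  # example

# A base letter plus combining acute composes to a single code point.
decomposed = "e\u0301"
print(unicodedata.normalize("NFKC", decomposed) == "\u00E9")  # True

# NFKC does not unify look-alikes across scripts: Greek capital Alpha
# stays distinct from Latin capital A.
greek_alpha = "\u0391"
print(unicodedata.normalize("NFKC", greek_alpha) == "A")  # False
```

So normalization of this kind handles fullwidth and composed forms, but by itself does not address cross-script look-alikes such as Latin, Greek and Cyrillic A.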

 no one would be allowed to register a non-canonical
 normalized domain name. Then, a resolver would be required to normalize
 any request string before the actual resolve.

To keep the resolvers' load the same as today, client applications will do
the normalization of their requests. If they don't normalize properly, the
lookup will just fail. Read the IDNA document for more info on this.

All normalized strings are encoded in a so-called ASCII Compatible Encoding
which uses the restricted set of characters used in the DNS today (letters,
digits, hyphen except at the extremities) for host names (which are
different than STD13 names, cf. SRV RRs for example). Read IDNA, again, and
Punycode, the chosen encoding.

YA





RE: Unicode and Security: Domain Names

2002-02-08 Thread Nelson H. F. Beebe

The recent discussion on this list about Internet domain name
spoofing through substitution of Unicode characters that have similar,
or identical, glyphs concerns an issue that has recently appeared in print
in a prominent journal:

@String{j-CACM  = "Communications of the ACM"}

@Article{Gabrilovich:2002:IRH,
  author =       "Evgeniy Gabrilovich and Alex Gontmakher",
  title =        "Inside risks: The homograph attack",
  journal =      j-CACM,
  volume =       "45",
  number =       "2",
  pages =        "128--128",
  month =        feb,
  year =         "2002",
  CODEN =        "CACMA2",
  ISSN =         "0001-0782",
  bibdate =      "Wed Jan 30 17:45:01 MST 2002",
  bibsource =    "http://www.acm.org/pubs/contents/journals/cacm/",
  acknowledgement = ack-nhfb,
}

Bruce Schneier also discussed this in the 15-Mar-2001, 15-Jul-2001,
15-Sep-2001, and 15-Nov-2001 issues of the CRYPTO-GRAM newsletter
(available at

http://www.counterpane.com/crypto-gram.html

) and gave these links for more info:

http://www.theregister.co.uk/content/55/21573.html
http://www.securityfocus.com/bid/3461
http://www.counterpane.com/crypto-gram-0007.html#9
http://www.securityfocus.com/focus/ids/articles/utf8.html

---
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- Center for Scientific Computing       FAX: +1 801 585 1640, +1 801 581 4148 -
- University of Utah                    Internet e-mail: [EMAIL PROTECTED]    -
- Department of Mathematics, 322 INSCC  [EMAIL PROTECTED]  [EMAIL PROTECTED]  -
- 155 S 1400 E RM 233                   [EMAIL PROTECTED]                     -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe  -
---




Re: Re[2]: Unicode and Security

2002-02-08 Thread Mark Davis

Asmus is absolutely right about Latin, Greek and Cyrillic. And the
response that Unicode should be encoding glyphs instead of characters
is, at the least, misguided. No character encodings have ever been
predicated on that. For an example of how many glyphs are available
just for the letter A, look at:

http://www.macchiato.com/utc/glyph_variation.html

There have been attempts to develop glyph standards (AFII was one).
All have foundered.

Mark
—

Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο 
πάντα — Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com









Arabic indexes

2002-02-08 Thread John Cowan

I have here a book with separate English, Hebrew and Arabic indexes.  In
the English index, the indexed words appear (as is conventional)
with a page number after them (that is, to their right).  In
the Hebrew index, the words likewise appear with a page number
after them (that is, to their left).

In the Arabic index, however, the indexed words appear with a page
number before them (that is, to their right).  Is this regular
practice in Arabic indexing, or some bizarre bidi glitch?

-- 
John Cowan   http://www.ccil.org/~cowan  [EMAIL PROTECTED]
To say that Bilbo's breath was taken away is no description at all.  There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
--_The Hobbit_