Re: Unicode and Security

2002-02-10 Thread Mark Davis

I have long advocated more intelligent GUIs to help users spot
spoofed names. I think the technique could also help with the
Traditional vs. Simplified Chinese issue, by helping people type in
one or the other without mixing them. I coded up a quick (very rough,
I warn you) demo of what I mean. Try:

http://www.macchiato.com/utc/despoofing

Mark
—

Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο 
πάντα — Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com








Re: Unicode and Security

2002-02-10 Thread John Cowan

Elliotte Rusty Harold scripsit:

 Another possibility is a super-normalization that does combine 
 similar looking Unicode characters; e.g. in the domain name system we 
 might decide that microsoft.com with Latin o's or Cyrillic o's or 
 Greek o's is to resolve to the same address. 

In that case comment NOW, TODAY or TOMORROW, to the IETF IDN lists
so that they can extend the nameprep process to do such things.
(They will be resistant at this stage, no doubt, but it's worth
a try.)  The Unicode list can't help you.
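
To make the "super-normalization" idea concrete, here is a minimal sketch in Python. The table `CONFUSABLE_TO_LATIN` and the function `fold_confusables` are hypothetical names invented for this illustration, and the five entries are only a sample; nothing like this table appears in the nameprep drafts, and a real one would need the script-by-script analysis described in the quoted proposal.

```python
# Hypothetical, deliberately tiny confusables table: a few Greek and
# Cyrillic look-alikes mapped to a Latin representative.
CONFUSABLE_TO_LATIN = {
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u03b1": "a",  # GREEK SMALL LETTER ALPHA
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
}


def fold_confusables(name: str) -> str:
    """Lower-case a name and replace each listed look-alike with its Latin stand-in."""
    return "".join(CONFUSABLE_TO_LATIN.get(ch, ch) for ch in name.lower())


latin = "microsoft.com"
greek = "micr\u03bfs\u03bfft.com"     # both o's are Greek omicrons
cyrillic = "micr\u043es\u043eft.com"  # both o's are Cyrillic o's

# All three spellings fold to the same key, so they would resolve identically.
print(fold_confusables(latin) == fold_confusables(greek) == fold_confusables(cyrillic))
```

The folded string would serve only as a lookup key; the hard part, as the rest of the thread argues, is deciding what belongs in the table.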

-- 
John Cowan   http://www.ccil.org/~cowan  [EMAIL PROTECTED]
To say that Bilbo's breath was taken away is no description at all.  There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
--_The Hobbit_




RE: Unicode and Security

2002-02-10 Thread Carl W. Brown

Doug,

I agree.

I used to do security consulting and found that the biggest problem was that
people tried to come up with solutions for the wrong problem.

We can go back to the typewriter days when there was no difference between
1 & l or 0 & O.  Do you blame ASCII if you type ST0P instead of STOP?

Reexamine the problem and potential solutions.

Carl



 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
 Behalf Of [EMAIL PROTECTED]
 Sent: Sunday, February 10, 2002 5:24 PM
 To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: Re: Unicode and Security


 In a message dated 2002-02-10 13:00:19 Pacific Standard Time,
 [EMAIL PROTECTED] writes:

  However, I do continue to maintain that character confusion is a real
  security risk that will have real impact on users, and that needs to
  be considered in any system that uses Unicode.

 We have already established that similar-looking characters can cause
 confusion in Unicode-based systems.  However, we have also established
 that ISO 8859-1, 8859-5 (Cyrillic), 8859-7 (Greek), and even ASCII can
 suffer from this same problem.  It is unrealistic to suggest that the
 problem began with Unicode.

  In some domains the
  problem might be severe enough to eliminate Unicode from
  consideration in favor of less extensive character sets like Latin-1.
  That would be a shame, but until the Unicode consortium addresses at
  a root level the real security implications of their work, security
  conscious developers will look elsewhere. (I notice the Unicode 3.0
  book does not even have the word security in its index.) Many more
  developers who are at best tangentially conscious of security issues
  will go ahead and develop insecure systems because they don't realize
  the security implications of adopting Unicode.

 Companies and individuals that choose to throw out the baby with the bath
 water will achieve the kind of results that that approach usually delivers.

 Companies and individuals that wish to establish their own definitions of,
 and policies for dealing with, confusable characters are free to do so.  As
 I stated earlier, and nobody could refute, there is no consistent way to
 determine which sets of characters are confusable with each other, other
 than in the most obvious cases like o/omicron.  So of course neither the
 Unicode Consortium nor WG2 has taken it upon themselves to draw up such a
 list.  This must be a local decision.

  Another possibility is a super-normalization that does combine
  similar looking Unicode characters; e.g. in the domain name system we
  might decide that microsoft.com with Latin o's or Cyrillic o's or
  Greek o's is to resolve to the same address. No separate registration
  would be necessary or possible. This would require detailed analysis
  of the tens of thousands of Unicode characters allowed in domain
  names by fluent speakers of various languages; not easy, not cheap,
  but perhaps necessary. Besides the security improvements, this
  proposal would also improve the system's usability. Aren't sure
  whether that URL on the bus used an o or an omicron? Doesn't matter,
  type either one.

 Adding this sort of unification to the nameprep stage might have been
 possible about a year or so ago.  It's probably too late now.

  Actually, people have been talking about the security problems with
  HTML for years. Search engines have gone to some effort to eliminate
  spamdexers that use these techniques. The log in HTML's eye does not,
  however, negate the existence of the log in Unicode's eye.

 Again (and again), the problem is not unique to Unicode.  Existing
 character sets also contain confusables.  Blaming Unicode for
 exacerbating the problem by offering so many characters is like blaming
 your local ice cream shop for offering 31 flavors, because that makes it
 so much more difficult to choose.

 -Doug Ewell
  Fullerton, California
  (address will soon change to dewell at adelphia dot net)






Re: Unicode and Security

2002-02-09 Thread Lars Marius Garshol


* Elliotte Rusty Harold
|
| Let's say I register microsoft.com, only the fifth letter isn't a
| lower-case Latin o. It's actually a lower case Greek omicron.

I'll grant you that this is possible, perhaps even likely, and that it
may cause problems, but I'm far from convinced that this in any way
supports the "there are security problems in Unicode" thesis.

There are many characters which look alike, and yet are different,
which can cause problems of this kind. There are for example already
viruses which exploit the visual similarity between 1 and l in the
Windows system font to keep themselves from being discovered in file
listings.

So if this really is considered a problem it would seem to me that you
would need to deal with the problem of [EMAIL PROTECTED],
[EMAIL PROTECTED], and [EMAIL PROTECTED] looking very similar to
[EMAIL PROTECTED] in lots of fonts. To exploit this, all you need to
know is what email client someone uses, and usually every email they
write will have that information in its headers.

It seems to me that this problem really needs some other fix than the
merging of all similar-looking characters in all character sets. I
just can't see that working. 

Similarly, the security problems caused by using Unicode encoding
tricks to hide or mangle text in, say, contracts, is no different from
using HTML or CSS (or whatever) tricks to achieve the same effect, and
yet nobody is talking about security problems with HTML or CSS. See
[1] for one way of dealing with it that is now being worked on.

So while I accept that there is a problem it does not seem to me that
Unicode is the problem. To me the problem seems to be the complexity
of the relationship between the bytes sent to the user and what the
user actually sees and reacts to. That complexity is not going to
disappear, and aspects of the same problem exist with just about any
information representation, so clearly the solution must be something
other than changing all of these syntaxes/formats/encodings.

In the specific case you cite, for example, a better solution might be
for the user's email client to keep track of all the user's contacts
and for it to indicate in some clearly visible way whether the current
email comes from one of them or not. Whether it uses string matching
of email addresses or digital signatures to do that doesn't really
matter; it solves the problem in your example either way.
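
As a rough illustration of the string-matching variant, the sketch below flags any sender that is not already in the user's address book. The addresses, the `KNOWN_CONTACTS` set, and the `sender_label` function are all hypothetical; a real client would more likely rely on digital signatures, as noted above.

```python
# Hypothetical address book; a real client would load this from its contact store.
KNOWN_CONTACTS = {"alice@example.org", "bob@example.net"}


def sender_label(address: str) -> str:
    """Return a display label saying whether the sender is an existing contact."""
    # Assumption: comparing lower-cased addresses is good enough for a sketch.
    if address.strip().lower() in KNOWN_CONTACTS:
        return "known contact"
    return "UNKNOWN SENDER"


print(sender_label("bob@example.net"))  # the genuine address
print(sender_label("bob@examp1e.net"))  # digit one in place of the letter l
```

The comparison works on code points rather than glyphs, so a look-alike address fails the match however convincing it looks in a given font.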

[1] URL: http://www.w3.org/TR/xmldsig-core/#sec-Seen 

-- 
Lars Marius Garshol, Ontopian         URL: http://www.ontopia.net
ISO SC34/WG3, OASIS GeoLang TC        URL: http://www.garshol.priv.no





Re: Unicode and Security

2002-02-09 Thread DougEwell2

In a message dated 2002-02-09 13:00:59 Pacific Standard Time, 
[EMAIL PROTECTED] writes:

 It seems to me that this problem really needs some other fix than the
 merging of all similar-looking characters in all character sets. I
 just can't see that working. 

Even the merging part wouldn't work.  Let's say that I, like Ken Sakamura 
or Bernard Miller before me, have decided that I know much more about 
character encoding than the Unicode Consortium or WG2, and I am going to 
develop my own character encoding that will solve the problem of confusables 
once and for all.

OK, we start with the easy ones.  Latin A, Greek Alpha, and Cyrillic A all 
get unified.  Latin E, Greek Epsilon, Cyrillic E, unified.  Hey, this is 
easier than I thought.  Latin B, Greek Beta, Cyrillic Ve.  Ha!  I'm smart 
enough to know that Ve gets unified with B and Beta, even though it 
represents a different sound.  Just like Han unification!  Boy, those Unicode 
dolts really missed something there.

Let's keep going.  Latin Y, Greek Upsilon, Cyrillic U.  Wait a minute, that 
Cyrillic U doesn't look *quite* the same.  Oh well, it's close enough, right? 
Let's try some lower-case letters.  Latin a, Greek alpha, Cyrillic a.  That 
Greek alpha looks kinda cursive, doesn't it?  Should we unify it or not?  
Hmmm...

How about Latin n and Greek eta?  Is that descender on the eta significant or 
not?  Hey, you could stick an eta in the middle of a Web address and really 
fool somebody.  Better unify.  How about Latin v and Greek nu?  Different 
glyphs or not?  In 9-point MS Sans Serif, they're pretty close, aren't they?  
(And don't forget Armenian vo!)  Same goes for Latin y and Greek gamma.

Well, you get the point.  The world of alphabetic confusables is just not 
that simple or that 1-to-1.  There are more edge cases, in fact, than obvious 
cases such as the a/alpha or o/omicron that we keep hearing about.  And if I 
were trying to design this hypothetical Uniglyph encoding to get rid of 
those pesky confusables, and still provide support for alphabetic scripts 
besides Latin, I would eventually have to face the fact that it *can't be 
done*.  Oh, sure, it can be done for a/alpha and o/omicron, so I can make a 
sales presentation or a picket sign.  But a complete technical solution, uh, 
no.

-Doug Ewell
 Fullerton, California
 (address will soon change to dewell at adelphia dot net)




Re: Unicode and Security

2002-02-09 Thread John Cowan

[EMAIL PROTECTED] scripsit:

 Let's keep going.  Latin Y, Greek Upsilon, Cyrillic U.  Wait a minute, that 
 Cyrillic U doesn't look *quite* the same.  Oh well, it's close enough, right? 

And then there's the Cyrillic U with the straight descender, which
actually does look just like its Latin and Greek counterparts.
I guess we just can't afford to have two kinds of Cyrillic U around:
off with their heads (or tails)!

Unfortunately, there goes all those Turkic languages written in Cyrillic.
Well, they should Romanize anyway.  In fact all languages should Romanize:
it simplifies everything so much, and if we get rid of diacritics
while we're at it, well, the ASCII Consortium
(off-net, but cached in part at 
http://www.google.com/search?q=cache:IRueJQ1bA-4C:www.wholehog.fsnet.co.uk/robert/ascii/+ASCII+Consortium&hl=en)
will find it a dream come true.  And there was much rejoicing.

-- 
John Cowan   http://www.ccil.org/~cowan  [EMAIL PROTECTED]
To say that Bilbo's breath was taken away is no description at all.  There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
--_The Hobbit_




Re: Unicode and Security

2002-02-08 Thread Otto Stolz

Elliotte Rusty Harold wrote:

 The problem is that all of these or any other client-based solution you 
 come up with is only going to be implemented in some clients. Many, and 
 at least initially most, users are not going to have any such 
 protections. This needs to be cut off at the protocol level.


Rather, the problem is that replacing just one of the many existing
character encodings with an allegedly secure one would only
serve some (rather few!) users. Finding a solution that works with
all character encodings alike is much more efficient (and is probably
feasible, in contrast to the solution advocated by ERH). One possible
solution for the e-mail spoofing problem is cryptographic authentication.
This is independent of the underlying character encoding, and it is
already widely available.

I said 'allegedly secure', because no character encoding standard can
really prevent this sort of spoofing (we had enough examples in this
thread, based on bare ASCII). Trying to find a spoofing-proof character
encoding is comparable to the task of finding an alphabet that does not
allow any insults to be spelled.

Best wishes,
Otto Stolz





Re: Unicode and Security

2002-02-08 Thread Michael Everson

At 17:42 -0500 2002-02-07, John Cowan wrote:

The only widely-deployed alternative approach I know of is
ETSI GSM 03.38 (used in mobile telephony),

A truly bizarre character set:  it supports English, French,
mainland Scandinavian languages, Italian, Spanish with Graves, and
GREEK SHOUTING.

On my Nokia I am forced to write SMS messages in Irish with graves 
for ÀàÌìÒòÙù but I am awarded Éé. The Nokia does have ÁáÍíÓóÚú 
available for spelling names in the phone book, but the accents are 
stripped off if they are sent in a text message. :-(
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Unicode and Security

2002-02-08 Thread Michael Everson

At 15:53 -0500 2002-02-07, Elliotte Rusty Harold wrote:
> For text files, probably not. But for the domain name system the 
> world very well might. Indeed, maybe it should unless this problem 
> can be dealt with. I suspect it can be dealt with by prohibiting 
> script mixing in domain names (e.g. each component of the name must 
> be entirely Greek or entirely Cyrillic or entirely Latin etc. Note: 
> something_Cyrillic.something_greek.com is OK.)  Does anybody really 
> need mixed Latin and Greek domain names?

Certainly. Some years ago the European Court upheld the right of a 
Belgian man whose father was Belgian and mother was Greek to spell 
his hyphenated last name in both scripts. Why should he not be 
allowed to register a domain based on his own name?

I don't think this has anything to do with Unicode. In Unicode, we 
wish to make all the world's writing systems available to everyone. 
Thieves and cheats will use it if they wish, but this detracts not 
one whit from the nobility of our enterprise.
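
For readers who want to see what the proposed "no script mixing per label" rule would look like in practice, here is a minimal sketch. Python's standard library has no script property, so the sketch guesses the script from the first word of each character's Unicode name; that heuristic, and the function names `scripts_in_label`, `is_single_script`, and `domain_is_unmixed`, are assumptions made for this illustration only.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Guess the scripts used by the letters of a label from their Unicode names."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # First word of the name, e.g. "LATIN", "GREEK", "CYRILLIC".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_single_script(label: str) -> bool:
    """True when all letters of the label appear to come from one script."""
    return len(scripts_in_label(label)) <= 1


def domain_is_unmixed(domain: str) -> bool:
    """Apply the rule per label, so different labels may use different scripts."""
    return all(is_single_script(label) for label in domain.split("."))


print(is_single_script("microsoft"))        # all Latin
print(is_single_script("micr\u03bfsoft"))   # Latin with one Greek omicron
print(domain_is_unmixed("\u043f\u0440\u0438\u043c\u0435\u0440.\u03b4\u03bf\u03ba\u03b9\u03bc\u03ae.com"))
```

A check like this would also reject the legitimately mixed name in the Belgian example above, which is exactly the objection being raised.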
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




RE: Unicode and Security: Domain Names

2002-02-08 Thread Suzanne M. Topping



 -Original Message-
 From: Tom Gewecke [mailto:[EMAIL PROTECTED]]
 Sent: Thursday, February 07, 2002 6:20 PM
 To: [EMAIL PROTECTED]
 Subject: Re: Unicode and Security: Domain Names
 
 
 I note that companies like Verisign already claim to offer domain names
 in dozens of languages and scripts.  Apparently these are converted by
 something called RACE encoding to ASCII for actual use on the internet.
 
 Does anyone know anything about RACE encoding and its properties?

I wrote an article on IDNS in December of 2000 which discusses the
approaches which were being debated at that time, including RACE. RACE
is briefly described in that article. You can find it at:

http://www-106.ibm.com/developerworks/library/u-domains.html

I tried to find an updated internet draft on RACE, but it looks like
nothing exists after version 4, which has been archived. I'm guessing
that draft names which include the text BRACE, TRACE, and GRACE are
probably RACE variations, however. Check them out at:
http://www.ietf.org/internet-drafts/ 

Suzanne Topping
BizWonk Inc.
[EMAIL PROTECTED]




Re: Unicode and Security: Domain Names

2002-02-08 Thread DougEwell2

In a message dated 2002-02-08 8:23:22 Pacific Standard Time, 
[EMAIL PROTECTED] writes:

 Does anyone know anything about RACE encoding and its properties?

 I wrote an article on IDNS in December of 2000 which discusses the
 approaches which were being debated at that time, including RACE. RACE
 is briefly described in that article. You can find it at:

 http://www-106.ibm.com/developerworks/library/u-domains.html

 I tried to find an updated internet draft on RACE, but it looks like
 nothing exists after version 4, which has been archived. I'm guessing
 that draft names which include the text BRACE, TRACE, and GRACE are
 probably RACE variations, however. Check them out at:
 http://www.ietf.org/internet-drafts/ 

An ACE (ASCII-Compatible Encoding) has been chosen for IDN, and it is neither 
RACE nor DUDE.  Its working name was AMC-ACE-Z, and it has since been renamed 
Punycode.  (No, I don't like the name either.)

A search for punycode in the internet-drafts directory that Suzanne 
mentioned will reveal the details you are looking for.

Beware that in addition to Punycode, there is another step in the IDN process 
called nameprep, which is basically an extended form of normalization to 
keep compatibility characters, non-spacing marks, directional overrides, and 
such out of domain names.  Converting an arbitrary string through Punycode 
does not necessarily make it IDN-ready.
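
The difference between raw Punycode and the full IDN preparation can be seen with Python's built-in codecs. This is only a sketch: the `punycode` and `idna` codecs implement the RFCs that were later published from this work, so treating them as equivalent to the drafts discussed here is an assumption, and the sample label is invented.

```python
raw = "B\u00fccher"  # "Bücher", with a capital B

# Punycode alone: ASCII characters are copied through unchanged, case
# included, and only the non-ASCII character is encoded after the hyphen.
punycode_only = raw.encode("punycode")

# The "idna" codec runs nameprep first (which case-folds the label) and
# then applies Punycode with the "xn--" ACE prefix.
idn_ready = raw.encode("idna")

print(punycode_only)
print(idn_ready)
```

The first result keeps the capital B and lacks the ACE prefix, so it is not a usable IDN label; the second has been through both steps.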

-Doug Ewell
 Fullerton, California
 (address will soon change to dewell at adelphia dot net)




Re: Unicode and Security

2002-02-08 Thread Philipp Reichmuth

Hi Elliotte and others,

ERH Does anybody really need mixed Latin and Greek domain names?

This is the wrong approach altogether. If we want to be universal, we
can't exclude cases on the heuristic basis of "no one is probably going
to need this."

BTW, people will certainly want mixed Han and Latin characters, where
the problem arises with fullwidth forms to some extent, and people
will probably want mixed Cyrillic and Latin domain names as well (one
starts seeing mixed scripts in business names, for instance).

  Philipp                      mailto:[EMAIL PROTECTED]
___
Hal, open the file / Hal, open the damn file, Hal / open the, please Hal





Re[2]: Unicode and Security

2002-02-08 Thread Philipp Reichmuth

Hello Asmus and others,

I'm not sure Unicode can be fixed at this point. The flaws may be
too deeply embedded. The real solution may involve waiting until
companies and  people start losing significant amounts of money as a
result of the flaws  in Unicode, and then throwing it away and
replacing it with something else.

AF This sounds nice and dramatic, but misses the point that the kinds of 
AF issues you highlighted are absolutely common to *all* character sets 
AF containing Latin and Greek, or Latin and Cyrillic characters, suggesting 
AF that you are simply grandstanding here, instead of trying to find real 
AF solutions to your problem.

Oh, it is very well possible to design a character set that supports
all of Latin, Cyrillic and Greek without being susceptible to this
problem beyond the familiar 1-l-|, 0-O dimension. The main premise is
to encode glyphs instead of characters so that one glyph A is used
in all three of these alphabets. Roundtrip compatibility with legacy
character sets would be a problem, though. It looks like there is the
decision between kludge A (roundtrip compatibility missing) and kludge
B (easier spoofability). However, for URLs etc., roundtrip
compatibility is not really necessary, I think.

AF Earlier, you accused Unicode of being in denial about security
AF issues: It is you who is in denial about some underlying
AF realities, among which is  that there are security issues that
AF cannot be fixed by designing a  'better' character set.

I am sure they can be fixed by designing a better character set that
is better suited to a given problem. A lot of problems can be avoided
by regarding a character set as an application-specific entity to some
extent.

This is not what we want, of course; we want a universal encoding
across all applications. This being our premise, the resulting
problems which you cannot possibly deny will have to be dealt with in
one way or the other. To me, it seems a better idea to fix problems
that arise directly from the way we encode our characters already on
the character set level as far as possible, even if it just means
notifying people that mixing characters from different alphabets may
lead to misinterpretations and to denote common glyph similarities in
the standard, such as the glyph A or for that part the character A
being indiscernible in several alphabets.

  Philipp                      mailto:[EMAIL PROTECTED]
___
Seeing my great fault / Through darkening blue windows / I begin again





Re: Unicode and Security

2002-02-08 Thread Barry Caplan


At 15:53 -0500 2002-02-07, Elliotte Rusty Harold wrote:
> For text files, probably not. But for the domain name system the world 
> very well might. Indeed, maybe it should unless this problem can be dealt 
> with. I suspect it can be dealt with by prohibiting script mixing in 
> domain names (e.g. each component of the name must be entirely Greek or 
> entirely Cyrillic or entirely Latin etc. Note: 
> something_Cyrillic.something_greek.com is OK.)  Does anybody really need 
> mixed Latin and Greek domain names?



Not only that, why limit the alleged security risks to domain names? Why 
not the part of an email address before the @? The allowed characters for 
that part are specified in a different RFC from the one for domain names, 
and have nothing at all to do with DNS.

And how many variations of numerals are there in Unicode? After all, every 
place you could use a domain name, you could use the actual IP address too. 
How many ways might that be spoofed?
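
The numerals question is easy to probe with Python's `unicodedata` module. The sketch below is illustrative only: the sample strings are invented, and the count it prints depends on the Unicode version bundled with the interpreter.

```python
import sys
import unicodedata

# Count the characters in general category Nd (decimal digit) known to
# this Python build.
decimal_digits = sum(
    1 for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Nd"
)
print("decimal digit characters:", decimal_digits)

# A dotted quad written with fullwidth digits. NFKC maps these to ASCII.
fullwidth_ip = "\uff11\uff12\uff17.\uff10.\uff10.\uff11"
print(unicodedata.normalize("NFKC", fullwidth_ip))

# Arabic-Indic digits have no compatibility mapping, so NFKC leaves them
# alone, yet they still carry the numeric values 1, 2, 7.
arabic_indic = "\u0661\u0662\u0667"
print(unicodedata.normalize("NFKC", arabic_indic) == arabic_indic)
print(int(arabic_indic))
```

So normalization catches some digit variants but not all of them; anything that parses addresses from text has to decide explicitly which digits it accepts.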

Barry






Re[2]: Unicode and Security

2002-02-08 Thread Asmus Freytag

At 06:18 PM 2/8/02 +0100, Philipp Reichmuth wrote:
> Oh, it is very well possible to design a character set that supports
> all of Latin, Cyrillic and Greek without being susceptible to this
> problem beyond the familiar 1-l-|, 0-O dimension. The main premise is
> to encode glyphs instead of characters so that one glyph A is used
> in all three of these alphabets. Roundtrip compatibility with legacy
> character sets would be a problem, though. It looks like there is the
> decision between kludge A (roundtrip compatibility missing) and kludge
> B (easier spoofability).

If your statement was phrased differently, i.e. saying that domain name
registration and resolution should not allow a distinction between
A.com and A.com where one uses the Greek and one the Latin A, that
would be a different matter. Such action would close this spoofing
loophole very effectively w/o restricting the registration of
meaningful names. However, there may be subtle issues with such an
approach. But the important thing is that it does not fiddle with the
character set as such.

> However, for URLs etc., roundtrip
> compatibility is not really necessary, I think.

I beg to differ. Roundtrip convertibility is very important since URLs live 
in documents encoded in Unicode, ISO/IEC 8859-7, even Shift-JIS etc. that 
are all not 'glyph' encodings. Whatever specialized 'character set' gets used 
transiently in resolving the domain name is one issue, but it had better be 
easily possible to convert between it and the form in which URLs are actually 
stored in hypertext.

> I am sure they can be fixed by designing a better character set that
> is better suited to a given problem. A lot of problems can be avoided
> by regarding a character set as an application-specific entity to some
> extent.
>
> This is not what we want, of course; we want a universal encoding
> across all applications. This being our premise, the resulting
> problems which you cannot possibly deny will have to be dealt with in
> one way or the other.

Nobody argues that spoofing and other security issues shouldn't get
addressed.

> To me, it seems a better idea to fix problems
> that arise directly from the way we encode our characters already on
> the character set level as far as possible, even if it just means
> notifying people that mixing characters from different alphabets may
> lead to misinterpretations and to denote common glyph similarities in
> the standard, such as the glyph A or for that part the character A
> being indiscernible in several alphabets.

And we are certainly doing that. But, while A is an important character,
there are nearly 70,000 Han characters out there, some with distinctions
so subtle that many fonts will not show them and many users will not
recognize them. This has not featured in this discussion so far, nicely
showing how our perception of issues is colored by our personal experience
with scripts and languages. For Han characters even my simple suggestion
above is probably not practical.

A./




RE: Unicode and Security: Domain Names

2002-02-08 Thread Yves Arrouye

Moreover, the IDN WG documents are in final call, so if you have comments to
make on them, now is the time. Visit http://www.i-d-n.net/ and sub-scribe
(with a hyphen here so that listar does not interpret my post as a command!)
to their mailing list (and read their archives) before doing so.

The documents in last call are:

1. Internationalizing Domain Names in Applications (IDNA)
http://www.ietf.org/internet-drafts/draft-ietf-idn-idna-06.txt

2. Stringprep Profile for Internationalized Host Names
http://www.ietf.org/internet-drafts/draft-ietf-idn-nameprep-07.txt

3. Punycode version 0.3.3
http://www.ietf.org/internet-drafts/draft-ietf-idn-punycode-00.txt

4. Preparation of Internationalized Strings (stringprep)
http://www.ietf.org/internet-drafts/draft-hoffman-stringprep-00.txt

and the last call will end on Feb 11th 2002, 23h59m GMT-5. There is little
time left.

YA





RE: Unicode and Security: Domain Names

2002-02-08 Thread Barry Caplan

I want to review these documents, but since time is short, maybe someone 
can answer my question...

Are the actual domain names as stored in the DB going to be canonical 
normalized Unicode strings? It seems this would go a long way towards 
preventing spoofing ... no one would be allowed to register a non-canonical 
normalized domain name. Then, a resolver would be required to normalize any 
request string before the actual resolve.

So my questions are:

1 - Am I way off base here? If so, why?
2 - If not, is it already addressed in these docs?
3 - If it is not in the docs, and the request makes sense, then I will make 
the effort to beat the deadline, which is next Monday.


Thanks!

Barry






RE: Unicode and Security: Domain Names

2002-02-08 Thread Yves Arrouye

 Are the actual domain names as stored in the DB going to be canonical
 normalized Unicode strings? It seems this would go a long way towards
 preventing spoofing ... 

Names will be stored according to a normalization called Nameprep. Read the
Stringprep (general framework) and Nameprep (IDN application, or Stringprep
profile) for details. This normalization includes a step of normalizing
using NFKC, but it does more than that.
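
To show what the NFKC step does and does not buy in terms of spoofing protection, here is a small sketch using Python's `unicodedata` module. It covers only the NFKC part; the extra mapping and prohibition steps that Nameprep adds are not modelled, and the sample strings are invented for illustration.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Apply Unicode Normalization Form KC."""
    return unicodedata.normalize("NFKC", text)


# Compatibility and canonical variants collapse to a single form.
fullwidth = "\uff4d\uff49\uff43\uff52\uff4f"  # fullwidth "micro"
ligature = "\ufb01le"                         # "fi" ligature followed by "le"
decomposed = "cafe\u0301"                     # "e" followed by a combining acute

print(nfkc(fullwidth) == "micro")
print(nfkc(ligature) == "file")
print(nfkc(decomposed) == "caf\u00e9")

# Cross-script look-alikes are distinct characters and stay distinct.
spoof = "micr\u03bfsoft"                      # Greek omicron in place of Latin o
print(nfkc(spoof) == "microsoft")
```

The first three comparisons hold and the last does not: normalization removes alternate encodings of the same character, not different characters that happen to look alike.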

 no one would be allowed to register a non-canonical
 normalized domain name. Then, a resolver would be required to normalize
 any request string before the actual resolve.

To keep the resolvers' load the same as today, client applications will do
the normalization of their requests. If they don't normalize properly, the
lookup will just fail. Read the IDNA document for more info on this.

All normalized strings are encoded in a so-called ASCII Compatible Encoding
which uses the restricted set of characters used in the DNS today (letters,
digits, hyphen except at the extremities) for host names (which are
different than STD13 names, cf. SRV RRs for example). Read IDNA, again, and
Punycode, the chosen encoding.
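
The "letters, digits, hyphen except at the extremities" rule can be written down as a short check. This is a sketch, not text from the IDN documents: the names `LDH_LABEL` and `is_ldh_hostname` are made up here, and the 63-character label limit comes from the base DNS specification rather than from this thread.

```python
import re

# One host name label: ASCII letters, digits and hyphens, 1 to 63
# characters, with no hyphen at either end.
LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_hostname(name: str) -> bool:
    """True when every dot-separated label satisfies the LDH rule."""
    return all(LDH_LABEL.fullmatch(label) for label in name.split("."))


print(is_ldh_hostname("xn--bcher-kva.example"))  # already ACE-encoded
print(is_ldh_hostname("-bad.example"))           # leading hyphen
print(is_ldh_hostname("b\u00fccher.example"))    # raw non-ASCII label
```

An ACE-encoded name passes this check, which is the point of the encoding: existing DNS infrastructure sees nothing outside the character set it already handles.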

YA





RE: Unicode and Security: Domain Names

2002-02-08 Thread Nelson H. F. Beebe

The recent discussion on this list about Internet domain name
spoofing through substitution of Unicode characters that have similar,
or identical, glyphs concerns an issue that has recently appeared in
print in a prominent journal:

@String{j-CACM  = "Communications of the ACM"}

@Article{Gabrilovich:2002:IRH,
  author =       "Evgeniy Gabrilovich and Alex Gontmakher",
  title =        "Inside risks: The homograph attack",
  journal =      j-CACM,
  volume =       "45",
  number =       "2",
  pages =        "128--128",
  month =        feb,
  year =         "2002",
  CODEN =        "CACMA2",
  ISSN =         "0001-0782",
  bibdate =      "Wed Jan 30 17:45:01 MST 2002",
  bibsource =    "http://www.acm.org/pubs/contents/journals/cacm/",
  acknowledgement = ack-nhfb,
}

Bruce Schneier also discussed this in the 15-Mar-2001, 15-Jul-2001,
15-Sep-2001, and 15-Nov-2001 issues of the CRYPTO-GRAM newsletter
(available at

http://www.counterpane.com/crypto-gram.html

) and gave these links for more info:

http://www.theregister.co.uk/content/55/21573.html
http://www.securityfocus.com/bid/3461
http://www.counterpane.com/crypto-gram-0007.html#9
http://www.securityfocus.com/focus/ids/articles/utf8.html

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- Center for Scientific Computing       FAX: +1 801 585 1640, +1 801 581 4148 -
- University of Utah                    Internet e-mail: [EMAIL PROTECTED]    -
- Department of Mathematics, 322 INSCC  [EMAIL PROTECTED]  [EMAIL PROTECTED]  -
- 155 S 1400 E RM 233                   [EMAIL PROTECTED]                     -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe  -
-------------------------------------------------------------------------------




Re: Re[2]: Unicode and Security

2002-02-08 Thread Mark Davis

Asmus is absolutely right about Latin, Greek and Cyrillic. And the
response that Unicode should be encoding glyphs instead of characters
is, at the least, misguided. No character encodings have ever been
predicated on that. For an example of how many glyphs are available
just for the letter A, look at:

http://www.macchiato.com/utc/glyph_variation.html

There have been attempts to develop glyph standards (AFII was one).
All have foundered.

Mark
—

Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο 
πάντα — Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

- Original Message -
From: Philipp Reichmuth [EMAIL PROTECTED]
To: Asmus Freytag [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Friday, February 08, 2002 09:18
Subject: Re[2]: Unicode and Security


 Hello Asmus and others,

 I'm not sure Unicode can be fixed at this point. The flaws may be
 too deeply embedded. The real solution may involve waiting until
 companies and people start losing significant amounts of money as a
 result of the flaws in Unicode, and then throwing it away and
 replacing it with something else.

 AF This sounds nice and dramatic, but misses the point that the kinds of
 AF issues you highlighted are absolutely common to *all* character sets
 AF containing Latin and Greek, or Latin and Cyrillic characters, suggesting
 AF that you are simply grandstanding here, instead of trying to find real
 AF solutions to your problem.

 Oh, it is very well possible to design a character set that supports
 all of Latin, Cyrillic and Greek without being susceptible to this
 problem beyond the familiar 1-l-|, 0-O dimension. The main premise is
 to encode glyphs instead of characters so that one glyph "A" is used
 in all three of these alphabets. Roundtrip compatibility with legacy
 character sets would be a problem, though. It looks like there is the
 decision between kludge A (roundtrip compatibility missing) and kludge
 B (easier spoofability). However, for URLs etc., roundtrip
 compatibility is not really necessary, I think.

 AF Earlier, you accused Unicode of being in denial about security
 AF issues: It is you who is in denial about some underlying
 AF realities, among which is  that there are security issues that
 AF cannot be fixed by designing a  'better' character set.

 I am sure they can be fixed by designing a better character set that
 is better suited to a given problem. A lot of problems can be avoided
 by regarding a character set as an application-specific entity to some
 extent.

 This is not what we want, of course; we want a universal encoding
 across all applications. This being our premise, the resulting
 problems which you cannot possibly deny will have to be dealt with in
 one way or the other. To me, it seems a better idea to fix problems
 that arise directly from the way we encode our characters already on
 the character set level as far as possible, even if it just means
 notifying people that mixing characters from different alphabets may
 lead to misinterpretations and to denote common glyph similarities in
 the standard, such as the glyph "A" or for that part the character "A"
 being indiscernible in several alphabets.

   Philipp    mailto:[EMAIL PROTECTED]
 ___
 Seeing my great fault / Through darkening blue windows / I begin again








Reversible bidi (wrote RE: Unicode and Security)

2002-02-07 Thread Marco Cimarosti

Otto Stolz wrote:
 Gaspar Sinai wrote:
  Just because some companies who have influence on Unicode
  Consortium use some algorithm, like backing store and re-mapping,
  it does  not mean that this is the only way. [...]
  Yudit does convert the input to view order and back.
 
 Now, this reveals the real problem.
 
 From this description, I gather that Gaspar's editor does not
 preserve the backing store, hence it has to reconstruct it from
 the rendering. As the rendering process is an n-to-1 mapping, its
 reverse is, intrinsically, ambiguous. So, the attempt to reconstruct
 the original character sequence from the visual appearance
 is bound to fail, in the general case.

Dankeschön, Otto!

I have been wondering for all the duration of this discussion what the heck
Gaspar and everybody else were talking about. Now I begin to understand.
Could we please drop all this garbage about security (this is not the
Anti-fraud Mailing List!) and talk about this implementation problem?

As I see it, dropping the backing store after running the bidi algorithm is
not necessarily a bad idea. But a condition must be respected: each
character's *embedding* levels and *override* information should be
preserved together with the text.

With this additional data in hand, it is not impossible to define a
*reversed* Bidi algorithm which effectively recovers the backing store from
the visual order.
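
To make the idea concrete, here is a minimal Python sketch of the reordering step alone, under the assumption that every character carries its resolved embedding level with it. It is not the ICU implementation and it ignores mirroring and bidi control characters; the function name reorder and the toy strings (uppercase standing in for right-to-left letters) are hypothetical. Because each reversal keeps a run in the same positions, applying the same function to the visual string together with its visual-order levels gives back the logical string.

```python
def reorder(chars, levels):
    """Reverse every maximal run at level >= k, for k from the highest
    level down to the lowest odd level. Levels travel with the characters."""
    items = list(zip(chars, levels))
    odd = [lv for lv in levels if lv % 2 == 1]
    if not odd:
        return "".join(chars), list(levels)
    for k in range(max(levels), min(odd) - 1, -1):
        i = 0
        while i < len(items):
            if items[i][1] < k:
                i += 1
                continue
            j = i
            while j < len(items) and items[j][1] >= k:
                j += 1
            items[i:j] = items[i:j][::-1]  # reverse this run in place
            i = j
    return "".join(c for c, _ in items), [lv for _, lv in items]

logical = "abc DEF"                    # uppercase = right-to-left letters
levels = [0, 0, 0, 0, 1, 1, 1]
visual, visual_levels = reorder(logical, levels)
print(visual)                          # abc FED

restored, _ = reorder(visual, visual_levels)
print(restored == logical)             # True
```

Without the stored levels, the visual string alone would not say which runs had been reversed, which is the ambiguity Otto describes.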

Roozbeh Pournader, I, and other people have discussed this at length on
this list, and a very similar algorithm is actually implemented as part of
ICU.

Such a reversed Bidi technique does not necessarily restore a bit-wise copy
of the original backing store. However, the resulting backing store is
guaranteed to (a) have the same logical order as the original and (b) have
the same nesting of bidi embedding and overrides. The only things that this
approach drops are redundant bidi controls (such as a LTR embedding within
an already LTR segment), but is this all bad?

Even John Cowan's example becomes perfectly unambiguous, if the bidi
embedding levels are retained:

Case 1:
From visual order:  the Arabs = BARA-LA
And bidi levels:222
Get logical order:  the Arabs = AL-ARAB

Case 2:
From visual order:  the Arabs = BARA-LA
And bidi levels:322
Get logical order:  AL-ARAB = the Arabs

It is not perfectly clear whether this approach is more or less functional
than the traditional approach of maintaining the backing store. What is
important, is that the two techniques have the same result.

My impression is that, although this reverse bidi requires more processing
(text must undergo two bidi algorithms vs. one), it makes the editing of
text a little bit easier, both for the programmer and for the user.

Roozbeh and I also considered that, as the embedding levels are available
during the editing process, it would also be possible to *display* them
(e.g., in the form of stacks of horizontal arrows drawn under the text), and
this would make clear to the user the exact reading order.

_ Marco




RE: Reversible bidi (wrote RE: Unicode and Security)

2002-02-07 Thread Marco Cimarosti

I (Marco Cimarosti) wrote:
 Even the John Cowan's example becomes perfectly unambiguous, 
 if the bidi embedding levels are retained:
 
 Case 1:
   From visual order:  the Arabs = BARA-LA
   And bidi levels:222
   Get logical order:  the Arabs = AL-ARAB
 
 Case 2:
   From visual order:  the Arabs = BARA-LA
   And bidi levels:322
   Get logical order:  AL-ARAB = the Arabs

Errata: the bidi levels in the examples should actually be lowered by one
level.

I must also add that the paragraph's overall embedding level should be
retained as well, although this would almost always be identical to the
lowest embedding level in the text.

So, here is the correction:

Even John Cowan's example becomes perfectly unambiguous, provided that the
bidi embedding levels are retained:

Case 1:
From visual order:  the Arabs = BARA-LA
And bidi levels:111
And paragraph level:0
You get logical order:  the Arabs = AL-ARAB

Case 2:
From visual order:  the Arabs = BARA-LA
And bidi levels:211
And paragraph level:1
You get logical order:  AL-ARAB = the Arabs

_ Marco




RE: Unicode and Security

2002-02-07 Thread Christopher J Fynn


Gaspar Sinai wrote:
 I am thinking about electronically signed Unicode text documents
 that are rendered correctly or believed to be rendered correctly,
 still they look different, seem to contain additional or do not
 seem to contain some text when viewed with different viewers due
 to some ambiguities inherent in the standard.

This sounds like a rendering (application) issue, not a character encoding
(Unicode) issue. If the application or operating environment doesn't properly
support complex script rendering (and / or if the client doesn't have the
right fonts installed) then text in complex scripts might be rendered
incorrectly - or not at all. Chances are such text would either be
nonsensical, look like gobbledegook, or display as a string of empty boxes
indicating missing glyphs. Would you sign something like that?

Can you give an example of some text or document a person might be fooled
into signing that would mean one thing if rendered correctly and something
entirely different when rendered incorrectly?

- Chris






RE: Unicode and Security

2002-02-07 Thread Christopher J Fynn

John Hudson wrote:

 I can make an OpenType font that uses contextual substitution to
 replace the phrase 'The licensee also agrees to pay the type designer
 $10,000 every time he uses the lowercase e' with a series of invisible
 non-spacing glyphs. Of course, the backing store will contain my
 dastardly hidden clause and that is the text the unwitting victim will
 electronically sign. Hahahaha, he laughed maniacally!

How about a font that displays any number following a dollar sign as only
10% of the actual value in the backing text?

As John pointed out, this sort of thing isn't a Unicode problem. One could
just as easily employ the same kind of hidden rendering rules with ASCII
text. The only way to prevent this sort of fraud altogether would be to
throw out complex script rendering and encode glyphs not characters... I
don't think anyone seriously wants to go back down that route and anyway it
would probably take decades and a huge effort to make such a standard
properly covering all the scripts already in Unicode - and there would
undoubtedly still be other problems.

There are plenty of ways paper documents can be altered, added to or just
plain forged by someone intent on fraud - some of them extremely difficult
to detect. I don't know, but it's probably safest to assume that the
situation is similar with electronic documents - whatever security systems
are in place. That's one reason why you should always keep a duplicate copy
of any contract you sign - whether it's an electronic document you digitally
sign or a paper document you sign with a pen.

- Chris





Re: Unicode and Security

2002-02-07 Thread Elliotte Rusty Harold

At 11:54 AM -0700 2/6/02, John H. Jenkins wrote:


Right, but the problem right now is that people are typing things like
www.whitehouse.com instead of www.whitehouse.gov (or, for that matter,
www.unicode.com).  How likely is it that someone will accidentally
type www.s?mple.com instead of www.sample.com?


Somebody could easily follow a link to such a site, possibly through 
a pop-up or some spyware installed on their system, and never realize 
they weren't at the actual site.

Security and spoofing are very real issues that were never, as far as 
I know, even considered in the design of Unicode. It's unclear 
whether or not the problem can be fixed now. The Unicode community 
has been in serious denial about this for some time. That other 
technologies also have or contribute to these problems in no way 
absolves Unicode of its problems.
-- 

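+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | [EMAIL PROTECTED]      | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|              http://www.ibiblio.org/xml/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |
+----------------------------------+---------------------------------+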




Re: Unicode and Security

2002-02-07 Thread Lars Marius Garshol


* Elliotte Rusty Harold
| 
| Security and spoofing are very real issues that were never, as far
| as I know, even considered in the design of Unicode. It's unclear
| whether or not the problem can be fixed now. The Unicode community
| has been in serious denial about this for some time. That other
| technologies also have or contribute to these problems in no way
| absolves Unicode of its problems.

Could you explain what the problem is, as you see it? I've heard
mumblings about this from various directions for a long time, but
could never make any sense out of them. Is there a problem? If so,
what is it?

It seems to me that as security problems go, C/C++ is infinitely
worse than Unicode, and anyone at all serious about security should
start there, rather than with character sets.

-- 
Lars Marius Garshol, Ontopian URL: http://www.ontopia.net 
ISO SC34/WG3, OASIS GeoLang TCURL: http://www.garshol.priv.no 





Re: Unicode and Security

2002-02-07 Thread Elliotte Rusty Harold

I've been thinking about security issues in Unicode, and I've come up 
with one that's quite scary and worse than any I've heard before. It 
uses only plaintext, no fonts involved, doesn't require buggy 
software, and works over e-mail instead of the Web. All it requires 
added to the existing infrastructure is internationalized domain 
names. So in the hope that this becomes a self-defeating prophecy, 
here's the scenario:

I as a reporter or industrial spy or detective working on a divorce 
case, have learned the identities and internal e-mail addresses of 
two people, call them Alice and Bob, at Microsoft (or just about any 
other large company). I've somehow communicated with these people 
personally, for instance on an e-mail list completely unrelated to 
work but for which they use their work e-mail so I'm familiar with 
their style and signature files. Or perhaps, I've communicated with 
them on work related matters before. In any case, it's not hard to 
get two people who know each other at a large company to send you 
e-mail. Of course, they would presumably be careful not to give me 
secret company information since they know they're talking to an 
outsider.

For the sake of argument, let's call the company they work at 
Microsoft, but this attack could hit most companies with a .com 
address. Let's say I register microsoft.com, only the fifth letter 
isn't a lower-case Latin o. It's actually a lower case Greek omicron. 
I then forge a believable letter from [EMAIL PROTECTED] to 
[EMAIL PROTECTED] saying "Can you please update me on your budget?" 
Bob, noticing that the e-mail appears to come from Alice, whom he 
knows and trusts, fires off a reply with his confidential 
information. Only it doesn't go to Alice. It goes to me. I can then 
reply to Bob, asking for clarification or more details. I can ask him 
to attach the latest build of his software. I can carry on a 
conversation in which Bob believes me to be Alice and spills his 
guts. This is very, very bad.

E-mail forgery has been a problem for a long time, but it's always 
been one-way. You couldn't trick somebody into sending you a reply 
because doing so required using a different e-mail address than the 
one they expected, thus revealing the message as forged. With a 
Unicode enabled mailer, that's no longer true. If the fonts Bob (not 
me, but Bob) chooses for his e-mail program do not make a clear 
distinction between an o and an omicron, this works. There are lots 
of other attacks. The Cyrillic and Greek alphabets provide lots of 
options for replacing single letters in Latin domain names.
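
To see how little separates the two names at the code point level, here is a small Python sketch using the standard unicodedata module. The spoofed string is built the way the scenario describes (Greek omicron as the fifth letter); the variable names are my own.

```python
import unicodedata

genuine = "microsoft.com"
spoofed = "micr\u03bfsoft.com"  # fifth letter is a Greek omicron

# The strings differ even though most fonts draw them alike.
print(genuine == spoofed)  # False
for ch in (genuine[4], spoofed[4]):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+006F LATIN SMALL LETTER O
# U+03BF GREEK SMALL LETTER OMICRON

# Normalization does not fold one into the other.
print(unicodedata.normalize("NFKC", spoofed) == genuine)  # False
```

Software can tell the two apart trivially; the reader looking at the rendered address is the one who cannot.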

I'm not sure whether the internationalized domain names working group 
has fully grokked this. Like Unicode, they seem to be trying to pass 
the buck. In particular, they state in 
http://www.ietf.org/internet-drafts/draft-ietf-idn-requirements-09.txt:

Specifying requirements for internationalized domain names does not 
itself raise any new security issues. However, any change to the DNS 
MAY affect the security of any protocol that relies on the DNS or on 
DNS names. A thorough evaluation of those protocols for security
concerns will be needed when they are developed. In particular, IDNs 
MUST be compatible with DNSSEC and, if multiple charsets or 
representation forms are permitted, the implications of this 
name-spoof MUST be throughly understood.

In other words, it's not our fault. Blame the client software. Sounds 
distressingly like the Unicode Consortium's approach to these issues. 
Interestingly, my attack works with a single character representation 
(Unicode). It is not dependent on multiple charsets. I don't know if 
the IDN working group has thought of this problem. I hope they have, 
and consider it their responsibility to prevent. I also hope the 
Unicode consortium and vendors of client software think about these 
problems. But I don't think we can count on client software getting 
this right. (Hell, Microsoft can't even stop e-mail from running 
scripts.)  The problem needs to be fixed closer to the source.




Re: Unicode and Security

2002-02-07 Thread David Starner

On Thu, Feb 07, 2002 at 10:34:20AM -0500, Elliotte Rusty Harold wrote:
 Security and spoofing are very real issues that were never, as far as 
 I know, even considered in the design of Unicode. 

Unicode is a character encoding, not a glyph encoding. Furthermore, it's
a superset of a number of preexisting character sets, so that it was
possible for those users to move to Unicode without problems. Since
important preexisting character sets separated Greek, Cyrillic and Latin
scripts, Unicode had to. Had Unicode not chosen to follow these
principles, ISO 10646 would have, and it would have become the dominant
character set, with the same problems.

In any case, what is your solution? When the American Mathematical
Society says "We need a SMALL CIRCLE for the mathematical texts", do you
say "no, we already have the unified LATGRKCRY SMALL O"? After they show
you that the two are distinct characters in their texts, do you still
refuse because someone might get confused? The Universal Character Set
can't afford to not encode characters like that. 

-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing 
with the youth. -- Information Society, Peace and Love, Inc.




Re: Unicode and Security

2002-02-07 Thread Michael Everson

At 12:22 -0500 2002-02-07, Elliotte Rusty Harold wrote:

For the sake of argument, let's call the company they work at 
Microsoft, but this attack could hit most companies with a .com 
address. Let's say I register microsoft.com, only the fifth letter 
isn't a lower-case Latin o. It's actually a lower case Greek 
omicron. I then forge a believable letter from [EMAIL PROTECTED] 
to [EMAIL PROTECTED] saying Can you please update me on your 
budget? Bob, noticing that the e-mail appears to come from Alice, 
whom he knows and trusts, fires off a reply with his confidential 
information. Only it doesn't go to Alice. It goes to me. I can then 
reply to Bob, asking for clarification or more details. I can ask 
him to attach the latest build of his software. I can carry on a 
conversation in which Bob believes me to be Alice and spills his 
guts. This is very, very bad.

It isn't Unicode's fault that some letters look like others. That's a 
fault of history.

-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Unicode and Security

2002-02-07 Thread Asmus Freytag

At 11:53 AM 2/7/02 -0600, David Starner wrote:
a superset of a number of preexisting character sets, so that it was
possible for those users to move to Unicode without problems. Since
important preexisting character sets seperated Greek, Cyrillic and Latin
scripts, Unicode had to. Had Unicode not chosen to follow these
principles, ISO 10646 would have, and it would have become the dominant
character set, with the same problems.

Actually, this discussion ignores that, in order to be workable, a
character set standard for *cased* scripts must support context-free
case transitions.

That's why B, B, and B need to be separated, since they lower case
into the three different characters 'b', 'beta' and 'small B'.
That they are also considered to come from different scripts, just
reinforces that argument.

However, the Latin character that looks like a capital D with stroke
can lowercase into a straight 'd with stroke' or a curly form, which
is an Icelandic letter. As long as the two lower case forms aren't
unified (and little speaks in favor of that, least of all legacy),
the two upper case forms must be separated as well.
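
The point can be checked with any Unicode-aware case mapper. Here is a short Python sketch using only built-in string methods and the unicodedata module; the choice of the five capitals is mine, following the examples above.

```python
import unicodedata

# Three look-alike capital letters B, then two look-alike D-with-stroke capitals.
capitals = ["\u0042", "\u0392", "\u0412", "\u0110", "\u00d0"]
lowered = {upper: upper.lower() for upper in capitals}

for upper, lower in lowered.items():
    print(f"U+{ord(upper):04X} -> U+{ord(lower):04X}  {unicodedata.name(lower)}")
# U+0042 -> U+0062  LATIN SMALL LETTER B
# U+0392 -> U+03B2  GREEK SMALL LETTER BETA
# U+0412 -> U+0432  CYRILLIC SMALL LETTER VE
# U+0110 -> U+0111  LATIN SMALL LETTER D WITH STROKE
# U+00D0 -> U+00F0  LATIN SMALL LETTER ETH
```

Each capital has exactly one lowercase target, with no need to know the language of the surrounding text, which is only possible because the look-alike capitals are encoded separately.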

The one exception that survived (Turkish I) is causing innumerable
problems, which supports the rule I gave at the outset.

Any workable multilingual character set containing these characters
will allow spoofing on the character level, and all existing ones
(including 8859-7 for Latin/Greek for example) do.

But, as the discussion shows, spoofing on the word level (.com
for .gov) is alive and well, and supported by any character set
whatsoever. For that reason, it seems to promise little gain to
try to chase the holy grail of a multilingual character set that
somehow avoids the character level spoofing, if the word level
spoofing can go on unchecked.

A./




Re: Unicode and Security

2002-02-07 Thread Barry Caplan

At 12:22 PM 2/7/2002 -0500, Elliotte Rusty Harold wrote:
I've been thinking about security issues in Unicode, and I've come up with 
one that's quite scary and worse than any I've heard before. It uses only 
plaintext, no fonts involved, doesn't require buggy software, and works 
over e-mail instead of the Web. All it requires added to the existing 
infrastructure is internationalized domain names. So in the hope that this 
becomes a self-defeating prophecy, here's the scenario:

<snip> "Can you please update me on your budget?" Bob, noticing that the 
e-mail appears to come from Alice, whom he knows and trusts, fires off a 
reply with his confidential information. Only it doesn't go to Alice. It 
goes to me. I can then reply to Bob, asking for clarification or more 
details. I can ask him to attach the latest build of his software. I can 
carry on a conversation in which Bob believes me to be Alice and spills 
his guts. This is very, very bad.


This is precisely the problem digital signing is meant to solve. Signing 
means that Alice has encrypted the message with her private key before 
sending it to Bob. Bob then decrypts the message using Alice's public key. 
If the message does not decrypt, then Bob should not trust that the 
message is from Alice. This algorithm works independently of the transport 
mechanism (email, etc.) and of domains. Alice's key stays with Alice, not 
with the domain. Of course, how you exchange trusted keys in the first 
place is another matter, but I am sure this is all covered in a security 
FAQ somewhere.
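
As a toy illustration only, the scheme can be sketched in Python with textbook RSA. In practice one signs a hash of the message with proper padding, using a vetted library; the key below is built from two small Mersenne primes purely for demonstration and offers no security. All names here (sign, verify, digest) are hypothetical helpers, not any mailer's API.

```python
import hashlib

# Toy key: far too small and unpadded for real use.
p, q = 2**61 - 1, 2**89 - 1
n = p * q
e = 65537                          # public exponent (Alice publishes n and e)
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (needs Python 3.8+)

def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    """Alice: transform the digest with her private key."""
    return pow(digest(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    """Bob: undo the transform with Alice's public key and compare digests."""
    return pow(signature, e, n) == digest(message)

msg = b"Can you please update me on your budget?"
sig = sign(msg)
print(verify(msg, sig))                     # True
print(verify(msg + b" Reply to me.", sig))  # False
```

Note that nothing in the check depends on the sender's address or domain name, which is why a look-alike domain does not help the forger here.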


E-mail forgery has been a problem for a long time, but it's always been 
one-way. You couldn't trick somebody into sending you a reply because 
doing so required using a different e-mail address than the one they 
expected, thus revealing the message as forged.

There are many, many ways to get a response from someone via email, even if 
the address is not recognized or forged. Most involve social engineering 
approaches more than anything else. My mailbox filled with spam will attest 
to that!


With a Unicode enabled mailer, that's no longer true. If the fonts Bob 
(not me, but Bob) chooses for his e-mail program do not make a clear 
distinction between an o and an omicron, this works. There are lots of 
other attacks. The Cyrillic and Greek alphabets provide lots of options 
for replacing single letters in Latin domain names.


Unless all messages are signed (technically feasible), there is no 
trust at all. When Outlook/Exchange supports, in fact requires, messages to 
be signed, then this problem will start to dwindle away, at least in the 
email realm.

Of course, if there is no method to judge the level of trust for properly 
signed messages that arrive from folks you don't know (a human 
fallibility), then knowing the origin of the message might not help much 
either. My inbound spam can be verifiably signed, but it is still spam.

In other words, it's not our fault. Blame the client software. Sounds 
distressingly like the Unicode Consortium's approach to these issues. 
Interestingly, my attack works with a single character representation 
(Unicode).


Your attack is only a social engineering attack, not a technical weakness 
inherent in any protocol or character set (even though there may be such 
issues).

Barry





Re: Unicode and Security

2002-02-07 Thread David Starner

On Thu, Feb 07, 2002 at 12:22:18PM -0500, Elliotte Rusty Harold wrote:
 Interestingly, my attack works with a single character representation 
 (Unicode). It is not dependent on multiple charsets. 

It also works with EUC-JP (and other Japanese charsets), all 8-bit
Russian representations, all 8-bit Greek representations . . .

  The problem needs to be fixed closer to the source.

How about a solution that doesn't involve the destruction of Unicode as
a useful tool? The fact that matching MD5 sums don't prove that the
files match is not a bug in MD5 sums. Likewise, the fact that different
characters may have look-alike glyphs is not a bug in a _character_
encoding.

-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing 
with the youth. -- Information Society, Peace and Love, Inc.




Re: Unicode and Security

2002-02-07 Thread Elliotte Rusty Harold

On Thu, Feb 07, 2002 at 10:34:20AM -0500, Elliotte Rusty Harold wrote:


Unicode is a character encoding, not a glyph encoding. Furthermore, it's
a superset of a number of preexisting character sets, so that it was
possible for those users to move to Unicode without problems. Since
important preexisting character sets seperated Greek, Cyrillic and Latin
scripts, Unicode had to. Had Unicode not chosen to follow these
principles, ISO 10646 would have, and it would have become the dominant
character set, with the same problems.


I know why these choices were made. That has nothing to do with the 
question of whether the finished product will or will not cause 
security breaches.

In any case, what is your solution? When the American Mathematical
Society says We need a SMALL CIRCLE for the mathematical texts, do you
say no, we already have the unified LATGRKCRY SMALL O? After they show
you that the two are distinct characters in their texts, do you still
refuse because someone might get confused? The Universal Character Set
can't afford to not encode characters like that.


I'm not sure Unicode can be fixed at this point. The flaws may be too 
deeply embedded. The real solution may involve waiting until 
companies and people start losing significant amounts of money as a 
result of the flaws in Unicode, and then throwing it away and 
replacing it with something else. I don't like that solution, but not 
liking it doesn't mean it ain't gonna happen as soon as Exxon loses a 
few billion dollars because somebody spoofed them and thereby gained 
access to their bidding plans for oil leases. Don't be surprised when 
some large companies start issuing memos forbidding the use of 
Unicode, or blocking all non-ASCII domain names at their firewall.

One possible solution at the domain name system level might be to 
limit domain names to a single Unicode block or group. For instance, 
Greek domain names could be allowed but not domain names that mix 
Greek with Latin. Similarly, you couldn't mix Latin with Cyrillic or 
Cyrillic with Greek. That would at least vastly reduce the 
possibility for domain spoofing, if not eliminate it entirely.
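
A rough sketch of such a check in Python follows. The standard library does not expose the Unicode Script property, so this approximates a letter's script by the first word of its character name, which is an assumption that works for Latin, Greek and Cyrillic but is not a general solution. Applying the rule per label rather than to the whole name is also my assumption, and the function names are hypothetical.

```python
import unicodedata

def scripts_in_label(label: str) -> set:
    """Approximate each letter's script by the first word of its Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in label if ch.isalpha()}

def is_single_script(domain: str) -> bool:
    """True if no dot-separated label mixes letters from different scripts."""
    return all(len(scripts_in_label(label)) <= 1 for label in domain.split("."))

print(is_single_script("microsoft.com"))           # True
print(is_single_script("micr\u03bfsoft.com"))      # False: Latin mixed with Greek
print(is_single_script("\u03b1\u03b2\u03b3.com"))  # True: an all-Greek label
```

A label written entirely in look-alike letters from one other script would still pass, so this reduces the problem rather than eliminating it.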

Interesting tidbit: app1e.com (not APPLE.COM but APP1E.COM) is in 
fact already registered. This attack may not be as theoretical as I 
initially thought.




RE: Unicode and Security

2002-02-07 Thread jarkko . hietaniemi

 I'm not sure Unicode can be fixed at this point. The flaws may be too 
 deeply embedded. The real solution may involve waiting until 
 companies and people start losing significant amounts of money as a 
 result of the flaws in Unicode, and then throwing it away and 
 replacing it with something else. I don't like that solution, but not 
 liking it doesn't mean it ain't gonna happen as soon as Exxon loses a 
 few billion dollars because somebody spoofed them and thereby gained 
 access to their bidding plans for oil leases. Don't be surprised when 
 some large companies start issuing memos forbidding the use of 
 Unicode, or blocking all non-ASCII domain names at their firewall.

Doom!  Doom!  Doom!  End is nigh, repent ye sinners!

 Interesting tidbit: app1e.com (not APPLE.COM but APP1E.COM) is in 
 fact already registered. This attack may not be as theoretical as I 
 initially thought.

Interestingly enough, I find this (and whitehouse.com and whitehouse.org,
and micros0ft.com, and ...) a good example for Unicode being largely irrelevant.
Sure, Unicode gives more possibilities for abuse, but I fail to see how a character
encoding standard can stop people from being stupid and not using public keys or
some other means of trust in cases where it matters.  Analogously, people will
keep opening executable attachments promising sex, regardless of whether the
's', 'e', and 'x' are Latin letters or not.





Re: Unicode and Security

2002-02-07 Thread Elliotte Rusty Harold

At 10:16 AM -0800 2/7/02, Barry Caplan wrote:

This is precisely the problem digital signing is meant to solve. 
Signing means that Alice has encrypted the message with her private 
key before sending to Bob. Bob then unencrypts the message using 
Alice's public key. If the message does not unencrypt, then Bob 
should not trust that the message is from Alice. This algorithm 
works independent of transport mechanism (email, etc.), or domains. 
Alice's key stays with Alice,not with the domain. Of course, how you 
exchange trusted keys in the first place is another matter, but I am 
sure this is all covered on a security FAQ somewhere.

That's very nice in theory, but it's not the way people use e-mail in 
practice and it's not going to be. Microsoft, a company with a very 
technically literate employee base, might be able to implement this 
scheme (though I doubt it). A company like Exxon never could. The 
system's just too cumbersome.

E-mail forgery has been a problem for a long time, but it's always 
been one-way. You couldn't trick somebody into sending you a reply 
because doing so required using a different e-mail address than the 
one they expected, thus revealing the message as forged.

There are many many ways to get a response from someone via email, 
even if the address is not recognized or forged. Most involve social 
engineering approaches more than anything else. My mailbox filled 
with spam will attest to that!

Yes, but that doesn't address the fact that this makes the problem far worse.

With a Unicode enabled mailer, that's no longer true. If the fonts 
Bob (not me, but Bob) chooses for his e-mail program do not make a 
clear distinction between an o and an omicron, this works. There 
are lots of other attacks. The Cyrillic and Greek alphabets provide 
lots of options for replacing single letters in Latin domain names.
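
To make the o/omicron substitution concrete, here is a minimal Python sketch
that flags a label whose letters come from more than one script. It leans on
a heuristic that is my assumption, not anything defined by Unicode: the first
word of each character's name ("LATIN", "GREEK", "CYRILLIC") stands in for
its script. The function names are hypothetical.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Return rough script names for the letters in a label.

    Heuristic: the first word of the Unicode character name
    (e.g. "LATIN", "GREEK", "CYRILLIC") is used as the script.
    """
    return {
        unicodedata.name(ch).split()[0]
        for ch in label
        if ch.isalpha()
    }


def looks_mixed(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in(label)) > 1


genuine = "microsoft"
spoofed = "micr\u03bfsoft"   # fifth letter is GREEK SMALL LETTER OMICRON

print(looks_mixed(genuine))  # False
print(looks_mixed(spoofed))  # True
```

A check like this only catches mixtures. A name spelled entirely in
look-alike letters from a single script would pass it.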


Unless all messages are signed (technically feasible), there 
is no trust at all. When Outlook/Exchange supports, and in fact 
requires, signed messages, this problem will start to 
dwindle away, at least in the email realm.


Would that it were so, but it's not. As you suggest, people do trust 
e-mail even when they shouldn't. Trust is a human question decided by 
human beings, not a boolean answer that comes out of a computer 
algorithm. I can trust that the message I'm replying to came from a 
person named Barry Caplan even if I have no proof of that 
whatsoever.

Of course if there is a method to judge the level of trust for 
properly signed messages that arrive from folks you don't know (a 
human fallibility), then knowing the origin of the message might not 
help much either. My inbound spam can be verifiably signed, but it 
is still spam.

In other words, it's not our fault. Blame the client software. 
Sounds distressingly like the Unicode Consortium's approach to 
these issues. Interestingly, my attack works with a single 
character representation (Unicode).


Your attack is only a social engineering attack, not a technical 
weakness inherent in any protocol, or character set (even though 
there may be such issues)


Technical systems can be more or less resistant to social engineering 
attacks. It is the task of the system designers to make the system 
more resistant. I'm reminded of an IBM mainframe system about a 
decade ago where it was possible to change your password by appending 
a slash and the new password to the old password when logging in. Few 
users knew this but hackers did. It wasn't very hard to convince a 
user on the phone that they needed to set their account to debugging 
mode by logging in and appending /DEBUG to the password. This had the 
effect of changing their account password to DEBUG, which you knew and 
they didn't. (It's been a while, but I vaguely recall that this was 
the hack Phiber Optic used to break into the New York City Public 
School System computers.)

This particular system was poorly designed and thus vulnerable to a 
social engineering attack. But make no mistake: it was very much a 
design flaw in the system. It was not the user's fault for not 
knowing about an obscure option to change their password at the login 
prompt. It should not have been there in the first place, and once 
discovered it needed to be taken out. That there were other social 
engineering attacks on the system didn't change the need to fix this 
problem.

Design choices have security consequences. It is not enough to claim 
that your system is secure when used properly or when implemented 
properly. The system must be designed in such a way that it is 
natural to use it properly and it is easy to implement properly. 
Furthermore, failure to do so should be obvious. When a system is 
being used incorrectly, the problem needs to be brutally obvious. In 
Unicode, it is not.

-- 

+---++---+
| Elliotte Rusty Harold | [EMAIL PROTECTED] | Writer/Programmer |

Re: Unicode and Security

2002-02-07 Thread Elliotte Rusty Harold

At 11:34 AM -0800 2/7/02, Asmus Freytag wrote:

But, as the discussion shows, spoofing on the word level (.com
for .gov) is alive and well, and supported by any character set
whatsoever. For that reason, it seems to promise little gain to
try to chase the holy grail of a multilingual character set that
somehow avoids the character level spoofing, if the word level
spoofing can go on unchecked.

Burglary at the broken window level is alive and well. Therefore 
there's little point to putting locks on doors.

I hope the fallacy of the above is obvious, but when translated into 
the computer security domain it's all too common a rationalization, 
as this thread demonstrates.

There are many ways to socially engineer someone into doing something 
they shouldn't do. This is just one of them, and one that's mostly 
theoretical at the current time. However, we still need to plug the 
hole. That there are other, less damaging holes (or even more 
damaging ones) is no excuse for not fixing this one.

Just to pull a number out of a hat, imagine there are 10,000 attacks 
a day using spoofing in the current system. Is this any justification 
for opening up a hole that will add 10,000 more? Of course it's not.
-- 

+---++---+
| Elliotte Rusty Harold | [EMAIL PROTECTED] | Writer/Programmer |
+---++---+
|  The XML Bible, 2nd Edition (Hungry Minds, 2001)   |
|  http://www.ibiblio.org/xml/books/bible2/  |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
+--+-+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/  |
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/ |
+--+-+




Re: Unicode and Security

2002-02-07 Thread David Starner

On Thu, Feb 07, 2002 at 01:21:29PM -0500, Elliotte Rusty Harold wrote:
 I'm not sure Unicode can be fixed at this point. The flaws may be too 
 deeply embedded. The real solution may involve waiting until 
 companies and people start losing significant amounts of money as a 
 result of the flaws in Unicode, and then throwing it away and 
 replacing it with something else. 

What else? As we keep pointing out, almost every character in Unicode
that normally has the same glyph as another is in Unicode with good
reason. To change that to something that would fit your goals will cost
billions right now just for the change, and then you end with a
character set that can't round trip all the others in common use, and
that is more painful to use for Greeks and Russians, and completely
unusable for mathematicians. I seriously doubt the world would go to a
massively inferior character set because of the security holes you're
talking about.

-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing 
with the youth. -- Information Society, Peace and Love, Inc.




Re: Unicode and Security

2002-02-07 Thread Elliotte Rusty Harold

At 12:28 PM -0800 2/7/02, John Hudson wrote:

1. The software industry has already devised mechanisms to protect 
against e-mail forgery, e.g. private-public key encryption.


And nobody uses them because they're too complex.

2. What you describe is criminal fraud and there are laws to protect 
against such 'spoofing' and to punish those who perpetrate it.


Yes, there are. And there are laws to protect against spam and denial 
of service attacks and cracking systems. For that matter there are 
laws to protect against burglary, but I still have locks on my front 
door. Laws are no substitute for prevention.




Re: Unicode and Security

2002-02-07 Thread Kenneth Whistler

Elliotte Rusty Harold wrote:

 Interesting tidbit: app1e.com (not APPLE.COM but APP1E.COM) is in 
 fact already registered. This attack may not be as theoretical as I 
 initially thought.

And a0l.com. (It even has a website: www.a0l.com.) [Which, incidentally,
induces a security hazard, by attempting to download a free/10 sec
download that fills forms and brings offers based on websites you
visit, offered by The Gator Corporation -- even authenticated
by Verisign. But would *you* trust a free download offered by
The Gator Corporation, even if it is really them?? Those guys are
bigtime ad-spy facilitation gators, all right.]
But this kind of stuff is old news, as other examples raised by
people have shown.

 The problem needs to be fixed closer to the source.

It simply isn't practical to try to fix the problem of visual
spoofing by trying to prevent it in the character encoding, or
for that matter in the text-based protocols. As Barry Caplan pointed out,
there are other, more robust means of determining trusted identity,
to foil the cases like your Bob and Alice email scenario. But
of course, the best technology won't prevent stupid, gullible
people from falling into traps set for them by unscrupulous,
cunning scam artists. And nobody is going to keep the dedicated
industrial or military spies from finding ways to crack supposedly
secure systems, if for no other reason than secure systems are
administered by fallible humans.

 The Unicode community 
 has been in serious denial about this for some time. That other 
 technologies also have or contribute to these problems in no way 
 absolves Unicode of its problems.

Well, yeah, I guess I'm still in denial. *hehe*  As Asmus, and
any number of other people on this list have pointed out,
the same problem of Latin/Greek/Cyrillic letter spoofing that
has you so worried is present in any number of other 8-bit
character sets, and because of the nature of the writing
systems, the nature of case, and the requirements on textual
processing, would still be present in any alternative to
Unicode that some other character encoding committee could
come up with.

Even if we sat down to do it all over again, with a big
Security Is Our Primary Concern! banner posted on the wall
for every committee meeting, Unicode 2 would still end up
with separate Latin, Greek, and Cyrillic alphabets encoded.
Not to do so would make any proposed new standard crash and
burn before it left the runway.

The only widely-deployed alternative approach I know of is
ETSI GSM 03.38 (used in mobile telephony), which has Greek
(uppercase only) added to Latin, using the same codes for the
uppercase Greek letters which look like Latin (ABEHIKMNOPTXYZ).
But this approach is so patently nonextensible and so
unworkable for any significant text processing requirements,
that SC2, ANSI, or other major players in character encoding
have never seriously considered such an approach for
character encoding.
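
A small Python sketch of what that kind of unification amounts to. The table
is written out from the letter list quoted above (ABEHIKMNOPTXYZ) and is
illustrative only; it is not copied from the GSM 03.38 code chart, and the
names GREEK_TO_LATIN and fold_greek_capitals are hypothetical.

```python
# Greek capitals that share a glyph shape with a Latin capital.
GREEK_TO_LATIN = {
    "\u0391": "A",  # ALPHA
    "\u0392": "B",  # BETA
    "\u0395": "E",  # EPSILON
    "\u0397": "H",  # ETA
    "\u0399": "I",  # IOTA
    "\u039a": "K",  # KAPPA
    "\u039c": "M",  # MU
    "\u039d": "N",  # NU
    "\u039f": "O",  # OMICRON
    "\u03a1": "P",  # RHO
    "\u03a4": "T",  # TAU
    "\u03a7": "X",  # CHI
    "\u03a5": "Y",  # UPSILON
    "\u0396": "Z",  # ZETA
}


def fold_greek_capitals(text: str) -> str:
    """Replace each look-alike Greek capital with its Latin twin."""
    return "".join(GREEK_TO_LATIN.get(ch, ch) for ch in text)


greek_word = "\u039a\u0391\u03a4\u0391"   # KAPPA ALPHA TAU ALPHA
print(fold_greek_capitals(greek_word))    # KATA

# Once folded, the script is gone, so case mapping can no longer be right:
print("\u03a1".lower() == "\u03c1")       # True: Greek RHO lowercases to rho
print("P".lower() == "p")                 # True: Latin P lowercases to p
```

After folding, a shared "P" cannot lowercase correctly for both scripts, which
is one way to see why the approach breaks down for text processing.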

So perhaps turnabout is fair play here. I'd say that a certain
portion of the security community has been in serious denial
about the nature of character encodings for some time.

--Ken




Re: Unicode and Security

2002-02-07 Thread James E. Agenbroad

 Thursday, February 7, 2002
Would making the about to be misled respondent type the address of the
intended person (with a roman 'o', not a greek omicron) and then having
the system see if they match detect and thwart such tricks?  The
respondent is already typing so it's not a large extra burden.
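
A minimal sketch of that check in Python, assuming the mail client can see
both the address shown in the message and the address the user retypes. The
function name and the corp.example addresses are hypothetical.

```python
import unicodedata


def same_address(displayed: str, retyped: str) -> bool:
    """Compare two addresses code point by code point.

    Both sides are NFC-normalized and case-folded first, so only real
    differences in the underlying characters make the comparison fail.
    """
    def canon(s: str) -> str:
        return unicodedata.normalize("NFC", s.strip()).casefold()

    return canon(displayed) == canon(retyped)


spoofed = "alice@c\u03bfrp.example"   # GREEK SMALL LETTER OMICRON for 'o'
typed = "Alice@corp.example"          # what the user types on a Latin keyboard

print(same_address(spoofed, typed))               # False
print(same_address("alice@corp.example", typed))  # True
```

The comparison works because the user's keyboard produces the Latin letter,
whatever the font happens to draw for the spoofed one.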
 Regards,
  Jim Agenbroad (disclaimer and addresses at bottom)
On Thu, 7 Feb 2002, Michael Everson wrote:

 At 12:22 -0500 2002-02-07, Elliotte Rusty Harold wrote:
 
 For the sake of argument, let's call the company they work at 
 Microsoft, but this attack could hit most companies with a .com 
 address. Let's say I register microsoft.com, only the fifth letter 
 isn't a lower-case Latin o. It's actually a lower case Greek 
 omicron. I then forge a believable letter from [EMAIL PROTECTED] 
 to [EMAIL PROTECTED] saying Can you please update me on your 
 budget? Bob, noticing that the e-mail appears to come from Alice, 
 whom he knows and trusts, fires off a reply with his confidential 
 information. Only it doesn't go to Alice. It goes to me. I can then 
 reply to Bob, asking for clarification or more details. I can ask 
 him to attach the latest build of his software. I can carry on a 
 conversation in which Bob believes me to be Alice and spills his 
 guts. This is very, very bad.
 
 It isn't Unicode's fault that some letters look like others. That's a 
 fault of history.
 
 -- 
 Michael Everson *** Everson Typography *** http://www.evertype.com
 
 

 Regards,
  Jim Agenbroad ( [EMAIL PROTECTED] )
 It is not true that people stop pursuing their dreams because they
grow old, they grow old because they stop pursuing their dreams. Adapted
from a letter by Gabriel Garcia Marquez.
 The above are purely personal opinions, not necessarily the official
views of any government or any agency of any.
 Addresses: Office: Phone: 202 707-9612; Fax: 202 707-0955; US
mail: I.T.S. Sys.Dev.Gp.4, Library of Congress, 101 Independence Ave. SE, 
Washington, D.C. 20540-9334 U.S.A.
Home: Phone: 301 946-7326; US mail: Box 291, Garrett Park, MD 20896.  





Re: Unicode and Security

2002-02-07 Thread John Hudson

At 11:42 2/7/2002, Elliotte Rusty Harold wrote:

Burglary at the broken window level is alive and well. Therefore there's 
little point to putting locks on doors.

I hope the fallacy of the above is obvious, but when translated into the 
computer security domain it's all too common a rationalization, as this 
thread demonstrates.

I disagree. Suggesting that many of the benefits of the Unicode encoding 
model should be abandoned because they might be abused is like saying 
'Burglary at the broken window level is alive and well. Therefore there's 
little point in possessing anything.' Of course there is a point to putting 
locks on doors, but that is analogous to putting locks on e-mail, not to 
obsessing about one potential security problem in one particular software 
standard. If you were able to fix all the 'flaws' in Unicode, you would a) 
be left with a less useful character encoding standard, b) still be facing 
all the remaining security holes in all the remaining software standards 
and applications, and c) have done nothing to combat user ignorance and 
gullibility just waiting to be taken advantage of.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

... es ist ein unwiederbringliches Bild der Vergangenheit,
das mit jeder Gegenwart zu verschwinden droht, die sich
nicht in ihm gemeint erkannte.

... every image of the past that is not recognized by the
present as one of its own concerns threatens to disappear
irretrievably.
   Walter Benjamin





Re: Unicode and Security

2002-02-07 Thread John Hudson

I think this is probably beginning to get off-topic, but

At 12:45 2/7/2002, Elliotte Rusty Harold wrote:

1. The software industry has already devised mechanisms to protect 
against e-mail forgery, e.g. private-public key encryption.

And nobody uses them because they're too complex.

I think fewer people use them than should, not because they are too complex, 
but because a) not enough people know about them and b) too many of the 
people who know about them believe them to be a lot more complex than they are.

A few messages ago you suggested that some companies might introduce 'no 
Unicode' policies in order to protect against spoofing (despite the fact 
that many alternative character encodings would leave them equally 
vulnerable). I think it is far more likely that companies would introduce 
compulsory e-mail encryption and signing.

John Hudson






Re: Unicode and Security

2002-02-07 Thread John Hudson

At 10:21 2/7/2002, Elliotte Rusty Harold wrote:

I'm not sure Unicode can be fixed at this point. The flaws may be too 
deeply embedded.

What flaws? The fact that glyphs in different scripts may be similar or 
identical in some typefaces, and misrepresentation is possible because 
Unicode separately encodes these glyphs as distinct characters? I'm sorry, 
but that is the nature of writing systems, and Unicode's encoding of these 
characters is inherited from existing standard character sets. Is this a 
flaw? Is this as great a flaw as glyph-based encoding would have been? Is 
it as great a flaw as hampering backwards compatibility with other 
encodings would have been?

In your examples, you seem to ignore two things:

1. The software industry has already devised mechanisms to protect against 
e-mail forgery, e.g. private-public key encryption.

2. What you describe is criminal fraud and there are laws to protect 
against such 'spoofing' and to punish those who perpetrate it.

John Hudson






Re: Unicode and Security

2002-02-07 Thread Barry Caplan

At 02:42 PM 2/7/2002 -0500, Elliotte Rusty Harold wrote:
At 11:34 AM -0800 2/7/02, Asmus Freytag wrote:

But, as the discussion shows, spoofing on the word level (.com
for .gov) is alive and well, and supported by any character set
whatsoever. For that reason, it seems to promise little gain to
try to chase the holy grail of a multilingual character set that
somehow avoids the character level spoofing, if the word level
spoofing can go on unchecked.

Burglary at the broken window level is alive and well. Therefore there's 
little point to putting locks on doors.

I hope the fallacy of the above is obvious, but when translated into the 
computer security domain it's all too common a rationalization, as this 
thread demonstrates.

It is not obvious to me that there is a fallacy at all, let alone what it 
is. Instead of stating that we should be able to infer the fallacy, please 
state it, and a possible solution explicitly.

It seems to me we have already proposed working, and available (if not 
elegant) solutions to the issue of trust of content.

Now the issue seems to be trust of domain names.

My browser already has built-in support for identifying groups of domains I 
can assign varying levels of trust to, based on certificate technology. Not 
elegant, but available.

Similarly, something for email could be done using today's technology.

More importantly, wrt DNS: under what circumstances can you, today, or in 
the future, actually trust that the address resolving information you get 
is accurate? None, really. The packets go too many places on the way that 
could change them. And even if it is accurate, which of course it usually 
is, how can you be sure that packets at a lower level will actually be 
delivered, as intended, and not misdirected or copied elsewhere? You can't, 
really, for the same reason. This is the nature of the system, especially 
at the IP level. None of this has the slightest bit to do with what 
characters are used for domain names, and hence will not go away with any 
changes to DNS. It has everything to do with why data should be encrypted 
if you care about security of data.


There are many ways to socially engineer someone into doing something they 
shouldn't do. This is just one of them, and one that's mostly theoretical 
at the current time. However, we still need to plug the hole.


That there are other, less damaging holes (or even more damaging ones) is 
no excuse for not fixing this one.



The source code for BIND is available. Go ahead and fix it. Good luck 
persuading people to upgrade such a mission-critical part of the Internet, 
though.


Just to pull a number out of a hat, imagine there are 10,000 attacks a day 
using spoofing in the current system. Is this any justification for 
opening up a hole that will add 10,000 more? Of course it's not.


I still don't see the attack as anything but social engineering. That a 
telemarketer or door-to-door salesman can get my credit card info by 
misrepresenting their intent does not mean there is a flaw in either the 
phone numbering scheme, or the credit card system. Your attack is exactly 
analogous.

Barry





Re: Unicode and Security

2002-02-07 Thread John Cowan

Kenneth Whistler wrote:


 The only widely-deployed alternative approach I know of is
 ETSI GSM 03.38 (used in mobile telephony),


A truly bizarre character set:  it supports English, French,
mainland Scandinavian languages, Italian, Spanish with Graves, and
GREEK SHOUTING.

-- 
John Cowan [EMAIL PROTECTED] http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, _LOTR:FOTR_





Re: Unicode and Security: Domain Names

2002-02-07 Thread Tom Gewecke

I note that companies like Verisign already claim to offer domain names
in dozens of languages and scripts.  Apparently these are converted by
something called RACE encoding to ASCII for actual use on the internet.

Does anyone know anything about RACE encoding and its properties?






RE: Unicode and Security: Domain Names

2002-02-07 Thread Addison Phillips [wM]

It is one of the competitors for internationalized domain names. The ACE
stands for ASCII Compatible Encoding.

The encoding which appears likely to gain overall acceptance is called DUDE
and can be found here: http://www.i-d-n.net/draft/draft-ietf-idn-dude-02.txt

There are several ACE encoding demos on the 'Net (Mark Davis has one at
www.macchiato.com, I have one at www.inter-locale.com)

http://www.i-d-n.net is where you can find out about a whole zoo of Unicode
transfer encoding schemes proposed for use in DNS, plus the relevant issues,
of which there turn out to be a number when creating I18n domain names. The
early implementers have mostly ignored these issues and the interplay
between the ultimate standard and existing registrars should be interesting.
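
For a feel of what an ASCII Compatible Encoding does, here is a short Python
sketch. Neither RACE nor DUDE is available in Python's standard library, so
the sketch swaps in the built-in "idna" codec, which uses Punycode (RFC 3492),
the ACE that IDNA later standardized. The domain names are made-up examples.

```python
# Encode an internationalized name into its ASCII-only wire form.
name = "b\u00fccher.example"          # "bücher.example"
ace = name.encode("idna")
print(ace)                            # b'xn--bcher-kva.example'

# The encoding is reversible, so the original name can be recovered.
print(ace.decode("idna") == name)     # True

# A look-alike name encodes to different ASCII than the name it imitates.
spoofed = "c\u03bfrp.example"         # GREEK SMALL LETTER OMICRON for 'o'
print(spoofed.encode("idna") != b"corp.example")  # True
```

The ASCII form of a spoofed name is plainly different from the genuine one,
so the confusion arises only where the name is displayed in its Unicode form.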

Regards,

Addison

Addison P. Phillips
Globalization Architect / Manager, Globalization Engineering
webMethods, Inc.  |  The Business Integration Company
432 Lakeside Drive, Sunnyvale, California, USA
+1 408.962.5487 (phone)  +1 408.210.3569 (mobile)
-
Internationalization is an architecture. It is not a feature.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Tom Gewecke
Sent: Thursday, February 07, 2002 3:20 PM
To: [EMAIL PROTECTED]
Subject: Re: Unicode and Security: Domain Names


I note that companies like Verisign already claim to offer domain names
in dozens of languages and scripts.  Apparently these are converted by
something called RACE encoding to ASCII for actual use on the internet.

Does anyone know anything about RACE encoding and its properties?








Re: Unicode and Security

2002-02-07 Thread Asmus Freytag

At 01:21 PM 2/7/02 -0500, Elliotte Rusty Harold wrote:

I'm not sure Unicode can be fixed at this point. The flaws may be too 
deeply embedded. The real solution may involve waiting until companies and 
people start losing significant amounts of money as a result of the flaws 
in Unicode, and then throwing it away and replacing it with something else.

This sounds nice and dramatic, but misses the point that the kinds of 
issues you highlighted are absolutely common to *all* character sets 
containing Latin and Greek, or Latin and Cyrillic characters, suggesting 
that you are simply grandstanding here, instead of trying to find real 
solutions to your problem.

Earlier, you accused Unicode of being in denial about security issues: It 
is you who is in denial about some underlying realities, among which is 
that there are security issues that cannot be fixed by designing a 
'better' character set. You remind me of the people who keep on designing 
perpetual motion devices, even after the laws of thermodynamics proved the 
futility of such efforts.

If you are interested in advancing security, you would stop going down 
this blind alley and focus your energy on attacking the problems by other 
means. Plenty of suggestions have been made in this space over the last few 
days. Some or all of these should be explored. But if we learned anything 
useful in this exchange, it is that no security scheme should be designed 
so that it is dependent on the character encoding as primary defense 
against spoofing. Doing so would burden the character encoding with a task 
it will never be capable of fulfilling, since it would mean seriously 
compromising support for the tasks for which it was created in the first 
place.

A./






Re: Unicode and Security

2002-02-07 Thread Roozbeh Pournader

On Thu, 7 Feb 2002, Elliotte Rusty Harold wrote:

 Trust is a human question decided by human beings, not a boolean answer
 that comes out of a computer algorithm. I can trust that the message I'm
 replying to came from a person named Barry Caplan even if I have no
 proof of that whatsoever.

Or that the book you're reading has been written by a person named 
Nicolas Bourbaki...

(Sorry, I love the idea. I could not stop myself.)

roozbeh





Re: Unicode and Security

2002-02-07 Thread Barry Caplan

At 04:17 AM 2/8/2002 +0330, Roozbeh Pournader wrote:
On Thu, 7 Feb 2002, Elliotte Rusty Harold wrote:

 Trust is a human question decided by human beings, not a boolean answer
 that comes out of a computer algorithm. I can trust that the message I'm
 replying to came from a person named Barry Caplan even if I have no
 proof of that whatsoever.

Or that the book you're reading has been written by a person named 
Nicolas Bourbaki...

(Sorry, I love the idea. I could not stop myself.)

roozbeh

On what basis can Elliotte know that a message purported to be from 
Barry Caplan actually is from Barry Caplan, or that there even is a 
Barry Caplan? The person writing this, who claims to be Barry Caplan, 
has never met anyone named Elliotte Rusty Harold to the best of his 
recollection. He (Barry Caplan) does claim to personally be acquainted 
with many others on this list though - hi - sorry I missed you in DC! :)

Best Regards,
Barry Caplan
www.i18n.com - coming soon, preview available now
News | Tools | Process for Global Software
Team I18N



Solipsism (was RE: Unicode and Security)

2002-02-07 Thread Rick Cameron



What makes me think you exist, anyway? ;^)

- rick (or so I say)

  
  -Original Message-
  From: Barry Caplan [mailto:[EMAIL PROTECTED]]
  Sent: Thursday, 7 February 2002 17:13
  To: Unicode List
  Subject: Re: Unicode and Security

  At 04:17 AM 2/8/2002 +0330, Roozbeh Pournader wrote:
  On Thu, 7 Feb 2002, Elliotte Rusty Harold wrote:

   Trust is a human question decided by human beings, not a boolean answer
   that comes out of a computer algorithm. I can trust that the message I'm
   replying to came from a person named "Barry Caplan" even if I have no
   proof of that whatsoever.

  Or that the book you're reading has been written by a person named
  "Nicolas Bourbaki"...

  (Sorry, I love the idea. I could not stop myself.)

  roozbeh

  On what basis can "Elliotte" know that a message purported to be from
  "Barry Caplan" actually is from "Barry Caplan", or that there even is a
  "Barry Caplan"? The person writing this, who claims to be "Barry Caplan",
  has never met anyone named "Elliotte Rusty Harold" to the best of his
  recollection. He ("Barry Caplan") does claim to personally be acquainted
  with many others on this list though - hi - sorry I missed you in DC! :)

  Best Regards,
  Barry Caplan
  www.i18n.com - coming soon, preview available now
  News | Tools | Process for Global Software
  Team I18N


Re: Unicode and Security

2002-02-07 Thread Elliotte Rusty Harold

At 4:31 PM -0500 2/7/02, James E. Agenbroad wrote:
  Thursday, February 7, 2002
Would making the about to be misled respondent type the address of the
intended person (with a roman 'o', not a greek omicron) and then having
the system see if they match detect and thwart such tricks?  The
respondent is already typing so it's not a large extra burden.

Yes, that would probably work, though users would complain. Having 
the outgoing SMTP server drop all messages addressed to spoofs of the 
corporate domain works too on an enterprise level. And using message 
authentication based on public-key certification works too.
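
The server-side filter could look roughly like the following Python sketch.
Everything named here is an assumption for illustration: the look-alike table
is deliberately tiny and far from complete, corp.example stands in for the
corporate domain, and a real deployment would need a much fuller table.

```python
# Hypothetical, deliberately tiny table of look-alike letters.
CONFUSABLES = {
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
}

CORPORATE_DOMAIN = "corp.example"   # assumed corporate domain


def skeleton(domain: str) -> str:
    """Map every known look-alike letter to its Latin twin."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in domain.casefold())


def should_drop(recipient: str) -> bool:
    """Drop mail whose domain imitates, but is not, the corporate domain."""
    domain = recipient.rpartition("@")[2].casefold()
    return domain != CORPORATE_DOMAIN and skeleton(domain) == CORPORATE_DOMAIN


print(should_drop("alice@c\u03bfrp.example"))  # True: omicron spoof
print(should_drop("alice@corp.example"))       # False: the real domain
print(should_drop("bob@other.example"))        # False: unrelated domain
```

This protects only the one domain the server knows about, which fits the
enterprise-level scope mentioned above.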

The problem is that all of these or any other client-based solution 
you come up with is only going to be implemented in some clients. 
Many, and at least initially most, users are not going to have any 
such protections. This needs to be cut off at the protocol level. It 
is far better to prevent the spoofed messages from being sent in the 
first place than to offer clients tools to stop them once they're 
free in the ether.

The maintainers of the Net and the Web at all levels from local sys 
admins to ISPs to spec implementers to spec writers to router vendors 
are rushing from hole to hole, trying to plug them faster than the 
script kiddies can exploit them. Even Microsoft is starting to 
recognize their culpability for producing an insecure infrastructure. 
This is a result of years of Internet development in all layers from 
the physical hardware on up to the browser without a real 
understanding of security.

For past protocols like HTTP and URLs, we can plead ignorance and 
lack of imagination. We never knew how bad things were going to get. 
Now we do. We no longer have any excuses for knowingly designing 
systems that are open to spoofing, denial of service, or outright 
system cracking. Mistakes will of course continue to be made, but we 
have to try to make as few as possible and fix the problems where we 
can as soon as we can. There are legacy problems in HTTP, DNS, URLs, 
and many other systems; but when we're designing something truly new 
like internationalized domain names it only makes sense to avoid 
these known problems.




Re: Unicode and Security

2002-02-07 Thread Elliotte Rusty Harold

At 5:12 PM -0800 2/7/02, Barry Caplan wrote:

On what basis can Elliotte know that a message purported to be from 
Barry Caplan actually is from Barry Caplan, or that there even is 
a Barry Caplan? The person writing this, who claims to be Barry 
Caplan, has never met anyone named Elliotte Rusty Harold to the 
best of his recollection. He (Barry Caplan) does claim to 
personally be acquainted with many others on this list though - hi - 
sorry I missed you in DC! :)


My point is exactly that I have no knowledge of this, but trust is 
not about knowledge. Trust is a decision made in the human brain on a 
not necessarily rational basis. In a rational world, trust would only 
be given to statements with some level of proof. We do not live in 
this rational world. In practice untrustworthy entities will be 
trusted both as to identity and other statements. Our system should 
be robust in the face of this.




Re: Unicode and Security

2002-02-07 Thread Curtis Clark

At 10:21 AM 2/7/02, Elliotte Rusty Harold wrote:
I don't like that solution, but not liking it doesn't mean it ain't gonna 
happen as soon as Exxon loses a few billion dollars because somebody 
spoofed them and thereby gained access to their bidding plans for oil leases.

Enron lost a few billion dollars, and iirc Unicode was not involved.


-- 
Curtis Clark  http://www.csupomona.edu/~jcclark/
Mockingbird Font Works  http://www.mockfont.com/






Re: Unicode and Security

2002-02-07 Thread Kenneth Whistler

Elliotte Rusty Harold wrote:

 For past protocols like HTTP and URLs, we can plead ignorance and 
 lack of imagination. We never knew how bad things were going to get. 
 Now we do. We no longer have any excuses for knowingly designing 
 systems that are open to spoofing, denial of service, or outright 
 system cracking. Mistakes will of course continue to be made, but we 
 have to try to make as few as possible and fix the problems where we 
 can as soon as we can. There are legacy problems in HTTP, DNS, URLs, 
 and many other systems; but when we're designing something truly new 
 like internationalized domain names it only makes sense to avoid 
 these known problems.

And I'm with you all the way to this point. Where we part company,
I think, is at the implied "and so..."

If the basic requirements are that we find a way (for IDN) to
present meaningful strings to end users (note, not any natural
language phrase, but just a suitably contained, meaningful
subset thereof that users can live with) and then find a foolproof
way to map that to IP numbers, *and* that those meaningful
strings be truly internationalized and not just the current
restricted subset of ASCII, then we have a problem.

Either you have to more or less completely ignore the structure
and integrity of writing systems, and try to constrain down the
problem to a totally etic, psychological perception-based notion
of no visual confusion allowed in visible symbols to be
represented in strings, anywhere, anytime.

Or you have to admit that internationalizing the strings even
just the teensiest bit (e.g. allowing Cyrillic in the door along
with ASCII, or for that matter just allowing in accented Latin
letters along with ASCII) is going to increase the confusability
level in visible symbols used in strings.

The reductio ad absurdum of the first position is that allowing
even a single additional character in domain names, no matter
how distinct or innocuous, incrementally increases the opportunity
for confusion, spoofing, or other monkey business over the
current situation. So if we no longer have any excuses to
do anything that might knowingly increase the opportunity for
security holes, then logically, we should just shut down the
whole IDN effort and proclaim to the world, "Let them eat ASCII!"

Heck, it doesn't even have to be close to visual confusability to
cause a problem. What if IDN allowed just two Han characters
in, and nothing else, and those Han characters were for nihon
(Japanese for Japan). Then somebody could register Microsoftnihon.com
and never mind the naive user -- the knowledgeable, biliterate
English/Japanese user could be spoofed into thinking that was
Microsoft's Japan division, instead of Trojans 'R Us.

I think that rather than coming to the Unicode list to
proclaim "Unicode is a security risk! The sky is falling!",
the better way to conceive this is that globalization of the
IT infrastructure of the world is a difficult business that
presents many new possibilities for security risks if
internationalization of existing protocols and the handling
of textual data from around the world is not done carefully.

If the customers of the Internet are demanding that it be
internationalized better than it currently is (and I believe
they are), and if part of that internationalization is responding
to demands that Japan be able to have Japanese domain names,
China have Chinese domain names, etc., as I believe it is, then
we just have to come to grips with the complexity of
text handling that that implies. And in turn that means
that just as years ago system programmers learned to their
chagrin that their systems broke because they had been
doing casemapping with c -= 0x20 assignments, so Internet
protocol developers are going to have to learn that their
security is broken if it depends on the structure and
constraints of ASCII, or on the use of small glyph sets where
all the glyphs are visually distinct from each other.
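
To make the casemapping point concrete, here is a minimal Python sketch. It is
not from the original message, and the function name naive_upper is
hypothetical. It applies the c -= 0x20 trick to a few characters and compares
the result with Python's Unicode-aware str.upper().

```python
def naive_upper(ch):
    """Uppercase by subtracting 0x20, as ASCII-only code used to do."""
    return chr(ord(ch) - 0x20)

# a, e-acute, y-diaeresis, sharp s, dotless i
samples = ["a", "\u00e9", "\u00ff", "\u00df", "\u0131"]

for ch in samples:
    naive = naive_upper(ch)
    proper = ch.upper()  # Unicode-aware uppercasing
    status = "ok" if naive == proper else "WRONG"
    print(f"U+{ord(ch):04X}: naive=U+{ord(naive):04X} proper={ascii(proper)} {status}")
```

The trick happens to work for 'a' and for U+00E9, but it turns U+00FF into
U+00DF (sharp s) and sharp s into U+00BF (inverted question mark), whereas the
proper uppercase of sharp s is the two-letter string "SS".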

--Ken




Re: [idn] RE: Unicode and Security

2002-02-07 Thread DougEwell2

[EMAIL PROTECTED] observed:

 Analogously, people will keep opening executable attachments promising sex,
 regardless of whether the 's', 'e', and 'x' are Latin letters or not.

They're not, of course:  U+0455 U+0435 U+0445
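
A quick way to check those three code points, sketched with Python's standard
unicodedata module (the variable names are only for illustration):

```python
import unicodedata

spoofed = "\u0455\u0435\u0445"  # the three code points cited above
latin = "sex"

# Print each look-alike next to the Latin letter it imitates.
for fake, real in zip(spoofed, latin):
    print(f"U+{ord(fake):04X} {unicodedata.name(fake)}  <->  "
          f"U+{ord(real):04X} {unicodedata.name(real)}")

print(spoofed == latin)  # False: different characters, whatever the font shows
```

All three are Cyrillic small letters (DZE, IE and HA), so the string compares
unequal to the Latin word even when a font draws them identically.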

-Doug Ewell
 Fullerton, California
 (address will soon change to dewell at adelphia dot net)




Re: Unicode and Security

2002-02-06 Thread John H. Jenkins

On Wednesday, February 6, 2002, at 11:12 AM, Lars Kristan wrote:

 Maybe digitally signed messages and bank accounts are not that good of an
 example, since people would be more careful there. Another case where this
 may get exploited will be domain names, once Unicode is allowed there. While
 www.example.com may be a company I trust, www.example.com with a Cyrillic
 'a' in it may be a hacker (and no, I did not imply he/she would be from a
 country that uses Cyrillic) trying to get me to visit the site.


Right, but right now is that people are typing things like www.whitehouse.
com instead of www.whitehouse.gov (or, for that matter, www.unicode.com).  
How likely is it that someone will accidentally type www.s$B'Q(Bmple.com 
instead of www.sample.com?

The original focus was on digital signatures, and I still don't get the 
objection.  Because I don't know *precisely* what bytes Microsoft Word or 
Adobe Acrobat use, do I refuse to sign documents they create?  Is that the 
idea?  I mean, good heavens, I don't even know *precisely* what bytes
Mail.app is going to use for this email.  Should I refuse to sign it?

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/


RE: Unicode and Security

2002-02-06 Thread Lars Kristan

Well, I was tempted to join the discussion for a while now, but one of the
things that stopped me was that I didn't quite understand why it was so
focused on the bidi stuff.

To make a certain portion of the text look like something else should be
easier than that. OK, invisible non-spacing glyphs would be just one more
method, I guess. I was thinking of replacing some characters with their
look-alikes (probably even rendered from the same data in a font), like
using U+0430 instead of U+0061 (Cyrillic 'a' instead of Latin 'a').

Maybe digitally signed messages and bank accounts are not that good of an
example, since people would be more careful there. Another case where this
may get exploited will be domain names, once Unicode is allowed there. While
www.example.com may be a company I trust, www.example.com with a Cyrillic
'a' in it may be a hacker (and no, I did not imply he/she would be from a
country that uses Cyrillic) trying to get me to visit the site.

Yes, it's a fraud. And I want to thank John for pointing that out. But we're
making it a hell of a lot easier now. In ASCII, all one could try was
www.examp1e.com and a couple of other tricks, but it was maybe 10 tricks in
ASCII, some more in case of Latin 1. How many are there with Unicode? Uh,
a million?

Well, nothing wrong with Unicode of course. Just means that there will need
to be an option in your browser to reject any site without a digital
certificate, and perhaps it will need to be turned on by default. So, there
are ways to fight this (and I am afraid relying on police will not do it),
but maybe these things should be well in place before someone gets a chance
to exploit the new ways.


Just a thought.


Regards,

Lars


 -Original Message-
 From: John Hudson [mailto:[EMAIL PROTECTED]]
 Sent: Wednesday, February 06, 2002 01:54
 To: Unicode List
 Subject: Re: Unicode and Security
 
 
 At 09:39 2/5/2002, John H. Jenkins wrote:
 
 Y'know, I must confess to not following this thread at all.  Yes, it is
 impossible to tell from the glyphs on the screen what sequence of Unicode
 characters was used to generate them.  Just *how*, exactly, is this a
 security problem?
 
 I was wondering the same thing.
 
 I can make an OpenType font that uses contextual substitution to
 replace the phrase 'The licensee also agrees to pay the type designer
 $10,000 every time he uses the lowercase e' with a series of invisible
 non-spacing glyphs. Of course, the backing store will contain my dastardly
 hidden clause and that is the text the unwitting victim will electronically
 sign. Hahahaha, he laughed maniacally!
 
 This has nothing to do with encoding, does not rely on difficult and
 totally improbable manipulation of a bidirectional algorithm and, most
 relevantly, is *not* a security problem in the OpenType font specification.
 It is an example of fraud. I suppose if there was a software solution to
 all such dangers, we wouldn't need police, felony charges, the court
 system, prisons, or any of the other things we rely on to protect honest
 people against dishonest.
 
 John Hudson
 
 Tiro Typeworks    www.tiro.com
 Vancouver, BC     [EMAIL PROTECTED]
 
 ... es ist ein unwiederbringliches Bild der Vergangenheit,
 das mit jeder Gegenwart zu verschwinden droht, die sich
 nicht in ihm gemeint erkannte.
 
 ... every image of the past that is not recognized by the
 present as one of its own concerns threatens to disappear
 irretrievably.
Walter Benjamin
 




RE: Unicode and Security

2002-02-06 Thread Yves Arrouye

 Well, nothing wrong with Unicode of course. Just means that there will need
 to be an option in your browser to reject any site without a digital
 certificate, and perhaps it will need to be turned on by default. So,

Nothing prevents sites running frauds from getting a certificate matching
their name. If the price of certificates drops, or if the fraud has good
enough margins, it will not even be a big inconvenience.

YA





Re: Unicode and Security

2002-02-06 Thread David Starner

On Wed, Feb 06, 2002 at 07:12:19PM +0100, Lars Kristan wrote:
 Well, I was tempted to join the discussion for a while now, but one of the
 things that stopped me was that I didn't quite understand why it was so
 focused on the bidi stuff.

Because it can have a dramatic effect, whereas changing look-alikes has
no effect on the displayed text.
 
 Yes, it's a fraud. And I want to thank John for pointing that out. But we're
 making it a hell of a lot easier now. In ASCII, all one could try was
 www.examp1e.com and a couple of other tricks, but it was maybe 10 tricks in
 ASCII, some more in case of Latin 1. How many are there with Unicode? Uh,
 a million?

How often does it matter? I can see registrars not registering stuff that
was obviously an attempt to defraud, but you won't get there if you type
it in yourself. It's easier for someone to set up a forged Microsoft
link, but it's easy to check that. Rather than everyone being digitally
signed, just checking if it's multiscript and popping up a warning will
catch most of the cases. You could color-code the major scripts with
confusables . . .
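
A rough sketch of such a multiscript check in Python. Everything here is an
assumption made for illustration: the function names are hypothetical, and
since the standard library does not expose the Unicode Script property, the
first word of each character name ("LATIN", "CYRILLIC", "GREEK", ...) stands
in for the script.

```python
import unicodedata

def scripts_in(label):
    """Approximate the set of scripts used by the letters in a label."""
    found = set()
    for ch in label:
        if ch.isalpha():
            # First word of the character name as a stand-in for its script.
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found

def is_mixed_script(label):
    """True if the letters in the label appear to come from several scripts."""
    return len(scripts_in(label)) > 1

print(is_mixed_script("example"))        # False: all Latin
print(is_mixed_script("ex\u0430mple"))   # True: Latin plus a Cyrillic 'a'
print(is_mixed_script("\u0455\u0435\u0445"))  # False: entirely Cyrillic
```

As the last line shows, this catches mixtures only; a label written entirely
in look-alike letters of a single script passes the check.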
 
-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing 
with the youth. -- Information Society, Peace and Love, Inc.




Re: Unicode and Security

2002-02-06 Thread Barry Caplan

At 11:54 AM 2/6/2002 -0700, John H. Jenkins wrote:
The original focus was on digital signatures, and I still don't get the 
objection.  Because I don't know *precisely* what bytes Microsoft Word or 
Adobe Acrobat use, do I refuse to sign documents they create?  Is that the 
idea?  I mean, good heavens, I don't even know *precisely* what bytes
Mail.app is going to use for this email.  Should I refuse to sign it?


I don't think the main issue is whether or not you should sign it. I think 
the main issue the original poster tried to raise is that, as the recipient 
of such a signed document, he is not persuaded he should trust it.

This is a serious issue, although as several have noted, not a Unicode-only 
one. No one doubts the security of the encryption algorithms used for 
signing. But the issue of trust is critical.

In the analog world, people are expected to read and understand documents, and 
in general, the world's legal systems are set up to recognize that a 
signature (or stamp or seal or whatever) is binding evidence that such care 
was taken (even if it wasn't really taken). In the digital world, 
individual behavior and legal processes both may not be so well formed to 
support the technology of digital signatures. I believe this is what the 
original point was.

IANAL, but enforceability of such a kluged, digitally-signed document seems 
in doubt. There is a long history of that type of contract support in our 
US legal systems, and probably others as well. There will surely be 
difficulties adapting it to the digital domain, but I think the basis for 
support is already there.

Anyway, it is not well known, but maybe should be, that the purpose of 
digital signatures is to verify who the sender is, and to verify that the 
document has not been changed in transit. That it might contain tricky 
language or information is an important thing to note, but the reader still 
needs to approach the document's contents with the same skeptical eye as if 
it were not printed. Just as the Unicode bi-di algorithm makes no claims at 
reversibility, digital signing algorithms make no claim that the signed 
contents are correct, or even useful.
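
The point that a signature covers bytes, not meaning or appearance, can be
sketched in Python. This is an illustration under stated assumptions: an HMAC
with a made-up shared key stands in for a real public-key signature, and the
key, function name and sample strings are all hypothetical.

```python
import hashlib
import hmac

KEY = b"demo-key"  # hypothetical shared secret, for illustration only

def sign(text):
    """Return an authentication tag over the UTF-8 bytes of the text."""
    return hmac.new(KEY, text.encode("utf-8"), hashlib.sha256).hexdigest()

genuine = "pay example.com"
lookalike = "pay ex\u0430mple.com"  # Cyrillic 'a' in place of the Latin one

tag = sign(genuine)
print(hmac.compare_digest(tag, sign(genuine)))    # True: bytes unchanged
print(hmac.compare_digest(tag, sign(lookalike)))  # False: bytes differ
```

The check detects any change to the byte stream, including a swapped
look-alike letter, but it says nothing about whether the signed text is
honest, readable, or what the signer believed it to be.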







Re: Unicode and Security

2002-02-05 Thread David Starner

On Tue, Feb 05, 2002 at 01:27:49PM +0900, Gaspar Sinai wrote:
 Talking about characters: I think  bi-di should not be in
 Unicode  Standard because it is not a character.
 It is an algorithm.

Why would that fix the problem? Then everyone would just choose their
own algorithm, and instead of a couple different renderings, with the
ability to check it against the standard, you'd get a thousand, each
equally correct.

 I feel sorry for interrupting in the Let's praise and
 celebrate Unicode mood of this mailing list.

Head over to the POSIX list and start complaining about the maldesign of
fixed width buffers and see how long they listen to you. This is the
Unicode list - that means people here are interested in working in
Unicode. The BIDI algorithm is frozen - seriously changing it would
break way too many implementations to be considered. (Note that gets -
so broken that the GNU linker will complain if you use it - is a standard
part of a POSIX system. There's no evidence that the BIDI algorithm is
anywhere near that broken.)

 I wish there was another world character standard besides
 Unicode and not only  half-hearted attempts like bytext.

Unicode has its problems, but it works. It takes a lot of work
to create a character standard, and it's hard to find a bunch of people
to work on a project to go against the industry leader without serious
problems in that leader. Anyone on this list could produce a better
Unicode than Unicode, just like any Unix person could produce a better
Unix than Unix. But it's not going to be enough better that it's worth
losing backward compatibility, and any serious changes will never get
consensus. So a standard is entrenched. (Cf. Fortran, Unix, ASCII).

The result is that you get the bizarre ideas of individuals, like Bytext
and Rosetta, never really fully fleshed out or implemented, and the
Japanese-centric universal charsets like Tron and ISO-2022-INT-1.
(I've heard rumors of other cultures producing universal charsets that
fix Unicode's bugs for their language only. I'm not familiar with
them, though.) 

The first are too quirky to be useful.  (Bytext's author compared it
to lambda calculus and Unicode to arithmetic.  In some ways, it's an
accurate comparison; while Church numerals are interesting, every real
system directly supports arithmetic on binary numbers, as that's much
more efficient and simple.) 

The latter don't support non-Japanese scripts as well as Unicode, and
don't sell well to non-Japanese audiences. ISO-2022-INT-1 supports 7
94x94 character charsets for CJK audiences (roughly 60,000 characters
before any sort of unification), and ISO-8859-1 and ISO-8859-7*, leaving
the Russians, the Hungarians, the Arabs and many more out in the cold.
To the best of my knowledge, there's not enough information available to
the non-Japanese speaker to implement Tron. (Not only is information
available about ISO-10646-1/Unicode in more languages, English is also
more generally known than Japanese.) Again to the best of my knowledge,
there have been no improvements to non-CJK sections of Tron (besides Braille)
since Unicode 2.0, whereas Unicode has been continually updated to keep
up - Unicode 3.2 handles more archaic documents, more languages and more
scripts than ever before, as well as better linguistic and mathematical
support. 

In all honesty, I only care about the CJK parts of Unicode in that they
convince people to implement Unicode so I can play with the Latin,
Greek, Cherokee, IPA and Mathematics sections. Encoding 50,000 more Han
ideographs produces a lot less interest in me than encoding Gothic. A
lot of the audience is the same - "who cares about ancient Greek? Will
it handle the Dhammapada in Pali without error?" It appears the serious
attempts to topple Unicode - Tron, for example - forgot that, and looked
to their own issues, leaving Unicode to be the only real attempt to
serve the needs of everyone and hence victor.

* It seems there's a discrepancy in what ISO-2022-INT-1 encodes. Another
source adds ISO-8859-2 and ISO-8859-5, still leaving the Arabs, the
residents of the Baltic states, Hindi and a lot of the rest of the world
out.

-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing 
with the youth. -- Information Society, Peace and Love, Inc.




Re: Unicode and Security

2002-02-05 Thread Michael Everson

At 13:27 +0900 2002-02-05, Gaspar Sinai wrote:

Just because some companies who have influence on Unicode
Consortium use some algorithm, like backing store and re-mapping,
it does  not mean that this is the only way.  And I don't even
think they  do in cases when character conversion is necessary.

Backing store and remapping are fundamental principles of Unicode. 
They are implemented by people who want to implement the Unicode 
standard.

For me it is very important what a naive user sees on the screen.

For me, too.

Yudit does convert the input to view order and back. Text
direction and  end of line is clearly indicated. [...]

If the standard wants me to confuse the user, I would rather dump the
standard than comply.

I haven't been able to follow how I, the user, am confused by the 
Unicode Standard. It sounds to me as though you want a Show 
Invisibles option to disassemble Hebrew or Arabic text and display 
them in LTR order without any ligation so that the user can see what 
is in the backing store. That's a valid thing to want to do, but it's 
a special case of rendering, which has little to do with the 
algorithm.

I wish there was another world character standard besides
Unicode and not only  half-hearted attempts like bytext.
Talking about characters: I think  bi-di should not be in
Unicode  Standard because it is not a character.
It is an algorithm.

Yes, it is. The Unicode Standard does not just encode characters. It 
also provides tools for implementation.

I feel sorry for interrupting in the Let's praise and
celebrate Unicode mood of this mailing list.

We like Unicode. We work to make it better. Sometimes people come to 
us with problems that aren't problems, or raise issues that have been 
dealt with many times before. Sometimes people bring us real problems 
that need real solutions. We're an intelligent bunch, methinks, and 
we can tell the difference. Unicode may have warts, but it's a lot 
better than ISO 2022.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Unicode and Security

2002-02-05 Thread Otto Stolz

Gaspar Sinai has a valid point insofar as there is a possible
ambiguity in bidi text. However, he is absolutely wrong in
blaming the Unicode bidi algorithm for this problem.

Gaspar Sinai had written:
  change products or to change the standard and use
  a reversible bidi.

and later:
  Hold on there! You admit that unicode algorithm is *really*
  not reversible?

He completely failed to acknowledge the fact that the bidi rendering
process is intrinsically not reversible, in the general case. And he
did not mention John Cowan's (IIRC) simple example illustrating this
fact. In other words, he is arguing from a wrong premise, so his
conclusions are not sound.

If he will not get his basic facts right, this whole discussion is
indeed surreal, and mostly a waste of time.

Gaspar Sinai wrote:

 Just because some companies who have influence on Unicode
 Consortium use some algorithm, like backing store and re-mapping,
 it does  not mean that this is the only way. [...]
 Yudit does convert the input to view order and back.


Now, this reveals the real problem.

From this description, I gather that Gaspar's editor does not
preserve the backing store, hence it has to reconstruct it from
the rendering. As the rendering process is an n-to-1 mapping, its
reverse is, intrinsically, ambiguous. So, the attempt to reconstruct
the original character sequence from the visual appearance
is bound to fail, in the general case.

Now Gaspar asks everybody else to comply with his own approach,
and does not even see that this approach will not work!

 Text direction and end of line is clearly indicated. The Unicode
 values of the characters in the cluster under the cursor are
 clearly indicated.


These are good features to have in a decent editor; but they are
entirely unrelated to the perceived problem. They can easily be
implemented in an editor that keeps the backing store.

 In  all cases what you view be converted back to
 the *same*  bitstream  - except for illegal  encoded text but that
 leaves clearly visible traces in the screen, as it should.


Fine. And a lot easier to attain, if the original bitstream is
not discarded, in the first place ;-)


 If the standard wants me to confuse the user, I would rather dump the
 standard than comply.


That is certainly not the standard's aim. Rather the bidi part of the
standard wants to describe established practice for bidi writing.


 I updated:
 http://www.yudit.org/security/

It would be honest to describe the facts, as they are in reality,
and not overstate, or even falsify, them in order to drive a point
home. E. g.:

  Unicode Bidirectional Algorithm is non-reversable.

Rather:
   Bidi text may be ambiguous, if you cannot determine where to start
   reading. E. g.
      the arabs = SBARA EHT
   (where uppercase represents the Arabic equivalent, written right to left)
   can be read from either side. Nested levels of RTL, and LTR, clauses may
   render the interpretation of bidi text even more problematic.

   The ambiguity is normally resolved in one of two ways:
   - The starting direction is determined from the context, e. g. you
     would start reading the preceding example from the left, as it is
     embedded in an English (i. e. LTR) paragraph; you would start reading
     this very same line from the other side if it were embedded in an
     Arabic (i. e. RTL) paragraph.
   - Embedded levels are usually delimited by quotes, or other
     contextual hints.

  That means that if text converted back from display order we can not get
  back the same text.

Rather:
   ... we will not get back the same text, in every single case.

  Imagine somone signing a digital unicode document. He is looking at
  his viewer but what he signs is the bitstream.

He is probably signing a document that he has entered himself.
Where could the ambiguity come from, if he has not deliberately
introduced it himself?

  At yudit.org we advise you: please never sign digitally a Unicode
  document - or sign it knowing your own risk.

Rather:
   Make sure what you sign, in particular regarding bidi documents.
   If you want to sign the clauses you entered, in logical order, then
   sign your e-mail (or other Unicode text); if you want to sign the
   rendering, then apply your signature to an image, or pdf, file. In
   both cases, try to express your points (particularly the nesting of
   clauses written in opposite directions) as unambiguously as possible.

Btw.: Decent software should make clear and obvious to the user
what he is really signing.

Best wishes,
   Otto Stolz





Re: Unicode and Security

2002-02-05 Thread John H. Jenkins

Y'know, I must confess to not following this thread at all.  Yes, it is 
impossible to tell from the glyphs on the screen what sequence of Unicode 
characters  was used to generate them.  Just *how*, exactly, is this a 
security problem?

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/





RE: Unicode and Security

2002-02-05 Thread Jonathan Rosenne

 
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

I don't see why bidi is being picked on. Unicode rendering is not
reversible in Latin either - from the rendering you cannot and should not
be able to tell whether a character was decomposed or precomposed.

Looking at some text, you would not be able to tell whether there are
or aren't trailing spaces. This goes for good old ASCII too. 
   

What's the point? What has it to do with security?

Jony

-BEGIN PGP SIGNATURE-
Version: PGPfreeware 7.0.3 for non-commercial use http://www.pgp.com

iQA/AwUBPGBFxRV5/en3UelbEQKZtwCfWhhpuyS8Jf35/FCJltIpiNW3iTEAnRYW
lagDbQlCy5wSd5rmvGOfCGfb
=SZkL
-END PGP SIGNATURE-
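
Both observations in the message above are easy to reproduce. A minimal Python
sketch using the standard unicodedata module (the sample strings are chosen
only for illustration):

```python
import unicodedata

precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)           # False
print(len(precomposed), len(decomposed))   # 4 5
# Normalizing to NFC maps the decomposed form onto the precomposed one.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

plain = "signed text"
padded = "signed text  "    # same appearance, two trailing spaces
print(plain == padded, plain == padded.rstrip())  # False True
```

Both pairs would normally look the same on screen while differing in the
backing store, and the second pair is plain ASCII.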





Re: Unicode and Security

2002-02-04 Thread John Cowan

Gaspar Sinai scripsit:

 So common language is screenshots... Ok. I updated the page.

Thank you.

 Now the  exact same file is viewed with two different viewers
 at the bottom of this page:
 
   http://www.yudit.org/security/

Outlook Express, at least the version you are using, has a bug;
it is failing to set the overall directionality to RTL even
though the first character is strongly RTL.  The fact that
some implementations are buggy is hardly an argument against
either the use of bidi or Unicode.

Furthermore, the example is perverse: you are providing
a sentence that looks like it's meant to be read as
English, but in Arabic reading order.

 I maintain my view that if there is no proven
 reversible logical-to-viewed/viewed-to-logical
 electronic signatures should be avoided.

There is nothing to be done about the fact that
the visual appearance

the Arabs = BARA-LA
Islam = MALSI-LA

(using caps for RTL as usual) can be read as either
Arabic-to-English or English-to-Arabic, depending on
the larger context.  If you saw it written down in
isolation, you wouldn't know which way to read it
either: nothing in the mere appearance of such a text
tells you whether it is basically in English or Arabic.
Therefore, this appearance can have either of two
encodings:

AL-ARAB = the Arabs\nAL-ISLAM = Islam
the Arabs = AL-ARAB\nIslam = AL-ISLAM
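
The two-encodings point can be sketched with a toy model in Python. This is
deliberately NOT the Unicode bidirectional algorithm: the function name is
hypothetical, runs and their directions are supplied by hand, capitals stand
for RTL text as in the example above, and neutral runs are simply given the
paragraph direction.

```python
def render(runs, base):
    """Toy display-order model, not the real UAX #9 algorithm.

    runs: list of (text, direction) in logical (typed) order,
          where direction is "L" or "R".
    base: paragraph direction, "L" or "R".
    Returns the characters in left-to-right screen order.
    """
    # Characters of an RTL run appear mirrored on screen.
    shown = [text[::-1] if direction == "R" else text
             for text, direction in runs]
    if base == "R":
        shown.reverse()  # an RTL paragraph lays out its runs from the right
    return "".join(shown)

arabic_first = [("AL-ARAB", "R"), (" = ", "R"), ("the Arabs", "L")]
english_first = [("the Arabs", "L"), (" = ", "L"), ("AL-ARAB", "R")]

screen_a = render(arabic_first, base="R")
screen_b = render(english_first, base="L")
print(screen_a)               # the Arabs = BARA-LA
print(screen_b)               # the Arabs = BARA-LA
print(screen_a == screen_b)   # True: two logical texts, one appearance
```

Because two different inputs produce the same screen line, no function from
screen order back to logical order can recover both, which is the sense in
which the mapping is not reversible.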

-- 
John Cowan   http://www.ccil.org/~cowan  [EMAIL PROTECTED]
To say that Bilbo's breath was taken away is no description at all.  There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
--_The Hobbit_




Re: Unicode and Security

2002-02-04 Thread Bob_Hallissy


On 04-02-2002 11:15:25 John Cowan wrote:

Outlook Express, at least the version you are using, has a bug;
it is failing to set the overall directionality to RTL even
though the first character is strongly RTL.  The fact that
some implementations are buggy is hardly an argument against
either the use of bidi or Unicode.

Of course the bidi algorithm permits using a higher-level protocol to set
the paragraph direction (see note under rule P3, TUS 3.0 page 61). Thus one
could argue that this isn't necessarily a bug in Outlook Express -- at
least it isn't a violation of the standard.
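
The default rule under discussion (paragraph direction taken from the first
strong character, unless a higher-level protocol says otherwise) can be
sketched in Python. The function name and the override parameter are
hypothetical; unicodedata.bidirectional() is the standard-library call that
returns a character's bidirectional class.

```python
import unicodedata

def paragraph_direction(text, override=None):
    """Return "LTR" or "RTL" for a paragraph.

    override models a higher-level protocol that sets the direction
    explicitly; without it, the first strong character decides.
    """
    if override in ("LTR", "RTL"):
        return override
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return "LTR"  # no strong character found: default to left-to-right

sample = "\u0627\u0644\u0639\u0631\u0628 = the Arabs"  # starts with Arabic letters
print(paragraph_direction(sample))                  # RTL
print(paragraph_direction(sample, override="LTR"))  # LTR
print(paragraph_direction("the Arabs"))             # LTR
```

Under this reading, a mail client that forces LTR is behaving like the second
call: it differs from the default, but by way of an override rather than by
misapplying the first-strong-character rule.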

Bob





Re: Unicode and Security

2002-02-04 Thread Gaspar Sinai

On Mon, 4 Feb 2002, John Cowan wrote:
 Gaspar Sinai scripsit:
  Now the  exact same file is viewed with two different viewers
  at the bottom of this page:
 
http://www.yudit.org/security/

 Outlook Express, at least the version you are using, has a bug;
 it is failing to set the overall directionality to RTL even
 though the first character is strongly RTL.  The fact that
 some implementations are buggy is hardly an argument against
 either the use of bidi or Unicode.

I am sorry but someone on this list has just said:
+
|The bidi algorithm is anything but vague. Any
|implementation can be rigorously tested against two
|reference implementations, to ensure fully compatible
|implementation.
+
So does this mean that Microsoft does not rigorously
test their products? Or does this mean the test is
wrong? Or maybe the algorithm is vague?

I expect at least one yes answer here.

Come on guys this is only *one* example. And it
happened in MS outlook too. (No more screenshots please
none of my friends use that product any more).

I am ready to regularly publish bad renderings of
the *buggy*  implementations of the non-vague unicode
BIDI (or the non-buggy implementations of the *vague*
BIDI - take your choice).

I wonder which costs more: to regularly patch and
change products or to change the standard and use
a reversible bidi.

It may take some time to find the bug - but the bug
will be there...

Cheers
gaspar







Re: Unicode and Security

2002-02-04 Thread Rick McGowan

Gaspar Sinai...

Pursuing this kind of trivia hunt for bugs in an environment employing  
Unicode is not any different than pursuing the same kind of bugs in any  
other environment.

It is within the purview of the security community to find such bugs  
before hackers find them.

But those bugs are not character set bugs, they are software bugs!

 I wonder which costs more: to regularly patch and
 change products or to change the standard and use
 a reversible bidi.

Oh come now... That sounds like your real agenda -- you must have an  
algorithm that you like better, and apparently you think that if  
implemented in software it would be less bug prone. Well, changing the  
Unicode bidi algorithm to use a reversible bidi still isn't going to solve  
the problem that all software has bugs!

So if you find a bug in the bidi algorithm or the reference  
implementations, please let people know. It would be helpful. But at this  
point, changing it to be reversible isn't an option.

Rick









Re: Unicode and Security

2002-02-04 Thread Moe Elzubeir




Hello,

Before you call this thread a waste of time, and out of curiosity: what
were the considerations put forth which determined the way the bidi
algorithm is (UAX #9)? I.e., what were the pros and cons of a reversible
bidi?

Also, who make up the 'bidi community'? The users or the developer(s)
of the bidi algorithm?

Thank you
--
Mohammed Elzubeir

"Mark Davis" [EMAIL PROTECTED] 02/04/02 10:13AM

 Outlook Express, at least the version you are using, has a bug;

The BIDI algorithm is not reversible, and could not be made reversible
without eliminating features that are important to the bidi community.
This was considered at the time the bidi algorithm was developed.

This thread is a waste of time.

Mark


Re: Unicode and Security

2002-02-04 Thread Gaspar Sinai


On Mon, 4 Feb 2002, Mark Davis wrote:
  Outlook Express, at least the version you are using, has a bug;

 This is not a bug; it is specifically cited in the Bidirectional
 Conformance section of Chapter 3 as one of the ways a higher-level
 protocol can override the BIDI algorithm. I otherwise agree with John
 about the perversity (perversion ;-) of the examples.

  change products or to change the standard and use
  a reversible bidi.

 The BIDI algorithm is not reversible, and could not be made reversible
 without eliminating features that are important to the bidi community.
 This was considered at the time the bidi algorithm was developed.

Hold on there! You admit that the Unicode algorithm is *really*
not reversible? I was just bluffing because I just saw that there
is no reverse algorithm published in the standard!

Can you imagine the implications of this? Imagine someone signing
a digital Unicode document. He is looking at his viewer but
what he signs is the ___bitstream___. So you claim that this guy
who might have no connection to software industry at all will be
able to run an algorithm - that does not exist - in his head?

 This thread is a waste of time.

If the Unicode bidi algorithm were reversible, none of this
would happen. Software developers, who are flesh and blood
people, would be able to do a clean room implementation of
the algorithm and the reverse of it. The correctness of
the software could be *automatically* checked by just
reversing the view and checking it against the bitstream.

Instead of the automatic check, now there are test cases,
and if there is a nasty bug the reply is, oh well, sorry
for that, and plug in another fix and test case.

I feel I saw this attitude before... Is it only me?

Gaspar






Re: Unicode and Security

2002-02-04 Thread John Cowan

Gaspar Sinai scripsit:

 Hold on there! You admit that the Unicode algorithm is *really*
 not reversible? I was just bluffing because I just saw that there
 is no reverse algorithm published in the standard!

It can't be reversible, as my little English = CIBARA demonstration
showed.  The only way to make a reversible algorithm would be
to abandon the principle of phonetic internal ordering.

 Can you imagine the implications of this? Imagine someone signing
 a digital Unicode document. He is looking at his viewer but
 what he signs is the ___bitstream___. So you claim that this guy
 who might have no connection to software industry at all will be
 able to run an algorithm - that does not exist - in his head?

No Real World document is going to make sense read both ways.
It will make sense one way, thus:  BARA-LA AW MALSI-AL mean
the Arabs and Islam respectively.  The other order will make
no sense at all.

-- 
John Cowan   http://www.ccil.org/~cowan  [EMAIL PROTECTED]
To say that Bilbo's breath was taken away is no description at all.  There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
--_The Hobbit_




Re: Unicode and Security

2002-02-04 Thread Mark Leisher


 This thread is a waste of time.

Gaspar If the Unicode bidi algorithm were reversible, none of this would
Gaspar happen. Software developers, who are flesh and blood people, would
Gaspar be able to do a clean room implementation of the algorithm and the
Gaspar reverse of it. The correctness of the software could be
Gaspar *automatically* checked by just reversing the view and checking it
Gaspar against the bitstream.

Gaspar Instead of the automatic check, now there are test cases, and if
Gaspar there is a nasty bug the reply is, oh well, sorry for that, and
Gaspar plug in another fix and test case.

Gaspar I feel I saw this attitude before... Is it only me?

I don't understand your reasoning.  Applying the bidi algorithm or a
higher-level protocol does not change the backing store.  Applying the bidi
algorithm is essentially a one-way transformation, but the original
information need not be thrown away.  Yudit differentiates the backing store
and the display, does it not?

And as for signing a Unicode document, the fact that the user is implicitly
signing the __bitstream__ and not the __document__ is probably the right
thing to do.  To be meaningful, the data will be displayed the same
everywhere, barring incorrect renderers.  And in the case of incorrect
rendering, it is the __bitstream__ that remains correct, and that is what
the user signed.

A user types some text on a computer and signs it.  Is the user signing the
idea expressed by the text or the presentation of the text?  They are signing
the idea.  The presentation can have all kinds of flaws that do not represent
the original idea, such as a printer that can't print the letter e.
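
A minimal sketch of why the signature attaches to the stored bytes rather than to any rendering. The key, the sample strings, and the use of an HMAC as a stand-in for a real digital signature are all assumptions made for illustration; nothing here is taken from an actual signing product.

```python
import hashlib
import hmac

# Hypothetical shared key; a real digital signature would use an
# asymmetric key pair, which the standard library does not provide.
KEY = b"demo-key"


def sign(text: str) -> str:
    """Return a hex MAC computed over the UTF-8 bytes of the stored text."""
    return hmac.new(KEY, text.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(text: str, tag: str) -> bool:
    """Check that the stored text still matches the tag."""
    return hmac.compare_digest(sign(text), tag)


# Backing store in logical order (two Hebrew letters inside Latin text).
original = "pay \u05d0\u05d1 100"
# Same letters with RLO/PDF controls added: a different byte sequence.
tampered = "pay \u202e\u05d0\u05d1\u202c 100"

tag = sign(original)
print(verify(original, tag))  # the stored bytes are unchanged
print(verify(tampered, tag))  # any change to the backing store is detected
```

The check never looks at how a viewer draws the text, which is the point being made above: a faulty renderer changes the picture, not the signed bytes.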
-
Mark Leisher
Computing Research Lab
New Mexico State University
Box 30001, Dept. 3CRL
Las Cruces, NM  88003

"Orthodoxy, of whatever color, seems to demand a lifeless, imitative style."
    -- Politics and the English Language, George Orwell




Re: Unicode and Security

2002-02-04 Thread ろ〇〇〇〇 ろ〇〇〇

No Real World document is going to make sense read both ways.
It will make sense one way, thus:  "BARA-LA AW MALSI-AL mean
the Arabs and Islam respectively".  The other order will make
no sense at all.

Good style might say to put in a line break so you know what's going on.
I don't know if that would help. Maybe it would do more harm than good. 
Let's ask an Israeli. They probably have to deal with this on a daily 
basis.



→　じゅういっちゃん　←
　だんせいらしさむよう



Re: Unicode and Security

2002-02-04 Thread Kenneth Whistler

Gaspar wrote:

  The BIDI algorithm is not reversible, and could not be made reversible
  without eliminating features that are important to the bidi community.
  This was considered at the time the bidi algorithm was developed.
 
 Hold on there! You admit that the Unicode algorithm is *really*
 not reversible? I was just bluffing because I just saw that there
 is no reverse algorithm published in the standard!

Of course it isn't reversible. (echoing John Cowan)

The bidi algorithm is a set of steps for going from a logical
representation of text to a specification of the *actual* directionality
for rendering in lines.

But there are inherent ambiguities in trying to reverse the
process, to go from line-rendered text display to a logical
representation of text. In addition to John Cowan's example
of ambiguity caused by assumption of the default rendering
order, you could always introduce extraneous embedding levels
that would resolve the same, or you could have otherwise
undetectable differences that would result in the same measurement
and display of text, such as one em space versus a sequence of
two en spaces.
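
The ambiguity caused by the default rendering order can be sketched with a toy model. This is not the UAX #9 algorithm: it assumes the list's convention that UPPERCASE letters stand for right-to-left letters, takes the base direction from the first letter, gives every non-letter the base direction, and ignores embeddings and overrides entirely. The function name `toy_display` is hypothetical.

```python
def toy_display(logical: str) -> str:
    """Toy logical-to-visual reordering (NOT UAX #9).

    Uppercase letters play the role of RTL letters, lowercase of LTR.
    The base direction comes from the first letter in the text.
    """
    letters = [ch for ch in logical if ch.isalpha()]
    base_rtl = bool(letters) and letters[0].isupper()

    # Split the text into runs of uniform direction.
    runs = []
    for ch in logical:
        rtl = ch.isupper() if ch.isalpha() else base_rtl
        if runs and runs[-1][0] == rtl:
            runs[-1][1].append(ch)
        else:
            runs.append([rtl, [ch]])

    # RTL runs are drawn reversed; in an RTL paragraph the runs
    # themselves are also laid out from right to left.
    pieces = ["".join(reversed(chars)) if rtl else "".join(chars)
              for rtl, chars in runs]
    if base_rtl:
        pieces.reverse()
    return "".join(pieces)


print(toy_display("abc DEF"))  # LTR paragraph with an RTL word
print(toy_display("DEF abc"))  # RTL paragraph with an LTR word
```

Both calls produce the same left-to-right sequence of glyphs, so even in this reduced model the display alone cannot tell you which of the two backing stores produced it.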

Gaspar continued:

 Can you imagine the implications of this? Imagine someone signing
 a digital unicode document. He is looking at his viewer but
 what he signs is the ___bitstream___. So you claim that this guy
 who might have no connection to software industry at all will be
 able to run an algorithm - that does not exist - in his head?

Reading and understanding the content of text is no guarantee
of being able to reverse a rendering process to intuit the
exact order of characters which was used to produce that
text -- ever. This is not merely a Unicode (and ISO 10646) issue,
but even crops up in the severely limited context of ASCII text
rendered with monowidth fonts. A trivial example of this can
be found in otherwise undetectable spaces at ends of lines, or
in ambiguities with regard to whether a particular spacing was
produced by tabulation or insertion of multiple spaces.
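
The ASCII case is easy to check. The sketch below assumes a display that expands tabs to 8-column stops and shows nothing for trailing spaces; the helper name `as_displayed` and the sample strings are made up for illustration.

```python
def as_displayed(line: str, tabsize: int = 8) -> str:
    """Approximate what a monowidth display shows for one line:
    tabs expanded to column stops, trailing spaces invisible."""
    return line.expandtabs(tabsize).rstrip(" ")


with_tab = "name\tvalue"            # tab between the columns
with_spaces = "name    value"       # four spaces between the columns
with_trailing = "name    value   "  # same, plus spaces at end of line

for raw in (with_tab, with_spaces, with_trailing):
    print(repr(raw), "->", repr(as_displayed(raw)))
```

Three different byte sequences collapse to one displayed line, so no reader could recover the original characters from the screen.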

 
  This thread is a waste of time.

I agree with Mark about that.

 
 If the Unicode bidi algorithm were reversible, none of this
 would happen. 

Nonsense.

 Software developers, who are flesh and blood
 people, would be able to do a clean room implementation of
 the algorithm and the reverse of it. The correctness of
 the software could be *automatically* checked by just
 reversing the view and checking it against the bitstream.

Think again.

 
 Instead of the automatic check, now there are test cases,
 and if there is a nasty bug the reply is, oh well, sorry
 for that, and plug in another fix and test case.
 
 I feel I saw this attitude before... Is it only me?

'fraid so.

By the way, I just checked www.yudit.org and noted that among
the future plans for Yudit are:

 * Waiting for a standard that makes more sense than Unicode
and jump ship.

with "that makes more sense" pointing to http://www.bytext.org/
Oh ho! I think the readers of this list who considered the virtues
of ByText would find that an interesting indication of judgement.

--Ken
 




Re: Unicode and Security

2002-02-04 Thread Gaspar Sinai

On Mon, 4 Feb 2002, Mark Leisher wrote:

[...cut some stuff to save room...]

 I don't understand your reasoning.  Applying the bidi algorithm or a
 higher-level protocol does not change the backing store.  Applying the bidi
 algorithm is essentially a one-way transformation, but the original
 information need not be thrown away.  Yudit differentiates the backing store
 and the display, does it not?

Thank you for mentioning Yudit - I don't need advertisement,
there are enough users.

Just because some companies who have influence on Unicode
Consortium use some algorithm, like backing store and re-mapping,
it does  not mean that this is the only way.  And I don't even
think they  do in cases when character conversion is necessary.

For me it is very important what a naive user sees on the screen.
Yudit does convert the input to view order and back. Text
direction and end of line are clearly indicated. The Unicode
values of the characters in the cluster under the cursor are
clearly indicated. In all cases what you view can be converted back to
the *same* bitstream - except for illegally encoded text, but that
leaves clearly visible traces on the screen, as it should.

If the standard wants me to confuse the user, I would rather dump the
standard than comply.

I wish there was another world character standard besides
Unicode and not only  half-hearted attempts like bytext.
Talking about characters: I think  bi-di should not be in
Unicode  Standard because it is not a character.
It is an algorithm.

I am also starting to think this thread is a waste of time.
This thread won't solve our problem.

I feel sorry for interrupting the "Let's praise and
celebrate Unicode" mood of this mailing list.

gaspar

I updated:
http://www.yudit.org/security/
I wanted to remove it after solving the problem, but
it seems that this page will stay.






Re: Unicode and Security

2002-02-04 Thread Michael \(michka\) Kaplan

From: Gaspar Sinai [EMAIL PROTECTED]

 If the standard wants me to confuse the user, I would
 rather dump the standard than comply.

Well, don't let the door hit you in the a** on the way out?

The users will be less confused than you realize -- only people who walk in
with agendas see the flaws you claim.

 Talking about characters: I think  bi-di should not be in
 Unicode  Standard because it is not a character.
 It is an algorithm.

And it is documented as such.

Clearly what you want of Unicode does not match what it actually is -- when
my wife and I realized such about each other, she became my ex-wife. Since
that is your goal here, I guess your divorce from Unicode should not be a
surprise?

snip out of order

 Thank you for mentioning Yudit - I don't need
 advertisement, there are enough users.

Perhaps some will leave if you are honest about your divorce though -- you
might be surprised how many people follow the standard?

 I am also starting to think this thread is a waste of time.
 This thread won't solve our problem.

The only issue though is that we do not have a problem, here?

 I feel sorry for interrupting the "Let's praise and
 celebrate Unicode" mood of this mailing list.

Sorry, that's not the mood of the list. But in order to have a healthy
respect for the people who give sound and reasonable arguments, we must show
a matching lack of respect for those who give specious arguments.

 I updated:
 http://www.yudit.org/security/
 I wanted to remove it after solving the problem, but
 it seems that this page will stay.

The problem is solved, though. The real problem at this point can be found
at
http://www.yudit.org/gaspar/ though.


MichKa

Michael Kaplan
Trigeminal Software, Inc.  -- http://www.trigeminal.com/





Re: Unicode and Security

2002-02-03 Thread Barry Caplan

At 02:15 PM 2/3/2002 +0900, you wrote:
 On Sat, 2 Feb 2002, David Starner wrote:
 [...several lines cut to save room...]
  I think I'm missing your perspective. To me, these are minor quirks. Why
  do you see them as huge problems?

 I am thinking about electronically signed Unicode text documents
 that are rendered correctly or believed to be rendered correctly,
 still they look different, seem to contain additional or do not
 seem to contain some text when viewed with different viewers due
 to some ambiguities inherent in the standard.

An electronically signed document allows you to trust who wrote it, and
that the *byte sequence* hasn't been tampered with. It implies nothing
at all, trust-wise, about what software you should use to interpret it. You
would go through the trouble to verify a signature, but trust the .doc
extension and some machine's implementation of Word with your money?
Makes no sense.

That being said, identifying security issues of existing programs and/or
protocols when they intersect with Unicode-based data is an important
issue, and one I intend to cover regularly on www.i18n.com, once it
launches this month.

For those of you that have specific issues to write about, or are
interested in providing a series of security-related articles (length and
frequency TBD), please contact me off-list. I think there are endless
examples already out there to provide, and I know of at least one that
is serious. Let's find more!


Best Regards,
Barry Caplan
www.i18n.com
- coming soon, preview available now
News | Tools | Process for Global Software
Team I18N



Re: Unicode and Security

2002-02-03 Thread Gaspar Sinai

On Sun, 3 Feb 2002, Asmus Freytag wrote:
 The bidi algorithm is anything but vague. Any
 implementation can be rigorously tested against two
 reference implementations, to ensure fully compatible
 implementation.

Sorry guys to be this short this time, but
I kicked life into my Windows laptop and made
an example for BIDI. That pretty much took
my time away...

The following page contains my view of Unicode
BIDI algorithm (with screenshots).

http://www.yudit.org/security/

This page is not linked up anywhere yet - I just made it
for this list.

My apologies for being such a bastard - my nature is to be
paranoid.

Gaspar





Re: Unicode and Security

2002-02-03 Thread Gaspar Sinai

On Sun, 3 Feb 2002, John Cowan wrote:

 Gaspar Sinai scripsit:

  The following page contains my view of Unicode
  BIDI algorithm (with screenshots).
 
  http://www.yudit.org/security/

 Oooo-kay.  This is not a Unicode problem per se: it is about
 embedded text vs. text that is not embedded.  The Yudit and
 IE versions are displaying a text (Java code) that is essentially in
 Latin script (LTR) with some RTL inclusions.  However, when
 the Java application actually runs, it displays three
 separate and distinct texts, each of which is an RTL text
 with some LTR inclusions.  They are assumed to be RTL
 text, by the bidi rules, because they begin with a strong
 RTL character.

 Similar things happen when you construct XML documents
 with RTL element names: the bidi rules, which are meant
 for true text and not computer-readable stuff, sometimes
 produce visually confusing results.

So it is perfectly ok? I can make a non-embedded example too.
I do not have time to make childish examples and screenshots
to get my point across. I have a job to do and text processing
is just my hobby.

The rendering problems are all side effects of the
Unicode bidi algorithm. If the Unicode bidi algorithm were
proven to be reversible (logical->display; display->logical)
I would not go to bed worrying about my signed documents.

That's my view of the problem.
Cheers
gaspar





Re: Unicode and Security

2002-02-03 Thread John Cowan

Gaspar Sinai scripsit:

 The following page contains my view of Unicode
 BIDI algorithm (with screenshots).
 
 http://www.yudit.org/security/

Oooo-kay.  This is not a Unicode problem per se: it is about
embedded text vs. text that is not embedded.  The Yudit and
IE versions are displaying a text (Java code) that is essentially in
Latin script (LTR) with some RTL inclusions.  However, when
the Java application actually runs, it displays three
separate and distinct texts, each of which is an RTL text
with some LTR inclusions.  They are assumed to be RTL
text, by the bidi rules, because they begin with a strong
RTL character.

Similar things happen when you construct XML documents
with RTL element names: the bidi rules, which are meant
for true text and not computer-readable stuff, sometimes
produce visually confusing results.
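
The "begins with a strong RTL character" rule can be sketched with Python's `unicodedata` module. This only approximates how the default paragraph direction is chosen: it ignores explicit embedding controls and any higher-level protocol, and the function name `base_direction` and the sample strings are hypothetical.

```python
import unicodedata


def base_direction(text: str) -> str:
    """Guess the paragraph direction from the first strong character.

    Bidi classes "L" (left-to-right), "R" and "AL" (right-to-left)
    are the strong ones; everything else is skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return "LTR"  # no strong character found: default to left-to-right


# A line of source code: starts with a Latin letter, so LTR overall.
source_line = 'String s = "\u05d0\u05d1 = x";'
# The string literal on its own, as the running program would display it.
literal_only = "\u05d0\u05d1 = x"

print(base_direction(source_line))
print(base_direction(literal_only))
```

The same characters therefore get a different base direction depending on whether they are shown inside the program text or as a standalone string, which is the embedded versus non-embedded difference described above.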

-- 
John Cowan   http://www.ccil.org/~cowan  [EMAIL PROTECTED]
To say that Bilbo's breath was taken away is no description at all.  There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
--_The Hobbit_




Re: Unicode and Security

2002-02-03 Thread John Cowan

Gaspar Sinai scripsit:

 So it is perfectly ok? I can make a non-embedded example too.
 I do not have time to make childish examples and screenshots
 to get my point across. I have a job to do and text processing
 is just my hobby.

Mine too, but it's difficult to understand the merits of an
objection when no actual examples of the problem are given.

-- 
John Cowan   http://www.ccil.org/~cowan  [EMAIL PROTECTED]
To say that Bilbo's breath was taken away is no description at all.  There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
--_The Hobbit_




Re: Unicode and Security

2002-02-03 Thread Gaspar Sinai


On Sun, 3 Feb 2002, John Cowan wrote:

 Gaspar Sinai scripsit:

  So it is perfectly ok? I can make a non-embedded example too.
  I do not have time to make childish examples and screenshots
  to get my point across. I have a job to do and text processing
  is just my hobby.

 Mine too, but it's difficult to understand the merits of an
 objection when no actual examples of the problem are given.

So common language is screenshots... Ok. I updated the page.
Now the  exact same file is viewed with two different viewers
at the bottom of this page:

  http://www.yudit.org/security/

I maintain my view that if there is no proven
reversible logical-to-viewed/viewed-to-logical
mapping, electronic signatures should be avoided.

And the bottom line is: I don't really care if
Unicode will admit that this is a problem. If
my reasoning (not my screenshots) convinces
*some* people not to electronically sign Unicode
text, I think I did those guys good - and that
is enough satisfaction for me.

Cheers
gaspar






Re: Unicode and Security

2002-02-03 Thread David Starner

On Mon, Feb 04, 2002 at 02:25:05PM +0900, Gaspar Sinai wrote:
 And the bottom line is: I don't really care if
 Unicode will admit that this is a problem. If
 my reasoning (not my screenshots) convinces
 *some* people not to electronically sign Unicode
 text, I think I did those guys good - and that
 is enough satisfaction for me.

Why not just warn against signing documents with bidi in them? Odds are,
people who would run into this, if warned against using Unicode, would
use ISO-8859-6/8 - which is often run through the same bidi algorithm.
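
A small illustration of why switching to ISO-8859-8 does not avoid the issue: once decoded, the bytes are the same right-to-left characters a bidi-aware display has to reorder. The sample bytes are chosen for illustration only.

```python
import unicodedata

# Four Hebrew letters encoded as ISO-8859-8 bytes.
raw = b"\xf9\xec\xe5\xed"
text = raw.decode("iso8859_8")

# Each decoded character carries a strong right-to-left bidi class.
classes = [unicodedata.bidirectional(ch) for ch in text]
for ch, bidi_class in zip(text, classes):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} bidi={bidi_class}")
```

The legacy encoding changes the bytes on disk, not the directionality of the letters, so the display step faces the same reordering question.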

And what if you don't do those guys good? They miss a multimillion
dollar account because they can't work with the client, or they fall for
something more common because they're worrying about Unicode?

-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing 
with the youth. -- Information Society, Peace and Love, Inc.




Re: Unicode and Security

2002-02-02 Thread Rick McGowan

A while back there was some discussion of security. You could start by
checking the list archives for those threads.

 Is Unicode secure? What character standards can be
 considered secure?

What does security really mean for a character encoding?

In my opinion, security is related to bugs in software, not to
specifications of character encodings. No matter what character encoding
you use, you are subject to certain types of security problems in certain
environments if you don't write correct and robust programs!

The uneasiness you are experiencing at this time is manifest only because  
Unicode is a relatively new character encoding and software/program  
environments in which Unicode is found have not been subjected to the same  
degree of scrutiny and analysis as previous environments which used, for  
example, only ASCII.

 I would also like to know your opinion about the
 need to create another or an 'intermediate' standard.

There is no need to do that. The scenarios you present are related to  
misinterpretations by software, not to any real problems with the  
specification of Unicode itself. If you precisely specify the input that  
your software will accept in secure situations where interpretation  
matters, and specify what things your software will NOT accept as  
substitutes, then you will not have these kinds of security problems.
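
One way to "precisely specify the input" is an explicit policy check run before text enters a secure context. The policy below is a hypothetical example, not a recommendation from the Unicode Standard: it rejects every character whose general category is a control, format, surrogate, private-use, or unassigned code point.

```python
import unicodedata

# Hypothetical policy: general categories that are never accepted.
# "Cf" covers the bidi controls such as RLO (U+202E) and PDF (U+202C).
REJECTED_CATEGORIES = {"Cc", "Cf", "Cs", "Co", "Cn"}


def is_accepted(text: str) -> bool:
    """Return True only if no character falls in a rejected category."""
    return all(unicodedata.category(ch) not in REJECTED_CATEGORIES
               for ch in text)


print(is_accepted("invoice 42"))         # plain text passes
print(is_accepted("\u05d0\u05d1 42"))    # Hebrew letters pass
print(is_accepted("invoice\u202e 42"))   # embedded RLO is refused
```

A blanket ban on format characters is a blunt instrument: it also refuses U+200C and U+200D, which some scripts need for correct spelling, so a real policy would have to be tuned to the languages it must accept.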

There is, perhaps, a need for the security community to discuss the types  
of security attacks that could be mounted against naive software that  
accepts Unicode strings in secure situations.

That's my opinion.

Rick





Re: Unicode and Security

2002-02-02 Thread David Starner

On Sun, Feb 03, 2002 at 11:41:11AM +0900, Gaspar Sinai wrote:
 I had the following problems where unicode could not
 be used because of security issues. In all cases
 the signer of  a document can be lured into
 believing that the wording of the document he/she
 is about to sign is different.

This seems more like a legal issue than anything else. It's not legal to
lure someone into believing that the wording of the document to be
signed is different. I think you're trying to apply a technical solution
to a legal problem.
 
 1. Character Order Problem
 
 The BIDI algorithm is too complex and not reversible.
 I could create a BIDI document where only RLO LRO and
 PDF characters were used, and the WORD, JAVA and KDE
 produced different word ordering. I don't have access
 to MS platform  now to reproduce this but as far as
 I can tell it was like:
 
  <RLO>text1<PDF><U+0020><RLO>text2<PDF>
 
 Because the BIDI algorithm is too complex and vague
 it can be said that these programs all displayed
 the text correctly, still differently.
 
   text1 text2
   text2 text1

If you support the RLO/PDF characters, the answer is 1txet 2txet, if I'm
reading it right. If you don't, then there's no reason to run the bidi
algorithm, and the answer is text1 text2.
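
The two readings can be reproduced with a toy renderer. It is a sketch under narrow assumptions: a left-to-right paragraph, only non-nested RLO...PDF pairs, and no other bidi rules at all. The function names are hypothetical.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def toy_render(logical: str) -> str:
    """Reverse each RLO...PDF span in place (toy model, no nesting)."""
    out = []
    span = []
    in_override = False
    for ch in logical:
        if ch == RLO and not in_override:
            in_override = True
        elif ch == PDF and in_override:
            out.append("".join(reversed(span)))
            span = []
            in_override = False
        elif in_override:
            span.append(ch)
        else:
            out.append(ch)
    # An override left open runs to the end of the text.
    out.append("".join(reversed(span)))
    return "".join(out)


def ignore_controls(logical: str) -> str:
    """What a viewer without override support would show."""
    return logical.replace(RLO, "").replace(PDF, "")


sample = RLO + "text1" + PDF + " " + RLO + "text2" + PDF
print(toy_render(sample))
print(ignore_controls(sample))
```

In this model the space sits outside both overrides, so the two words keep their relative order and only the letters inside each span are reversed.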
 
   Whether ligature forming will actually happen or not
   is completely up to the font. If the font does have
   the ligature,  it will be formed. The standard does
   not define all the compulsory ligatures.

The whole point of this is that ligatures shouldn't be something most
users have to worry about, and they shouldn't be something that changes
meaning. If I'm using Times New Roman, it should make the ff, fi, and
ffi ligatures automatically. If I switch the document to an old-style
font, it should do ct and st automatically.
 
 b) Hidden Marks
   It is possible to make a combining mark, like a
   negation mark appear in the base characters body
   making it invisible. It is nearly impossible to
   test the rendering engine for all possible
   combinations.

Sure. 
 
 3. Text Search Problem
 
 It is possible to create texts that look the same,
 but they cannot be searched because even when fully
 decomposed and ordered they will be different.

I don't see a solution for this. U+0030, U+004F, U+006F, U+039F, U+041E,
U+0555, U+0A66, U+0AE6, U+0B66, U+0C66, U+0CE6, U+0E50, U+0ED0, U+1040,
U+17E0, U+2070, U+2080, U+2134, U+25CB, U+25EF, U+274D, and U+3007 are
all closed circular shapes. But while they could be confused when used
inappropriately, they each have distinct meaning and use. If you want
text to be searchable, then encode it properly. If you don't, well,
that's your choice. 

This is true in preexisting standards, too - any that include two of the
Latin, Cyrillic and Greek scripts. 
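
A quick check of the claim that decomposition does not merge such lookalikes, using three of the code points listed above. The choice of NFKD as the normalization form is an assumption made for the illustration.

```python
import unicodedata

# Latin, Greek and Cyrillic capital letters that look alike in most fonts.
lookalikes = ["\u004f", "\u039f", "\u041e"]

for ch in lookalikes:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Compatibility decomposition leaves each letter as it was.
normalized = {unicodedata.normalize("NFKD", ch) for ch in lookalikes}
print(len(normalized))
```

Normalization only unifies different encodings of the same character; it is not designed to unify different characters that happen to share a shape.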

I think I'm missing your perspective. To me, these are minor quirks. Why
do you see them as huge problems?

-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing 
with the youth. -- Information Society, Peace and Love, Inc.




Re: Unicode and Security

2002-02-02 Thread Gaspar Sinai

On Sat, 2 Feb 2002, David Starner wrote:
[...several lines cut to save room...]
 I think I'm missing your perspective. To me, these are minor quirks. Why
 do you see them as huge problems?

I am thinking about electronically signed Unicode text documents
that are rendered correctly or believed to be rendered correctly,
still they look different, seem to contain additional or do not
seem to contain some text when viewed with different viewers due
to some ambiguities inherent in the standard.

It might be just a minor quirk, unless it costs me
transferring all the money from my bank account to a person
unintentionally...

Can all the cases be identified and clarified, or are there
an infinite number of back-doors in the standard?

Thank you,
Gaspar






Re: Unicode and Security

2002-02-02 Thread David Starner

On Sun, Feb 03, 2002 at 02:15:51PM +0900, Gaspar Sinai wrote:
 I am thinking about electronically signed Unicode text documents
 that are rendered correctly or believed to be rendered correctly,
 still they look different, seem to contain additional or do not
 seem to contain some text when viewed with different viewers due
 to some ambiguities inherent in the standard.

Some CR's at the right place might produce the same effect in a pure
ASCII document. The O/0 and 1/l/| confusables exist in ASCII. 
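
The carriage-return effect can be sketched with a toy model of one terminal line, where CR moves the cursor back to column 0 and later characters overwrite earlier ones. The function name and the sample text are invented for the illustration.

```python
def toy_terminal_line(raw: str) -> str:
    """Simulate a single display line in which CR returns to column 0
    and subsequent characters overwrite what was there."""
    cells = []
    column = 0
    for ch in raw:
        if ch == "\r":
            column = 0
        else:
            if column < len(cells):
                cells[column] = ch
            else:
                cells.append(ch)
            column += 1
    return "".join(cells)


stored = "I owe you 100\rI owe you 900"
print(repr(stored))
print(toy_terminal_line(stored))
```

The stored bytes contain both amounts, but a viewer that behaves like this shows only the second one, all within plain ASCII.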
 
 It might be just a minor quirk, unless it costs me
 transferring all the money from my bank account to a person
 unintentionally...

There seem to be much easier ways to scam money than to exploit
something like this. Promise the world, take their money and run has
been changed more by Ebay than Unicode. If you don't trust someone,
don't deal with them. If they do pull something like this, it's no more
legal than any other form of scam.
 
 Can all the cases be identified and clarified, or are there
 an infinite number of back-doors in the standard?

Since the only way to fix all these problems would be to prescribe
a specific font and specific manner to render text using that font, it's
unlikely they will be fixed. But there aren't an infinite number of
back-doors in the standard, as it's logically a finite document.

-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing 
with the youth. -- Information Society, Peace and Love, Inc.