Re: Unicode and Security

2002-02-03 Thread Barry Caplan

At 02:15 PM 2/3/2002 +0900, you wrote:
On Sat, 2 Feb 2002, David Starner wrote:
[...several lines cut to save room...]
 I think I'm missing your perspective. To me, these are minor quirks.
 Why do you see them as huge problems?
I am thinking about electronically signed Unicode text documents
that are rendered correctly, or believed to be rendered correctly,
yet look different, or seem to gain or lose some text, when viewed
with different viewers, due to some ambiguities inherent in the
standard.
An electronically signed document allows you to trust who wrote it, and
that the *byte sequence* hasn't been tampered with. It implies nothing
at all, trust-wise, about what software you should use to interpret it. You
would go through the trouble of verifying a signature, but then trust the .doc
extension and some machine's implementation of Word with your money?
Makes no sense.
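
To make the byte-sequence point concrete, here is a minimal sketch. It assumes a SHA-256 digest as a stand-in for a real signature scheme (the original post names no particular scheme), and the sample word is purely illustrative. Two strings that a conforming renderer should display identically still have different bytes, so a check over the bytes says nothing about what a viewer will show.

```python
import hashlib
import unicodedata


def digest(text: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 bytes of `text`."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The same visible word in two canonically equivalent encodings.
composed = unicodedata.normalize("NFC", "caf\u00e9")    # e-acute as one code point
decomposed = unicodedata.normalize("NFD", "caf\u00e9")  # e + combining acute accent

# Different code point sequences, hence different bytes and different digests.
print(composed == decomposed)
print(digest(composed) == digest(decomposed))

# Normalizing first makes the two compare equal again.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

The reverse case, identical bytes shown differently by two viewers, cannot be detected by any check over the bytes alone.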
That being said, identifying security issues in existing programs and/or
protocols where they intersect with Unicode-based data is an important
issue, and one I intend to cover regularly on www.i18n.com once it
launches this month.
For those of you who have specific issues to write about, or are
interested in providing a series of security-related articles (length and
frequency TBD), please contact me off-list. I think there are endless
examples already out there, and I know of at least one that
is serious. Let's find more!


Best Regards,
Barry Caplan
www.i18n.com
- coming soon, preview available now
News | Tools | Process for Global Software
Team I18N



Re: Unicode and Security

2002-02-03 Thread Gaspar Sinai

On Sun, 3 Feb 2002, Asmus Freytag wrote:
 The bidi algorithm is anything but vague. Any
 implementation can be rigorously tested against two
 reference implementations, to ensure fully compatible
 implementation.

Sorry, guys, to be this short this time, but
I kicked my Windows laptop into life and made
an example for BIDI. That pretty much took
my time away...

The following page contains my view of Unicode
BIDI algorithm (with screenshots).

http://www.yudit.org/security/

This page is not linked from anywhere yet - I just made it
for this list.

My apologies for being such a bastard - my nature is to be
paranoid.

Gaspar





Unicode-Afrique forum

2002-02-03 Thread Andrew Cunningham



Hi everyone, 

thought I'd pass on the info below.

A French language forum discussing the potential of 
Unicode for African languages has been launched. Details below.

Andrew

==

Unicode-Afrique
http://groups.yahoo.com/group/Unicode-Afrique/

Unicode probably represents the best chance of promoting computing
and Internet content in African languages. The current multiplicity of
mutually incompatible fonts and encoding systems for special or
non-Latin characters prevents true multilingualism in ICT in Africa
(and the world).

This e-group exists to: publicize projects in Africa that use Unicode;
discuss practical questions and problems with Unicode and character
sets for African languages; and share useful experience in developing
and using Unicode fonts for African languages. It therefore does not
compete with the Unicode newsgroup "fr.comp.normes.unicode," nor with
general discussion lists on ICT in Africa such as
"afrique-informatique."


Re: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur))

2002-02-03 Thread Stefan Persson

- Original Message -
From: Asmus Freytag [EMAIL PROTECTED]
To: Karl Pentzlin [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: den 31 januari 2002 22:09
Subject: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT
SELECTOR (was: Re: Proposing Fraktur))


 A more productive distinction would be along these lines:

 a) is the feature necessary for correctly expressing the content

Yes.

 b) is the feature rule based, and

Yes.

 b.1) is the rule implementable w/o knowledge of semantics, or

No.

 c) when implementing the feature, is it necessary to
 c.1) provide scope information, or

Yes.

 c.2) is local context sufficient

No.

 Leaving out italics from a document can not only change the level of
 emphasis, but for example in English, there are occasional circumstances
 where the use of italics removes a possible ambiguity in interpreting
 a sentence. Nevertheless (except for mathematics) italics were left to
 a higher level protocol (style markup).

Italics is better supported than Fraktur, as most word processors have an
option for using italics with any font installed on the computer. For
Fraktur one has to use a different font. There is no Fraktur font widely
spread on all Windows computers or something like that, so it's almost
impossible to use Fraktur text in any public document or similar w/o using
bitmaps.

Why was Fraktur supported for mathematics, but not for old
Swedish/German/etc.?

Stefan







Re: When to use markup:

2002-02-03 Thread Michael Everson

At 16:35 +0100 2002-02-03, Stefan Persson wrote:

Italics is better supported than Fraktur, as most word processors have an
option for using italics with any font installed on the computer. For
Fraktur one has to use a different font. There is no Fraktur font widely
spread on all Windows computers or something like that, so it's almost
impossible to use Fraktur text in any public document or similar w/o using
bitmaps.

Are you saying you don't have a Fraktur font? There are many 
available. See http://www.myfonts.com


Why was Fraktur supported for mathematics, but not for old
Swedish/German/etc.?

Because a semantic distinction is made in mathematics between the 
single letter A and the single letter <fraktur>A</fraktur> -- a 
distinction which does not obtain in Fraktur as used
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur))

2002-02-03 Thread John Hudson

At 07:35 2/3/2002, Stefan Persson wrote:

Italics is better supported than Fraktur, as most word processors have an
option for using italics with any font installed on the computer. For
Fraktur one has to use a different font.

Um, for italics one has to use a different font also. Many programs provide 
an italics button that activates the italic member of a font family, but 
this still involves selecting a separate font.

There is no Fraktur font widely
spread on all Windows computers or something like that, so it's almost
impossible to use Fraktur text in any public document or similar w/o using
bitmaps.

There are plenty of Fraktur and other blackletter fonts available. Many of 
the best ones are available from Linotype in Germany. If you think that a 
Fraktur font should come installed on operating systems, you should 
petition your OS developer.

I don't see that these font availability issues have anything to do with 
Unicode.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

... es ist ein unwiederbringliches Bild der Vergangenheit,
das mit jeder Gegenwart zu verschwinden droht, die sich
nicht in ihm gemeint erkannte.

... every image of the past that is not recognized by the
present as one of its own concerns threatens to disappear
irretrievably.
   Walter Benjamin





Re: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur))

2002-02-03 Thread Michael \(michka\) Kaplan

From: John Hudson [EMAIL PROTECTED]

 Um, for italics one has to use a different font also. Many
 programs provide an italics button that activates the italic
 member of a font family, but this still involves selecting a
 separate font.

Au contraire, sir! Many fonts *do* have separate .TTF files for the
italic version, but there are just as many that do not, yet the italic option
does not find itself disabled in programs.


MichKa

Michael Kaplan
Trigeminal Software, Inc.  -- http://www.trigeminal.com/






Re: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur))

2002-02-03 Thread Curtis Clark

At 10:25 AM 2/3/02, John Hudson wrote:
Um, for italics one has to use a different font also. Many programs 
provide an italics button that activates the italic member of a font 
family, but this still involves selecting a separate font.

And it would be simple to set up a font family so that Fraktur would be the 
normal state, and the italic button on the word processor would select a 
Roman member of the family (if you still needed sloped italics, those could 
be assigned to the bold italic slot).


-- 
Curtis Clark  http://www.csupomona.edu/~jcclark/
Mockingbird Font Works  http://www.mockfont.com/






Re: names of the control characters

2002-02-03 Thread Mark Davis

This has bitten more than a few people. For political reasons, having
to do with the synchronization of names to ISO 10646, the name fields
are empty for the control characters. That is because (at least in
theory) people could have other semantics for those characters.

Field 10 (called Unicode 1.0 Name) contains names for most of those
characters, and should be used for your purpose. See, for example,
http://www.unicode.org/Public/BETA/Unicode3.2/UnicodeData-3.2.0d1.html
where it says:

This is the old name as published in Unicode 1.0. This name is only
provided when it is significantly different from the current name for
the character. The value of field 10 for control characters does not
always match the Unicode 1.0 names. Instead, field 10 contains ISO
6429 names for control functions, for printing in the code charts.

Thus the data from
http://www.unicode.org/Public/BETA/Unicode3.2/UnicodeData-3.2.0d8.txt
has the following. Note the use of parentheses for some (but not all)
abbreviated names, and that some of the names follow the updated ISO
6429 names, e.g. CHARACTER TABULATION instead of the better-known
HORIZONTAL TABULATION (HT).

0000;control;Cc;0;BN;N;NULL
0001;control;Cc;0;BN;N;START OF HEADING
0002;control;Cc;0;BN;N;START OF TEXT
0003;control;Cc;0;BN;N;END OF TEXT
0004;control;Cc;0;BN;N;END OF TRANSMISSION
0005;control;Cc;0;BN;N;ENQUIRY
0006;control;Cc;0;BN;N;ACKNOWLEDGE
0007;control;Cc;0;BN;N;BELL
0008;control;Cc;0;BN;N;BACKSPACE
0009;control;Cc;0;S;N;CHARACTER TABULATION
000A;control;Cc;0;B;N;LINE FEED (LF)
000B;control;Cc;0;S;N;LINE TABULATION
000C;control;Cc;0;WS;N;FORM FEED (FF)
000D;control;Cc;0;B;N;CARRIAGE RETURN (CR)
000E;control;Cc;0;BN;N;SHIFT OUT
000F;control;Cc;0;BN;N;SHIFT IN
0010;control;Cc;0;BN;N;DATA LINK ESCAPE
0011;control;Cc;0;BN;N;DEVICE CONTROL ONE
0012;control;Cc;0;BN;N;DEVICE CONTROL TWO
0013;control;Cc;0;BN;N;DEVICE CONTROL THREE
0014;control;Cc;0;BN;N;DEVICE CONTROL FOUR
0015;control;Cc;0;BN;N;NEGATIVE ACKNOWLEDGE
0016;control;Cc;0;BN;N;SYNCHRONOUS IDLE
0017;control;Cc;0;BN;N;END OF TRANSMISSION BLOCK
0018;control;Cc;0;BN;N;CANCEL
0019;control;Cc;0;BN;N;END OF MEDIUM
001A;control;Cc;0;BN;N;SUBSTITUTE
001B;control;Cc;0;BN;N;ESCAPE
001C;control;Cc;0;B;N;INFORMATION SEPARATOR FOUR
001D;control;Cc;0;B;N;INFORMATION SEPARATOR THREE
001E;control;Cc;0;B;N;INFORMATION SEPARATOR TWO
001F;control;Cc;0;S;N;INFORMATION SEPARATOR ONE
007F;control;Cc;0;BN;N;DELETE
0080;control;Cc;0;BN;N;
0081;control;Cc;0;BN;N;
0082;control;Cc;0;BN;N;BREAK PERMITTED HERE
0083;control;Cc;0;BN;N;NO BREAK HERE
0084;control;Cc;0;BN;N;
0085;control;Cc;0;B;N;NEXT LINE (NEL)
0086;control;Cc;0;BN;N;START OF SELECTED AREA
0087;control;Cc;0;BN;N;END OF SELECTED AREA
0088;control;Cc;0;BN;N;CHARACTER TABULATION SET
0089;control;Cc;0;BN;N;CHARACTER TABULATION WITH JUSTIFICATION
008A;control;Cc;0;BN;N;LINE TABULATION SET
008B;control;Cc;0;BN;N;PARTIAL LINE FORWARD
008C;control;Cc;0;BN;N;PARTIAL LINE BACKWARD
008D;control;Cc;0;BN;N;REVERSE LINE FEED
008E;control;Cc;0;BN;N;SINGLE SHIFT TWO
008F;control;Cc;0;BN;N;SINGLE SHIFT THREE
0090;control;Cc;0;BN;N;DEVICE CONTROL STRING
0091;control;Cc;0;BN;N;PRIVATE USE ONE
0092;control;Cc;0;BN;N;PRIVATE USE TWO
0093;control;Cc;0;BN;N;SET TRANSMIT STATE
0094;control;Cc;0;BN;N;CANCEL CHARACTER
0095;control;Cc;0;BN;N;MESSAGE WAITING
0096;control;Cc;0;BN;N;START OF GUARDED AREA
0097;control;Cc;0;BN;N;END OF GUARDED AREA
0098;control;Cc;0;BN;N;START OF STRING
0099;control;Cc;0;BN;N;
009A;control;Cc;0;BN;N;SINGLE CHARACTER INTRODUCER
009B;control;Cc;0;BN;N;CONTROL SEQUENCE INTRODUCER
009C;control;Cc;0;BN;N;STRING TERMINATOR
009D;control;Cc;0;BN;N;OPERATING SYSTEM COMMAND
009E;control;Cc;0;BN;N;PRIVACY MESSAGE
009F;control;Cc;0;BN;N;APPLICATION PROGRAM COMMAND

Personally, I think that this is error-prone, and the UTC would be far
better off instead putting the control code names in field 1, and
simply documenting that field 1 contains the character names for
non-control characters and the ISO 6429 names for control characters.
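
The field 10 fallback can be sketched as follows. This is only an illustration: it assumes the full semicolon-separated, 15-field UnicodeData.txt layout (in which control characters carry the placeholder "<control>" in field 1), and the helper name display_name, the sample lines, and the "<control-XXXX>" placeholder are hypothetical, not part of any Unicode or Perl API.

```python
# A few sample lines in the assumed 15-field UnicodeData.txt layout.
SAMPLE = """\
0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;
000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0080;<control>;Cc;0;BN;;;;;N;;;;;
"""


def display_name(line):
    """Return (code point, name), using field 10 for control characters."""
    fields = line.split(";")
    code = int(fields[0], 16)
    name = fields[1]       # field 1: the character name
    old_name = fields[10]  # field 10: Unicode 1.0 / ISO 6429 name
    if name == "<control>":
        # Fall back to field 10; some controls have no name there either.
        name = old_name if old_name else "<control-%04X>" % code
    return code, name


names = dict(display_name(line) for line in SAMPLE.splitlines())
for code, name in sorted(names.items()):
    print("U+%04X %s" % (code, name))
```

Any consumer still has to decide what to do with the controls whose field 10 is empty as well, such as 0080 above.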

Fewer people like yourselves would be unpleasantly surprised.

Mark

—

Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο 
πάντα — Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

- Original Message -
From: Jarkko Hietaniemi [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, February 03, 2002 11:03
Subject: names of the control characters


 A question: Perl offers a way to use 

Re: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur))

2002-02-03 Thread John Hudson

At 10:55 2/3/2002, Michael \(michka\) Kaplan wrote:

  Um, for italics one has to use a different font also. Many
  programs provide an italics button that activates the italic
  member of a font family, but this still involves selecting a
  separate font.

Au contraire, sir! Many fonts *do* have separate .TTF files for the
italic version, but there are just as many that do not, yet the italic option
does not find itself disabled in programs.

Ah. Those 'italics'. Those are not italics. Those are slanted romans. 
Sorry, I thought we were talking about typography.

In Adobe InDesign, the italic function is disabled if an italic font is not 
available. There is a separate control for slanting text, but it is not 
possible to accidentally produce a sloped roman in the absence of an italic 
font. This is how it should be.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

... es ist ein unwiederbringliches Bild der Vergangenheit,
das mit jeder Gegenwart zu verschwinden droht, die sich
nicht in ihm gemeint erkannte.

... every image of the past that is not recognized by the
present as one of its own concerns threatens to disappear
irretrievably.
   Walter Benjamin





Should I propose KARA?

2002-02-03 Thread ろ〇〇〇〇 ろ〇〇〇

I have been thinking of a character that the Japanese call 「から」。 That 
is, to use romaji, they call it "kara"/"kala". The glyph they usually use 
for this character is that of FULLWIDTH TILDE, but I don't know if it is 
really a tilde.
In horizontal writing, it looks like the first cycle of a sine wave - that 
is, up first and then down, then up. But maybe this is not always so and 
there are variations. I dunno.
Now -- and this is important -- in vertical writing, it is kind of pointed 
downwards, and is NOT the same as the given horizontal glyph rotated 90 
degrees. It is mirrored, I think, so it does not look like 「し」。
It is used to indicate ranges of numbers and such. Like in a list in 50音 
order, you see headings あ〜お、か〜こ、etc., sometimes.

Do not confuse KARA with the cute variant of KATAKANA-HIRAGANA PROLONGED 
SOUND MARK. They are as different functionally as DIGIT ZERO and LATIN 
CAPITAL LETTER O.

Should I propose it?
I could probably give you examples.

→ じゅういっちゃん ←
 だんせいらしさむよう



Re: Unicode and Security

2002-02-03 Thread Gaspar Sinai

On Sun, 3 Feb 2002, John Cowan wrote:

 Gaspar Sinai scripsit:

  The following page contains my view of Unicode
  BIDI algorithm (with screenshots).
 
  http://www.yudit.org/security/

 Oooo-kay.  This is not a Unicode problem per se: it is about
 embedded text vs. text that is not embedded.  The Yudit and
 IE versions are displaying a text (Java code) that is essentially in
 Latin script (LTR) with some RTL inclusions.  However, when
 the Java application actually runs, it displays three
 separate and distinct texts, each of which is an RTL text
 with some LTR inclusions.  They are assumed to be RTL
 text, by the bidi rules, because they begin with a strong
 RTL character.

 Similar things happen when you construct XML documents
 with RTL element names: the bidi rules, which are meant
 for true text and not computer-readable stuff, sometimes
 produce visually confusing results.

So it is perfectly OK? I can make a non-embedded example too.
I do not have time to make childish examples and screenshots
to get my point across. I have a job to do and text processing
is just my hobby.

The rendering problems are all side effects of the
Unicode bidi algorithm. If the Unicode bidi algorithm were
proven to be reversible (logical->display; display->logical),
I would not go to bed worrying about my signed documents.
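
What "not reversible" would mean can be sketched with a toy model. The function below is NOT the Unicode bidi algorithm: it assumes a left-to-right paragraph, reverses each run of RTL letters, and keeps logical order inside an LRO...PDF override, ignoring neutrals, numbers and nesting. The name toy_visual_order is hypothetical. Under those assumptions, two different logical strings come out in the same display order.

```python
import unicodedata

LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def toy_visual_order(logical):
    """Toy reordering for a left-to-right paragraph (not real UAX #9)."""
    out, run = [], []
    forced_ltr = False

    def flush():
        # Emit the pending RTL run in reversed (visual) order.
        out.extend(reversed(run))
        run.clear()

    for ch in logical:
        if ch == LRO:
            flush()
            forced_ltr = True
        elif ch == PDF:
            flush()
            forced_ltr = False
        elif not forced_ltr and unicodedata.bidirectional(ch) in ("R", "AL"):
            run.append(ch)
        else:
            flush()
            out.append(ch)
    flush()
    return "".join(out)


plain = "\u05d0\u05d1"                    # alef, bet in logical order
overridden = LRO + "\u05d1\u05d0" + PDF   # bet, alef forced left-to-right

print(plain != overridden)
print(toy_visual_order(plain) == toy_visual_order(overridden))
```

Both strings place bet to the left of alef, so the display alone does not determine which byte sequence was signed.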

That's my view of the problem.
Cheers
gaspar





Re: Unicode and Security

2002-02-03 Thread John Cowan

Gaspar Sinai scripsit:

 The following page contains my view of Unicode
 BIDI algorithm (with screenshots).
 
 http://www.yudit.org/security/

Oooo-kay.  This is not a Unicode problem per se: it is about
embedded text vs. text that is not embedded.  The Yudit and
IE versions are displaying a text (Java code) that is essentially in
Latin script (LTR) with some RTL inclusions.  However, when
the Java application actually runs, it displays three
separate and distinct texts, each of which is an RTL text
with some LTR inclusions.  They are assumed to be RTL
text, by the bidi rules, because they begin with a strong
RTL character.

Similar things happen when you construct XML documents
with RTL element names: the bidi rules, which are meant
for true text and not computer-readable stuff, sometimes
produce visually confusing results.
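
The "first strong character" rule mentioned above can be sketched like this. It is a simplification that only looks at bidi classes L, R and AL and ignores explicit embedding and isolate controls; the function name paragraph_direction and the sample strings are hypothetical.

```python
import unicodedata


def paragraph_direction(text, default="LTR"):
    """Guess the base direction from the first strong character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    # No strong character at all (digits, punctuation, empty text).
    return default


# A whole line of Java source starts with a Latin letter.
source_line = 'String s = "\u05d0\u05d1 abc";'
# The string literal taken on its own starts with a Hebrew letter.
literal_only = "\u05d0\u05d1 abc"

print(paragraph_direction(source_line))
print(paragraph_direction(literal_only))
```

The same characters thus get a different base direction depending on whether they are viewed inside the source line or as a separate text.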

-- 
John Cowan   http://www.ccil.org/~cowan  [EMAIL PROTECTED]
To say that Bilbo's breath was taken away is no description at all.  There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
--_The Hobbit_




Re: Unicode and Security

2002-02-03 Thread John Cowan

Gaspar Sinai scripsit:

 So it is perfectly OK? I can make a non-embedded example too.
 I do not have time to make childish examples and screenshots
 to get my point across. I have a job to do and text processing
 is just my hobby.

Mine too, but it's difficult to understand the merits of an
objection when no actual examples of the problem are given.

-- 
John Cowan   http://www.ccil.org/~cowan  [EMAIL PROTECTED]
To say that Bilbo's breath was taken away is no description at all.  There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
--_The Hobbit_




Re: Unicode and Security

2002-02-03 Thread Gaspar Sinai


On Sun, 3 Feb 2002, John Cowan wrote:

 Gaspar Sinai scripsit:

  So it is perfectly OK? I can make a non-embedded example too.
  I do not have time to make childish examples and screenshots
  to get my point across. I have a job to do and text processing
  is just my hobby.

 Mine too, but it's difficult to understand the merits of an
 objection when no actual examples of the problem are given.

So the common language is screenshots... OK. I updated the page.
Now the exact same file is viewed with two different viewers
at the bottom of this page:

  http://www.yudit.org/security/

I maintain my view that if there is no proven
reversible logical-to-viewed/viewed-to-logical mapping,
electronic signatures should be avoided.

And the bottom line is: I don't really care whether
Unicode will admit that this is a problem. If
my reasoning (not my screenshots) convinces
*some* people not to electronically sign Unicode
text, I think I did those guys good - and that
is enough satisfaction for me.

Cheers
gaspar






Re: Unicode and Security

2002-02-03 Thread David Starner

On Mon, Feb 04, 2002 at 02:25:05PM +0900, Gaspar Sinai wrote:
 And the bottom line is: I don't really care whether
 Unicode will admit that this is a problem. If
 my reasoning (not my screenshots) convinces
 *some* people not to electronically sign Unicode
 text, I think I did those guys good - and that
 is enough satisfaction for me.

Why not just warn against signing documents with bidi in them? Odds are,
people who would run into this, if warned against using Unicode, would
use ISO-8859-6/8 - which is often run through the same bidi algorithm.

And what if you don't do those guys good? They miss a multimillion
dollar account because they can't work with the client, or they fall for
something more common because they're worrying about Unicode?

-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing 
with the youth. -- Information Society, Peace and Love, Inc.