Re: Unicode and Security
At 02:15 PM 2/3/2002 +0900, you wrote: On Sat, 2 Feb 2002, David Starner wrote: [...several lines cut to save room...] I think I'm missing your perspective. To me, these are minor quirks. Why do you see them as huge problems? I am thinking about electronically signed Unicode text documents that are rendered correctly, or believed to be rendered correctly, yet look different, or seem to contain additional text or to be missing some text, when viewed with different viewers, due to some ambiguities inherent in the standard. An electronically signed document allows you to trust who wrote it, and that the *byte sequence* hasn't been tampered with. It implies nothing at all, trust-wise, about what software you should use to interpret it. You would go through the trouble to verify a signature, but trust the .doc extension and some machine's implementation of Word with your money? Makes no sense. That being said, identifying security issues of existing programs and/or protocols when they intersect with Unicode-based data is an important issue, and one I intend to cover regularly on www.i18n.com, once it launches this month. For those of you who have specific issues to write about, or are interested in providing a series of security-related articles (length and frequency TBD), please contact me off-list. I think there are endless examples already out there to provide, and I know of at least one that is serious. Let's find more! Best Regards, Barry Caplan www.i18n.com - coming soon, preview available now News | Tools | Process for Global Software Team I18N
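The point that a signature vouches for a byte sequence and not for what a viewer displays can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: a SHA-256 digest stands in for a real signature scheme, and the helper name `digest` and the sample strings are made up for this example. Two strings that most renderers draw identically have different bytes, so they would need different signatures unless both sides agree on a normalization form first.

```python
import hashlib
import unicodedata


def digest(text):
    """Return the SHA-256 hex digest of the UTF-8 bytes of text.

    Hypothetical stand-in for "the thing a signature covers".
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


composed = "caf\u00e9"     # e-acute as a single code point (U+00E9)
decomposed = "cafe\u0301"  # e followed by COMBINING ACUTE ACCENT (U+0301)

# The byte sequences differ, so the digests differ.
same_bytes = digest(composed) == digest(decomposed)

# After both are normalized to NFC, the byte sequences agree.
same_after_nfc = (
    digest(unicodedata.normalize("NFC", composed))
    == digest(unicodedata.normalize("NFC", decomposed))
)

print(same_bytes, same_after_nfc)
```

Nothing in the digest says anything about how either string is drawn on screen; that remains a property of the viewing software.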
Re: Unicode and Security
On Sun, 3 Feb 2002, Asmus Freytag wrote: The bidi algorithm is anything but vague. Any implementation can be rigorously tested against two reference implementations, to ensure fully compatible implementation. Sorry guys to be this short this time, but I kicked life into my Windows laptop and made an example for BIDI. That pretty much took my time away... The following page contains my view of the Unicode BIDI algorithm (with screenshots). http://www.yudit.org/security/ This page is not linked up anywhere yet - I just made it for this list. My apologies for being such a bastard - my nature is to be paranoid. Gaspar
Unicode-Afrique forum
Hi everyone, thought I'd pass on the info below. A French-language forum discussing the potential of Unicode for African languages has been launched. Details below. Andrew == Unicode-Afrique http://groups.yahoo.com/group/Unicode-Afrique/ Unicode probably represents the best chance to promote computing and Internet content in African languages. The current multiplicity of fonts and mutually incompatible encoding systems for special or non-Latin characters prevents true multilingualism in information and communication technologies (NTIC) in Africa (and the world). This e-group exists to: publicize projects in Africa that use Unicode; discuss practical questions and problems with Unicode and character sets for African languages; and share useful experience in developing and using Unicode fonts for African languages. It is therefore not in competition with the Unicode newsgroup "fr.comp.normes.unicode," nor with general discussion lists on information technology in Africa such as "afrique-informatique."
Re: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur))
- Original Message - From: Asmus Freytag [EMAIL PROTECTED] To: Karl Pentzlin [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: 31 January 2002 22:09 Subject: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur)) A more productive distinction would be along these lines: a) is the feature necessary for correctly expressing the content Yes. b) is the feature rule based, and Yes. b.1) is the rule implementable w/o knowledge of semantics, or No. c) when implementing the feature, is it necessary to c.1) provide scope information, or Yes. c.2) is local context sufficient No. Leaving out italics from a document can not only change the level of emphasis, but for example in English, there are occasional circumstances where the use of italics removes a possible ambiguity in interpreting a sentence. Nevertheless (except for mathematics) italics were left to a higher level protocol (style markup). Italics are better supported than Fraktur, as most word processors have an option for using italics with any font installed on the computer. For Fraktur one has to use a different font. No Fraktur font is widely installed on Windows computers, so it's almost impossible to use Fraktur text in any public document w/o using bitmaps to display the characters. Why was Fraktur supported for mathematics, but not for old Swedish/German/etc.? Stefan
Re: When to use markup:
At 16:35 +0100 2002-02-03, Stefan Persson wrote: Italics is better supported than Fraktur, as most word processors have an option for using italics with any font installed on the computer. For Fraktur one has to use a different font. There is no Fraktur font widely spread on all Windows computers or something like that, so it's almost impossible to use Fraktur text in any public document or similar w/o using bitmaps. Are you saying you don't have a Fraktur font? There are many available. See http://www.myfonts.com Why was Fraktur supported for mathematics, but not for old Swedish/German/etc.? Because a semantic distinction is made in mathematics between the single letter A and the single letter <fraktur>A</fraktur> -- a distinction which does not obtain in Fraktur as used -- Michael Everson *** Everson Typography *** http://www.evertype.com
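The distinction drawn here, that the mathematical Fraktur letter is a separate character while Fraktur for running text is a font choice, can be checked with a short Python sketch. It assumes Python's standard `unicodedata` module; the helper name `describe` is made up for this illustration.

```python
import unicodedata

plain_a = "A"
fraktur_a = "\U0001D504"  # the mathematical Fraktur capital A


def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


print(describe(plain_a))
print(describe(fraktur_a))

# The mathematical letter carries a compatibility (<font>) decomposition,
# so compatibility normalization folds it back to the plain letter.
folded = unicodedata.normalize("NFKC", fraktur_a)
print(folded == plain_a)
```

In plain text the two letters are distinct code points, which is what lets a formula keep them apart; ordinary German or Swedish text set in Fraktur uses the plain letters and leaves the style to the font.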
Re: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur))
At 07:35 2/3/2002, Stefan Persson wrote: Italics is better supported than Fraktur, as most word processors have an option for using italics with any font installed on the computer. For Fraktur one has to use a different font. Um, for italics one has to use a different font also. Many programs provide an italics button that activates the italic member of a font family, but this still involves selecting a separate font. There is no Fraktur font widely spread on all Windows computers or something like that, so it's almost impossible to use Fraktur text in any public document or similar w/o using bitmaps. There are plenty of Fraktur and other blackletter fonts available. Many of the best ones are available from Linotype in Germany. If you think that a Fraktur font should come installed on operating systems, you should petition your OS developer. I don't see that these font availability issues have anything to do with Unicode. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] ... es ist ein unwiederbringliches Bild der Vergangenheit, das mit jeder Gegenwart zu verschwinden droht, die sich nicht in ihm gemeint erkannte. ... every image of the past that is not recognized by the present as one of its own concerns threatens to disappear irretrievably. Walter Benjamin
Re: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur))
From: John Hudson [EMAIL PROTECTED] Um, for italics one has to use a different font also. Many programs provide an italics button that activates the italic member of a font family, but this still involves selecting a separate font. Au contraire, sir! Many fonts *do* have a separate .TTF file for the italic version, but there are just as many that do not, yet the italic option does not find itself disabled in programs. MichKa Michael Kaplan Trigeminal Software, Inc. -- http://www.trigeminal.com/
Re: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur))
At 10:25 AM 2/3/02, John Hudson wrote: Um, for italics one has to use a different font also. Many programs provide an italics button that activates the italic member of a font family, but this still involves selecting a separate font. And it would be simple to set up a font family so that Fraktur would be the normal state, and the italic button on the word processor would select a Roman member of the family (if you still needed sloped italics, those could be assigned to the bold italic slot). -- Curtis Clark http://www.csupomona.edu/~jcclark/ Mockingbird Font Works http://www.mockfont.com/
Re: names of the control characters
This has bitten more than a few people. For political reasons, having to do with the synchronization of names to ISO 10646, the name fields are empty for the control characters. That is because (at least in theory) people could have other semantics for those characters. Field 10 (called Unicode 1.0 Name) contains names for most of those characters, and should be used for your purpose. See, for example, http://www.unicode.org/Public/BETA/Unicode3.2/UnicodeData-3.2.0d1.html where it says: This is the old name as published in Unicode 1.0. This name is only provided when it is significantly different from the current name for the character. The value of field 10 for control characters does not always match the Unicode 1.0 names. Instead, field 10 contains ISO 6429 names for control functions, for printing in the code charts. Thus the data from http://www.unicode.org/Public/BETA/Unicode3.2/UnicodeData-3.2.0d8.txt has the following. Note the use of parentheses for some (but not all) abbreviated names, and that some of the names follow the updated ISO 6429 names, e.g. CHARACTER TABULATION instead of the better-known HORIZONTAL TABULATION (HT).
0000;control;Cc;0;BN;N;NULL
0001;control;Cc;0;BN;N;START OF HEADING
0002;control;Cc;0;BN;N;START OF TEXT
0003;control;Cc;0;BN;N;END OF TEXT
0004;control;Cc;0;BN;N;END OF TRANSMISSION
0005;control;Cc;0;BN;N;ENQUIRY
0006;control;Cc;0;BN;N;ACKNOWLEDGE
0007;control;Cc;0;BN;N;BELL
0008;control;Cc;0;BN;N;BACKSPACE
0009;control;Cc;0;S;N;CHARACTER TABULATION
000A;control;Cc;0;B;N;LINE FEED (LF)
000B;control;Cc;0;S;N;LINE TABULATION
000C;control;Cc;0;WS;N;FORM FEED (FF)
000D;control;Cc;0;B;N;CARRIAGE RETURN (CR)
000E;control;Cc;0;BN;N;SHIFT OUT
000F;control;Cc;0;BN;N;SHIFT IN
0010;control;Cc;0;BN;N;DATA LINK ESCAPE
0011;control;Cc;0;BN;N;DEVICE CONTROL ONE
0012;control;Cc;0;BN;N;DEVICE CONTROL TWO
0013;control;Cc;0;BN;N;DEVICE CONTROL THREE
0014;control;Cc;0;BN;N;DEVICE CONTROL FOUR
0015;control;Cc;0;BN;N;NEGATIVE ACKNOWLEDGE
0016;control;Cc;0;BN;N;SYNCHRONOUS IDLE
0017;control;Cc;0;BN;N;END OF TRANSMISSION BLOCK
0018;control;Cc;0;BN;N;CANCEL
0019;control;Cc;0;BN;N;END OF MEDIUM
001A;control;Cc;0;BN;N;SUBSTITUTE
001B;control;Cc;0;BN;N;ESCAPE
001C;control;Cc;0;B;N;INFORMATION SEPARATOR FOUR
001D;control;Cc;0;B;N;INFORMATION SEPARATOR THREE
001E;control;Cc;0;B;N;INFORMATION SEPARATOR TWO
001F;control;Cc;0;S;N;INFORMATION SEPARATOR ONE
007F;control;Cc;0;BN;N;DELETE
0080;control;Cc;0;BN;N;
0081;control;Cc;0;BN;N;
0082;control;Cc;0;BN;N;BREAK PERMITTED HERE
0083;control;Cc;0;BN;N;NO BREAK HERE
0084;control;Cc;0;BN;N;
0085;control;Cc;0;B;N;NEXT LINE (NEL)
0086;control;Cc;0;BN;N;START OF SELECTED AREA
0087;control;Cc;0;BN;N;END OF SELECTED AREA
0088;control;Cc;0;BN;N;CHARACTER TABULATION SET
0089;control;Cc;0;BN;N;CHARACTER TABULATION WITH JUSTIFICATION
008A;control;Cc;0;BN;N;LINE TABULATION SET
008B;control;Cc;0;BN;N;PARTIAL LINE FORWARD
008C;control;Cc;0;BN;N;PARTIAL LINE BACKWARD
008D;control;Cc;0;BN;N;REVERSE LINE FEED
008E;control;Cc;0;BN;N;SINGLE SHIFT TWO
008F;control;Cc;0;BN;N;SINGLE SHIFT THREE
0090;control;Cc;0;BN;N;DEVICE CONTROL STRING
0091;control;Cc;0;BN;N;PRIVATE USE ONE
0092;control;Cc;0;BN;N;PRIVATE USE TWO
0093;control;Cc;0;BN;N;SET TRANSMIT STATE
0094;control;Cc;0;BN;N;CANCEL CHARACTER
0095;control;Cc;0;BN;N;MESSAGE WAITING
0096;control;Cc;0;BN;N;START OF GUARDED AREA
0097;control;Cc;0;BN;N;END OF GUARDED AREA
0098;control;Cc;0;BN;N;START OF STRING
0099;control;Cc;0;BN;N;
009A;control;Cc;0;BN;N;SINGLE CHARACTER INTRODUCER
009B;control;Cc;0;BN;N;CONTROL SEQUENCE INTRODUCER
009C;control;Cc;0;BN;N;STRING TERMINATOR
009D;control;Cc;0;BN;N;OPERATING SYSTEM COMMAND
009E;control;Cc;0;BN;N;PRIVACY MESSAGE
009F;control;Cc;0;BN;N;APPLICATION PROGRAM COMMAND

Personally, I think that this is error-prone, and the UTC would be far better off instead putting the control code names in field 1, and simply documenting that field 1 contains the character names for non-control characters and the ISO 6429 names for control characters. Fewer people like yourselves would be unpleasantly surprised.

Mark — Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα — Ὁμήρου Μαργίτῃ [For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr] http://www.macchiato.com

- Original Message - From: Jarkko Hietaniemi [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Sunday, February 03, 2002 11:03 Subject: names of the control characters A question: Perl offers a way to use
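The fallback described in this message (use field 10 when field 1 carries no real name) can be sketched in Python. This is a minimal sketch, not the Perl implementation the question was about. It assumes the standard UnicodeData.txt layout of 15 semicolon-separated fields, with the name in field 1 given as `<control>` for control characters and the old name in field 10; the function name `display_names` and the four sample records are made up for the illustration.

```python
# A few records in the assumed 15-field UnicodeData.txt layout.
SAMPLE = """\
0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;
000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0080;<control>;Cc;0;BN;;;;;N;;;;;
"""


def display_names(text):
    """Map code point -> printable name, falling back to field 10.

    Returns None for a control character whose field 10 is empty.
    """
    names = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        code = int(fields[0], 16)
        name, old_name = fields[1], fields[10]
        if name == "<control>":
            # Field 1 is only a placeholder here; use field 10 if present.
            name = old_name or None
        names[code] = name
    return names


names = display_names(SAMPLE)
for code in sorted(names):
    print("%04X" % code, names[code])
```

A caller still has to decide what to do with entries such as 0080, where neither field supplies a name.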
Re: When to use markup: (Was:Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur))
At 10:55 2/3/2002, Michael (michka) Kaplan wrote: Um, for italics one has to use a different font also. Many programs provide an italics button that activates the italic member of a font family, but this still involves selecting a separate font. Au contraire, sir! Many fonts *do* have a separate .TTF file for the italic version, but there are just as many that do not, yet the italic option does not find itself disabled in programs. Ah. Those 'italics'. Those are not italics. Those are slanted romans. Sorry, I thought we were talking about typography. In Adobe InDesign, the italic function is disabled if an italic font is not available. There is a separate control for slanting text, but it is not possible to accidentally produce a sloped roman in the absence of an italic font. This is how it should be. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] ... es ist ein unwiederbringliches Bild der Vergangenheit, das mit jeder Gegenwart zu verschwinden droht, die sich nicht in ihm gemeint erkannte. ... every image of the past that is not recognized by the present as one of its own concerns threatens to disappear irretrievably. Walter Benjamin
Should I propose KARA?
I have been thinking of a character that the Japanese call 「から」。 That is, to use romaji, they call it "kara"/"kala". The glyph they usually use for this character is that of FULLWIDTH TILDE, but I don't know if it is really a tilde. In horizontal writing, it looks like the first cycle of a sine wave - that is, up first and then down, then up. But maybe this is not always so and there are variations. I dunno. Now -- and this is important -- in vertical writing, it is kind of pointed downwards, and is NOT the same as the given horizontal glyph rotated 90 degrees. It is mirrored, I think, so it does not look like 「し」。 It is used to indicate ranges of numbers and such. Like in a list in ５０音 order, you see headings あ～お、か～こ、etc., sometimes. Do not confuse KARA with the cute variant of KATAKANA-HIRAGANA PROLONGED SOUND MARK. They are as different functionally as DIGIT ZERO and LATIN CAPITAL LETTER O. Should I propose it? I could probably give you examples. →　じゅういっちゃん　←　だんせいらしさむよう
Re: Unicode and Security
On Sun, 3 Feb 2002, John Cowan wrote: Gaspar Sinai scripsit: The following page contains my view of Unicode BIDI algorithm (with screenshots). http://www.yudit.org/security/ Oooo-kay. This is not a Unicode problem per se: it is about embedded text vs. text that is not embedded. The Yudit and IE versions are displaying a text (Java code) that is essentially in Latin script (LTR) with some RTL inclusions. However, when the Java application actually runs, it displays three separate and distinct texts, each of which is an RTL text with some LTR inclusions. They are assumed to be RTL text, by the bidi rules, because they begin with a strong RTL character. Similar things happen when you construct XML documents with RTL element names: the bidi rules, which are meant for true text and not computer-readable stuff, sometimes produce visually confusing results. So it is perfectly ok? I can make a non-embedded example too. I do not have time to make childish examples and screenshots to get my point through. I have a job to do and text processing is just my hobby. The rendering problems are all side effects of the Unicode bidi algorithm. If the Unicode bidi algorithm were proven to be reversible (logical-to-display; display-to-logical), I would not go to bed worrying about my signed documents. That's my view of the problem. Cheers gaspar
Re: Unicode and Security
Gaspar Sinai scripsit: The following page contains my view of Unicode BIDI algorithm (with screenshots). http://www.yudit.org/security/ Oooo-kay. This is not a Unicode problem per se: it is about embedded text vs. text that is not embedded. The Yudit and IE versions are displaying a text (Java code) that is essentially in Latin script (LTR) with some RTL inclusions. However, when the Java application actually runs, it displays three separate and distinct texts, each of which is an RTL text with some LTR inclusions. They are assumed to be RTL text, by the bidi rules, because they begin with a strong RTL character. Similar things happen when you construct XML documents with RTL element names: the bidi rules, which are meant for true text and not computer-readable stuff, sometimes produce visually confusing results. -- John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED] To say that Bilbo's breath was taken away is no description at all. There are no words left to express his staggerment, since Men changed the language that they learned of elves in the days when all the world was wonderful. --_The Hobbit_
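The rule mentioned here, that a text is taken to be RTL when it begins with a strong RTL character, can be sketched in Python. This is a simplified illustration of the paragraph-level rules (P2 and P3 in UAX #9), not a bidi implementation: it only finds the first strong character and does no reordering at all. It assumes Python's standard `unicodedata` module; the function name `base_direction` and the sample strings are made up for the example.

```python
import unicodedata


def base_direction(text):
    """Guess the paragraph direction from the first strong character.

    Bidi classes L (left-to-right), R and AL (right-to-left) are strong;
    digits, spaces and punctuation are skipped. Defaults to LTR when no
    strong character is found.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return "LTR"


# Java-like source: starts with a Latin letter, so the whole line is LTR.
source_line = 'String s = "\u05e9\u05dc\u05d5\u05dd abc";'
# The string the program displays: starts with a Hebrew letter, so RTL.
displayed = "\u05e9\u05dc\u05d5\u05dd abc"

print(base_direction(source_line))
print(base_direction(displayed))
```

The same characters are thus laid out from a different starting direction depending on whether they are viewed inside the source line or on their own, which is the embedded versus non-embedded difference the message describes.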
Re: Unicode and Security
Gaspar Sinai scripsit: So it is perfectly ok? I can make a non-embedded example too. I do not have time to make childish examples and screenshots to get through my point. I have a job to do and text processing is just my hobby. Mine too, but it's difficult to understand the merits of an objection when no actual examples of the problem are given. -- John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED] To say that Bilbo's breath was taken away is no description at all. There are no words left to express his staggerment, since Men changed the language that they learned of elves in the days when all the world was wonderful. --_The Hobbit_
Re: Unicode and Security
On Sun, 3 Feb 2002, John Cowan wrote: Gaspar Sinai scripsit: So it is perfectly ok? I can make a non-embedded example too. I do not have time to make childish examples and screenshots to get through my point. I have a job to do and text processing is just my hobby. Mine too, but it's difficult to understand the merits of an objection when no actual examples of the problem are given. So the common language is screenshots... Ok. I updated the page. Now the exact same file is viewed with two different viewers at the bottom of this page: http://www.yudit.org/security/ I maintain my view that if there is no proven reversible logical-to-viewed/viewed-to-logical mapping, electronic signatures should be avoided. And the bottom line is: I don't really care if Unicode will admit that this is a problem. If my reasoning (not my screenshots) convinces *some* people not to electronically sign Unicode text, I think I did those guys good - and that is enough satisfaction for me. Cheers gaspar
Re: Unicode and Security
On Mon, Feb 04, 2002 at 02:25:05PM +0900, Gaspar Sinai wrote: And the bottom line is: I don't really care if Unicode will admit that this is a problem. If my reasoning (not my screenshots) convinces *some* people not to electronically sign Unicode text, I think I did those guys good - and that is enough satisfaction for me. Why not just warn against signing documents with bidi in them? Odds are, people who would run into this, if warned against using Unicode, would use ISO-8859-6/8 - which is often run through the same bidi algorithm. And what if you don't do those guys good? They miss a multimillion dollar account because they can't work with the client, or they fall for something more common because they're worrying about Unicode? -- David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber) Pointless website: http://dvdeug.dhis.org What we've got is a blue-light special on truth. It's the hottest thing with the youth. -- Information Society, Peace and Love, Inc.