Re: About the European MES-2 subset
. Michael Everson wrote, I wasn't talking about that, but if you'd like my opinion, I hate that J too. Apathy, intolerance, bigotry, death, taxation, ignorance, oppression... Surely we can reserve our hatred for targets more worthy than a colleague's variant glyph preferences. Regards, James Kass .
Re: Last Resort Glyphs (was: About the European MES-2 subset)
Philippe Verdy wrote on 07/20/2003 08:37:19 AM: What would be the purpose of encoding these? I can't think of any. They certainly don't need to be encoded as distinct characters to use in a Last Resort font. Mostly for documentation purpose Since Unicode is not a glyph encoding standard, there's no need for it to assign glyphs to codepoints for documentation purposes. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485
RE: About the European MES-2 subset
This is not to say that the MESes are unproblematic. To mention just two points not already mentioned: none of the new math characters are included even in MES-3 (a, b), despite that all math characters were supposed to be included Michael E responded: That isn't true. Eeh, well, disregarding some CJK compat chars that have general category Sm (which are rightly excluded from the MESes), the following blocks (or formally, closely corresponding collections) are missing from MES-3A (the largest of the MESes): 27C0..27EF; Miscellaneous Mathematical Symbols-A 27F0..27FF; Supplemental Arrows-A 2900..297F; Supplemental Arrows-B 2980..29FF; Miscellaneous Mathematical Symbols-B 2A00..2AFF; Supplemental Mathematical Operators 2B00..2BFF; Miscellaneous Symbols and Arrows and (much as I dislike them, and they haven't GC Sm but L{u,l}) 1D400..1D7FF; Mathematical Alphanumeric Symbols (MES-3A lists collections rather than individual characters, and includes some code points that are not (yet) bound to any character.) But are you saying that it was not the intent to include all math characters? But all the old ones (the ones that were included in 10646 at the time the MESes were devised) are included even in the smaller MES-2. and not even MES-3 covers all official minority languages. What's missing? Hebrew, used for Yiddish, which is now an official minority language in Sweden. (Though various languages written with the Arabic script are more common in official information to the public.) But I understand that was excluded since (in practice) anything bidi was excluded from the MESes. Also of European interest, though not for a language per se, are Braille patterns and modern musical symbols. (Not for all European fonts, though, but the same goes for math symbols.) /kent k
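As a rough illustration of the block list in the message above, the ranges can be put into a small lookup table. This is only a sketch: the table copies the ranges exactly as quoted in the message, the function name missing_block is made up for the example, and the general categories come from whatever Unicode version the Python interpreter ships with.

```python
import unicodedata

# Block ranges quoted in the message above as missing from MES-3A.
MISSING_FROM_MES_3A = [
    (0x27C0, 0x27EF, "Miscellaneous Mathematical Symbols-A"),
    (0x27F0, 0x27FF, "Supplemental Arrows-A"),
    (0x2900, 0x297F, "Supplemental Arrows-B"),
    (0x2980, 0x29FF, "Miscellaneous Mathematical Symbols-B"),
    (0x2A00, 0x2AFF, "Supplemental Mathematical Operators"),
    (0x2B00, 0x2BFF, "Miscellaneous Symbols and Arrows"),
    (0x1D400, 0x1D7FF, "Mathematical Alphanumeric Symbols"),
]


def missing_block(ch):
    """Return the name of the listed block containing ch, or None.

    Hypothetical helper for illustration; it only knows the ranges above.
    """
    cp = ord(ch)
    for first, last, name in MISSING_FROM_MES_3A:
        if first <= cp <= last:
            return name
    return None


if __name__ == "__main__":
    for ch in ("\u2200", "\u2A00", "\U0001D400"):
        # Print code point, general category, and the listed block (if any).
        print(f"U+{ord(ch):04X}", unicodedata.category(ch), missing_block(ch))
```

U+2200 lies in the older Mathematical Operators block, so the lookup returns None for it, while U+1D400 shows the point made in the message: it sits in a math block but carries a letter category rather than Sm.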
Re: Last Resort Glyphs (was: About the European MES-2 subset)
At 23:34 +0200 2003-07-19, Philippe Verdy wrote: I'm still convinced that these glyphs are much more informative than a default glyph showing a ?, a white rectangle, or a black lozenge with a mirrored white ?... Of course they are. And Unicode also uses these glyphs in the index page for its charmaps, You mean for its charts. Please. but they are shown as poor bitmaps (maybe the PDF or book version use your glyphs in a document-embedded font) That page is in HTML. How were your glyphs contributed? I, uh, drew them. With SVG graphics containing character objects and drawing primitives I have no idea what this means. I used Fontographer. (it seems the simplest way to derive them, using the table shown in Apple's web page, with some exceptions for unassigned, reserved, forbidden or surrogates symbols which require a distinct design)? You can't derive these. You have to draw them individually. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Last Resort Glyphs (was: About the European MES-2 subset)
On Sunday, July 20, 2003 2:21 PM, Michael Everson [EMAIL PROTECTED] wrote: With SVG graphics containing character objects and drawing primitives I have no idea what this means. I used Fontographer. SVG is a W3C-promoted standard for Scalable Vector Graphics, based on an XML language, which allows describing vector graphics with 2D primitives, and it can be used to produce custom fonts of symbols, in a more appealing way than with bitmaps. An SVG graphic can be used as the source URL of an img or object element within HTML. Most vector graphics tools can generate SVG or convert their proprietary formats to it, so it serves as a lingua franca for vector graphics interchange (deprecating legacy proprietary formats like MacDraw and WMF, or the many other formats created by every drawing tool on the market). SVG graphics are now very popular and recognized by many publishing layout engines, and they are great for many websites that wish to compute and generate dynamic graphics (because these graphics can be updated online through their DOM tree, and easily generated from templates by XSLT processors). The palette of SVG primitives is rich and includes many presentation features (including colors, shading, transparency effects, region-combining operators). Recent versions of MS-Office use SVG within their new XML document format to embed graphics, or presentation effects, without the limitations of HTML. When I look at Apple's Developer page, all that I see in the table of glyphs and in the description can be represented with an SVG graphic, including Unicode-encoded text primitives for the representative glyph chosen in their table. In a first approach, each defined PostScript name can be bound to an SVG filename, and a font can be made from it, by packing all these SVG files in a ZIP archive, which can also contain description tables. Then any font format can be derived from this editable format.
Re: Last Resort Glyphs (was: About the European MES-2 subset)
Philippe Verdy wrote on 07/19/2003 01:24:48 PM: Isn't this page creating the idea for a specific block of script-representative glyphs, that could be mapped in plane 14 as special supplementary characters ? What would be the purpose of encoding these? I can't think of any. They certainly don't need to be encoded as distinct characters to use in a Last Resort font. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485
Re: About the European MES-2 subset
On Windows, the cannot find a font for it situation is the NULL glyph. The Last Resort font is cool but a Code2000 stab at the actual glyph is (IMHO) cooler than both.:-) Then wouldn't it make sense for Arial Unicode MS to be included with Windows rather than just with Office? - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485
Re: Last Resort Glyphs (was: About the European MES-2 subset)
On Sunday, July 20, 2003 3:20 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Philippe Verdy wrote on 07/19/2003 01:24:48 PM: Isn't this page creating the idea for a specific block of script-representative glyphs, that could be mapped in plane 14 as special supplementary characters ? What would be the purpose of encoding these? I can't think of any. They certainly don't need to be encoded as distinct characters to use in a Last Resort font. Mostly for documentation purpose, but also in most system that want to be more informative to users missing a font for a particular script. Michael also judged it to be useful enough to create such a font for Apple, and Apple thought it would be useful for its Mac users. From usefulness comes the use, and thus some legitimacy to encode it within text, as special symbols that should not be represented as the normal glyph, but with these symbols. It's also a fact that these symbols are used (as bitmaps) in the online Unicode charts (not charmaps, sorry for the wrong term), and probably with the Michael's custom font in the published Unicode book. It's true that one can make a documentation without actually using a font with assigned codepoints for them. (A collection of SVG graphics could work for publishing purposes). But editing the cmap of a TrueType font to include all possible codepoints would require mapping all the 17 planes in the cmap, and unless the cmap is compressed, this would require 1,114,112 mappings, or more than 2MB only for the cmap. This is probably too much for a default font, even if the system uses paging to access this TrueType font. In fact, a font with only the single glyphs ordered by allocation date for the corresponding block, and an extra cmap-like table using ranges of codepoints instead of simple entries would probably make things better (of course this would be an extension to the standard tables used by classic fonts).
Without such a TTF extension, it would be simpler to map only surrogates, and thus use only 128KB for a UTF-16 based cmap. I don't know the internals of the OpenType format; maybe such a compressed format for internal tables already exists that allows representing ranges, or there is space with table IDs allowed for application-specific custom tables.
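The arithmetic in the message above, and the range-based table it asks for, can be sketched in a few lines. The sizes assume a flat table with one 2-byte glyph ID per code point, which is an assumption for illustration and not a description of any real font file. The range table, its glyph names, and the function glyph_for are all hypothetical. For what it is worth, the OpenType cmap table does define segmented subtable formats (format 12, and format 13 for many-to-one range mappings), which is the general idea sketched here; the code below does not attempt to reproduce their binary layout.

```python
from bisect import bisect_right

PLANES = 17
CODE_POINTS = PLANES * 0x10000   # 1,114,112 code points in 17 planes
FLAT_BYTES = CODE_POINTS * 2     # flat table, assuming a 2-byte glyph ID each
BMP_ONLY_BYTES = 0x10000 * 2     # 16-bit code units only: 131,072 bytes = 128 KiB

# Hypothetical range table: (first, last, glyph name), sorted by first.
# A real Last Resort style font would carry one entry per block.
RANGES = [
    (0x0000, 0x007F, "basic_latin"),
    (0x0370, 0x03FF, "greek"),
    (0x0400, 0x04FF, "cyrillic"),
    (0x0590, 0x05FF, "hebrew"),
]
STARTS = [first for first, _, _ in RANGES]


def glyph_for(cp):
    """Map a code point to its block glyph with a binary search over ranges."""
    i = bisect_right(STARTS, cp) - 1
    if i >= 0 and RANGES[i][0] <= cp <= RANGES[i][1]:
        return RANGES[i][2]
    return "undefined"


if __name__ == "__main__":
    print(CODE_POINTS, FLAT_BYTES, BMP_ONLY_BYTES)
    print(glyph_for(0x41), glyph_for(0x3B1), glyph_for(0x100))
```

With ranges, the table grows with the number of blocks rather than the number of code points, so a handful of entries replaces the two-megabyte flat array.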
Re: Last Resort Glyphs (was: About the European MES-2 subset)
At 08:20 -0500 2003-07-20, [EMAIL PROTECTED] wrote: What would be the purpose of encoding these? I can't think of any. They certainly don't need to be encoded as distinct characters to use in a Last Resort font. I am certain more people want to interchange the LITTER DUDE than would want to interchange script block indicators. (Ken suggested offline that this name might be better-received than the DO NOT LITTER SIGN) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: About the European MES-2 subset
Well, I thought Arial Unicode MS is a little pricey for just putting it anywhere? I may be wrong here (and I have no idea how much it costs, really), but the huge size compared to megafonts like Code2000, which is based in part on the rich Arial typeface heritage, also makes it a font of some value and a legitimate value add where it is... Of course, all of this is IMHO, as I have no real knowledge of what Office or even nearby Typography think about any of these things MichKa [MS] - Original Message - From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Sunday, July 20, 2003 6:20 AM Subject: Re: About the European MES-2 subset On Windows, the cannot find a font for it situation is the NULL glyph. The Last Resort font is cool but a Code2000 stab at the actual glyph is (IMHO) cooler than both.:-) Then wouldn't it make sense for Arial Unicode MS to be included with Windows rather than just with Office? - Peter -- - Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485
Re: Last Resort Glyphs (was: About the European MES-2 subset)
On Saturday, July 19, 2003, at 1:15 PM, Michael Everson wrote: So fonts containing these glyphs could be designed to display these glyphs, in a way similar to the current assignment of control pictures. Um, that's what the Last Resort font does, outside of Unicode encoding space. (I don't think PUA characters are used, actually, but I could be wrong. No, it uses the actual Unicode characters, and just has a huge cmap that maps everything in Unicode to the glyph for its block. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Last Resort Glyphs (was: About the European MES-2 subset)
On Sunday, July 20, 2003, at 7:37 AM, Philippe Verdy wrote: Mostly for documentation purpose, but also in most system that want to be more informative to users missing a font for a particular script. Michael also judged it to be useful enough to create such a font for Apple, and Apple thought it would be useful for its Mac users. Er, no. Apple thought it would be useful for its Mac users and commissioned Michael to make glyphs. (And I personally think he's done an excellent job.) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: About the European MES-2 subset
On Friday, July 18, 2003, at 4:45 PM, Michael (michka) Kaplan wrote: A question mark is a sign of a bad conversion from Unicode (to a code page that did not contain the character). This would likely happen on the Mac too rather than the Last Resort font, wouldn't it? MS Explorer on the Mac converts Unicode to old Mac scripts which it then renders. That's why all the question marks when the page is looked at with MS Explorer. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Last Resort Glyphs (was: About the European MES-2 subset)
What would be the purpose of encoding these? I can't think of any. They certainly don't need to be encoded as distinct characters to use in a Last Resort font. Mostly for documentation purpose, Why bother to encode them as distinct characters? For purposes of documentation isn't a good reason to encode these things, which are simply a set of fall-back glyphs for user convenience to show what isn't installed! If you want documentation for the Last Resort font, just make documentation (or ask Apple to make some). Rick
Re: Last Resort Glyphs (was: About the European MES-2 subset)
At 09:56 -0600 2003-07-20, John H. Jenkins wrote: No, it uses the actual Unicode characters, and just has a huge cmap that maps everything in Unicode to the glyph for its block. That is just so cool. :-) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: About the European MES-2 subset
On 19/07/2003 17:32, John Cowan wrote: Peter Kirk scripsit: But it can be useful to know whether what you are getting is hangul etc, or an Indian script, or some other script you don't know, or some symbols or mathematical codes, or else the result of some kind of encoding conversion error. Precisely where the Last Resort font shines, without carrying the overhead in glyph images of a normal giant font. Indeed. Where can I get the Last Resort font for Windows (2000)? If the answer is nowhere, I guess I am stuck with Arial Unicode MS or the horrible-looking (the J always grates!) Code2000. -- Peter Kirk [EMAIL PROTECTED] http://web.onetel.net.uk/~peterkirk/
Re: Last Resort Glyphs (was: About the European MES-2 subset)
On 20/07/2003 06:20, [EMAIL PROTECTED] wrote: Philippe Verdy wrote on 07/19/2003 01:24:48 PM: Isn't this page creating the idea for a specific block of script-representative glyphs, that could be mapped in plane 14 as special supplementary characters ? What would be the purpose of encoding these? I can't think of any. They certainly don't need to be encoded as distinct characters to use in a Last Resort font. - Peter One good reason would be so that a page like http://www.unicode.org/charts/ can be represented without having to use lots of .gifs, so for efficiency, searchability etc. Which is pretty much the same reason for defining any Unicode characters at all, given that documents and web pages can always be created, though inefficiently and unsearchably, from lots of images. -- Peter Kirk [EMAIL PROTECTED] http://web.onetel.net.uk/~peterkirk/
Re: About the European MES-2 subset
At 12:38 -0700 2003-07-20, Peter Kirk wrote: Indeed. Where can I get the Last Resort font for Windows (2000)? If the answer is nowhere, I guess I am stuck with Arial Unicode MS or the horrible-looking (the J always grates!) Code2000. I'll go have a chat with some of my Apple colleagues about this. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: About the European MES-2 subset
At 12:38 -0700 2003-07-20, Peter Kirk wrote: Indeed. Where can I get the Last Resort font for Windows (2000)? If the answer is nowhere, I guess I am stuck with Arial Unicode MS or the horrible-looking (the J always grates!) Code2000. I'll go have a chat with some of my Apple colleagues about this. It's unlikely that your Apple colleagues can do anything for the J in Code2000. Best regards, James Kass .
Re: About the European MES-2 subset
At 20:50 + 2003-07-20, [EMAIL PROTECTED] wrote: At 12:38 -0700 2003-07-20, Peter Kirk wrote: Indeed. Where can I get the Last Resort font for Windows (2000)? If the answer is nowhere, I guess I am stuck with Arial Unicode MS or the horrible-looking (the J always grates!) Code2000. I'll go have a chat with some of my Apple colleagues about this. It's unlikely that your Apple colleagues can do anything for the J in Code2000. I wasn't talking about that, but if you'd like my opinion, I hate that J too. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: About the European MES-2 subset
On 20/07/2003 13:50, [EMAIL PROTECTED] wrote: At 12:38 -0700 2003-07-20, Peter Kirk wrote: Indeed. Where can I get the Last Resort font for Windows (2000)? If the answer is nowhere, I guess I am stuck with Arial Unicode MS or the horrible-looking (the J always grates!) Code2000. I'll go have a chat with some of my Apple colleagues about this. It's unlikely that your Apple colleagues can do anything for the J in Code2000. Best regards, James Kass . James, just to clarify since you are here: I am very grateful for the fonts Code2000 and Code2001 and that you have made these so easily available at http://home.att.net/~jameskass/. I don't like some of the glyph shapes, especially the J with a cross-bar like a T. But it is a lot better than nothing. When I need nice glyphs for particular Unicode ranges, I look elsewhere, though sometimes in vain. For example, who else even tries to cover the mathematical symbols in plane 1, at least in a downloadable font? -- Peter Kirk [EMAIL PROTECTED] http://web.onetel.net.uk/~peterkirk/
Re: About the European MES-2 subset
On 18/07/2003 17:42, John Cowan wrote: Seeing hanzi, hangeul, etc. gets old when you a) can't read the text and b) suspect it is spam anyhow. But it can be useful to know whether what you are getting is hangul etc, or an Indian script, or some other script you don't know, or some symbols or mathematical codes, or else the result of some kind of encoding conversion error. -- Peter Kirk [EMAIL PROTECTED] http://web.onetel.net.uk/~peterkirk/
Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)
On Friday, July 18, 2003 10:18 PM, Michael Everson [EMAIL PROTECTED] wrote: I *prefer* Unicode to any subset thereof. Why such preference? Unicode does not define the charset (which is defined by ISO10646), but character properties and related algorithms, and (in cooperation with ISO10646) their codepoint assignments. For me, Unicode is NOT a character set, but an encoded character set, with a small but important nuance: You need to specify a version after Unicode to indicate the character set. So Unicode 4.0 is a character set, and a superset of Unicode 3.2, but Unicode alone is not. If you just look at this definition, you cannot prefer Unicode to any subset, because Unicode is just a name of a collection of standards and a collection of character sets and algorithms, and already is a subset of the next version... If you cannot support the idea of subsets, then don't use Unicode, or wait until the Unicode standard is definitely closed, or permanently consider that its repertoire is now closed and no more characters will be added... Of course you would be wrong. MES-2 or its MES extension is a character set (like most legacy encodings in IANA which are also encoded character sets). In practice, nobody can live and implement any software without clearly bounded sets of characters. So versioning is absolutely necessary to fix these bounds in terms of implementation levels. -- Philippe. Unsolicited mail is not tolerated: any unsolicited message will be reported to your Internet service providers.
Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)
At 15:23 +0200 2003-07-19, Philippe Verdy wrote: Unicode does not define the charset (which are defined by ISO10646), That isn't true. They both define the same character set. (I will not use the term charset.) but character properties and related algorithms, and (in cooperation with ISO10646) their codepoint assignments. The code position assignments are (formally) assigned by WG2, but there is consensus between UTC and WG2 on this matter. For me, Unicode is NOT a character set, but an encoded character set, with a small but important nuance: You need to specify a version after Unicode to indicate the character set. So Unicode 4.0 is a character set, and a superset of Unicode 3.2, but Unicode alone is not. To me, Unicode refers to the most recent version. :-) If you just look at this definition, you cannot prefer Unicode to any subset, Yes, I can. because Unicode is just a name of a collection of standards and a collection of character sets and algorithms That isn't true. If you think this is true, you really have a lot to learn about Unicode. and already is a subset of the next version... If you cannot support the idea of subsets, then don't use Unicode, or wait that the Unicode standard is definitely closed, or permanently consider that is repertoire is now closed and no more characters will be added... Of course you would be wrong. I think you mistook me. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: About the European MES-2 subset
At 16:41 -0700 2003-07-18, Michael \(michka\) Kaplan wrote: I am pretty sure you have to be wrong here, Michael. Attend me: 1) API converts from Unicode to the wrong code page 2) API does some sort of work with the string 3) API tries to display the string How on earth could it from the Last Resort font, unless it is a generic glyph that contains no script info (which would be no better than a question mark or a NULL glyph) ? Hm. See http://developer.apple.com/fonts/LastResortFont/ where it shows glyphs for illegal characters (FFFE/ etc.) as well as undefined characters (valid code positions which have not been assigned). I thought somehow that there was a glyph for broken characters (characters that were just plain wrong) as well. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Last Resort Glyphs (was: About the European MES-2 subset)
On Saturday, July 19, 2003 1:55 PM, Michael Everson [EMAIL PROTECTED] wrote: Hm. See http://developer.apple.com/fonts/LastResortFont/ where it shows glyphs for illegal characters (FFFE/ etc.) as well as undefined characters (valid code positions which have not been assigned). I thought somehow that there was a glyph for broken characters (characters that were just plain wrong) as well. Isn't this page creating the idea for a specific block of script-representative glyphs, that could be mapped in plane 14 as special supplementary characters ? If the estimated number of Unicode blocks is expected to be under 1024, this block would use one special character to represent the glyph, i.e. not a control character, but a symbol representative of each assigned Unicode block. If such assignment is not easy to estimate now, glyphs for scripts should be assigned in the order of their definition in successive versions of Unicode). So fonts containing these glyphs could be designed to display these glyphs, in a way similar to the current assignment of control pictures. This page already gives the names of the characters according to the official names of scripts, but a more uniform name than the Postscript name could be used, such as: UNASSIGNED BLOCK SYMBOL, UNASSIGNED CHARACTER SYMBOL, ILLEGAL CHARACTER SYMBOL, then... BASIC LATIN SCRIPT SYMBOL, EXTENDED LATIN 1 SCRIPT SYMBOL, ... By itself, this Apple Developers page is nearly the base for such proposal. If needed, the Unicode blocks.txt could specify additional columns to specify the assignment of each script block, with special entries for the symbol used to represent unassigned characters in assigned blocks, or unassigned blocks. -- Philippe. Spams non tolérés: tout message non sollicité sera rapporté à vos fournisseurs de services Internet.
Re: Last Resort Glyphs (was: About the European MES-2 subset)
At 20:24 +0200 2003-07-19, Philippe Verdy wrote: Isn't this page creating the idea for a specific block of script-representative glyphs, that could be mapped in plane 14 as special supplementary characters ? Good heavens, no. It's one thing for me to update this font regularly for Apple when new blocks get added to the standard. It's quite another thing to suggest that we should have to add, formally, a new block symbol to some block in Plane 14 every time we add a new block to the standard. Isn't it? Surely the correct thing to do is to implement Last Resort support for different platforms as Apple indicates using those character names. So fonts containing these glyphs could be designed to display these glyphs, in a way similar to the current assignment of control pictures. Um, that's what the Last Resort font does, outside of Unicode encoding space. (I don't think PUA characters are used, actually, but I could be wrong. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Last Resort Glyphs (was: About the European MES-2 subset)
On Saturday, July 19, 2003 9:15 PM, Michael Everson [EMAIL PROTECTED] wrote: So fonts containing these glyphs could be designed to display these glyphs, in a way similar to the current assignment of control pictures. Um, that's what the Last Resort font does, outside of Unicode encoding space. (I don't think PUA characters are used, actually, but I could be wrong. I see that Apple maps it to a PostScript dictionary namespace, but this seems limiting for the implementation, when almost all foundries are now converting their Type1 fonts to OpenType, which is much more efficient, but still requires some entry point with a numeric assignment (a glyph ID will still require an input codepoint to seek relevant glyphs, and a PUA still requires a table of conversion from ranges to that font-specific PUA, and a TrueType font not marked as Unicode compatible would use direct glyph IDs from an externally defined character set similar to legacy charsets, except that they can't be mapped to Unicode). I'm still convinced that these glyphs are much more informative than a default glyph showing a ?, a white rectangle, or a black lozenge with a mirrored white ?... And Unicode also uses these glyphs in the index page for its charmaps, but they are shown as poor bitmaps (maybe the PDF or book version use your glyphs in a document-embedded font) How were your glyphs contributed? With SVG graphics containing character objects and drawing primitives (it seems the simplest way to derive them, using the table shown in Apple's web page, with some exceptions for unassigned, reserved, forbidden or surrogates symbols which require a distinct design)? -- Philippe. Unsolicited mail is not tolerated: any unsolicited message will be reported to your Internet service providers.
Re: Last Resort Glyphs (was: About the European MES-2 subset)
Apple's version of the Last Resort font is a (relatively) normal font. It just has a cmap that maps lots and lots of characters to the same glyph. :-) Deborah Goldsmith Manager, Fonts / Unicode Liaison Apple Computer, Inc. [EMAIL PROTECTED] On Saturday, July 19, 2003, at 12:15 PM, Michael Everson wrote: Um, that's what the Last Resort font does, outside of Unicode encoding space. (I don't think PUA characters are used, actually, but I could be wrong.
Re: About the European MES-2 subset
Peter Kirk scripsit: But it can be useful to know whether what you are getting is hangul etc, or an Indian script, or some other script you don't know, or some symbols or mathematical codes, or else the result of some kind of encoding conversion error. Precisely where the Last Resort font shines, without carrying the overhead in glyph images of a normal giant font. -- May the hair on your toes never fall out! John Cowan --Thorin Oakenshield (to Bilbo) [EMAIL PROTECTED]
Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)
At 00:57 +0200 2003-07-18, Philippe Verdy wrote: Why is row 03 so restricted? Shouldn't it include those accents and diacritics that are used by other characters once canonically decomposed? Or does it imply that MES-2 is only supposed to use strings in NFC form? Also, is this list under full closure with existing character properties, like NFKD decompositions, and case mappings? The MES-2 is what it is, and was developed at the time when it was. It is thought to be a minimum requirement for European requirements, and is certainly a lot better than that old Adobe glyph list that was supported earlier on. It doesn't depend on very smart fonts. Personally I prefer the Multilingual European Subset. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)
On Friday, July 18, 2003 7:36 AM, Michael Everson [EMAIL PROTECTED] wrote: At 00:57 +0200 2003-07-18, Philippe Verdy wrote: Why is row 03 so restricted? Shouldn't it include those accents and diacritics that are used by other characters once canonically decomposed? Or does it imply that MES-2 is only supposed to use strings in NFC form? Also, is this list under full closure with existing character properties, like NFKD decompositions, and case mappings? The MES-2 is what it is, and was developed at the time when it was. It is thought to be a minimum requirement for European requirements, and is certainly a lot better than that old Adobe glyph list that was supported earlier on. It doesn't depend on very smart fonts. Personally I prefer the Multilingual European Subset. Is there some work at CEN to align its MES-2 subset into a revised (MES-2.1 ???) which not only takes into consideration the ISO10646 reference but also its Unicode properties to make this set self-closed, and actually implementable, at least with NFC closure and case-mappings closure? Support for NFKC closure should then be added in a next step, which could optionally specify support for the corresponding decompositions (but this would include combining characters, and would extend the number of precomposed characters in NFC form to include in the repertoire). I don't think it's up to Unicode to do this work, but CEN should be contacted to perform this job, or some vendor or open-sourcers may have done it and published it. I still note that modern Hebrew and Arabic are excluded from MES-2, as they are not used in any official language in the European Union or EFTA, or future EU candidates. But they are certainly of great interest for countries with which the EU is a major partner, and which are using these scripts.
In some future, it would be needed to include support for modern Georgian (a subset of U+10A0..U+10FF), and modern Armenian (a subset of U+0530..U+058F), as well as some characters from Cyrillic Supplementary (in U+0500..U+052F). On the opposite, I don't understand why MES-2 included characters in row U+25xx (Box Drawing, Block Elements, Geometric Shapes), which are not strictly needed for text purpose (notably legal publications of the E.U., which should better use markup systems), and the two Alphabetic Presentation Forms U+FB01..U+FB02 (fi and fl ligatures) which are really unneeded, even for legal purposes, or they should have been coherent and included ff, ffi, ffl ligatures... I suppose that this may come from widely used legacy encodings in some EU+EFTA+European Council countries, but CEN should have avoided them (they could still be selected by font renderers, if available in fonts). -- Philippe. Spams non tolérés: tout message non sollicité sera rapporté à vos fournisseurs de services Internet.
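The closure question raised in this message can be made concrete with a small check. The sketch below assumes a repertoire is given as a plain set of single characters and reports which characters NFC normalization or simple upper/lower casing would produce that fall outside the set. The function name closure_gaps and the toy repertoires are invented for the example; the actual MES-2 repertoire is not reproduced here, and results depend on the Unicode data bundled with the Python interpreter.

```python
import unicodedata


def closure_gaps(repertoire):
    """Return characters produced by NFC or case mapping that lie outside
    the repertoire (a set of single-character strings).

    Hypothetical helper: an empty result means the set is closed under
    these three operations, applied one character at a time.
    """
    gaps = set()
    for ch in repertoire:
        results = (
            unicodedata.normalize("NFC", ch),
            ch.upper(),
            ch.lower(),
        )
        for result in results:
            # A mapping may yield more than one character (e.g. full case mapping).
            gaps.update(c for c in result if c not in repertoire)
    return gaps


if __name__ == "__main__":
    # U+00FF uppercases to U+0178, which the toy set does not contain.
    print(sorted(f"U+{ord(c):04X}" for c in closure_gaps({"\u00FF"})))
    # U+212B has a singleton canonical mapping to U+00C5 and lowercases to U+00E5.
    print(sorted(f"U+{ord(c):04X}" for c in closure_gaps({"\u212B"})))
```

Extending the check to NFKC, or to decomposed forms with combining marks from row 03, would follow the same pattern by adding the corresponding normalize calls.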
Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)
At 12:16 +0200 2003-07-18, Philippe Verdy wrote: Is there some work at CEN to align its MES-2 subset into a revised (MES-2.1 ???) which not only takes into consideration the ISO10646 reference but also its Unicode properties to make this set self-closed, and actually implementable, at least with NFC closure and case-mappings closure? No. The relevant CEN committee is now dormant. I still note that modern Hebrew and Arabic are excluded from MES-2, as they are not used in any official language in the European Union or EFTA, or future EU candidates. But they are certainly of great interest for countries with which the EU is a major partner, and which are using these scripts. In some future, it would be needed to include support for modern Georgian (a subset of U+10A0..U+10FF), and modern Armenian (a subset of U+0530..U+058F), as well as some characters from Cyrillic Supplementary (in U+0500..U+052F). The European Multilingual Subset supports all of Latin, Greek, Cyrillic, and Armenian. Unicode supports Hebrew and Arabic. On the opposite, I don't understand why MES-2 included characters in row U+25xx (Box Drawing, Block Elements, Geometric Shapes) Legacy compatibility with IBM and others. which are not strictly needed for text purpose (notably legal publications of the E.U., which should better use markup systems), and the two Alphabetic Presentation Forms U+FB01..U+FB02 (fi and fl ligatures) which are really unneeded, even for legal purposes, or they should have been coherent and included ff, ffi, ffl ligatures... Legacy compatibility with Apple. I suppose that this may come from widely used legacy encodings in some EU+EFTA+European Council countries, but CEN should have avoided them (they could still be selected by font renderers, if available in fonts). You are entitled to your opinion. This work was begun and finished long ago. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)
Philippe Verdy wrote: MES-2 is a collection of characters independent of their actual encoding. To support MES-2 in a Unicode-compliant application, extra characters need to be added, notably if the minimum requirement for information interchange is the NFC form used by XML and HTML related standards. The Unicode normal forms (for a particular version of Unicode) are defined for ALL of the characters in that version. There is no concept of a Unicode normal form for a subset of the characters in a particular version. However, the MESes (there are four of them!) are useful for specifying minimum European font coverage, and input method support (the latter need not be via keyboard). This is not to say that the MESes are unproblematic. To mention just two points not already mentioned: none of the new math characters are included even in MES-3 (a, b), despite that all math characters were supposed to be included, and not even MES-3 covers all official minority languages. It would be interesting to inform CEN about how MES-2 can be documented to comply with all normative Unicode algorithms, and the minimum is to ensure the NFC closure of this subset, which should have better not included compatibility characters canonically decomposed to singleton decompositions, and should now reintegrate the missing NFC form. I think it is [extremely] unlikely at this point to expect anyone to change, or add new, MESes. Note that implementors are in no way prohibited from supporting (in fonts, plus rendering software, and some form of input) more than the MESes state. (But as Philippe states, there are some rather useless characters that have been included for compatibility reasons.) /kent k
Re: About the European MES-2 subset
On 18/07/2003 03:16, Philippe Verdy wrote: I still note that modern Hebrew and Arabic are excluded from MES-2, as they are not used in any official language in the European Union or EFTA, or future EU candidates. ... But they are used in official publications within the EU, those targeted at minority communities. But then so are south Asian and east Asian scripts. ... But they are certainly of great interest for countries with which the EU is a major partner, and which are using these scripts. In some future, it would be needed to include support for modern Georgian (a subset of U+10A0..U+10FF), and modern Armenian (a subset of U+0530..U+058F), as well as some characters from Cyrillic Supplementary (in U+0500..U+052F). If this subset is to be enlarged very much, and to require complex script rendering etc for its implementation, surely there is little point in specifying anything less than the improper (in the mathematical sense!) subset which Ken mentioned, i.e. the whole of Unicode. -- Peter Kirk [EMAIL PROTECTED] http://web.onetel.net.uk/~peterkirk/
Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)
On Friday, July 18, 2003 12:42 PM, Michael Everson [EMAIL PROTECTED] wrote: At 12:16 +0200 2003-07-18, Philippe Verdy wrote: Is there some work at CEN to align its MES-2 subset into a revised version (MES-2.1 ???) which not only takes into consideration the ISO10646 reference but also its Unicode properties to make this set self-closed, and actually implementable, at least with NFC closure and case-mappings closure? No. The relevant CEN committee is now dormant. So this work must be done by independent open-sourcers sharing their experience to allow fonts to be created that are completely compatible with MES-2. (Here this is my opinion: I think it's stupid to create fonts that contain strictly and only the complete MES-2 set, which must only be viewed as a minimum set). I note that Microsoft core fonts for Windows are supporting MES-2, but in an unrestricted way: other characters are also included, and UniScribe allows selecting ligatures and rendering combining sequences with composite glyphs if defined in OpenType fonts, or with a default multi-glyph stack. I note that you prefer the European Multilingual Subset to MES-2. Is it an extended set that includes MES-2, and fills the holes by using all characters defined in blocks of some version of the Unicode set?
Re: About the European MES-2 subset
On Friday, July 18, 2003 1:13 PM, Peter Kirk [EMAIL PROTECTED] wrote: On 18/07/2003 03:16, Philippe Verdy wrote: I still note that modern Hebrew and Arabic are excluded from MES-2, as they are not used in any official language in the European Union or EFTA, or future EU candidates. ... But they are used in official publications within the EU, those targeted at minority communities. But then so are south Asian and east Asian scripts. But for these Asian languages, I think it's best to have fonts designed to handle correctly their corresponding scripts, instead of a giant font poorly hinted for readability at small sizes, and without support of common ligatures. Arabic, Hebrew and Brahmic scripts should better be supported by their own fonts, rather than partially (for example the inclusion of Brahmic digits only in Arial Unicode MS was an error, in my opinion, and Microsoft should have better provided separate fonts for these Brahmic scripts, rather than specifying that its fonts support these scripts). ... But they are certainly of great interest for countries with which the EU is a major partner, and which are using these scripts. In some future, it would be needed to include support for modern Georgian (a subset of U+10A0..U+10FF), and modern Armenian (a subset of U+0530..U+058F), as well as some characters from Cyrillic Supplementary (in U+0500..U+052F). For the case of Armenian and Georgian Mkhedruli, they do not seem complex to add in a font. If this subset is to be enlarged very much, and to require complex script rendering etc for its implementation, surely there is little point in specifying anything less than the improper (in the mathematical sense!) subset which Ken mentioned, i.e. the whole of Unicode. I agree with this point. But this is not an excuse to not implement and support at least the NFC and case mapping closures in a decent font for any script, even if the script is reduced to letters used in the modern language.
But some optional ligatures for a set of written modern languages may not be strictly needed if the font or renderer supports correct fallback decompositions (for example with fi, fl, ffi, ffl). What is important here is the legality of the printed text, so that no confusion is possible for a text written in any language. One good source of such characters needed for languages can be found in the Openi18n.org LDML database (notably the ICU section which is the most complete collection), which contains definitions of exemplarCharacters for each supported language (but there may exist some omissions). One regret: some characters are used and exemplar but not mandatory to support a language and they should be listed separately, as well as rare characters if they are used only in proper names or geographical names or transliterated foreign words which can often be written with the common letters with a phonetic approach. An example is: Norsk Bokmål, most often transcribed as: norvégien bokmal or bokmâl in French (where the circumflex is used as a way to specify an open and/or lengthened vowel), or translated to: norvégien classique (by opposition to: norvégien réformé, or nouveau norvégien). So exemplarCharacters in a language are a good indication of the characters needed for a language, even if an official transliteration rule is used to translate imported foreign words with more characters. -- Philippe. Spams non tolérés: tout message non sollicité sera rapporté à vos fournisseurs de services Internet.
Re: About the European MES-2 subset
On 18/07/2003 06:21, Philippe Verdy wrote: But for these Asian languages, I think it's best to have fonts designed to handle correctly their corresponding scripts, instead of a giant font poorly hinted for readability at small sizes, and without support of common ligatures. Agreed. Giant fonts have their uses, e.g. Arial Unicode MS and Code2000 let me get a flavour of complex script pages which I browse to on the Internet, often by mistake, without having to install special fonts for scripts I don't read. But publication of official documents is not one of those uses. Software needs to include good font substitution procedures. -- Peter Kirk [EMAIL PROTECTED] http://web.onetel.net.uk/~peterkirk/
Re: About the European MES-2 subset
Peter Kirk scripsit: Agreed. Giant fonts have their uses, e.g. Arial Unicode MS and Code2000 let me get a flavour of complex script pages which I browse to on the Internet, often by mistake, without having to install special fonts for scripts I don't read. However, a font like Last Resort (the world's smallest giant font, as it were) does that just about as well. For my own purposes, I'd like to see more comprehensive Latin-script fonts with all combining characters working. -- John Cowan [EMAIL PROTECTED] http://www.ccil.org/~cowan http://www.reutershealth.com Do I contradict myself? Very well then, I contradict myself. I am large, I contain multitudes. --Walt Whitman, _Leaves of Grass_
Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)
At 13:35 +0200 2003-07-18, Philippe Verdy wrote: I note that you prefer the European Multilingual Subset to MES-2. Is it an extended set that includes MES-2, and fills the holes by using all characters defined in blocks of some version of the Unicode set? It is script-based, not character based. It includes all Latin, Greek, Cyrillic, Georgian, and Armenian characters. And is a superset of MES-2. I *prefer* Unicode to any subset thereof. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: About the European MES-2 subset
At 11:28 -0400 2003-07-18, John Cowan wrote: However, a font like Last Resort (the world's smallest giant font, as it were) does that just about as well. While I hate seeing the Last Resort font show up, I love seeing it when it does. :-) So much better than ?. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)
At 13:07 +0200 2003-07-18, Kent Karlsson wrote: This is not to say that the MESes are unproblematic. To mention just two points not already mentioned: none of the new math characters are included even in MES-3 (a, b), despite that all math characters were supposed to be included That isn't true. and not even MES-3 covers all official minority languages. What's missing? (But as Philippe states, there are some rather useless characters that have been included for compatibility reasons.) Same goes for Unicode though. :-) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: About the European MES-2 subset
A question mark is a sign of a bad conversion from Unicode (to a code page that did not contain the character). This would likely happen on the Mac too rather than the Last Resort font, wouldn't it? On Windows, the "cannot find a font for it" situation is the NULL glyph. The Last Resort font is cool but a Code2000 stab at the actual glyph is (IMHO) cooler than both. :-) MichKa
Re: About the European MES-2 subset
At 15:45 -0700 2003-07-18, Michael (michka) Kaplan wrote: A question mark is a sign of a bad conversion from Unicode (to a code page that did not contain the character). This would likely happen on the Mac too rather than the Last Resort font, wouldn't it? No, it wouldn't. A "not a character" glyph is displayed in the Last Resort font. On Windows, the "cannot find a font for it" situation is the NULL glyph. Not much better than ? -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: About the European MES-2 subset
I am pretty sure you have to be wrong here, Michael. Attend me: 1) API converts from Unicode to the wrong code page 2) API does some sort of work with the string 3) API tries to display the string. How on earth could it display a glyph from the Last Resort font, unless it is a generic glyph that contains no script info (which would be no better than a question mark or a NULL glyph)? In any case, Code2000 giving some glyph for more cases is still a better solution. MichKa
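The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions: Python's codecs stand in for the conversion API under discussion, and cp1252 is an arbitrary choice of "wrong" code page. It shows that once the conversion has substituted a question mark, no script information is left for any font to act on.

```python
# Sketch only: Python's codecs stand in for the conversion API discussed
# above, and cp1252 is an arbitrary example of a "wrong" code page.
text = "\u03a9\u05d0\u4e2d"  # one Greek, one Hebrew and one Han character

# 1) Convert from Unicode to a code page that lacks these characters.
#    errors="replace" substitutes "?" for each unencodable character.
converted = text.encode("cp1252", errors="replace")

# 2) Any later processing only ever sees the substituted bytes.
# 3) Decoding for display cannot recover the original characters,
#    so the renderer has no script information to pick a glyph from.
displayed = converted.decode("cp1252")

print(converted)  # b'???'
print(displayed)  # ???
```

A fallback font can only help when the original code points survive all the way to the renderer, which is the case the rest of this thread is about.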
Re: About the European MES-2 subset
Michael (michka) Kaplan scripsit: In any case, Code2000 giving some glyph for more cases is still a better solution. In any case, if you cannot read any of the languages that use a given script, you are unlikely to care much what glyph appears, and if it turns out that you do care, the LR font gives you a clue about which font you ought to install. Seeing hanzi, hangeul, etc. gets old when you a) can't read the text and b) suspect it is spam anyhow. -- John Cowan [EMAIL PROTECTED] http://www.ccil.org/~cowan Raffiniert ist der Herrgott, aber boshaft ist er nicht. --Albert Einstein
Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)
On Thursday, July 17, 2003 9:23 PM, Michael Everson [EMAIL PROTECTED] wrote: At 17:01 +0100 2003-07-17, William Overington wrote: Now, I have never heard of the MES-2 whatever that is. However, I do not have deep knowledge of the various standards which exist. Could you possibly say some more about MES-2 please.
282 MES-2 is specified by the following ranges of code positions as indicated for each row.
Rows: Positions (cells)
00: 20-7E A0-FF
01: 00-7F 8F 92 B7 DE-EF FA-FF
02: 18-1B 1E-1F 59 7C 92 BB-BD C6-C7 C9 D8-DD EE
03: 74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D7 DA-E1
04: 00-5F 90-C4 C7-C8 CB-CC D0-EB EE-F5 F8-F9
1E: 02-03 0A-0B 1E-1F 40-41 56-57 60-61 6A-6B 80-85 9B F2-F3
1F: 00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D 80-B4 B6-C4 C6-D3 D6-DB DD-EF F2-F4 F6-FE
20: 13-15 17-1E 20-22 26 30 32-33 39-3A 3C 3E 44 4A 7F 82 A3-A4 A7 AC AF
21: 05 16 22 26 5B-5E 90-95 A8
22: 00 02-03 06 08-09 0F 11-12 19-1A 1E-1F 27-2B 48 59 60-61 64-65 82-83 95 97
23: 02 10 20-21 29-2A
25: 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 AC B2 BA BC C4 CA-CB D8-D9
26: 3A-3C 40 42 60 63 65-66 6A-6B
FB: 01-02
FF: FD
As most of these characters are canonically decomposable, shouldn't this list include also the decomposed characters? Why is row 03 so restricted? Shouldn't it include those accents and diacritics that are used by other characters once canonically decomposed? Or does it imply that MES-2 is only supposed to use strings in NFC form? Also, is this list under full closure with existing character properties, like NFKD decompositions, and case mappings? -- Philippe. Spams non tolérés: tout message non sollicité sera rapporté à vos fournisseurs de services Internet.
Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)
282 MES-2 is specified by the following ranges of code positions as indicated for each row... Philippe Verdy asked: As most of these characters are canonically decomposable, shouldn't this list include also the decomposed characters? Why is row 03 so resticted? Shouldn't it include those accents and diacritics that are used by other characters once canonically decomposed? Or does it imply that MES-2 is only supposed to use strings if NFC form? MES-2 (and all the rest of the Multilingual European Subsets) are a CEN construct. See the CEN Workshop Agreement, CWA 13873:2000 posted at Michael Everson's site: http://www.evertype.com/standards/iso10646/pdf/cwa13873.pdf Among other things, that CWA states: This CWA does *not* specify any encoding of the European Subsets. so conceptually it is more like a repertoire listing. MES-2 is formally listed in 10646 as one of the normative subsets there, but since 10646 has no concepts of decomposition, normalization, or equivalence, the fact that MES-2 contains precomposed characters but not their decompositions or the relevant combining accents is formally irrelevant. The Unicode Standard does not make subsets a normative construct for that standard and doesn't even mention MES-2. Conformance to 10646 doesn't require you to make use of its subsets, but if anyone is worried about the articulation of the standards, the Unicode Standard itself formally consists of Subset 305 of 10646:2003, namely the UNICODE 4.0 subset -- the subset which contains *all* of the encoded characters of 10646:2003. Think of the Multilingual European Subsets as a kind of way for people in Europe associated with standards organizations and governments to try to communicate with software vendors regarding which user characters they want to ensure are supported by their software. 
The CWA 13873 contains some questionable presuppositions about how software vendors are actually proceeding to roll out their Unicode support, but the intent of the CWA is clear: It is estimated that implementing the full character set of the UCS may be costly in the first stages of UCS use, and that many manufacturers will implement in subset-stages. To ensure that a common subset usable to the vast majority of European users be available for a reasonable price, and as a guide to manufacturers, it will be helpful to specify, to users and procurers of systems, European subsets of the UCS encompassing the characters for use in European languages as well as other frequently used and specialist characters. Also, is this list under full closure with existing character properties, like NFKD decompositions, and case mappings? MES-2 is clearly *not* closed under NFD, NFKD, or NFKC normalizations. Although less obvious, it is also not closed under NFC normalization. For example, it includes the angle brackets U+2329, U+232A, but not their canonical equivalents, U+3008, U+3009. There are also some characters outside the MES-2 repertoire where NFC(x) *is* in the MES-2 repertoire. Singleton canonical equivalences like U+212B ANGSTROM SIGN come to mind, for example. I haven't checked on case mappings and case foldings, but would not be too surprised to find an anomaly or two there, as well. MES-2 was not designed by the UTC, nor did it take any of these considerations into account. It is not really an appropriate construct for the Unicode Standard. A more meaningful way to think of it is: if you want to sell software in Europe, you better be able to input and display all the characters we Europeans have in this list. --Ken
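The two NFC observations above can be checked mechanically. The following is a rough sketch, not part of CWA 13873 or of any standard: it assumes the row listing quoted earlier in this thread is an accurate copy of collection 282, and it uses Python's unicodedata module, whose data reflects whatever Unicode version the interpreter ships with rather than the 2003 data. The names parse_rows and nfc_escapes are made up for the illustration.

```python
import unicodedata

# MES-2 rows as quoted in this thread (row: cells, all hexadecimal).
MES2_ROWS = """
00: 20-7E A0-FF
01: 00-7F 8F 92 B7 DE-EF FA-FF
02: 18-1B 1E-1F 59 7C 92 BB-BD C6-C7 C9 D8-DD EE
03: 74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D7 DA-E1
04: 00-5F 90-C4 C7-C8 CB-CC D0-EB EE-F5 F8-F9
1E: 02-03 0A-0B 1E-1F 40-41 56-57 60-61 6A-6B 80-85 9B F2-F3
1F: 00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D 80-B4 B6-C4 C6-D3 D6-DB DD-EF F2-F4 F6-FE
20: 13-15 17-1E 20-22 26 30 32-33 39-3A 3C 3E 44 4A 7F 82 A3-A4 A7 AC AF
21: 05 16 22 26 5B-5E 90-95 A8
22: 00 02-03 06 08-09 0F 11-12 19-1A 1E-1F 27-2B 48 59 60-61 64-65 82-83 95 97
23: 02 10 20-21 29-2A
25: 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 AC B2 BA BC C4 CA-CB D8-D9
26: 3A-3C 40 42 60 63 65-66 6A-6B
FB: 01-02
FF: FD
"""

def parse_rows(spec):
    """Turn 'row: cell cell-cell ...' lines into a set of code points."""
    repertoire = set()
    for line in spec.strip().splitlines():
        row, cells = line.split(":", 1)
        base = int(row, 16) << 8          # row 1F -> U+1F00
        for cell in cells.split():
            lo, _, hi = cell.partition("-")
            first = int(lo, 16)
            last = int(hi, 16) if hi else first
            repertoire.update(range(base + first, base + last + 1))
    return repertoire

def nfc_escapes(repertoire):
    """Members whose NFC form uses a code point outside the repertoire."""
    escapes = {}
    for cp in sorted(repertoire):
        nfc = unicodedata.normalize("NFC", chr(cp))
        if any(ord(c) not in repertoire for c in nfc):
            escapes[cp] = nfc
    return escapes

mes2 = parse_rows(MES2_ROWS)
for cp, nfc in nfc_escapes(mes2).items():
    targets = " ".join(f"U+{ord(c):04X}" for c in nfc)
    print(f"U+{cp:04X} -> {targets}")

# The opposite direction: U+212B is outside MES-2 but normalizes into it.
angstrom_nfc = unicodedata.normalize("NFC", "\u212B")
print(0x212B in mes2, ord(angstrom_nfc) in mes2)
```

With the listing as quoted, the angle brackets U+2329 and U+232A show up among the escapes, since their NFC forms U+3008 and U+3009 are not in the repertoire, while U+212B maps to U+00C5, which is.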
Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)
On Friday, July 18, 2003 2:18 AM, Kenneth Whistler [EMAIL PROTECTED] wrote: MES-2 was not designed by the UTC, nor did it take any of these considerations into account. It is not really an appropriate construct for the Unicode Standard. A more meaningful way to think of it is: if you want to sell software in Europe, you better be able to input and display all the characters we Europeans have in this list. I interpret it this way: MES-2 is a collection of characters independent of their actual encoding. To support MES-2 in a Unicode-compliant application, extra characters need to be added, notably if the minimum requirement for information interchange is the NFC form used by XML and HTML related standards. It would be interesting to inform CEN about how MES-2 can be documented to comply with all normative Unicode algorithms, and the minimum is to ensure the NFC closure of this subset, which should have better not included compatibility characters canonically decomposed to singleton decompositions, and should now reintegrate the missing NFC form. For obvious reasons, the case mappings should also be closed, but not necessarily compatibility decompositions, or characters needed for the NFD form (notably combining diacritics, which may be added only on applications that can process and recompose them on the fly when querying supported precomposed characters in fonts). Do the default TrueType fonts for Windows support the whole MES-2 repertoire (Times New Roman, Arial and Courier New), including on Windows 95 without Uniscribe installed and used? In practice, MES-2 support will always need additional characters to ensure the minimum closures, and ISO10646 should work with CEN to fix their set in a revision. -- Philippe. Spams non tolérés: tout message non sollicité sera rapporté à vos fournisseurs de services Internet.
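What "closed under case mappings" would mean for a repertoire can be sketched as follows. This is a hypothetical helper, not anything defined by CEN or Unicode: case_escapes is an invented name, the repertoire is a deliberately tiny toy set rather than MES-2, and Python's str.upper and str.lower apply the full case mappings of the interpreter's own Unicode version.

```python
def case_escapes(repertoire):
    """Members whose upper- or lowercase form leaves the repertoire.

    `repertoire` is a set of integer code points. The result maps each
    offending code point to the list of mapped strings that escape.
    """
    escapes = {}
    for cp in sorted(repertoire):
        ch = chr(cp)
        for mapped in (ch.upper(), ch.lower()):
            if any(ord(c) not in repertoire for c in mapped):
                escapes.setdefault(cp, []).append(mapped)
    return escapes

# Toy repertoire: "a" and "A" map onto each other, but the full uppercase
# of U+00DF is "SS", and "S" is not in the set.
toy = {ord(c) for c in "aAß"}
print(case_escapes(toy))  # {223: ['SS']}
```

Running the same function over a parsed MES-2 repertoire would list any case-mapping anomalies of the kind speculated about earlier in the thread; the results depend on the Unicode data version in use.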