Re: About the European MES-2 subset

2003-07-22 Thread jameskass
.
Michael Everson wrote,

 I wasn't talking about that, but if you'd like my opinion, I hate that J too.

Apathy, intolerance, bigotry, death, taxation, ignorance, oppression...

Surely we can reserve our hatred for targets more worthy than
a colleague's variant glyph preferences.

Regards,

James Kass
.



Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-21 Thread Peter_Constable

Philippe Verdy wrote on 07/20/2003 08:37:19 AM:

  What would be the purpose of encoding these? I can't think of any.
  They certainly don't need to be encoded as distinct characters to use
  in a Last Resort font.

 Mostly for documentation purpose

Since Unicode is not a glyph encoding standard, there's no need for it to
assign glyphs to codepoints for documentation purposes.


- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485






RE: About the European MES-2 subset

2003-07-20 Thread Kent Karlsson

 This is not to say that the MESes are unproblematic.  To mention just
 two points not already mentioned: none of the new math characters
 are included even in MES-3 (a, b), despite that all math characters
 were supposed to be included

Michael E responded:

 That isn't true.

Eeh, well, disregarding some CJK compat chars that have
general category Sm (which are rightly excluded from the MESes), 
the following blocks (or formally, closely corresponding
collections) are missing from MES-3A (the largest of the MESes):

27C0..27EF; Miscellaneous Mathematical Symbols-A
27F0..27FF; Supplemental Arrows-A
2900..297F; Supplemental Arrows-B
2980..29FF; Miscellaneous Mathematical Symbols-B
2A00..2AFF; Supplemental Mathematical Operators
2B00..2BFF; Miscellaneous Symbols and Arrows
and (much as I dislike them, and although they have general category Lu/Ll rather than Sm)
1D400..1D7FF; Mathematical Alphanumeric Symbols

(MES-3A lists collections rather than individual characters, and
includes some code points that are not (yet) bound to any character.)
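A minimal sketch of the coverage check Kent describes, using only the ranges listed above. The helper name `mes3a_missing_block` is hypothetical, and the ranges are Blocks.txt-style block boundaries rather than the formal MES-3A collections:

```python
# Block ranges quoted above as absent from MES-3A: (start, end inclusive, name).
MES3A_MISSING = [
    (0x27C0, 0x27EF, "Miscellaneous Mathematical Symbols-A"),
    (0x27F0, 0x27FF, "Supplemental Arrows-A"),
    (0x2900, 0x297F, "Supplemental Arrows-B"),
    (0x2980, 0x29FF, "Miscellaneous Mathematical Symbols-B"),
    (0x2A00, 0x2AFF, "Supplemental Mathematical Operators"),
    (0x2B00, 0x2BFF, "Miscellaneous Symbols and Arrows"),
    (0x1D400, 0x1D7FF, "Mathematical Alphanumeric Symbols"),
]


def mes3a_missing_block(cp: int):
    """Return the name of the listed block containing cp, or None (hypothetical helper)."""
    for start, end, name in MES3A_MISSING:
        if start <= cp <= end:
            return name
    return None


# U+2A00 N-ARY CIRCLED DOT OPERATOR falls in one of the listed blocks;
# U+2200 FOR ALL (Mathematical Operators, an older block) does not.
print(mes3a_missing_block(0x2A00))  # Supplemental Mathematical Operators
print(mes3a_missing_block(0x2200))  # None
```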

But are you saying that it was not the intent to include all
math characters?  All the old ones (those that were
included in 10646 at the time the MESes were devised) are
included even in the smaller MES-2.

 and not even MES-3 covers all official minority languages.
 
 What's missing?

Hebrew, used for Yiddish, which is now an official minority
language in Sweden.  (Though various languages written with
the Arabic script are more common in official information to
the public.)  But I understand that was excluded since (in practice)
anything bidi was excluded from the MESes.
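If a subset excludes everything that needs bidirectional layout, a string or a repertoire can be screened with the Bidi_Class property. A sketch assuming the `unicodedata` module of the running Python interpreter (whose UCD is much newer than 2003's); the function name is hypothetical:

```python
import unicodedata

# Bidi classes of strong right-to-left characters in the Unicode Character Database.
STRONG_RTL = {"R", "AL"}


def needs_bidi(text: str) -> bool:
    """True if text contains a strong right-to-left character (Hebrew 'R', Arabic 'AL')."""
    return any(unicodedata.bidirectional(ch) in STRONG_RTL for ch in text)


print(needs_bidi("svenska"))                               # False
print(needs_bidi("\u05d9\u05d9\u05b4\u05d3\u05d9\u05e9"))  # True: "Yiddish" in Hebrew script
```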

Also of European interest, though not for a language per se,
are Braille patterns and modern musical symbols.  (Not for all
European fonts, though, but the same goes for math symbols.)

/kent k





Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-20 Thread Michael Everson
At 23:34 +0200 2003-07-19, Philippe Verdy wrote:

 I'm still convinced that these glyphs are much more informative than 
 a default glyph showing a ?, a white rectangle, or a black lozenge 
 with a mirrored white ?...
Of course they are.

 And Unicode also uses these glyphs in the index page for its charmaps,
You mean for its charts. Please.

 but they are shown as poor bitmaps (maybe the PDF or book version 
 uses your glyphs in a document-embedded font)
That page is in HTML.

 How were your glyphs contributed?
I, uh, drew them.

 With SVG graphics containing character objects and drawing primitives
I have no idea what this means. I used Fontographer.

 (it seems the simplest way to derive them, using the table shown in 
 Apple's web page, with some exceptions for unassigned, reserved, 
 forbidden or surrogate symbols which require a distinct design)?
You can't derive these. You have to draw them individually.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-20 Thread Philippe Verdy
On Sunday, July 20, 2003 2:21 PM, Michael Everson [EMAIL PROTECTED] wrote:

  With SVG graphics containing character objects and drawing
  primitives 
 
 I have no idea what this means. I used Fontographer.

SVG is a W3C-promoted standard for Scalable Vector Graphics,
based on an XML language, which allows describing vector
graphics with 2D primitives; it can be used to produce
custom fonts of symbols, in a more appealing way than with
bitmaps.

An SVG graphic can be used as the source URL of an <img/>
or <object/> element within HTML. Most vector graphics tools
can generate SVG or convert their proprietary formats to it, and SVG
is used as a lingua franca for vector graphics interchange (deprecating
legacy proprietary formats like MacDraw and WMF, or the many
other formats created by every drawing tool on the market).

SVG graphics are now very popular and recognized by many
publishing layout engines, and they are great for websites
that need to compute and generate dynamic graphics (because
these graphics can be updated online through their DOM tree, and
easily generated from templates by XSLT processors).

The palette of SVG primitives is rich and includes many
presentation features (including colors, shading, transparency
effects, region-combining operators). Recent versions of
MS-Office use SVG within their new XML document format to
embed graphics, or presentation effects, without the limitations
of HTML.

When I look at Apple's developer page, everything I see in
the table of glyphs and in the description can be represented
with an SVG graphic, including Unicode-encoded text primitives
for the representative glyph chosen in their table. As a first
approach, each defined PostScript name can be bound to
an SVG filename, and a font can be made from them by packing
all these SVG files in a ZIP archive, which can also contain
description tables. Then any font format can be derived from
this editable format.
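The archive idea above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the layout (one `<glyph name>.svg` per glyph plus a `ranges.json` table), the glyph names, and the drawing. It is not Apple's format or any standard font container:

```python
import io
import json
import zipfile
from xml.sax.saxutils import escape


def glyph_svg(label: str) -> str:
    """A minimal SVG glyph: a square frame with a text label (hypothetical design)."""
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
        '<rect x="5" y="5" width="90" height="90" fill="none" stroke="black"/>'
        f'<text x="50" y="58" text-anchor="middle" font-size="24">{escape(label)}</text>'
        "</svg>"
    )


def pack_glyph_archive(entries) -> bytes:
    """entries: (glyph_name, first_cp, last_cp, label) tuples; returns the ZIP as bytes."""
    buf = io.BytesIO()
    table = []
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, first, last, label in entries:
            zf.writestr(f"{name}.svg", glyph_svg(label))  # str is stored as UTF-8
            table.append({"glyph": name, "first": f"{first:04X}", "last": f"{last:04X}"})
        zf.writestr("ranges.json", json.dumps(table, indent=2))  # description table
    return buf.getvalue()


data = pack_glyph_archive([
    ("hyp.basicLatin", 0x0000, 0x007F, "A"),
    ("hyp.hebrew", 0x0590, 0x05FF, "\u05d0"),
])
```

A font compiler would then read `ranges.json` to build the code-point-to-glyph mapping; that step is not shown.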





Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-20 Thread Peter_Constable
Philippe Verdy wrote on 07/19/2003 01:24:48 PM:

 Isn't this page creating the idea for a specific block of
 script-representative glyphs, that could be mapped in plane 14
 as special supplementary characters ?

What would be the purpose of encoding these? I can't think of any. They 
certainly don't need to be encoded as distinct characters to use in a Last 
Resort font.


- Peter


---
Peter Constable





Re: About the European MES-2 subset

2003-07-20 Thread Peter_Constable
 On Windows, the "cannot find a font for it" situation is the NULL glyph.
 The Last Resort font is cool but a Code2000 stab at the actual glyph is
 (IMHO) cooler than both. :-)

Then wouldn't it make sense for Arial Unicode MS to be included with 
Windows rather than just with Office?



- Peter


---
Peter Constable





Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-20 Thread Philippe Verdy
On Sunday, July 20, 2003 3:20 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

 Philippe Verdy wrote on 07/19/2003 01:24:48 PM:
  Isn't this page creating the idea for a specific block of
  script-representative glyphs, that could be mapped in plane 14
  as special supplementary characters ?
 
 What would be the purpose of encoding these? I can't think of any.
 They certainly don't need to be encoded as distinct characters to use
 in a Last Resort font.

Mostly for documentation purposes, but also for systems that want to be more 
informative to users who lack a font for a particular script. Michael also judged it to 
be useful enough to create such a font for Apple, and Apple thought it would be useful 
for its Mac users. From usefulness comes use, and thus some legitimacy for encoding 
these in text, as special symbols that would be displayed not with a normal glyph 
but with these block symbols. It's also a fact that these symbols are used (as bitmaps) in 
the online Unicode charts (not charmaps, sorry for the wrong term), and probably with 
Michael's custom font in the published Unicode book.

It's true that one can produce documentation without actually using a font with 
assigned code points for them (a collection of SVG graphics could work for publishing 
purposes).

But editing the cmap of a TrueType font to include all possible code points would 
require mapping all 17 planes in the cmap, and unless the cmap is compressed, this 
would require 1,114,112 mappings, or more than 2 MB for the cmap alone.

This is probably too much for a default font, even if the system uses paging to access 
this TrueType font. In fact, a font with only one glyph per block, ordered by the 
block's allocation date, plus an extra cmap-like table using ranges of code points 
instead of single entries, would probably work better (of course this would be an 
extension to the standard tables used by classic fonts). Without such a TTF extension, 
it would be simpler to map only 16-bit UTF-16 code units, surrogates included, and thus 
use only 128 KB for a UTF-16-based cmap. I don't know the internals of the OpenType 
format; perhaps a compressed format for internal tables that can represent ranges 
already exists, or there is room among the table IDs for application-specific custom 
tables.
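The figures above can be checked directly: 1,114,112 code points at 2 bytes per glyph ID is 2,228,224 bytes, and 65,536 UTF-16 code units at 2 bytes is 131,072 bytes (128 KB). The range idea corresponds to the TrueType/OpenType `cmap` subtable formats 12 (segmented coverage) and 13 (many-to-one range mappings), where each group is three 32-bit fields (12 bytes) after a 16-byte header. The group count below is a made-up example, not the real number of blocks:

```python
# Back-of-the-envelope cmap sizes, assuming 2-byte glyph IDs for a flat table.
GLYPH_ID_BYTES = 2
GROUP_BYTES = 12   # one (startCharCode, endCharCode, glyph ID) group: three uint32 fields
HEADER_BYTES = 16  # format 12/13 header: uint16 format, uint16 reserved, 3 x uint32


def flat_table_bytes(n_code_points: int) -> int:
    """One glyph ID per code point, no compression."""
    return n_code_points * GLYPH_ID_BYTES


def range_table_bytes(n_groups: int) -> int:
    """One 12-byte group per contiguous range that maps to a single glyph."""
    return HEADER_BYTES + n_groups * GROUP_BYTES


print(flat_table_bytes(0x110000))  # 2228224: all 17 planes
print(flat_table_bytes(0x10000))   # 131072 (128 KB): 16-bit code units only
print(range_table_bytes(300))      # 3616: a hypothetical 300 ranges
```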




Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-20 Thread Michael Everson
At 08:20 -0500 2003-07-20, [EMAIL PROTECTED] wrote:

What would be the purpose of encoding these? I can't think of any. 
They  certainly don't need to be encoded as distinct characters to 
use in a Last  Resort font.
I am certain more people want to interchange the LITTER DUDE than 
would want to interchange script block indicators.

(Ken suggested offline that this name might be better-received than 
the DO NOT LITTER SIGN)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: About the European MES-2 subset

2003-07-20 Thread Michael (michka) Kaplan
Well, I thought Arial Unicode MS is a little pricey for just putting it
anywhere? I may be wrong here (and I have no idea how much it costs,
really), but the huge size compared to megafonts like Code2000, which is
based in part on the rich Arial typeface heritage, also makes it a font of
some value and a legitimate value add where it is...

Of course, all of this is IMHO, as I have no real knowledge of what Office
or even nearby Typography think about any of these things

MichKa [MS]

- Original Message - 
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, July 20, 2003 6:20 AM
Subject: Re: About the European MES-2 subset


  On Windows, the "cannot find a font for it" situation is the NULL glyph.
  The Last Resort font is cool but a Code2000 stab at the actual glyph is
  (IMHO) cooler than both. :-)

 Then wouldn't it make sense for Arial Unicode MS to be included with
 Windows rather than just with Office?



 - Peter









Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-20 Thread John H. Jenkins
On Saturday, July 19, 2003, at 1:15 PM, Michael Everson wrote:

So fonts containing these glyphs could be designed to display these 
glyphs, in a way similar to the current assignment of control 
pictures.
Um, that's what the Last Resort font does, outside of Unicode encoding 
space. (I don't think PUA characters are used, actually, but I could 
be wrong.

No, it uses the actual Unicode characters, and just has a huge cmap 
that maps everything in Unicode to the glyph for its block.
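A many-to-one cmap of this kind behaves like a sorted range table. A Python sketch of the lookup, assuming a hypothetical, abbreviated block table and made-up glyph names (this is not Apple's actual data or code):

```python
import bisect

# Hypothetical excerpt of a block table: (first, last, glyph name). A real
# Last Resort cmap would cover every block; these glyph names are made up.
BLOCK_GLYPHS = [
    (0x0000, 0x007F, "lastresort.basic-latin"),
    (0x0080, 0x00FF, "lastresort.latin-1"),
    (0x0370, 0x03FF, "lastresort.greek"),
    (0x0590, 0x05FF, "lastresort.hebrew"),
    (0x4E00, 0x9FFF, "lastresort.cjk"),
]
STARTS = [first for first, _, _ in BLOCK_GLYPHS]
FALLBACK = "lastresort.unassigned"


def glyph_for(cp: int) -> str:
    """Map a code point to its block's glyph: many characters, one glyph."""
    i = bisect.bisect_right(STARTS, cp) - 1  # last range starting at or before cp
    if i >= 0 and cp <= BLOCK_GLYPHS[i][1]:
        return BLOCK_GLYPHS[i][2]
    return FALLBACK  # gap between listed ranges


print(glyph_for(0x05D0))  # lastresort.hebrew
print(glyph_for(0x0400))  # lastresort.unassigned (Cyrillic is not in this excerpt)
```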

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jhjenkins/



Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-20 Thread John H. Jenkins
On Sunday, July 20, 2003, at 7:37 AM, Philippe Verdy wrote:

Mostly for documentation purpose, but also in most system that want to 
be more informative to users missing a font for a particular script. 
Michael also judged it to be useful enough to create such a font for 
Apple, and Apple thought it would be useful for its Mac users.
Er, no.  Apple thought it would be useful for its Mac users and 
commissioned Michael to make glyphs.  (And I personally think he's done 
an excellent job.)

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jhjenkins/



Re: About the European MES-2 subset

2003-07-20 Thread John H. Jenkins
On Friday, July 18, 2003, at 4:45 PM, Michael (michka) Kaplan wrote:

A question mark is a sign of a bad conversion from Unicode (to a code
page that did not contain the character). This would likely happen on the
Mac too rather than the Last Resort font, wouldn't it?

MS Explorer on the Mac converts Unicode to old Mac scripts which it 
then renders.  That's why all the question marks when the page is 
looked at with MS Explorer.
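The lossy path described here can be reproduced with Python's built-in `mac_roman` codec. This assumes a single Mac Roman target for simplicity; the browser may have used other Mac script encodings depending on the text, and the helper name is hypothetical:

```python
def through_legacy_codec(text: str, codec: str = "mac_roman") -> str:
    """Round-trip text through a legacy 8-bit codec; unmappable characters become '?'."""
    return text.encode(codec, errors="replace").decode(codec)


print(through_legacy_codec("caf\u00e9 \u4e2d\u6587"))  # café ??
```

The accented Latin letter survives because Mac Roman has it; the two CJK characters do not, so each becomes a question mark before any font is consulted.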

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jhjenkins/



Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-20 Thread Rick McGowan
  What would be the purpose of encoding these? I can't think of any.
  They certainly don't need to be encoded as distinct characters to use
  in a Last Resort font.

 Mostly for documentation purpose,

Why bother to encode them as distinct characters? For purposes of  
documentation isn't a good reason to encode these things, which are simply  
a set of fall-back glyphs for user convenience to show what isn't  
installed! If you want documentation for the Last Resort font, just make  
documentation (or ask Apple to make some).

Rick



Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-20 Thread Michael Everson
At 09:56 -0600 2003-07-20, John H. Jenkins wrote:

No, it uses the actual Unicode characters, and just has a huge cmap 
that maps everything in Unicode to the glyph for its block.
That is just so cool. :-)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: About the European MES-2 subset

2003-07-20 Thread Peter Kirk
On 19/07/2003 17:32, John Cowan wrote:

Peter Kirk scripsit:

 

But it can be useful to know whether what you are getting is hangul etc, 
or an Indian script, or some other script you don't know, or some 
symbols or mathematical codes, or else the result of some kind of 
encoding conversion error.
   

Precisely where the Last Resort font shines, without carrying the
overhead in glyph images of a normal giant font.
 

Indeed. Where can I get the Last Resort font for Windows (2000)? If the 
answer is nowhere, I guess I am stuck with Arial Unicode MS or the 
horrible-looking (the J always grates!) Code2000.

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-20 Thread Peter Kirk
On 20/07/2003 06:20, [EMAIL PROTECTED] wrote:

Philippe Verdy wrote on 07/19/2003 01:24:48 PM:

 

Isn't this page creating the idea for a specific block of
script-representative glyphs, that could be mapped in plane 14
as special supplementary characters ?
   

What would be the purpose of encoding these? I can't think of any. They 
certainly don't need to be encoded as distinct characters to use in a Last 
Resort font.

- Peter

 

One good reason would be so that a page like 
http://www.unicode.org/charts/ can be represented without having to use 
lots of .gifs, so for efficiency, searchability etc. Which is pretty 
much the same reason for defining any Unicode characters at all, given 
that documents and web pages can always be created, though inefficiently 
and unsearchably, from lots of images.

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




Re: About the European MES-2 subset

2003-07-20 Thread Michael Everson
At 12:38 -0700 2003-07-20, Peter Kirk wrote:

Indeed. Where can I get the Last Resort font for Windows (2000)? If 
the answer is nowhere, I guess I am stuck with Arial Unicode MS or 
the horrible-looking (the J always grates!) Code2000.
I'll go have a chat with some of my Apple colleagues about this.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: About the European MES-2 subset

2003-07-20 Thread jameskass
 At 12:38 -0700 2003-07-20, Peter Kirk wrote:
 
 Indeed. Where can I get the Last Resort font for Windows (2000)? If 
 the answer is nowhere, I guess I am stuck with Arial Unicode MS or 
 the horrible-looking (the J always grates!) Code2000.
 
 I'll go have a chat with some of my Apple colleagues about this.

It's unlikely that your Apple colleagues can do anything for
the J in Code2000.

Best regards,

James Kass
.



Re: About the European MES-2 subset

2003-07-20 Thread Michael Everson
At 20:50 + 2003-07-20, [EMAIL PROTECTED] wrote:
  At 12:38 -0700 2003-07-20, Peter Kirk wrote:
 Indeed. Where can I get the Last Resort font for Windows (2000)? If
 the answer is nowhere, I guess I am stuck with Arial Unicode MS or
 the horrible-looking (the J always grates!) Code2000.
 I'll go have a chat with some of my Apple colleagues about this.
It's unlikely that your Apple colleagues can do anything for
the J in Code2000.
I wasn't talking about that, but if you'd like my opinion, I hate that J too.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: About the European MES-2 subset

2003-07-20 Thread Peter Kirk
On 20/07/2003 13:50, [EMAIL PROTECTED] wrote:

At 12:38 -0700 2003-07-20, Peter Kirk wrote:

   

Indeed. Where can I get the Last Resort font for Windows (2000)? If 
the answer is nowhere, I guess I am stuck with Arial Unicode MS or 
the horrible-looking (the J always grates!) Code2000.
 

I'll go have a chat with some of my Apple colleagues about this.
   

It's unlikely that your Apple colleagues can do anything for
the J in Code2000.
Best regards,

James Kass
.
 

James, just to clarify since you are here: I am very grateful for the 
fonts Code2000 and Code2001 and that you have made these so easily 
available at http://home.att.net/~jameskass/. I don't like some of the 
glyph shapes, especially the J with a cross-bar like a T. But it is a 
lot better than nothing. When I need nice glyphs for particular Unicode 
ranges, I look elsewhere, though sometimes in vain. For example, who 
else even tries to cover the mathematical symbols in plane 1, at least in 
a downloadable font?

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




Re: About the European MES-2 subset

2003-07-19 Thread Peter Kirk
On 18/07/2003 17:42, John Cowan wrote:

Seeing hanzi, hangeul, etc. gets old when you a) can't read the text
and b) suspect it is spam anyhow.
 

But it can be useful to know whether what you are getting is hangul etc, 
or an Indian script, or some other script you don't know, or some 
symbols or mathematical codes, or else the result of some kind of 
encoding conversion error.

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)

2003-07-19 Thread Philippe Verdy
On Friday, July 18, 2003 10:18 PM, Michael Everson [EMAIL PROTECTED] wrote:

 I *prefer* Unicode to any subset thereof.

Why such preference? Unicode does not define the charset (which are defined by 
ISO10646), but character properties and related algorithms, and (in cooperation with 
ISO10646) their codepoint assignments.

For me, Unicode is NOT a character set, but an encoded character set, with a small but 
important nuance: You need to specify a version after Unicode to indicate the 
character set. So Unicode 4.0 is a character set, and a superset of Unicode 3.2, but 
Unicode alone is not.

If you just look at this definition, you cannot prefer Unicode to any subset, 
because Unicode is just a name of a collection of standards and a collection of 
character sets and algorithms, and already is a subset of the next version... If you 
cannot support the idea of subsets, then don't use Unicode, or wait until the Unicode 
standard is definitively closed, or consider that its repertoire is now permanently 
closed and no more characters will be added... Of course you would be wrong.

MES-2 or its MES extension is a character set (like most legacy encodings in IANA 
which are also encoded character sets). In practice, nobody can live and implement any 
software without clearly bounded sets of characters. So versioning is absolutely 
necessary to fix these bounds in terms of implementation levels.
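The point that a repertoire is only fixed once a version is named can be illustrated with CPython's `unicodedata`, which ships the interpreter's current UCD plus a frozen Unicode 3.2 snapshot (`unicodedata.ucd_3_2_0`, kept for IDNA). A sketch; the helper name is hypothetical, and the interpreter's current version will be far newer than 4.0:

```python
import unicodedata

OLD = unicodedata.ucd_3_2_0  # frozen Unicode 3.2 data shipped with CPython


def added_after_3_2(chars):
    """Characters unassigned ('Cn') in Unicode 3.2 but assigned in the interpreter's UCD."""
    return [ch for ch in chars
            if OLD.category(ch) == "Cn" and unicodedata.category(ch) != "Cn"]


print(OLD.unidata_version)  # 3.2.0
print([f"U+{ord(c):04X}" for c in added_after_3_2(["A", "\u20ac", "\U0001F600"])])  # ['U+1F600']
```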

-- 
Philippe.
Spam not tolerated: any unsolicited message will be
reported to your Internet service providers.




Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)

2003-07-19 Thread Michael Everson
At 15:23 +0200 2003-07-19, Philippe Verdy wrote:
 Unicode does not define the charset (which are defined by ISO10646),
That isn't true. They both define the same character set. (I will not 
use the term charset.)

 but character properties and related algorithms, and (in cooperation 
 with ISO10646) their codepoint assignments.
The code position assignments are (formally) assigned by WG2, but 
there is consensus between UTC and WG2 on this matter.

 For me, Unicode is NOT a character set, but an encoded character 
 set, with a small but important nuance: You need to specify a 
 version after Unicode to indicate the character set. So Unicode 4.0 
 is a character set, and a superset of Unicode 3.2, but Unicode alone 
 is not.
To me, Unicode refers to the most recent version. :-)

 If you just look at this definition, you cannot prefer Unicode to 
 any subset,
Yes, I can.

 because Unicode is just a name of a collection of standards and a 
 collection of character sets and algorithms
That isn't true. If you think this is true, you really have a lot to 
learn about Unicode.

 and already is a subset of the next version... If you cannot support 
 the idea of subsets, then don't use Unicode, or wait until the 
 Unicode standard is definitively closed, or consider that its 
 repertoire is now permanently closed and no more characters will be 
 added... Of course you would be wrong.
I think you mistook me.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: About the European MES-2 subset

2003-07-19 Thread Michael Everson
At 16:41 -0700 2003-07-18, Michael (michka) Kaplan wrote:
 I am pretty sure you have to be wrong here, Michael. Attend me:

 1) API converts from Unicode to the wrong code page
 2) API does some sort of work with the string
 3) API tries to display the string
 How on earth could it come from the Last Resort font, unless it is a generic
 glyph that contains no script info (which would be no better than a question
 mark or a NULL glyph)?
Hm. See http://developer.apple.com/fonts/LastResortFont/ where it 
shows glyphs for illegal characters (FFFE/ etc.) as well as 
undefined characters (valid code positions which have not been 
assigned). I thought somehow that there was a glyph for broken 
characters (characters that were just plain wrong) as well.
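The distinctions mentioned here (illegal code points, valid but unassigned positions, and ordinary characters shown by block) can be approximated with a small classifier. A sketch assuming Python's `unicodedata`, whose notion of "unassigned" follows the interpreter's Unicode version; the category labels are hypothetical, not the Last Resort font's glyph names:

```python
import unicodedata


def fallback_kind(cp: int) -> str:
    """Hypothetical classification for choosing a last-resort glyph."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # Noncharacters: U+FDD0..U+FDEF and the last two code points of every plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    category = unicodedata.category(chr(cp))
    if category == "Cn":
        return "unassigned"  # valid code position, no character assigned yet
    if category == "Co":
        return "private-use"
    return "assigned"  # would get the glyph for its block


print(fallback_kind(0xFFFE))   # noncharacter
print(fallback_kind(0x40000))  # unassigned (plane 4)
```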
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-19 Thread Philippe Verdy
On Saturday, July 19, 2003 1:55 PM, Michael Everson [EMAIL PROTECTED] wrote:

 Hm. See http://developer.apple.com/fonts/LastResortFont/ where it
 shows glyphs for illegal characters (FFFE/ etc.) as well as
 undefined characters (valid code positions which have not been
 assigned). I thought somehow that there was a glyph for broken
 characters (characters that were just plain wrong) as well.

Isn't this page creating the idea for a specific block of
script-representative glyphs, that could be mapped in plane 14
as special supplementary characters ?

If the number of Unicode blocks is expected to stay
under 1024, this block would use one special character per
assigned Unicode block: not a control character, but a symbol
representing that block. If such an assignment is not easy
to estimate now, glyphs for scripts should be assigned in the
order of their definition in successive versions of Unicode.

So fonts containing these glyphs could be designed to display
these glyphs, in a way similar to the current assignment of control
pictures. This page already gives the names of the characters
according to the official names of scripts, but a more uniform
name than the PostScript name could be used, such as:

UNASSIGNED BLOCK SYMBOL,
UNASSIGNED CHARACTER SYMBOL,
ILLEGAL CHARACTER SYMBOL,
then...
BASIC LATIN SCRIPT SYMBOL,
EXTENDED LATIN 1 SCRIPT SYMBOL,
...

By itself, this Apple developer page is nearly a basis for such a
proposal. If needed, the Unicode Blocks.txt file could add
columns specifying the symbol assigned to each script block, with
special entries for the symbols used to represent unassigned
characters in assigned blocks, or unassigned blocks.
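The naming scheme proposed above could be derived mechanically from Blocks.txt, whose data lines have the form `0000..007F; Basic Latin`. A sketch of one possible (hypothetical) rule; note that the real block name is "Latin-1 Supplement", so a mechanical rule yields a different name from the "EXTENDED LATIN 1" example:

```python
def parse_blocks(text: str):
    """Parse Blocks.txt-style lines ("0000..007F; Basic Latin") into (first, last, name)."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code_range, name = line.split(";", 1)
        first, last = (int(part, 16) for part in code_range.strip().split(".."))
        blocks.append((first, last, name.strip()))
    return blocks


def symbol_name(block_name: str) -> str:
    """Hypothetical rule: upper-case the block name, hyphens to spaces, add a suffix."""
    return block_name.upper().replace("-", " ") + " SCRIPT SYMBOL"


SPECIAL = ["UNASSIGNED BLOCK SYMBOL", "UNASSIGNED CHARACTER SYMBOL", "ILLEGAL CHARACTER SYMBOL"]

SAMPLE = """\
# Excerpt in Blocks.txt format
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
"""

names = SPECIAL + [symbol_name(name) for _, _, name in parse_blocks(SAMPLE)]
for n in names:
    print(n)
```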

-- 
Philippe.




Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-19 Thread Michael Everson
At 20:24 +0200 2003-07-19, Philippe Verdy wrote:

Isn't this page creating the idea for a specific block of 
script-representative glyphs, that could be mapped in plane 14 as 
special supplementary characters ?
Good heavens, no.

It's one thing for me to update this font regularly for Apple when 
new blocks get added to the standard.

It's quite another thing to suggest that we should have to add, 
formally, a new block symbol to some block in Plane 14 every time we 
add a new block to the standard.

Isn't it?

Surely the correct thing to do is to implement Last Resort support 
for different platforms as Apple indicates using those character 
names.

So fonts containing these glyphs could be designed to display these 
glyphs, in a way similar to the current assignment of control 
pictures.
Um, that's what the Last Resort font does, outside of Unicode 
encoding space. (I don't think PUA characters are used, actually, but 
I could be wrong.)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-19 Thread Philippe Verdy
On Saturday, July 19, 2003 9:15 PM, Michael Everson [EMAIL PROTECTED] wrote:

  So fonts containing these glyphs could be designed to display these
  glyphs, in a way similar to the current assignment of control
  pictures.
 
 Um, that's what the Last Resort font does, outside of Unicode
 encoding space. (I don't think PUA characters are used, actually, but
 I could be wrong.

I see that Apple maps it to a PostScript dictionary namespace, but this
seems limiting for implementations now that almost all foundries are
converting their Type 1 fonts to OpenType, which is much more
efficient but still requires some entry point with a numeric assignment
(a glyph ID will still require an input code point to look up the relevant
glyph, a PUA still requires a conversion table from ranges to that
font-specific PUA, and a TrueType font not marked as Unicode-compatible
would use glyph IDs directly from an externally defined character set
similar to legacy charsets, except that it can't be mapped to Unicode).

I'm still convinced that these glyphs are much more informative than
a default glyph showing a ?, a white rectangle, or a black lozenge
with a mirrored white ?... And Unicode also uses these glyphs
in the index page for its charmaps, but they are shown as poor
bitmaps (maybe the PDF or book version uses your glyphs in
a document-embedded font).

How were your glyphs contributed? With SVG graphics containing
character objects and drawing primitives (it seems the simplest
way to derive them, using the table shown in Apple's web page,
with some exceptions for unassigned, reserved, forbidden or
surrogate symbols which require a distinct design)?

-- 
Philippe.




Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-19 Thread Deborah Goldsmith
Apple's version of the Last Resort font is a (relatively) normal font. 
It just has a cmap that maps lots and lots of characters to the same 
glyph. :-)

Deborah Goldsmith
Manager, Fonts / Unicode Liaison
Apple Computer, Inc.
[EMAIL PROTECTED]
On Saturday, July 19, 2003, at 12:15  PM, Michael Everson wrote:

Um, that's what the Last Resort font does, outside of Unicode encoding 
space. (I don't think PUA characters are used, actually, but I could 
be wrong.





Re: About the European MES-2 subset

2003-07-19 Thread John Cowan
Peter Kirk scripsit:

 But it can be useful to know whether what you are getting is hangul etc, 
 or an Indian script, or some other script you don't know, or some 
 symbols or mathematical codes, or else the result of some kind of 
 encoding conversion error.

Precisely where the Last Resort font shines, without carrying the
overhead in glyph images of a normal giant font.

-- 
May the hair on your toes never fall out! John Cowan
--Thorin Oakenshield (to Bilbo) [EMAIL PROTECTED]



Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)

2003-07-18 Thread Michael Everson
At 00:57 +0200 2003-07-18, Philippe Verdy wrote:

Why is row 03 so restricted? Shouldn't it include those accents and 
diacritics that are used by other characters once canonically 
decomposed? Or does it imply that MES-2 is only supposed to use 
strings in NFC form?

Also, is this list under full closure with existing character properties, like
NFKD decompositions, and case mappings?
The MES-2 is what it is, and was developed at the time when it was. 
It is thought to be a minimum requirement for European requirements,
and is certainly a lot better than that old Adobe glyph list that was 
supported earlier on. It doesn't depend on very smart fonts.

Personally I prefer the Multilingual European Subset.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)

2003-07-18 Thread Philippe Verdy
On Friday, July 18, 2003 7:36 AM, Michael Everson [EMAIL PROTECTED] wrote:

 At 00:57 +0200 2003-07-18, Philippe Verdy wrote:
 
  Why is row 03 so restricted? Shouldn't it include those accents and
  diacritics that are used by other characters once canonically
  decomposed? Or does it imply that MES-2 is only supposed to use
  strings in NFC form?
  
  Also, is this list under full closure with existing character
  properties, like NFKD decompositions, and case mappings?
 
 The MES-2 is what it is, and was developed at the time when it was.
 It is thought to be a minimum requirement for European requirements,
 and is certainly a lot better than that old Adobe glyph list that was
 supported earlier on. It doesn't depend on very smart fonts.
 
 Personally I prefer the Multilingual European Subset.

Is there some work at CEN to align its MES-2 subset into a
revised (MES-2.1 ???) which not only takes into consideration the
ISO10646 reference but also its Unicode properties to make this set
self-closed, and actually implementable, at least with NFC closure
and case-mappings closure?

Support for NFKC closure should then be added in a next step, which
could optionally specify support for the corresponding decompositions
(but this would include combining characters, and would extend the
number of precomposed characters in NFC form to include in the
repertoire).
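The closure question can be made concrete. Assuming "closure" means that every character obtained by canonical decomposition (NFD) or by case mapping of a repertoire member is also a member, a check is a few lines of Python. Python's `str.upper`/`str.lower` apply full case mappings (for example ß to SS), which may be stricter than CEN would require; the helper name is hypothetical:

```python
import unicodedata


def closure_gaps(repertoire: set) -> set:
    """Characters reachable from the repertoire by NFD or case mapping but missing from it."""
    missing = set()
    for ch in repertoire:
        # Canonical decomposition (e.g. U+00E9 -> U+0065 U+0301) and full case mappings.
        for derived in (unicodedata.normalize("NFD", ch), ch.upper(), ch.lower()):
            missing.update(c for c in derived if c not in repertoire)
    return missing


# A repertoire with precomposed e-acute but neither U+0301 nor its capital form.
print(sorted(f"U+{ord(c):04X}" for c in closure_gaps(set("Ee\u00e9"))))  # ['U+00C9', 'U+0301']
```

A repertoire containing only precomposed letters fails the NFD test as soon as it omits the combining marks from row 03, which is the gap raised about MES-2 earlier in the thread.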

I don't think it's up to Unicode to do this work, but CEN should be
contacted to perform this job; alternatively, some vendor or open-source
developers may already have done it and published it.

I still note that modern Hebrew and Arabic are excluded from MES-2,
as they are not used in any official language in the European Union
or EFTA, or future EU candidates. But they are certainly of great
interest for countries with which the EU is a major partner, and which
use these scripts. In the future, it will be necessary to include
support for modern Georgian (a subset of U+10A0..U+10FF), and modern
Armenian (a subset of U+0530..U+058F), as well as some characters
from Cyrillic Supplementary (in U+0500..U+052F).

Conversely, I don't understand why MES-2 included characters
in row U+25xx (Box Drawing, Block Elements, Geometric Shapes),
which are not strictly needed for text purposes (notably legal publications
of the E.U., which would be better served by markup systems), and the two
Alphabetic Presentation Forms U+FB01..U+FB02 (fi and fl
ligatures), which are really unneeded, even for legal purposes; or else they
should have been consistent and included the ff, ffi, ffl ligatures...

I suppose that this may come from widely used legacy encodings in
some EU+EFTA+European Council countries, but CEN should have
avoided them (they could still be selected by font renderers, if available
in fonts).

-- 
Philippe.
No spam tolerated: any unsolicited message will be
reported to your Internet service providers.




Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)

2003-07-18 Thread Michael Everson
At 12:16 +0200 2003-07-18, Philippe Verdy wrote:

Is there some work at CEN to align its MES-2 subset into a revised 
version (MES-2.1?) which takes into consideration not only the ISO 10646 
reference but also the Unicode properties, to make this set 
self-closed and actually implementable, at least with NFC closure 
and case-mapping closure?
No. The relevant CEN committee is now dormant.

I still note that modern Hebrew and Arabic are excluded from MES-2, 
as they are not used in any official language in the European Union 
or EFTA, or future EU candidates. But they are certainly of great 
interest for countries with which the EU is a major partner, and 
which use these scripts. In the future, it will be necessary to 
include support for modern Georgian (a subset of U+10A0..U+10FF), 
and modern Armenian (a subset of U+0530..U+058F), as well as some 
characters from Cyrillic Supplementary (in U+0500..U+052F).
The European Multilingual Subset supports all of Latin, Greek, 
Cyrillic, and Armenian. Unicode supports Hebrew and Arabic.

Conversely, I don't understand why MES-2 included characters
in row U+25xx (Box Drawing, Block Elements, Geometric Shapes)
Legacy compatibility with IBM and others.

which are not strictly needed for text purposes (notably legal 
publications of the E.U., which would be better served by markup systems), 
and the two Alphabetic Presentation Forms U+FB01..U+FB02 (fi and 
fl ligatures), which are really unneeded, even for legal purposes; 
or else they should have been consistent and included the ff, ffi, ffl 
ligatures...
Legacy compatibility with Apple.

I suppose that this may come from widely used legacy encodings in 
some EU+EFTA+European Council countries, but CEN should have avoided 
them (they could still be selected by font renderers, if available 
in fonts).
You are entitled to your opinion. This work was begun and finished long ago.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


RE: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)

2003-07-18 Thread Kent Karlsson

Philippe Verdy wrote:

 MES-2 is a collection of characters independent of their actual encoding.
 To support MES-2 in a Unicode-compliant application, extra characters
 need to be added, notably if the minimum requirement for information
 interchange is the NFC form used by XML and HTML related standards.

The Unicode normal forms (for a particular version of Unicode) are
defined for ALL of the characters in that version.  There is no concept
of a Unicode normal form for a subset of the characters in a particular
version.

However, the MESes (there are four of them!) are useful for specifying 
minimum European font coverage, and input method support (the
latter need not be via keyboard).

This is not to say that the MESes are unproblematic.  To mention just
two points not already mentioned: none of the new math characters
are included even in MES-3 (A, B), even though all math characters
were supposed to be included, and not even MES-3 covers all official
minority languages.

 It would be interesting to inform CEN about how MES-2 can be
 documented to comply with all normative Unicode algorithms, and
 the minimum is to ensure the NFC closure of this subset, which
 should not have included compatibility characters with singleton
 canonical decompositions, and should now reintegrate the missing
 NFC forms.

I think it is [extremely] unlikely at this point to expect anyone to
change, or add new, MESes.  Note that implementors are in no way
prohibited from supporting (in fonts, plus rendering software, and
some form of input) more than the MESes state.  (But as Philippe
states, there are some rather useless characters that have been
included for compatibility reasons.)

/kent k




Re: About the European MES-2 subset

2003-07-18 Thread Peter Kirk
On 18/07/2003 03:16, Philippe Verdy wrote:

I still note that modern Hebrew and Arabic are excluded from MES-2,
as they are not used in any official language in the European Union
or EFTA, or future EU candidates. ...
But they are used in official publications within the EU, those targeted 
at minority communities. But then so are south Asian and east Asian scripts.

... But they are certainly of great
interest for countries with which the EU is a major partner, and which
use these scripts. In the future, it will be necessary to include
support for modern Georgian (a subset of U+10A0..U+10FF), and modern
Armenian (a subset of U+0530..U+058F), as well as some characters
from Cyrillic Supplementary (in U+0500..U+052F).
If this subset is to be enlarged very much, and to require complex 
script rendering etc for its implementation, surely there is little 
point in specifying anything less than the improper (in the mathematical 
sense!) subset which Ken mentioned, i.e. the whole of Unicode.

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)

2003-07-18 Thread Philippe Verdy
On Friday, July 18, 2003 12:42 PM, Michael Everson [EMAIL PROTECTED] wrote:

 At 12:16 +0200 2003-07-18, Philippe Verdy wrote:
 
  Is there some work at CEN to align its MES-2 subset into a revised
  version (MES-2.1?) which takes into consideration not only the ISO 10646
  reference but also the Unicode properties, to make this set
  self-closed and actually implementable, at least with NFC closure
  and case-mapping closure?
 
 No. The relevant CEN committee is now dormant.

So this work must be done by independent open-source developers sharing
their experience, so that fonts can be created that are fully compatible
with MES-2. (This is my opinion: I think it's stupid to create fonts that
contain exactly the MES-2 set and nothing else, since it must only be
viewed as a minimum set.)

I note that Microsoft's core fonts for Windows support MES-2, but
not exclusively: other characters are also included, and Uniscribe
allows selecting ligatures and rendering combining sequences with
composite glyphs if defined in OpenType fonts, or with a default
multi-glyph stack.

I note that you prefer the European Multilingual Subset to MES-2.
Is it an extended set that includes MES-2, and fills the holes by
using all characters defined in blocks of some version of the Unicode
set?




Re: About the European MES-2 subset

2003-07-18 Thread Philippe Verdy
On Friday, July 18, 2003 1:13 PM, Peter Kirk [EMAIL PROTECTED] wrote:

 On 18/07/2003 03:16, Philippe Verdy wrote:
 
  I still note that modern Hebrew and Arabic are excluded from MES-2,
  as they are not used in any official language in the European Union
  or EFTA, or future EU candidates. ...
  
 But they are used in official publications within the EU, those
 targeted at minority communities. But then so are south Asian and
 east Asian scripts.

But for these Asian languages, I think it's best to have fonts designed to
handle correctly their corresponding scripts, instead of a giant font poorly
hinted for readability at small sizes, and without support of common
ligatures.

Arabic, Hebrew and Brahmic scripts are better supported by their
own fonts than partially (for example, including only the Brahmic
digits in Arial Unicode MS was an error, in my opinion; Microsoft
should have provided separate fonts for these Brahmic scripts instead
of specifying that its fonts support these scripts).

  ... But they are certainly of great
  interest for countries with which the EU is a major partner, and
  which use these scripts. In the future, it will be necessary to
  include support for modern Georgian (a subset of U+10A0..U+10FF),
  and modern Armenian (a subset of U+0530..U+058F), as well as some
  characters from Cyrillic Supplementary (in U+0500..U+052F).

As for Armenian and Georgian Mkhedruli, they do not seem complex
to add to a font.

 If this subset is to be enlarged very much, and to require complex
 script rendering etc for its implementation, surely there is little
 point in specifying anything less than the improper (in the
 mathematical sense!) subset which Ken mentioned, i.e. the whole of
 Unicode. 

I agree with this point. But this is no excuse for not implementing and
supporting at least NFC and case-mapping closure in a decent font
for any script, even if the script is reduced to the letters used in the
modern language.

Some optional ligatures (for example fi, fl, ffi, ffl), however, are not
strictly needed for writing modern languages, provided the font or
renderer supports correct fallback decompositions. What is important
here is the legality of the printed text, so that no confusion is
possible for a text written in any language.
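The fallback-decomposition idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular renderer works: it assumes the renderer can only ask a yes/no coverage question of a font (the `font_covers` predicate and the `ascii_only_font` stand-in are hypothetical), and it uses the NFKD compatibility decomposition to spell presentation-form ligatures such as U+FB01 and U+FB03 as their component letters when the font lacks the ligature glyph:

```python
import unicodedata
from typing import Callable


def render_fallback(text: str, font_covers: Callable[[str], bool]) -> str:
    """Keep characters the font covers; otherwise substitute the NFKD
    (compatibility) decomposition if the font covers all of its parts,
    else U+FFFD as a visible placeholder."""
    out = []
    for ch in text:
        if font_covers(ch):
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFKD", ch)
        if decomposed != ch and all(font_covers(c) for c in decomposed):
            out.append(decomposed)
        else:
            out.append("\uFFFD")
    return "".join(out)


def ascii_only_font(ch: str) -> bool:
    """Hypothetical font whose coverage is limited to ASCII."""
    return ord(ch) < 0x80


print(render_fallback("\uFB01nal o\uFB03ce", ascii_only_font))  # final office
```

The decomposition only rescues compatibility characters like the ligatures; a precomposed letter such as U+00E9 decomposes to "e" plus a combining accent, which this ASCII-only font cannot show, so it falls through to U+FFFD.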

One good source of the characters needed for each language can be
found in the Openi18n.org LDML database (notably the ICU section,
which is the most complete collection), which contains definitions of
exemplarCharacters for each supported language (though there may
be some omissions). One regret: some characters appear as exemplars
but are not mandatory for supporting a language, and they should
be listed separately, as should rare characters used only in proper
names, geographical names, or transliterated foreign words, which can
often be written with the common letters using a phonetic approach.

An example is Norsk Bokmål, most often transcribed in French as
norvégien bokmal or bokmâl (where the circumflex marks an open and/or
lengthened vowel), or translated as norvégien classique (as opposed
to norvégien réformé, or nouveau norvégien).

So a language's exemplarCharacters are a good indication of the
characters it needs, even if an official transliteration rule is used
to render imported foreign words that contain more characters.
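Deriving a required repertoire from exemplar characters can be sketched in Python. The exemplar string and the ASCII repertoire below are hypothetical inputs, not taken from the LDML/ICU data; the sketch assumes the exemplar list is lowercase (as is usual in that data) and adds the uppercase forms a font would also need:

```python
def required_characters(exemplars: str) -> set[str]:
    """Exemplar letters plus their single-character uppercase forms.

    Uppercase mappings longer than one character (e.g. 'ß' -> 'SS') are
    skipped here for simplicity."""
    needed = set(exemplars)
    for ch in exemplars:
        upper = ch.upper()
        if len(upper) == 1:
            needed.add(upper)
    return needed


def coverage_gaps(exemplars: str, repertoire: set[str]) -> list[str]:
    """Characters a language needs that the repertoire lacks, in code point order."""
    return sorted(required_characters(exemplars) - repertoire)


# Hypothetical inputs: an abbreviated Bokmål-like exemplar list, printable ASCII.
ascii_repertoire = {chr(cp) for cp in range(0x20, 0x7F)}
print(coverage_gaps("abc\u00E6\u00F8\u00E5", ascii_repertoire))
```

Checking the uppercase forms matters because a font that covers only the exemplar list as written would still fail on capitalized words.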

-- 
Philippe.




Re: About the European MES-2 subset

2003-07-18 Thread Peter Kirk
On 18/07/2003 06:21, Philippe Verdy wrote:

But for these Asian languages, I think it's best to have fonts designed to
handle correctly their corresponding scripts, instead of a giant font poorly
hinted for readability at small sizes, and without support of common
ligatures.
Agreed. Giant fonts have their uses, e.g. Arial Unicode MS and Code2000 
let me get a flavour of complex script pages which I browse to on the 
Internet, often by mistake, without having to install special fonts for 
scripts I don't read. But publication of official documents is not one 
of those uses. Software needs to include good font substitution procedures.
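A naive version of such a font substitution procedure can be sketched in Python: walk through the text, pick the first font in preference order that covers each character, and fall back to a last-resort font otherwise. The font names and coverage ranges are hypothetical, and real systems also weigh script runs, styles, and user preferences:

```python
# Hypothetical fonts, each described only by the code point ranges it covers.
FONTS: list[tuple[str, list[tuple[int, int]]]] = [
    ("Hypothetical European Sans", [(0x0020, 0x024F), (0x0370, 0x03FF), (0x0400, 0x04FF)]),
    ("Hypothetical Hebrew", [(0x0590, 0x05FF)]),
]
LAST_RESORT = "Last Resort"


def choose_font(ch: str) -> str:
    """Return the first font in preference order that covers ch."""
    cp = ord(ch)
    for name, ranges in FONTS:
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return LAST_RESORT


def font_runs(text: str) -> list[tuple[str, str]]:
    """Split text into maximal (font, substring) runs."""
    runs: list[tuple[str, str]] = []
    for ch in text:
        font = choose_font(ch)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs


for font, run in font_runs("Shalom \u05E9\u05DC\u05D5\u05DD \u6F22"):
    print(f"{font}: {run!r}")
```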

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




Re: About the European MES-2 subset

2003-07-18 Thread John Cowan
Peter Kirk scripsit:

 Agreed. Giant fonts have their uses, e.g. Arial Unicode MS and Code2000 
 let me get a flavour of complex script pages which I browse to on the 
 Internet, often by mistake, without having to install special fonts for 
 scripts I don't read. 

However, a font like Last Resort (the world's smallest giant font, as it were)
does that just about as well.  For my own purposes, I'd like to see more
comprehensive Latin-script fonts with all combining characters working.

-- 
Do I contradict myself?               John Cowan
Very well then, I contradict myself.  [EMAIL PROTECTED]
I am large, I contain multitudes.     http://www.ccil.org/~cowan
--Walt Whitman, _Leaves of Grass_     http://www.reutershealth.com



Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)

2003-07-18 Thread Michael Everson
At 13:35 +0200 2003-07-18, Philippe Verdy wrote:

I note that you prefer the European Multilingual Subset to MES-2. 
Is it an extended set that includes MES-2, and fills the holes by 
using all characters defined in blocks of some version of the 
Unicode set?
It is script-based, not character-based. It includes all Latin, 
Greek, Cyrillic, Georgian, and Armenian characters, and is a superset 
of MES-2.

I *prefer* Unicode to any subset thereof.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: About the European MES-2 subset

2003-07-18 Thread Michael Everson
At 11:28 -0400 2003-07-18, John Cowan wrote:

However, a font like Last Resort (the world's smallest giant font, as it were)
does that just about as well.
While I hate seeing the Last Resort font show up, I love seeing it 
when it does. :-) S much better than ?.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)

2003-07-18 Thread Michael Everson
At 13:07 +0200 2003-07-18, Kent Karlsson wrote:

This is not to say that the MESes are unproblematic.  To mention just
two points not already mentioned: none of the new math characters
are included even in MES-3 (A, B), even though all math characters
were supposed to be included
That isn't true.

and not even MES-3 covers all official minority languages.
What's missing?

(But as Philippe states, there are some rather useless characters 
that have been included for compatibility reasons.)
Same goes for Unicode though. :-)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: About the European MES-2 subset

2003-07-18 Thread Michael (michka) Kaplan
A question mark is a sign of a bad conversion from Unicode (to a code page
that did not contain the character). This would likely happen on the Mac too
rather than the Last Resort font, wouldn't it?
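The failure mode described here can be reproduced in Python: once text is converted to a code page that lacks a character, the character is gone, so no font chosen later can display it. Windows-1252 is used as the example code page; it has no Greek or Cyrillic letters, and the codec's "replace" error handler substitutes "?":

```python
# "Ελλάδα and Россия", written with escapes to keep the source ASCII-only.
text = "\u0395\u03BB\u03BB\u03AC\u03B4\u03B1 and \u0420\u043E\u0441\u0441\u0438\u044F"

# Converting to Windows-1252 with errors="replace" turns each unmappable
# letter into '?'; decoding back cannot recover the original characters.
lossy = text.encode("cp1252", errors="replace").decode("cp1252")
print(lossy)  # ?????? and ??????
```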

On Windows, the "cannot find a font for it" situation produces the NULL
glyph. The Last Resort font is cool, but a Code2000 stab at the actual
glyph is (IMHO) cooler than both. :-)

MichKa

- Original Message - 
From: Michael Everson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, July 18, 2003 1:42 PM
Subject: Re: About the European MES-2 subset


 At 11:28 -0400 2003-07-18, John Cowan wrote:

 However, a font like Last Resort (the world's smallest giant font, as it were)
 does that just about as well.

 While I hate seeing the Last Resort font show up, I love seeing it
 when it does. :-) S much better than ?.
 -- 
 Michael Everson * * Everson Typography *  * http://www.evertype.com






Re: About the European MES-2 subset

2003-07-18 Thread Michael Everson
At 15:45 -0700 2003-07-18, Michael (michka) Kaplan wrote:
A question mark is a sign of a bad conversion from Unicode (to a code page
that did not contain the character). This would likely happen on the Mac too
rather than the Last Resort font, wouldn't it?
No, it wouldn't. A "not a character" glyph is displayed in the Last 
Resort font.

On Windows, the "cannot find a font for it" situation is the NULL glyph.
Not much better than ?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: About the European MES-2 subset

2003-07-18 Thread Michael (michka) Kaplan
I am pretty sure you have to be wrong here, Michael. Attend me:

1) API converts from Unicode to the wrong code page
2) API does some sort of work with the string
3) API tries to display the string

How on earth could the glyph come from the Last Resort font, unless it is
a generic glyph that contains no script info (which would be no better than
a question mark or a NULL glyph)?

In any case, Code2000 giving some glyph for more cases is still a better
solution.

MichKa

- Original Message - 
From: Michael Everson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, July 18, 2003 4:16 PM
Subject: Re: About the European MES-2 subset


 At 15:45 -0700 2003-07-18, Michael (michka) Kaplan wrote:
 A question mark is a sign of a bad conversion from Unicode (to a code page
 that did not contain the character). This would likely happen on the Mac too
 rather than the Last Resort font, wouldn't it?

 No, it wouldn't. A "not a character" glyph is displayed in the Last
 Resort font.

 On Windows, the "cannot find a font for it" situation is the NULL glyph.

 Not much better than ?
 -- 
 Michael Everson * * Everson Typography *  * http://www.evertype.com






Re: About the European MES-2 subset

2003-07-18 Thread John Cowan
Michael (michka) Kaplan scripsit:

 In any case, Code2000 giving some glyph for more cases is still a better
 solution.

In any case, if you cannot read any of the languages that use a given
script, you are unlikely to care much what glyph appears, and if it
turns out that you do care, the LR font gives you a clue about which
font you ought to install.

Seeing hanzi, hangeul, etc. gets old when you a) can't read the text
and b) suspect it is spam anyhow.

-- 
John Cowan  [EMAIL PROTECTED]  http://www.ccil.org/~cowan
Raffiniert ist der Herrgott, aber boshaft ist er nicht.
--Albert Einstein



Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)

2003-07-17 Thread Philippe Verdy
On Thursday, July 17, 2003 9:23 PM, Michael Everson [EMAIL PROTECTED] wrote:

 At 17:01 +0100 2003-07-17, William Overington wrote:
  Now, I have never heard of the MES-2 whatever that is.  However, I
  do not have deep knowledge of the various standards which exist. 
  Could you possibly say some more about MES-2 please.
 
 282 MES-2 is specified by the following ranges of code positions as
 indicated for each row.
 Rows: Positions (cells)
 00: 20-7E A0-FF
 01: 00-7F 8F 92 B7 DE-EF FA-FF
 02: 18-1B 1E-1F 59 7C 92 BB-BD C6-C7 C9 D8-DD EE

 03: 74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D7 DA-E1
 04: 00-5F 90-C4 C7-C8 CB-CC D0-EB EE-F5 F8-F9

 1E: 02-03 0A-0B 1E-1F 40-41 56-57 60-61 6A-6B 80-85 9B
   F2-F3
 1F: 00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D
   80-B4 B6-C4 C6-D3 D6-DB DD-EF F2-F4 F6-FE
 20: 13-15 17-1E 20-22 26 30 32-33 39-3A 3C 3E 44 4A
   7F 82 A3-A4 A7 AC AF 
 21: 05 16 22 26 5B-5E 90-95 A8
 22: 00 02-03 06 08-09 0F 11-12 19-1A 1E-1F 27-2B
   48 59 60-61 64-65 82-83 95 97 
 23: 02 10 20-21 29-2A

 25: 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C
   90-93 A0 AC B2 BA BC C4 CA-CB D8-D9
 26: 3A-3C 40 42 60 63 65-66 6A-6B

 FB: 01-02
 FF: FD
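For readers who want to experiment with this repertoire, the row/cell notation can be expanded into a set of code points. The following is a minimal Python sketch assuming only the format shown (a hex row, a colon, then hex cells or cell ranges, with indented continuation lines belonging to the previous row); `parse_row_spec` is a hypothetical helper name, and only an excerpt of the table is embedded. The remaining data lines can be pasted into `MES2_SPEC` in the same form:

```python
# Excerpt of the MES-2 listing (data lines only, same notation as above).
MES2_SPEC = """
00: 20-7E A0-FF
21: 05 16 22 26 5B-5E 90-95 A8
FB: 01-02
FF: FD
"""


def parse_row_spec(spec: str) -> set[int]:
    """Return the code points described by 'RR: CC CC-CC ...' lines.

    Lines without a 'RR:' prefix are treated as more cells for the most
    recent row, matching the wrapped lines in the listing."""
    code_points: set[int] = set()
    row = None
    for line in spec.splitlines():
        line = line.strip()
        if not line:
            continue
        if ":" in line:
            row_text, line = line.split(":", 1)
            row = int(row_text, 16)
        if row is None:
            raise ValueError("cell list before any row header")
        for token in line.split():
            first, _, last = token.partition("-")
            lo = int(first, 16)
            hi = int(last, 16) if last else lo
            code_points.update((row << 8) | cell for cell in range(lo, hi + 1))
    return code_points


mes2_excerpt = parse_row_spec(MES2_SPEC)
print(len(mes2_excerpt), hex(min(mes2_excerpt)), hex(max(mes2_excerpt)))
```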

As most of these characters are canonically decomposable, shouldn't this
list include also the decomposed characters?

Why is row 03 so restricted? Shouldn't it include those accents and
diacritics that are used by other characters once canonically
decomposed? Or does it imply that MES-2 is only supposed to use
strings in NFC form?

Also, is this list under full closure with existing character properties, like
NFKD decompositions, and case mappings?

-- 
Philippe.




Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)

2003-07-17 Thread Kenneth Whistler

  282 MES-2 is specified by the following ranges of code positions as
  indicated for each row...

Philippe Verdy asked:

 As most of these characters are canonically decomposable, shouldn't this
 list include also the decomposed characters?
 
 Why is row 03 so restricted? Shouldn't it include those accents and
 diacritics that are used by other characters once canonically
 decomposed? Or does it imply that MES-2 is only supposed to use
 strings in NFC form?

MES-2 (and all the rest of the Multilingual European Subsets) are
a CEN construct. See the CEN Workshop Agreement, CWA 13873:2000
posted at Michael Everson's site:

http://www.evertype.com/standards/iso10646/pdf/cwa13873.pdf

Among other things, that CWA states:

This CWA does *not* specify any encoding of the European Subsets.

so conceptually it is more like a repertoire listing.

MES-2 is formally listed in 10646 as one of the normative subsets
there, but since 10646 has no concepts of decomposition, normalization,
or equivalence, the fact that MES-2 contains precomposed characters
but not their decompositions or the relevant combining accents
is formally irrelevant.

The Unicode Standard does not make subsets a normative construct
for that standard and doesn't even mention MES-2. Conformance to
10646 doesn't require you to make use of its subsets, but if anyone
is worried about the articulation of the standards, the Unicode
Standard itself formally consists of Subset 305 of 10646:2003,
namely the UNICODE 4.0 subset -- the subset which contains *all*
of the encoded characters of 10646:2003.

Think of the Multilingual European Subsets as a kind of
way for people in Europe associated with standards organizations
and governments to try to communicate with software vendors
regarding which user characters they want to ensure are
supported by their software. The CWA 13873 contains some
questionable presuppositions about how software vendors are
actually proceeding to roll out their Unicode support, but
the intent of the CWA is clear:

It is estimated that implementing the full character set of the
UCS may be costly in the first stages of UCS use, and that many
manufacturers will implement in subset-stages. To ensure that a
common subset usable to the vast majority of European users be
available for a reasonable price, and as a guide to manufacturers,
it will be helpful to specify, to users and procurers of systems,
European subsets of the UCS encompassing the characters for use
in European languages as well as other frequently used and
specialist characters.

 Also, is this list under full closure with existing character properties, like
 NFKD decompositions, and case mappings?

MES-2 is clearly *not* closed under NFD, NFKD, or NFKC normalizations.
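The lack of NFD closure is easy to see with Python's `unicodedata` module: decomposing precomposed letters produces combining marks that a precomposed-only list does not contain. The repertoire below is a small illustrative fragment chosen for this sketch, not the real MES-2 list, and `missing_under` is a hypothetical helper:

```python
import unicodedata

# Illustrative fragment (assumption): some precomposed Latin letters plus
# the base letters they decompose to, but no combining marks.
REPERTOIRE = {ord(c) for c in "A\u00C0\u00C1\u00C2\u00C3\u00C4\u00C5"
                              "a\u00E0\u00E1\u00E2\u00E3\u00E4\u00E5"
                              "Cc\u00C7\u00E7\u0152\u0153"}


def missing_under(form: str, repertoire: set[int]) -> set[int]:
    """Code points produced by normalizing repertoire members to `form`
    that are not themselves in the repertoire."""
    missing = set()
    for cp in repertoire:
        for ch in unicodedata.normalize(form, chr(cp)):
            if ord(ch) not in repertoire:
                missing.add(ord(ch))
    return missing


for cp in sorted(missing_under("NFD", REPERTOIRE)):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```

The same function accepts "NFKD" or "NFKC" as the form, so the other two normalizations mentioned above can be checked the same way against a fuller repertoire.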

Although less obvious, it is also not closed under NFC
normalization. For example, it includes the angle brackets
U+2329, U+232A, but not their canonical equivalents,
U+3008, U+3009. There are also some characters outside the MES-2 
repertoire where NFC(x) *is* in the MES-2 repertoire. Singleton canonical
equivalences like U+212B ANGSTROM SIGN come to mind, for example.
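Both directions of the NFC check can be sketched in Python. The repertoire is again a small illustrative fragment (an assumption for this sketch, not the MES-2 list): `nfc_escapes` finds members whose NFC form leaves the repertoire (the angle brackets), and `nfc_entrants` scans the BMP for outside code points whose NFC form lands inside it (ANGSTROM SIGN, and also KELVIN SIGN since this fragment contains "K"). Both helper names are hypothetical:

```python
import unicodedata

# Illustrative fragment only (assumption): ASCII capitals, A with ring above,
# and the two angle brackets U+2329, U+232A.
REPERTOIRE = set(range(0x41, 0x5B)) | {0x00C5, 0x00E5, 0x2329, 0x232A}


def nfc(cp: int) -> str:
    return unicodedata.normalize("NFC", chr(cp))


def inside(s: str, repertoire: set[int]) -> bool:
    return all(ord(ch) in repertoire for ch in s)


def nfc_escapes(repertoire: set[int]) -> dict[int, str]:
    """Members whose NFC form is not in the repertoire."""
    return {cp: nfc(cp) for cp in repertoire if not inside(nfc(cp), repertoire)}


def nfc_entrants(repertoire: set[int], limit: int = 0x10000) -> dict[int, str]:
    """Code points below `limit`, outside the repertoire, whose NFC form is inside it."""
    found = {}
    for cp in range(limit):
        if cp in repertoire or 0xD800 <= cp <= 0xDFFF:  # skip members and surrogates
            continue
        form = nfc(cp)
        if form != chr(cp) and inside(form, repertoire):
            found[cp] = form
    return found


for cp, form in sorted(nfc_escapes(REPERTOIRE).items()):
    print(f"U+{cp:04X} -> NFC", " ".join(f"U+{ord(c):04X}" for c in form))
for cp, form in sorted(nfc_entrants(REPERTOIRE).items()):
    print(f"U+{cp:04X} (outside) -> NFC", " ".join(f"U+{ord(c):04X}" for c in form))
```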

I haven't checked on case mappings and case foldings, but would
not be too surprised to find an anomaly or two there, as well.
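Case-mapping closure can be checked in the same spirit. Python's `str.upper()`, `str.lower()` and `str.casefold()` apply the full Unicode mappings (so "ß".upper() is "SS"), which makes a quick check easy. The four-character repertoire is purely illustrative (an assumption, not drawn from MES-2), and `case_escapes` is a hypothetical helper; no claim is made here about what the real MES-2 list would show:

```python
# Illustrative fragment only (assumption): a, A, y with diaeresis, sharp s.
REPERTOIRE = set("aA\u00FF\u00DF")


def case_escapes(repertoire: set[str]) -> dict[str, dict[str, str]]:
    """For each member, the full case mappings (upper, lower, casefold)
    that produce characters outside the repertoire."""
    result: dict[str, dict[str, str]] = {}
    for ch in sorted(repertoire):
        mappings = {"upper": ch.upper(), "lower": ch.lower(), "casefold": ch.casefold()}
        for kind, mapped in mappings.items():
            if any(c not in repertoire for c in mapped):
                result.setdefault(ch, {})[kind] = mapped
    return result


print(case_escapes(REPERTOIRE))
```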

MES-2 was not designed by the UTC, nor did it take any of
these considerations into account. It is not really an
appropriate construct for the Unicode Standard. A more
meaningful way to think of it is: if you want to sell software
in Europe, you better be able to input and display all the
characters we Europeans have in this list.

--Ken




Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)

2003-07-17 Thread Philippe Verdy
On Friday, July 18, 2003 2:18 AM, Kenneth Whistler [EMAIL PROTECTED] wrote:

 MES-2 was not designed by the UTC, nor did it take any of
 these considerations into account. It is not really an
 appropriate construct for the Unicode Standard. A more
 meaningful way to think of it is: if you want to sell software
 in Europe, you better be able to input and display all the
 characters we Europeans have in this list.

I interpret it like this way:

MES-2 is a collection of characters independent of their actual encoding.
To support MES-2 in a Unicode-compliant application, extra characters
need to be added, notably if the minimum requirement for information
interchange is the NFC form used by XML and HTML related standards.

It would be interesting to inform CEN about how MES-2 can be
documented to comply with all normative Unicode algorithms, and
the minimum is to ensure the NFC closure of this subset, which
should not have included compatibility characters with singleton
canonical decompositions, and should now reintegrate the missing
NFC forms.

For obvious reasons, the case mappings should also be closed, but
not necessarily the compatibility decompositions, or the characters needed
for the NFD form (notably combining diacritics, which may be added
only in applications that can process them and recompose them
when querying fonts for supported precomposed characters).

Do the default TrueType fonts for Windows (Times New Roman, Arial and
Courier New) support the whole MES-2 repertoire, including on
Windows 95 without Uniscribe installed and used?

In practice, MES-2 support will always need additional characters
to ensure the minimum closures, and ISO10646 should work with
CEN to fix their set in a revision.

-- 
Philippe.