Re: About the European MES-2 subset

2003-07-22 Thread jameskass
.
Michael Everson wrote,

 I wasn't talking about that, but if you'd like my opinion, I hate that J too.

Apathy, intolerance, bigotry, death, taxation, ignorance, oppression...

Surely we can reserve our hatred for targets more worthy than
a colleague's variant glyph preferences.

Regards,

James Kass
.



RE: About the European MES-2 subset

2003-07-20 Thread Kent Karlsson

 This is not to say that the MESes are unproblematic.  To mention just
 two points not already mentioned: none of the new math characters
 are included even in MES-3 (a, b), despite that all math characters
 were supposed to be included

Michael E responded:

 That isn't true.

Eeh, well, disregarding some CJK compat chars that have
general category Sm (which are rightly excluded from the MESes), 
the following blocks (or formally, closely corresponding
collections) are missing from MES-3A (the largest of the MESes):

27C0..27EF; Miscellaneous Mathematical Symbols-A
27F0..27FF; Supplemental Arrows-A
2900..297F; Supplemental Arrows-B
2980..29FF; Miscellaneous Mathematical Symbols-B
2A00..2AFF; Supplemental Mathematical Operators
2B00..2BFF; Miscellaneous Symbols and Arrows
and (much as I dislike them, and they haven't GC Sm but L{u,l})
1D400..1D7FF; Mathematical Alphanumeric Symbols

(MES-3A lists collections rather than individual characters, and
includes some code points that are not (yet) bound to any character.)

But are you saying that it was not the intent to include all
math characters?  All the old ones (the ones that were
included in 10646 at the time the MESes were devised) are
included even in the smaller MES-2.
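
The general category remark in the list above can be checked with a few lines of Python. This is only a sketch using the standard unicodedata module; the sample code points are my own picks from the listed blocks, and the result reflects whatever Unicode version the local Python ships with.

```python
import unicodedata

# One sample code point from some of the blocks listed above
# (the choice of samples is arbitrary, for illustration only).
samples = {
    0x27D0: "Miscellaneous Mathematical Symbols-A",
    0x2A00: "Supplemental Mathematical Operators",
    0x1D400: "Mathematical Alphanumeric Symbols",
    0x1D41A: "Mathematical Alphanumeric Symbols",
}

def general_category(code_point):
    """Return the two-letter general category of a code point."""
    return unicodedata.category(chr(code_point))

for cp, block in samples.items():
    print(f"U+{cp:04X} {general_category(cp)} {block}")
```

The operators report Sm, while the mathematical alphanumerics report Lu or Ll, so a selection rule based on general category Sm alone would not pick them up.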

 and not even MES-3 covers all official minority languages.
 
 What's missing?

Hebrew, used for Yiddish, which is now an official minority
language in Sweden.  (Though various languages written with
the Arabic script are more common in official information to
the public.)  But I understand that was excluded since (in practice)
anything bidi was excluded from the MESes.

Also of European interest, though not for a language per se,
are Braille patterns and modern musical symbols.  (Not for all
European fonts, though, but the same goes for math symbols.)

/kent k





Re: About the European MES-2 subset

2003-07-20 Thread Peter_Constable
 On Windows, the "cannot find a font for it" situation is the NULL glyph. The
 Last Resort font is cool but a Code2000 stab at the actual glyph is (IMHO)
 cooler than both. :-)

Then wouldn't it make sense for Arial Unicode MS to be included with 
Windows rather than just with Office?



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485




Re: About the European MES-2 subset

2003-07-20 Thread Michael (michka) Kaplan
Well, I thought Arial Unicode MS is a little pricey for just putting it
anywhere? I may be wrong here (and I have no idea how much it costs,
really), but its huge size compared to megafonts like Code2000, and the fact
that it is based in part on the rich Arial typeface heritage, also make it a
font of some value and a legitimate value add where it is...

Of course, all of this is IMHO, as I have no real knowledge of what Office
or even nearby Typography think about any of these things.

MichKa [MS]

- Original Message - 
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, July 20, 2003 6:20 AM
Subject: Re: About the European MES-2 subset


  On Windows, the "cannot find a font for it" situation is the NULL glyph. The
  Last Resort font is cool but a Code2000 stab at the actual glyph is (IMHO)
  cooler than both. :-)

 Then wouldn't it make sense for Arial Unicode MS to be included with
 Windows rather than just with Office?



 - Peter


 ---
 Peter Constable

 Non-Roman Script Initiative, SIL International
 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
 Tel: +1 972 708 7485







Re: About the European MES-2 subset

2003-07-20 Thread John H. Jenkins
On Friday, July 18, 2003, at 4:45 PM, Michael (michka) Kaplan wrote:

A question mark is a sign of a bad conversion from Unicode (to a code page
that did not contain the character). This would likely happen on the Mac too
rather than the Last Resort font, wouldn't it?

MS Explorer on the Mac converts Unicode to old Mac scripts which it 
then renders.  That's why all the question marks when the page is 
looked at with MS Explorer.
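
The effect is easy to reproduce outside any browser. A minimal sketch in Python, assuming the legacy conversion substitutes "?" for anything the target code page cannot represent (the helper name is mine, and this says nothing about how MS Explorer is actually implemented):

```python
def lossy_round_trip(text, legacy_encoding):
    """Convert text to a legacy code page and back.

    Characters missing from the code page are replaced with '?',
    so the original character is gone before any font is consulted.
    """
    return text.encode(legacy_encoding, errors="replace").decode(legacy_encoding)

# Latin text survives MacRoman; Han characters do not.
print(lossy_round_trip("caf\u00e9", "mac_roman"))
print(lossy_round_trip("\u4e2d\u6587", "mac_roman"))
```

Once the string has been through such a conversion, no fallback font can help, because the renderer only ever sees U+003F.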

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jhjenkins/



Re: About the European MES-2 subset

2003-07-20 Thread Peter Kirk
On 19/07/2003 17:32, John Cowan wrote:

Peter Kirk scripsit:

 

But it can be useful to know whether what you are getting is hangul etc, 
or an Indian script, or some other script you don't know, or some 
symbols or mathematical codes, or else the result of some kind of 
encoding conversion error.
   

Precisely where the Last Resort font shines, without carrying the
overhead in glyph images of a normal giant font.
 

Indeed. Where can I get the Last Resort font for Windows (2000)? If the 
answer is nowhere, I guess I am stuck with Arial Unicode MS or the 
horrible-looking (the J always grates!) Code2000.

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




Re: About the European MES-2 subset

2003-07-20 Thread Michael Everson
At 12:38 -0700 2003-07-20, Peter Kirk wrote:

 Indeed. Where can I get the Last Resort font for Windows (2000)? If
 the answer is nowhere, I guess I am stuck with Arial Unicode MS or
 the horrible-looking (the J always grates!) Code2000.

I'll go have a chat with some of my Apple colleagues about this.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: About the European MES-2 subset

2003-07-20 Thread jameskass
 At 12:38 -0700 2003-07-20, Peter Kirk wrote:
 
 Indeed. Where can I get the Last Resort font for Windows (2000)? If 
 the answer is nowhere, I guess I am stuck with Arial Unicode MS or 
 the horrible-looking (the J always grates!) Code2000.
 
 I'll go have a chat with some of my Apple colleagues about this.

It's unlikely that your Apple colleagues can do anything for
the J in Code2000.

Best regards,

James Kass
.



Re: About the European MES-2 subset

2003-07-20 Thread Michael Everson
At 20:50 + 2003-07-20, [EMAIL PROTECTED] wrote:

  At 12:38 -0700 2003-07-20, Peter Kirk wrote:

   Indeed. Where can I get the Last Resort font for Windows (2000)? If
   the answer is nowhere, I guess I am stuck with Arial Unicode MS or
   the horrible-looking (the J always grates!) Code2000.

  I'll go have a chat with some of my Apple colleagues about this.

 It's unlikely that your Apple colleagues can do anything for
 the J in Code2000.

I wasn't talking about that, but if you'd like my opinion, I hate that J too.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: About the European MES-2 subset

2003-07-20 Thread Peter Kirk
On 20/07/2003 13:50, [EMAIL PROTECTED] wrote:

At 12:38 -0700 2003-07-20, Peter Kirk wrote:

   

Indeed. Where can I get the Last Resort font for Windows (2000)? If 
the answer is nowhere, I guess I am stuck with Arial Unicode MS or 
the horrible-looking (the J always grates!) Code2000.
 

I'll go have a chat with some of my Apple colleagues about this.
   

It's unlikely that your Apple colleagues can do anything for
the J in Code2000.
Best regards,

James Kass
.
 

James, just to clarify since you are here: I am very grateful for the 
fonts Code2000 and Code2001 and that you have made these so easily 
available at http://home.att.net/~jameskass/. I don't like some of the 
glyph shapes, especially the J with a cross-bar like a T. But it is a 
lot better than nothing. When I need nice glyphs for particular Unicode 
ranges, I look elsewhere, though sometimes in vain. For example, who 
else even tries to cover the mathematical symbols in plane 1, at least in 
a downloadable font?

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




Re: About the European MES-2 subset

2003-07-19 Thread Peter Kirk
On 18/07/2003 17:42, John Cowan wrote:

Seeing hanzi, hangeul, etc. gets old when you a) can't read the text
and b) suspect it is spam anyhow.
 

But it can be useful to know whether what you are getting is hangul etc, 
or an Indian script, or some other script you don't know, or some 
symbols or mathematical codes, or else the result of some kind of 
encoding conversion error.

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)

2003-07-19 Thread Philippe Verdy
On Friday, July 18, 2003 10:18 PM, Michael Everson [EMAIL PROTECTED] wrote:

 I *prefer* Unicode to any subset thereof.

Why such a preference? Unicode does not define the charset (which is defined by
ISO10646), but character properties and related algorithms, and (in cooperation with
ISO10646) their codepoint assignments.

For me, Unicode is NOT a character set, but an encoded character set, with a small but
important nuance: You need to specify a version after Unicode to indicate the
character set. So Unicode 4.0 is a character set, and a superset of Unicode 3.2, but
Unicode alone is not.

If you just look at this definition, you cannot prefer Unicode to any subset,
because Unicode is just the name of a collection of standards and a collection of
character sets and algorithms, and is already a subset of the next version... If you
cannot support the idea of subsets, then don't use Unicode, or wait until the Unicode
standard is definitively closed, or permanently consider that its repertoire is now
closed and no more characters will be added... Of course you would be wrong.

MES-2 or its MES extension is a character set (like most legacy encodings in IANA 
which are also encoded character sets). In practice, nobody can live and implement any 
software without clearly bounded sets of characters. So versioning is absolutely 
necessary to fix these bounds in terms of implementation levels.
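
To make the versioning point concrete, here is a small sketch in Python, whose standard library happens to ship both a current character database and a frozen Unicode 3.2 one. The helper name and the sample character are my own choices for illustration.

```python
import unicodedata

def assigned_in(db, ch):
    """True if ch is an assigned character (general category not Cn) in db."""
    return db.category(ch) != "Cn"

# LATIN CAPITAL LETTER SHARP S was added well after Unicode 3.2.
capital_sharp_s = "\u1e9e"

print(unicodedata.unidata_version)                           # version of the current database
print(assigned_in(unicodedata.ucd_3_2_0, capital_sharp_s))   # the 3.2 repertoire
print(assigned_in(unicodedata, capital_sharp_s))             # the current repertoire
```

The same code point is outside the repertoire under one version and inside it under another, which is why a bare "supports Unicode" claim does not bound an implementation.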

-- 
Philippe.
Spam not tolerated: any unsolicited message will be
reported to your Internet service providers.




Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)

2003-07-19 Thread Michael Everson
At 15:23 +0200 2003-07-19, Philippe Verdy wrote:

 Unicode does not define the charset (which is defined by ISO10646),

That isn't true. They both define the same character set. (I will not
use the term charset.)

 but character properties and related algorithms, and (in cooperation
 with ISO10646) their codepoint assignments.

The code position assignments are (formally) assigned by WG2, but
there is consensus between UTC and WG2 on this matter.

 For me, Unicode is NOT a character set, but an encoded character
 set, with a small but important nuance: You need to specify a
 version after Unicode to indicate the character set. So Unicode 4.0
 is a character set, and a superset of Unicode 3.2, but Unicode alone
 is not.

To me, Unicode refers to the most recent version. :-)

 If you just look at this definition, you cannot prefer Unicode to
 any subset,

Yes, I can.

 because Unicode is just a name of a collection of standards and a
 collection of character sets and algorithms

That isn't true. If you think this is true, you really have a lot to
learn about Unicode.

 and already is a subset of the next version... If you cannot support
 the idea of subsets, then don't use Unicode, or wait until the
 Unicode standard is definitively closed, or permanently consider that
 its repertoire is now closed and no more characters will be added...
 Of course you would be wrong.

I think you mistook me.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: About the European MES-2 subset

2003-07-19 Thread Michael Everson
At 16:41 -0700 2003-07-18, Michael (michka) Kaplan wrote:

 I am pretty sure you have to be wrong here, Michael. Attend me:

 1) API converts from Unicode to the wrong code page
 2) API does some sort of work with the string
 3) API tries to display the string

 How on earth could it come from the Last Resort font, unless it is a generic
 glyph that contains no script info (which would be no better than a question
 mark or a NULL glyph)?

Hm. See http://developer.apple.com/fonts/LastResortFont/ where it 
shows glyphs for illegal characters (FFFE/ etc.) as well as 
undefined characters (valid code positions which have not been 
assigned). I thought somehow that there was a glyph for broken 
characters (characters that were just plain wrong) as well.
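
The distinction between illegal and undefined code points can be sketched in Python as follows. This is only an illustration of the two classes, not a description of how the Last Resort font selects its glyphs; the function name is made up, and "unassigned" depends on the Unicode version of the local Python.

```python
import unicodedata

def classify(code_point):
    """Roughly sort a code point into noncharacter / unassigned / assigned."""
    # Noncharacters: U+FDD0..U+FDEF and the last two code points of every plane.
    if 0xFDD0 <= code_point <= 0xFDEF or (code_point & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    # Valid code positions with no character bound to them (yet).
    if unicodedata.category(chr(code_point)) == "Cn":
        return "unassigned"
    return "assigned"

for cp in (0x0041, 0xFFFE, 0xFFFF, 0xFDD0, 0x10FFFF):
    print(f"U+{cp:04X} {classify(cp)}")
```

A character that was mangled by a bad conversion before rendering falls into none of these classes: by then it is a perfectly valid, assigned, but wrong character.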
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: About the European MES-2 subset

2003-07-19 Thread John Cowan
Peter Kirk scripsit:

 But it can be useful to know whether what you are getting is hangul etc, 
 or an Indian script, or some other script you don't know, or some 
 symbols or mathematical codes, or else the result of some kind of 
 encoding conversion error.

Precisely where the Last Resort font shines, without carrying the
overhead in glyph images of a normal giant font.

-- 
May the hair on your toes never fall out!     John Cowan
--Thorin Oakenshield (to Bilbo)               [EMAIL PROTECTED]



Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)

2003-07-18 Thread Michael Everson
At 00:57 +0200 2003-07-18, Philippe Verdy wrote:

 Why is row 03 so restricted? Shouldn't it include those accents and
 diacritics that are used by other characters once canonically
 decomposed? Or does it imply that MES-2 is only supposed to use
 strings in NFC form?

 Also, is this list under full closure with existing character properties, like
 NFKD decompositions, and case mappings?

The MES-2 is what it is, and was developed at the time when it was.
It is thought to be a minimum for European requirements,
and is certainly a lot better than that old Adobe glyph list that was
supported earlier on. It doesn't depend on very smart fonts.

Personally I prefer the Multilingual European Subset.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)

2003-07-18 Thread Philippe Verdy
On Friday, July 18, 2003 7:36 AM, Michael Everson [EMAIL PROTECTED] wrote:

 At 00:57 +0200 2003-07-18, Philippe Verdy wrote:
 
  Why is row 03 so restricted? Shouldn't it include those accents and
  diacritics that are used by other characters once canonically
  decomposed? Or does it imply that MES-2 is only supposed to use
  strings in NFC form?
  
  Also, is this list under full closure with existing character
  properties, like NFKD decompositions, and case mappings?
 
 The MES-2 is what it is, and was developed at the time when it was.
 It is thought to be a minimum for European requirements,
 and is certainly a lot better than that old Adobe glyph list that was
 supported earlier on. It doesn't depend on very smart fonts.
 
 Personally I prefer the Multilingual European Subset.

Is there some work at CEN to align its MES-2 subset into a
revised version (MES-2.1 ???) which not only takes into consideration the
ISO10646 reference but also its Unicode properties to make this set
self-closed, and actually implementable, at least with NFC closure
and case-mappings closure?

Support for NFKC closure should then be added in a next step, which
could optionally specify support for the corresponding decompositions
(but this would include combining characters, and would extend the
number of precomposed characters in NFC form to include in the
repertoire).
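
As a rough illustration of what such a closure check could look like, here is a sketch in Python. The repertoire below is a toy set invented for the example, not the actual MES-2 list, and the function name is hypothetical.

```python
import unicodedata

def nfd_closure_gaps(repertoire):
    """Return characters that canonical decompositions need but the repertoire lacks."""
    missing = set()
    for ch in repertoire:
        # NFD gives the full canonical decomposition of a single character.
        for part in unicodedata.normalize("NFD", ch):
            if part not in repertoire:
                missing.add(part)
    return missing

# Toy repertoire: 'e' and precomposed 'é', but no combining acute accent.
toy_repertoire = {"e", "\u00e9"}
for ch in sorted(nfd_closure_gaps(toy_repertoire)):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Run against a real subset list, the same loop would enumerate the combining marks that have to be added before decomposed text can stay inside the subset.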

I don't think it's up to Unicode to do this work, but CEN should be
contacted to perform this job, or some vendor or open-sourcers
may have done it and published it.

I still note that modern Hebrew and Arabic are excluded from MES-2,
as they are not used in any official language in the European Union
or EFTA, or future EU candidates. But they are certainly of great
interest for countries with which the EU is a major partner, and which
are using these scripts. In the future, it will be necessary to include
support for modern Georgian (a subset of U+10A0..U+10FF), and modern
Armenian (a subset of U+0530..U+058F), as well as some characters
from Cyrillic Supplementary (in U+0500..U+052F).

Conversely, I don't understand why MES-2 included characters
in row U+25xx (Box Drawing, Block Elements, Geometric Shapes),
which are not strictly needed for text purposes (notably legal publications
of the E.U., which would do better to use markup systems), and the two
Alphabetic Presentation Forms U+FB01..U+FB02 (fi and fl
ligatures), which are really unneeded, even for legal purposes; for
consistency, the ff, ffi, ffl ligatures should then have been included too...

I suppose that this may come from widely used legacy encodings in
some EU+EFTA+European Council countries, but CEN should have
avoided them (they could still be selected by font renderers, if available
in fonts).

-- 
Philippe.




Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)

2003-07-18 Thread Michael Everson
At 12:16 +0200 2003-07-18, Philippe Verdy wrote:

 Is there some work at CEN to align its MES-2 subset into a revised
 version (MES-2.1 ???) which not only takes into consideration the
 ISO10646 reference but also its Unicode properties to make this set
 self-closed, and actually implementable, at least with NFC closure
 and case-mappings closure?

No. The relevant CEN committee is now dormant.

 I still note that modern Hebrew and Arabic are excluded from MES-2,
 as they are not used in any official language in the European Union
 or EFTA, or future EU candidates. But they are certainly of great
 interest for countries with which the EU is a major partner, and
 which are using these scripts. In the future, it will be necessary to
 include support for modern Georgian (a subset of U+10A0..U+10FF),
 and modern Armenian (a subset of U+0530..U+058F), as well as some
 characters from Cyrillic Supplementary (in U+0500..U+052F).

The European Multilingual Subset supports all of Latin, Greek,
Cyrillic, and Armenian. Unicode supports Hebrew and Arabic.

 Conversely, I don't understand why MES-2 included characters
 in row U+25xx (Box Drawing, Block Elements, Geometric Shapes)

Legacy compatibility with IBM and others.

 which are not strictly needed for text purposes (notably legal
 publications of the E.U., which would do better to use markup systems),
 and the two Alphabetic Presentation Forms U+FB01..U+FB02 (fi and
 fl ligatures), which are really unneeded, even for legal purposes;
 for consistency, the ff, ffi, ffl ligatures should then have been
 included too...

Legacy compatibility with Apple.

 I suppose that this may come from widely used legacy encodings in
 some EU+EFTA+European Council countries, but CEN should have avoided
 them (they could still be selected by font renderers, if available
 in fonts).

You are entitled to your opinion. This work was begun and finished long ago.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


RE: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)

2003-07-18 Thread Kent Karlsson

Philippe Verdy wrote:

 MES-2 is a collection of characters independent of their actual encoding.
 To support MES-2 in a Unicode-compliant application, extra characters
 need to be added, notably if the minimum requirement for information
 interchange is the NFC form used by XML and HTML related standards.

The Unicode normal forms (for a particular version of Unicode) are defined
for ALL of the characters in that version.  There is no concept of a
Unicode normal form for a subset of the characters in a particular version.

However, the MESes (there are four of them!) are useful for specifying 
minimum European font coverage, and input method support (the
latter need not be via keyboard).

This is not to say that the MESes are unproblematic.  To mention just
two points not already mentioned: none of the new math characters
are included even in MES-3 (a, b), despite that all math characters
were supposed to be included, and not even MES-3 covers all official
minority languages.

 It would be interesting to inform CEN about how MES-2 can be
 documented to comply with all normative Unicode algorithms, and
 the minimum is to ensure the NFC closure of this subset, which
 would have done better not to include compatibility characters with
 singleton canonical decompositions, and should now reintegrate
 the missing NFC forms.

I think it is [extremely] unlikely at this point that anyone will change,
or add new, MESes.  Note that implementors are in no way prohibited
from supporting (in fonts, plus rendering software, and some form of
input) more than the MESes state.  (But as Philippe states, there are some
rather useless characters that have been included for compatibility
reasons.)

/kent k




Re: About the European MES-2 subset

2003-07-18 Thread Peter Kirk
On 18/07/2003 03:16, Philippe Verdy wrote:

 I still note that modern Hebrew and Arabic are excluded from MES-2,
 as they are not used in any official language in the European Union
 or EFTA, or future EU candidates. ...

But they are used in official publications within the EU, those targeted
at minority communities. But then so are south Asian and east Asian scripts.

 ... But they are certainly of great
 interest for countries with which the EU is a major partner, and which
 are using these scripts. In some future, it would be needed to include
 support for modern Georgian (a subset of U+10A0..U+10FF), and modern
 Armenian (a subset of U+0530..U+058F), as well as some characters
 from Cyrillic Supplementary (in U+0500..U+052F).

If this subset is to be enlarged very much, and to require complex
script rendering etc for its implementation, surely there is little
point in specifying anything less than the improper (in the mathematical
sense!) subset which Ken mentioned, i.e. the whole of Unicode.

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)

2003-07-18 Thread Philippe Verdy
On Friday, July 18, 2003 12:42 PM, Michael Everson [EMAIL PROTECTED] wrote:

 At 12:16 +0200 2003-07-18, Philippe Verdy wrote:
 
  Is there some work at CEN to align its MES-2 subset into a revised
  (MES-2.1 ???) which not only takes into consideration the ISO10646
  reference but also its Unicode properties to make this set
  self-closed, and actually implementable, at least with NFC closure
  and case-mappings closure?
 
 No. The relevant CEN committee is now dormant.

So this work must be done by independent open-sourcers sharing their
experience to allow fonts to be created that are completely compatible
with MES-2. (This is my opinion: I think it's stupid to create fonts
that contain exactly the MES-2 set and nothing more; it
must only be viewed as a minimum set.)

I note that the Microsoft core fonts for Windows support MES-2, but
in an unrestricted way: other characters are also included, and Uniscribe
allows selecting ligatures and rendering combining sequences with
composite glyphs if defined in OpenType fonts, or with a default
multi-glyph stack.

I note that you prefer the European Multilingual Subset to MES-2.
Is it an extended set that includes MES-2, and fills the holes by
using all characters defined in blocks of some version of the Unicode
set?




Re: About the European MES-2 subset

2003-07-18 Thread Philippe Verdy
On Friday, July 18, 2003 1:13 PM, Peter Kirk [EMAIL PROTECTED] wrote:

 On 18/07/2003 03:16, Philippe Verdy wrote:
 
  I still note that modern Hebrew and Arabic are excluded from MES-2,
  as they are not used in any official language in the European Union
  or EFTA, or future EU candidates. ...
  
 But they are used in official publications within the EU, those targeted
 at minority communities. But then so are south Asian and east Asian scripts.

But for these Asian languages, I think it's best to have fonts designed to
handle their corresponding scripts correctly, instead of a giant font poorly
hinted for readability at small sizes, and without support for common
ligatures.

Arabic, Hebrew and Brahmic scripts would be better supported by their
own fonts, rather than partially (for example the inclusion of Brahmic
digits only in Arial Unicode MS was an error, in my opinion, and Microsoft
would have done better to provide separate fonts for these Brahmic scripts,
rather than specifying that its fonts support these scripts).

  ... But they are certainly of great
  interest for countries with which the EU is a major partner, and which
  are using these scripts. In the future, it will be necessary to
  include support for modern Georgian (a subset of U+10A0..U+10FF),
  and modern Armenian (a subset of U+0530..U+058F), as well as some
  characters from Cyrillic Supplementary (in U+0500..U+052F).

As for Armenian and Georgian Mkhedruli, they do not seem complex
to add to a font.

 If this subset is to be enlarged very much, and to require complex
 script rendering etc for its implementation, surely there is little
 point in specifying anything less than the improper (in the
 mathematical sense!) subset which Ken mentioned, i.e. the whole of
 Unicode. 

I agree with this point. But this is not an excuse to not implement and
support at least the NFC and case mapping closures in a decent font
for any script, even if the script is reduced to letters used in the modern
language.
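
For the case-mapping half of that closure, a comparable sketch in Python follows. It relies on Python's built-in full case mappings; the sample sets are invented for the example and are not taken from any MES list.

```python
def case_closure_gaps(repertoire):
    """Return characters produced by upper/lower casing that the repertoire lacks."""
    missing = set()
    for ch in repertoire:
        # str.upper()/str.lower() may expand one character into several.
        for mapped in (ch.upper(), ch.lower()):
            for part in mapped:
                if part not in repertoire:
                    missing.add(part)
    return missing

# U+00FF (y with diaeresis) uppercases to U+0178, which lies outside Latin-1.
print(sorted(f"U+{ord(c):04X}" for c in case_closure_gaps({"\u00ff"})))
# U+00DF (sharp s) uppercases to the two-letter sequence "SS".
print(sorted(case_closure_gaps({"\u00df"})))
```

A subset that is closed in this sense never forces an application to produce a character it cannot display merely by changing case.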

But some optional ligatures not strictly needed for a set of written
modern languages may not be needed at all if the font or renderer
supports correct fallback decompositions (for example with fi, fl,
ffi, ffl). What is important here is the legality of the printed text,
so that no confusion is possible for a text written in any language.
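
The fallback in question can be illustrated with the compatibility decompositions already recorded in the Unicode data. A minimal Python sketch (the helper name is mine; a real renderer would do this at the glyph level rather than on the text):

```python
import unicodedata

# The five Latin f-ligatures in Alphabetic Presentation Forms.
ligatures = ["\ufb00", "\ufb01", "\ufb02", "\ufb03", "\ufb04"]

def fallback(text):
    """Replace compatibility characters by their plain-letter decompositions."""
    return unicodedata.normalize("NFKD", text)

for lig in ligatures:
    print(f"U+{ord(lig):04X} {unicodedata.name(lig)} -> {fallback(lig)}")
```

Note that NFC leaves these characters untouched, since their decompositions are compatibility ones and not canonical ones.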

One good source of such characters needed for languages can be
found in the Openi18n.org LDML database (notably the ICU section
which is the most complete collection), which contains definitions of
exemplarCharacters for each supported language (but there may
be some omissions). One regret: some characters are listed as
exemplars although they are not mandatory to support a language; they
should be listed separately, as should rare characters that are used only
in proper names, geographical names or transliterated foreign
words, which can often be written with the common letters using a
phonetic approach.

An example is: Norsk Bokmål, most often transcribed as: norvégien
bokmal or bokmâl in French (where the circumflex is used as
a way to specify an open and/or lengthened vowel), or translated as:
norvégien classique (as opposed to: norvégien réformé, or
nouveau norvégien).

So exemplarCharacters in a language are a good indication of
the characters needed for a language, even if an official
transliteration rule is used to render imported foreign words that use more
characters.

-- 
Philippe.




Re: About the European MES-2 subset

2003-07-18 Thread Peter Kirk
On 18/07/2003 06:21, Philippe Verdy wrote:

 But for these Asian languages, I think it's best to have fonts designed to
 handle correctly their corresponding scripts, instead of a giant font poorly
 hinted for readability at small sizes, and without support of common
 ligatures.

Agreed. Giant fonts have their uses, e.g. Arial Unicode MS and Code2000 
let me get a flavour of complex script pages which I browse to on the 
Internet, often by mistake, without having to install special fonts for 
scripts I don't read. But publication of official documents is not one 
of those uses. Software needs to include good font substitution procedures.

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




Re: About the European MES-2 subset

2003-07-18 Thread John Cowan
Peter Kirk scripsit:

 Agreed. Giant fonts have their uses, e.g. Arial Unicode MS and Code2000 
 let me get a flavour of complex script pages which I browse to on the 
 Internet, often by mistake, without having to install special fonts for 
 scripts I don't read. 

However, a font like Last Resort (the world's smallest giant font, as it were)
does that just about as well.  For my own purposes, I'd like to see more
comprehensive Latin-script fonts with all combining characters working.

-- 
Do I contradict myself?               John Cowan
Very well then, I contradict myself.  [EMAIL PROTECTED]
I am large, I contain multitudes.     http://www.ccil.org/~cowan
--Walt Whitman, _Leaves of Grass_     http://www.reutershealth.com



Re: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)

2003-07-18 Thread Michael Everson
At 13:35 +0200 2003-07-18, Philippe Verdy wrote:

 I note that you prefer the European Multilingual Subset to MES-2.
 Is it an extended set that includes MES-2, and fills the holes by
 using all characters defined in blocks of some version of the
 Unicode set?

It is script-based, not character based. It includes all Latin, 
Greek, Cyrillic, Georgian, and Armenian characters. And is a superset 
of MES-2.
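
A script-based definition can be approximated in a few lines of Python. The sketch below is deliberately crude: the standard library does not expose the Unicode Script property, so it guesses the script from the first word of the character name, and the set of script names is taken from the sentence above. Digits, punctuation and combining marks are not handled.

```python
import unicodedata

# Scripts named in the message above; the name-prefix test is an approximation.
SCRIPTS = {"LATIN", "GREEK", "CYRILLIC", "GEORGIAN", "ARMENIAN"}

def in_script_subset(ch):
    """Guess script membership from the first word of the character name."""
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] in SCRIPTS

for ch in ("a", "\u03b1", "\u0434", "\u10d0", "\u0531", "\u05d0"):
    print(f"U+{ord(ch):04X} {in_script_subset(ch)}")
```

The point of the design is that a new Latin or Cyrillic letter added in a later version falls into the subset automatically, whereas a character-based list has to be revised.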

I *prefer* Unicode to any subset thereof.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: About the European MES-2 subset

2003-07-18 Thread Michael Everson
At 11:28 -0400 2003-07-18, John Cowan wrote:

 However, a font like Last Resort (the world's smallest giant font, as it were)
 does that just about as well.

While I hate seeing the Last Resort font show up, I love seeing it
when it does. :-) So much better than ?.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: About the European MES-2 subset (was: PUA Audio Description,Subtitle, Signing)

2003-07-18 Thread Michael Everson
At 13:07 +0200 2003-07-18, Kent Karlsson wrote:

 This is not to say that the MESes are unproblematic.  To mention just
 two points not already mentioned: none of the new math characters
 are included even in MES-3 (a, b), despite that all math characters
 were supposed to be included

That isn't true.

 and not even MES-3 covers all official minority languages.

What's missing?

 (But as Philippe states, there are some rather useless characters
 that have been included for compatibility reasons.)

Same goes for Unicode though. :-)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: About the European MES-2 subset

2003-07-18 Thread Michael (michka) Kaplan
A question mark is a sign of a bad conversion from Unicode (to a code page
that did not contain the character). This would likely happen on the Mac too
rather than the Last Resort font, wouldn't it?

On Windows, the "cannot find a font for it" situation is the NULL glyph. The
Last Resort font is cool but a Code2000 stab at the actual glyph is (IMHO)
cooler than both. :-)

MichKa

- Original Message - 
From: Michael Everson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, July 18, 2003 1:42 PM
Subject: Re: About the European MES-2 subset


 At 11:28 -0400 2003-07-18, John Cowan wrote:

 However, a font like Last Resort (the world's smallest giant font, as it were)
 does that just about as well.

 While I hate seeing the Last Resort font show up, I love seeing it
 when it does. :-) So much better than ?.
 -- 
 Michael Everson * * Everson Typography *  * http://www.evertype.com






Re: About the European MES-2 subset

2003-07-18 Thread Michael Everson
At 15:45 -0700 2003-07-18, Michael (michka) Kaplan wrote:

 A question mark is a sign of a bad conversion from Unicode (to a code page
 that did not contain the character). This would likely happen on the Mac too
 rather than the Last Resort font, wouldn't it?

No, it wouldn't. A "not a character" glyph is displayed in the Last
Resort font.

 On Windows, the "cannot find a font for it" situation is the NULL glyph.

Not much better than ?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: About the European MES-2 subset

2003-07-18 Thread Michael (michka) Kaplan
I am pretty sure you have to be wrong here, Michael. Attend me:

1) API converts from Unicode to the wrong code page
2) API does some sort of work with the string
3) API tries to display the string

How on earth could it come from the Last Resort font, unless it is a generic
glyph that contains no script info (which would be no better than a question
mark or a NULL glyph)?
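
The three steps above can be sketched in Python. This is only an
illustration under assumptions: cp1252 stands in for "the wrong code page"
and errors="replace" stands in for whatever substitution policy the API
applies; neither is named in this thread.

```python
# Step 1: convert from Unicode to a code page that lacks the character.
# Assumption: cp1252 plays the role of "the wrong code page".
text = "\u0416"  # CYRILLIC CAPITAL LETTER ZHE, which cp1252 cannot represent
converted = text.encode("cp1252", errors="replace")

# Step 2: the API works with the converted bytes (here, simply decoding them).
round_tripped = converted.decode("cp1252")

# Step 3: the API displays whatever survived the conversion.
print(round_tripped)  # prints "?"
```

After step 1 the data holds a literal U+003F QUESTION MARK, so in this
sketch nothing about the original script is left for a fallback font to
act on.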

In any case, Code2000 giving some glyph for more cases is still a better
solution.

MichKa

- Original Message - 
From: Michael Everson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, July 18, 2003 4:16 PM
Subject: Re: About the European MES-2 subset


 At 15:45 -0700 2003-07-18, Michael (michka) Kaplan wrote:
 A question mark is a sign of a bad conversion from Unicode (to a code page
 that did not contain the character). This would likely happen on the Mac too
 rather than the Last Resort font, wouldn't it?

 No, it wouldn't. A "not a character" glyph is displayed in the Last
 Resort font.

 On Windows, the "cannot find a font for it" situation is the NULL glyph.

 Not much better than ?
 -- 
 Michael Everson * * Everson Typography *  * http://www.evertype.com






Re: About the European MES-2 subset

2003-07-18 Thread John Cowan
Michael (michka) Kaplan scripsit:

 In any case, Code2000 giving some glyph for more cases is still a better
 solution.

In any case, if you cannot read any of the languages that use a given
script, you are unlikely to care much what glyph appears, and if it
turns out that you do care, the LR font gives you a clue about which
font you ought to install.

Seeing hanzi, hangeul, etc. gets old when you a) can't read the text
and b) suspect it is spam anyhow.

-- 
John Cowan  [EMAIL PROTECTED]  http://www.ccil.org/~cowan
Raffiniert ist der Herrgott, aber boshaft ist er nicht.
--Albert Einstein



Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)

2003-07-17 Thread Philippe Verdy
On Thursday, July 17, 2003 9:23 PM, Michael Everson [EMAIL PROTECTED] wrote:

 At 17:01 +0100 2003-07-17, William Overington wrote:
  Now, I have never heard of the MES-2 whatever that is.  However, I
  do not have deep knowledge of the various standards which exist. 
  Could you possibly say some more about MES-2 please.
 
 282 MES-2 is specified by the following ranges of code positions as
 indicated for each row.
 Rows: Positions (cells)
 00: 20-7E A0-FF
 01: 00-7F 8F 92 B7 DE-EF FA-FF
 02: 18-1B 1E-1F 59 7C 92 BB-BD C6-C7 C9 D8-DD EE

 03: 74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D7 DA-E1
 04: 00-5F 90-C4 C7-C8 CB-CC D0-EB EE-F5 F8-F9

 1E: 02-03 0A-0B 1E-1F 40-41 56-57 60-61 6A-6B 80-85 9B
   F2-F3
 1F: 00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D
   80-B4 B6-C4 C6-D3 D6-DB DD-EF F2-F4 F6-FE
 20: 13-15 17-1E 20-22 26 30 32-33 39-3A 3C 3E 44 4A
   7F 82 A3-A4 A7 AC AF 
 21: 05 16 22 26 5B-5E 90-95 A8
 22: 00 02-03 06 08-09 0F 11-12 19-1A 1E-1F 27-2B
   48 59 60-61 64-65 82-83 95 97 
 23: 02 10 20-21 29-2A

 25: 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C
   90-93 A0 AC B2 BA BC C4 CA-CB D8-D9
 26: 3A-3C 40 42 60 63 65-66 6A-6B

 FB: 01-02
 FF: FD
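
For anyone who wants to work with the row/cell notation above, here is a
minimal Python sketch that expands it into a set of code points. The name
parse_rows and the SAMPLE excerpt are illustrative assumptions, not part of
the CWA or of 10646; only a few rows of the listing are reproduced.

```python
def parse_rows(spec: str) -> set:
    """Expand 'RR: aa-bb cc ...' lines into a set of integer code points."""
    code_points = set()
    row = None
    for line in spec.splitlines():
        line = line.strip()
        if not line:
            continue
        if ":" in line:
            # A new row starts; continuation lines have no 'RR:' prefix.
            row_text, line = line.split(":", 1)
            row = int(row_text, 16)
        if row is None:
            raise ValueError("cell list found before any row number")
        for cell in line.split():
            first, _, last = cell.partition("-")
            low = int(first, 16)
            high = int(last, 16) if last else low
            code_points.update(range((row << 8) | low, ((row << 8) | high) + 1))
    return code_points


# Hypothetical excerpt: four rows copied from the listing above.
SAMPLE = """
00: 20-7E A0-FF
23: 02 10 20-21 29-2A
FB: 01-02
FF: FD
"""

sample = parse_rows(SAMPLE)
print(len(sample))       # 200
print(0x2329 in sample)  # True
print(0x3008 in sample)  # False
```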

As most of these characters are canonically decomposable, shouldn't this
list also include the decomposed characters?

Why is row 03 so restricted? Shouldn't it include those accents and
diacritics that are used by other characters once canonically
decomposed? Or does it imply that MES-2 is only supposed to use
strings in NFC form?

Also, is this list under full closure with existing character properties, like
NFKD decompositions, and case mappings?

-- 
Philippe.
Spam not tolerated: any unsolicited message will be
reported to your Internet service providers.




Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)

2003-07-17 Thread Kenneth Whistler

  282 MES-2 is specified by the following ranges of code positions as
  indicated for each row...

Philippe Verdy asked:

 As most of these characters are canonically decomposable, shouldn't this
 list include also the decomposed characters?
 
 Why is row 03 so restricted? Shouldn't it include those accents and
 diacritics that are used by other characters once canonically
 decomposed? Or does it imply that MES-2 is only supposed to use
 strings in NFC form?

MES-2 (and all the rest of the Multilingual European Subsets) are
a CEN construct. See the CEN Workshop Agreement, CWA 13873:2000
posted at Michael Everson's site:

http://www.evertype.com/standards/iso10646/pdf/cwa13873.pdf

Among other things, that CWA states:

This CWA does *not* specify any encoding of the European Subsets.

so conceptually it is more like a repertoire listing.

MES-2 is formally listed in 10646 as one of the normative subsets
there, but since 10646 has no concepts of decomposition, normalization,
or equivalence, the fact that MES-2 contains precomposed characters
but not their decompositions or the relevant combining accents
is formally irrelevant.

The Unicode Standard does not make subsets a normative construct
for that standard and doesn't even mention MES-2. Conformance to
10646 doesn't require you to make use of its subsets, but if anyone
is worried about the articulation of the standards, the Unicode
Standard itself formally consists of Subset 305 of 10646:2003,
namely the UNICODE 4.0 subset -- the subset which contains *all*
of the encoded characters of 10646:2003.

Think of the Multilingual European Subsets as a kind of
way for people in Europe associated with standards organizations
and governments to try to communicate with software vendors
regarding which user characters they want to ensure are
supported by their software. The CWA 13873 contains some
questionable presuppositions about how software vendors are
actually proceeding to roll out their Unicode support, but
the intent of the CWA is clear:

It is estimated that implementing the full character set of the
UCS may be costly in the first stages of UCS use, and that many
manufacturers will implement in subset-stages. To ensure that a
common subset usable to the vast majority of European users be
available for a reasonable price, and as a guide to manufacturers,
it will be helpful to specify, to users and procurers of systems,
European subsets of the UCS encompassing the characters for use
in European languages as well as other frequently used and
specialist characters.

 Also, is this list under full closure with existing character properties, like
 NFKD decompositions, and case mappings?

MES-2 is clearly *not* closed under NFD, NFKD, or NFKC normalizations.

Although less obvious, it is also not closed under NFC
normalization. For example, it includes the angle brackets
U+2329, U+232A, but not their canonical equivalents,
U+3008, U+3009. There are also some characters outside the MES-2 
repertoire where NFC(x) *is* in the MES-2 repertoire. Singleton canonical
equivalences like U+212B ANGSTROM SIGN come to mind, for example.
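
The two NFC examples above can be reproduced with Python's standard
unicodedata module. This is a sketch under assumptions: nfc_escapes is a
hypothetical helper name, the three-member sample is hand-picked from the
MES-2 listing rather than the full repertoire, and the results reflect the
Unicode data bundled with the running Python, not the 2003 data.

```python
import unicodedata


def nfc_escapes(repertoire):
    """Map each member whose NFC form leaves the repertoire to that NFC form."""
    gaps = {}
    for cp in sorted(repertoire):
        normalized = unicodedata.normalize("NFC", chr(cp))
        if any(ord(ch) not in repertoire for ch in normalized):
            gaps[cp] = normalized
    return gaps


# Hand-picked members of MES-2: U+00C5 plus the two angle brackets.
sample = {0x00C5, 0x2329, 0x232A}
gaps = nfc_escapes(sample)
for cp, normalized in gaps.items():
    print(f"U+{cp:04X} -> U+{ord(normalized):04X}")

# The reverse case: a character outside the sample whose NFC form is inside it.
angstrom_nfc = unicodedata.normalize("NFC", "\u212B")
print(ord(angstrom_nfc) in sample)  # True
```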

I haven't checked on case mappings and case foldings, but would
not be too surprised to find an anomaly or two there, as well.
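
A check of this kind could be sketched as follows. Everything here is an
assumption made for illustration: case_mapping_escapes is a hypothetical
helper, it relies on Python's str.upper() and str.lower() (full case
mappings from the Unicode data shipped with the running Python, not the
2003 data), and the sample is four letters hand-picked from rows 00 and 01
of the listing. It is not a survey of the whole subset.

```python
def case_mapping_escapes(repertoire):
    """Map each member to the case-mapped code points that fall outside."""
    gaps = {}
    for cp in sorted(repertoire):
        ch = chr(cp)
        for mapped in (ch.upper(), ch.lower()):
            outside = {ord(c) for c in mapped if ord(c) not in repertoire}
            if outside:
                gaps.setdefault(cp, set()).update(outside)
    return gaps


# Hand-picked sample: I, i, capital I with dot above, dotless i.
sample = {0x0049, 0x0069, 0x0130, 0x0131}
gaps = case_mapping_escapes(sample)
for cp, outside in gaps.items():
    print(f"U+{cp:04X} needs", ", ".join(f"U+{o:04X}" for o in sorted(outside)))
```

In this sample the only report is for U+0130, whose full lowercase mapping
in Python is "i" followed by U+0307 COMBINING DOT ABOVE; row 03 of the
listing starts at cell 74, so U+0307 is not in it.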

MES-2 was not designed by the UTC, nor did it take any of
these considerations into account. It is not really an
appropriate construct for the Unicode Standard. A more
meaningful way to think of it is: if you want to sell software
in Europe, you better be able to input and display all the
characters we Europeans have in this list.

--Ken




Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)

2003-07-17 Thread Philippe Verdy
On Friday, July 18, 2003 2:18 AM, Kenneth Whistler [EMAIL PROTECTED] wrote:

 MES-2 was not designed by the UTC, nor did it take any of
 these considerations into account. It is not really an
 appropriate construct for the Unicode Standard. A more
 meaningful way to think of it is: if you want to sell software
 in Europe, you better be able to input and display all the
 characters we Europeans have in this list.

I interpret it this way:

MES-2 is a collection of characters independent of their actual encoding.
To support MES-2 in a Unicode-compliant application, extra characters
need to be added, notably if the minimum requirement for information
interchange is the NFC form used by XML and HTML related standards.

It would be interesting to inform CEN about how MES-2 can be
documented to comply with all normative Unicode algorithms. The
minimum is to ensure the NFC closure of this subset, which would
have done better not to include compatibility characters that have
singleton canonical decompositions, and which should now add the
missing NFC forms.

For obvious reasons, the case mappings should also be closed, but
not necessarily compatibility decompositions, or characters needed
for the NFD form (notably combining diacritics, which may be added
only in applications that can process and recompose them when
querying supported precomposed characters in fonts).

Do the default TrueType fonts for Windows (Times New Roman, Arial
and Courier New) support the whole MES-2 repertoire, including on
Windows 95 without Uniscribe installed and used?

In practice, MES-2 support will always need additional characters
to ensure the minimum closures, and ISO10646 should work with
CEN to fix their set in a revision.

-- 
Philippe.
Spam not tolerated: any unsolicited message will be
reported to your Internet service providers.