Re: Character identities

2002-10-28 Thread Marc Wilhelm Küster
At 11:37 25.10.2002 -0700, Doug Ewell wrote:

Marc Wilhelm Küster kuester at saphor dot net wrote:

 As to the long s, it is not used for writing present-day German except
 in rare cases, notably in some scholarly editions and in the Fraktur
 script. Very few texts beyond the names of newspapers are nowadays
 produced in Fraktur. To put the long s on the German keyboard would be
 quite contrary to user requirements -- and if a requirement existed,
 it would be DIN's job to amend DIN 2137-2 and the upcoming DIN 2137-12
 to cater for it.

Irrelevant, sure, but contrary?  I don't see what harm could come
from adding a character to a previously unassigned key, especially in
the relatively obscure AltGr zone (Level 3).  Most users could safely
ignore it, and most would never even know it was there.


In principle, you are right. Unfortunately, there's quite a bit of software 
around that (mis-)uses unassigned AltGr keys for its own purposes - this 
includes, on Windows NT ff at least, software such as the localized MS 
Word. So, adding new assignments potentially clashes with existing software 
and should only be done if there is a sufficiently high public interest in 
doing so.


But yes, of course it would be DIN's job to standardize such a thing (or
not).

Patrick Andries asked if a revised German keyboard standard would be
ignored in the market with the same cavalier attitude seen in Canada
(and the U.S.).  My impression is that European manufacturers are held
more closely to conformance with national and international standards
than North American manufacturers, but I'd want some Europeans to back
me up on this.


Speaking of Europe, it differs from country to country. In Germany 
certainly DIN 2137 is widely adhered to and changes to it would in all 
likelihood be taken up fast on the market.

Best regards,

Marc Küster


-Doug Ewell
 Fullerton, California


*
Marc Wilhelm Küster
Saphor GmbH

Fronländer 22
D-72072 Tübingen

Tel.: (+49) / (0)7472 / 949 100
Fax: (+49) / (0)7472 / 949 114





BabelPad

2002-10-28 Thread Andrew C. West
BabelPad, my free Unicode plain text editor for Windows, has now been released.
Further information is available at
http://uk.geocities.com/BabelStone1357/Software/BabelPad.html.

BabelPad also includes input methods for a number of scripts which I am
interested in, currently:
Tibetan (using Extended Wylie)
Yi (using standard romanisation)

Note that although a build of BabelPad for Windows 95/98/ME is available, it is
not as feature-complete as the builds for Windows NT 4.0 or Windows 2000/XP, and
may not work properly when configured to use different fonts for different
Unicode ranges.

I haven't written the help system yet, but a FAQ is available at
http://uk.geocities.com/BabelStone1357/Software/BabelPad.html.

My thanks to those members of this list who have commented on the pre-release
versions of BabelPad.

Regards,

Andrew




RE: Character identities

2002-10-28 Thread Kent Karlsson
...
  For this reason it is quite impermissible to render the
  combining letter small e as a diaeresis

 So far so good. There would be no reason for doing such a thing.
...
  or, for that matter, the diaeresis as a combining
  letter small e (however, you see the latter version
  sometimes, very infrequently, in advertisement).

 This is the case I thought we were discussing, and it is a
 very different case.

No, the claim was that diaeresis and overscript e are the same,
so the reversed case Marc is talking about is not different at all.

 Standing Keld's opinion and Marc's wholehearted support, it

Please don't confuse me with Keld!

 follows that
 those infrequent advertisements should be encoded using U+0364...

 But U+0364 (COMBINING LATIN SMALL LETTER E) belongs to a
 small collection of
 Medieval superscript letter diacritics, which is supposed to appear
 primarily in medieval Germanic manuscripts, or to reproduce
 some usage as late as the 19th century in some languages.

Yes, but you should not read too much into the explanation,
which, while correct, does not limit the existence of their
glyphs to fonts used only by germanic professors...
Some of them (overscript e in particular) should be(come)
quite commonly occurring in any Fraktur Unicode font.

 Using such a character to encode 21st century advertisements
 is doomed to cause problems:

 1) The glyph for U+0364 is more likely found in the font
 collection of the
 Faculty of Germanic Studies than on the PC of people wishing
 to read the
 advertisement for Ye Olde Küster Pub. So, most people will
 be unable to
 view the advertisement correctly.

 2) The designer of the advertisement will be unable to use
 his spell-checker and hyphenator on the advertisement's text.

Advertisements should invariably be final spell-checked and
hyphenated by humans!  Automated spell checkers and hyphenators
for German (as well as Scandinavian languages) have (so far)
not been good enough even for running text that you want to
publish...

 3) Users will be unable to find the Küster Pub by searching
 Küster in a
 search engine.

Depends on the search engine, and if it uses a correct collation
table (for the language) or not...
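
As a rough illustration of the search point above, here is a minimal Python sketch of mark-insensitive matching. It is not a language-specific collation table of the kind mentioned above; it is only a crude fold that decomposes the text and drops every nonspacing combining mark. The helper name fold_marks is hypothetical.

```python
import unicodedata


def fold_marks(text: str) -> str:
    """Decompose text (NFD), drop nonspacing marks (category Mn), and casefold.

    This is a crude search key, not a German collation table.
    """
    decomposed = unicodedata.normalize("NFD", text)
    kept = (ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return "".join(kept).casefold()


umlaut_spelling = "K\u00fcster"       # precomposed u with diaeresis
overscript_spelling = "Ku\u0364ster"  # u + COMBINING LATIN SMALL LETTER E

# Both spellings reduce to the same key, so a search for one finds the other.
print(fold_marks(umlaut_spelling) == fold_marks(overscript_spelling))
```

Under this fold a search engine would match either spelling, but it would also match plain "Kuster", which a properly tailored collation might or might not want.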

 What will actually happen is that everybody will see an empty
 square, so
 they'll think that the web designer is an idiot, apart from the
 professors at the
 Faculty of Germanic Studies, who'll think that the designer
 is an idiot
 because she doesn't know the difference between U+0308 and
 U+0364 in ancient German.

Most modern use of Fraktur seems to use diaeresis or double
acute for this. (But the web designer could use a dynamically
downloaded font fragment, if there is worry that all glyphs
might not be supported by the fonts used by the vast majority
of the target audience.)

 The real error (IMHO) is the idea that font designers should
 stick to the
 *sample* glyphs printed in the Unicode book, because this would force

Well, the diacritics are allocated/unified on glyphic grounds.
While a diaeresis may look different from font to font, it is
basically two dots (of some shape in line with the design of the
font), never an e shape.  At least not in the *default mode* of a
*Unicode font*. And overscript small e will also vary with the font,
looking like a shrunken ordinary e glyph of (ideally) the same font.
But never like two dots (in the default mode of a Unicode font).

 graphic designer to change the *encoding* of their text in
 order to get the desired result.

A graphic designer is likely to turn the whole thing into 2-d
or 3-d graphics, probably distorted, possibly animated, to get
the desired result!  At which point the original, or intermediary,
encoding of any text elements is not very relevant to the
end result.

 Another big error (IMHO, once again) is the idea that two
 different Unicode characters should look different.

I have never said that! E.g., a µ as well as an Å (both of which
are allocated twice!) should look the same (resp.) regardless of
which of their respective code points is used. There are many
more examples of characters that definitely should (e.g. capital
K and Kelvin sign, small i and small roman numeral one) or may
(capital A, capital Alpha, ...) look the same.
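
The doubly allocated characters mentioned above can be inspected with Python's standard unicodedata module. The sketch below assumes only that module; the helper name equivalence is hypothetical. It reports whether two characters are tied together by canonical equivalence, by compatibility equivalence only, or by nothing in the character data at all.

```python
import unicodedata


def equivalence(variant: str, base: str) -> str:
    """Classify how two strings relate under Unicode normalization."""
    nfc = unicodedata.normalize
    if nfc("NFC", variant) == nfc("NFC", base):
        return "canonical"
    if nfc("NFKC", variant) == nfc("NFKC", base):
        return "compatibility"
    return "none"


pairs = [
    ("\u212b", "\u00c5"),  # ANGSTROM SIGN / LATIN CAPITAL LETTER A WITH RING ABOVE
    ("\u212a", "K"),       # KELVIN SIGN / LATIN CAPITAL LETTER K
    ("\u00b5", "\u03bc"),  # MICRO SIGN / GREEK SMALL LETTER MU
    ("\u2170", "i"),       # SMALL ROMAN NUMERAL ONE / LATIN SMALL LETTER I
    ("A", "\u0391"),       # LATIN CAPITAL LETTER A / GREEK CAPITAL LETTER ALPHA
]
for variant, base in pairs:
    print(unicodedata.name(variant), "/", unicodedata.name(base),
          "->", equivalence(variant, base))
```

Latin A and Greek Alpha come out as "none": they may share a glyph in a given font, but nothing in the character data links them.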

There are also lots of characters that mean the same, but
always (in a Unicode font in default mode) should/must
look different. Like M and Roman Numeral One Thousand C D
(just to take an example closer to Italy... ;-).

 The difference must be preserved when it
 is useful -- e.g., U+0308 should not look like U+0364 in a

should not -- must never

 font designed for
 publishing books on the history of German!

a font -- any Unicode font in default mode

(Bad example, Marco!)


 What should really happen, IMHO, is that modern German should
 be encoded as
 modern German. A U+0308 (COMBINING DIAERESIS) should remain a U+0308,
 regardless that the corresponding glyph *looks* like U+0364
 (COMBINING LATIN
 SMALL LETTER E) in one font, and it looks 

Re: Character identities

2002-10-28 Thread David Starner
On Mon, Oct 28, 2002 at 11:21:30AM +0100, Kent Karlsson wrote:
 No, the claim was that diaeresis and overscript e are the same,
 so the reversed case Marc is talking about is not different at all.

The claim is that, for certain fonts, it is appropriate to image the
a-umlaut character as an a^e. That doesn't imply anything about the
other way around, or else t' could legally be displayed as a t with
caron above.

  A U+0308 (COMBINING DIAERESIS) should remain a U+0308,
  regardless that the corresponding glyph *looks* like U+0364
  (COMBINING LATIN
  SMALL LETTER E) in one font, and it looks like U+0304
  (COMBINING MACRON) in
  another font, and it looks like two five-pointed stars
  side-by-side in a
  third font, and it looks like Mickey Mouse's ears in Disney.ttf...
 
 These are all unacceptable variations in a *Unicode font (in
 default mode)*.  But you can have all kinds of silly variations
 in *non*-Unicode fonts applied to Unicode text, including ciphers
 or rebuses... (ok, there are degrees...)

Basically, any decorative or handwriting font can't be a Unicode font.
(The glyph for my German teacher's umlaut was definitely a macron.) Seems
pointless to tell a lot of the fontmakers out there that they shouldn't
worry about Unicode, because Unicode's only for standard book fonts, but
that's the only way I can read your last statement.

-- 
David Starner - [EMAIL PROTECTED]
Great is the battle-god, great, and his kingdom--
A field where a thousand corpses lie. 
  -- Stephen Crane, War is Kind




Copyright on gif images via http://www.unicode.org/cgi-bin/GetUnihanData.pl

2002-10-28 Thread Dan Kogai
I have asked this question before without getting an answer, so I am 
asking again.

The Unihan Database browser at 
http://www.unicode.org/cgi-bin/GetUnihanData.pl shows an example glyph 
via http://www.unicode.org/cgi-bin/refglyph?24-codepoint.  I would 
like to use this image, but where can I ask for permission?

I have written a CGI which renders a banner using the URI above but I 
am not sure if I can cache the image.  The CGI is still on my intranet 
but I can disclose upon request.
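
For readers unfamiliar with the URI pattern mentioned above, here is a small Python sketch that builds such a reference-glyph address from a character. It is inferred only from the two example URIs in this message: the meaning of the leading "24" is a guess (the parameter name prefix is hypothetical), and the function deliberately fetches and caches nothing, since the permission question is still open.

```python
def refglyph_url(char: str, prefix: int = 24) -> str:
    """Build a refglyph URI of the form seen in this message.

    Assumption: the part after the hyphen is the code point in lowercase
    hexadecimal; what the leading number controls is not documented here.
    """
    base = "http://www.unicode.org/cgi-bin/refglyph"
    return f"{base}?{prefix}-{ord(char):04x}"


# U+5F3E is the code point that appears in the signature line of this message.
print(refglyph_url(chr(0x5F3E)))
```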

Dan Kogai or http://www.unicode.org/cgi-bin/refglyph?24-5f3e




RE: Character identities

2002-10-28 Thread Marco Cimarosti
Kent Karlsson wrote:
   For this reason it is quite impermissible to render the
   combining letter small e as a diaeresis
 
  So far so good. There would be no reason for doing such a thing.
 ...
   or, for that matter, the diaeresis as a combining
   letter small e (however, you see the latter version
   sometimes, very infrequently, in advertisement).
 
  This is the case I thought we were discussing, and it is a
  very different case.
 
 No, the claim was that diaeresis and overscript e are the same,

The claim was that dieresis and overscript e are the same in *modern*
*standard* German. Or, better stated, that overscript e is just a glyph
variant of dieresis, in *modern* *standard* German typeset in Fraktur.

Sorry if I haven't stated this clearly enough.

 so the reversed case Marc is talking about is not different at all.

It is. In the first case, we are talking about a glyph variant in *modern*
*standard* German, in the second case, we are talking about two different
diacritics in some *other* context. (Ancient German? ancient Swedish?).

  Standing Keld's opinion and Marc's wholehearted support, it
 
 Please don't confuse me with Keld!

Oooops! My apologies!

  follows that
  those infrequent advertisements should be encoded using U+0364...
 
  But U+0364 (COMBINING LATIN SMALL LETTER E) belongs to a
  small collection of
  Medieval superscript letter diacritics, which is supposed 
 to appear
  primarily in medieval Germanic manuscripts, or to reproduce
  some usage as late as the 19th century in some languages.
 
 Yes, but you should not read too much into the explanation,
 which, while correct, does not limit the existence of their
 glyphs to fonts used only by germanic professors...
 Some of them (overscript e in particular) should be(come)
 quite commonly occurring in any Fraktur Unicode font.

Commonly sounds funny near Fraktur...

  Using such a character to encode 21st century advertisements
  is doomed to cause problems:
 
  1) The glyph for U+0364 is more likely found in the font
  collection of the
  Faculty of Germanic Studies than on the PC of people wishing
  to read the
  advertisement for Ye Olde Küster Pub. So, most people will
  be unable to
  view the advertisement correctly.
 
  2) The designer of the advertisement will be unable to use
  his spell-checker and hyphenator on the advertisement's text.
 
 Advertisements should invariably be final spell-checked and
 hyphenated by humans!  Automated spell checkers and hyphenators
 for German (as well as Scandinavian languages) have (so far)
 not been good enough even for running text that you want to
 publish...

This has no connection with this discussion.

However, IMHO, the presence of U+0364 (COMBINING LATIN SMALL LETTER E) in a
modern German or Swedish text is just a plain spelling error, and even the
naivest spellchecker should flag it as such.

  3) Users will be unable to find the Küster Pub by searching
  Küster in a
  search engine.
 
 Depends on the search engine, and if it uses a correct collation
 table (for the language) or not...

  What will actually happen is that everybody will see an empty
  square, so
  they'll think that the web designer is an idiot, apart from the
  professors at the
  Faculty of Germanic Studies, who'll think that the designer
  is an idiot
  because she doesn't know the difference between U+0308 and
  U+0364 in ancient German.
 
 Most modern use of Fraktur seems to use diaeresis or double
 acute for this. 

U+0308 (COMBINING DIAERESIS) should be the only umlaut to be found in
modern German text. What that diacritic *looks* like (two dots, an e, a
double acute, a macron, Mickey Mouse's ears), is a choice of the font
designer.

 (But the web designer could use a dynamically
 downloaded font fragment, if there is worry that all glyphs
 might not be supported by the fonts used by the vast majority
 of the target audience.)

This too has no connection with this discussion, and is OT. Unicode is
concerned with how text is *encoded*; the details of fonts and display
technology are out of scope.

What Unicode really mandates is that the encoding should not change to
obtain a certain graphic effect.
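
The encoding side of this argument can be made concrete with a short Python sketch using the standard unicodedata module (variable names are illustrative only). At the encoding level, u followed by U+0308 is canonically equivalent to the precomposed u with diaeresis, whereas u followed by U+0364 is a different sequence that normalization never merges with it, whatever a font chooses to draw.

```python
import unicodedata

precomposed = "\u00fc"      # LATIN SMALL LETTER U WITH DIAERESIS
with_diaeresis = "u\u0308"  # u + COMBINING DIAERESIS
with_small_e = "u\u0364"    # u + COMBINING LATIN SMALL LETTER E

# Canonical composition folds u + U+0308 into the precomposed letter...
print(unicodedata.normalize("NFC", with_diaeresis) == precomposed)  # True

# ...but leaves u + U+0364 alone: it is a distinct encoded sequence.
print(unicodedata.normalize("NFC", with_small_e) == precomposed)    # False

print(unicodedata.name("\u0308"))
print(unicodedata.name("\u0364"))
```

So choosing U+0364 to get an e-shaped mark changes the text itself, not just its appearance, which is the encoding change argued against above.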

  The real error (IMHO) is the idea that font designers should
  stick to the
  *sample* glyphs printed in the Unicode book, because this 
 would force
 
 Well, the diacritics are allocated/unified on glyphic grounds.
 While a diaeresis may look different from font to font, it is
 basically two dots (of some shape in line with the design of the
 font), never an e shape.  At least not in the *default mode* of a
 *Unicode font*.

 And overscript small e will also vary with the font,
 looking like a shrunken ordinary e glyph of (ideally) the same font.
 But never like two dots (in the default mode of a Unicode font).

You haven't yet defined your meaning of Unicode font and, now, you add a
new fancy term: default mode!

What's a default mode? Unicode does not require fonts to have any kind of
modes. You seem to be 

Re: Character identities

2002-10-28 Thread Doug Ewell
Marco Cimarosti marco dot cimarosti at essetre dot it wrote:

 There are also lots of characters that mean the same, but
 always (in a Unicode font in default mode) should/must
 look different. Like M and Roman Numeral One Thousand C D
 (just to take an example closer to Italy... ;-).

 Well, the first and only time I have seen that Thousand C D was on
 the Unicode charts... However, if I were asked which glyph is more
 appropriate for that character, I would say: the same as capital M.

I would disagree with this.  It seems to me the whole reason for both
U+216F ROMAN NUMERAL ONE THOUSAND and U+2180 ROMAN NUMERAL ONE THOUSAND
C D to exist is that they should have different glyphs.  This is not
necessarily in keeping with the purest spirit of Unicode (which might
regard these as two glyphs of a single character), but in reality they
are encoded as two characters.

Note, however, that there is nothing wrong with using the same glyph for
U+004D and U+216F, although in many fonts they are different for no
obvious reason.
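
The character data backs up this distinction, as a quick Python check with the standard unicodedata module shows (variable names are illustrative). U+216F carries a compatibility decomposition to U+004D, which is why sharing a glyph with M is unobjectionable, while U+2180 has no decomposition at all, even though both have the numeric value 1000.

```python
import unicodedata

one_thousand = "\u216f"      # ROMAN NUMERAL ONE THOUSAND
one_thousand_cd = "\u2180"   # ROMAN NUMERAL ONE THOUSAND C D

for ch in (one_thousand, one_thousand_cd):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        "| decomposition:", unicodedata.decomposition(ch) or "(none)",
        "| numeric value:", unicodedata.numeric(ch),
    )

# Compatibility normalization maps U+216F to a plain M but leaves U+2180 alone.
print(unicodedata.normalize("NFKC", one_thousand) == "M")
print(unicodedata.normalize("NFKC", one_thousand_cd) == one_thousand_cd)
```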

-Doug Ewell
 Fullerton, California





Re: Character identities

2002-10-28 Thread Anto'nio Martins-Tuva'lkin
On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote:

 Basically, any decorative or handwriting font can't be a Unicode font.
...
 Seems pointless to tell a lot of the fontmakers out there that they
 shouldn't worry about Unicode, because Unicode's only for standard
 book fonts

Hm, what if I want to make, say, snow capped Devanagari glyphs for my
hiking company in Nepal? Shouldn't I assign them to Unicode code points?

--   .
António MARTINS-Tuválkin|  ()|
[EMAIL PROTECTED]   ||
R. Laureano de Oliveira, 64 r/c esq. |
PT-1885-050 MOSCAVIDE (LRS)  Não me invejo de quem tem   |
+351 917 511 549 carros, parelhas e montes   |
http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe   |
http://pagina.de/bandeiras/  a água em todas as fontes   |





Re: Character identities

2002-10-28 Thread John Hudson


On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote:

 Basically, any decorative or handwriting font can't be a Unicode font.
...
 Seems pointless to tell a lot of the fontmakers out there that they
 shouldn't worry about Unicode, because Unicode's only for standard
 book fonts


Hello? Who says decorative or handwriting fonts can't be Unicode fonts? 
I've got dozens of fonts on my system that prove this wrong. Zapfino, which 
ships with OS X and which I had the privilege to work on, is about as 
decorative a handwriting font as you could wish for, and of course it has a 
Unicode cmap.

Or are you working with some definition of 'Unicode font' other than 'font 
with a Unicode cmap'?

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]

It is necessary that by all means and cunning,
the cursed owners of books should be persuaded
to make them available to us, either by argument
or by force.  - Michael Apostolis, 1467




Re: Character identities

2002-10-28 Thread Michael Everson
At 20:59 + 2002-10-28, Anto'nio Martins-Tuva'lkin wrote:

On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote:


 Basically, any decorative or handwriting font can't be a Unicode font.

...

 Seems pointless to tell a lot of the fontmakers out there that they
 shouldn't worry about Unicode, because Unicode's only for standard
 book fonts


Hm, what if I want to make, say, snow capped Devanagari glyphs for my
hiking company in Nepal? Shouldn't I assign them to Unicode code points?


That's what Private Use code positions are for.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com




RE: Character identities

2002-10-28 Thread Figge, Donald

At 20:59 + 2002-10-28, Anto'nio Martins-Tuva'lkin wrote:
On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote:

  Basically, any decorative or handwriting font can't be a Unicode font.
...
  Seems pointless to tell a lot of the fontmakers out there that they
  shouldn't worry about Unicode, because Unicode's only for standard
  book fonts

Hm, what if I want to make, say, snow capped Devanagari glyphs for my
hiking company in Nepal? Shouldn't I assign them to Unicode code points?

That's what Private Use code positions are for.
-- 
Michael Everson * * Everson Typography *  * http://www.evertype.com
--
I don't think so. He seems to be talking about a specific typographic style.
Code points don't care about style, whether it's Franklin Gothic or
Snowcapped Helvetica.

Don




Re: Character identities

2002-10-28 Thread Michael Everson
At 13:36 -0700 2002-10-28, John Hudson wrote:


Or are you working with some definition of 'Unicode font' other than 
'font with a Unicode cmap'?

It seemed to me that he was talking about fonts that had characters 
that weren't in Unicode at all. I don't mean precomposed vowels, but, 
say, fonts with moon phases in them.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Character identities

2002-10-28 Thread Kenneth Whistler

 Hm, what if I want to make, say, snow capped Devanagari glyphs for my
 hiking company in Nepal? Shouldn't I assign them to Unicode code points?
 
 That's what Private Use code positions are for.
 -- 
 Michael Everson * * Everson Typography *  * http://www.evertype.com

Um, Michael, I think Anto'nio was talking about glyphs in a
decorative font, which should -- clearly -- just be mapped to
ordinary Unicode characters, via an ordinary Unicode cmap.

Or do you think that the yellow, cursive, shadow-dropped, 3-D
letters Getaway! at:

http://www.trekking-in-nepal.com/

should also be represented by Private Use code positions? ;-)

--Ken





Re: Character identities

2002-10-28 Thread David Starner
On Mon, Oct 28, 2002 at 09:36:34PM +, Michael Everson wrote:
 At 20:59 + 2002-10-28, Anto'nio Martins-Tuva'lkin wrote:
 On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote:
 
  Basically, any decorative or handwriting font can't be a Unicode font.
 ...
  Seems pointless to tell a lot of the fontmakers out there that they
  shouldn't worry about Unicode, because Unicode's only for standard
  book fonts
 
 Hm, what if I want to make, say, snow capped Devanagari glyphs for my
 hiking company in Nepal? Shouldn't I assign them to Unicode code points?
 
 That's what Private Use code positions are for.

But think of the utility if Unicode added a COMBINING SNOWCAP and
COMBINING FIRECAP! But should we combine the SNOWCAP with the ICECAP?

(-:

-- 
David Starner - [EMAIL PROTECTED]




Re: Character identities

2002-10-28 Thread David Starner
On Mon, Oct 28, 2002 at 01:36:08PM -0700, John Hudson wrote:
 
 On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote:
 
  Basically, any decorative or handwriting font can't be a Unicode font.
 ...
  Seems pointless to tell a lot of the fontmakers out there that they
  shouldn't worry about Unicode, because Unicode's only for standard
  book fonts
 
 Hello? Who says decorative or handwriting fonts can't be Unicode fonts? 
[...]
 Or are you working with some definition of 'Unicode font' other than 'font 
 with a Unicode cmap'?

Right above where it was cut it said:

Marco:
  A U+0308 (COMBINING DIAERESIS) should remain a U+0308,
  regardless that the corresponding glyph *looks* like U+0364
  (COMBINING LATIN
  SMALL LETTER E) in one font, and it looks like U+0304
  (COMBINING MACRON) in
  another font, and it looks like two five-pointed stars
  side-by-side in a
  third font, and it looks like Mickey Mouse's ears in Disney.ttf...
 
Kent:
  These are all unacceptable variations in a *Unicode font (in
  default mode)*.

Earlier:

Marco:
  there are fonts which don't have dots over i and j;

Kent:
  You have a slight point there, but those are not intended for
  running text.  And I'm hesitant to label them Unicode fonts.

Given that definition of Unicode fonts, a number of decorative or
handwriting fonts (though fewer than I expected) are arbitrarily
excluded from being Unicode fonts.

-- 
David Starner - [EMAIL PROTECTED]




Re: Character identities

2002-10-28 Thread Michael Everson
At 14:30 -0800 2002-10-28, Kenneth Whistler wrote:

  Hm, what if I want to make, say, snow capped Devanagari glyphs for my

 hiking company in Nepal? Shouldn't I assign them to Unicode code points?

 That's what Private Use code positions are for.
 --
 Michael Everson * * Everson Typography *  * http://www.evertype.com


Um, Michael, I think Anto'nio was talking about glyphs in a
decorative font, which should -- clearly -- just be mapped to
ordinary Unicode characters, via an ordinary Unicode cmap.


If they correspond to Unicode characters, yes, certainly.


Or do you think that the yellow, cursive, shadow-dropped, 3-D
letters Getaway! at:

http://www.trekking-in-nepal.com/

should also be represented by Private Use code positions? ;-)


Not at all. Fonts with images of igloos and yurts would use it, 
though, I would think.
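
To make the distinction concrete: whether a code point is a Private Use position can be read off its general category, which is "Co" for private use. The Python sketch below uses the standard unicodedata module; the helper name is_private_use is hypothetical. A snow-capped ka is still U+0915, while an igloo picture with no encoded character would have to sit on a "Co" code point.

```python
import unicodedata


def is_private_use(code_point: int) -> bool:
    """True if the code point's general category is Co (private use)."""
    return unicodedata.category(chr(code_point)) == "Co"


for cp in (0x0041, 0x0915, 0xE000, 0xF8FF):
    print(f"U+{cp:04X}", unicodedata.category(chr(cp)), is_private_use(cp))
```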
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Character identities

2002-10-28 Thread Michael Everson
At 14:31 -0800 2002-10-28, Figge, Donald wrote:

At 20:59 + 2002-10-28, Anto'nio Martins-Tuva'lkin wrote:

On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote:


  Basically, any decorative or handwriting font can't be a Unicode font.

...

  Seems pointless to tell a lot of the fontmakers out there that they
  shouldn't worry about Unicode, because Unicode's only for standard
  book fonts


Hm, what if I want to make, say, snow capped Devanagari glyphs for my
hiking company in Nepal? Shouldn't I assign them to Unicode code points?


That's what Private Use code positions are for.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com
--
I don't think so. He seems to be talking about a specific typographic style.
Code points don't care about style, whether it's Franklin Gothic or
Snowcapped Helvetica.


I must have misunderstood. I think I only saw the snow-capped and 
not the Devanagari. Sorry.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Character identities

2002-10-28 Thread Doug Ewell
My USD 0.02, as someone who is neither a professional typographer nor a
font designer (more than one, but not quite two, different things)...

Discussions about the character-glyph model often mention the essential
characteristics of a given character.  For example, a Latin capital A
can be bold, italic, script, sans-serif, etc., but it must always have
that essential A-ness such that readers of (e.g.) English can identify
it as an A instead of, say, an O or a 4 or a picture of a duck.  (Mark
Davis has a chart showing dozens of different A's in his Unicode Myths
presentation.)

Somewhere in between the obvious relationships (A = A, B ≠ A), we have
the case pair A and a.  They are not identical, but they are certainly
more similar to each other than are A and B.

It seems to me, as a non-font guy, that calling a font a Unicode font
implies two things:

1.  It must be based on Unicode code points.  For True- and OpenType
fonts, this implies a Unicode cmap; for other font technologies it
implies some more-or-less equivalent mechanism.  The point is that
glyphs must be associated with Unicode code points (not necessarily
1-to-1, of course), not merely with an internal 8-bit table that can be
mapped to Unicode only through some other piece of software.

2.  The glyphs must reflect the essential characteristics of the
Unicode character to which they are mapped.  That means a capital A can
be bold, italic, script, sans-serif, etc.  A small a can also be
small-caps (or even full-size caps), but I think this is the only
controversial point.

In a Unicode font, U+0041 cannot be mapped to a capital A with macron,
as it is in Bookshelf Symbol 1; nor to a six-pointed star, as in
Monotype Sorts; nor to a hand holding up two fingers, as in Wingdings.
(But it can be mapped to a notdef glyph, if the font makes no claim to
supporting U+0041.)

U+0915 absolutely can have snow on it, or be bold or italic or whatever
(or all of these), as long as a Devanagari reader would recognize its
essential ka-ness.  It cannot look like a Latin A, nor for that matter
can U+0041 look like a Devanagari ka.

Font guys, do you agree with this?

Of course, the term Unicode font is also often used to mean a font
that covers all, or nearly all, of Unicode.  Font technologies
generally don't even allow this, of course, and even by the standards of
nearly we are still limiting ourselves to things like Bitstream
Cyberbit, Arial Unicode MS, Code2000, Cardo, etc.  Right or wrong, this
is a commonly accepted meaning for Unicode font.

-Doug Ewell
 Fullerton, California





Re: Character identities

2002-10-28 Thread Mark Davis
I'm pretty much in agreement with what you say, except the following:

 Of course, the term Unicode font is also often used to mean a font
 that covers all, or nearly all, of Unicode.

I would consider a Unicode font to be one that met your other conditions,
aside from the repertoire. If I had a font that covered Latin, Greek and
Cyrillic and worked with Unicode strings, for example, I would still
consider that a Unicode font. I just wouldn't consider it a (pick your
adjective) full / complete Unicode font.

Mark
__
http://www.macchiato.com
►  “Eppur si muove” ◄









Re: Character identities

2002-10-28 Thread Michael \(michka\) Kaplan
All this talk about the letter A reminded me of something from Hofstadter:

The problem of intelligence, as I see it, is to understand the fluid nature
of mental categories, to understand the invariant cores of percepts such as
your mother’s face, to understand the strangely flexible yet strong
boundaries of concepts such as “chair” or the letter “a” … The central
problem of (artificial intelligence) is the question: What is the letter ‘a’
and ‘i’? ...By making these claims, I am suggesting that, for any program to
handle letterforms with the flexibility that human beings do, it would have
to possess full-scale general intelligence.

-- Douglas R. Hofstadter, from one of his Metamagical Themas articles

The notion that we could ever capture the essence of A-ness has already
been discussed at length and dismissed as impossible without an AI
breakthrough. :-)

MichKa





Re: Character identities

2002-10-28 Thread John Cowan
Doug Ewell scripsit:

 1.  It must be based on Unicode code points.  For True- and OpenType
 fonts, this implies a Unicode cmap; for other font technologies it
 implies some more-or-less equivalent mechanism.  The point is that
 glyphs must be associated with Unicode code points (not necessarily
 1-to-1, of course), not merely with an internal 8-bit table that can be
 mapped to Unicode only through some other piece of software.

If it's a FIGlet font, of course, it's automatically Unicode, since FIGlet's
table is 32 bits wide.

 In a Unicode font, U+0041 cannot be mapped to a capital A with macron,
 as it is in Bookshelf Symbol 1; nor to a six-pointed star, as in
 Monotype Sorts; nor to a hand holding up two fingers, as in Wingdings.
 (But it can be mapped to a notdef glyph, if the font makes no claim to
 supporting U+0041.)

In fact, these fonts map these glyphs to U+F041.  Only when seen as 8-bit
fonts do they map to 0x41.
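
The relationship between the two views can be sketched in a few lines of Python. This is only an illustration: it assumes the usual convention that a symbol-encoded font places each glyph at 0xF000 plus its 8-bit code, and the function name is made up for the example.

```python
import unicodedata

SYMBOL_BASE = 0xF000  # assumed offset used by symbol-encoded fonts


def symbol_code_point(byte_value: int) -> int:
    """Return the Private Use Area code point for an 8-bit symbol-font code."""
    if not 0x00 <= byte_value <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return SYMBOL_BASE + byte_value


cp = symbol_code_point(0x41)
print(f"U+{cp:04X}")                  # U+F041
print(unicodedata.category(chr(cp)))  # Co (private use)
print(unicodedata.name("\u0041"))     # LATIN CAPITAL LETTER A
```

Seen this way, such a font makes no claim about U+0041 at all; the star or the hand lives in the Private Use Area.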

-- 
John Cowan  http://www.reutershealth.com  http://www.ccil.org/~cowan  [EMAIL PROTECTED]
With techies, I've generally found
If your arguments lose the first round
  Make it rhyme, make it scan
  Then you generally can
Make the same stupid point seem profound!   --Jonathan Robie




Re: Character identities

2002-10-28 Thread John Hudson
At 18:37 10/28/2002, Doug Ewell wrote:


It seems to me, as a non-font guy, that calling a font a Unicode font
implies two things:

1.  It must be based on Unicode code points.  For True- and OpenType
fonts, this implies a Unicode cmap; for other font technologies it
implies some more-or-less equivalent mechanism.  The point is that
glyphs must be associated with Unicode code points (not necessarily
1-to-1, of course), not merely with an internal 8-bit table that can be
mapped to Unicode only through some other piece of software.


My only amendment to that would be:

'The point is that those glyphs that are intended to represent the default 
form of the characters supported by that font must be associated with 
Unicode codepoints, whether directly or indirectly, not merely...'

Not every glyph in a font needs to be encoded, and in general glyph 
variants and things like ligatures should not be, unless standard Unicode 
codepoints happen to be available for them (even then, it would be 
legitimate to leave them unencoded and access them only via glyph 
processing features).
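
A toy model in Python may make the distinction concrete. It is a sketch under loud assumptions: the dictionaries below merely stand in for a font's character map and its glyph substitution data, the glyph names are invented, and nothing here is a real font API.

```python
# Toy stand-ins for a font's tables (hypothetical names and data).
cmap = {0x0066: "f", 0x0069: "i", 0x006E: "n"}  # code point -> default glyph
ligatures = {("f", "i"): "f_i"}                 # glyph pair -> unencoded glyph


def shape(text):
    """Map characters to default glyphs, then apply ligature substitution."""
    glyphs = [cmap.get(ord(ch), ".notdef") for ch in text]
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in ligatures:
            out.append(ligatures[pair])  # reached only via glyph processing
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


print(shape("fin"))  # ['f_i', 'n']
```

The ligature glyph "f_i" never appears in the character map, yet the text "fin" still reaches it; only the default forms carry code points.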

2.  The glyphs must reflect the essential characteristics of the
Unicode character to which they are mapped.  That means a capital A can
be bold, italic, script, sans-serif, etc.  A small a can also be
small-caps (or even full-size caps), but I think this is the only
controversial point.


Yes, I would agree with that, with the caveat that the A-ness of an A isn't 
necessarily something that can be defined: it can only be recognised.

Of course, the term "Unicode font" is also often used to mean a font
that covers all, or nearly all, of Unicode.  Font technologies
generally don't even allow this, of course, and even by the standards of
"nearly" we are still limiting ourselves to things like Bitstream
Cyberbit, Arial Unicode MS, Code2000, Cardo, etc.  Right or wrong, this
is a commonly accepted meaning for "Unicode font".


I really think we should all do what we can to bury this use of the term. 
It is singularly unhelpful, and the idea in the minds of some customers 
that they *need* a font that covers all of Unicode has not done anyone any 
good. Sure some font developers made some money making these ridiculously 
huge grab-bag fonts, but their time could have been much better spent.

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]

It is necessary that by all means and cunning,
the cursed owners of books should be persuaded
to make them available to us, either by argument
or by force.  - Michael Apostolis, 1467




The comet circumflex system.

2002-10-28 Thread William Overington
Readers interested in internationalization using Unicode might like to know
that I have recently added some documents about the comet circumflex system
to the web.

The introduction and index page are as follows.

http://www.users.globalnet.co.uk/~ngo/c_c0.htm

The main index page of the webspace is as follows.

http://www.users.globalnet.co.uk/~ngo

William Overington

29 October 2002









Re: Character identities

2002-10-28 Thread William Overington
John Hudson commented.

At 02:46 10/26/2002, William Overington wrote:

I don't know whether you might be interested in the use of a small letter a
with an e as an accent codified within the Private Use Area, but in case you
might be interested, the web page is as follows.

http://www.users.globalnet.co.uk/~ngo/ligatur5.htm

I have encoded the a with an e as an accent as U+E7B4 so that both variants
may coexist in a document encoded in a plain text format and displayed with
an ordinary TrueType font.

If anyone were interested, he could do this himself and use any codepoint
in the Private Use Area.

The meaning which I intended to convey was as follows.

I don't know whether you might be interested in having a look at a
particular example of the use of a small letter a with an e as an accent
codified within the Private Use Area by an individual with an interest in
applying Unicode, but in case you might be interested in having a look at
that particular example, the web page is as follows.

If, following from your response to the way that you read my sentence,
someone were interested in defining a codepoint in the Private Use Area then
certainly he or she could do that himself or herself and use any codepoint
in the Private Use Area.

However, exercising that freedom is something which could benefit from some
thought.

If someone wishes to encode an a with an e as an accent in the Private Use
Area, he or she may wish to be able to apply that code point allocation in a
document.  If he or she looks at which Private Use Area codepoints are
already in use within some existing fonts, then selecting a code point which
is at present unused in those fonts might give a greater chance of his or
her new character assignment being implemented than choosing a code point
for which those fonts already have a glyph in use.

Searching through such fonts takes time and requires some skill.
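
Once the code points used by each font have been extracted, the search itself is simple. The following Python sketch assumes that extraction has already been done by some other tool; the two font sets are invented for illustration and do not describe any real font.

```python
PUA_START, PUA_END = 0xE000, 0xF8FF  # Private Use Area in the BMP


def first_free_pua(used_by_fonts):
    """Return the lowest PUA code point that none of the given fonts uses."""
    taken = set().union(*used_by_fonts)
    for cp in range(PUA_START, PUA_END + 1):
        if cp not in taken:
            return cp
    return None  # every PUA code point is taken in at least one font


# Hypothetical data: PUA code points already occupied in two imaginary fonts.
font_a = {0xE000, 0xE001, 0xE002}
font_b = {0xE001, 0xE003, 0xF041}

free = first_free_pua([font_a, font_b])
print(f"U+{free:04X}")  # U+E004
```

The hard part remains getting reliable lists of what each font already uses.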

If someone does wish to use a Private Use Area code point for an a with an e
accent, then using U+E7B4 does give a possible slight advantage in that
the code point is already part of a published set of code points available
on the web, for, even though that set of code points is not a standard, it
is a consistent set and other people might well use those codepoints as
well.  However, anyone may produce and publish such a set of code point
allocations of his or her own if he or she so wishes, or indeed keep them to
himself or herself.

Yet I was not seeking to make any such point in my posting.  I simply added
to a thread on a specialised topic what I thought might be a short
interesting note with a link to a web page at which some readers might like
to look.  The web page indeed provides two external links to interesting
documents on the web.

Maybe it is time to include a note in the Unicode
Standard to suggest that 'Private' Use Area means that one should keep it
to oneself 

Well, at the moment the Unicode Standard does include the word publish in
the text about the Private Use Area.

I have published details of various uses of the Private Use Area on the web
yet not mentioned them in this forum.  For example, readers might perhaps
like to have a look at the following.

http://www.users.globalnet.co.uk/~ngo/ast07101.htm

Anyone who chooses to do so might like to have a look at the following file
as well, which introduces the application area.

http://www.users.globalnet.co.uk/~ngo/ast02100.htm

This application uses the Unicode Private Use Area to define a set of soft
buttons for a Java calculator, so that the twenty hard buttons of the minimum
configuration of a hand-held infra-red control device for a DVB-MHP
(Digital Video Broadcasting - Multimedia Home Platform) television can be
used in a consistent manner to signal information from the end user to the
computer in the television set.  I am very pleased with the result.  The
encoding achieves a useful effect while remaining consistent, for information
handling purposes, with the Unicode specification: a Java program can process
an input stream of characters without any ambiguity over whether a particular
code point is a printing character or a calculator button (or indeed a mouse
event or simulated mouse event, as mouse events are also encoded using the
Private Use Area in my research).
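
The kind of unambiguous dispatch described above might look roughly like this. The sketch is in Python rather than Java for brevity, and the two button assignments are invented for illustration; they are not the code points published on the web pages mentioned.

```python
import unicodedata

# Hypothetical assignments, NOT the published allocations.
BUTTONS = {0xE100: "PLUS", 0xE101: "EQUALS"}


def classify(ch):
    """Decide whether one character of the stream is a button or plain text."""
    cp = ord(ch)
    if cp in BUTTONS:
        return ("button", BUTTONS[cp])
    if unicodedata.category(ch) == "Co":  # private use, but unassigned here
        return ("private-use", f"U+{cp:04X}")
    return ("character", ch)


stream = "2\ue1003\ue101"
print([classify(ch) for ch in stream])
# [('character', '2'), ('button', 'PLUS'), ('character', '3'), ('button', 'EQUALS')]
```

Because every button sits in the Private Use Area, no standard printing character can ever be mistaken for one.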

William Overington

29 October 2002













Re: Character identities

2002-10-28 Thread Barry Caplan
At 04:39 PM 10/28/2002 -0600, David Starner wrote:


But think of the utility if Unicode added a COMBINING SNOWCAP and
COMBINING FIRECAP! But should we combine the SNOWCAP with the ICECAP?

(-:

Unicode captures the ice-age during the global warming era!

Do we have codepoints for images found on the walls of caves?

:)

Barry
www.i18n.com