Re: Character identities

2002-11-01 Thread Doug Ewell
William Overington WOverington at ngo dot globalnet dot co dot uk
wrote:

 Would it be possible to define the U+FE00 variant sequence for a with
 two dots above it to be a with an e above it, and similarly U+FE00
 variant sequences for o with two dots above it and for u with two dots
 above it, and possibly for e with two dots above it as well?

It would be possible for the Unicode Technical Committee to define such
a standardized variant, though they have not elected to do so.  It would
*not* be possible for end users such as you or me to do so.

-Doug Ewell
 Fullerton, California





RE: Character identities

2002-10-31 Thread Kent Karlsson

Let me take a few comparable examples:

1. Some (I think font makers) a few years ago argued
   that the Lithuanian i-dot-circumflex was just a
   glyph variant (Lithuanian specific) of i-circumflex,
   and a few other similar characters.

   Still, the Unicode standard now does not regard those as
   glyph variants (anymore, if it ever did), and embodies
   that the Lithuanian i-dot-circumflex is a different
   character in its casing rules (see SpecialCasing.txt).
   There are special rules for inserting (when lowercasing)
   or removing (when uppercasing) dot-aboves on i-s and I-s
   for Lithuanian.  I can only conclude that it would be
   wrong even for a Lithuanian specific font to display an
   i-circumflex character as an i-dot-circumflex glyph,
   even though an i-circumflex glyph is never used for
   Lithuanian.

2. The Khmer script got allocated a KHMER SIGN BEYYAL.
   It stands (stood...) for any abbreviation of the
   Khmer correspondence to etc.; there are at least four
   different abbreviations, much like etc, etc., &c,
   et c., ... It would be up to the font maker to decide
   exactly which abbreviation, and would vary by font.

   However, it is now targeted for deprecation for precisely
   that reason: it is *not* the font (maker) that should
   decide which abbreviation convention to use in a document,
   it is the *author* of the document who should decide.
   Just as for the Latin script, the author decides how to
   abbreviate et cetera. The way of abbreviating should stay
   the same *regardless of font*. Note that the font may be
   chosen at a much later time, and not for wanting to
   change abbreviation convention. That convention one
   may want to have the same throughout a document also
   when using several different fonts in it, not having to
   carefully consider abbreviation conventions when choosing
   fonts.

3. Marco would even allow (by default; I cannot get away
   from that caveat since some (not all) font technologies
   do what they do) displaying the ROMAN NUMERAL ONE THOUSAND
   C D (U+2180) as an M, and it would be up to the font
   designer. While the glyphs are informative, this glyphic
   substitution definitely goes too far.  If the author
   chose to use U+2180, a glyph having at least some
   similarity to the sample glyph should be shown, unless
   and until someone makes a (permanent or transient)
   explicit character change.

4. Some people write è instead of é (I claim they cannot
   spell...).  So is it up to a font designer to display
   é as è if the font is made for a context where many
   people do not make a distinction?  Can a correctly
   spelled name (say) be turned into an apparent misspelling
   by just choosing such a font?  And that would be a Unicode
   font?

5. I can't leave the ö vs. ø; these are just different
   ways of writing the same letter; and it is not
   the case that ø is used instead of ö for any 
   7-bit reasons. It is conventional to use ø for ö
   in Norway and Denmark for any Swedish name (or
   word) containing it.  The same goes for ä vs. æ.
   Why shouldn't this one be up to the font makers too?
   If the font is made purely for Norwegian, why not
   display ö as ø, as is the convention?  This is
   *exactly* the same situation as with ä vs. a^e.

I say, let the *author* decide in all these cases, and
let that decision stand, *regardless of font changes*.
[There is an implicit qualification there, but I'm
tired of writing it.]
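
To illustrate example 1: the Lithuanian entries in SpecialCasing.txt keep a
dot above when I or J (or I with ogonek) is lowercased and another accent
above follows. Below is a minimal Python sketch of just that one rule. The
helper names lt_lower and has_more_above are made up for illustration,
Python's own str.lower() applies no language-specific tailoring, and the
other Lithuanian rules (such as removing the dot when uppercasing) are left
out.

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE


def has_more_above(text, index):
    """True if an 'above' mark (combining class 230) follows text[index]
    before any base character (combining class 0)."""
    for ch in text[index + 1:]:
        ccc = unicodedata.combining(ch)
        if ccc == 0:
            return False
        if ccc == 230:
            return True
    return False


def lt_lower(text):
    """Hypothetical Lithuanian-style lowercasing: insert a dot above after
    i/j when further accents above follow. Works on the NFD form, so the
    result is returned decomposed."""
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for index, ch in enumerate(decomposed):
        out.append(ch.lower())
        if ch in "IJ" and has_more_above(decomposed, index):
            out.append(DOT_ABOVE)
    # Re-normalize so the inserted dot lands in canonical order.
    return unicodedata.normalize("NFD", "".join(out))


# U+00CC LATIN CAPITAL LETTER I WITH GRAVE
default_result = "\u00CC".lower()      # untailored: i-grave, U+00EC
lithuanian_result = lt_lower("\u00CC")  # i, dot above, grave
print([hex(ord(c)) for c in default_result])
print([hex(ord(c)) for c in lithuanian_result])
```

The two results are different character sequences, which is the point of
the example: the casing data treats i-dot-circumflex and friends as
different text, not as a font-level variant.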


 Kent Karlsson wrote:
   I insist that you can talk about character-to-character 
   mappings only when
   the so-called backing store is affected in some way.
  
  No, why?  It is perfectly permissible to do the equivalent
  of print(to_upper(mystring)) without changing the backing
  store (mystring in the pseudocode); to_upper here would
  return a NEW string without changing the argument.
 
 And that, conceptually, is a character-to-glyph mapping.

Now I have lost you.  How can it be that?  The print
part, yes. But not the to_upper part; that is a
character-to-character mapping, inserted between the
backing store and mapping characters to glyphs.
It is still an (apparent) character-to-character
mapping even if it is not stored in the backing store.
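
The pseudocode above can be made concrete. A small Python sketch follows,
in which to_upper and mystring are simply the names from the pseudocode and
str.upper() stands in for whatever case mapping an implementation would
really use:

```python
def to_upper(text: str) -> str:
    # Python strings are immutable, so this necessarily returns a NEW
    # string; the argument (the "backing store") is left as it was.
    return text.upper()


mystring = "Göteborg"
shown = to_upper(mystring)

print(shown)     # what gets handed on to the character-to-glyph step
print(mystring)  # the backing store, unchanged
```

The mapping happens on characters, before any glyph is chosen, and yet
nothing is written back to mystring.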

 In my mind, you are so much into the OpenType architecture, 
 and so much used
 to the concept that glyphization is what a font does, that 
 you can't view the big picture.

Now I have lost you again.  Some fonts (in some font
technologies) do more than pure glyphization. This
is why I have been putting in caveats, since many people
seem to think that all fonts *only* do glyphisation,
which is not the case.

But to be general, I was referring to such mappings regardless
of whether they are built into some font (using character code points
or, as in OT/AAT, using glyph indices) or (better) are external
to the font.

I was trying to use general formulations, but I cannot
avoid having caveats for certain mappings that certain
technologies do 

[OT] Göthe (was: Re: RE: Character identities)

2002-10-31 Thread Doug Ewell
Adam Twardoch list dot adam at twardoch dot com wrote:

 Should an English language font render ö as oe,  so that Göthe
 appears automatically in the more normal English form Goethe?

 If you refer to Johann Wolfgang von Goethe, his name is *not* spelled
 with an ö anyway.

Somebody thinks so:

http://www.transkription.de/gb_seiten/beispiele/goethe.htm

-Doug Ewell
 Fullerton, California





Re: [OT] Göthe (was: Re: RE: Character identities)

2002-10-31 Thread Marc Wilhelm Küster
At 08:32 31.10.2002 -0800, Doug Ewell wrote:

Adam Twardoch list dot adam at twardoch dot com wrote:

 Should an English language font render ö as oe,  so that Göthe
 appears automatically in the more normal English form Goethe?

 If you refer to Johann Wolfgang von Goethe, his name is *not* spelled
 with an ö anyway.

Somebody thinks so:

http://www.transkription.de/gb_seiten/beispiele/goethe.htm


Both forms are permissible and used, even though Goethe is today by far the 
more frequent version -- remember that there was no standardized German 
orthography before the late 19th century and that the idea that a person's 
name has exactly one spelling is a fairly young idea in Europe.

Taking such facts into account for matching purposes is a good idea, but
changing the version for rendering is not.
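
As a rough illustration of that distinction, here is a Python sketch of a
comparison key that treats the two spellings as equal for matching while
leaving the stored text alone. The function name match_key and the folding
table are assumptions made up for this example, not an established
collation rule.

```python
import unicodedata

# Hypothetical folding used only for comparison, never for display.
UMLAUT_FOLD = {"ä": "ae", "ö": "oe", "ü": "ue"}


def match_key(name: str) -> str:
    """Return a key for matching; the original string is not modified."""
    composed = unicodedata.normalize("NFC", name).casefold()
    return "".join(UMLAUT_FOLD.get(ch, ch) for ch in composed)


stored = "Göthe"
print(match_key(stored) == match_key("Goethe"))  # spellings match
print(stored)                                    # still rendered as stored
```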

Best regards,

Marc



*
Marc Wilhelm Küster
Saphor GmbH

Fronländer 22
D-72072 Tübingen

Tel.: (+49) / (0)7472 / 949 100
Fax: (+49) / (0)7472 / 949 114




Re: Character identities

2002-10-31 Thread Anto'nio Martins-Tuva'lkin
(After sending this inadvertently to Dominikus only, here's
for the list also...) On 2002.10.30, 16:26, Dominikus Scherkl
[EMAIL PROTECTED] wrote:

 A font representing my mother's handwriting (German only :-) would
 render u as u with breve above to distinguish it from the
 representation of n. I don't know how my mother would write a text
 containing a u with breve above,

FWIW, I've seen the handwriting of an elderly German Esperantist, and he
does exactly that: he puts breves above each and every u, both on
those which have it and on those which don't -- slightly confusing...

On the brink of off-topic-ness, something of that sort is done in
handwritten Cyrillic (at least in Russian tradition): the triple wave
of a lower case t is distinguished from the triple wave of a lower
case shch (*) by means of a stroke above the former and a stroke below
the latter.

(*) Not that I'm an enthusiast of this transliteration...

--   .
António MARTINS-Tuválkin,   |  ()|
[EMAIL PROTECTED]   ||
R. Laureano de Oliveira, 64 r/c esq. |
PT-1885-050 MOSCAVIDE (LRS)  Não me invejo de quem tem   |
+351 917 511 549 carros, parelhas e montes   |
http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe   |
http://pagina.de/bandeiras/  a água em todas as fontes   |





Re: Character identities

2002-10-31 Thread Jim Allan




In Unicode, code point U+0308 is assigned to COMBINING DIAERESIS.
There are a number of precomposed forms with diaeresis.

Let's take one of these, ü:

1. The diaeresis may mean separate pronunciation of the u, indicating it is
   not merged with a preceding or following letter but is pronounced
   distinctly, as in the classical Greek name Peirithoüs or Spanish
   antigüedad. Similarly in Catalan. It is identified with the Greek
   dialytika of the same meaning, which is indeed the ultimate known origin
   of the symbol.

2. The diaeresis indicates umlaut modification of u, as in German über, a
   use also found in Finnish, Turkish, Pinyin Chinese Romanization and in
   many other languages.

3. In Magyar it indicates a sound like French eu.

4. In IPA it indicates u with a centralized pronunciation.

There may be other phonic interpretations.

Of these uses, only for the second (and possibly the third) might combining
superscript e be used instead of the diaeresis. The second certainly
represents the most common use of ü today, but not the only one.

Unicode encodes the character COMBINING DIAERESIS, not a generic UMLAUT MARKER
which might take various forms. It provides itself no way of distinguishing
between uses of diaeresis.

All the above uses might occur in German text, or Swedish text, or Finnish
text or any text which might introduce personal names or geographical names
or particular words or phrases from various languages outside the main language
of the text. The same applies for ä and ö.

Indeed individual words with vowels and umlaut marker, whether represented
as a COMBINING DIAERESIS or COMBINING LATIN SMALL LETTER
E or following e may appear
in text in any language because
of use of technical vocabulary, eg. Senhnscht,
or in personal or place names. 

Now any use of diaeresis meaning umlaut in any language might, it seems to
me, be reasonably replaced by superscript e meaning umlaut. But it is incorrect
to replace diaeresis used for any other purpose by superscript e.

In straight, plain Unicode, if you want to use diaeresis for umlaut, use diaeresis.
If you want to use combining superscript e to indicate umlaut, use COMBINING
LATIN SMALL LETTER E. Leave
any other occurrences of umlaut alone. This is the only possibility at the plain text level,
and the most robust way of choosing between diaeresis and superscript e at any level.
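
At the plain text level the two marks really are separate characters, which
a few lines of Python can show. This is only an illustrative sketch using
the standard unicodedata module; the variable names are arbitrary.

```python
import unicodedata

with_diaeresis = "u\u0308"  # u + COMBINING DIAERESIS
with_small_e = "u\u0364"    # u + COMBINING LATIN SMALL LETTER E

print(unicodedata.name("\u0308"))
print(unicodedata.name("\u0364"))

# NFC composes u + diaeresis into the precomposed U+00FC, but there is no
# precomposed "u with e above", so that sequence stays as two code points.
composed_diaeresis = unicodedata.normalize("NFC", with_diaeresis)
composed_small_e = unicodedata.normalize("NFC", with_small_e)
print(len(composed_diaeresis), len(composed_small_e))
```

Whichever of the two an author enters survives normalization as entered, so
the choice is preserved independently of any font.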

Given a higher protocol, we can do more. We might, as suggested, have a
font which uses superscript e instead
of diaeresis, at least for the combination characters with the base characters
a, o, or u and in place of the diaeresis symbol
itself. If we have another generally identical
font with a true diaeresis instead, we can switch between fonts as necessary
depending on whether diaeresis is used for umlaut or not, or whether in particular
cases we wish to use one or the other symbol for umlaut. 

Switching between such alternate fonts has long been a standby when fancy
typography is required.

Yet I don't see that there is any advantage to switching between fonts
over switching between the Unicode characters COMBINING DIAERESIS
and COMBINING LATIN SMALL LETTER E. And it makes us dependent on a particular
set of fonts. That is probably not good. :-(

A better solution might be an intelligent font that recognizes some kinds
of tagging and which allows us to turn on different glyphs for diaeresis according
to the tagging, one of these glyphs being a superscript e. So we tag words and phrases. And,
magically, if that particular font works properly, we see diaeresis where
we want diaeresis and superscript e where
we want superscript e.

But it is not evident that tagging for this purpose is any easier than
entering the different Unicode characters from the beginning. And we are
again dependent on the intelligence of a particular font. Of course, we might
expect there will soon be many such intelligent fonts. It is less likely
that they will all work exactly the same, and understand exactly the same
tags in the same way. And we are restricted to such intelligent fonts as
understand a particular system of tagging rather than using almost any font.
:-(

We might propose introducing a tag or indicator of some kind at some level
to indicate a diaeresis has umlaut function, but such a tag or indicator would
probably only be used when a user wanted to use a superscript e, in
which case it is not clear that using it would have any advantage over actually
entering COMBINING LATIN SMALL LETTER E. :-(
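
For what it is worth, the tagging idea can be sketched in a few lines of
Python. Everything here is an assumption for illustration: the "tag" is
just a set of words marked as having umlaut function, and the function name
render_with_e_above is made up. Real markup would of course be richer.

```python
import unicodedata

DIAERESIS = "\u0308"  # COMBINING DIAERESIS
E_ABOVE = "\u0364"    # COMBINING LATIN SMALL LETTER E


def render_with_e_above(text, umlaut_words):
    """Build a display string in which only the tagged words have their
    diaeresis shown as e-above; the input text is not modified."""
    shown = []
    for word in text.split(" "):
        if word in umlaut_words:
            decomposed = unicodedata.normalize("NFD", word)
            word = decomposed.replace(DIAERESIS, E_ABOVE)
        shown.append(word)
    return " ".join(shown)


stored = "über capharnaüm"
display = render_with_e_above(stored, {"über"})
print(display)
```

Only the tagged German word changes; the untagged French word keeps its
diaeresis. The sketch also makes the objection visible: somebody still has
to decide, word by word, which occurrences to tag.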

We might go to a still higher level of protocol, to a routine or plugin in
an application or a new style feature added to HTML or XML which allows diaeresis
replacement. Just as Microsoft Word and some other programs now allow capitalization
and small capitalization as an effect, though the underlying text is still
actually in upper and lower case, so we might show a diaeresis as a superscript
e, though in fact at the plain text
level the text has a diaeresis. Presumably for viewing 

RE: Character identities

2002-10-30 Thread Marco Cimarosti
Keld Jørn Simonsen wrote:
 On Tue, Oct 29, 2002 at 09:07:16PM +0100, Marco Cimarosti wrote:
  Kent Karlsson wrote:
   Marco, 
  
  Keld, please allow me to begin with the end of your post:
 
 I really have not contributed much to this thread, I think you mean
 Kent.

Oh No! Again! Apologies to both of you! I am seriously starting to worry
about my dyslexia...

_ Marco






Re: Character identities

2002-10-30 Thread William Overington
Summary:

Would it be possible to define the U+FE00 variant sequence for a with two
dots above it to be a with an e above it, and similarly U+FE00 variant
sequences for o with two dots above it and for u with two dots above it, and
possibly for e with two dots above it as well?

I may not have got the details right about this suggestion, but, if the
general idea is thought good, I am sure that one of the experts on this list
could codify it properly.



It seems to me that there is middle ground between the two views being
expressed.

Suppose, for example, hypothetically, that there is a font available in
Germany, named Volksmusik which is a display font intended for setting
headings in modern German, such as for the headings in advertisements for
restaurants and so on, and that in that font the a umlaut, o umlaut and u
umlaut are all expressed using a mark which is something like a small letter
e.

Then, it seems to me that if a theatre restaurant manager has set out the
text required for a menu for the restaurant for some special gala evening to
be held soon using a plain text editor on a PC using a font such as Arial,
with a umlaut characters appearing many times, sometimes in headings and
sometimes in the main body of the text, then stored the text on a floppy
disc and walked down the road to the print shop and explained to the print
shop manager that here is the text content for the menus in Arial, could the
print shop please supply 500 menus using that text content yet jazzing it up
a bit so that the headings on each of the four pages is in a fancy typeface
in a different colour, then it should be quite straightforward for the print
shop manager to copy the text onto the clipboard from the Arial file, and
paste it into some other file, then change the font for each of the page
headings to the Volksmusik font, and make the font for the rest of the menu
some plainer font.  Thus, some a umlaut characters originally keyed by the
restaurant manager would display on the final menu as a with two dots above
and some a umlaut characters keyed by the restaurant manager would display
on the final menu as a with a small letter e above.

The restaurant manager is, however, studying part-time for a research degree
at the local university.  This involves producing essays about various
aspects of the printing of German literature, including quoting passages
from earlier times, taking care to distinguish clearly between a with two
dots above it and a with an e above it, all within using a plain text file,
so that there is maximum portability in sending copies of the essay to
various people, including the project supervisor at the University and the
editors of various learned journals.

How is the a with an e above it set, bearing in mind that there is no
precomposed a with an e accent above character in regular Unicode and also
that it would be nice if the text could be searched for keywords using just
the usual search methods?

Would it be possible to define the U+FE00 variant sequence for a with two
dots above it to be a with an e above it, and similarly U+FE00 variant
sequences for o with two dots above it and for u with two dots above it, and
possibly for e with two dots above it as well?

I may not have got the details right about this suggestion, but, if the
general idea is thought good, I am sure that one of the experts on this list
could codify it properly.
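
To make the search concern concrete, here is a Python sketch of how a
search routine might cope with such sequences. Pairing a-diaeresis with
U+FE00 is purely hypothetical here (it is not a standardized variation
sequence), and the function name strip_variation_selectors is made up for
this example.

```python
# U+FE00..U+FE0F are VARIATION SELECTOR-1 through VARIATION SELECTOR-16.
VARIATION_SELECTORS = {chr(cp) for cp in range(0xFE00, 0xFE10)}


def strip_variation_selectors(text: str) -> str:
    """Return a copy of text without variation selectors, for searching."""
    return "".join(ch for ch in text if ch not in VARIATION_SELECTORS)


# Hypothetical essay text: a-diaeresis followed by U+FE00 to request the
# "a with e above" appearance.
essay = "Die M\u00e4\ufe00nner"
keyword = "M\u00e4nner"

print(keyword in essay)                             # naive search
print(keyword in strip_variation_selectors(essay))  # selector-aware search
```

A naive substring search misses the keyword because the selector sits in
the middle of the word, so "the usual search methods" would only work where
the software already ignores variation selectors.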

William Overington

30 October 2002






RE: Character identities

2002-10-30 Thread Alain LaBonté 
At 21:46 2002-10-29 +, Michael Everson wrote:

At 13:27 -0800 2002-10-29, Kenneth Whistler wrote:

Michael asked:


 My eyes have glazed over reading this discussion. What am I being
 asked to agree with?


Here's the executive summary for those without the time to
plow through the longer exchange:

Marco: It is o.k. (in a German-specific context) to display
   an umlaut as a macron (or a tilde, or a little e above),
   since that is what Germans do.

Kent:  It is *not* o.k. -- that constitutes changing a character.


[Michael]  Kent can't be right here.


[Alain]  However I agree with Kent. Let's say a text identified as German 
quotes a French word with an U DIAERESIS *in the German text* (a word like 
capharnaüm). It would be a heresy to show a macron in a printed text in 
this context. In French *nobody* uses this practice that is frequent in 
German handwriting (but not in printing, unless I am wrong).

One has to respect characters for what they are. A U DIAERESIS is not a U 
MACRON even if its codepoint is shared with a German U UMLAUT that may be 
handwritten with a *vague* resemblance to a U MACRON.

Alain LaBonté
Québec




Re: RE: Character identities

2002-10-30 Thread Alain LaBonté 
At 22:21 2002-10-29 +, Michael Everson wrote:

At 15:56 -0600 2002-10-29, [EMAIL PROTECTED] wrote:


Is it compliant with Unicode to have a font where a-umlaut has a glyph of
a with e above? What about a glyph of a-macron (e.g. a handwriting font 
for someone who writes a-umlaut that way)?

Of course it is. Glyphs are informative.


[Alain]  (:

If they are informative, they should inform, not disinform... (;

Alain





Re: RE: Character identities

2002-10-30 Thread Doug Ewell
John Cowan jcowan at reutershealth dot com wrote:

 If I find your Suetterlin font unreadable, however, and switch to an
 Antiqua font to read your German, I expect to find the text littered
 with diaereses, not macrons, although the Suetterlin umlaut-mark looks
 pretty much like a macron.

Actually, the Sütterlin umlaut-mark is a small italicized e, which is
very similar to an n.  What it really ends up looking like, from a
distance, is a double acute.  (John's point is still perfectly valid, of
course.)

Sütterlin does use a macron over m and n to indicate that the letter
should be doubled, and it uses a breve over u to differentiate it from
the otherwise identical n.

-Doug Ewell
 Fullerton, California





RE: Character identities

2002-10-30 Thread Michael Everson
At 10:53 -0500 2002-10-30, Alain LaBonté wrote:

At 21:46 2002-10-29 +, Michael Everson wrote:

At 13:27 -0800 2002-10-29, Kenneth Whistler wrote:

Michael asked:


 My eyes have glazed over reading this discussion. What am I being
 asked to agree with?


Here's the executive summary for those without the time to
plow through the longer exchange:

Marco: It is o.k. (in a German-specific context) to display
   an umlaut as a macron (or a tilde, or a little e above),
   since that is what Germans do.

Kent:  It is *not* o.k. -- that constitutes changing a character.


[Michael]  Kent can't be right here.


[Alain]  However I agree with Kent. Let's say a text identified as 
German quotes a French word with an U DIAERESIS *in the German text* 
(a word like capharnaüm). It would be a heresy to show a macron in 
a printed text in this context. In French *nobody* uses this 
practice that is frequent in German handwriting (but not in 
printing, unless I am wrong).

All that means is that the German font which did that would not be 
useful for French. The underlying coded character is the same, and 
the glyph is INFORMATIVE.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Character identities

2002-10-30 Thread Dominikus Scherkl
 [Alain]  However I agree with Kent. Let's say a text 
 identified as German quotes a French word with an
 U DIAERESIS *in the German text* (a word like capharnaüm).
 It would be a heresy to show a macron in a printed text in 
 this context.
Hm.
A font representing my mother's handwriting (German only :-)
would render u as u with breve above to distinguish it
from the representation of n.
I don't know how my mother would write a text containing a
u with breve above, but nevertheless the u glyph has to have
a breve, even if it may conflict with another character.

If you have a text with such ambiguities, why not use another
font for the quotations? That has the additional advantage of
pointing out visually that it's a quotation.

-- 
Dominikus Scherkl
[EMAIL PROTECTED]




Re: RE: Character identities

2002-10-30 Thread Michael Everson
At 10:54 -0500 2002-10-30, Alain LaBonté wrote:

At 22:21 2002-10-29 +, Michael Everson wrote:

At 15:56 -0600 2002-10-29, [EMAIL PROTECTED] wrote:


Is it compliant with Unicode to have a font where a-umlaut has a glyph of
a with e above? What about a glyph of a-macron (e.g. a handwriting 
font for someone who writes a-umlaut that way)?

Of course it is. Glyphs are informative.


[Alain]  (:

If they are informative, they should inform, not disinform... (;


I think this thread has about worn itself out ;-)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com




RE: Character identities

2002-10-30 Thread Kent Karlsson

 I insist that you can talk about character-to-character 
 mappings only when
 the so-called backing store is affected in some way.

No, why?  It is perfectly permissible to do the equivalent
of print(to_upper(mystring)) without changing the backing
store (mystring in the pseudocode); to_upper here would
return a NEW string without changing the argument.

 If the backing store
 is not changed, it is only a character-to-glyph mapping, 
 however complicated and indirect it may be.

Yes.  But with several font technologies the user can affect
the mapping in some ways, via features.  Including what 
*amounts to* mapping to uppercase (an x-height A glyph is
an A not an a, even if you have an a in the backing
store), or various other changes, like changing diaeresis
to e-above (they are still not glyph variants of each other,
even in German, which is why DIN asked for e-above, etc.).

My claim is that it is a bad idea for fonts (I don't dare
say Unicode font at this point) to do what *amounts to*
such in-effect character mappings *without explicit request*
from whoever is in charge of the text in some way (author,
editor, graphic designer, reader who likes to make changes to
the text, ...).  Such changes should NOT be the result of
JUST changing font.

(I still think it is a bad idea to build such *in effect*
transient character to character mappings into fonts;
but people are doing that anyway, so...)


 I totally agree with Doug's careful definition, and I am glad 
 that you agree as well.
 
 Doug indicates two key points that a font must respect to be 
 suitable for Unicode:
 
 « [...] calling a font a Unicode font implies two things:
 1. It must be based on Unicode code points. [...]
 2. The glyphs must reflect the essential characteristics of 
 the Unicode
 character to which they are mapped. [...] »
 
 If we agree that the only requirement for a glyph 
 representing a certain
 Unicode character is to respect the essential 
 characteristics which make
 it recognizable, then all our discussion is simply about 
 determining which
 essential characteristics a particular character is 
 supposed to have.

So far we agree completely re. that definition.

 To me, a glyph floating atop of letters a, o and u is 
 recognizably a
 German umlaut if (a) the text is written in German, and (b) 
 the glyph has
 one of the following shapes:
 
 1. Two small blobs (e.g. circles, squares, acute accents) 
 placed side by side;

I'm going to opt staying on the restrictive side here.

Except for the last one, that is a diaeresis, yes.  That is the
modern standard way of writing umlaut in typeset German. The
last one is a double acute, which is normally not used for this
in German, and it is stretching things a bit too far to consider
it a glyph variant of diaeresis.

 2. A straight horizontal line;

That's a macron.  Not used in *standard orthography* for German.
Using that as a glyph variant for diaeresis is stretching things
quite a lot, even if it occurs in particular forms of handwriting
or some signs. (In handwriting, some people use I-dot-above, or
even I-ring-above.  Does that make them glyph variants of I,
in a (non-Turkish) font (that mimics handwriting)?  I hope not.
If you want I-ring-above, then do what *in effect* amounts to a
(permanent or transient) mapping to I, combining-ring-above.)

 3. A wavy horizontal line;

That's a tilde.  Not used in *standard orthography* for German.
Using that as a glyph variant for diaeresis is stretching things
quite a lot.  Though it is quite common to use tilde instead of
diaeresis in handwriting.  (If there were a handwriting font
feature, what amounts to a transient mapping from diaeresis to
tilde would be expected under that feature.  For some fonts I
might even agree that it might have that feature on by default;
but possible to turn off.)

 4. a small lowercase e, or something recalling it.

Our major point of disagreement (along with M vs. Roman Numeral
One Thousand C D ;-). Historically that is the origin of
the umlaut.  It is definitely distinct from diaeresis,
just as much as æ is distinct from ä, even in a German context. 
This is not just stretching it very far, I'd say it's plain wrong,
also in a purely German context.  That does not at all prevent a
hist feature (or whatever; but never on by default) to do
what amounts to a transient mapping from diaeresis to e-above.

 I don't argue this for caprice or provocation, but because 
 these particular
 shapes are commonly attested in one context or another: be it modern
 typography, traditional typography, handwriting, fancy graphics, etc.

Yes.

...
 If (and only if!) the author/editor of the text asks for an
  overscript e should the font produce one. It is not up to
  the font maker to make such substitutions without request,
 
 Yes. But a font which displays U+0308 with a glyph resembling 
 the typical
 glyph for U+0364 is not producing anything; it is not substituting
 anything with anything else: it is just 

RE: Character identities

2002-10-30 Thread Kent Karlsson

 Marco: It is o.k. (in a German-specific context) to display
 an umlaut as a macron (or a tilde, or a little e above),
 since that is what Germans do.
 
 Kent:  It is *not* o.k. -- that constitutes changing a character.

 Kent can't be right here.

 1. We have all seen examples, in print, in signage, and in
 handwriting of German umlauts being displayed in each of those ways.
 Obviously the underlying encoding of them is the same, as is the
 intent.

The underlying encoding *may* be the same (if there is an encoding
at all...).  Still, I claim, it should not be up to the font
designer to make a font that shows e.g. an a-with-e-above glyph
for a-diaeresis *without also* the font being explicitly requested
(via some higher-level protocol) to do such a mapping, via a
hist feature (off by default) or whatever other mechanism. Such
a mapping *amounts to* a transient character-to-character mapping.
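
A sketch of what "off by default" means in practice, in Python. The
function display_form and the way features are passed in are assumptions
for illustration only; they do not reproduce any real font technology's
API. The feature name "hist" is taken from the text above.

```python
import unicodedata


def display_form(text, features=frozenset()):
    """Return the character sequence to be shaped for display.

    Without an explicit request the text passes through untouched. Only
    when the (hypothetical) "hist" feature is requested is diaeresis
    transiently mapped to e-above; the stored text is never modified.
    """
    if "hist" not in features:
        return text
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.replace("\u0308", "\u0364")


stored = "Käse"
print(display_form(stored))            # default: diaeresis stays
print(display_form(stored, {"hist"}))  # explicit request: e-above
```

Merely switching to another font corresponds to the first call; the
substitution in the second call happens only because somebody in charge of
the text asked for it.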

Just as I think an author (I use that in a general sense)
should be in charge of the spelling in a document, the author
should be in charge of what diacritics are used.  Would it be
a good idea for a British font to change color to colour,
i18n to internationalisation?  AAT fonts can in principle
do that (via glyph index mappings executed through a finite
automaton, but that is beside the point), so should they? Is
such a font (if it did this mapping by default) a Unicode font?
Each item in these two example pairs are seen in print (etc.)
and they are known to mean the same within each pair...

There are signs (and printed texts) that say Gøteborg; but
we usually spell that Göteborg.  Does that mean that the
underlying encoding (if any) therefore must be the same (the
same city is intended...), and ø is just a glyph variant of ö
(or the other way around), and a ((Unicode)) font may display
ö as ø (without being asked to perform any extraneous mapping).
Say the font is made for Norwegian. Is this all up to the font
designer? This is an exact parallel to what we started off with.

/Kent K





Re: RE: Character identities

2002-10-30 Thread John Cowan
Doug Ewell scripsit:

 Actually, the Sütterlin umlaut-mark is a small italicized e, which is
 very similar to an n.  What it really ends up looking like, from a
 distance, is a double acute.

Oops, yes.  Brain fart.

 Sütterlin does use a macron over m and n to indicate that the letter
 should be doubled,

This I think is a true COMBINING MACRON.

 and it uses a breve over u to differentiate it from
 the otherwise identical n.

Part of the u glyph.

-- 
XQuery Blueberry DOM    John Cowan
Entity parser dot-com   [EMAIL PROTECTED]
Abstract schemata       http://www.reutershealth.com
XPointer errata         http://www.ccil.org/~cowan
Infoset Unicode BOM     --Richard Tobin




RE: RE: Character identities

2002-10-30 Thread Kent Karlsson

 Sütterlin does use a macron over m and n to indicate that 
 the letter should be doubled

So should a Sütterlin font then by default replace mm with an m-macron
glyph?  Or should the author decide which orthography to use?

/Kent K





Re: Character identities

2002-10-30 Thread Philipp Reichmuth
Hello Doug,

DE Actually, the Sütterlin umlaut-mark is a small italicized e,
DE which is very similar to an n. What it really ends up looking
DE like, from a distance, is a double acute. [...] Sütterlin does use
DE a macron over m and n to indicate that the letter should be
DE doubled,

Actually, when I learned it in school about seventeen years ago, I was
taught to use double acutes as umlaut markers, and there were no
macrons to indicate doubled letters. Double m was not very legible,
however.

Cheers -
  Philipp Reichmuth    mailto:mailinglistenprozessor;gmx.net

--
You step in the stream, / but the water has moved on / This page is not here





RE: Character identities

2002-10-30 Thread Marco Cimarosti
Alain LaBonté wrote:
 [Alain]  However I agree with Kent. Let's say a text 
 identified as German quotes a French word with an
 U DIAERESIS *in the German text* (a word like
 capharnaüm).

A Fraktur font designed solely for German should not be used for typesetting
French words. (And, BTW, that is probably why German Fraktur books used
roman type for foreign words).

In general, you cannot expect a good result using a font designed for one
language to typeset another: see, in the attached image, what your
capharnaüm looks like in a font designed for Chinese. Nice typography, eh?
That ü is so weird because it is designed to be used in conjunction with
the full width letters in U+FF41..U+FF5A, which is perhaps the right choice
for Chinese, but not for French.
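
As a rough illustration of the range mentioned above, here is a minimal Python sketch (an editorial assumption, not something posted to the list). It builds the fullwidth letters U+FF41..U+FF5A and shows that they are code points separate from ordinary a..z, related to them only through compatibility normalization. The variable names are illustrative.

```python
import unicodedata

# U+FF41..U+FF5A: FULLWIDTH LATIN SMALL LETTER A..Z
fullwidth = "".join(chr(cp) for cp in range(0xFF41, 0xFF5A + 1))

# NFKC applies compatibility mappings, folding fullwidth forms to ASCII.
folded = unicodedata.normalize("NFKC", fullwidth)

print(len(fullwidth), unicodedata.name(fullwidth[0]))
print(unicodedata.east_asian_width(fullwidth[0]))  # "F" means fullwidth
print(folded)
```

A font built for Chinese text may well draw its ordinary Latin letters to harmonize with these fullwidth forms, which is a design choice and not an encoding matter.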

_ Marco


attachment: cafarnao.gif

RE: RE: Character identities

2002-10-30 Thread Marco Cimarosti
I said:
 Ah! I never realized that the Sütterlin zig-zag-shaped e 
 was the missing with the ¨ glyph!
^

Sorry: ... the missing LINK with 

_ Marco




RE: RE: Character identities

2002-10-30 Thread Marco Cimarosti
Doug Ewell wrote:
 Actually, the Sütterlin umlaut-mark is a small italicized
 e, which is very similar to an n.  What it really
 ends up looking like, from a distance, is a double acute.

Ah! I never realized that the Sütterlin zig-zag-shaped e was the missing
with the ¨ glyph!

Thanks! After all, this discussion has not been completely useless. :-)

_ Marco




RE: Character identities

2002-10-30 Thread Marco Cimarosti
Kent Karlsson wrote:
  I insist that you can talk about character-to-character 
  mappings only when
  the so-called backing store is affected in some way.
 
 No, why?  It is perfectly permissible to do the equivalent
 of print(to_upper(mystring)) without changing the backing
 store (mystring in the pseudocode); to_upper here would
 return a NEW string without changing the argument.

And that, conceptually, is a character-to-glyph mapping.
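
The pseudocode being quoted can be sketched in Python roughly as follows. This is only an illustration: the names render and backing_store are hypothetical, and str.upper() stands in for whatever to_upper the pseudocode had in mind.

```python
def render(text: str) -> str:
    """Hypothetical display step: derive an uppercase copy for output."""
    # str.upper() returns a new string; Python strings are immutable,
    # so the argument itself cannot be modified.
    return text.upper()


backing_store = "Küster"
shown = render(backing_store)

print(shown)
print(backing_store)  # still the text the author entered
```

Whether one calls the derived string a character mapping or a step towards glyphs is exactly the point in dispute; the code only shows that the stored text is untouched either way.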

In my mind, you are so much into the OpenType architecture, and so much used
to the concept that glyphization is what a font does, that you can't view
the big picture.

If you look at Unicode from a platform independent perspective, fonts do not
necessarily do something. In some architectures, fonts are just inert
repositories of glyphs, and the display intelligence is somewhere outside the
font.

  If the backing store
  is not changed, it is only a character-to-glyph mapping, 
  however complicated and indirect it may be.
 
 Yes.  But with several font technologies the user can affect
 the mapping in some ways, via features. [...]

Even in the simplest of technologies, the user can affect the mapping in
some way, e.g. using a different font.

 My claim is that it is a bad idea for fonts (I don't dare
 say Unicode font at this point) to do what *amounts to*
 such in-effect character mappings *without explicit request*
 from whoever is in charge of the text in some way (author,
 editor, graphic designer, reader who like to make changes to
 the text, ...).  Such changes should NOT be the result of
 JUST changing font.

All undue generalizations of the OpenType paradigm. Not all fonts do
something (let alone doing what you wish them to do); not all font
technologies have modes (better said, *no* font technologies have modes,
if not in theory).

  To me, a glyph floating atop of letters a, o and u is 
  recognizably a
  German umlaut if (a) the text is written in German, and (b) 
  the glyph has
  one of the following shapes:
  
  1. Two small blobs (e.g. circles, squares, acute accents) 
  placed side by side;
 
 I'm going to opt staying on the restrictive side here.
 
 Except for the last one, that is a diaeresis, yes.  That is the
 modern standard way of writing umlaut in typeset German. The
 last one is a double acute, which is normally not used for this
 in German, and it is stretching things a bit too far to consider
 it a glyph variant of diaeresis.

I think stretching things is not seeing that the umlaut of most Fraktur
fonts looks like a double acute: a shape which is consistent with the usual
shape of the dots on i and j.

BTW, strangely, you don't seem to be worried by the fact that i and
í also look the same... What if I use Fraktur for Spanish?

[...]
  If (and only if!) the author/editor of the text asks for an
   overscript e should the font produce one. It is not up to
   the font maker to make such substitutions without request,
  
  Yes. But a font which displays U+0308 with a glyph resembling 
  the typical
  glyph for U+0364 is not producing anything; it is not 
 substituting
  anything with anything else: it is just faithfully 
  reproducing the text,
  according to the content decided by the author *and* 
 according to the
  typographical style decided by the font designer.
 
 This is not a typographic decision, it is a spelling decision,
 and not up to the font designer, I'd say.  It is a typographic
 decision whether the diaeresis digs into the glyph below, or if
 an e-above looks like a capital e inside.  But spelling changes,
 whether transient or permanent, should be the author's call.

It is a cat biting its tail (*). If you consider it a glyph variation, it
is just a typographic decision; if you consider it a character change, it
becomes an orthographic issue.

But treating the fact that a certain code point is displayed with a certain
glyph as a character change is, IMHO, totally outside the letter and
spirit of the Unicode character-glyph model.

(*: Am I exporting an Italian idiom or is this used in English too? Anyway,
it means a chicken-egg issue)

_ Marco




RE: Character identities

2002-10-30 Thread P. T. Rourke


This is not a typographic decision, it is a spelling decision,

and not up to the font designer, I'd say.  It is a typographic
decision whether the diaeresis digs into the glyph below, or if
an e-above looks like a capital e inside.  But spelling changes,
whether transient or permanent, should be the author's call.



No, it is not a spelling decision.  Both are umlauts: one with a letter 
form of /e/ and one with a letter form of  ¨ .  Any textual editor in 
the world would make that judgment call, and typeset according to the 
graphic expectations of his (or her) readers, not according to the 
graphic usage of the author, no matter how conservative the text.  







Re: Character identities

2002-10-30 Thread David Starner
On Wed, Oct 30, 2002 at 10:53:10AM -0500, Alain LaBonté  wrote:
 [Alain]  However I agree with Kent. Let's say a text identified as German 
 quotes a French word with a U DIAERESIS *in the German text* (a word like 
 capharnaüm). It would be a heresy to show a macron in a printed text in 
 this context. 

It would be heresy not to change the font, since in the typesetting
convention used with Fraktur fonts, French quotes were set in Roman
fonts different from the surrounding text.

-- 
David Starner - [EMAIL PROTECTED]
Great is the battle-god, great, and his kingdom--
A field where a thousand corpses lie. 
  -- Stephen Crane, War is Kind




RE: Character identities

2002-10-29 Thread jarkko.hietaniemi
 Unicode captures the ice-age during the global warming era!
 
 Do we have codepoints for images found on the walls of caves?
 
 :)

CRO-MAGNON PAINTING HUMAN SPEARING A MAMMOTH
CRO-MAGNON PAINTING MAMMOTH STOMPING A HUMAN
...









RE: Character identities

2002-10-29 Thread Kent Karlsson


 -Original Message-
 From: Marco Cimarosti [mailto:marco.cimarosti;essetre.it]
 Sent: den 28 oktober 2002 16:23
 To: 'Kent Karlsson'; Marco Cimarosti
 Cc: [EMAIL PROTECTED]
 Subject: RE: Character identities


 Kent Karlsson wrote:
For this reason it is quite impermissible to render the
combining letter small e as a diaeresis
  
   So far so good. There would be no reason for doing such a thing.
  ...
or, for that matter, the diaeresis as a combining
letter small e (however, you see the latter version
sometimes, very infrequently, in advertisement).
  
   This is the case I thought we were discussing, and it is a
   very different case.
 
  No, the claim was that diaeresis and overscript e are the same,

 The claim was that dieresis and overscript e are the same in *modern*
 *standard* German. Or, better stated, that overscript e is
 just a glyph
 variant of dieresis, in *modern* *standard* German typeset in Fraktur.

Well, we strongly disagree about that then.  Marc and I clearly see them
as different.  More about this below.

 Sorry if I haven't stated this clearly enough.

You have several times.  No need to emphasise it anymore.  We still
don't agree.

...
  Some of them (overscript e in particular) should be(come)
  quite commonly occurring in any Fraktur Unicode font.

 Commonly sounds funny near Fraktur...

We were talking about Fraktur fonts (which may not be all that
common.)

   Using such a character to encode 21st century advertisements
   is doomed to cause problems:
  
   1) The glyph for U+0364 is more likely found in the font
   collection of the
   Faculty of Germanic Studies than on the PC of people wishing
   to read the
   advertisement for Ye Olde Küster Pub. So, most people will
   be unable to
   view the advertisement correctly.
  
   2) The designer of the advertisement will be unable to use
   his spell-checker and hyphenator on the advertisement's text.
 
  Advertisements should invariably be final spell-checked and
  hyphenated by humans!  Automated spell checkers and hyphenators
  for German (as well as Scandinavian languages) have (so far)
  not been good enough even for running text that you want to
  publish...

 This has no connection with this discussion.

Well, you brought it up.  I'm usually rather picky about spelling,
so a spell checker can only suggest corrections, often to be
rejected as wrong or even silly.

 However, IMHO, the presence U+0364 (COMBINING LATIN SMALL
 LETTER E) in a
 modern German or Swedish text is just a plain spelling error,
 and even the
 naivest spellchecker should flag it as such.

So what? Naïve spell checkers flag all kinds of correctly spelled
words!

...
  Most modern use of Fraktur seem to use diaeresis or double
  acute for this.

 U+0308 (COMBINING DIAERESIS) should be the only umlaut to
 be found in
 modern German text. What that diacritic *looks* like (two
 dots, an e, a
 double acute, a macron, Mickey Mouse's ears), is a choice of the font
 designer.

Not quite.  Please note that some characters are defined to have
very specific glyphs, e.g. the estimated sign, there is no shape
variability except for size.  Others are glyphically allocated/
unified, like the diacritics, and some glyphic variability is
expected. But a diaeresis is two dots (of some shape, and it would
be a marginal case to have them elongated), never a tilde, macron
or overscript e.  Those are other characters, not just a glyph
variation.  Other characters have more glyphic variability
(informally) associated with them, like A, but some of them
have compatibility variants that have a somewhat more restricted
glyphic variability, like the Math Fraktur A in plane 1.

Some scripts have by tradition some very strong ligatures;
strong in the sense that it may be hard to recognise the ligated
pieces in the result glyph.  That does not mean that you can
legitimately use an M glyph for One Thousand C D, just because
they mean the same.  Nor does that mean that diacritics can be
substituted for each other, asking for a diaeresis and get a tilde.
Yes, it is common practice with many to use a tilde instead of
a diaeresis in handwriting, but it is still character substitution,
not a glyphic variant (since that is the way diacritics are
allocated in Unicode).


  (But the web designer could use a dynamically
  downloaded font fragment, if there is worry that all glyphs
  might not be supported by the fonts used by the vast majority
  of the target audience.)

 This too has no connection with this discussion, and is OT. Unicode is
 concerned with how text is *encoded*; the details of fonts and display
 technology are out of scope.

We were talking about fonts.

 What Unicode really mandates is that the encoding should not change to
 obtain a certain graphic effect.

You can do any character mappings you like before you apply any
font, or make it into graphics...

...
  And overscript small e will also vary with the font,
  looking like a shrunken ordinary e

Re: Character identities

2002-10-29 Thread Michael Everson
At 23:21 -0800 2002-10-28, Barry Caplan wrote:


Do we have codepoints for images found on the walls of caves?


No. The closest we come to that is wondering about the Tartaria 
proto-script, which we haven't roadmapped.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Character identities

2002-10-29 Thread Marco Cimarosti
Kent Karlsson wrote:
  The claim was that dieresis and overscript e are the same 
 in *modern*
  *standard* German. Or, better stated, that overscript e is
  just a glyph
  variant of dieresis, in *modern* *standard* German typeset 
 in Fraktur.
 
 Well, we strongly disagree about that then.  Marc and I 
 clearly see them as different.  More about this below.

We could simply agree to disagree, were it not for the fact that we both
claim that each other's view violates the principles of Unicode.

I have tried to show that glyphic variation is part of the principles of
Unicode, as per TUS 3.0. You might wish to point us to where the current
Unicode Standard supports your view, or contradicts mine.

  However, IMHO, the presence U+0364 (COMBINING LATIN SMALL
  LETTER E) in a
  modern German or Swedish text is just a plain spelling error,
  and even the
  naivest spellchecker should flag it as such.
 
 So what? Naïve spell checkers flag all kinds of correctly spelled
 words!

Yes but, IMHO, in this case they would be right: I never heard that U+0364
(COMBINING LATIN SMALL LETTER E) is part of the spelling of modern German or
Swedish.
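
The kind of check described here could be sketched in Python as below. This is only an illustration under stated assumptions: the function name find_overscript_e is hypothetical, and a real spell checker would do far more than scan for a single code point.

```python
import unicodedata

COMBINING_SMALL_E = "\u0364"  # COMBINING LATIN SMALL LETTER E


def find_overscript_e(text: str) -> list[tuple[int, str]]:
    """Return (index, base character) for every U+0364 found in text."""
    hits = []
    for i, ch in enumerate(text):
        if ch == COMBINING_SMALL_E:
            # The base is the character the mark follows, if there is one.
            base = text[i - 1] if i > 0 else ""
            hits.append((i, base))
    return hits


print(unicodedata.name(COMBINING_SMALL_E))
print(find_overscript_e("Ku\u0364ster"))  # u followed by U+0364
print(find_overscript_e("Küster"))        # precomposed u with diaeresis
```

Whether such a hit should be reported as a spelling error is a policy question for the checker; the code only locates the code point.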

 Not quite.  Please note that some characters are defined to have
 very specific glyphs, e.g. the estimated sign, there is no shape
 variability except for size.

A small set of *symbols* like the estimated sign and some dingbats are an
exception to the rule that Unicode encodes characters but not glyphs.

 Others are glyphically allocated/
 unified, like the diacritics, and some glyphic variability is
 expected. But a diaeresis is two dots (of some shape, and it would
 be a marginal case to have them elongated), never a tilde, macron
 or overscript e.

Would you care to go to Germany and have a look at shop signs? The umlaut is
more often a straight line than not. But this doesn't make it a macron:
there is no macron in German.

 Those are other characters, not just a glyph variation.

So I was wrong: German orthography uses macrons! Can you please explain the
German pronunciation of ā, ō and ū?

 Other characters have more glyphic variability
 (informally) associated with them, like A, but some of them
 have compatibility variants that have a somewhat more restricted
 glyphic variability, like the Math Fraktur A in plane 1.

More *symbol* characters which escape the general rule. 

 Some scripts have by tradition some very strong ligatures;
 strong in the sense that may be hard to recognise the ligated
 pieces in the result glyph.  That does not mean that you can
 legitimately use an M glyph for One Thousand C D, just because
 they mean the same.

Perhaps. It could have been a poor example. But the opposite is much more
important: you cannot use a character in place of another which means a
different thing just because you want a different look.

 Nor does that mean that diacritics can be
 substituted for each other, asking for a diaeresis and get a tilde.

Substituting diacritics for each other is what *you* seem to suggest!

 Yes, it is common practice with many to use a tilde instead of
 a diaeresis in handwriting, but it is still character substitution,
 not a glyphic variant (since that is the way diacritics are
 allocated in Unicode).

So, German orthography uses tildes too! Can you please explain the German
pronunciation of ã, õ and ũ?

  What Unicode really mandates is that the encoding should 
 not change to
  obtain a certain graphic effect.
 
 You can do any character mappings you like before you apply any
 font, or make it into graphics...

There can be no character-to-character mapping inside a font or a display
engine! Applications are allowed to do character-to-character mappings only
when they want to *change* the text in some way (e.g., a case conversion, a
transliteration, etc.), not when they want to display it.

Displaying Unicode only implies character-to-glyph mappings. Internally,
there can be some glyph-to-glyph mapping, but never a character-to-character
mapping. Even character-to-character mappings done on a temporary copy of
the text are, conceptually, a step in the character-to-glyph mapping.

This fundamental error runs through your whole post, and makes it
impossible to go into the details without keeping on saying: you can't do
any character-to-character mappings during display; you can't do any
character-to-character mappings during display; you can't do any...

 I was trying to be general (not fancy) and not just talk about
 Opentype.  But yes, I meant (at least) the case where no
 features (or similar) are invoked.

Who tells you that there are any features to be invoked? There is no
similar requirement in Unicode!

 What I was aiming at excluding were features that implicitly
 involve character mappings, [...]

You see? You can't do any character-to-character mappings during display.
For simplicity, I will simply cut off all passages where you assume this.

 A font that by default (that is ordinary English, not a fancy
 term)

Who tells you that 

RE: Character identities

2002-10-29 Thread Kent Karlsson
Marco, 

   Standard orthography, and orthography that someone may
choose to use on a sign, or in handwriting, are often not
the same.

   And I did say that current font technologies (e.g. OT)
do not actually do character to character mappings,
but the net effect is *as if* they did (if, and I hope
only if, certain features are invoked, like smallcaps).
It would be more honest to do them as character-to-character
mappings though, either inside (which OT does not support)
or outside of the font.  Capital A, even at x-height, is not
a glyph variant of small a (even though, centuries ago, that
was the case, but then I and J were the same, and U and V,
et and &, ad and @, ...). But displaying U as V (in effect
doing a character replacement on a copy of the input) would
be ok in a non-default mode (using the hist feature, say).
My point here is that that replacement (effectively) should
not be done by default in a Unicode font (see Doug's explanation
for what a Unicode font is, if you don't like mine).

 [...] I never heard that U+0364
 (COMBINING LATIN SMALL LETTER E) is part of the spelling of 
 modern German or Swedish.

   True (that is not part of modern standard orthography),
but I don't see how that could imply some kind of support
for your (rather surprising and extreme) position.

   If (and only if!) the author/editor of the text asks for an
overscript e should the font produce one.  It is not up to
the font maker to make such substitutions without request,
either by the author/(human) editor changing the text, or by
the author/editor invoking a non-default font feature
(via some higher-level protocol, can't be done in plain text).
The default mode (for lack of a better term) would be the
one used, well, by default; e.g. on plain text.

  Other characters have more glyphic variability
  (informally) associated with them, like A, but some of them
  have compatibility variants that have a somewhat more restricted
  glyphic variability, like the Math Fraktur A in plane 1.
 
 More *symbol* characters which escape the general rule. 

   Math Fraktur A is a letter (of course!).  Many letters,
including ordinary A, are used as symbols too.

You seem to argue that for symbols (whichever those are,
I'm sure you *don't* mean general categories S*...) there is
total rigidity, while for non-symbols (whichever those are)
there is near total anarchy and font makers can change glyphs
to something entirely different.

   I claim that there are no characters for which there is total
anarchy (except possibly for view invisibles of normally
invisible characters), but that there are several degrees
of flexibility (I'm sure someone can list more than three,
but here is a coarse division): 

1. glyph (almost) fixed: Dingbats, estimated sign, ...
   [could possibly be given a rugged look, or texture
   if you want to mimic e.g. a typewriter look]

2. abstract glyph is fixed but there can be minor
   shape variations: diacritics, math symbols (Sm),
   math letters (there are several Math Fraktur
   designs, several Math sans-serif designs, etc.
   that could suit), Arabic presentation forms (initial/
   medial/final/isolated decided but other aspects are
   not fixed, maybe this case is between 2 and 3), ...

3. fairly free as long as (some) readers recognise
   the character from the glyph (modulo compatibility/
   canonical variants and what should have been
   compatibility/canonical variants...): nominal
   digits/letters/punctuation, ... [This, however,
   does NOT allow, e.g., the One Thousand C D character
   to be shown with an M glyph, nor display € as EUR, ...
   in a Unicode font in...; if it did so in default mode
   [by default], it would not be a Unicode font.]

[4. Near anarchy; you seem to argue that a large part of
case 2 and all of case 3 fall here...]
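
One way to see how the character data treats the examples in item 3 is to ask what compatibility normalization does with them. The Python sketch below is an illustration only (the list name samples is an assumption); it relies on the standard unicodedata module and says nothing about what a font may or may not draw.

```python
import unicodedata

samples = ["\u2180", "\u216f", "\u20ac"]

for ch in samples:
    # NFKC replaces a character by its compatibility mapping, if it has one.
    folded = unicodedata.normalize("NFKC", ch)
    status = "unchanged" if folded == ch else f"folds to {folded!r}"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {status}")
```

In the character database, ROMAN NUMERAL ONE THOUSAND (U+216F) has a compatibility mapping to M, while ROMAN NUMERAL ONE THOUSAND C D (U+2180) and EURO SIGN (U+20AC) have no mapping at all, so nothing in the normalization data equates them with "M" or "EUR".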

Yes, you can have glyphic variation, but for the diacritics
there is (by design, but maybe not sufficiently explicitly
stated in the book) a limit to how much it can vary (in
default mode).  There are limits also for, e.g., 'nominal'
letters and roman numeral characters, that are (by design)
somewhat less constrained.  In addition you may note that
those who asked for the inclusion of overscript e do not
regard an overscript e glyph to be an acceptable way
of displaying a diaeresis [in a Uni..., you know].

   These things come up quite often in discussions about 
proposals to add characters, even though it is not formally
stated.  If some of the Unicode elders care to elaborate,
please feel free.

   Marco, I'm not sure it is of any use to try to explain in
more detail, since you don't appear to be listening.  However,
I think I, Marc, Doug, and Mark (at the very least) seem
to be in approximate agreement on this (at least, I have
yet to see any major disagreement).  I'm sure Michael

RE: Character identities

2002-10-29 Thread Marco Cimarosti
Kent Karlsson wrote:
 Marco, 

Keld, please allow me to begin with the end of your post:

Marco, please calm down and reread every sentence of my
 previous message.  You seem to have misread quite a few things,
 but it is better you reread calmly before I try to clear
 up any remaining misunderstandings.

I have been absolutely calm, and I apologize if I gave a different
impression. I may happen to heat up when discussing things like ethics,
politics, religions, racism, war, etc., but definitely not when discussing
the details of the Unicode character-glyph model.

I wish to recall that we are just discussing a glyph variation for a
diacritic character: a variation that I consider acceptable and you consider
undesirable. Please let's not make this bigger than it could reasonably be.

Standard orthography, and orthography that someone may
 choose to use on a sign, or in handwriting, are often not
 the same.
 
And I did say that current font technologies (e.g. OT)
 do not actually do character to character mappings,
 but the net effect is *as if* they did (if, and I hope
 only if, certain features are invoked, like smallcaps).
 It would be more honest to do them as character-to-character
 mappings though, either inside (which OT does not support)
 or outside of the font.  Capital A, even at x-height, is not
 a glyph variant of small a (even though, centuries ago, that
 was the case, but then I and J were the same, and U and V,
 et and &, ad and @, ...). But displaying U as V (in effect
 doing a character replacement on a copy of the input) would
 be ok in a non-default mode (using the hist feature, say).

I insist that you can talk about character-to-character mappings only when
the so-called backing store is affected in some way. If the backing store
is not changed, it is only a character-to-glyph mapping, however complicated
and indirect it may be.

Whether these mappings take place inside or outside a font is irrelevant as
long as, again, the backing store is not changed.

 My point here is that that replacement (effectively) should
 not be done by default in a Unicode font (see Doug's explanation
 for what a Unicode font is, if you don't like mine).

I totally agree with Doug's careful definition, and I am glad that you agree
as well.

Doug indicates two key points that a font must respect to be suitable for
Unicode:

« [...] calling a font a Unicode font implies two things:
1. It must be based on Unicode code points. [...]
2. The glyphs must reflect the essential characteristics of the Unicode
character to which they are mapped. [...] »

If we agree that the only requirement for a glyph representing a certain
Unicode character is to respect the essential characteristics which make
it recognizable, then all our discussion is simply about determining which
essential characteristics a particular character is supposed to have.

To me, a glyph floating atop of letters a, o and u is recognizably a
German umlaut if (a) the text is written in German, and (b) the glyph has
one of the following shapes:

1. Two small blobs (e.g. circles, squares, acute accents) placed side by
side;
2. A straight horizontal line;
3. A wavy horizontal line;
4. a small lowercase e, or something recalling it.

I don't argue this out of caprice or provocation, but because these particular
shapes are commonly attested in one context or another: be it modern
typography, traditional typography, handwriting, fancy graphics, etc.

You seem to argue that only case 1 is acceptable, and probably also add some
constraints on the shape of the blobs (e.g., I think I understood that you
find that a double acute shape would be unacceptable).

As I see it, the only reason for which you say this is because the other
shapes are similar or identical to the typical shapes of other Unicode
characters. As I said, I don't find that this is a valid reason, unless the
font we are talking about is to be used in contexts (e.g., linguistics, or
languages other than German) in which the distinction is meaningful.

  [...] I never heard that U+0364
  (COMBINING LATIN SMALL LETTER E) is part of the spelling of 
  modern German or Swedish.
 
True (that is not part of modern standard orthography),
 but I don't see how that could imply some kind of support
 for your (rather surprising and extreme) position.

(Frankly, I find surprising and extreme your position -- perhaps we're only
choosing bad examples.)

What I meant is that if (a) U+0364 is not supposed to appear in modern
German, and (b) the font we are considering is designed to be used for
modern German only, then (c) the possibility of confusing U+0364 with U+0308
is a non issue.

If (and only if!) the author/editor of the text asks for an
 overscript e should the font produce one. It is not up to
 the font maker to make such substitutions without request,

Yes. But a font which displays U+0308 with a glyph resembling the typical
glyph for U+0364 is not producing anything; it is not 

Re: Character identities

2002-10-29 Thread starner
   Standard orthography, and orthography that someone may
choose to use on a sign, or in handwriting, are often not
the same.

If someone writes an a-umlaut, no matter what it looks like,
it should be encoded as an a-umlaut. That's the identity
of the character they wrote. I'm sure my German teacher
would not appreciate us typing up our homework and using
A-macron, even if the symbol she used for a-umlaut on the
blackboard looked like a macron.
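
In encoding terms, the point can be sketched with Python's unicodedata module. This is an illustration only, and the variable names are assumptions: a-umlaut is canonically equivalent to a followed by U+0308, and to nothing involving a macron or an overscript e.

```python
import unicodedata

a_umlaut = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS
a_macron = "\u0101"   # LATIN SMALL LETTER A WITH MACRON
a_with_e = "a\u0364"  # a + COMBINING LATIN SMALL LETTER E

# Canonical decomposition: a-umlaut is a + COMBINING DIAERESIS.
decomposed = unicodedata.normalize("NFD", a_umlaut)
print([f"U+{ord(c):04X}" for c in decomposed])

# Composing a + U+0308 gives back a-umlaut; the other two stay distinct.
print(unicodedata.normalize("NFC", "a\u0308") == a_umlaut)
print(unicodedata.normalize("NFC", a_macron) == a_umlaut)
print(unicodedata.normalize("NFC", a_with_e) == a_umlaut)
```

So homework typed with a-macron really is different text, however the teacher's a-umlaut looked on the blackboard.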

   Math Fraktur A is a letter (of course!).  Many letters,
including ordinary A, are used as symbols too.

If it were a letter, then no one would have a problem with
you writing language with it. But there are warnings all 
over the place, about how A and an appropriate font should 
be used for Fraktur A. Math Fraktur A is a symbol - it doesn't
stand for a sound or a word.
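
For what it is worth, both views find some support in the character data, as this small Python sketch suggests (illustrative only; the variable name math_fraktur_a is an assumption). The character's general category is that of an uppercase letter, yet it carries a font-style compatibility mapping back to plain A.

```python
import unicodedata

math_fraktur_a = "\U0001D504"  # MATHEMATICAL FRAKTUR CAPITAL A

print(unicodedata.name(math_fraktur_a))
print(unicodedata.category(math_fraktur_a))       # general category
print(unicodedata.decomposition(math_fraktur_a))  # compatibility mapping
print(unicodedata.normalize("NFKC", math_fraktur_a))
```

The mapping is tagged as a font variant of U+0041, which fits the advice that running Fraktur text should use ordinary A with a suitable font.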

You seem to argue that for symbols (whichever those are,
I'm sure you *don't* mean general categories S*...) there is
total rigidity, while for non-symbols (whichever those are)
there is near total anarchy and font makers can change glyphs
to something entirely different.

Font makers can change the glyphs to whatever they want, so long
as it is uniquely that character.

   Marco, I'm not sure it is of any use to try to explain in
more detail, since you don't appear to be listening.  However,
I think I, Marc, Doug, and Mark (at the very least) seem
to be in approximate agreement on this (at least, I have
yet to see any major disagreement).  I'm sure Michael
would agree too (at least I hope so), and many others.

Interesting. I don't agree totally with Marco, but I'm of the opinion
that glyphs of a with e above, a with macron above, and a with Disney
ears above can be suitable glyphs for a-umlaut, and I got the impression
that Mark and Doug agreed with me.




RE: Character identities

2002-10-29 Thread Michael Everson
At 21:07 +0100 2002-10-29, Marco Cimarosti wrote:


  I'm sure Michael would agree too (at least I hope so), and many others.

There are many Michaels and many others here... If any of them wish to
intervene, I hope they'll rather say something new to take the discussion
out of the loop, rather than joining one faction.


My eyes have glazed over reading this discussion. What am I being 
asked to agree with?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Character identities

2002-10-29 Thread Kenneth Whistler
Michael asked:

 My eyes have glazed over reading this discussion. What am I being 
 asked to agree with?

Here's the executive summary for those without the time to
plow through the longer exchange:

Marco: It is o.k. (in a German-specific context) to display
   an umlaut as a macron (or a tilde, or a little e above),
   since that is what Germans do.
   
Kent:  It is *not* o.k. -- that constitutes changing a character.

[Sorry, guys, if I have ridden roughshod over the nuances... ;-)]

Michael, you might have to recuse yourself, however, since when
it was suggested that displaying Devanagari characters with
snowpeaked glyphs for a Nepali hiking company would be o.k.,
you misunderstood and suggested private use characters!

--Ken





RE: Character identities

2002-10-29 Thread Michael Everson
At 13:27 -0800 2002-10-29, Kenneth Whistler wrote:

Michael asked:


 My eyes have glazed over reading this discussion. What am I being
 asked to agree with?


Here's the executive summary for those without the time to
plow through the longer exchange:

Marco: It is o.k. (in a German-specific context) to display
   an umlaut as a macron (or a tilde, or a little e above),
   since that is what Germans do.

Kent:  It is *not* o.k. -- that constitutes changing a character.


Kent can't be right here.

1. We have all seen examples, in print, in signage, and in 
handwriting of German umlauts being displayed in each of those ways. 
Obviously the underlying encoding of them is the same, as is the 
intent.

2. The fact that a + diaeresis with a superscript e glyph could be 
mistaken for a + superscript-e is not more troublesome than the 
possibility of mistaking Latin or Cyrillic o with Greek omicron.
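
The comparison in point 2 can be made concrete with a short Python sketch (an illustration, not part of the original message; the list name lookalikes is an assumption). The three letters usually look alike in print but are separate code points with separate names.

```python
import unicodedata

lookalikes = ["o", "\u043e", "\u03bf"]

for ch in lookalikes:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Similar glyphs, but the encoded characters are all different.
print(len(set(lookalikes)))
```

Nothing in the encoding prevents a font from drawing all three identically, which is the parallel being drawn with the two umlaut glyphs.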

Michael, you might have to recuse yourself, however, since when it 
was suggested that displaying Devanagari characters with snowpeaked 
glyphs for a Nepali hiking company would be o.k., you misunderstood 
and suggested private use characters!

I did admit that I did not read the sentence entirely.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com




Re: RE: Character identities

2002-10-29 Thread starner
At 21:07 +0100 2002-10-29, Marco Cimarosti wrote:

   I'm sure Michael would agree too (at least I hope so), and many others.

There are many Michaels and many others here... If any of them wish to
intervene, I hope they'll say something new to take the discussion
out of the loop, rather than joining one faction.

My eyes have glazed over reading this discussion. What am I being 
asked to agree with?

Is it compliant with Unicode to have a font where a-umlaut has a glyph of
a with e above? What about a glyph of a-macron (e.g. a handwriting font for someone 
who writes a-umlaut that way)?




Re: RE: Character identities

2002-10-29 Thread Michael Everson
At 15:56 -0600 2002-10-29, [EMAIL PROTECTED] wrote:


Is it compliant with Unicode to have a font where a-umlaut has a glyph of
a with e above? What about a glyph of a-macron (e.g. a handwriting 
font for someone who writes a-umlaut that way)?

Of course it is. Glyphs are informative.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com




Re: Character identities

2002-10-29 Thread Keld Jørn Simonsen
On Tue, Oct 29, 2002 at 09:07:16PM +0100, Marco Cimarosti wrote:
 Kent Karlsson wrote:
  Marco, 
 
 Keld, please allow me to begin with the end of your post:

I really have not contributed much to this thread, I think you mean
Kent.

Best regards
keld




Re: RE: Character identities

2002-10-29 Thread John Hudson
At 14:56 10/29/2002, [EMAIL PROTECTED] wrote:


Is it compliant with Unicode to have a font where a-umlaut has a glyph of
a with e above? What about a glyph of a-macron (e.g. a handwriting font 
for someone who writes a-umlaut that way)?

Yes, I would say that it is compliant with Unicode because there is 
absolutely nothing in the Unicode Standard to say that it is non-compliant. 
I have seen German display types in which the umlaut is indicated by a 
miniature uppercase E *inside* the uppercase O. The point is that the small 
e is an accepted traditional German convention for indicating an umlaut, 
and any recognisable glyph variant of that convention fits the cognitive 
model for many competent readers reading German. The example of a 
handwriting font in which the umlaut is represented by something that looks 
like a macron, or a tilde, or a duckbilled platypus, should be judged by 
the same criteria: does the reader recognise the glyph as representing a 
vowel with umlaut? If so, it is a perfectly valid glyph representation of 
the umlaut character. It is, of course, a perfectly valid response to a 
typeface design to say 'I don't want to use this font because it has a 
weird umlaut', but it is equally valid for a typeface to have a weird 
umlaut; it may limit the popularity of the typeface, but so might the shape 
of the lowercase f or the curl of the tail of the Q, but would you say that 
these forms need to be a certain way to be valid or compliant? Although the 
line between glyph variants that are recognised by readers as valid 
representations of characters and those that are not is difficult to 
define, in practice readers are capable of making these decisions (and even 
of recognising, accepting or learning new forms that they have not 
encountered before): it is a bit like the distinction between pornography 
and erotica, which is hard to define but which magistrates and juries 
regularly decide on with confidence, competence and consensus.

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]

It is necessary that by all means and cunning,
the cursed owners of books should be persuaded
to make them available to us, either by argument
or by force.  - Michael Apostolis, 1467




Re: RE: Character identities

2002-10-29 Thread Adam Twardoch
 Do we again need an intelligent font that understands language tagging?

This should be achievable with OpenType, no?

 Do we now have different flavors of Unicode, one for English, one for
Icelandic, one for French, one for German ... ?

In most of the cases described by you, you can still have just one Unicode
character but different glyphs representing it. In OpenType, you could
assign glyph substitutions to some features such as historical forms and
do it on a language-dependent level.

 Should an English language font render ö as oe,  so that Göthe appears
automatically in the more normal English form Goethe?

If you refer to Johann Wolfgang von Goethe, his name is *not* spelled with
an ö anyway.

 The use of macron for dieresis is somewhat a different matter.  If a
particular style of German script uses a line for a diaeresis, then indeed
the diaeresis in that script has fallen together in appearance with the
macron.

But this doesn't mean that you have to encode it just once. Unicode should
be about what characters *mean*, not what characters look like. Unfortunately,
for spatial reasons, many lookalikes have been consolidated. But you can
intelligently split them with OpenType. You can have stylistic sets that
you choose based on your preferred writing.

Adam





Re: RE: Character identities

2002-10-29 Thread David Starner
On Tue, Oct 29, 2002 at 08:53:59PM -0500, Jim Allan wrote:
 Using the Unicode method makes far more sense than creating fonts that 
 work for particular languages only, provided no foreign words or names 
 appear, or which require language tagging.

Why does the Unicode method exclude creating fonts that work for
a particular language only? A lot of fontmakers specialize in
single-purpose fonts, and may not want or need to put in the time to cover
multiple languages.

 Marco's desire to use a font to indicate combining superscript e instead 
 of the way Unicode wants it done seems prompted because most 
 Unicode fonts do not currently support the combining superscript 
 characters and he wishes a fallback to normal diaeresis instead of to an 
 undefined character indicator.

It was my wish, and it had nothing to do with that. I was looking at the
book mentioned in my first message, which was printed in 1920 and yet
used the superscript e instead of an umlaut. I thought about encoding
that font in a computer, and then about printing a text in the font. If
I take a sample German text, and want to print it in this font, why
should I have to change the text? The text hasn't changed, just the
presentation. While _I_ could change the text, the average user would
probably find it prohibitively complex, and even if walked through it,
would be frustrated to have to put so much work into it.
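
A minimal sketch of the point above, assuming a toy model in which a font is nothing more than a table from code points to glyph names. The two font tables and every glyph name in them are hypothetical; the sketch only shows that swapping the font changes the glyph chosen while the encoded text stays untouched.

```python
# Toy model: a "font" is just a mapping from code points to glyph names.
# Both font tables and all glyph names are hypothetical.
BOOK_FONT = {
    0x0061: "a",
    0x00E4: "a_with_two_dots",
}
FRAKTUR_1920_FONT = {
    0x0061: "a",
    0x00E4: "a_with_small_e_above",
}


def shape(text, font):
    """Return the glyph name chosen for each character ('.notdef' if unmapped)."""
    return [font.get(ord(ch), ".notdef") for ch in text]


text = "a\u00e4"  # the encoded text is the same for both fonts
print(shape(text, BOOK_FONT))          # ['a', 'a_with_two_dots']
print(shape(text, FRAKTUR_1920_FONT))  # ['a', 'a_with_small_e_above']
```

Under this model nothing in the text needs editing to get the 1920 appearance; only the font table differs.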

As for the concerns brought up by you and Marc, I find them absurd in
this case. This font won't support other languages, because the book
doesn't have the glyphs for them. (Not even ô or ï, if you're one of the
people who think English needs them.) The font's not made for academic
or scholarly work, and even if I were to encode the a-e in an a-e slot,
it probably won't have a proper a-diaeresis.

-- 
David Starner - [EMAIL PROTECTED]
Great is the battle-god, great, and his kingdom--
A field where a thousand corpses lie. 
  -- Stephen Crane, War is Kind




Re: Character identities

2002-10-28 Thread Marc Wilhelm Küster
At 11:37 25.10.2002 -0700, Doug Ewell wrote:

Marc Wilhelm Küster kuester at saphor dot net wrote:

 As to the long s, it is not used for writing present-day German except
 in rare cases, notably in some scholarly editions and in the Fraktur
 script. Very few texts beyond the names of newspapers are nowadays
 produced in Fraktur. To put the long s on the German keyboard would be
 quite contrary to user requirements -- and if a requirement existed,
 it would be DIN's job to amend DIN 2137-2 and the upcoming DIN 2137-12
 to cater for it.

Irrelevant, sure, but contrary?  I don't see what harm could come
from adding a character to a previously unassigned key, especially in
the relatively obscure AltGr zone (Level 3).  Most users could safely
ignore it, and most would never even know it was there.


In principle, you are right. Unfortunately, there's quite a bit of software 
around that (mis-)uses unassigned AltGr-Keys for their own purposes - this 
includes, on Windows NT ff at least, software such as the localized MS 
Word. So, adding new assignments potentially clashes with existing software 
and should only be done if there is a sufficiently high public interest in 
doing so.


But yes, of course it would be DIN's job to standardize such a thing (or
not).

Patrick Andries asked if a revised German keyboard standard would be
ignored in the market with the same cavalier attitude seen in Canada
(and the U.S.).  My impression is that European manufacturers are held
more closely to conformance with national and international standards
than North American manufacturers, but I'd want some Europeans to back
me up on this.


Speaking of Europe, it differs from country to country. In Germany 
certainly DIN 2137 is widely adhered to and changes to it would in all 
likelihood be taken up fast on the market.

Best regards,

Marc Küster


-Doug Ewell
 Fullerton, California


*
Marc Wilhelm Küster
Saphor GmbH

Fronländer 22
D-72072 Tübingen

Tel.: (+49) / (0)7472 / 949 100
Fax: (+49) / (0)7472 / 949 114





RE: Character identities

2002-10-28 Thread Kent Karlsson
...
  For this reason it is quite impermissible to render the
  combining letter small e as a diaeresis

 So far so good. There would be no reason for doing such a thing.
...
  or, for that matter, the diaeresis as a combining
  letter small e (however, you see the latter version
  sometimes, very infrequently, in advertisement).

 This is the case I thought we were discussing, and it is a
 very different case.

No, the claim was that diaeresis and overscript e are the same,
so the reversed case Marc is talking about is not different at all.

 Given Keld's opinion and Marc's wholehearted support, it

Please don't confuse me with Keld!

 follows that
 those infrequent advertisements should be encoded using U+0364...

 But U+0364 (COMBINING LATIN SMALL LETTER E) belongs to a
 small collection of
 Medieval superscript letter diacritics, which is supposed to appear
 primarily in medieval Germanic manuscripts, or to reproduce
 some usage as late as the 19th century in some languages.

Yes, but you should not read too much into the explanation,
which, while correct, does not limit the existence of their
glyphs to fonts used only by germanic professors...
Some of them (overscript e in particular) should be(come)
quite commonly occurring in any Fraktur Unicode font.

 Using such a character to encode 21st century advertisements
 is doomed to cause problems:

 1) The glyph for U+0364 is more likely found in the font
 collection of the
 Faculty of Germanic Studies than on the PC of people wishing
 to read the
 advertisement for Ye Olde Küster Pub. So, most people will
 be unable to
 view the advertisement correctly.

 2) The designer of the advertisement will be unable to use
 his spell-checker and hyphenator on the advertisement's text.

Advertisements should invariably be final spell-checked and
hyphenated by humans!  Automated spell checkers and hyphenators
for German (as well as Scandinavian languages) have (so far)
not been good enough even for running text that you want to
publish...

 Users will be unable to find the Küster Pub by searching
 Küster in a
 search engine.

Depends on the search engine, and if it uses a correct collation
table (for the language) or not...
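
A rough illustration of the search question, assuming a matcher that does nothing more than compare NFC-normalized strings (Python's standard `unicodedata` module). A language-tailored collation table, as mentioned above, could behave differently; this sketch only shows what plain normalization does with the two spellings.

```python
import unicodedata

umlaut = "Ku\u0308ster"   # u + U+0308 COMBINING DIAERESIS
small_e = "Ku\u0364ster"  # u + U+0364 COMBINING LATIN SMALL LETTER E
query = "K\u00fcster"     # precomposed u with diaeresis, as a user would type it


def nfc(s):
    """Canonical composition (Normalization Form C)."""
    return unicodedata.normalize("NFC", s)


# U+0308 composes with u, so the two spellings become identical.
print(nfc(umlaut) == nfc(query))   # True

# U+0364 has no canonical composition, so it stays a distinct sequence.
print(nfc(small_e) == nfc(query))  # False
print(unicodedata.name("\u0364"))  # COMBINING LATIN SMALL LETTER E
```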

 What will actually happen is that everybody will see an empty
 square, so
 they'll think that the web designer is an idiot, apart from the
 professors at the
 Faculty of Germanic Studies, who'll think that the designer
 is an idiot
 because she doesn't know the difference between U+0308 and
 U+0364 in ancient German.

Most modern use of Fraktur seems to use diaeresis or double
acute for this. (But the web designer could use a dynamically
downloaded font fragment, if there is worry that all glyphs
might not be supported by the fonts used by the vast majority
of the target audience.)

 The real error (IMHO) is the idea that font designers should
 stick to the
 *sample* glyphs printed on the Unicode book, because this would force

Well, the diacritics are allocated/unified on glyphic grounds.
While a diaeresis may look different from font to font, it is
basically two dots (of some shape in line with the design of the
font), never an e shape.  At least not in the *default mode* of a
*Unicode font*. And overscript small e will also vary with the font,
looking like a shrunken ordinary e glyph of (ideally) the same font.
But never like two dots (in the default mode of a Unicode font).

 graphic designer to change the *encoding* of their text in
 order to get the desired result.

A graphic designer is likely to turn the whole thing into 2-d
or 3-d graphics, probably distorted, possibly animated, to get
the desired result!  At which point the original, or intermediary,
encoding of any text elements is not very relevant to the
end result.

 Another big error (IMHO, once again) is the idea that two
 different Unicode characters should look different.

I have never said that! E.g., a µ as well as an Å (both of which
are allocated twice!) should look the same (resp.) regardless of
which of their respective code points is used. There are many
more examples of characters that definitely should (e.g. capital
K and Kelvin sign, small i and small roman numeral one) or may
(capital A, capital Alpha, ...) look the same.
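
The doubly allocated characters named above can be inspected with Python's standard `unicodedata` module. This is only a sketch of how the character database relates each pair; it says nothing about what glyphs a font must use, and the helper name `folds_to` is an assumption made for illustration.

```python
import unicodedata


def folds_to(ch, form):
    """Return the result of normalizing a single character to the given form."""
    return unicodedata.normalize(form, ch)


# Canonical singletons: NFC already replaces the sign with the letter.
print(folds_to("\u212b", "NFC") == "\u00c5")  # ANGSTROM SIGN -> A with ring: True
print(folds_to("\u212a", "NFC") == "K")       # KELVIN SIGN -> capital K: True

# Compatibility equivalents: unchanged by NFC, folded only by NFKC.
print(folds_to("\u00b5", "NFC") == "\u00b5")   # MICRO SIGN stays: True
print(folds_to("\u00b5", "NFKC") == "\u03bc")  # MICRO SIGN -> Greek mu: True
print(folds_to("\u2170", "NFKC") == "i")       # SMALL ROMAN NUMERAL ONE -> i: True
```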

There are also lots of characters that mean the same, but
always (in a Unicode font in default mode) should/must
look different. Like M and Roman Numeral One Thousand C D
(just to take an example closer to Italy... ;-).

 The difference must be preserved when it
 is useful -- e.g., U+0308 should not look like U+0364 in a

should not -- must never

 font designed for
 publishing books on the history of German!

a font . -- any Unicode font in default mode

(Bad example, Marco!)


 What should really happen, IMHO, is that modern German should
 be encoded as
 modern German. A U+0308 (COMBINING DIAERESIS) should remain a U+0308,
 regardless that the corresponding glyph *looks* like U+0364
 (COMBINING LATIN
 SMALL LETTER E) in one font, and it looks 

Re: Character identities

2002-10-28 Thread David Starner
On Mon, Oct 28, 2002 at 11:21:30AM +0100, Kent Karlsson wrote:
 No, the claim was that diaeresis and overscript e are the same,
 so the reversed case Marc is talking about is not different at all.

The claim is, that for certain fonts, it is appropriate to image the
a-umlaut character as an a^e. That doesn't imply anything about the
other way around, or else t' could legally be displayed as a t with
caron above.

  A U+0308 (COMBINING DIAERESIS) should remain a U+0308,
  regardless that the corresponding glyph *looks* like U+0364
  (COMBINING LATIN
  SMALL LETTER E) in one font, and it looks like U+0304
  (COMBINING MACRON) in
  another font, and it looks like two five-pointed stars
  side-by-side in a
  third font, and it looks like Mickey Mouse's ears in Disney.ttf...
 
 These are all unacceptable variations in a *Unicode font (in
 default mode)*.  But you can have all kinds of silly variations
 in *non*-Unicode fonts applied to Unicode text, including ciphers
 or rebuses... (ok, there are degrees...)

Basically, any decorative or handwriting font can't be a Unicode font.
(The glyph for my German teacher's umlaut was definitely a macron.) Seems
pointless to tell a lot of the fontmakers out there that they shouldn't
worry about Unicode, because Unicode's only for standard book fonts, but
that's the only way I can read your last statement.

-- 
David Starner - [EMAIL PROTECTED]
Great is the battle-god, great, and his kingdom--
A field where a thousand corpses lie. 
  -- Stephen Crane, War is Kind




RE: Character identities

2002-10-28 Thread Marco Cimarosti
Kent Karlsson wrote:
   For this reason it is quite impermissible to render the
   combining letter small e as a diaeresis
 
  So far so good. There would be no reason for doing such a thing.
 ...
   or, for that matter, the diaeresis as a combining
   letter small e (however, you see the latter version
   sometimes, very infrequently, in advertisement).
 
  This is the case I thought we were discussing, and it is a
  very different case.
 
 No, the claim was that diaeresis and overscript e are the same,

The claim was that dieresis and overscript e are the same in *modern*
*standard* German. Or, better stated, that overscript e is just a glyph
variant of dieresis, in *modern* *standard* German typeset in Fraktur.

Sorry if I haven't stated this clearly enough.

 so the reversed case Marc is talking about is not different at all.

It is. In the first case, we are talking about a glyph variant in *modern*
*standard* German, in the second case, we are talking about two different
diacritics in some *other* context. (Ancient German? ancient Swedish?).

  Given Keld's opinion and Marc's wholehearted support, it
 
 Please don't confuse me with Keld!

Oooops! My apologies!

  follows that
  those infrequent advertisements should be encoded using U+0364...
 
  But U+0364 (COMBINING LATIN SMALL LETTER E) belongs to a
  small collection of
  Medieval superscript letter diacritics, which is supposed 
 to appear
  primarily in medieval Germanic manuscripts, or to reproduce
  some usage as late as the 19th century in some languages.
 
 Yes, but you should not read too much into the explanation,
 which, while correct, does not limit the existence of their
 glyphs to fonts used only by germanic professors...
 Some of them (overscript e in particular) should be(come)
 quite commonly occurring in any Fraktur Unicode font.

Commonly sounds funny near Fraktur...

  Using such a character to encode 21st century advertisements
  is doomed to cause problems:
 
  1) The glyph for U+0364 is more likely found in the font
  collection of the
  Faculty of Germanic Studies than on the PC of people wishing
  to read the
  advertisement for Ye Olde Küster Pub. So, most people will
  be unable to
  view the advertisement correctly.
 
  2) The designer of the advertisement will be unable to use
  his spell-checker and hyphenator on the advertisement's text.
 
 Advertisements should invariably be final spell-checked and
 hyphenated by humans!  Automated spell checkers and hyphenators
 for German (as well as Scandinavian languages) have (so far)
 not been good enough even for running text that you want to
 publish...

This has no connection with this discussion.

However, IMHO, the presence U+0364 (COMBINING LATIN SMALL LETTER E) in a
modern German or Swedish text is just a plain spelling error, and even the
naivest spellchecker should flag it as such.

  Users will be unable to find the Küster Pub by searching
  Küster in a
  search engine.
 
 Depends on the search engine, and if it uses a correct collation
 table (for the language) or not...

  What will actually happen is that everybody will see an empty
  square, so
  they'll think that the web designer is an idiot, apart from the
  professors at the
  Faculty of Germanic Studies, who'll think that the designer
  is an idiot
  because she doesn't know the difference between U+0308 and
  U+0364 in ancient German.
 
 Most modern use of Fraktur seems to use diaeresis or double
 acute for this. 

U+0308 (COMBINING DIAERESIS) should be the only umlaut to be found in
modern German text. What that diacritic *looks* like (two dots, an e, a
double acute, a macron, Mickey Mouse's ears), is a choice of the font
designer.

 (But the web designer could use a dynamically
 downloaded font fragment, if there is worry that all glyphs
 might not be supported by the fonts used by the vast majority
 of the target audience.)

This too has no connection with this discussion, and is OT. Unicode is
concerned with how text is *encoded*; the details of fonts and display
technology are out of scope.

What Unicode really mandates is that the encoding should not change to
obtain a certain graphic effect.

  The real error (IMHO) is the idea that font designers should
  stick to the
  *sample* glyphs printed on the Unicode book, because this 
 would force
 
 Well, the diacritics are allocated/unified on glyphic grounds.
 While a diaeresis may look different from font to font, it is
 basically two dots (of some shape in line with the design of the
 font), never an e shape.  At least not in the *default mode* of a
 *Unicode font*.

 And overscript small e will also vary with the font,
 looking like a shrunken ordinary e glyph of (ideally) the same font.
 But never like two dots (in the default mode of a Unicode font).

You haven't yet defined your meaning of Unicode font and, now, you add a
new fancy term: default mode!

What's a default mode? Unicode does not require fonts to have any kind of
modes. You seem to be 

Re: Character identities

2002-10-28 Thread Doug Ewell
Marco Cimarosti marco dot cimarosti at essetre dot it wrote:

 There are also lots of characters that mean the same, but
 always (in a Unicode font in default mode) should/must
 look different. Like M and Roman Numeral One Thousand C D
 (just to take an example closer to Italy... ;-).

 Well, the first and only time I have seen that Thousand C D was on
 the Unicode charts... However, if I were asked which glyph is more
 appropriate for that character, I would say: the same as capital M.

I would disagree with this.  It seems to me the whole reason for both
U+216F ROMAN NUMERAL ONE THOUSAND and U+2180 ROMAN NUMERAL ONE THOUSAND
C D to exist is that they should have different glyphs.  This is not
necessarily in keeping with the purest spirit of Unicode (which might
regard these as two glyphs of a single character), but in reality they
are encoded as two characters.

Note, however, that there is nothing wrong with using the same glyph for
U+004D and U+216F, although in many fonts they are different for no
obvious reason.
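
As a hedged aside, the character database draws the same line between these code points. The sketch below uses Python's standard `unicodedata` module; it shows only the decomposition data, which is related to, but not the same thing as, a rule about glyphs.

```python
import unicodedata

ONE_THOUSAND = "\u216f"      # ROMAN NUMERAL ONE THOUSAND
ONE_THOUSAND_CD = "\u2180"   # ROMAN NUMERAL ONE THOUSAND C D

for ch in (ONE_THOUSAND, ONE_THOUSAND_CD):
    print(unicodedata.name(ch), repr(unicodedata.decomposition(ch)))

# U+216F is a compatibility equivalent of capital M (U+004D) ...
print(unicodedata.normalize("NFKC", ONE_THOUSAND) == "M")                  # True

# ... while U+2180 has no decomposition and survives normalization as itself.
print(unicodedata.normalize("NFKC", ONE_THOUSAND_CD) == ONE_THOUSAND_CD)   # True
```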

-Doug Ewell
 Fullerton, California





Re: Character identities

2002-10-28 Thread Anto'nio Martins-Tuva'lkin
On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote:

 Basically, any decorative or handwriting font can't be a Unicode font.
...
 Seems pointless to tell a lot of the fontmakers out there that they
 shouldn't worry about Unicode, because Unicode's only for standard
 book fonts

Hm, what if I want to make, say, snow capped Devanagari glyphs for my
hiking company in Nepal? Shouldn't I assign them to Unicode code points?

--   .
António MARTINS-Tuválkin|  ()|
[EMAIL PROTECTED]   ||
R. Laureano de Oliveira, 64 r/c esq. |
PT-1885-050 MOSCAVIDE (LRS)  Não me invejo de quem tem   |
+351 917 511 549 carros, parelhas e montes   |
http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe   |
http://pagina.de/bandeiras/  a água em todas as fontes   |





Re: Character identities

2002-10-28 Thread John Hudson


On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote:

 Basically, any decorative or handwriting font can't be a Unicode font.
...
 Seems pointless to tell a lot of the fontmakers out there that they
 shouldn't worry about Unicode, because Unicode's only for standard
 book fonts


Hello? Who says decorative or handwriting fonts can't be Unicode fonts? 
I've got dozens of fonts on my system that prove this wrong. Zapfino, which 
ships with OS X and which I had the privilege to work on, is about as 
decorative a handwriting font as you could wish for, and of course it has a 
Unicode cmap.

Or are you working with some definition of 'Unicode font' other than 'font 
with a Unicode cmap'?

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]

It is necessary that by all means and cunning,
the cursed owners of books should be persuaded
to make them available to us, either by argument
or by force.  - Michael Apostolis, 1467




Re: Character identities

2002-10-28 Thread Michael Everson
At 20:59 + 2002-10-28, Anto'nio Martins-Tuva'lkin wrote:

On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote:


 Basically, any decorative or handwriting font can't be a Unicode font.

...

 Seems pointless to tell a lot of the fontmakers out there that they
 shouldn't worry about Unicode, because Unicode's only for standard
 book fonts


Hm, what if I want to make, say, snow capped Devanagari glyphs for my
hiking company in Nepal? Shouldn't I assign them to Unicode code points?


That's what Private Use code positions are for.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com




RE: Character identities

2002-10-28 Thread Figge, Donald

At 20:59 + 2002-10-28, Anto'nio Martins-Tuva'lkin wrote:
On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote:

  Basically, any decorative or handwriting font can't be a Unicode font.
...
  Seems pointless to tell a lot of the fontmakers out there that they
  shouldn't worry about Unicode, because Unicode's only for standard
  book fonts

Hm, what if I want to make, say, snow capped Devanagari glyphs for my
hiking company in Nepal? Shouldn't I assign them to Unicode code points?

That's what Private Use code positions are for.
-- 
Michael Everson * * Everson Typography *  * http://www.evertype.com
--
I don't think so. He seems to be talking about a specific typographic style.
Code points don't care about style, whether it's Franklin Gothic or
Snowcapped Helvetica.

Don




Re: Character identities

2002-10-28 Thread Michael Everson
At 13:36 -0700 2002-10-28, John Hudson wrote:


Or are you working with some definition of 'Unicode font' other than 
'font with a Unicode cmap'?

It seemed to me that he was talking about fonts that had characters 
that weren't in Unicode at all. I don't mean precomposed vowels, but, 
say, fonts with moon phases in them.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Character identities

2002-10-28 Thread Kenneth Whistler

 Hm, what if I want to make, say, snow capped Devanagari glyphs for my
 hiking company in Nepal? Shouldn't I assign them to Unicode code points?
 
 That's what Private Use code positions are for.
 -- 
 Michael Everson * * Everson Typography *  * http://www.evertype.com

Um, Michael, I think Anto'nio was talking about glyphs in a
decorative font, which should -- clearly -- just be mapped to
ordinary Unicode characters, via an ordinary Unicode cmap.

Or do you think that the yellow, cursive, shadow-dropped, 3-D
letters Getaway! at:

http://www.trekking-in-nepal.com/

should also be represented by Private Use code positions? ;-)

--Ken





Re: Character identities

2002-10-28 Thread David Starner
On Mon, Oct 28, 2002 at 09:36:34PM +, Michael Everson wrote:
 At 20:59 + 2002-10-28, Anto'nio Martins-Tuva'lkin wrote:
 On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote:
 
  Basically, any decorative or handwriting font can't be a Unicode font.
 ...
  Seems pointless to tell a lot of the fontmakers out there that they
  shouldn't worry about Unicode, because Unicode's only for standard
  book fonts
 
 Hm, what if I want to make, say, snow capped Devanagari glyphs for my
 hiking company in Nepal? Shouldn't I assign them to Unicode code points?
 
 That's what Private Use code positions are for.

But think of the utility if Unicode added a COMBINING SNOWCAP and
COMBINING FIRECAP! But should we combine the SNOWCAP with the ICECAP?

(-:

-- 
David Starner - [EMAIL PROTECTED]
Great is the battle-god, great, and his kingdom--
A field where a thousand corpses lie. 
  -- Stephen Crane, War is Kind




Re: Character identities

2002-10-28 Thread David Starner
On Mon, Oct 28, 2002 at 01:36:08PM -0700, John Hudson wrote:
 
 On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote:
 
  Basically, any decorative or handwriting font can't be a Unicode font.
 ...
  Seems pointless to tell a lot of the fontmakers out there that they
  shouldn't worry about Unicode, because Unicode's only for standard
  book fonts
 
 Hello? Who says decorative or handwriting fonts can't be Unicode fonts? 
[...]
 Or are you working with some definition of 'Unicode font' other than 'font 
 with a Unicode cmap'?

Right above where it was cut it said:

Marco:
  A U+0308 (COMBINING DIAERESIS) should remain a U+0308,
  regardless that the corresponding glyph *looks* like U+0364
  (COMBINING LATIN
  SMALL LETTER E) in one font, and it looks like U+0304
  (COMBINING MACRON) in
  another font, and it looks like two five-pointed stars
  side-by-side in a
  third font, and it looks like Mickey Mouse's ears in Disney.ttf...
 
Kent:
  These are all unacceptable variations in a *Unicode font (in
  default mode)*.

Earlier:

Marco:
  there are fonts which don't have dots over i and j;

Kent:
  You have a slight point there, but those are not intended for
  running text.  And I'm hesitant to label them Unicode fonts.

Given that definition of Unicode fonts, a number of decorative or
handwriting fonts (though fewer than I expected) are arbitrarily
excluded from being Unicode fonts.

-- 
David Starner - [EMAIL PROTECTED]
Great is the battle-god, great, and his kingdom--
A field where a thousand corpses lie. 
  -- Stephen Crane, War is Kind




Re: Character identities

2002-10-28 Thread Michael Everson
At 14:30 -0800 2002-10-28, Kenneth Whistler wrote:

  Hm, what if I want to make, say, snow capped Devanagari glyphs for my

 hiking company in Nepal? Shouldn't I assign them to Unicode code points?

 That's what Private Use code positions are for.
 --
 Michael Everson * * Everson Typography *  * http://www.evertype.com


Um, Michael, I think Anto'nio was talking about glyphs in a
decorative font, which should -- clearly -- just be mapped to
ordinary Unicode characters, via an ordinary Unicode cmap.


If they correspond to Unicode characters, yes, certainly.


Or do you think that the yellow, cursive, shadow-dropped, 3-D
letters Getaway! at:

http://www.trekking-in-nepal.com/

should also be represented by Private Use code positions? ;-)


Not at all. Fonts with images of igloos and yurts would use it, 
though, I would think.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Character identities

2002-10-28 Thread Michael Everson
At 14:31 -0800 2002-10-28, Figge, Donald wrote:

At 20:59 + 2002-10-28, Anto'nio Martins-Tuva'lkin wrote:

On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote:


  Basically, any decorative or handwriting font can't be a Unicode font.

...

  Seems pointless to tell a lot of the fontmakers out there that they
  shouldn't worry about Unicode, because Unicode's only for standard
  book fonts


Hm, what if I want to make, say, snow capped Devanagari glyphs for my
hiking company in Nepal? Shouldn't I assign them to Unicode code points?


That's what Private Use code positions are for.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com
--
I don't think so. He seems to be talking about a specific typographic style.
Code points don't care about style, whether it's Franklin Gothic or
Snowcapped Helvetica.


I must have misunderstood. I think I only saw the snow-capped and 
not the Devanagari. Sorry.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Character identities

2002-10-28 Thread Doug Ewell
My USD 0.02, as someone who is neither a professional typographer nor a
font designer (more than one, but not quite two, different things)...

Discussions about the character-glyph model often mention the essential
characteristics of a given character.  For example, a Latin capital A
can be bold, italic, script, sans-serif, etc., but it must always have
that essential A-ness such that readers of (e.g.) English can identify
it as an A instead of, say, an O or a 4 or a picture of a duck.  (Mark
Davis has a chart showing dozens of different A's in his Unicode Myths
presentation.)

Somewhere in between the obvious relationships (A = A, B ≠ A), we have
the case pair A and a.  They are not identical, but they are certainly
more similar to each other than are A and B.

It seems to me, as a non-font guy, that calling a font a Unicode font
implies two things:

1.  It must be based on Unicode code points.  For True- and OpenType
fonts, this implies a Unicode cmap; for other font technologies it
implies some more-or-less equivalent mechanism.  The point is that
glyphs must be associated with Unicode code points (not necessarily
1-to-1, of course), not merely with an internal 8-bit table that can be
mapped to Unicode only through some other piece of software.

2.  The glyphs must reflect the essential characteristics of the
Unicode character to which they are mapped.  That means a capital A can
be bold, italic, script, sans-serif, etc.  A small a can also be
small-caps (or even full-size caps), but I think this is the only
controversial point.

In a Unicode font, U+0041 cannot be mapped to a capital A with macron,
as it is in Bookshelf Symbol 1; nor to a six-pointed star, as in
Monotype Sorts; nor to a hand holding up two fingers, as in Wingdings.
(But it can be mapped to a notdef glyph, if the font makes no claim to
supporting U+0041.)

U+0915 absolutely can have snow on it, or be bold or italic or whatever
(or all of these), as long as a Devanagari reader would recognize its
essential ka-ness.  It cannot look like a Latin A, nor for that matter
can U+0041 look like a Devanagari ka.
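
The two conditions above can be pictured with a toy character map. This is only a sketch: the dictionary and the glyph names in it are hypothetical stand-ins for a real font's cmap table, which is a binary structure.

```python
# Toy model of a font's character map (cmap): code point -> glyph name.
# The glyph names are hypothetical; a real font stores this in a binary table.
TOY_CMAP = {
    0x0041: "A",        # LATIN CAPITAL LETTER A, in whatever style the font has
    0x0915: "ka-deva",  # DEVANAGARI LETTER KA, snow-capped or not
}


def glyph_for(code_point, cmap=TOY_CMAP):
    """Return the glyph name for a code point, or '.notdef' if unsupported."""
    return cmap.get(code_point, ".notdef")


def render(text, cmap=TOY_CMAP):
    """Map each character of text to a glyph name via the cmap."""
    return [glyph_for(ord(ch), cmap) for ch in text]


print(render("A\u0915B"))
```

The style of the glyph stored under a key is the font's business; which character the key stands for is not.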

Font guys, do you agree with this?

Of course, the term Unicode font is also often used to mean a font
that covers all, or nearly all, of Unicode.  Font technologies
generally don't even allow this, of course, and even by the standards of
nearly we are still limiting ourselves to things like Bitstream
Cyberbit, Arial Unicode MS, Code2000, Cardo, etc.  Right or wrong, this
is a commonly accepted meaning for Unicode font.

-Doug Ewell
 Fullerton, California





Re: Character identities

2002-10-28 Thread Mark Davis
I'm pretty much in agreement with what you say, except the following:

 Of course, the term Unicode font is also often used to mean a font
 that covers all, or nearly all, of Unicode.

I would consider a Unicode font to be one that met your other conditions,
aside from the repertoire. If I had a font that covered Latin, Greek and
Cyrillic and worked with Unicode strings, for example, I would still
consider that a Unicode font. I just wouldn't consider it a (pick your
adjective) full / complete Unicode font.

Mark
__
http://www.macchiato.com
►  “Eppur si muove” ◄









Re: Character identities

2002-10-28 Thread Michael \(michka\) Kaplan
All this talk about the letter A reminded me of something from Hofstadter:

The problem of intelligence, as I see it is to understand the fluid nature
of mental categories, to understand the invariant cores of percepts such as
your mother’s face, to understand the strangely flexible yet strong
boundaries of concepts such as “chair” or the letter “a“ … The central
problem of (artificial intelligence) is the question: What is the letter ‘a’
and ‘i’? ...By making these claims, I am suggesting that, for any program to
handle letterforms with the flexibility that human beings do, it would have
to possess full-scale general intelligence.

-- Douglas R. Hofstadter, from one of his Metamagical Themas articles

The notion that we could ever capture the essence of A-ness has already
been discussed at length and dismissed as impossible without an AI
breakthrough. :-)

MichKa





Re: Character identities

2002-10-28 Thread John Cowan
Doug Ewell scripsit:

 1.  It must be based on Unicode code points.  For True- and OpenType
 fonts, this implies a Unicode cmap; for other font technologies it
 implies some more-or-less equivalent mechanism.  The point is that
 glyphs must be associated with Unicode code points (not necessarily
 1-to-1, of course), not merely with an internal 8-bit table that can be
 mapped to Unicode only through some other piece of software.

If it's a FIGlet font, of course, it's automatically Unicode, since FIGlet's
table is 32 bits wide.

 In a Unicode font, U+0041 cannot be mapped to a capital A with macron,
 as it is in Bookshelf Symbol 1; nor to a six-pointed star, as in
 Monotype Sorts; nor to a hand holding up two fingers, as in Wingdings.
 (But it can be mapped to a notdef glyph, if the font makes no claim to
 supporting U+0041.)

In fact, these fonts map these glyphs to U+F041.  Only when seen as 8-bit
fonts do they map to 0x41.
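
A small sketch of the mapping described here, assuming the usual convention that symbol-encoded Windows fonts place their glyphs at 0xF000 plus the 8-bit code. The function name is made up for illustration.

```python
import unicodedata

# Assumed convention: 8-bit symbol codes are exposed at U+F000 + code.
SYMBOL_PUA_BASE = 0xF000


def symbol_code_point(byte_value):
    """Map an 8-bit symbol-font code to its Private Use Area code point."""
    if not 0x20 <= byte_value <= 0xFF:
        raise ValueError("expected a value in 0x20..0xFF")
    return SYMBOL_PUA_BASE + byte_value


cp = symbol_code_point(0x41)
# General category "Co" means "private use".
print(f"U+{cp:04X}", unicodedata.category(chr(cp)))
```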

-- 
With techies, I've generally found  John Cowan
If your arguments lose the first round  http://www.reutershealth.com
Make it rhyme, make it scan http://www.ccil.org/~cowan
Then you generally can  [EMAIL PROTECTED]
Make the same stupid point seem profound!   --Jonathan Robie




Re: Character identities

2002-10-28 Thread John Hudson
At 18:37 10/28/2002, Doug Ewell wrote:


It seems to me, as a non-font guy, that calling a font a Unicode font
implies two things:

1.  It must be based on Unicode code points.  For True- and OpenType
fonts, this implies a Unicode cmap; for other font technologies it
implies some more-or-less equivalent mechanism.  The point is that
glyphs must be associated with Unicode code points (not necessarily
1-to-1, of course), not merely with an internal 8-bit table that can be
mapped to Unicode only through some other piece of software.


My only amendment to that would be:

'The point is that those glyphs that are intended to represent the default 
form of the characters supported by that font must be associated with 
Unicode codepoints, whether directly or indirectly, not merely...'

Not every glyph in a font needs to be encoded, and in general glyph 
variants and things like ligatures should not be, unless standard Unicode 
codepoints happen to be available for them (even then, it would be 
legitimate to leave them unencoded and access them only via glyph 
processing features).

2.  The glyphs must reflect the essential characteristics of the
Unicode character to which they are mapped.  That means a capital A can
be bold, italic, script, sans-serif, etc.  A small a can also be
small-caps (or even full-size caps), but I think this is the only
controversial point.


Yes, I would agree with that, with the caveat that the A-ness of an A isn't 
necessarily something that can be defined: it can only be recognised.

Of course, the term Unicode font is also often used to mean a font
that covers all, or nearly all, of Unicode.  Font technologies
generally don't even allow this, of course, and even by the standards of
nearly we are still limiting ourselves to things like Bitstream
Cyberbit, Arial Unicode MS, Code2000, Cardo, etc.  Right or wrong, this
is a commonly accepted meaning for Unicode font.


I really think we should all do what we can to bury this use of the term. 
It is singularly unhelpful, and the idea in the minds of some customers 
that they *need* a font that covers all of Unicode has not done anyone any 
good. Sure some font developers made some money making these ridiculously 
huge grab-bag fonts, but their time could have been much better spent.

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]

It is necessary that by all means and cunning,
the cursed owners of books should be persuaded
to make them available to us, either by argument
or by force.  - Michael Apostolis, 1467




Re: Character identities

2002-10-28 Thread William Overington
John Hudson commented.

At 02:46 10/26/2002, William Overington wrote:

I don't know whether you might be interested in the use of a small letter a
with an e as an accent codified within the Private Use Area, but in case you
might be interested, the web page is as follows.

http://www.users.globalnet.co.uk/~ngo/ligatur5.htm

I have encoded the a with an e as an accent as U+E7B4 so that both variants
may coexist in a document encoded in a plain text format and displayed with
an ordinary TrueType font.

If anyone were interested, he could do this himself and use any codepoint
in the Private Use Area.

The meaning which I intended to convey was as follows.

I don't know whether you might be interested in having a look at a
particular example of the use of a small letter a with an e as an accent
codified within the Private Use Area by an individual with an interest in
applying Unicode, but in case you might be interested in having a look at
that particular example, the web page is as follows.

If, following from your response to the way that you read my sentence,
someone were interested in defining a codepoint in the Private Use Area then
certainly he or she could do that himself or herself and use any codepoint
in the Private Use Area.

However, exercising that freedom is something which could benefit from some
thought.

If someone wishes to encode an a with an e as an accent in the Private Use
Area, he or she may wish to be able to apply that code point allocation in a
document.  If he or she looks at which Private Use Area codepoints are
already in use within some existing fonts, then selecting a code point which
is at present unused in those fonts might give a greater chance of his or
her new character assignment being implemented than choosing a code point
for which those fonts already have a glyph in use.
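
The search described in this paragraph could be sketched as follows. The sets of "used" code points are invented for the example; in practice they would have to be read out of the fonts in question.

```python
from itertools import islice

# Private Use Area in the Basic Multilingual Plane: U+E000..U+F8FF.
BMP_PUA = range(0xE000, 0xF900)


def free_pua_code_points(used_by_fonts, limit=5):
    """Return the first `limit` BMP private-use code points no font uses."""
    used = set().union(*used_by_fonts)
    free = (cp for cp in BMP_PUA if cp not in used)
    return list(islice(free, limit))


# Hypothetical data: private-use code points already taken in two fonts.
font_a = {0xE000, 0xE001}
font_b = {0xE002, 0xE7B4}
print([f"U+{cp:04X}" for cp in free_pua_code_points([font_a, font_b], limit=2)])
```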

Searching through such fonts takes time and requires some skill.

If someone does wish to use a Private Use Area code point for an a with an e
accent, then using U+E7B4 does give a possible slight advantage in that
the code point is already part of a published set of code points available
on the web, for, even though that set of code points is not a standard, it
is a consistent set and other people might well use those codepoints as
well.  However, anyone may produce and publish such a set of code point
allocations of his or her own if he or she so wishes, or indeed keep them to
himself or herself.

Yet I was not seeking to make any such point in my posting.  I simply added
to a thread on a specialised topic what I thought might be a short
interesting note with a link to a web page at which some readers might like
to look.  The web page indeed provides two external links to interesting
documents on the web.

Maybe it is time to include a note in the Unicode
Standard to suggest that 'Private' Use Area means that one should keep it
to oneself 

Well, at the moment the Unicode Standard does include the word publish in
the text about the Private Use Area.

I have published details of various uses of the Private Use Area on the web
yet not mentioned them in this forum.  For example, readers might perhaps
like to have a look at the following.

http://www.users.globalnet.co.uk/~ngo/ast07101.htm

Anyone who chooses to do so might like to have a look at the following file
as well, which introduces the application area.

http://www.users.globalnet.co.uk/~ngo/ast02100.htm

This is an application of the Unicode Private Use Area so as to produce a
set of soft buttons for a Java calculator so that the twenty hard button
minimum configuration of a hand held infra-red control device for a DVB-MHP
(Digital Video Broadcasting - Multimedia Home Platform) television can be
used in a consistent manner to signal information from the end user to the
computer in the television set.  I am very pleased with the result.  The
encoding achieves a useful effect while being consistent for information
handling purposes with the Unicode specification, so that an input stream of
characters may be processed by a Java program without any ambiguity over
whether a particular code point is a printing character or a calculator
button (or indeed mouse event or simulated mouse event as mouse events are
also encoded using the Private Use Area in my research).
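
As a rough illustration of the "without any ambiguity" point (in Python rather than Java, and with a made-up button code point, since the actual allocations are on the pages linked above), a program can separate private-use code points from ordinary characters by general category alone:

```python
import unicodedata


def split_stream(chars):
    """Separate private-use code points from ordinary characters."""
    text, private = [], []
    for ch in chars:
        # General category "Co" marks a private-use code point.
        (private if unicodedata.category(ch) == "Co" else text).append(ch)
    return "".join(text), private


# U+E000 stands in for a hypothetical "soft button" code point.
print(split_stream("12\ue000+"))
```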

William Overington

29 October 2002













Re: Character identities

2002-10-28 Thread Barry Caplan
At 04:39 PM 10/28/2002 -0600, David Starner wrote:


But think of the utility if Unicode added a COMBINING SNOWCAP and
COMBINING FIRECAP! But should we combine the SNOWCAP with the ICECAP?

(-:

Unicode captures the ice-age during the global warming era!

Do we have codepoints for images found on the walls of caves?

:)

Barry
www.i18n.com





Re: Character identities

2002-10-26 Thread William Overington
I don't know whether you might be interested in the use of a small letter a
with an e as an accent codified within the Private Use Area, but in case you
might be interested, the web page is as follows.

http://www.users.globalnet.co.uk/~ngo/ligatur5.htm

I have encoded the a with an e as an accent as U+E7B4 so that both variants
may coexist in a document encoded in a plain text format and displayed with
an ordinary TrueType font.

http://www.users.globalnet.co.uk/~ngo

William Overington

25 October 2002







Re: Character identities

2002-10-26 Thread John Hudson
At 02:46 10/26/2002, William Overington wrote:


I don't know whether you might be interested in the use of a small letter a
with an e as an accent codified within the Private Use Area, but in case you
might be interested, the web page is as follows.

http://www.users.globalnet.co.uk/~ngo/ligatur5.htm

I have encoded the a with an e as an accent as U+E7B4 so that both variants
may coexist in a document encoded in a plain text format and displayed with
an ordinary TrueType font.


If anyone were interested, he could do this himself and use any codepoint 
in the Private Use Area. Maybe it is time to include a note in the Unicode 
Standard to suggest that 'Private' Use Area means that one should keep it 
to oneself and not keep pestering other people about one's private use of it.

John Hudson


Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]





RE: Character identities

2002-10-25 Thread Marco Cimarosti
Peter Constable wrote:
  then *any* font having a unicode cmap is a Unicode font.
 
 No, not if the glyphs (for the supported characters) are
 inappropriate for the characters given.
 
 Kent is quite right here. There are a *lot* of fonts out there with
 Unicode cmaps that do not at all conform to the Unicode standard ---
 custom-encoded (some call them hacked) fonts, usually abusing the
 characters that make up Windows cp1252.

IMHO, you are confusing two very different things here:

1) Assigning arbitrary glyphs to some Unicode characters. E.g., assigning
the $ character to long S; the ASCII letters to Greek letters; the whole
Latin-1 range to Devanagari characters, etc.

2) Choosing strange or unorthodox glyph variants for some Unicode
characters.

The hacked fonts you mention are case (1); what is being discussed in this
thread is case (2). Like it or not, superscript e *is* the same diacritic
that later became ¨, so there is absolutely no violation of the Unicode
standard. Of course, this only applies to German.

The fact that umlaut and dieresis have been unified in Unicode, makes such a
variant glyph only applicable to a font targeted to German. You could not
use that font to, e.g., typeset English or French, because the ¨ in
coöperation or naïve is a dieresis, not an umlaut sign.
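
The unification can be seen directly in the canonical decompositions. A short sketch using Python's standard unicodedata module:

```python
import unicodedata


def marks(ch):
    """Return the combining marks in the canonical decomposition of ch."""
    return [
        f"U+{ord(c):04X} {unicodedata.name(c)}"
        for c in unicodedata.normalize("NFD", ch)
        if unicodedata.combining(c)
    ]


# German umlaut (as in Küster) and French/English diaeresis (as in naïve)
# decompose to the same combining character.
print(marks("\u00fc"))  # ü
print(marks("\u00ef"))  # ï
```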

There are other cases out there of Unicode fonts suitable for Chinese but
not Japanese, Italian but not Polish,  Arabic but not Urdu, etc. Why should
a Unicode font suitable for German but not for English be any worse?

_ Marco




Re: Character identities

2002-10-25 Thread Stefan Persson
- Original Message - 
From: Marco Cimarosti [EMAIL PROTECTED]
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Friday, October 25, 2002 10:42 AM
Subject: RE: Character identities

 Of course, this only applies to German.

And Swedish.

Stefan






RE: Character identities

2002-10-25 Thread Kent Karlsson

 ... Like it or not, superscript e *is* the same diacritic that later
 became ¨, so there is absolutely no violation of the Unicode
 standard. Of course, this only applies to German.

Font makers, please do not meddle with the author's intent
(as reflected in the text of the document!).  Just as it
is inappropriate for font makers to use an ø glyph for ö
(they are the same, just slightly different derivations
from o^e), it is just as inappropriate for font makers to
use an o^e glyph for ö (by default in a Unicode font). Though
in some sense the same they are still different enough for
authors to care, and it is up to the document author/editor
to decide, not the font maker.

 From:  [EMAIL PROTECTED]

... We've implemented this  successfully in
 OpenType fonts using the Historical Forms hist feature.

If the umlaut to overscript e transformation is put under
this feature for some fonts, I see no major reason to complain...
(As others have noted, it does not really work for the long s,
unless the language is labelled 'en'...)

/Kent K





Re: Character identities

2002-10-25 Thread Otto Stolz
To all contributors to this thread:
Please cease cc-ing [EMAIL PROTECTED]! The CC was meant for
my remark on fuzzy search wrt. long-s and round-s. Google are
certainly not interested in any and all other turns this thread
has taken, or may take later.



David J. Perry had written:

An OpenType font that is smart enough to substitute a long s glyph at 
the right spots is the much superior long-term solution.

To which I had replied:

This will not work, cf. infra.


John Hudson wrote:

To be accurate, it works for display of English but not for German.



David's remark was about German Fraktur orthography. My quote was too
short, so this detail was lost. I apologize for any misunderstandings
possibly caused by my omission.

Best wishes,
  Otto Stolz





Superscript e (was: Character identities)

2002-10-25 Thread Otto Stolz
Marco Cimarosti (amongst others, using the same term) wrote:


superscript e *is* the same diacritic that later became ¨



The term superscript e does not aptly describe the situation.
Rather, the German a-Umlaut is derived from U+0061 U+0364
(LATIN SMALL LETTER A + COMBINING LATIN SMALL LETTER E),
cf. http://www.unicode.org/charts/PDF/U0300.pdf.
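
A quick check of that sequence with Python's standard unicodedata module. The variable names are only for illustration.

```python
import unicodedata

a_with_e = "a\u0364"     # a + COMBINING LATIN SMALL LETTER E
a_with_dots = "a\u0308"  # a + COMBINING DIAERESIS

print(unicodedata.name("\u0364"))

# NFC composes a + U+0308 into the single character U+00E4, but leaves
# a + U+0364 as two code points: there is no precomposed form for it.
composed_e = unicodedata.normalize("NFC", a_with_e)
composed_dots = unicodedata.normalize("NFC", a_with_dots)
print(len(composed_e), len(composed_dots))
```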

Best wishes,
   Otto Stolz









RE: Character identities

2002-10-25 Thread Marc Wilhelm Küster
At 14:04 25.10.2002 +0200, Kent Karlsson wrote:

Font makers, please do not meddle with the author's intent
(as reflected in the text of the document!).  Just as it
is inappropriate for font makers to use an ø glyph for ö
(they are the same, just slightly different derivations
from o^e), it is just as inappropriate for font makers to
use an o^e glyph for ö (by default in a Unicode font). Though
in some sense the same they are still different enough for
authors to care, and it is up to the document author/editor
to decide, not the font maker.


My wholehearted support!

DIN asked for the combining letter small e as well as the other combining 
small letters specifically to cater for the requirements of scholars in a 
number of countries, notably Germany. In a large number of editions and 
scholarly dictionaries, both diacritics, the combining diaeresis and the 
combining letter e, are used on the very same page, even directly next to 
each other. The former is used for modern German words, the latter for 
medieval German words.

The combining letter small e does not even necessarily stand for what today 
is the umlaut, it may have a number of different interpretations.

For modern and medieval German words, the base font is in these cases the 
same -- editions are not normally printed in some sort of pseudo-archaic 
font.

For this reason it is quite impermissible to render the combining letter 
small e as a diaeresis or, for that matter, the diaeresis as a combining 
letter small e (however, you see the latter version sometimes, very 
infrequently, in advertisement).

As to the long s, it is not used for writing present-day German except in 
rare cases, notably in some scholarly editions and in the Fraktur script. 
Very few texts beyond the names of newspapers are nowadays produced in 
Fraktur. To put the long s on the German keyboard would be quite contrary 
to user requirements -- and if a requirement existed, it would be DIN's job 
to amend DIN 2137-2 and the upcoming DIN 2137-12 to cater for it.

Best regards,

Marc


*
Marc Wilhelm Küster
Saphor GmbH

Fronländer 22
D-72072 Tübingen

Tel.: (+49) / (0)7472 / 949 100
Fax: (+49) / (0)7472 / 949 114




RE: Character identities

2002-10-25 Thread Marco Cimarosti
Marc Wilhelm Küster wrote:
 At 14:04 25.10.2002 +0200, Kent Karlsson wrote:
 Font makers, please do not meddle with the author's intent
 (as reflected in the text of the document!).  Just as it
 is inappropriate for font makers to use an ø glyph for ö
 (they are the same, just slightly different derivations
 from o^e), it is just as inappropriate for font makers to
 use an o^e glyph for ö (by default in a Unicode font). Though
 in some sense the same they are still different enough for
 authors to care, and it is up to the document author/editor
 to decide, not the font maker.
 
 My wholehearted support!
 
 [...]
 
 For this reason it is quite impermissible to render the 
 combining letter small e as a diaeresis

So far so good. There would be no reason for doing such a thing.

If the author of a scholarly work used U+0364 (COMBINING LATIN SMALL LETTER
E), this character should be displayed as either a letter e superscript to
the base letter, or as an empty square (for fonts not caring about that
character).

 or, for that matter, the diaeresis as a combining 
 letter small e (however, you see the latter version
 sometimes, very infrequently, in advertisement).

This is the case I thought we were discussing, and it is a very different
case.

Given Kent's opinion and Marc's wholehearted support, it follows that
those infrequent advertisements should be encoded using U+0364...

But U+0364 (COMBINING LATIN SMALL LETTER E) belongs to a small collection of
Medieval superscript letter diacritics, which is supposed to appear
primarily in medieval Germanic manuscripts, or to reproduce some usage as
late as the 19th century in some languages.

Using such a character to encode 21st century advertisements is doomed to
cause problems:

1) The glyph for U+0364 is more likely found in the font collection of the
Faculty of Germanic Studies than on the PC of people wishing to read the
advertisement for Ye Olde Küster Pub. So, most people will be unable to
view the advertisement correctly.

2) The designer of the advertisement will be unable to use his spell-checker
and hyphenator on the advertisement's text.

3) Users will be unable to find the Küster Pub by searching for Küster in a
search engine.

What will actually happen is that everybody will see an empty square, so
they'll think that the web designer is an idiot, apart from the professors at the
Faculty of Germanic Studies, who'll think that the designer is an idiot
because she doesn't know the difference between U+0308 and U+0364 in ancient
German.
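
Problem (3) is easy to reproduce. The sketch below assumes a search that compares NFC-normalized strings, which is a simplification of what a real search engine does; the function names are invented for the example.

```python
import unicodedata


def nfc(s):
    return unicodedata.normalize("NFC", s)


def naive_find(needle, haystack):
    """Substring search after NFC normalization (a simplifying assumption)."""
    return nfc(needle) in nfc(haystack)


def strip_marks(s):
    """Lossy fallback: drop every combining mark after decomposition."""
    return "".join(
        c for c in unicodedata.normalize("NFD", s) if not unicodedata.combining(c)
    )


modern = "Ye Olde K\u00fcster Pub"     # ü as U+00FC
medieval = "Ye Olde Ku\u0364ster Pub"  # u + COMBINING LATIN SMALL LETTER E

print(naive_find("K\u00fcster", modern))    # found
print(naive_find("K\u00fcster", medieval))  # not found
print(strip_marks(modern) == strip_marks(medieval))
```

Stripping all marks makes the two spellings match again, but only by also throwing away the distinction between u and ü.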

The real error (IMHO) is the idea that font designers should stick to the
*sample* glyphs printed in the Unicode book, because this would force
graphic designers to change the *encoding* of their text in order to get the
desired result.

Another big error (IMHO, once again) is the idea that two different Unicode
characters should look different. The difference must be preserved when it
is useful -- e.g., U+0308 should not look like U+0364 in a font designed for
publishing books on the history of German!

What should really happen, IMHO, is that modern German should be encoded as
modern German. A U+0308 (COMBINING DIAERESIS) should remain a U+0308,
regardless that the corresponding glyph *looks* like U+0364 (COMBINING LATIN
SMALL LETTER E) in one font, and it looks like U+0304 (COMBINING MACRON) in
another font, and it looks like two five-pointed stars side-by-side in a
third font, and it looks like Mickey Mouse's ears in Disney.ttf...

_ Marco




RE: Character identities

2002-10-25 Thread Marco Cimarosti
Kent Karlsson wrote:
  ... Like it or not, superscript e *is* the same diacritic that later
  became ¨, so there is absolutely no violation of the Unicode
  standard. Of course, this only applies to German.

 Font makers, please do not meddle with the author's intent
 (as reflected in the text of the document!).  Just as it
 is inappropriate for font makers to use an ø glyph for ö
 (they are the same, just slightly different derivations
 from o^e), it is just as inappropriate for font makers to
 use an o^e glyph for ö (by default in a Unicode font). Though
 in some sense the same they are still different enough for
 authors to care, and it is up to the document author/editor
 to decide, not the font maker.

It is certainly up to the author of the document to decide.

But, as I explained more at length in my reply to Marc, there are two
different approaches for deciding this:

1. When this decision is a matter of *content* (as may be the case when
writing about linguistics, to differentiate spellings with o^e from
spellings with ö), it is more appropriate to make the difference at the
*encoding* level, by using the appropriate code point.

2. When this decision is only a matter of *presentation*, it is more
appropriate to make the difference by using a font which uses the desired
glyph for the normal ¨.

 If the umlaut to overscript e transformation is put under
 this feature for some fonts, I see no major reason to complain...
 (As others have noted, it does not really work for the long s,
 unless the language is labelled 'en'...)

And, of course, in an ideal world option 2 will be done by switching a font
feature, rather than switching to an ad-hoc font. This makes it possible for
font designers to provide a single font which covers both needs. But this is
just optimization, not compliance!

_ Marco




Re: hacked fonts in MS-Windows: rev. solidus vs Yen/Won(was..RE: Character identities)

2002-10-25 Thread Doug Ewell
Jungshik Shin jshin at mailaps dot org wrote:

 ...
 MS-Windows has to provide distinct ways to enter 'reverse solidus' and
 'Yen/Won' sign (both full-width and half-width) in Japanese and Korean
 IMEs.
 ...

Good points, well stated.  To make matters worse, the keyboard
references at Microsoft's Global Development subsite [1] show:

1.  for Korean, a won sign and the legend U+005C Reverse Solidus\nWon
Sign
2.  for Japanese, a yen sign and the legend U+005C Reverse Solidus\nYen
Sign

This helps perpetuate the idea that U+005C could be either a reverse
solidus, a won sign, or a yen sign, depending on the font.  This is
exactly what Unicode is *not* about.  Microsoft usually understands
this.
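
For reference, the six characters involved each have their own code point and name, whatever a particular font or keyboard legend shows. A short listing using Python's standard unicodedata module:

```python
import unicodedata

# Half-width and full-width backslash, yen and won.
CODE_POINTS = [0x005C, 0x00A5, 0x20A9, 0xFF3C, 0xFFE5, 0xFFE6]
NAMES = {cp: unicodedata.name(chr(cp)) for cp in CODE_POINTS}

for cp, name in NAMES.items():
    print(f"U+{cp:04X}  {name}")
```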

-Doug Ewell
 Fullerton, California

[1] http://www.microsoft.com/globaldev/keyboards/keyboards.asp





Re: Character identities

2002-10-24 Thread Doug Ewell
David Starner starner at okstate dot edu wrote:

 Likewise, ä is printed as a with e above in old texts.* Would it be
 acceptable to make a font with a a^e glyph for ä? It's not even
 changing the meaning of the character in any way.

Indeed, that is exactly what Sütterlin fonts do.  (Then again, Sütterlin
fonts assign the long-s glyph to U+0073 and make you type $ to get a
round s, so they may not be the best example.)

Stefan Persson alsjebegrijptwatikbedoel at yahoo dot se replied:

 Unicode defines a^e as U+0061 U+0364 (though it's exactly the same
 character as ä). Why?

They're not exactly the same, except in this particular German example.

Combining superscript e was encoded along with combining superscript a,
i, o, u, c, d, h, m, r, t, v, and x, none of which evolved into a real
diacritical mark the way e did.  Combining e had non-German uses as
well, as in early modern English Yͤ (which did not become Ÿ).

As for the diaeresis, its use in French, English (coöperate), and
other languages often has no relationship to the letter e.  Indeed, in
the sequence güe in Spanish, the diaeresis serves as a sort of anti-e,
ensuring the separate pronunciation of the u when the e would otherwise
prevent it!

Historically speaking, I and J were once equivalent, and U and V were
once equivalent, but they are all encoded today.

-Doug Ewell
 Fullerton, California





RE: Character identities

2002-10-24 Thread Kent Karlsson

 First, is it compliant with Unicode for an Antiqua font to use an s
 glyph for ſ (U+017F)? It makes switching between Antiqua and Fraktur
 fonts possible, and it is arguably the glyph given to the middle s in
 modern Antiqua fonts. 
 
 Likewise, ä is printed as a with e above in old texts.* Would it be
 acceptable to make a font with a a^e glyph for ä?

Please don't.  a^e is U+0061, U+0364.

 It's not even changing the meaning of the character in any way.

And ä and æ are the same, likewise are ö, œ, and ø the same
(in some sense, but not in general).  Some (in Denmark and Norway,
nowhere else) even consider aa and å (and a, small o above) to
be the same (but not quite, especially when spelling names...).

Still they are definitely different enough to be considered
orthographic differences, not font differences.  Likewise for
your examples. As for collation, and searches that are advanced
enough to make use of collation keys, the collation tables
*can* be tailored so that these variants, within each equivalence
(in some sense) group, have the same level 1 weights (which is
appropriate for Scandinavian and German uses), but different
level 2 weights (as is appropriate, since this difference is
(usually) more significant than case distinctions).
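
The multi-level idea can be sketched in a few lines of Python. This is a toy model only: the GROUP table and sort_key function are hypothetical, the weights are not taken from any real collation table, and the position of each group relative to other letters (which is language-specific) is not modelled at all.

```python
# Toy tailoring: each variant maps to (group representative, level-2 weight).
# These values are illustrative assumptions, not real collation weights.
GROUP = {
    "æ": ("ä", 1),
    "œ": ("ö", 1),
    "ø": ("ö", 2),
}


def sort_key(word: str):
    """Build a three-level key: letter group, variant, then case."""
    primary, secondary, tertiary = [], [], []
    for ch in word:
        lower = ch.lower()
        rep, variant = GROUP.get(lower, (lower, 0))
        primary.append(rep)        # level 1: same for all variants in a group
        secondary.append(variant)  # level 2: distinguishes ö from ø, etc.
        tertiary.append(0 if ch == lower else 1)  # level 3: case
    return (primary, secondary, tertiary)


if __name__ == "__main__":
    print(sorted(["søt", "Söt", "söt"], key=sort_key))
```

Because the variant weight sits at level 2 and case at level 3, a search that compares only level 1 treats ö and ø alike, while a full sort still orders the variant difference ahead of the case difference.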

/Kent Karlsson





Re: Character identities

2002-10-24 Thread David Starner
On Thu, Oct 24, 2002 at 11:46:04AM +0200, Kent Karlsson wrote:
 Please don't.  a^e is U+0061, U+0364.

Which is great, if you're a scholar trying to accurately reproduce an
old text; if you're Joe User, trying to print a document in an Olde
German font, it's far more inconvenient than helpful.

 Still they are definitely different enough to be considered
 orthographic differences, not font differences.

Changing a^e to ä is all that would need to be done to make the books
that use a^e look like those of the same timeframe that use ä. I'm not
sure where you draw the line between font and orthographic differences,
but this does not require dictionary lookup, and for my purposes is
most easily done by a font change.

-- 
David Starner - [EMAIL PROTECTED]
Great is the battle-god, great, and his kingdom--
A field where a thousand corpses lie. 
  -- Stephen Crane, War is Kind




Re: Character identities

2002-10-24 Thread Otto Stolz
David J. Perry had written:

 An OpenType font that is smart enough to substitute a long s glyph at the
 right spots is the much superior long-term solution.

This will not work, cf. infra.

David Starner wrote:

 no matter what the convention, it requires a dictionary lookup for
 various case;

A dictionary lookup will not suffice, as there are pairs of words
differing only in an ſ vs. s (long vs. round s), e. g.
· Wachſtube ['vaxʃtu:bə] = guard room
· Wachstube ['vakstu:bə] = wax tube
[Pronunciation in brackets]

To substitute a long s glyph at the right spots you must fully
analyse the sentence -- grammatically, and in cases as in the
previous example, even semantically -- to find the correct
spelling. Hence, it is much easier to type the ſ, and s,
characters in their proper places, and then replace ſ with
s, if so desired.

Fuzzy searches should equate ſ with s. Apparently,
Google.de doesn't do it right: a search for Kinderſtube
yields no hits, while a search for Kinderstube yields
about 10700.
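
A search that equates the two letters can be sketched with Python's built-in case folding, which maps U+017F to s. The function names fold_for_search and fuzzy_contains are made up for this illustration; this is not a description of how any search engine works.

```python
def fold_for_search(text: str) -> str:
    """Fold text for loose matching; str.casefold() maps ſ (U+017F) to s."""
    return text.casefold()


def fuzzy_contains(haystack: str, needle: str) -> bool:
    """True if needle occurs in haystack once both are folded."""
    return fold_for_search(needle) in fold_for_search(haystack)


if __name__ == "__main__":
    print(fuzzy_contains("Die Kinderstube war leer.", "Kinderſtube"))
```

Folding deliberately conflates Wachſtube and Wachstube as well, which is what a fuzzy search wants; the stored text keeps the distinction.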

Best wishes,
  Otto Stolz





Re: Character identities

2002-10-24 Thread Stefan Persson
- Original Message -
From: [EMAIL PROTECTED]
To: John Hudson [EMAIL PROTECTED]
Cc: Otto Stolz [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Thursday, October 24, 2002 8:44 PM
Subject: Re: Character identities

 Looking at a Fraktur book published in 1917, which is neither English
 nor German, use of the long s appears almost whimsical.  Words like
 historie and utgivelse use the long s, while words like
 oplysninger and ensformig use the final s medially.  (The title
 of the book is En norsk bygds historie.)

S is the firſt letter of a ſyllable in words ſuch as hiſtorie and
utgivelſe, but the laſt in words ſuch as oplysninger and ensformig.

Stefan






Re: Character identities

2002-10-24 Thread jameskass

John Hudson wrote,

 At 06:47 AM 24-10-02, Otto Stolz wrote:
 
 David J. Perry had written:
 An OpenType font that is smart enough to substitute a long s glyph at the
 right spots is the much superior long-term solution.
 
 This will not work, cf. infra.
 
 To be accurate, it works for display of English but not for German. The 
 British convention for using the long-s can be handled contextually, 
 because it does not need to consider whether the letter is occurring at the
 beginning or end of a syllable. We've implemented this successfully in
 OpenType fonts using the Historical Forms (hist) feature. German presents a
 much more difficult problem.
 

Looking at a copy of Of the Law-Terms:  A Discourse Written by The
Learned Antiquary. Sir Henry Spelman, Kt. (1684 edition) here.  Use
of initial/medial s versus final s is straightforward except in
cases like Malmesbury and Sarisburiam, in which the final s
is used medially.

Looking at a Fraktur book published in 1917, which is neither English
nor German, use of the long s appears almost whimsical.  Words like
historie and utgivelse use the long s, while words like 
oplysninger and ensformig use the final s medially.  (The title 
of the book is En norsk bygds historie.)

Best regards,

James Kass.




RE: Character identities

2002-10-24 Thread Marco Cimarosti
Kent Karlsson wrote:
 And it is easy for Joe User to make a simple (visual...)
 substitution cipher by just switching to a font with the
 glyphs for letters (etc.) permuted.  Sure!  I think it
 would be a bad idea to call it a Unicode font though...
 (That it technically may have a unicode cmap is beside
 my point.)

The only meaning that I can attach to the expression Unicode font is a
pan-Unicode font: a font which covers all the scripts in Unicode.

If this is what you mean, then displaying ä as an a^e is clearly not a
good idea. But neither would choosing Fraktur glyphs be a good idea! How can
you have Fraktur IPA!? Fraktur Pinyin!? Fraktur Devanagari!? Fraktur
Arabic!? In general, a pan-Unicode font should show no noticeable difference
from the glyphs used in the Unicode book.

But if by Unicode font you just mean a font which is compliant with the
Unicode standard, but only supports one or more of the scripts, then *any*
font having a unicode cmap is a Unicode font. And also many fonts *not*
having a Unicode cmap are, provided that something inside or outside the
font knows how to pick up the right glyphs.

In this sense, what is or is not appropriate depends on the font's style and
targeted usages and languages: there are fonts which don't have dots over
i and j; fonts where U+0059 and U+03A5 look different; fonts where
U+0061, U+0251, U+03B1 and U+FF41 look identical; fonts where capital and
small letters look identical...

Why can't there be a Fraktur font where ä and a^e look identical, if
this is appropriate for that typographical style and for the usages and
languages intended for the font?

Ciao.
Marco




Re: Character identities

2002-10-24 Thread Michael Everson
At 09:46 -0700 2002-10-24, John Hudson wrote:

At 06:47 AM 24-10-02, Otto Stolz wrote:


David J. Perry had written:

An OpenType font that is smart enough to substitute a long s glyph at the
right spots is the much superior long-term solution.


This will not work, cf. infra.


To be accurate, it works for display of English but not for German. 
The British convention for using the long-s can be handled 
contextually, because it does not need to consider whether the 
letter is occurring at the beginning or end of a syllable.


Not even for compounds?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com




Re: Character identities

2002-10-24 Thread John Hudson
At 06:47 AM 24-10-02, Otto Stolz wrote:


David J. Perry had written:

An OpenType font that is smart enough to substitute a long s glyph at the
right spots is the much superior long-term solution.


This will not work, cf. infra.


To be accurate, it works for display of English but not for German. The 
British convention for using the long-s can be handled contextually, 
because it does not need to consider whether the letter is occurring at the
beginning or end of a syllable. We've implemented this successfully in
OpenType fonts using the Historical Forms (hist) feature. German presents a
much more difficult problem.
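
To illustrate why a purely contextual rule can work, here is a deliberately simplified Python sketch. It assumes a single rule (every lowercase s that is followed by another letter becomes ſ), which is a rough stand-in rather than the actual British convention or the logic of any real font; apply_long_s is a hypothetical name.

```python
import re

# A lowercase s followed by a letter, i.e. not at the end of a word.
# [^\W\d_] matches any Unicode letter.
_NON_FINAL_S = re.compile(r"s(?=[^\W\d_])")


def apply_long_s(text: str) -> str:
    """Replace every non-final lowercase s with long s (U+017F)."""
    return _NON_FINAL_S.sub("ſ", text)


if __name__ == "__main__":
    print(apply_long_s("sinfulness"))
    print(apply_long_s("Wachstube"))
```

The rule looks only at neighbouring letters, so it always produces Wachſtube and can never produce Wachstube; choosing between the two needs knowledge of the word, which is the German problem described in this thread.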

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]

It is necessary that by all means and cunning,
the cursed owners of books should be persuaded
to make them available to us, either by argument
or by force.  - Michael Apostolis, 1467




Re: Long S on keyboard (was: Character identities)

2002-10-24 Thread Patrick Andries

- Original Message -
From: Otto Stolz [EMAIL PROTECTED]
To: Doug Ewell [EMAIL PROTECTED]
Cc: Unicode Mailing List [EMAIL PROTECTED]; Torsten Mohrin
[EMAIL PROTECTED]
Sent: October 24, 2002 12:06
Subject: Long S on keyboard (was: Character identities)


 Doug Ewell wrote:

  I'm not aware of any keyboard layout, German or otherwise, that contains
  U+017F.  Would it be reasonable to suggest that it be added to the
  standard German layout?  AltGr+s seems to be available.

To whom would you suggest such an addition? DIN? Are its standards as
loosely followed as the Canadian standards as far as PC keyboards are
concerned? It is nearly impossible to find the CSA/ACNOR keyboard (CAN/CSA
Z243.200-92) in the main office and electronic equipment stores in
Canada...

 P. A.
- o - O - o -
Unicode et ISO10646
Nouveaux articles
http://hapax.iquebec.com









RE: Character identities

2002-10-24 Thread Kent Karlsson

And it is easy for Joe User to make a simple (visual...)
substitution cipher by just switching to a font with the
glyphs for letters (etc.) permuted.  Sure!  I think it
would be a bad idea to call it a Unicode font though...
(That it technically may have a unicode cmap is beside
my point.)

Likewise for your (less extreme) suggestions. They
are very close to suggesting making a Swedish text use
Danish writing style for åäö (aa or å, æ, ø) by just a
font change.  Which you easily could do by a special font.
Would that font be a Unicode font?  I think all of these
changes would be unexpected (for the Latin script) from
a mere font change (between Unicode fonts).

If someone really wants such substitutions, it is easy
enough to produce using character string substitution.
No fonts like you suggest are needed.  But you would need
(a) font(s) that display U+0061, U+0364 and similar cases
properly.  The latter would be very welcome!
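
Such a character string substitution might look like the following Python sketch. The table and the function name umlauts_to_e_above are assumptions made for illustration, and the choice of which letters to cover (ä, ö, ü and their capitals) is likewise an assumption.

```python
import unicodedata

E_ABOVE = "\u0364"  # COMBINING LATIN SMALL LETTER E

# Map each precomposed umlaut letter to base letter + combining e.
TO_E_ABOVE = str.maketrans({
    "ä": "a" + E_ABOVE, "ö": "o" + E_ABOVE, "ü": "u" + E_ABOVE,
    "Ä": "A" + E_ABOVE, "Ö": "O" + E_ABOVE, "Ü": "U" + E_ABOVE,
})


def umlauts_to_e_above(text: str) -> str:
    """Rewrite umlaut letters as base letter + U+0364."""
    # Compose first so that decomposed input (a + U+0308) is caught too.
    return unicodedata.normalize("NFC", text).translate(TO_E_ABOVE)


if __name__ == "__main__":
    print(code := [f"U+{ord(c):04X}" for c in umlauts_to_e_above("Mädchen")])
```

The result still needs a font that positions U+0364 correctly over the base letter.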

/Kent K


 On Thu, Oct 24, 2002 at 11:46:04AM +0200, Kent Karlsson wrote:
  Please don't.  a^e is U+0061, U+0364.

 Which is great, if you're a scholar trying to accurately reproduce an
 old text; if you're Joe User, trying to print a document in an Olde
 German font, it's far more inconvenient than helpful.

  Still they are definitely different enough to be considered
  orthographic differences, not font differences.

 Changing a^e to ä is all that would need to be done to make the books
 that use a^e look like those of the same timeframe that use ä. I'm not
 sure where you draw the line between font and orthographic
 differences,
 but this does not require dictionary lookup, and for my purposes is
 most easily done by a font change.






Long S on keyboard (was: Character identities)

2002-10-24 Thread Otto Stolz
Doug Ewell wrote:


I'm not aware of any keyboard layout, German or otherwise, that contains
U+017F.  Would it be reasonable to suggest that it be added to the
standard German layout?  AltGr+s seems to be available.


It would certainly not hurt to have it there.

Fraktur, and Long-s, are not much used, these days. So, there will be
not much demand for a long-s key -- though it would come in handy for some
kinds of usage, e. g. modern advertising, or reproducing texts from
before ~1950 (cf. http://www.gutenberg2000.de/).

Most German Fraktur fonts currently available seem to have particular,
proprietary encodings. A standardized Long-s key would certainly help
to promote Unicode amongst Fraktur font designers.

Best wishes,
  Otto Stolz





RE: Character identities

2002-10-24 Thread Kent Karlsson

 Kent Karlsson wrote:
  And it is easy for Joe User to make a simple (visual...)
  substitution cipher by just switching to a font with the
  glyphs for letters (etc.) permuted.  Sure!  I think it
  would be a bad idea to call it a Unicode font though...
  (That it technically may have a unicode cmap is beside
  my point.)

 The only meaning that I can attach to the expression Unicode
 font is a
 pan-Unicode font: a font which covers all the scripts in Unicode.

 If this is what you mean,

No. (No current font technology can handle that b.t.w., them
having a limit of 64 Ki glyphs...; you'd need to one way or
another coalesce several fonts.  Or do something very neat
for CJK...)

 But if by Unicode font you just mean a font which is
 compliant with the
 Unicode standard, but only supports one or more of the
 scripts,

Yes, including that the glyphs are recognisably correct
for the given characters.

 then *any* font having a unicode cmap is a Unicode font.

No, not if the glyphs (for the supported characters) are
inappropriate for the characters given.

 In this sense, what is or is not appropriate depends on the
 font's style and
 targeted usages and languages: there are fonts which don't
 have dots over
 i and j;

You have a slight point there, but those are not intended for
running text.  And I'm hesitant to label them Unicode fonts.

 fonts where U+0059 and U+03A5 look different;

Of course, those aren't even in the same script (though they are
similar-looking).

 fonts where
 U+0061, U+0251, U+03B1 and U+FF41 look identical;

So?

 fonts where capital and small letters look identical...

If you want small caps, or capitals, via the font, yes.
(But that should not be the default 'mode', should it?)

 Why can't there be a Fraktur font where ä and a^e look
 identical, if

ä and a^e look different even in Fraktur... Maybe the use
of ä in Fraktur is a beast, but that is beside my point.

 this is appropriate for that typographical style and for the
 usages and languages intended for the font?

Of course you can have such a font.  You can have any font
you like.  But I would not label it a Unicode font (regardless
if there is a Unicode cmap, in a particular subset of font
technologies, or not; bugs notwithstanding).  Talking about
this particular subset of font technologies, maybe interested
parties (not me) should lobby for a new font feature for this.
But do you really want a font feature for this? Is it worth
the cost? (I'd just do some global substitutions; or put that
in a little special-purpose utility somewhere.)

/Kent K

 Ciao.
 Marco





Re: Long S on keyboard (was: Character identities)

2002-10-24 Thread Michael Everson
At 12:47 -0400 2002-10-24, Patrick Andries wrote:

- Original Message -
From: Otto Stolz [EMAIL PROTECTED]
To: Doug Ewell [EMAIL PROTECTED]
Cc: Unicode Mailing List [EMAIL PROTECTED]; Torsten Mohrin
[EMAIL PROTECTED]
Sent: October 24, 2002 12:06
Subject: Long S on keyboard (was: Character identities)



 Doug Ewell wrote:

  I'm not aware of any keyboard layout, German or otherwise, that contains
  U+017F.  Would it be reasonable to suggest that it be added to the
  standard German layout?  AltGr+s seems to be available.


I'm developing some drivers which take it into account. Of course GHA
and WYNN and HWAIR are of greater concern to me.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Character identities

2002-10-24 Thread Peter_Constable

On 10/24/2002 01:02:39 PM Kent Karlsson wrote:

 then *any* font having a unicode cmap is a Unicode font.

No, not if the glyphs (for the supported characters) are
inappropriate for the characters given.

Kent is quite right here. There are a *lot* of fonts out there with Unicode
cmaps that do not at all conform to the Unicode standard  ---
custom-encoded (some call them hacked) fonts, usually abusing the
characters that make up Windows cp1252.




- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]







Character identities

2002-10-23 Thread David Starner
I have several questions about character identities.

First, is it compliant with Unicode for an Antiqua font to use an s
glyph for ſ (U+017F)? It makes switching between Antiqua and Fraktur
fonts possible, and it is arguably the glyph given to the middle s in
modern Antiqua fonts. 

Likewise, ä is printed as a with e above in old texts.* Would it be
acceptable to make a font with a a^e glyph for ä? It's not even changing
the meaning of the character in any way.

(I suspect the answer is it's not technically compliant, but nobody
cares.)

(To my surprise, I came across a text from 1920 that used the e-above
instead of a diaeresis. The only other texts I've seen with this date
from before 1810. It was Islands Kultur zur Wikingerzeit by Felix Niedner,
in the series (?) Thule: Altnordische Dichtung und Prosa, which leads
me to believe, based off my limited German, that it's a deliberate
anachronism. Right?)

As a third case, I looked briefly at information and advocacy of the
duodecimal system. Chi and epsilon have been used as glyphs for 10 and
11, as well as an upside-down 2 and 3, a chi and reversed pound symbol
(? I'd need to look at that one again . . .) and * and #. Unified, they might
merit a proposal here, if someone still cares enough to make it. Would it be
unreasonable to unify them? There's quite a disparity in glyphs, but not
much argument against them all being the same character, and I don't
think there's anyone wanting to make the distinction.
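
The glyph disparity is easy to see in code: the arithmetic is the same whichever symbols stand for ten and eleven. Below is a small Python sketch in which to_duodecimal is a hypothetical helper and the default digits (chi and epsilon) are simply one of the conventions mentioned above, chosen arbitrarily.

```python
def to_duodecimal(n: int, ten: str = "χ", eleven: str = "ε") -> str:
    """Render a non-negative integer in base 12 with caller-chosen digits."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = "0123456789" + ten + eleven
    if n == 0:
        return "0"
    out = []
    while n:
        n, remainder = divmod(n, 12)
        out.append(digits[remainder])
    return "".join(reversed(out))


if __name__ == "__main__":
    print(to_duodecimal(22))            # chi/epsilon convention
    print(to_duodecimal(22, "*", "#"))  # star/hash convention
```

Only the two digit arguments change between conventions, which is the sense in which the variants could be seen as different glyphs for the same two characters.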

-- 
David Starner - [EMAIL PROTECTED]
Great is the battle-god, great, and his kingdom--
A field where a thousand corpses lie. 
  -- Stephen Crane, War is Kind




Re: Character identities

2002-10-23 Thread Stefan Persson
- Original Message -
From: David Starner [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, October 23, 2002 7:00 PM
Subject: Character identities

 Likewise, ä is printed as a with e above in old texts.* Would it be
 acceptable to make a font with a a^e glyph for ä? It's not even changing
 the meaning of the character in any way.

Unicode defines a^e as U+0061 U+0364 (though it's exactly the same
character as ä). Why?

Stefan






Re: Character identities

2002-10-23 Thread Markus Scherer
David Starner wrote:

First, is it compliant with Unicode for an Antiqua font to use an s
glyph for ſ (U+017F)? It makes switching between Antiqua and Fraktur
fonts possible, and it is arguably the glyph given to the middle s in
modern Antiqua fonts. 

Likewise, ä is printed as a with e above in old texts.* Would it be
acceptable to make a font with a a^e glyph for ä? It's not even changing
the meaning of the character in any way.

In my opinion, this is all reasonable and should be allowed.
Viel Erfolg!


As a third case, I looked briefly at information and advocacy of the
duodecimal system. Chi and epsilon have been used as glyphs for 10 and  ...


I assume that the answer will be that these things are just alternate uses of existing characters.

markus

--
Opinions expressed here may not reflect my company's positions unless otherwise noted.