date:20021028

Re: Character identities

2002-10-28 Thread Barry Caplan

At 04:39 PM 10/28/2002 -0600, David Starner wrote:

>But think of the utility if Unicode added a COMBINING SNOWCAP and
>COMBINING FIRECAP! But should we combine the SNOWCAP with the ICECAP?
>
>(-:

Unicode captures the ice-age during the global warming era!

Do we have codepoints for images found on the walls of caves?

:)

Barry
www.i18n.com

Re: Character identities

2002-10-28 Thread William Overington

John Hudson commented.

>At 02:46 10/26/2002, William Overington wrote:
>
>>I don't know whether you might be interested in the use of a small letter a
>>with an e as an accent codified within the Private Use Area, but in case you
>>might be interested, the web page is as follows.
>>
>>http://www.users.globalnet.co.uk/~ngo/ligatur5.htm
>>
>>I have encoded the a with an e as an accent as U+E7B4 so that both variants
>>may coexist in a document encoded in a plain text format and displayed with
>>an ordinary TrueType font.
>
>If anyone were interested, he could do this himself and use any codepoint
>in the Private Use Area.

The meaning which I intended to convey was as follows.

I don't know whether you might be interested in having a look at a
particular example of the use of a small letter a with an e as an accent
codified within the Private Use Area by an individual with an interest in
applying Unicode, but in case you might be interested in having a look at
that particular example, the web page is as follows.

If, following from your response to the way that you read my sentence,
someone were interested in defining a codepoint in the Private Use Area then
certainly he or she could do that himself or herself and use any codepoint
in the Private Use Area.

However, exercising that freedom is something which could benefit from some
thought.

If someone wishes to encode an a with an e as an accent in the Private Use
Area, he or she may wish to be able to apply that code point allocation in a
document.  If he or she looks at which Private Use Area codepoints are
already in use within some existing fonts, then selecting a code point which
is at present unused in those fonts might give a greater chance of his or
her new character assignment being implemented than choosing a code point
for which those fonts already have a glyph in use.

Searching through such fonts takes time and requires some skill.
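
As a rough illustration of the search described above, the following Python sketch lists the Private Use Area code points that none of a group of fonts already maps. The coverage sets are invented sample data and the helper name `unused_pua_code_points` is hypothetical; in practice the sets would have to be read from each font's character map by some other tool.

```python
# Basic Multilingual Plane Private Use Area: U+E000..U+F8FF.
PUA_START, PUA_END = 0xE000, 0xF8FF

def unused_pua_code_points(font_coverages):
    """Return the sorted PUA code points that none of the given fonts maps.

    font_coverages is an iterable of sets of integer code points, one set
    per font (hypothetical input; not read from real font files here).
    """
    used = set()
    for coverage in font_coverages:
        used.update(cp for cp in coverage if PUA_START <= cp <= PUA_END)
    return [cp for cp in range(PUA_START, PUA_END + 1) if cp not in used]

# Made-up coverage data for two imaginary fonts.
font_a = {0x0041, 0xE000, 0xE001, 0xF041}
font_b = {0x0061, 0xE001, 0xE7B4}

free = unused_pua_code_points([font_a, font_b])
print(f"U+{free[0]:04X}")  # lowest PUA code point unused by both sample fonts
print(0xE7B4 in free)      # False here, because imaginary font_b maps U+E7B4
```

With real fonts the interesting part is gathering the coverage sets; the comparison itself is only a set difference.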

If someone does wish to use a Private Use Area code point for an a with an e
accent, then using U+E7B4 does give a possible slight advantage in that
the code point is already part of a published set of code points available
on the web, for, even though that set of code points is not a standard, it
is a consistent set and other people might well use those codepoints as
well.  However, anyone may produce and publish such a set of code point
allocations of his or her own if he or she so wishes, or indeed keep them to
himself or herself.

Yet I was not seeking to make any such point in my posting.  I simply added
to a thread on a specialised topic what I thought might be a short
interesting note with a link to a web page at which some readers might like
to look.  The web page indeed provides two external links to interesting
documents on the web.

>Maybe it is time to include a note in the Unicode
>Standard to suggest that 'Private' Use Area means that one should keep it
>to oneself 

Well, at the moment the Unicode Standard does include the word publish in
the text about the Private Use Area.

I have published details of various uses of the Private Use Area on the web
yet not mentioned them in this forum.  For example, readers might perhaps
like to have a look at the following.

http://www.users.globalnet.co.uk/~ngo/ast07101.htm

Anyone who chooses to do so might like to have a look at the following file
as well, which introduces the application area.

http://www.users.globalnet.co.uk/~ngo/ast02100.htm

This is an application of the Unicode Private Use Area so as to produce a
set of soft buttons for a Java calculator so that the twenty hard button
minimum configuration of a hand held infra-red control device for a DVB-MHP
(Digital Video Broadcasting - Multimedia Home Platform) television can be
used in a consistent manner to signal information from the end user to the
computer in the television set.  I am very pleased with the result.  The
encoding achieves a useful effect while being consistent for information
handling purposes with the Unicode specification, so that an input stream of
characters may be processed by a Java program without any ambiguity over
whether a particular code point is a printing character or a calculator
button (or indeed mouse event or simulated mouse event as mouse events are
also encoded using the Private Use Area in my research).
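
The unambiguous separation described above can be sketched in a few lines. The message mentions Java; Python is used here only for brevity. The sketch relies on the fact that Private Use code points have the general category "Co"; the helper name `split_stream` and the value U+E900 are assumptions made for this example and are not taken from the allocations on the web pages cited.

```python
import unicodedata

def split_stream(text):
    """Separate ordinary characters from Private Use code points."""
    ordinary, private = [], []
    for ch in text:
        # General category "Co" marks a Private Use code point.
        if unicodedata.category(ch) == "Co":
            private.append(f"U+{ord(ch):04X}")
        else:
            ordinary.append(ch)
    return "".join(ordinary), private

# U+E900 is an arbitrary PUA value standing in for a "button" code.
text, controls = split_stream("12\uE900+3")
print(text)      # 12+3
print(controls)  # ['U+E900']
```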

William Overington

29 October 2002

The comet circumflex system.

2002-10-28 Thread William Overington

Readers interested in internationalization using Unicode might like to know
that I have recently added some documents about the comet circumflex system
to the web.

The introduction and index page are as follows.

http://www.users.globalnet.co.uk/~ngo/c_c0.htm

The main index page of the webspace is as follows.

http://www.users.globalnet.co.uk/~ngo

William Overington

29 October 2002

Re: Character identities

2002-10-28 Thread John Hudson

At 18:37 10/28/2002, Doug Ewell wrote:


> It seems to me, as a non-font guy, that calling a font a "Unicode font"
> implies two things:
>
> 1.  It must be based on Unicode code points.  For True- and OpenType
> fonts, this implies a Unicode cmap; for other font technologies it
> implies some more-or-less equivalent mechanism.  The point is that
> glyphs must be associated with Unicode code points (not necessarily
> 1-to-1, of course), not merely with an internal 8-bit table that can be
> mapped to Unicode only through some other piece of software.


My only amendment to that would be:

'The point is that those glyphs that are intended to represent the default 
form of the characters supported by that font must be associated with 
Unicode codepoints, whether directly or indirectly, not merely...'

Not every glyph in a font needs to be encoded, and in general glyph 
variants and things like ligatures should not be, unless standard Unicode 
codepoints happen to be available for them (even then, it would be 
legitimate to leave them unencoded and access them only via glyph 
processing features).
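
A toy model may make the distinction between encoded and unencoded glyphs concrete. Everything in the sketch below is invented for illustration: the glyph names, the dictionary standing in for a cmap, and the single ligature rule standing in for a glyph processing feature. It does not describe any real font format or API.

```python
# Hypothetical toy font: six glyphs, only four of which are encoded.
glyphs = ["A", "a", "f", "i", "f_i", "a.swash"]
cmap = {0x0041: "A", 0x0061: "a", 0x0066: "f", 0x0069: "i"}
ligatures = {("f", "i"): "f_i"}  # stand-in for a glyph substitution feature

def shape(text):
    """Map characters to default glyphs, then apply the ligature rule."""
    names = [cmap.get(ord(ch), ".notdef") for ch in text]
    out, i = [], 0
    while i < len(names):
        pair = tuple(names[i:i + 2])
        if pair in ligatures:
            out.append(ligatures[pair])
            i += 2
        else:
            out.append(names[i])
            i += 1
    return out

print(shape("fia"))  # ['f_i', 'a']
unencoded = sorted(set(glyphs) - set(cmap.values()))
print(unencoded)     # ['a.swash', 'f_i']
```

The ligature is reached from the encoded text "fi" without ever having a code point of its own.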

> 2.  The glyphs must reflect the "essential characteristics" of the
> Unicode character to which they are mapped.  That means a capital A can
> be bold, italic, script, sans-serif, etc.  A small a can also be
> small-caps (or even full-size caps), but I think this is the only
> controversial point.


Yes, I would agree with that, with the caveat that the A-ness of an A isn't 
necessarily something that can be defined: it can only be recognised.

> Of course, the term "Unicode font" is also often used to mean "a font
> that covers all, or nearly all, of Unicode."  Font technologies
> generally don't even allow this, of course, and even by the standards of
> "nearly" we are still limiting ourselves to things like Bitstream
> Cyberbit, Arial Unicode MS, Code2000, Cardo, etc.  Right or wrong, this
> is a commonly accepted meaning for "Unicode font."


I really think we should all do what we can to bury this use of the term. 
It is singularly unhelpful, and the idea in the minds of some customers 
that they *need* a font that covers all of Unicode has not done anyone any 
good. Sure some font developers made some money making these ridiculously 
huge grab-bag fonts, but their time could have been much better spent.

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]

It is necessary that by all means and cunning,
the cursed owners of books should be persuaded
to make them available to us, either by argument
or by force.  - Michael Apostolis, 1467

Re: Character identities

2002-10-28 Thread John Cowan

Doug Ewell scripsit:

> 1.  It must be based on Unicode code points.  For True- and OpenType
> fonts, this implies a Unicode cmap; for other font technologies it
> implies some more-or-less equivalent mechanism.  The point is that
> glyphs must be associated with Unicode code points (not necessarily
> 1-to-1, of course), not merely with an internal 8-bit table that can be
> mapped to Unicode only through some other piece of software.

If it's a FIGlet font, of course, it's automatically Unicode, since FIGlet's
table is 32 bits wide.

> In a Unicode font, U+0041 cannot be mapped to a capital A with macron,
> as it is in Bookshelf Symbol 1; nor to a six-pointed star, as in
> Monotype Sorts; nor to a hand holding up two fingers, as in Wingdings.
> (But it can be mapped to a "notdef" glyph, if the font makes no claim to
> supporting U+0041.)

In fact, these fonts map these glyphs to U+F041.  Only when seen as 8-bit
fonts do they map to 0x41.
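
The relationship between the two views can be written out as a small sketch. The function name `symbol_code_point` is hypothetical, and the fixed offset of 0xF000 is inferred from the U+F041 / 0x41 pair mentioned above rather than from any font file.

```python
import unicodedata

SYMBOL_OFFSET = 0xF000  # assumed offset, inferred from U+F041 versus 0x41

def symbol_code_point(byte_value):
    """Map an 8-bit symbol-font code to the corresponding Private Use code point."""
    if not 0x00 <= byte_value <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return SYMBOL_OFFSET + byte_value

cp = symbol_code_point(0x41)
print(f"U+{cp:04X}")                  # U+F041
print(unicodedata.category(chr(cp)))  # Co (Private Use)
print(unicodedata.category("A"))      # Lu (U+0041 itself remains a letter)
```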

-- 
With techies, I've generally found              John Cowan
If your arguments lose the first round          http://www.reutershealth.com
     Make it rhyme, make it scan                http://www.ccil.org/~cowan
     Then you generally can                     [EMAIL PROTECTED]
Make the same stupid point seem profound!           --Jonathan Robie

Re: Character identities

2002-10-28 Thread Michael (michka) Kaplan

All this talk about the letter "A" reminded me of something from Hofstadter:

"The problem of intelligence, as I see it is to understand the fluid nature
of mental categories, to understand the invariant cores of percepts such as
your mother’s face, to understand the strangely flexible yet strong
boundaries of concepts such as “chair” or the letter “a” … The central
problem of (artificial intelligence) is the question: What is the letter ‘a’
and ‘i’? ...By making these claims, I am suggesting that, for any program to
handle letterforms with the flexibility that human beings do, it would have
to possess full-scale general intelligence."

-- Douglas R. Hofstadter, from one of his Metamagical Themas articles

The notion that we could ever capture the essence of "A-ness" has already
been discussed at length and dismissed as impossible without an AI
breakthrough. :-)

MichKa

Re: Character identities

2002-10-28 Thread Mark Davis

I'm pretty much in agreement with what you say, except the following:

> Of course, the term "Unicode font" is also often used to mean "a font
> that covers all, or nearly all, of Unicode."

I would consider a Unicode font to be one that met your other conditions,
aside from the repertoire. If I had a font that covered Latin, Greek and
Cyrillic and worked with Unicode strings, for example, I would still
consider that a Unicode font. I just wouldn't consider it a (pick your
adjective) full / complete Unicode font.

Mark
__
http://www.macchiato.com
►  “Eppur si muove” ◄


Re: Character identities

2002-10-28 Thread Doug Ewell

My USD 0.02, as someone who is neither a professional typographer nor a
font designer (more than one, but not quite two, different things)...

Discussions about the character-glyph model often mention the "essential
characteristics" of a given character.  For example, a Latin capital A
can be bold, italic, script, sans-serif, etc., but it must always have
that essential "A-ness" such that readers of (e.g.) English can identify
it as an A instead of, say, an O or a 4 or a picture of a duck.  (Mark
Davis has a chart showing dozens of different A's in his "Unicode Myths"
presentation.)

Somewhere in between the obvious relationships (A = A, B ≠ A), we have
the case pair A and a.  They are not identical, but they are certainly
more similar to each other than are A and B.

It seems to me, as a non-font guy, that calling a font a "Unicode font"
implies two things:

1.  It must be based on Unicode code points.  For True- and OpenType
fonts, this implies a Unicode cmap; for other font technologies it
implies some more-or-less equivalent mechanism.  The point is that
glyphs must be associated with Unicode code points (not necessarily
1-to-1, of course), not merely with an internal 8-bit table that can be
mapped to Unicode only through some other piece of software.

2.  The glyphs must reflect the "essential characteristics" of the
Unicode character to which they are mapped.  That means a capital A can
be bold, italic, script, sans-serif, etc.  A small a can also be
small-caps (or even full-size caps), but I think this is the only
controversial point.

In a Unicode font, U+0041 cannot be mapped to a capital A with macron,
as it is in Bookshelf Symbol 1; nor to a six-pointed star, as in
Monotype Sorts; nor to a hand holding up two fingers, as in Wingdings.
(But it can be mapped to a "notdef" glyph, if the font makes no claim to
supporting U+0041.)

U+0915 absolutely can have snow on it, or be bold or italic or whatever
(or all of these), as long as a Devanagari reader would recognize its
essential "ka-ness."  It cannot look like a Latin A, nor for that matter
can U+0041 look like a Devanagari ka.

Font guys, do you agree with this?

Of course, the term "Unicode font" is also often used to mean "a font
that covers all, or nearly all, of Unicode."  Font technologies
generally don't even allow this, of course, and even by the standards of
"nearly" we are still limiting ourselves to things like Bitstream
Cyberbit, Arial Unicode MS, Code2000, Cardo, etc.  Right or wrong, this
is a commonly accepted meaning for "Unicode font."

-Doug Ewell
 Fullerton, California

RE: Character identities

2002-10-28 Thread Michael Everson

At 14:31 -0800 2002-10-28, Figge, Donald wrote:

> At 20:59 + 2002-10-28, Anto'nio Martins-Tuva'lkin wrote:
> >On 2002.10.28, 13:09, David Starner <[EMAIL PROTECTED]> wrote:
> >
> >>  Basically, any decorative or handwriting font can't be a Unicode font.
> ><...>
> >>  Seems pointless to tell a lot of the fontmakers out there that they
> >>  shouldn't worry about Unicode, because Unicode's only for standard
> >>  book fonts
> >
> >Hm, what if I want to make, say, snow capped Devanagari glyphs for my
> >hiking company in Nepal? Shouldn't I assign them to Unicode code points?
>
> That's what Private Use code positions are for.
> --
> Michael Everson * * Everson Typography *  * http://www.evertype.com
> --
> I don't think so. He seems to be talking about a specific typographic style.
> Code points don't care about style, whether it's Franklin Gothic or
> Snowcapped Helvetica.


I must have misunderstood. I think I only saw the "snow-capped" and 
not the "Devanagari". Sorry.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Character identities

2002-10-28 Thread Michael Everson

At 14:30 -0800 2002-10-28, Kenneth Whistler wrote:

> > >Hm, what if I want to make, say, snow capped Devanagari glyphs for my
> > >hiking company in Nepal? Shouldn't I assign them to Unicode code points?
> >
> > That's what Private Use code positions are for.
> > --
> > Michael Everson * * Everson Typography *  * http://www.evertype.com
>
> Um, Michael, I think Anto'nio was talking about glyphs in a
> decorative font, which should -- clearly -- just be mapped to
> ordinary Unicode characters, via an ordinary Unicode cmap.

If they correspond to Unicode characters, yes, certainly.

> Or do you think that the yellow, cursive, shadow-dropped, 3-D
> letters "Getaway!" at:
>
> http://www.trekking-in-nepal.com/
>
> should also be represented by Private Use code positions? ;-)

Not at all. Fonts with images of igloos and yurts would use it, 
though, I would think.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Character identities

2002-10-28 Thread David Starner

On Mon, Oct 28, 2002 at 01:36:08PM -0700, John Hudson wrote:
> 
> >On 2002.10.28, 13:09, David Starner <[EMAIL PROTECTED]> wrote:
> >
> >> Basically, any decorative or handwriting font can't be a Unicode font.
> ><...>
> >> Seems pointless to tell a lot of the fontmakers out there that they
> >> shouldn't worry about Unicode, because Unicode's only for standard
> >> book fonts
> 
> Hello? Who says decorative or handwriting fonts can't be Unicode fonts? 
[...]
> Or are you working with some definition of 'Unicode font' other than 'font 
> with a Unicode cmap'?

Right above where it was cut it said:

Marco:
 > A U+0308 (COMBINING DIAERESIS) should remain a U+0308,
 > regardless that the corresponding glyph *looks* like U+0364
 > (COMBINING LATIN SMALL LETTER E) in one font, and it looks like U+0304
 > (COMBINING MACRON) in another font, and it looks like two five-pointed
 > stars side-by-side in a third font, and it looks like Mickey Mouse's
 > ears in ...
 
Kent:
 > These are all unacceptable variations in a *Unicode font (in
 > default mode)*.

Earlier:

Marco:
 > there are fonts which don't have dots over "i" and "j";

Kent:
 > You have a slight point there, but those are not intended for
 > running text.  And I'm hesitant to label them "Unicode fonts".

Given that definition of Unicode fonts, a number of decorative or
handwriting fonts (though fewer than I expected) are arbitrarily
excluded from being Unicode fonts.

-- 
David Starner - [EMAIL PROTECTED]
Great is the battle-god, great, and his kingdom--
A field where a thousand corpses lie. 
  -- Stephen Crane, "War is Kind"

Re: Character identities

2002-10-28 Thread David Starner

On Mon, Oct 28, 2002 at 09:36:34PM +, Michael Everson wrote:
> At 20:59 + 2002-10-28, Anto'nio Martins-Tuva'lkin wrote:
> >On 2002.10.28, 13:09, David Starner <[EMAIL PROTECTED]> wrote:
> >
> >> Basically, any decorative or handwriting font can't be a Unicode font.
> ><...>
> >> Seems pointless to tell a lot of the fontmakers out there that they
> >> shouldn't worry about Unicode, because Unicode's only for standard
> >> book fonts
> >
> >Hm, what if I want to make, say, snow capped Devanagari glyphs for my
> >hiking company in Nepal? Shouldn't I assign them to Unicode code points?
> 
> That's what Private Use code positions are for.

But think of the utility if Unicode added a COMBINING SNOWCAP and
COMBINING FIRECAP! But should we combine the SNOWCAP with the ICECAP?

(-:

-- 
David Starner - [EMAIL PROTECTED]
Great is the battle-god, great, and his kingdom--
A field where a thousand corpses lie. 
  -- Stephen Crane, "War is Kind"

Re: Character identities

2002-10-28 Thread Kenneth Whistler


> >Hm, what if I want to make, say, snow capped Devanagari glyphs for my
> >hiking company in Nepal? Shouldn't I assign them to Unicode code points?
> 
> That's what Private Use code positions are for.
> -- 
> Michael Everson * * Everson Typography *  * http://www.evertype.com

Um, Michael, I think Anto'nio was talking about glyphs in a
decorative font, which should -- clearly -- just be mapped to
ordinary Unicode characters, via an ordinary Unicode cmap.

Or do you think that the yellow, cursive, shadow-dropped, 3-D
letters "Getaway!" at:

http://www.trekking-in-nepal.com/

should also be represented by Private Use code positions? ;-)

--Ken

Re: Character identities

2002-10-28 Thread Michael Everson

At 13:36 -0700 2002-10-28, John Hudson wrote:


> Or are you working with some definition of 'Unicode font' other than
> 'font with a Unicode cmap'?

It seemed to me that he was talking about fonts that had characters 
that weren't in Unicode at all. I don't mean precomposed vowels, but, 
say, fonts with moon phases in them.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: Character identities

2002-10-28 Thread Figge, Donald


At 20:59 + 2002-10-28, Anto'nio Martins-Tuva'lkin wrote:
>On 2002.10.28, 13:09, David Starner <[EMAIL PROTECTED]> wrote:
>
>>  Basically, any decorative or handwriting font can't be a Unicode font.
><...>
>>  Seems pointless to tell a lot of the fontmakers out there that they
>>  shouldn't worry about Unicode, because Unicode's only for standard
>>  book fonts
>
>Hm, what if I want to make, say, snow capped Devanagari glyphs for my
>hiking company in Nepal? Shouldn't I assign them to Unicode code points?

That's what Private Use code positions are for.
-- 
Michael Everson * * Everson Typography *  * http://www.evertype.com
--
I don't think so. He seems to be talking about a specific typographic style.
Code points don't care about style, whether it's Franklin Gothic or
Snowcapped Helvetica.

Don

Re: Character identities

2002-10-28 Thread Michael Everson

At 20:59 + 2002-10-28, Anto'nio Martins-Tuva'lkin wrote:

>On 2002.10.28, 13:09, David Starner <[EMAIL PROTECTED]> wrote:
>
>>  Basically, any decorative or handwriting font can't be a Unicode font.
><...>
>>  Seems pointless to tell a lot of the fontmakers out there that they
>>  shouldn't worry about Unicode, because Unicode's only for standard
>>  book fonts
>
>Hm, what if I want to make, say, snow capped Devanagari glyphs for my
>hiking company in Nepal? Shouldn't I assign them to Unicode code points?


That's what Private Use code positions are for.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Character identities

2002-10-28 Thread John Hudson

On 2002.10.28, 13:09, David Starner <[EMAIL PROTECTED]> wrote:

> Basically, any decorative or handwriting font can't be a Unicode font.
<...>
> Seems pointless to tell a lot of the fontmakers out there that they
> shouldn't worry about Unicode, because Unicode's only for standard
> book fonts

Hello? Who says decorative or handwriting fonts can't be Unicode fonts? 
I've got dozens of fonts on my system that prove this wrong. Zapfino, which 
ships with OS X and which I had the privilege to work on, is about as 
decorative a handwriting font as you could wish for, and of course it has a 
Unicode cmap.

Or are you working with some definition of 'Unicode font' other than 'font 
with a Unicode cmap'?

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]

It is necessary that by all means and cunning,
the cursed owners of books should be persuaded
to make them available to us, either by argument
or by force.  - Michael Apostolis, 1467

Re: Character identities

2002-10-28 Thread Anto'nio Martins-Tuva'lkin

On 2002.10.28, 13:09, David Starner <[EMAIL PROTECTED]> wrote:

> Basically, any decorative or handwriting font can't be a Unicode font.
<...>
> Seems pointless to tell a lot of the fontmakers out there that they
> shouldn't worry about Unicode, because Unicode's only for standard
> book fonts

Hm, what if I want to make, say, snow capped Devanagari glyphs for my
hiking company in Nepal? Shouldn't I assign them to Unicode code points?

--   .
António MARTINS-Tuválkin|  ()|
<[EMAIL PROTECTED]>   ||
R. Laureano de Oliveira, 64 r/c esq. |
PT-1885-050 MOSCAVIDE (LRS)  Não me invejo de quem tem   |
+351 917 511 549 carros, parelhas e montes   |
http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe   |
http://pagina.de/bandeiras/  a água em todas as fontes   |

Re: Character identities

2002-10-28 Thread Doug Ewell

Marco Cimarosti  wrote:

>> There are also lots of characters that "mean" the same, but
>> always (in a Unicode font in default mode) should/must
>> look different. Like M and Roman Numeral One Thousand C D
>> (just to take an example closer to Italy... ;-).
>
> Well, the first and only time I have seen that "Thousand C D" was on
> the Unicode charts... However, if I'd be asked which glyph is more
> appropriate for that character, I would say: the same as capital "M".

I would disagree with this.  It seems to me the whole reason for both
U+216F ROMAN NUMERAL ONE THOUSAND and U+2180 ROMAN NUMERAL ONE THOUSAND
C D to exist is that they should have different glyphs.  This is not
necessarily in keeping with the purest spirit of Unicode (which might
regard these as two glyphs of a single character), but in reality they
are encoded as two characters.

Note, however, that there is nothing wrong with using the same glyph for
U+004D and U+216F, although in many fonts they are different for no
obvious reason.
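
The character data gives some support to this distinction, as the short Python sketch below suggests. It uses only the standard `unicodedata` module; the helper name `describe` is made up for the example. U+216F carries a compatibility decomposition to U+004D, while U+2180 has no decomposition at all.

```python
import unicodedata

def describe(ch):
    """Return (name, decomposition, folds-to-M-under-NFKC) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
        unicodedata.normalize("NFKC", ch) == "M",
    )

for ch in ("\u004D", "\u216F", "\u2180"):
    print(f"U+{ord(ch):04X}", describe(ch))
# U+004D ('LATIN CAPITAL LETTER M', '', True)
# U+216F ('ROMAN NUMERAL ONE THOUSAND', '<compat> 004D', True)
# U+2180 ('ROMAN NUMERAL ONE THOUSAND C D', '', False)
```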

-Doug Ewell
 Fullerton, California

RE: Character identities

2002-10-28 Thread Marco Cimarosti

Kent Karlsson wrote:
> > > For this reason it is quite impermissible to render the
> > > combining letter small e as a diaeresis
> >
> > So far so good. There would be no reason for doing such a thing.
> ...
> > > or, for that matter, the diaeresis as a combining
> > > letter small e (however, you see the latter version
> > > sometimes, very infrequently, in advertisement).
> >
> > This is the case I thought we were discussing, and it is a
> > very different case.
> 
> No, the claim was that diaeresis and overscript e are the same,

The claim was that dieresis and overscript e are the same in *modern*
*standard* German. Or, better stated, that overscript e is just a glyph
variant of dieresis, in *modern* *standard* German typeset in Fraktur.

Sorry if I haven't stated this clearly enough.

> so the reversed case Marc is talking about is not different at all.

It is. In the first case, we are talking about a glyph variant in *modern*
*standard* German, in the second case, we are talking about two different
diacritics in some *other* context. (Ancient German? ancient Swedish?).

> > Given Keld's opinion and Marc's wholehearted support, it
> 
> Please don't confuse me with Keld!

Oooops! My apologies!

> > follows that
> > those infrequent advertisements should be encoded using U+0364...
> >
> > But U+0364 (COMBINING LATIN SMALL LETTER E) belongs to a small
> > collection of "Medieval superscript letter diacritics", which is
> > supposed to "appear primarily in medieval Germanic manuscripts", or to
> > reproduce "some usage as late as the 19th century in some languages".
> 
> Yes, but you should not read too much into the explanation,
> which, while correct, does not limit the existence of their
> glyphs to fonts used only by germanic professors...
> Some of them (overscript e in particular) should be(come)
> quite commonly occurring in any Fraktur Unicode font.

"Commonly" sounds funny near "Fraktur"...

> > Using such a character to encode 21st century advertisements
> > is doomed to cause problems:
> >
> > 1) The glyph for U+0364 is more likely found in the font collection of
> > the Faculty of Germanic Studies than on the PC of people wishing to read
> > the advertisement for "Ye Olde Küster Pub". So, most people will be
> > unable to view the advertisement correctly.
> >
> > 2) The designer of the advertisement will be unable to use
> > his spell-checker and hyphenator on the advertisement's text.
> 
> Advertisements should invariably be final spell-checked and
> hyphenated by humans!  Automated spell checkers and hyphenators
> for German (as well as Scandinavian languages) have (so far)
> not been good enough even for running text that you want to
> publish...

This has no connection with this discussion.

However, IMHO, the presence of U+0364 (COMBINING LATIN SMALL LETTER E) in a
modern German or Swedish text is just a plain spelling error, and even the
naivest spellchecker should flag it as such.
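
A very naive check of the kind suggested here might look like the following sketch. It flags any of the combining superscript Latin letters U+0363..U+036F in a text; the function name `flag_superscript_letters` is hypothetical, and a real spellchecker would of course do far more than this.

```python
import unicodedata

# U+0363..U+036F: the combining superscript Latin letters (a, e, i, o, u, ...).
MEDIEVAL_SUPERSCRIPTS = range(0x0363, 0x0370)

def flag_superscript_letters(text):
    """Return (index, character name) for each superscript letter found."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ord(ch) in MEDIEVAL_SUPERSCRIPTS
    ]

print(flag_superscript_letters("Ku\u0364ster"))
# [(2, 'COMBINING LATIN SMALL LETTER E')]
print(flag_superscript_letters("Ku\u0308ster"))
# []
```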

> > 3) Users will be unable to find the Küster Pub by searching for
> > "Küster" in a search engine.
> 
> Depends on the search engine, and if it uses a correct collation
> table (for the language) or not...
>
> > What will actually happen is that everybody will see an empty square, so
> > they'll think that the web designer is an idiot, apart from the
> > professors at the Faculty of Germanic Studies, who'll think that the
> > designer is an idiot because she doesn't know the difference between
> > U+0308 and U+0364 in ancient German.
> 
> Most modern use of Fraktur seems to use diaeresis or double
> acute for this. 

U+0308 (COMBINING DIAERESIS) should be the only "umlaut" to be found in
modern German text. What that diacritic *looks* like (two dots, an "e", a
double acute, a macron, Mickey Mouse's ears), is a choice of the font
designer.

> (But the web designer could use a dynamically
> downloaded font fragment, if there is worry that all glyphs
> might not be supported by the fonts used by the vast majority
> of the target audience.)

This too has no connection with this discussion, and is OT. Unicode is
concerned with how text is *encoded*; the details of fonts and display
technology are out of scope.

What Unicode really mandates is that the encoding should not change to
obtain a certain graphic effect.
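
One practical consequence can be shown with a short sketch, assuming a search that compares NFC-normalized strings (an assumption made for this example; real search engines vary). Text encoded with U+0308 matches the ordinary spelling regardless of how a font draws the diacritic, whereas text encoded with U+0364 does not.

```python
import unicodedata

def nfc(text):
    """Normalize to NFC so precomposed and decomposed spellings compare equal."""
    return unicodedata.normalize("NFC", text)

query = nfc("K\u00FCster")        # "Küster" with precomposed u-diaeresis
with_diaeresis = "Ku\u0308ster"   # u + COMBINING DIAERESIS
with_small_e = "Ku\u0364ster"     # u + COMBINING LATIN SMALL LETTER E

print(nfc(with_diaeresis) == query)  # True
print(nfc(with_small_e) == query)    # False
```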

> > The real error (IMHO) is the idea that font designers should stick to
> > the *sample* glyphs printed in the Unicode book, because this would force
> 
> Well, the diacritics are allocated/unified on glyphic grounds.
> While a diaeresis may look different from font to font, it is
> basically two "dots" (of some shape in line with the design of the
> font), never an "e" shape.  At least not in the *default mode* of a
> *Unicode font*.
>
> And overscript small e will also vary with the font,
> looking like a shrunken ordinary e glyph of (ideally) the same font.
> But never like two dots (in the default mode of a Unicode font).

You haven't yet defined your meaning of "Uni

Copyright on gif images via http://www.unicode.org/cgi-bin/GetUnihanData.pl

2002-10-28 Thread Dan Kogai

I have asked this question before without getting an answer, so I am asking 
again.

The Unihan Database browser at 
http://www.unicode.org/cgi-bin/GetUnihanData.pl shows an example glyph 
via http://www.unicode.org/cgi-bin/refglyph?24-.  I would 
like to use this image but where can I ask for the permission?

I have written a CGI which "renders" a banner using the URI above but I 
am not sure if I can cache the image.  The CGI is still on my intranet 
but I can disclose it upon request.

Dan Kogai or http://www.unicode.org/cgi-bin/refglyph?24-5f3e

Re: Character identities

2002-10-28 Thread David Starner

On Mon, Oct 28, 2002 at 11:21:30AM +0100, Kent Karlsson wrote:
> No, the claim was that diaeresis and overscript e are the same,
> so the reversed case Marc is talking about is not different at all.

The claim is, that for certain fonts, it is appropriate to image the
a-umlaut character as an a^e. That doesn't imply anything about the
other way around, or else t' could legally be displayed as a t with
caron above.

> > A U+0308 (COMBINING DIAERESIS) should remain a U+0308,
> > regardless that the corresponding glyph *looks* like U+0364
> > (COMBINING LATIN
> > SMALL LETTER E) in one font, and it looks like U+0304
> > (COMBINING MACRON) in
> > another font, and it looks like two five-pointed stars
> > side-by-side in a
> > third font, and it looks like Mickey Mouse's ears in ...
> 
> These are all unacceptable variations in a *Unicode font (in
> default mode)*.  But you can have all kinds of silly variations
> in *non*-Unicode fonts applied to Unicode text, including ciphers
> or rebuses... (ok, there are degrees...)

Basically, any decorative or handwriting font can't be a Unicode font.
(The glyph for my German teacher's umlaut was definitely a macron.) Seems
pointless to tell a lot of the fontmakers out there that they shouldn't
worry about Unicode, because Unicode's only for standard book fonts, but
that's the only way I can read your last statement.

-- 
David Starner - [EMAIL PROTECTED]
Great is the battle-god, great, and his kingdom--
A field where a thousand corpses lie. 
  -- Stephen Crane, "War is Kind"

RE: Character identities

2002-10-28 Thread Kent Karlsson

...
> > For this reason it is quite impermissible to render the
> > combining letter small e as a diaeresis
>
> So far so good. There would be no reason for doing such a thing.
...
> > or, for that matter, the diaeresis as a combining
> > letter small e (however, you see the latter version
> > sometimes, very infrequently, in advertisement).
>
> This is the case I thought we were discussing, and it is a
> very different case.

No, the claim was that diaeresis and overscript e are the same,
so the reversed case Marc is talking about is not different at all.

> Standing Keld's opinion and Marc's wholehearted support, it

Please don't confuse me with Keld!

> follows that
> those infrequent advertisements should be encoded using U+0364...
>
> But U+0364 (COMBINING LATIN SMALL LETTER E) belongs to a
> small collection of
> "Medieval superscript letter diacritics", which is supposed to "appear
> primarily in medieval Germanic manuscripts", or to reproduce
> "some usage as late as the 19th century in some languages".

Yes, but you should not read too much into the explanation,
which, while correct, does not limit the existence of their
glyphs to fonts used only by germanic professors...
Some of them (overscript e in particular) should be(come)
quite commonly occurring in any Fraktur Unicode font.

> Using such a character to encode 21st century advertisements
> is doomed to cause problems:
>
> 1) The glyph for U+0364 is more likely found in the font
> collection of the
> Faculty of Germanic Studies than on the PC of people wishing
> to read the
> advertisement for "Ye Olde Küster Pub". So, most people will
> be unable to
> view the advertisement correctly.
>
> 2) The designer of the advertisement will be unable to use
> his spell-checker and hyphenator on the advertisement's text.

Advertisements should invariably be final spell-checked and
hyphenated by humans!  Automated spell checkers and hyphenators
for German (as well as Scandinavian languages) have (so far)
not been good enough even for running text that you want to
publish...

> 3) Users will be unable to find the Küster Pub by searching
> "Küster" in a
> search engine.

Depends on the search engine, and if it uses a correct collation
table (for the language) or not...
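
A rough sketch of what such matching could look like, assuming Python 3 and
its standard unicodedata module. The folding table and the search_key helper
are hypothetical and stand in for a real, language-specific collation table;
they are not how any particular search engine works:

```python
import unicodedata

# Hypothetical folding table: for matching purposes only, treat the
# overscript e (U+0364) as if it were a diaeresis (U+0308).
SEARCH_EQUIVALENTS = {"\u0364": "\u0308"}


def search_key(text):
    """Build a comparison key; the stored text itself is left unchanged."""
    decomposed = unicodedata.normalize("NFD", text)
    folded = "".join(SEARCH_EQUIVALENTS.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFC", folded).casefold()


modern = "K\u00fcster"     # u with diaeresis, precomposed
medieval = "Ku\u0364ster"  # u + combining small letter e

print(modern == medieval)                          # plain comparison fails
print(search_key(modern) == search_key(medieval))  # folded keys match
```

Without a table of this kind, a plain code point comparison treats the two
spellings as different words.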

> What will actually happen is that everybody will see an empty
> square, so
> they'll think that the web designer is an idiot, apart from the
> professors at the
> Faculty of Germanic Studies, who'll think that the designer
> is an idiot
> because she doesn't know the difference between U+0308 and
> U+0364 in ancient German.

Most modern use of Fraktur seems to use diaeresis or double
acute for this. (But the web designer could use a dynamically
downloaded font fragment, if there is worry that all glyphs
might not be supported by the fonts used by the vast majority
of the target audience.)

> The real error (IMHO) is the idea that font designers should
> stick to the
> *sample* glyphs printed in the Unicode book, because this would force

Well, the diacritics are allocated/unified on glyphic grounds.
While a diaeresis may look different from font to font, it is
basically two "dots" (of some shape in line with the design of the
font), never an "e" shape.  At least not in the *default mode* of a
*Unicode font*. And overscript small e will also vary with the font,
looking like a shrunken ordinary e glyph of (ideally) the same font.
But never like two dots (in the default mode of a Unicode font).

> graphic designer to change the *encoding* of their text in
> order to get the desired result.

A graphic designer is likely to turn the whole thing into 2-d
or 3-d graphics, probably distorted, possibly animated, to get
the desired result!  At which point the original, or intermediary,
encoding of any text elements is not very relevant to the
end result.

> Another big error (IMHO, once again) is the idea that two
> different Unicode characters should look different.

I have never said that! E.g., a µ as well as an Å (both of which
are allocated twice!) should look the same (resp.) regardless of
which of their respective code points is used. There are many
more examples of characters that definitely should (e.g. capital
K and Kelvin sign, small i and small roman numeral one) or may
(capital A, capital Alpha, ...) look the same.

There are also lots of characters that "mean" the same, but
always (in a Unicode font in default mode) should/must
look different. Like M and Roman Numeral One Thousand C D
(just to take an example closer to Italy... ;-).
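
The two groups above can be inspected with Unicode normalization. This is a
minimal sketch, assuming Python 3 and its standard unicodedata module; the
constant and helper names are only illustrative. Characters that are
"allocated twice" fold together under NFC or NFKC, while look-alikes and
same-meaning characters stay distinct:

```python
import unicodedata


def nfc(text):
    return unicodedata.normalize("NFC", text)


def nfkc(text):
    return unicodedata.normalize("NFKC", text)


KELVIN_SIGN = "\u212a"
ANGSTROM_SIGN = "\u212b"
MICRO_SIGN = "\u00b5"
SMALL_ROMAN_NUMERAL_ONE = "\u2170"
GREEK_CAPITAL_ALPHA = "\u0391"
ROMAN_NUMERAL_ONE_THOUSAND_C_D = "\u2180"

# Canonical duplicates: NFC already maps them to the ordinary letters.
print(nfc(KELVIN_SIGN) == "K")
print(nfc(ANGSTROM_SIGN) == "\u00c5")  # LATIN CAPITAL LETTER A WITH RING ABOVE

# Compatibility duplicates: only NFKC folds them.
print(nfc(MICRO_SIGN) == "\u03bc")     # NFC leaves the micro sign alone
print(nfkc(MICRO_SIGN) == "\u03bc")    # NFKC gives GREEK SMALL LETTER MU
print(nfkc(SMALL_ROMAN_NUMERAL_ONE) == "i")

# Distinct characters: no normalization form merges them.
print(nfkc(GREEK_CAPITAL_ALPHA) == "A")
print(nfkc(ROMAN_NUMERAL_ONE_THOUSAND_C_D) == "M")
```

Normalization says nothing about glyphs; it only shows which code points the
standard itself treats as equivalent.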

> The difference must be preserved when it
> is useful -- e.g., U+0308 should not look like U+0364 in a

"should not" --> "must never"

> font designed for
> publishing books on the history of German!

"a font ." --> "any Unicode font in default mode"

(Bad example, Marco!)

>
> What should really happen, IMHO, is that modern German should
> be encoded as
> modern German. A U+0308 (COMBINING DIAERESIS) should remain a U+0308,
> regardless that the correspondi

BabelPad

2002-10-28 Thread Andrew C. West

BabelPad, my free Unicode plain text editor for Windows, has now been released.
Further information is available at
.

BabelPad also includes input methods for a number of scripts which I am
interested in, currently :
Tibetan (using Extended Wylie)
Yi (using standard romanisation)

Note that although a build of BabelPad for Windows 95/98/ME is available, it is
not as feature-complete as the builds for Windows NT 4.0 or Windows 2000/XP, and
may not work properly when configured to use different fonts for different
Unicode ranges.

I haven't written the help system yet, but a FAQ is available at
.

My thanks to those members of this list who have commented on the pre-release
versions of BabelPad.

Regards,

Andrew

Re: Character identities

2002-10-28 Thread Marc Wilhelm Küster

At 11:37 25.10.2002 -0700, Doug Ewell wrote:

Marc Wilhelm Küster wrote:

> As to the long s, it is not used for writing present-day German except
> in rare cases, notably in some scholarly editions and in the Fraktur
> script. Very few texts beyond the names of newspapers are nowadays
> produced in Fraktur. To put the long s on the German keyboard would be
> quite contrary to user requirements -- and if a requirement existed,
> it would be DIN's job to amend DIN 2137-2 and the upcoming DIN 2137-12
> to cater for it.

"Irrelevant," sure, but "contrary"?  I don't see what harm could come
from adding a character to a previously unassigned key, especially in
the relatively obscure AltGr zone (Level 3).  Most users could safely
ignore it, and most would never even know it was there.

In principle, you are right. Unfortunately, there's quite a bit of software 
around that (mis-)uses unassigned AltGr-Keys for their own purposes - this 
includes, on Windows NT ff at least, software such as the localized MS 
Word. So, adding new assignments potentially clashes with existing software 
and should only be done if there is a sufficiently high public interest in 
doing so.

But yes, of course it would be DIN's job to standardize such a thing (or
not).

Patrick Andries asked if a revised German keyboard standard would be
ignored in the market with the same cavalier attitude seen in Canada
(and the U.S.).  My impression is that European manufacturers are held
more closely to conformance with national and international standards
than North American manufacturers, but I'd want some Europeans to back
me up on this.

Speaking of Europe, it differs from country to country. In Germany 
certainly DIN 2137 is widely adhered to and changes to it would in all 
likelihood be taken up fast on the market.

Best regards,

Marc Küster

-Doug Ewell
 Fullerton, California

*
Marc Wilhelm Küster
Saphor GmbH

Fronländer 22
D-72072 Tübingen

Tel.: (+49) / (0)7472 / 949 100
Fax: (+49) / (0)7472 / 949 114
