Re: "Missing character" glyph- example

2002-08-02 Thread Martin Kochanski

Periphrasis is always possible, of course; but that doesn't mean that it is desirable. 

1. Periphrasis is by definition longer. In a page where you want to present a lot of 
information and not have it squeezed out by meta-information, the first paragraph in 
my example could read "Seeing things like []? Click here". (I do agree with you that 
"click here" is more sensible than "download a font or...", but I just wanted to 
squeeze my example onto a single page instead of having to provide a target for the 
link).

>If you have trouble displaying any of the characters in the text on this page,
2. "Having trouble displaying" implies that the reader knows what the stuff he is 
displaying ought to look like. If you show me a page of Arabic, then as long as it 
looks all sort of squiggly I have no real way of knowing whether it is right or wrong. 
[If you can read Arabic, then substitute a script that you don't know; unless you are 
Michael Everson, in which case there is no such thing]. If you show me a page of text 
that I *can* read, and tell me to look for "trouble", I'm also lost unless you tell me 
what sort of trouble I am meant to be looking for ("characters" don't mean much to a 
naive user). 
If you want a rubric that asks the user to "click here" for other kinds of display 
problem, you could phrase it a little differently (eg: "Seeing weird things like []? 
Click here for help"). I don't want to get into the detailed poetics of user 
interfaces: all I am seeking to establish is that being able to display "[]" can make 
for shorter and more direct messages.

>A. Avoids font-specific circularity in your attempt to explain...
3. "Font-specific circularity" is the **entire purpose** of this proposal. If you make 
an ostensive reference to something, then it helps if the reference looks the same as 
the thing that you are referring to.

>C. Doesn't depend on dubious assignments of a code point in
>Unicode for a confusing (non-)use.
I'm sorry, I don't understand what this could mean. But possibly it is not relevant to 
the rest of your argument?

4. Other people's posts have, I think, eliminated "U+0000" as a possibility, not least 
because it is defined (in Unicode) as not being an ordinary printable character at 
all. I am no expert, but it seems to this innocent observer that glyph numbers and 
Unicode code points inhabit different universes with no necessary connection between 
them and that if, in a particular font, glyph LXV happens to correspond to code point 
U+0041, that is a cheerful fact about the font but not to be relied upon in general.

5. I *should* reiterate (because some people seem not to have noticed... this is the 
trouble with reading Courier email) that all existing fonts *do* already display the 
proposed new character correctly, so that no changes will be required for them to 
implement it.
Why, in that case, make a proposal at all?
(i) To make sure that whatever code point is decided upon does not suddenly receive a 
glyph in a new version of Unicode.
(ii) To allow sophisticated systems that distinguish "unassigned Unicode character" 
from "Unicode character that I happen not to be able to display" to display the latter 
glyph.
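
To make point (ii) concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: `font_coverage` is a made-up stand-in for whatever a real renderer knows about the current font, and the standard `unicodedata` module's idea of "unassigned" depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

def classify(ch, font_coverage):
    """Say why a character might be drawn as a placeholder box.

    font_coverage is a hypothetical set of code points the font can draw.
    """
    if unicodedata.category(ch) == "Cn":
        # General category Cn: nothing is assigned at this code point
        # (this category also covers noncharacters).
        return "unassigned"
    if ord(ch) not in font_coverage:
        return "assigned but not displayable in this font"
    return "displayable"

# Hypothetical font that covers printable ASCII only.
ascii_only = set(range(0x20, 0x7F))
```

A system of this kind could show the proposed character only for the second result and keep some other indication for the first.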

At 12:34 01/08/02 -0700, Kenneth Whistler wrote:
>> As a clarification, here is a sample web page:
>> 
>> http://www.cardbox.com/missing.htm
>> 
>> The requirement is to be able to display the first paragraph of the 
>> page in such a way that it makes sense in its reference to the text 
>> on the rest of the page.
>> 
>> The character after the word "this:" in the first paragraph cannot 
>> be reliably represented by any existing Unicode character.
>> 
>> Nevertheless, I believe it is legitimate to want to say what the 
>> first paragraph says. 
>
>Well, I would put it differently, if it were my web page.
>Rather than:
>
>
>If any of the following text contains characters such as this: {blort}
>then please change to a different font, or download a more recent
>version of your current font.
>
>
>I would suggest something more along the line of:
>
>
>If you have trouble displaying any of the characters in
>the text on this page, please consult 
>Troubleshooting Display Problems.
>
>
>Then the troubleshooting page could provide a nice explanation
>of the problem, show several neatly formatted *graphics* of
>the kind of nondisplayable glyph issues (with alternate forms
>picked from various fonts) that a user might run into, and
>then give helpful links to actual font resources that would
>help, or in the case of specialized data, actually provide a
>usable font directly.
>
>Such an approach:
>
>A. Avoids font-specific circularity in your attempt to explain
>to a user what is going on when the display is broken.
>
>B. Provides much more useful information that will actually
>have a better chance of helping the user get by the problem.
>Also, since the problem(s) may not only be some nondisplayable
>glyphs, the approach is extensible for whatever display

Re: "Missing character" glyph- example

2002-08-01 Thread John Cowan

James Kass scripsit:

> Please note that the first entry in the cmap covers Glyph ID 3.
> Glyph IDs 0, 1, and 2 don't need to be covered by cmap, as they
> are constants which are supposed to be handled by default.

For the record, in FIGfonts the glyphs are labeled by their Unicode
character number (no complex shaping in FIGlet), and the glyph labeled
U+0000 is the no-definition glyph.  If there is none, a zero-width
glyph is used instead.  This glyph is *never* the first glyph, since
the first 103 glyphs are prescribed.
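
A rough sketch of that lookup rule in Python, for illustration only. The dictionary layout and the names `fig_glyph` and `ZERO_WIDTH` are invented here and are not part of FIGlet; the sketch also assumes the no-definition glyph is the one labeled with character number 0.

```python
# Hypothetical stand-in for a zero-width glyph: a single empty row.
ZERO_WIDTH = [""]

def fig_glyph(font, code):
    """Return the rows of the glyph for `code` from a toy FIGfont-like dict.

    `font` maps character numbers to lists of text rows.
    """
    if code in font:
        return font[code]
    # Fall back to the no-definition glyph (assumed to be labeled 0),
    # or to a zero-width glyph if the font does not supply one.
    return font.get(0, ZERO_WIDTH)
```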

-- 
John Cowan    http://www.ccil.org/~cowan   <[EMAIL PROTECTED]>
"Any legal document draws most of its meaning from context. A telegram
that says 'SELL HUNDRED THOUSAND SHARES IBM SHORT' (only 190 bits in
5-bit Baudot code plus appropriate headers) is as good a legal document
as any, even sans digital signature." --me




Re: "Missing character" glyph- example

2002-08-01 Thread James Kass


Peter Constable wrote,

> ... For instance, in Times New Roman, Arial, Tahoma and even 
> James' own Code2000, the first entry in the cmap is for U+0020:

Please note that the first entry in the cmap covers Glyph ID 3.
Glyph IDs 0, 1, and 2 don't need to be covered by cmap, as they
are constants which are supposed to be handled by default.

Glyph ID Zero is the first glyph in every font.  (TTF/OTF)

Zero = Null ---> this is the glyph used for any code point 
not covered by the font, that is to say not included in
the cmap (character map).
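
As a minimal sketch of that fallback (Python, illustration only): the cmap is modelled as a plain dictionary, which is a simplification of the real table format, and the names `glyph_id` and `toy_cmap` are made up for the example. The two entries are the ones shown in the code2000.ttf dump quoted elsewhere in this thread.

```python
NOTDEF_GID = 0  # Glyph ID zero: the glyph used for uncovered code points

def glyph_id(cmap, code_point):
    """Look up a code point; anything absent from the cmap gets glyph 0."""
    return cmap.get(code_point, NOTDEF_GID)

# Toy cmap holding only the first two entries from the dump.
toy_cmap = {0x0020: 3, 0x0021: 4}
```

With this toy table, `glyph_id(toy_cmap, 0x0020)` gives 3, while any code point outside the table, including 0x0000, gives 0.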

Unfortunately, entering "&#0000;" in a web page will only display
the string ampersand, number sign, zero, zero, zero, zero, semi-colon.

John Hudson wrote,

> If, by 'missing glyph', you mean the .notdef glyph it should indeed be the 
> first glyph in the repertoire (but alas, may not be due to bad font tools), 

Bad font tools may allow a designer to place a LATIN CAPITAL LETTER A
glyph first in the font.  By definition, in that bad font, LATIN CAPITAL
LETTER A would be used for 'missing glyph'.

A good font tool should allow a designer to draw their interpretation
of the 'missing glyph', though.  Some designers use their own logo
as 'missing glyph', and a designer with a wicked sense of humour
and a poor sense of perspective might even make the 'missing 
glyph' look just like LATIN CAPITAL LETTER A.


> but it should *not* be encoded as U+0000 or as any other codepoint. .notdef 
> should be unencoded.
> 
> The first four glyphs in a font should be:
> 
>  .notdef (unencoded, symbolic glyph signifying missing glyph)
>  .null (sometimes called NUL or NULL, U+0000, usually zero-width sans 
> outline)
>  CR (U+000D, usually zero-width sans serif)
>  space (U+0020, often double-mapped to U+00A0)
> 

(Smile)  What is the difference between a zero-width sans serif
glyph and a zero-width serif glyph?

Seriously, aside from the typo, John Hudson is essentially correct.

The conventions John mentions were originally part of the
Macintosh character set.  PostScript names "notdef", ".null",
and "CR" in the older TTF specs have no Unicode value assigned
at all.  Assigning 0x0 to .null and 0xd to CR were originally
Macintosh conventions.  Indeed, these hex numbers are called
"US Macintosh character code for glyph" in the old TTF specs.

Even though notdef, .null, and CR were not part of either the 
UGL character set or the US Win31 character set; they are 
included in the WGL4 character set.

"notdef", ".null", and "CR" are all unencoded.  

I've always considered "notdef" and ".null" to be semantically 
equal.  Technically, though, this is incorrect.

Best regards,

James Kass.







Re: "Missing character" glyph- example

2002-08-01 Thread Eric Muller

John Hudson wrote:

>   but it should *not* be encoded as U+0000 or as any other codepoint. 
> .notdef should be unencoded. 

Almost. OpenType specifies that there is no functional difference 
between a code point that is not mapped and a code point that is 
explicitly mapped to GID 0, so there is never a need to map any code 
point to GID 0. But at the same time, there is no prohibition against 
mapping explicitly a code point to GID 0.
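
A small sketch of why the two cases behave the same, in Python and for illustration only. The cmap is modelled as a dictionary, and `lookup` and `normalize` are names invented for this example, not anything defined by OpenType.

```python
def lookup(cmap, code_point):
    """Unmapped code points resolve to GID 0."""
    return cmap.get(code_point, 0)

def normalize(cmap):
    """Drop entries that map explicitly to GID 0; lookups are unchanged."""
    return {cp: gid for cp, gid in cmap.items() if gid != 0}

# Two toy cmaps: one maps code point 0 to GID 0 explicitly, one omits it.
explicit = {0x0000: 0, 0x0020: 3}
implicit = {0x0020: 3}
```

Both toy tables give the same glyph for every code point, which is the sense in which the explicit entry is permitted but never needed.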

Eric.






Re: "Missing character" glyph- example

2002-08-01 Thread John Hudson

At 01:42 PM 01-08-02, [EMAIL PROTECTED] wrote:

>I think James is mistaken on this point: the missing glyph *is* the first
>glyph in any TTF, but it is *not* necessarily (probably not typically)
>mapped from U+0000. For instance, in Times New Roman, Arial, Tahoma and
>even James' own Code2000, the first entry in the cmap is for U+0020:

If, by 'missing glyph', you mean the .notdef glyph it should indeed be the 
first glyph in the repertoire (but alas, may not be due to bad font tools), 
but it should *not* be encoded as U+0000 or as any other codepoint. .notdef 
should be unencoded.

The first four glyphs in a font should be:

 .notdef (unencoded, symbolic glyph signifying missing glyph)
 .null (sometimes called NUL or NULL, U+0000, usually zero-width sans 
outline)
 CR (U+000D, usually zero-width sans serif)
 space (U+0020, often double-mapped to U+00A0)
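
As a rough illustration, a check for this ordering could look like the Python sketch below. The glyph names are the ones used in the list above; the function name and the idea of passing the glyph order as a plain list are assumptions made for the example, and real fonts may name these glyphs differently.

```python
# The conventional first four glyphs, using the names from the list above.
EXPECTED_FIRST_GLYPHS = [".notdef", ".null", "CR", "space"]

def follows_convention(glyph_order):
    """True if a font's glyph order starts with the four expected glyphs."""
    return glyph_order[:4] == EXPECTED_FIRST_GLYPHS
```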

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

Language must belong to the Other -- to my linguistic community
as a whole -- before it can belong to me, so that the self comes to its
unique articulation in a medium which is always at some level
indifferent to it.  - Terry Eagleton





Re: "Missing character" glyph- example

2002-08-01 Thread Peter_Constable


On 08/01/2002 02:34:17 PM Kenneth Whistler wrote:

>But if you insist on having a code point to stick directly in
>a sentence like that above, I'd take the cue from James Kass:
>
>> The missing glyph is the first glyph in any font.  This is mapped to
>> U+0000 and the system correctly substitutes the glyph mapped to
>> U+0000 any time a font being used lacks an outline for a called
>> character.

I think James is mistaken on this point: the missing glyph *is* the first
glyph in any TTF, but it is *not* necessarily (probably not typically)
mapped from U+0000. For instance, in Times New Roman, Arial, Tahoma and
even James' own Code2000, the first entry in the cmap is for U+0020:


; TrueType v1.0 Dump Program - v1.60, Jul 10 1995, rrt, dra, gch, ddb, lcp
; Copyright (C) 1991 ZSoft Corporation. All rights reserved.
; Portions Copyright (C) 1991-1995 Microsoft Corporation. All rights reserved.

; Dumping file 'code2000.ttf'

[snip]


Which Means:
   1. Char 0020 -> Index 3
      Char 0021 -> Index 4

[snip]

On the other hand, because the missing glyph is not explicitly mapped from any
character, it is in effect implicitly mapped from every character that the
cmap omits. So,

>Thus, you have a reasonably good chance that if you try to
>purposefully display the character U+0000, you will get the
>missing glyph for the font in use. (Unless the application is
>filtering out NULL characters.)

is probably valid.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>







Re: "Missing character" glyph- example

2002-08-01 Thread Kenneth Whistler

> As a clarification, here is a sample web page:
> 
> http://www.cardbox.com/missing.htm
> 
> The requirement is to be able to display the first paragraph of the 
> page in such a way that it makes sense in its reference to the text 
> on the rest of the page.
> 
> The character after the word "this:" in the first paragraph cannot 
> be reliably represented by any existing Unicode character.
> 
> Nevertheless, I believe it is legitimate to want to say what the 
> first paragraph says. 

Well, I would put it differently, if it were my web page.
Rather than:


If any of the following text contains characters such as this: {blort}
then please change to a different font, or download a more recent
version of your current font.


I would suggest something more along the line of:


If you have trouble displaying any of the characters in
the text on this page, please consult 
Troubleshooting Display Problems.


Then the troubleshooting page could provide a nice explanation
of the problem, show several neatly formatted *graphics* of
the kind of nondisplayable glyph issues (with alternate forms
picked from various fonts) that a user might run into, and
then give helpful links to actual font resources that would
help, or in the case of specialized data, actually provide a
usable font directly.

Such an approach:

A. Avoids font-specific circularity in your attempt to explain
to a user what is going on when the display is broken.

B. Provides much more useful information that will actually
have a better chance of helping the user get by the problem.
Also, since the problem(s) may not only be some nondisplayable
glyphs, the approach is extensible for whatever display help
is needed.

C. Doesn't depend on dubious assignments of a code point in
Unicode for a confusing (non-)use.

But if you insist on having a code point to stick directly in
a sentence like that above, I'd take the cue from James Kass:

> The missing glyph is the first glyph in any font.  This is mapped to
> U+0000 and the system correctly substitutes the glyph mapped to
> U+0000 any time a font being used lacks an outline for a called
> character.

Thus, you have a reasonably good chance that if you try to
purposefully display the character U+0000, you will get the
missing glyph for the font in use. (Unless the application is
filtering out NULL characters.)

--Ken







Re: "Missing character" glyph- example

2002-08-01 Thread Martin Kochanski

As a clarification, here is a sample web page:

http://www.cardbox.com/missing.htm

The requirement is to be able to display the first paragraph of the page in such a way 
that it makes sense in its reference to the text on the rest of the page.

The character after the word "this:" in the first paragraph cannot be reliably 
represented by any existing Unicode character.

Nevertheless, I believe it is legitimate to want to say what the first paragraph says.