Re: "Missing character" glyph- example
Periphrasis is always possible, of course; but that doesn't mean that it is desirable.

1. Periphrasis is by definition longer. In a page where you want to present a lot of information and not have it squeezed out by meta-information, the first paragraph in my example could read "Seeing things like []? Click here". (I do agree with you that "click here" is more sensible than "download a font or...", but I just wanted to squeeze my example onto a single page instead of having to provide a target for the link.)

>If you have trouble displaying any of the characters in the text on this page,

2. "Having trouble displaying" implies that the reader knows what the stuff he is displaying ought to look like. If you show me a page of Arabic, then as long as it looks all sort of squiggly I have no real way of knowing whether it is right or wrong. [If you can read Arabic, then substitute a script that you don't know; unless you are Michael Everson, in which case there is no such thing.] If you show me a page of text that I *can* read, and tell me to look for "trouble", I'm also lost unless you tell me what sort of trouble I am meant to be looking for ("characters" don't mean much to a naive user).

If you want a rubric that asks the user to "click here" for other kinds of display problem, you could phrase it a little differently (e.g. "Seeing weird things like []? Click here for help"). I don't want to get into the detailed poetics of user interfaces: all I am seeking to establish is that being able to display "[]" can make for shorter and more direct messages.

>A. Avoids font-specific circularity in your attempt to explain...

3. "Font-specific circularity" is the **entire purpose** of this proposal. If you make an ostensive reference to something, then it helps if the reference looks the same as the thing that you are referring to.

>C. Doesn't depend on dubious assignments of a code point in
>Unicode for a confusing (non-)use.

I'm sorry, I don't understand what this could mean. But possibly it is not relevant to the rest of your argument?

4. Other people's posts have, I think, eliminated "U+0000" as a possibility, not least because it is defined (in Unicode) as not being an ordinary printable character at all. I am no expert, but it seems to this innocent observer that glyph numbers and Unicode code points inhabit different universes with no necessary connection between them, and that if, in a particular font, glyph LXV happens to correspond to code point U+0041, that is a cheerful fact about the font but not to be relied upon in general.

5. I *should* reiterate (because some people seem not to have noticed... this is the trouble with reading Courier email) that all existing fonts *do* already display the proposed new character correctly, so that no changes will be required for them to implement it.

Why, in that case, make a proposal at all?

(i) To make sure that whatever code point is decided upon does not suddenly receive a glyph in a new version of Unicode.

(ii) To allow sophisticated systems that distinguish "unassigned Unicode character" from "Unicode character that I happen not to be able to display" to display the latter glyph.

At 12:34 01/08/02 -0700, Kenneth Whistler wrote:
>> As a clarification, here is a sample web page:
>>
>> http://www.cardbox.com/missing.htm
>>
>> The requirement is to be able to display the first paragraph of the
>> page in such a way that it makes sense in its reference to the text
>> on the rest of the page.
>>
>> The character after the word "this:" in the first paragraph cannot
>> be reliably represented by any existing Unicode character.
>>
>> Nevertheless, I believe it is legitimate to want to say what the
>> first paragraph says.
>
>Well, I would put it differently, if it were my web page.
>Rather than:
>
>If any of the following text contains characters such as this: {blort}
>then please change to a different font, or download a more recent
>version of your current font.
>
>I would suggest something more along the line of:
>
>If you have trouble displaying any of the characters in
>the text on this page, please consult
>Troubleshooting Display Problems.
>
>Then the troubleshooting page could provide a nice explanation
>of the problem, show several neatly formatted *graphics* of
>the kind of nondisplayable glyph issues (with alternate forms
>picked from various fonts) that a user might run into, and
>then give helpful links to actual font resources that would
>help, or in the case of specialized data, actually provide a
>usable font directly.
>
>Such an approach:
>
>A. Avoids font-specific circularity in your attempt to explain
>to a user what is going on when the display is broken.
>
>B. Provides much more useful information that will actually
>have a better chance of helping the user get by the problem.
>Also, since the problem(s) may not only be some nondisplayable
>glyphs, the approach is extensible for whatever display
Re: "Missing character" glyph- example
James Kass scripsit:

> Please note that the first entry in the cmap covers Glyph ID 3.
> Glyph IDs 0, 1, and 2 don't need to be covered by cmap, as they
> are constants which are supposed to be handled by default.

For the record, in FIGfonts the glyphs are labeled by their Unicode character number (no complex shaping in FIGlet), and the glyph labeled U+0000 is the no-definition glyph. If there is none, a zero-width glyph is used instead. This glyph is *never* the first glyph, since the first 103 glyphs are prescribed.

--
John Cowan  http://www.ccil.org/~cowan  <[EMAIL PROTECTED]>
"Any legal document draws most of its meaning from context. A telegram that says 'SELL HUNDRED THOUSAND SHARES IBM SHORT' (only 190 bits in 5-bit Baudot code plus appropriate headers) is as good a legal document as any, even sans digital signature." --me
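
The fallback order John Cowan describes can be sketched roughly as follows. This is only an illustration: the dictionary-of-rows font model, the name `lookup_glyph`, and the toy glyph shapes are all hypothetical and are not FIGlet's real data structures. It also assumes that the no-definition glyph is the one stored under code 0.

```python
# Hypothetical model of a FIGfont: character code -> list of text rows.
# Not FIGlet's actual representation, only a sketch of the fallback order.
HEIGHT = 2  # rows per glyph in this toy font


def lookup_glyph(font, code_point, height=HEIGHT):
    """Return the rows for code_point, falling back as the message describes."""
    if code_point in font:
        return font[code_point]
    if 0 in font:
        # Assumption: the no-definition glyph is stored under code 0.
        return font[0]
    # No no-definition glyph either: use a zero-width glyph (empty rows).
    return [""] * height


toy_font = {
    0x0000: ["[]", "[]"],   # no-definition glyph
    0x0041: ["/\\", "--"],  # crude capital A
}

print(lookup_glyph(toy_font, 0x0041))                    # the A rows
print(lookup_glyph(toy_font, 0x05D0))                    # the no-definition rows
print(lookup_glyph({0x0041: ["/\\", "--"]}, 0x05D0))     # zero-width rows
```

Keeping the zero-width case as a list of empty rows means a renderer that concatenates rows glyph by glyph needs no special case for missing characters.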
Re: "Missing character" glyph- example
Peter Constable wrote,

> ... For instance, in Times New Roman, Arial, Tahoma and even
> James' own Code2000, the first entry in the cmap is for U+0020:

Please note that the first entry in the cmap covers Glyph ID 3. Glyph IDs 0, 1, and 2 don't need to be covered by cmap, as they are constants which are supposed to be handled by default.

Glyph ID Zero is the first glyph in every font (TTF/OTF).

Zero = Null ---> this is the glyph used for any code point not covered by the font, that is to say not included in the cmap (character map).

Unfortunately, entering "&#0000;" in a web page will only display the string ampersand, number sign, zero, zero, zero, zero, semi-colon.

John Hudson wrote,

> If, by 'missing glyph', you mean the .notdef glyph it should indeed be the
> first glyph in the repertoire (but alas, may not be due to bad font tools),

Bad font tools may allow a designer to place a LATIN CAPITAL LETTER A glyph first in the font. By definition, in that bad font, LATIN CAPITAL LETTER A would be used for 'missing glyph'.

A good font tool should allow a designer to draw their interpretation of the 'missing glyph', though. Some designers use their own logo as 'missing glyph', and a designer with a wicked sense of humour and a poor sense of perspective might even make the 'missing glyph' look just like LATIN CAPITAL LETTER A.

> but it should *not* be encoded as U+0000 or as any other codepoint. .notdef
> should be unencoded.
>
> The first four glyphs in a font should be:
>
> .notdef (unencoded, symbolic glyph signifying missing glyph)
> .null (sometimes called NUL or NULL, U+0000, usually zero-width sans
> outline)
> CR (U+000D, usually zero-width sans serif)
> space (U+0020, often double-mapped to U+00A0)
>

(Smile) What is the difference between a zero-width sans serif glyph and a zero-width serif glyph?

Seriously, aside from the typo, John Hudson is essentially correct. The conventions John mentions were originally part of the Macintosh character set.

PostScript names "notdef", ".null", and "CR" in the older TTF specs have no Unicode value assigned at all. Assigning 0x0 to .null and 0xd to CR were originally Macintosh conventions. Indeed, these hex numbers are called "US Macintosh character code for glyph" in the old TTF specs.

Even though notdef, .null, and CR were not part of either the UGL character set or the US Win31 character set, they are included in the WGL4 character set. "notdef", ".null", and "CR" are all unencoded.

I've always considered "notdef" and ".null" to be semantically equal. Technically, though, this is incorrect.

Best regards,

James Kass.
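
As a rough illustration of the lookup described in this message, the sketch below models a font as a glyph order plus a cmap whose first entry covers Glyph ID 3. The names `GLYPH_ORDER`, `CMAP`, `glyph_id`, and `glyph_names` are hypothetical, the glyph name "exclam" is an assumption for the glyph at index 4, and the data is a toy rather than a dump of any real font.

```python
# Toy glyph order following the convention discussed in this thread:
# glyph IDs 0-2 are the unencoded .notdef, .null and CR.
GLYPH_ORDER = [".notdef", ".null", "CR", "space", "exclam"]

# Toy cmap (code point -> glyph ID). Its first entry covers glyph ID 3;
# glyph IDs 0, 1 and 2 are deliberately left out.
CMAP = {0x0020: 3, 0x0021: 4}


def glyph_id(code_point, cmap=CMAP):
    """Glyph ID for a code point; anything not in the cmap gets glyph 0."""
    return cmap.get(code_point, 0)


def glyph_names(text):
    """Names of the glyphs that would be used to draw each character."""
    return [GLYPH_ORDER[glyph_id(ord(ch))] for ch in text]


# HEBREW LETTER ALEF is not in the toy cmap, so it falls back to glyph 0.
print(glyph_names("! \u05d0"))  # ['exclam', 'space', '.notdef']
```

Nothing in the cmap points at glyph 0; the fallback is a property of the lookup, which is why the missing glyph cannot be requested by typing any particular character.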
Re: "Missing character" glyph- example
John Hudson wrote:

> but it should *not* be encoded as U+0000 or as any other codepoint.
> .notdef should be unencoded.

Almost. OpenType specifies that there is no functional difference between a code point that is not mapped and a code point that is explicitly mapped to GID 0, so there is never a need to map any code point to GID 0. But at the same time, there is no prohibition against explicitly mapping a code point to GID 0.

Eric.
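
One way to see why an explicit mapping to GID 0 is never needed: removing such entries from a cmap leaves every lookup unchanged. The sketch below assumes a cmap can be modelled as a plain dictionary from code point to glyph ID; `resolve`, `strip_gid0_entries`, and the glyph IDs used are hypothetical and chosen only for illustration.

```python
def resolve(cmap, code_point):
    """Glyph ID for a code point; unmapped code points resolve to GID 0."""
    return cmap.get(code_point, 0)


def strip_gid0_entries(cmap):
    """Drop entries that explicitly map a code point to GID 0."""
    return {cp: gid for cp, gid in cmap.items() if gid != 0}


# Toy cmap with one explicit (and redundant) mapping to GID 0.
explicit = {0x0000: 0, 0x0020: 3, 0x0041: 36}
implicit = strip_gid0_entries(explicit)

# Every code point in the Unicode range resolves identically in both.
identical = all(
    resolve(explicit, cp) == resolve(implicit, cp) for cp in range(0x110000)
)
print(implicit)
print(identical)
```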
Re: "Missing character" glyph- example
At 01:42 PM 01-08-02, [EMAIL PROTECTED] wrote:

>I think James is mistaken on this point: the missing glyph *is* the first
>glyph in any TTF, but it is *not* necessarily (probably not typically)
>mapped from U+0000. For instance, in Times New Roman, Arial, Tahoma and
>even James' own Code2000, the first entry in the cmap is for U+0020:

If, by 'missing glyph', you mean the .notdef glyph it should indeed be the first glyph in the repertoire (but alas, may not be due to bad font tools), but it should *not* be encoded as U+0000 or as any other codepoint. .notdef should be unencoded.

The first four glyphs in a font should be:

.notdef (unencoded, symbolic glyph signifying missing glyph)
.null (sometimes called NUL or NULL, U+0000, usually zero-width sans outline)
CR (U+000D, usually zero-width sans serif)
space (U+0020, often double-mapped to U+00A0)

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC  [EMAIL PROTECTED]

Language must belong to the Other -- to my linguistic community as a whole -- before it can belong to me, so that the self comes to its unique articulation in a medium which is always at some level indifferent to it.
- Terry Eagleton
Re: "Missing character" glyph- example
On 08/01/2002 02:34:17 PM Kenneth Whistler wrote:

>But if you insist on having a code point to stick directly in
>a sentence like that above, I'd take the cue from James Kass:
>
>> The missing glyph is the first glyph in any font. This is mapped to
>> U+0000 and the system correctly substitutes the glyph mapped to
>> U+0000 any time a font being used lacks an outline for a called
>> character.

I think James is mistaken on this point: the missing glyph *is* the first glyph in any TTF, but it is *not* necessarily (probably not typically) mapped from U+0000. For instance, in Times New Roman, Arial, Tahoma and even James' own Code2000, the first entry in the cmap is for U+0020:

    ; TrueType v1.0 Dump Program - v1.60, Jul 10 1995, rrt, dra, gch, ddb, lcp
    ; Copyright (C) 1991 ZSoft Corporation. All rights reserved.
    ; Portions Copyright (C) 1991-1995 Microsoft Corporation. All rights reserved.
    ; Dumping file 'code2000.ttf'

    [snip]

    Which Means:
       1. Char 0020 -> Index 3
          Char 0021 -> Index 4

    [snip]

On the other hand, not being explicitly mapped from a character means that it is effectively implicitly mapped from a character. So,

>Thus, you have a reasonably good chance that if you try to
>purposefully display the character U+0000, you will get the
>missing glyph for the font in use. (Unless the application is
>filtering out NULL characters.)

is probably valid.

- Peter

---
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>
Re: "Missing character" glyph- example
> As a clarification, here is a sample web page:
>
> http://www.cardbox.com/missing.htm
>
> The requirement is to be able to display the first paragraph of the
> page in such a way that it makes sense in its reference to the text
> on the rest of the page.
>
> The character after the word "this:" in the first paragraph cannot
> be reliably represented by any existing Unicode character.
>
> Nevertheless, I believe it is legitimate to want to say what the
> first paragraph says.

Well, I would put it differently, if it were my web page. Rather than:

    If any of the following text contains characters such as this: {blort}
    then please change to a different font, or download a more recent
    version of your current font.

I would suggest something more along the line of:

    If you have trouble displaying any of the characters in
    the text on this page, please consult
    Troubleshooting Display Problems.

Then the troubleshooting page could provide a nice explanation of the problem, show several neatly formatted *graphics* of the kind of nondisplayable glyph issues (with alternate forms picked from various fonts) that a user might run into, and then give helpful links to actual font resources that would help, or in the case of specialized data, actually provide a usable font directly.

Such an approach:

A. Avoids font-specific circularity in your attempt to explain to a user what is going on when the display is broken.

B. Provides much more useful information that will actually have a better chance of helping the user get by the problem. Also, since the problem(s) may not only be some nondisplayable glyphs, the approach is extensible for whatever display help is needed.

C. Doesn't depend on dubious assignments of a code point in Unicode for a confusing (non-)use.

But if you insist on having a code point to stick directly in a sentence like that above, I'd take the cue from James Kass:

> The missing glyph is the first glyph in any font. This is mapped to
> U+0000 and the system correctly substitutes the glyph mapped to
> U+0000 any time a font being used lacks an outline for a called
> character.

Thus, you have a reasonably good chance that if you try to purposefully display the character U+0000, you will get the missing glyph for the font in use. (Unless the application is filtering out NULL characters.)

--Ken
Re: "Missing character" glyph- example
As a clarification, here is a sample web page:

http://www.cardbox.com/missing.htm

The requirement is to be able to display the first paragraph of the page in such a way that it makes sense in its reference to the text on the rest of the page.

The character after the word "this:" in the first paragraph cannot be reliably represented by any existing Unicode character.

Nevertheless, I believe it is legitimate to want to say what the first paragraph says.