Re: Revised proposal for Missing character glyph
Peter_Constable at sil dot org wrote:

> A couple of corrections. First, if an app supports only WM_CHAR and not also WM_UNICHAR, that does not imply that it uses a legacy encoding. If running on NT/2K/XP and registered as a wide (Unicode) app, the WM_CHAR messages will supply UTF-16 code units. If running on Win9x/Me and registered as an ANSI app, the WM_CHAR messages supply codepoints in some Windows codepage, but the app can still store text as Unicode if it takes the WM_CHAR data and immediately converts it.
>
> Secondly, the question of whether an app supports WM_UNICHAR in addition to WM_CHAR has no direct bearing on what it puts onto the clipboard -- the two are independent. If an app encodes text as Unicode, though, it is true that it would probably include Unicode-encoded plain text among the formats it copies to the clipboard.

Thanks for the corrections. I haven't actually played around with this very much, and I thought I understood more than I did about Unicode on the clipboard. (As I mentioned earlier, I should have said CF_TEXT and CF_UNICODETEXT rather than WM_CHAR and WM_UNICHAR.)

The collected knowledge on this list is a real treasure, very helpful and enlightening. Thanks.

-Doug Ewell
 Fullerton, California
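The "take the WM_CHAR data and immediately convert it" step can be illustrated with a minimal sketch. It assumes the ANSI codepage is Windows-1252; the function names are hypothetical, and a real Windows app would more likely call MultiByteToWideChar with the active codepage. Mapping undefined bytes to U+FFFD is a choice made for this sketch, not necessarily what Windows does.

```c
#include <stddef.h>
#include <stdint.h>

/* Windows-1252 bytes 0x80..0x9F; 0 marks bytes the codepage leaves undefined. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0x0000, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x0000, 0x017D, 0x0000,
    0x0000, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x0000, 0x017E, 0x0178
};

/* Convert one Windows-1252 byte to a UTF-16 code unit.
   Undefined bytes become U+FFFD (an assumption of this sketch). */
uint16_t cp1252_to_utf16(unsigned char byte)
{
    uint16_t u;
    if (byte < 0x80 || byte >= 0xA0)
        return byte;            /* same value as ISO 8859-1 and Unicode */
    u = cp1252_high[byte - 0x80];
    return u ? u : 0xFFFD;
}

/* Convert a whole buffer; out needs room for n code units. Returns n. */
size_t cp1252_buffer_to_utf16(const unsigned char *in, size_t n, uint16_t *out)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = cp1252_to_utf16(in[i]);
    return n;
}
```

Every Windows-1252 character lies in the BMP, so one byte always yields exactly one UTF-16 code unit and the output length equals the input length.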
Re: Revised proposal for Missing character glyph
Michael (michka) Kaplan wrote:

> Not sure how this could be generally possible to restrict, since WinNT/2K/XP/.Net all will transparently map CF_TEXT and CF_UNICODETEXT so that if one is put on the clipboard and the other is asked for, you will get it. Synthetic clipboard formats, etc...

However, you can enumerate the formats that an app actually put on the clipboard, rather than asking for a specific one. I happen to have a code snippet demonstrating that, which is attached.

--
David Hopwood [EMAIL PROTECTED]
Home page  PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a public key but refuse to specify why, it is because the private key has been seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WIN32_LEAN_AND_MEAN
#include <windows.h>

int main(int argc, char **argv);
BOOL testPaste(void);
BOOL pasteFormat(int format, const void *data, size_t size);

int main(int argc, char **argv)
{
    int retval = EXIT_SUCCESS;
    int format = 0;
    char name[20];
    HANDLE data;
    void *text;

    if (!OpenClipboard(NULL)) {
        printf("Could not open clipboard.\n");
        return EXIT_FAILURE;
    }

    /* Walk every format currently available on the clipboard. */
    while ((format = EnumClipboardFormats(format)) != 0) {
        printf("Format %d", format);
        /* Only registered (non-predefined) formats have a name. */
        if (GetClipboardFormatNameA(format, name, sizeof(name)-1)) {
            printf(" (%s)", name);
        }

        switch (format) {
          case CF_UNICODETEXT:
            printf(" (CF_UNICODETEXT)");
            data = GetClipboardData(format);
            /* GetClipboardData returns a handle; lock it to get a pointer. */
            if (data && (text = GlobalLock(data)) != NULL) {
                /* With the Microsoft CRT, %s in wprintf expects a wide string. */
                wprintf(L" = \"%s\"", (wchar_t *) text);
                GlobalUnlock(data);
            }
            break;

          case CF_TEXT:
            printf(" (CF_TEXT)");
            data = GetClipboardData(format);
            if (data && (text = GlobalLock(data)) != NULL) {
                printf(" = \"%s\"", (char *) text);
                GlobalUnlock(data);
            }
            break;

          case CF_OEMTEXT:
            printf(" (CF_OEMTEXT)");
            data = GetClipboardData(format);
            if (data && (text = GlobalLock(data)) != NULL) {
                printf(" = \"%s\"", (char *) text);
                GlobalUnlock(data);
            }
            break;

          case CF_LOCALE:
            printf(" (CF_LOCALE)");
            break;
        }
        printf("\n");
    }
    /* EnumClipboardFormats returns 0 both at the end of the list and on error. */
    if (GetLastError() != NO_ERROR) {
        printf("Error enumerating clipboard formats.\n");
        retval = EXIT_FAILURE;
    }

    if (!testPaste()) {
        retval = EXIT_FAILURE;
    }

    if (!CloseClipboard()) {
        printf("Could not close clipboard.\n");
        retval = EXIT_FAILURE;
    }
    return retval;
}

BOOL testPaste(void)
{
    wchar_t pastew[] = L"hello";
    char pastea[] = "ascii";

    if (!EmptyClipboard()) {
        printf("Could not empty clipboard.\n");
        return FALSE;
    }
    if (!pasteFormat(CF_UNICODETEXT, pastew, sizeof(pastew))) {
        printf("Could not paste Unicode text.\n");
        return FALSE;
    }
    if (!pasteFormat(CF_TEXT, pastea, sizeof(pastea))) {
        printf("Could not paste MBCS text.\n");
        return FALSE;
    }
    if (!pasteFormat(CF_OEMTEXT, pastea, sizeof(pastea))) {
        printf("Could not paste OEM text.\n");
        return FALSE;
    }
    return TRUE;
}

BOOL pasteFormat(int format, const void *data, size_t size)
{
    HANDLE handle;
    void *buf;

    if (!(handle = GlobalAlloc(GMEM_MOVEABLE | GMEM_DDESHARE, size))) {
        return FALSE;
    }
    if (!(buf = GlobalLock(handle))) {
        GlobalFree(handle);
        return FALSE;
    }
    memcpy(buf, data, size);
    GlobalUnlock(handle);   /* unlock takes the handle, not the locked pointer */

    /* On success the clipboard owns the memory; free it only on failure. */
    if (!SetClipboardData(format, handle)) {
        GlobalFree(handle);
        return FALSE;
    }
    return TRUE;
}
Re: Revised proposal for Missing character glyph
Peter_Constable at sil dot org wrote:

> Something that wouldn't be difficult would be an item that copied data to the clipboard, and then displayed character info based on the clipboard content.

Hmm, an interesting thought. I would be willing to write a mini-tool like this, if enough people let me know (on- or off-line) that it would be useful to them, and provide some suggestions for output formats.

> Of course, one limitation is that apps can alter the data before they put it on the clipboard; in fact, an app might opt to convert everything to some default codepage and put only that on the clipboard.

It would make sense for a Unicode-specific tool such as this to only accept data in WM_UNICHAR format, not WM_CHAR. Unicode data in WM_CHAR format is pretty much guaranteed to have gone through some conversion step.

-Doug Ewell
 Fullerton, California
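As one possible output format for such a mini-tool, here is a minimal sketch that turns a UTF-16 buffer (such as clipboard text would be once fetched as Unicode) into a list of U+XXXX code points. The function names are hypothetical, and fetching the buffer from the clipboard is left out so the code stays portable.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Decode the code point starting at s[*i] in a UTF-16 buffer of n code
   units and advance *i. Unpaired surrogates are returned unchanged so the
   tool can report them rather than hide them. */
uint32_t next_code_point(const uint16_t *s, size_t n, size_t *i)
{
    uint32_t hi = s[(*i)++];
    if (hi >= 0xD800 && hi <= 0xDBFF && *i < n) {
        uint32_t lo = s[*i];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            (*i)++;
            return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
        }
    }
    return hi;
}

/* Write "U+XXXX U+XXXX ..." into out and return the number of code points
   written. Stops cleanly when the output buffer is full. */
size_t describe_utf16(const uint16_t *s, size_t n, char *out, size_t outsize)
{
    size_t i = 0, count = 0, used = 0;
    if (outsize == 0)
        return 0;
    out[0] = '\0';
    while (i < n) {
        uint32_t cp = next_code_point(s, n, &i);
        int w = snprintf(out + used, outsize - used, "%sU+%04lX",
                         count ? " " : "", (unsigned long)cp);
        if (w < 0 || (size_t)w >= outsize - used) {
            out[used] = '\0';   /* drop the partially written entry */
            break;
        }
        used += (size_t)w;
        count++;
    }
    return count;
}
```

A fuller tool would look up the character name and properties for each code point; that needs the Unicode Character Database and is outside this sketch.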
Re: Revised proposal for Missing character glyph
On 08/28/2002 05:38:05 PM Doug Ewell wrote:

> Edit controls (edit boxes, text widgets) in Windows already come equipped with a right-click menu... It's not hard to imagine that menu being extended with a "Character Info" or "What's This Glyph?" item...
>
> Of course, I have no idea if such a thing will ever be added to Windows (or any other OS). I'm sure it's not as simple to implement as I'm making it sound.

Something that wouldn't be difficult would be an item that copied data to the clipboard, and then displayed character info based on the clipboard content. Of course, one limitation is that apps can alter the data before they put it on the clipboard; in fact, an app might opt to convert everything to some default codepage and put only that on the clipboard.

- Peter

---
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]
Re: Revised proposal for Missing character glyph
Kenneth Whistler wrote the following at 2:01 PM on Mon, Aug 26, 2002:

> And an approach which strikes me as a much more useful and extensible way to deal with this would be the concept of a "What's This?" text accessory. Essentially a small tool that a user could select a piece of text with (think of it like a little magnifying glass, if you will), which will then pop up the contents selected, deconstructed into its character sequence explicitly.

Good idea - the big attraction being extensibility. But a detraction is that it would typically mean multiple, or at least explicit, deployment at the application level on any given platform. (I'm presuming such a system service would present an optional API to application developers, who may or may not be using higher level system services for rendering text).

But a font-based approach, being lower level, would be inherited by all software including that which bypasses all but the lowest level system services - there's nothing for application developers to do in such a scenario.

Seems like it would be nice to have both solutions.

Respectfully,

Dean A. Snyder
Scholarly Technology Specialist
Center For Scholarly Resources, Sheridan Libraries
Garrett Room, MSE Library, 3400 N. Charles St.
The Johns Hopkins University
Baltimore, Maryland, USA 21218
office: 410 516-6850 mobile: 410 245-7168 fax: 410-516-6229
Digital Hammurabi: www.jhu.edu/digitalhammurabi
Initiative for Cuneiform Encoding: www.jhu.edu/ice
Re: Revised proposal for Missing character glyph
Dean Snyder dean dot snyder at jhu dot edu wrote:

> Good idea - the big attraction being extensibility. But a detraction is that it would typically mean multiple, or at least explicit, deployment at the application level on any given platform. (I'm presuming such a system service would present an optional API to application developers, who may or may not be using higher level system services for rendering text).
>
> But a font-based approach, being lower level, would be inherited by all software including that which bypasses all but the lowest level system services - there's nothing for application developers to do in such a scenario.

The ability to pinpoint individual glyphs and get code point and other information could be provided as a system service.

Edit controls (edit boxes, text widgets) in Windows already come equipped with a right-click menu that allows the user to cut, copy, paste, and select all. With Windows 2000 (I don't know about NT 4) there are also Unicode-specific options, such as "Right-to-left reading order" and "Insert Unicode control character" (which leads to a submenu where you can choose exciting options like IAFS and NADS, at least until somebody catches you and calls the police).

It's not hard to imagine that menu being extended with a "Character Info" or "What's This Glyph?" item, which would display a Help cursor (question mark + arrow pointing NNW). The user could click on a glyph within the edit control, and the system would display all the relevant information about the character corresponding to that glyph in a small ToolTip™-style window.

Of course, I have no idea if such a thing will ever be added to Windows (or any other OS). I'm sure it's not as simple to implement as I'm making it sound. But the advantage would be the same as what Dean envisions for a font-based solution -- applications would get the support for free, instead of having to re-implement it in multiple, slightly different ways.

-Doug Ewell
 Fullerton, California
Re: Revised proposal for Missing character glyph
Doug Ewell wrote the following at 8:38 AM on Wed, Aug 28, 2002:

> But the advantage would be the same as what Dean envisions for a font-based solution -- applications would get the support for free, instead of having to re-implement it in multiple, slightly different ways.

I don't believe so. Such a system service would have to have access to the target text to do its work. And if the target text is not known implicitly by the system (because an application is not using higher level system text services such as your edit boxes and text widgets, and a lot of applications do NOT use this stuff exclusively), then the target text must be provided explicitly by the application.

But this would not be the case for the font-based approach, because there are extremely few applications I am aware of that bypass the system's actual rendering of font glyphs (only Adobe's ATM comes to mind).

Respectfully,

Dean A. Snyder
Scholarly Technology Specialist
Center For Scholarly Resources, Sheridan Libraries
Garrett Room, MSE Library, 3400 N. Charles St.
The Johns Hopkins University
Baltimore, Maryland, USA 21218
office: 410 516-6850 mobile: 410 245-7168 fax: 410-516-6229
Digital Hammurabi: www.jhu.edu/digitalhammurabi
Initiative for Cuneiform Encoding: www.jhu.edu/ice
Re: Revised proposal for Missing character glyph
[Resend of a response which got eaten by the Unicode email during the system maintenance last week. Carl already responded to me on this, but others may not have seen what he was responding to. --Ken]

> Proposed unknown and missing character representation. This would be an alternate to the method currently described in 5.3.
>
> The missing or unknown character would be represented as a series of vertical hex digit pairs for each byte of the character.

The problem I have with this is that it seems to be an overengineered approach that conflates two issues:

a. What does a font do when requested to display a character (or sequence) for which it has no glyph?

b. What does a user do to diagnose text content that may be causing a rendering failure?

For the first problem, we already have a widespread approach that seems adequate. And other correspondents on this topic have pointed out that the particular approach of displaying hex numbers for characters may pose technical difficulties for at least some font technologies.

[snip]

> This representation would be recognized by untrained people as unrenderable data or garbage. So it would serve the same function as a missing glyph character except that it would be different from normal glyphs so that they would know that something was wrong and the text did not just happen to have funny characters.

I don't see any particular problem in training people to recognize when they are seeing their fonts' notdef glyphs. The whole concept of seeing little boxes where the characters should be is not hard to explain to people -- even to people who otherwise have difficulty with a lot of computer abstractions.

Things will be better-behaved when applications finally get past the related but worse problem of screwing up the character encodings -- which results in the more typical misdisplay: lots of recognizable glyphs, but randomly arranged into nonsensical junk. (Ah, yes, that must be another piece of Korean spam mail in my mail tray.)

> It would aid people in finding the problem and for people with Unicode books the text would be decipherable. If the information was truly critical they could have the text deciphered.

Rather than trying to engineer a questionable solution into the fonts, I'd like to step back and ask what would better serve the user in such circumstances.

And an approach which strikes me as a much more useful and extensible way to deal with this would be the concept of a "What's This?" text accessory. Essentially a small tool that a user could select a piece of text with (think of it like a little magnifying glass, if you will), which will then pop up the contents selected, deconstructed into its character sequence explicitly. Limited versions of such things exist already -- such as the tooltip-like popup windows for Asmus' Unibook program, which give attribute information for characters in the code chart. But I'm thinking of something a little more generic, associated with textedit/richedit type text editing areas (or associated with general word processing programs).

The reason why such an approach is more extensible is that it is not merely focussed on the nondisplayable character glyph issue, but rather represents a general ability to query text, whether normally displayable or not. I could query a black box notdef glyph to find out what in the text caused its display; but I could just as well query a properly displayed Telugu glyph, for example, to find out what it was, as well.

This is comparable (although more point-oriented) to the concept of giving people a source display for HTML, so they can figure out what in the markup is causing rendering problems for their rich text content.

[snip]

> This proposal would provide a standardized approach that vendors could adopt to clarify missing character rendering and reduce support costs. By including this in the standard we could provide a cross vendor approach. This would provide a consistent solution.

In my opinion, the standard already provides a description of a cross-vendor approach to the notdef glyph problem, with the advantage that it is the de facto, widely adopted approach as well. As long as font vendors stay away from making {p}'s and {q}'s their notdef glyphs, as I think we can safely presume they will, and instead use variants on the themes of hollowed or filled boxes, then the problem of *recognition* of the notdef glyphs for what they are is a pretty marginal problem.

And as for how to provide users better diagnostics for figuring out the content of undisplayable text, I suppose the standard could suggest some implementation guidelines there, but this might be a better area to just leave up to competing implementation practice until certain user interface models catch on and get widespread acceptance.

--Ken
RE: Revised proposal for Missing character glyph
William,

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of William Overington
Sent: Friday, August 23, 2002 12:55 AM
To: James Kass; Carl W. Brown; Unicode List
Cc: [EMAIL PROTECTED]
Subject: Re: Revised proposal for Missing character glyph

> James Kass wrote as follows.
>
> quote
>
> For non-BMP, how about a double tall glyph at the left as the plane signifier?

A double-high number or letter will look like a standard letter that will just be narrower, unless you are displaying text in a narrow font. In that case it will look like a separate character... This will be very confusing.

Besides, I don't like mixing bases any more than using octal to represent 8-bit bytes. It was confusing to use base 4, base 8, base 8, base 4, base 8, base 8 etc.

How will you display the rest of the data? Will you use 65536 glyphs? That is a monster font. Better would be to use the top 4 bits of the low order 2 bytes, then the bottom 4 bits of the same bytes. In any case you are going to a lot of trouble to avoid vertical hex, which is the simple solution.

Remember keep it stupid, simple.

Carl
Re: Revised proposal for Missing character glyph
Kenneth Whistler scripsit:

> Things will be better-behaved when applications finally get past the related but worse problem of screwing up the character encodings -- which results in the more typical misdisplay: lots of recognizable glyphs, but randomly arranged into nonsensical junk. (Ah, yes, that must be another piece of Korean spam mail in my mail tray.)

In the old days, experts could detect mismatched serial-line connections based on the nature of the baud barf that the remote system emitted.

Nowadays, experts can detect mismatched character sets from the nature of the byte barf that appears on their screen.

--
John Cowan [EMAIL PROTECTED]
You need a change: try Canada
You need a change: try China
 --fortune cookies opened by a couple that I know
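The "byte barf" from a mismatched character set is deterministic, which is why it can be recognized. A minimal sketch, assuming the common case of UTF-8 text misread as ISO 8859-1; the function names are hypothetical and the encoder assumes a valid scalar value (it does not reject surrogates or values above U+10FFFF).

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8; returns the number of bytes written. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Misread bytes as ISO 8859-1: every byte becomes the code point with the
   same numeric value, so one character turns into two to four. */
size_t misread_as_latin1(const unsigned char *bytes, size_t n, uint32_t *out)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = bytes[i];
    return n;
}
```

For example, U+00E9 (e with acute) encodes as the bytes C3 A9, which a Latin-1 reader displays as U+00C3 followed by U+00A9. Each lead byte has a fixed misreading, so the source script tends to show up as a recognizable run of the same few accented letters.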
RE: Revised proposal for Missing character glyph
Ken,

The little square boxes do not help much if you want to know exactly what the missing characters are. I do however feel that any solution to the problems should be Unicode based. If left to the vendors, they may display the code page characters and you are guessing again.

The tool idea is great but I do not see how it could be embedded in the OS without changing the application. It will also require user training.

I think that as we move away from code page text we will find that the next big problem will be characters that are missing from the font or sets of fonts. The trick will be to change the set of fonts. This might require trial and error if we do not have good diagnostic tools.

Implementing this change will probably be easier than using the special symbols for the script, which will also require special handling and may not catch all errors.

This approach will also allow critical text that cannot be redisplayed to be deciphered. This has been a pet peeve of mine, having used the Fujitsu Shift JIS solution and seen it work in a real live situation.

Carl
Re: Revised proposal for Missing character glyph
At 09:49 PM 8/26/2002 -0400, John Cowan wrote:

> Nowadays, experts can detect mismatched character sets from the nature of the byte barf that appears on their screen.

And super-experts can read languages in byte barf as it is not random!

Barry Caplan
http://www.i18n.com
Re: Revised proposal for Missing character glyph
Carl W. Brown wrote:

> Proposed unknown and missing character representation. This would be an alternate to method currently described in 5.3.
>
> The missing or unknown character would be represented as a series of vertical hex digit pairs for each byte of the character.

Why vertical? Hexadecimal is almost invariably written left-to-right, top-to-bottom, and that's the order I would expect.

> Garbage data with non-zero bits 24-31 may require 8 digits or 4 pairs of digits.

I thought this proposal was intended for characters that cannot be rendered by a font, not ill-formed encodings?

--
David Hopwood [EMAIL PROTECTED]
Home page  PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a public key but refuse to specify why, it is because the private key has been seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip
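To make the layout under discussion concrete, here is a minimal sketch of one reading of "vertical hex digit pairs": each byte of the value becomes one column, with its high hex digit stacked above its low hex digit. This interpretation, the two-pair minimum for BMP values, and the function names are assumptions of the sketch, not details confirmed by the proposal.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of byte columns needed: 2 for a BMP code point, 3 for the
   supplementary planes, 4 when bits 24-31 are set (garbage data). */
size_t hex_pair_count(uint32_t value)
{
    if (value > 0xFFFFFFUL) return 4;
    if (value > 0xFFFFUL)   return 3;
    return 2;
}

/* Fill top[] with the high hex digit of each byte and bottom[] with the low
   digit, most significant byte first. Both buffers need room for 5 chars.
   Returns the number of columns. */
size_t vertical_hex(uint32_t value, char *top, char *bottom)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t n = hex_pair_count(value);
    size_t k;
    for (k = 0; k < n; k++) {
        unsigned byte = (unsigned)((value >> (8 * (n - 1 - k))) & 0xFF);
        top[k]    = digits[byte >> 4];
        bottom[k] = digits[byte & 0x0F];
    }
    top[n] = '\0';
    bottom[n] = '\0';
    return n;
}
```

Under this reading U+0C05 has the bytes 0C and 05, giving a top row of "00" and a bottom row of "C5". Read row by row it no longer spells 0C05, which is the left-to-right expectation the question above is pointing at.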
RE: Revised proposal for Missing character glyph
Ken,

This is an alternate to representing bad glyphs with a missing glyph character. People can implement either.

-Original Message-
From: Kenneth Whistler [mailto:[EMAIL PROTECTED]]
Sent: Friday, August 16, 2002 2:28 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: Revised proposal for Missing character glyph

> > Proposed unknown and missing character representation. This would be an alternate to the method currently described in 5.3.
> >
> > The missing or unknown character would be represented as a series of vertical hex digit pairs for each byte of the character.
>
> The problem I have with this is that it seems to be an overengineered approach that conflates two issues:
>
> a. What does a font do when requested to display a character (or sequence) for which it has no glyph?
>
> b. What does a user do to diagnose text content that may be causing a rendering failure?
>
> For the first problem, we already have a widespread approach that seems adequate. And other correspondents on this topic have pointed out that the particular approach of displaying hex numbers for characters may pose technical difficulties for at least some font technologies.

Because proportional fonts require font metrics processing, the process must be able to determine if a character cannot be rendered. The logic can be changed to use a special font with 257 glyphs to produce these characters. Thus it should be possible to incorporate this into the operating system code rather than each application. It would be best to put it in OpenType or equivalent code, but not all systems have this type of code. ICU's layout code would also be a good place. Systems limited to monospaced fonts will have problems implementing this.

> > This representation would be recognized by untrained people as unrenderable data or garbage. So it would serve the same function as a missing glyph character except that it would be different from normal glyphs so that they would know that something was wrong and the text did not just happen to have funny characters.
>
> I don't see any particular problem in training people to recognize when they are seeing their fonts' notdef glyphs. The whole concept of seeing little boxes where the characters should be is not hard to explain to people -- even to people who otherwise have difficulty with a lot of computer abstractions.
>
> Things will be better-behaved when applications finally get past the related but worse problem of screwing up the character encodings -- which results in the more typical misdisplay: lots of recognizable glyphs, but randomly arranged into nonsensical junk. (Ah, yes, that must be another piece of Korean spam mail in my mail tray.)

Unicode text will do more to fix character encoding problems. Then the problem will be either truly bad characters or font problems. Many systems have difficulties handling sets of fonts each covering a portion of the character range. This would provide an indication of which scripts were missing. Yes, you could use the suggested script id glyphs, but that would require special processing that would be as difficult as this to implement.

> > It would aid people in finding the problem and for people with Unicode books the text would be decipherable. If the information was truly critical they could have the text deciphered.
>
> Rather than trying to engineer a questionable solution into the fonts, I'd like to step back and ask what would better serve the user in such circumstances.
>
> And an approach which strikes me as a much more useful and extensible way to deal with this would be the concept of a "What's This?" text accessory. Essentially a small tool that a user could select a piece of text with (think of it like a little magnifying glass, if you will), which will then pop up the contents selected, deconstructed into its character sequence explicitly. Limited versions of such things exist already -- such as the tooltip-like popup windows for Asmus' Unibook program, which give attribute information for characters in the code chart. But I'm thinking of something a little more generic, associated with textedit/richedit type text editing areas (or associated with general word processing programs).
>
> The reason why such an approach is more extensible is that it is not merely focussed on the nondisplayable character glyph issue, but rather represents a general ability to query text, whether normally displayable or not. I could query a black box notdef glyph to find out what in the text caused its display; but I could just as well query a properly displayed Telugu glyph, for example, to find out what it was, as well.
>
> This is comparable (although more point-oriented) to the concept of giving people a source display for HTML, so they can figure out what in the markup is causing rendering problems for their rich text content.

Text query will require that each