Re: Revised proposal for Missing character glyph

2002-09-01 Thread Doug Ewell

Peter_Constable at sil dot org wrote:

 A couple of corrections. First, if an app supports only WM_CHAR and
 not also WM_UNICHAR, that does not imply that it uses a legacy
 encoding. If running on NT/2K/XP and registered as a wide (Unicode)
 app, the WM_CHAR messages will supply UTF-16 code units. If running
 on Win9x/Me and registered as an ANSI app, the WM_CHAR messages
 supply codepoints in some Windows codepage, but the app can still
 store text as Unicode if it takes the WM_CHAR data and immediately
 converts it.

 Secondly, the question of whether an app supports WM_UNICHAR in
 addition to WM_CHAR has no direct bearing on what it puts onto the
 clipboard -- the two are independent. If an app encodes text as
 Unicode, though, it is true that it would probably include Unicode-
 encoded plain text among the formats it copies to the clipboard.
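
To illustrate the first point: a wide (Unicode) window receives UTF-16 code units through WM_CHAR, so a character outside the BMP arrives as two code units, a high surrogate followed by a low surrogate. Below is a minimal, portable sketch of recombining them. The names `utf16_combine` and `UTF16_PENDING` and the "pending" state variable are illustrative assumptions, not part of any Windows API.

```c
#include <stdint.h>

/* Sentinel meaning "no complete code point yet" (hypothetical name). */
#define UTF16_PENDING 0xFFFFFFFFu

/*
 * Feed one UTF-16 code unit at a time, in the order a message handler
 * would receive them.  *pending holds a saved high surrogate, or 0 when
 * there is none.  Returns a complete code point, or UTF16_PENDING when
 * more input is needed.  A lone low surrogate is reported as U+FFFD; a
 * high surrogate that is not followed by a low one is silently dropped.
 */
uint32_t utf16_combine(uint16_t unit, uint16_t *pending)
{
    if (*pending != 0) {
        uint16_t high = *pending;
        *pending = 0;
        if (unit >= 0xDC00 && unit <= 0xDFFF) {
            return 0x10000u + (((uint32_t)(high - 0xD800) << 10)
                               | (uint32_t)(unit - 0xDC00));
        }
        /* Unpaired high surrogate: fall through and handle 'unit' alone. */
    }
    if (unit >= 0xD800 && unit <= 0xDBFF) {
        *pending = unit;            /* wait for the low surrogate */
        return UTF16_PENDING;
    }
    if (unit >= 0xDC00 && unit <= 0xDFFF) {
        return 0xFFFDu;             /* low surrogate with no high before it */
    }
    return unit;                    /* ordinary BMP code unit */
}
```

An application would keep one `pending` variable per input window, initialised to 0, and pass each incoming code unit through the function before storing the text.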

Thanks for the corrections.  I haven't actually played around with this
very much, and I thought I understood more than I did about Unicode on
the clipboard.  (As I mentioned earlier, I should have said CF_TEXT and
CF_UNICODETEXT rather than WM_CHAR and WM_UNICHAR.)

The collected knowledge on this list is a real treasure, very helpful
and enlightening.  Thanks.

-Doug Ewell
 Fullerton, California





Re: Revised proposal for Missing character glyph

2002-09-01 Thread David Hopwood

Michael (michka) Kaplan wrote:
 Not sure how this could be generally possible to restrict, since
 WinNT/2K/XP/.Net all will transparently map CF_TEXT and CF_UNICODETEXT so
 that if one is put on the clipboard and the other is asked for, you will get
 it. Synthetic clipboard formats, etc...

However, you can enumerate the formats that an app actually put on the
clipboard, rather than asking for a specific one. I happen to have a
code snippet demonstrating that, which is attached.

-- 
David Hopwood [EMAIL PROTECTED]

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip



#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WIN32_LEAN_AND_MEAN
#include <windows.h>

int main(int argc, char **argv);
BOOL testPaste(void);
BOOL pasteFormat(int format, const void *data, size_t size);

int main(int argc, char **argv) {
    int retval = EXIT_SUCCESS;
    int format = 0;
    char name[20];
    HANDLE data;
    void *text;

    if (!OpenClipboard(NULL)) {
        printf("Could not open clipboard.\n");
        return EXIT_FAILURE;
    }

    /* Enumerate the formats currently available on the clipboard. */
    while ((format = EnumClipboardFormats(format)) != 0) {
        printf("Format %d", format);
        if (GetClipboardFormatNameA(format, name, sizeof(name)-1)) {
            printf(" (%s)", name);
        }
        switch (format) {
          case CF_UNICODETEXT:
            printf(" (CF_UNICODETEXT)");
            data = GetClipboardData(format);
            /* Lock the handle to get a pointer; %ls prints a wide string. */
            if (data && (text = GlobalLock(data)) != NULL) {
                wprintf(L" = \"%ls\"", (const wchar_t *) text);
                GlobalUnlock(data);
            }
            break;

          case CF_TEXT:
            printf(" (CF_TEXT)");
            data = GetClipboardData(format);
            if (data && (text = GlobalLock(data)) != NULL) {
                printf(" = \"%s\"", (const char *) text);
                GlobalUnlock(data);
            }
            break;

          case CF_OEMTEXT:
            printf(" (CF_OEMTEXT)");
            data = GetClipboardData(format);
            if (data && (text = GlobalLock(data)) != NULL) {
                printf(" = \"%s\"", (const char *) text);
                GlobalUnlock(data);
            }
            break;

          case CF_LOCALE:
            printf(" (CF_LOCALE)");
            break;
        }
        printf("\n");
    }

    /* EnumClipboardFormats returns 0 both at the end of the list and on error. */
    if (GetLastError() != NO_ERROR) {
        printf("Error enumerating clipboard formats.\n");
        retval = EXIT_FAILURE;
    }

    if (!testPaste()) {
        retval = EXIT_FAILURE;
    }

    if (!CloseClipboard()) {
        printf("Could not close clipboard.\n");
        retval = EXIT_FAILURE;
    }
    return retval;
}

/*
 * Place the same text on the clipboard in three formats.
 * Note: the Win32 documentation says SetClipboardData can fail when the
 * clipboard was opened with a NULL window handle, as main() does here.
 */
BOOL testPaste(void) {
    wchar_t pastew[] = L"hello";
    char pastea[] = "ascii";

    if (!EmptyClipboard()) {
        printf("Could not empty clipboard.\n");
        return FALSE;
    }
    if (!pasteFormat(CF_UNICODETEXT, pastew, sizeof(pastew))) {
        printf("Could not paste Unicode text.\n");
        return FALSE;
    }
    if (!pasteFormat(CF_TEXT, pastea, sizeof(pastea))) {
        printf("Could not paste MBCS text.\n");
        return FALSE;
    }
    if (!pasteFormat(CF_OEMTEXT, pastea, sizeof(pastea))) {
        printf("Could not paste OEM text.\n");
        return FALSE;
    }
    return TRUE;
}

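BOOL pasteFormat(int format, const void *data, size_t size) {
    HANDLE handle;
    void *buf;

    /* Clipboard data must live in a moveable global memory block. */
    if (!(handle = GlobalAlloc(GMEM_MOVEABLE | GMEM_DDESHARE, size))) {
        return FALSE;
    }
    if (!(buf = GlobalLock(handle))) {
        GlobalFree(handle);
        return FALSE;
    }
    memcpy(buf, data, size);
    /* GlobalUnlock takes the handle, not the locked pointer. */
    GlobalUnlock(handle);

    /* On success the system owns the handle, so it is freed only on failure. */
    if (!SetClipboardData(format, handle)) {
        GlobalFree(handle);
        return FALSE;
    }
    return TRUE;
}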


Re: Revised proposal for Missing character glyph

2002-08-31 Thread Doug Ewell

Peter_Constable at sil dot org wrote:

 Something that wouldn't be difficult would be an item that copied data
 to the clipboard, and then displayed character info based on the
 clipboard content.

Hmm, an interesting thought.  I would be willing to write a mini-tool
like this, if enough people let me know (on- or off-line) that it would
be useful to them, and provide some suggestions for output formats.

 Of course, one limitation is that apps can alter
 the data before they put it on the clipboard; in fact, an app might
 opt to convert everything to some default codepage and put only that
 on the clipboard.

It would make sense for a Unicode-specific tool such as this to only
accept data in WM_UNICHAR format, not WM_CHAR.  Unicode data in WM_CHAR
format is pretty much guaranteed to have gone through some conversion
step.

-Doug Ewell
 Fullerton, California





Re: Revised proposal for Missing character glyph

2002-08-30 Thread Peter_Constable

On 08/28/2002 05:38:05 PM Doug Ewell wrote:

 Edit controls (edit boxes, text widgets) in Windows already come
 equipped with a right-click menu...

 It's not hard to imagine that menu being extended with a "Character
 Info" or "What's This Glyph?" item...

 Of course, I have no idea if such a thing will ever be added to Windows
 (or any other OS).  I'm sure it's not as simple to implement as I'm
 making it sound.

Something that wouldn't be difficult would be an item that copied data to 
the clipboard, and then displayed character info based on the clipboard 
content. Of course, one limitation is that apps can alter the data before 
they put it on the clipboard; in fact, an app might opt to convert 
everything to some default codepage and put only that on the clipboard.


- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]





Re: Revised proposal for Missing character glyph

2002-08-28 Thread Dean Snyder

Kenneth Whistler wrote the following at 2:01 PM on Mon, Aug 26, 2002:

 And an approach which strikes me as a much more useful and extensible
 way to deal with this would be the concept of a "What's This?"
 text accessory. Essentially a small tool that a user could select
 a piece of text with (think of it like a little magnifying glass,
 if you will), which will then pop up the contents selected, deconstructed
 into its character sequence explicitly.

Good idea - the big attraction being extensibility. But a detraction is
that it would typically mean multiple, or at least explicit, deployment
at the application level on any given platform. (I'm presuming such a
system service would present an optional API to application developers,
who may or may not be using higher level system services for rendering
text). But a font-based approach, being lower level, would be inherited
by all software including that which bypasses all but the lowest level
system services - there's nothing for application developers to do in
such a scenario.

Seems like it would be nice to have both solutions.


Respectfully,

Dean A. Snyder
Scholarly Technology Specialist
Center For Scholarly Resources, Sheridan Libraries
Garrett Room, MSE Library, 3400 N. Charles St.
The Johns Hopkins University
Baltimore, Maryland, USA 21218

office: 410 516-6850 mobile: 410 245-7168 fax: 410-516-6229
Digital Hammurabi: www.jhu.edu/digitalhammurabi
Initiative for Cuneiform Encoding: www.jhu.edu/ice






Re: Revised proposal for Missing character glyph

2002-08-28 Thread Doug Ewell

Dean Snyder <dean dot snyder at jhu dot edu> wrote:

 Good idea - the big attraction being extensibility. But a detraction
 is that it would typically mean multiple, or at least explicit,
 deployment at the application level on any given platform. (I'm
 presuming such a system service would present an optional API to
 application developers, who may or may not be using higher level
 system services for rendering text). But a font-based approach, being
 lower level, would be inherited by all software including that which
 bypasses all but the lowest level system services - there's nothing
 for application developers to do in such a scenario.

The ability to pinpoint individual glyphs and get code point and other
information could be provided as a system service.

Edit controls (edit boxes, text widgets) in Windows already come
equipped with a right-click menu that allows the user to cut, copy,
paste, and select all.  With Windows 2000 (I don't know about NT 4)
there are also Unicode-specific options, such as "Right-to-left reading
order" and "Insert Unicode control character" (which leads to a submenu
where you can choose exciting options like IAFS and NADS, at least until
somebody catches you and calls the police).

It's not hard to imagine that menu being extended with a "Character
Info" or "What's This Glyph?" item, which would display a Help cursor
(question mark + arrow pointing NNW).  The user could click on a glyph
within the edit control, and the system would display all the relevant
information about the character corresponding to that glyph in a small
ToolTip™-style window.

Of course, I have no idea if such a thing will ever be added to Windows
(or any other OS).  I'm sure it's not as simple to implement as I'm
making it sound.  But the advantage would be the same as what Dean
envisions for a font-based solution -- applications would get the
support for free, instead of having to re-implement it in multiple,
slightly different ways.

-Doug Ewell
 Fullerton, California





Re: Revised proposal for Missing character glyph

2002-08-28 Thread Dean Snyder

Doug Ewell wrote the following at 8:38 AM on Wed, Aug 28, 2002:

 But the advantage would be the same as what Dean
 envisions for a font-based solution -- applications would get the
 support for free, instead of having to re-implement it in multiple,
 slightly different ways.

I don't believe so.

Such a system service would have to have access to the target text to do
its work. And if the target text is not known implicitly by the system
(because an application is not using higher level system text services,
your "edit boxes, text widgets" -- and a lot of applications do NOT use
this stuff exclusively) then the target text must be provided explicitly
by the application.

But this would not be the case for the font-based approach, because there
are extremely few applications I am aware of that bypass the system's
actual rendering of font glyphs (only Adobe's ATM comes to mind).

Respectfully,

Dean A. Snyder
Scholarly Technology Specialist
Center For Scholarly Resources, Sheridan Libraries
Garrett Room, MSE Library, 3400 N. Charles St.
The Johns Hopkins University
Baltimore, Maryland, USA 21218

office: 410 516-6850 mobile: 410 245-7168 fax: 410-516-6229
Digital Hammurabi: www.jhu.edu/digitalhammurabi
Initiative for Cuneiform Encoding: www.jhu.edu/ice






Re: Revised proposal for Missing character glyph

2002-08-26 Thread Kenneth Whistler

[Resend of a response which got eaten by the Unicode email
during the system maintenance last week. Carl already responded
to me on this, but others may not have seen what he was
responding to. --Ken]


 Proposed unknown and missing character representation.  This would be an
 alternate to method currently described in 5.3.
 
 The missing or unknown character would be represented as a series of
 vertical hex digit pairs for each byte of the character.

The problem I have with this is that it seems to be an overengineered
approach that conflates two issues:

  a. What does a font do when requested to display a character
 (or sequence) for which it has no glyph.

  b. What does a user do to diagnose text content that may be
 causing a rendering failure.

For the first problem, we already have a widespread approach that
seems adequate. And other correspondents on this topic have pointed
out that the particular approach of displaying hex numbers for
characters may pose technical difficulties for at least some font
technologies. 

[snip]
 
 
 This representation would be recognized by untrained people as unrenderable
 data or garbage.  So it would serve the same function as a missing glyph
 character except that it would be different from normal glyphs so that they
 would know that something was wrong and the text did not just happen to have
 funny characters.

I don't see any particular problem in training people to recognize when
they are seeing their fonts' notdef glyphs. The whole concept of seeing
little boxes where the characters should be is not hard to explain to
people -- even to people who otherwise have difficulty with a lot of
computer abstractions.

Things will be better-behaved when applications finally get past the
related but worse problem of screwing up the character encodings --
which results in the more typical misdisplay: lots of recognizable 
glyphs, but randomly arranged into nonsensical junk. (Ah, yes, that
must be another piece of Korean spam mail in my mail tray.)

 
 It would aid people in finding the problem and for people with Unicode books
 the text would be decipherable.  If the information was truly critical they
 could have the text deciphered.

Rather than trying to engineer a questionable solution into the fonts,
I'd like to step back and ask what would better serve the user
in such circumstances.

And an approach which strikes me as a much more useful and extensible
way to deal with this would be the concept of a "What's This?"
text accessory. Essentially a small tool that a user could select
a piece of text with (think of it like a little magnifying glass,
if you will), which will then pop up the contents selected, deconstructed
into its character sequence explicitly. Limited versions of such things
exist already -- such as the tooltip-like popup windows for Asmus'
Unibook program, which give attribute information for characters
in the code chart. But I'm thinking of something a little more generic,
associated with textedit/richedit type text editing areas (or associated
with general word processing programs).
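
The core of such an accessory, deconstructing a selection into its explicit character sequence, could be sketched as follows. This is only an illustration: it assumes the selected text has already been decoded into an array of code points, and the name `describe_text` is hypothetical. A real tool would also look up character names and properties, which this sketch does not attempt.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write the code points cps[0..n-1] to out in the usual notation,
 * separated by spaces, e.g. "U+0041 U+0301".
 * Returns the number of characters written (not counting the
 * terminator), or -1 if the buffer is too small.
 */
int describe_text(const uint32_t *cps, size_t n, char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0) {
        return -1;
    }
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* At least four hex digits, more for supplementary code points. */
        int len = snprintf(out + used, outsize - used, "%sU+%04lX",
                           i ? " " : "", (unsigned long)cps[i]);
        if (len < 0 || (size_t)len >= outsize - used) {
            return -1;
        }
        used += (size_t)len;
    }
    return (int)used;
}
```

For a selection holding two code points from the Telugu block and one supplementary code point, `{0x0C24, 0x0C46, 0x10300}`, the output buffer would contain `U+0C24 U+0C46 U+10300`.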

The reason why such an approach is more extensible is that it is not
merely focussed on the nondisplayable character glyph issue, but rather
represents a general ability to query text, whether normally
displayable or not. I could query a black box notdef glyph to find
out what in the text caused its display; but I could just as well
query a properly displayed Telugu glyph, for example, to find out what 
it was, as well.

This is comparable (although more point-oriented) to the concept of
giving people a source display for HTML, so they can figure out
what in the markup is causing rendering problems for their rich
text content.

[snip]

 This proposal would provide a standardized approach that vendors could adopt
 to clarify missing character rendering and reduce support costs.  By
 including this in the standard we could provide a cross vendor approach.
 This would provide a consistent solution.

In my opinion, the standard already provides a description of a cross-vendor
approach to the notdef glyph problem, with the advantage that it is
the de facto, widely adopted approach as well. As long as font vendors stay
away from making {p}'s and {q}'s their notdef glyphs, as I think we can
safely presume they will, and instead use variants on the themes of hollowed
or filled boxes, then the problem of *recognition* of the notdef glyphs
for what they are is a pretty marginal problem.

And as for how to provide users better diagnostics for figuring out the
content of undisplayable text, I suppose the standard could suggest some
implementation guidelines there, but this might be a better area to just
leave up to competing implementation practice until certain user interface
models catch on and get widespread acceptance.

--Ken




RE: Revised proposal for Missing character glyph

2002-08-26 Thread Carl W. Brown

William,

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
 Behalf Of William Overington
 Sent: Friday, August 23, 2002 12:55 AM
 To: James Kass; Carl W. Brown; Unicode List
 Cc: [EMAIL PROTECTED]
 Subject: Re: Revised proposal for Missing character glyph
 
 
 James Kass wrote as follows.
 
 quote
 
 For non-BMP, how about a double tall glyph at the left as the
 plane signifier?  

A double-high number or letter will look like a standard letter that is just
narrower, unless you are displaying text in a narrow font.  In that case it will look
like a separate character...

This will be very confusing.  Besides, I don't like mixing bases, any more than using
octal to represent 8-bit bytes.  It was confusing to use base 4, base 8, base 8,
base 4, base 8, base 8, etc.

How will you display the rest of the data?  Will you use 65536 glyphs?  That is a
monster font.  Better would be to use the top 4 bits of the low-order 2 bytes, then the
bottom 4 bits of the same bytes.
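
For reference, the split that a plane-signifier glyph implies can be sketched as below; the function names are illustrative only. Once the plane number is removed, the remainder still has 65536 possible values, which is why the question of how to display it arises.

```c
#include <stdint.h>

/* Plane number: 0 for the BMP, 1 to 16 for the supplementary planes. */
unsigned plane_of(uint32_t cp)
{
    return (unsigned)(cp >> 16);
}

/* Position within the plane: one of 65536 values (0x0000 to 0xFFFF). */
unsigned offset_in_plane(uint32_t cp)
{
    return (unsigned)(cp & 0xFFFFu);
}
```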

In any case you are going to a lot of trouble to avoid vertical hex, which is the
simple solution.  Remember: keep it simple, stupid.

Carl
   






Re: Revised proposal for Missing character glyph

2002-08-26 Thread John Cowan

Kenneth Whistler scripsit:

 Things will be better-behaved when applications finally get past the
 related but worse problem of screwing up the character encodings --
 which results in the more typical misdisplay: lots of recognizable 
 glyphs, but randomly arranged into nonsensical junk. (Ah, yes, that
 must be another piece of Korean spam mail in my mail tray.)

In the old days, experts could detect mismatched serial-line
connections based on the nature of the baud barf that the remote
system emitted.

Nowadays, experts can detect mismatched character sets from the
nature of the byte barf that appears on their screen.

-- 
John Cowan   [EMAIL PROTECTED]
You need a change: try Canada  You need a change: try China
--fortune cookies opened by a couple that I know




RE: Revised proposal for Missing character glyph

2002-08-26 Thread Carl W. Brown

Ken,

The little square boxes do not help much if you want to know exactly what
the missing characters are.  I do however feel that any solution to the
problem should be Unicode based.  If left to the vendors, they may display
the code page characters and you are guessing again.

The tool idea is great but I do not see how it could be embedded in the OS
without changing the application.  It will also require user training.

I think that as we move away from code page text we will find that the next
big problem will be characters that are missing from the font or sets of
fonts.  The trick will be to change the set of fonts.  This might require
trial and error if we do not have good diagnostic tools.

Implementing this change will probably be easier than using the special
symbols for the script, which will also require special handling and may not
catch all errors.  This approach will also allow critical text that cannot
be redisplayed to be deciphered.

This has been a pet peeve of mine having used the Fujitsu Shift JIS solution
and seen it work in a real live situation.

Carl




Re: Revised proposal for Missing character glyph

2002-08-26 Thread Barry Caplan

At 09:49 PM 8/26/2002 -0400, John Cowan wrote:
 Nowadays, experts can detect mismatched character sets from the
 nature of the byte barf that appears on their screen.

And super-experts can read languages in byte barf as it is not random!

Barry Caplan
http://www.i18n.com





Re: Revised proposal for Missing character glyph

2002-08-19 Thread David Hopwood

Carl W. Brown wrote:
 Proposed unknown and missing character representation.  This would be an
 alternate to method currently described in 5.3.
 
 The missing or unknown character would be represented as a series of
 vertical hex digit pairs for each byte of the character.

Why vertical? Hexadecimal is almost invariably written left-to-right,
top-to-bottom, and that's the order I would expect.

 Garbage data with non-zero bits 24-31 may require 8 digits or 4 pairs of
 digits.

I thought this proposal was intended for characters that cannot be
rendered by a font, not ill-formed encodings?

-- 
David Hopwood [EMAIL PROTECTED]

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip





RE: Revised proposal for Missing character glyph

2002-08-19 Thread Carl W. Brown

Ken,

This is an alternate to representing bad glyphs with a missing glyph
character.  People can implement either.

 -Original Message-
 From: Kenneth Whistler [mailto:[EMAIL PROTECTED]]
 Sent: Friday, August 16, 2002 2:28 PM
 To: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: Re: Revised proposal for Missing character glyph


  Proposed unknown and missing character representation.  This would be an
  alternate to method currently described in 5.3.
 
  The missing or unknown character would be represented as a series of
  vertical hex digit pairs for each byte of the character.

 The problem I have with this is that it seems to be an overengineered
 approach that conflates two issues:

   a. What does a font do when requested to display a character
  (or sequence) for which it has no glyph.

   b. What does a user do to diagnose text content that may be
  causing a rendering failure.

 For the first problem, we already have a widespread approach that
 seems adequate. And other correspondents on this topic have pointed
 out that the particular approach of displaying hex numbers for
 characters may pose technical difficulties for at least some font
 technologies.


Because proportional fonts require font metrics processing, the process must
be able to determine if a character cannot be rendered.  The logic can be
changed to use a special font with 257 glyphs to produce these characters.
Thus it should be possible to incorporate this into the operating system
code rather than each application.  It would be best to put it in OpenType
or equivalent code, but not all systems have this type of code.  ICU's layout
code would also be a good place.

Systems limited to monospaced fonts will have problems implementing this.


 
  This representation would be recognized by untrained people as unrenderable
  data or garbage.  So it would serve the same function as a missing glyph
  character except that it would be different from normal glyphs so that they
  would know that something was wrong and the text did not just happen to have
  funny characters.

 I don't see any particular problem in training people to recognize when
 they are seeing their fonts' notdef glyphs. The whole concept of seeing
 little boxes where the characters should be is not hard to explain to
 people -- even to people who otherwise have difficulty with a lot of
 computer abstractions.

 Things will be better-behaved when applications finally get past the
 related but worse problem of screwing up the character encodings --
 which results in the more typical misdisplay: lots of recognizable
 glyphs, but randomly arranged into nonsensical junk. (Ah, yes, that
 must be another piece of Korean spam mail in my mail tray.)


Unicode text will do more to fix character encoding problems.  Then the
problem will be either truly bad characters or font problems.  Many systems
have difficulties handling sets of fonts each covering a portion of the
character range.  This would provide an indication of which scripts were
missing.  Yes, you could use the suggested script id glyphs, but that would
require special processing that would be as difficult as this to implement.

 
  It would aid people in finding the problem and for people with Unicode books
  the text would be decipherable.  If the information was truly critical they
  could have the text deciphered.

 Rather than trying to engineer a questionable solution into the fonts,
 I'd like to step back and ask what would better serve the user
 in such circumstances.

 And an approach which strikes me as a much more useful and extensible
 way to deal with this would be the concept of a "What's This?"
 text accessory. Essentially a small tool that a user could select
 a piece of text with (think of it like a little magnifying glass,
 if you will), which will then pop up the contents selected, deconstructed
 into its character sequence explicitly. Limited versions of such things
 exist already -- such as the tooltip-like popup windows for Asmus'
 Unibook program, which give attribute information for characters
 in the code chart. But I'm thinking of something a little more generic,
 associated with textedit/richedit type text editing areas (or associated
 with general word processing programs).

 The reason why such an approach is more extensible is that it is not
 merely focussed on the nondisplayable character glyph issue, but rather
 represents a general ability to query text, whether normally
 displayable or not. I could query a black box notdef glyph to find
 out what in the text caused its display; but I could just as well
 query a properly displayed Telugu glyph, for example, to find out what
 it was, as well.

 This is comparable (although more point-oriented) to the concept of
 giving people a source display for HTML, so they can figure out
 what in the markup is causing rendering problems for their rich
 text content.


Text query will require that each

Revised proposal for Missing character glyph

2002-08-16 Thread Carl W. Brown

Proposed unknown and missing character representation.  This would be an
alternative to the method currently described in 5.3.

The missing or unknown character would be represented as a series of
vertical hex digit pairs for each byte of the character.  BMP characters
would be represented with 4 hex digits or two pairs of hex digits.  Plane
1-16 characters would be represented as 6 digits or 3 pairs of digits.
Garbage data with non-zero bits 24-31 may require 8 digits or 4 pairs of
digits.
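
The digit-pair rule above can be sketched in a few lines. The function names are illustrative, and one detail is an assumption: values from 0x110000 to 0xFFFFFF are not mentioned in the proposal, so this sketch simply gives them three pairs because three bytes are enough to hold them.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Number of hex digit pairs for a value: 2 for BMP characters,
 * 3 for planes 1-16, 4 for garbage data with non-zero bits 24-31.
 */
int hex_pair_count(uint32_t value)
{
    if (value > 0x00FFFFFFu) {
        return 4;
    }
    if (value > 0xFFFFu) {
        return 3;
    }
    return 2;
}

/*
 * Write the pairs, most significant byte first and separated by
 * spaces, into out (needs room for 4 pairs + 3 spaces + terminator).
 * Returns the number of pairs written.
 */
int format_hex_pairs(uint32_t value, char out[12])
{
    int pairs = hex_pair_count(value);
    int pos = 0;

    for (int i = pairs - 1; i >= 0; i--) {
        unsigned byte = (unsigned)((value >> (8 * i)) & 0xFFu);
        pos += sprintf(out + pos, "%02X%s", byte, i ? " " : "");
    }
    return pairs;
}
```

Under these assumptions U+0041 becomes the pairs `00 41`, U+10300 becomes `01 03 00`, and the garbage value 0xDEADBEEF becomes `DE AD BE EF`.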

This representation would be recognized by untrained people as unrenderable
data or garbage.  So it would serve the same function as a missing glyph
character except that it would be different from normal glyphs so that they
would know that something was wrong and the text did not just happen to have
funny characters.

It would aid people in finding the problem and for people with Unicode books
the text would be decipherable.  If the information was truly critical they
could have the text deciphered.

The missing character glyphs will be best rendered as a series of glyphs by
a font engine capable of glyph positioning.  If that is not possible it
could also be rendered by displaying a fractional space followed by a set of
two to three hex pair glyphs, one for each character byte, followed by another
fractional space.  This would require 256 glyphs for the vertical hex pairs
and a fractional space glyph.
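
As a rough sketch of the fallback rendering just described, the glyph run for one missing character could be built like this. The glyph numbering is purely an assumption made for illustration: indices 0 to 255 stand for the vertical hex pair glyphs (indexed by byte value) and index 256 for the fractional space. The proposal itself does not specify any numbering.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed index of the fractional space glyph; 0..255 are the hex pairs. */
#define GLYPH_FRACTIONAL_SPACE 256

/*
 * Fill run[] with glyph indices for one unrenderable value:
 * fractional space, one hex pair glyph per byte (most significant
 * first), fractional space.  Returns the number of glyphs (4 to 6).
 */
size_t missing_char_run(uint32_t value, uint16_t run[6])
{
    size_t n = 0;
    int bytes = value > 0x00FFFFFFu ? 4 : (value > 0xFFFFu ? 3 : 2);

    run[n++] = GLYPH_FRACTIONAL_SPACE;
    for (int i = bytes - 1; i >= 0; i--) {
        run[n++] = (uint16_t)((value >> (8 * i)) & 0xFFu);
    }
    run[n++] = GLYPH_FRACTIONAL_SPACE;
    return n;
}
```

With this numbering, a missing U+0C24 would produce the four-glyph run 256, 0x0C, 0x24, 256.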

This proposal would provide a standardized approach that vendors could adopt
to clarify missing character rendering and reduce support costs.  By
including this in the standard we could provide a cross vendor approach.
This would provide a consistent solution.