SC UniPad 0.99 released.

2002-08-26 Thread William Overington

As an end user of Unicode I was interested to learn recently that the latest
version of SC UniPad, a Unicode plain text editor for various PCs, has been
released.

This latest version is SC UniPad 0.99 and is available for free download
from the following address on the web.

http://www.unipad.org

A particularly interesting new feature is that one may hold down the Control
key and press the Q key and a small dialogue box appears within which one
may enter the hexadecimal code for any Unicode character.  Upon pressing the
Enter key, that character is entered into the document.  SC UniPad contains
its own font.
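
In case a reader would like to see the idea expressed in code, here is a
minimal sketch in Python of what such a hex input facility does.  It is an
illustration only, written for this note, and is not taken from SC UniPad
itself.

    # Minimal sketch (not SC UniPad code): turn a hexadecimal Unicode
    # code point, as typed into such a dialogue box, into the
    # corresponding character.
    def char_from_hex(code):
        value = int(code, 16)
        if not 0 <= value <= 0x10FFFF:
            raise ValueError("U+" + code + " is outside the Unicode code space")
        return chr(value)

    print(repr(char_from_hex("200D")))  # '\u200d', ZERO WIDTH JOINER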

Please note in particular the buttons in a column down the left hand side of
the display.  These alter the way in which some code points are indicated in
the display.  For example, if one clicks on the button labelled FMT (which
controls Character Rendering: Formatting Characters) and selects Picture
Glyph, then entry of U+200D into the text document shows a box with the
letters ZWJ in it.

I first learned of the existence of the UniPad program in a response to a
question which I asked in this forum, so I am posting this note so that any
end users of the Unicode system who are at present unaware of the existence
of the UniPad program might know of the opportunity to have a look at it if
they so choose.

The web site has a facility to request email notification of developments to
SC UniPad.  It was by such a requested email notification that I became
aware of the availability of SC UniPad 0.99.

William Overington

26 August 2002








Re: Recent changes to i18n standards

2002-08-26 Thread James E. Agenbroad

On Fri, 23 Aug 2002 [EMAIL PROTECTED] wrote:

 On 08/23/2002 04:54:58 AM Doug Ewell wrote:
 
 For those who like to keep up on such things, there have been recent
 changes to the code lists of two important standards related to
 internationalization -- ISO 639 (language codes) and ISO 3166-2 (codes
 for country subdivisions).
 
 In addition to the two new code elements in ISO 639-2, there's another 
 development of interest in relation to language coding: ISO/TC 37 has 
 begun working toward development of a new part to this standard, to be 
 designated ISO 639-3, that will provide 3-letter identifiers for all known 
 languages. The relationship to part 2 will be that the
 individual-language code elements in part 2 will be a subset of part 3
 (part 2 will continue to have collective-language identifiers but part 3 will 
 not). The reason for the subsetting relationship of part 2 to part 3 
 (rather than just adding a bunch of things to part 2) is that some user 
 communities (e.g. bibliographers) have indicated a need to restrict 
 individual-language identifiers to only developed languages with 
 significant bodies of literature. I'm anticipating a time frame of about 
 one year for this to be completed (assuming the process goes smoothly).
 
 
 
 - Peter
 
 
 ---
 Peter Constable
 
 Non-Roman Script Initiative, SIL International
 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
 Tel: +1 972 708 7485
 E-mail: [EMAIL PROTECTED]
 
Monday, August 26, 2002
Peter, 
 I congratulate you and others who reached this reasonable solution.
 Regards,
  Jim Agenbroad ( [EMAIL PROTECTED] )
 It is not true that people stop pursuing their dreams because they
grow old; they grow old because they stop pursuing their dreams. Adapted
from a letter by Gabriel Garcia Marquez.
 The above are purely personal opinions, not necessarily the official
views of any government or any agency thereof.
 Addresses: Office: Phone: 202 707-9612; Fax: 202 707-0955; US
mail: I.T.S. Sys.Dev.Gp.4, Library of Congress, 101 Independence Ave. SE, 
Washington, D.C. 20540-9334 U.S.A.
Home: Phone: 301 946-7326; US mail: Box 291, Garrett Park, MD 20896.  





Romanized Cyrillic bibliographic data--viable fonts?

2002-08-26 Thread J M Craig

Anyone at all familiar with bibliographical data (the MARC standards) 
knows that they can be a real pain to deal with. In this case, the 
difficulty isn't with the MARC data itself, but with the Library of 
Congress's Romanization standards and the lack of support for combining 
half marks in available fonts. I'm trying to help a client properly 
display Romanized Cyrillic from MARC data on a Unicode-enabled 
application. The ultimate problem is, I can't find an available font 
that properly supports the combining half marks FE20 and FE21.

Alan Wood lists these two on his page of fonts by ranges (a truly 
impressive collection of info, BTW, Mr. Wood):

Arial Unicode MS
   Apparently you can only get this with MS Office or Publisher these 
days--not a good solution for my client since their budget's very 
limited and they'd need it on a bunch of workstations. The most 
important issue from a technical point of view is that the marks may not 
properly combine and I don't have a copy of the font to test it myself. 
Does anyone know if these marks will properly combine with T, t, S, s, 
I, i, A, a,  U, u when using the MS font?

Naqsh
   A cursive font (not practical) and the marks don't appear to combine 
properly in any case.

Any suggestions welcomed! Is there a tool out there that will allow you 
to edit a font to add a couple of missing characters?

(A more extensive explanation of the problem follows for those who want 
the gory details.)

John Craig
Alpha-G Consulting, LLC

Gory details:
The bibliographical data in question follows the Library of Congress 
Romanization rules (see this link):

http://lcweb.loc.gov/catdir/cpso/romanization/russian.pdf

An effective conversion to Unicode for the specified Romanizations of 
these Cyrillic characters is proving elusive:

/ts/
Unicode 0426 (capital)  0446 (lower case)
/yu/
Unicode 042E  044E
/ya/
Unicode 042F  044F

The specified Romanization for each of these Cyrillic characters 
includes a ligature over the top of the two Latin code points in 
question (presumably to indicate that the Latin characters represent a 
single Cyrillic character). Now, the proper Unicode sequence for what 
the Library of Congress wants (based on their own documentation of the 
correspondences between the MARC ANSEL character set and Unicode) 
requires the use of the combining half marks left-half ligature U+FE20 
and right-half ligature U+FE21:

/ts/
Unicode 0074 FE20 0073 FE21
t left half ligature s right half ligature
/yu/
Unicode 0069 FE20 0075 FE21
i left half ligature u right half ligature
/ya/
Unicode 0069 FE20 0061 FE21
i left half ligature a right half ligature

All very well, but the application can't paint it because the available 
fonts lack the combining half marks.
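
For concreteness, the sequences above can be generated with a few lines of
Python.  This is a sketch only; the table merely restates the
correspondences given in this message and is not an official MARC/LOC
conversion tool.

    # Sketch: build the LOC-romanized forms described above, using the
    # combining half marks U+FE20 (left half) and U+FE21 (right half).
    ROMANIZE = {
        "\u0446": "t\uFE20s\uFE21",  # Cyrillic small letter tse
        "\u044e": "i\uFE20u\uFE21",  # Cyrillic small letter yu
        "\u044f": "i\uFE20a\uFE21",  # Cyrillic small letter ya
    }

    def romanize(text):
        return "".join(ROMANIZE.get(ch, ch) for ch in text)

    # Whether the result displays correctly depends on font support
    # for U+FE20 and U+FE21 -- which is exactly the problem at hand.
    print(romanize("\u0446\u044e\u044f"))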







Re: GX Technology

2002-08-26 Thread John Jenkins

On Sunday, August 25, 2002, at 10:12 PM, K S Rohilla wrote:

Hi Everybody
I am working on OpenType font technology. Please, can anyone tell me about GX Technology?
 


Well, outside of the fact that what you want to ask about is now called Apple Advanced Typography (AAT), what is it you need to know?  Have you checked Apple's typography site, <http://fonts.apple.com>?

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jhjenkins/



Re: Romanized Cyrillic bibliographic data--viable fonts?

2002-08-26 Thread Frank da Cruz

 Gory details:
 ...
 The specified Romanization for each of these Cyrillic characters 
 includes a ligature over the top of the two Latin code points in 
 question (to indicate that the Latin characters represent a single 
 Cyrillic character presumably).

If you can use horizontal bars over the characters rather than
the half-ligature marks, this seems to be supported by most fonts:

  http://www.columbia.edu/kermit/st-erkenwald.html

- Frank




Re: SC UniPad 0.99 released.

2002-08-26 Thread Doug Ewell

William Overington WOverington at ngo dot globalnet dot co dot uk
wrote:

 A particularly interesting new feature is that one may hold down the
 Control key and press the Q key and a small dialogue box appears
 within which one may enter the hexadecimal code for any Unicode
 character.  Upon pressing the Enter key, that character is entered
 into the document.  SC UniPad contains its own font.

In a thread two weeks ago about Alt+NumPad sequences, I did mention that
SC UniPad 0.99 would include this Ctrl+Q feature.  It's a very handy
device; my biggest obstacle so far, in fact, is simply *remembering that
it's there* and using it, instead of opening Character Map and clicking
on the character, which is what I had to do before (and which is still
useful if I needed to browse CM to find the character in the first
place).

 Please note in particular the buttons in a column down the left hand
 side of the display.  These alter the way in which some code points
 are indicated in the display.  For example, if one clicks on the
 button labelled FMT (which controls Character Rendering: Formatting
 Characters) and selects Picture Glyph, then entry of U+200D into the
 text document shows a box with the letters ZWJ in it.

And best of all, you can set these rendering options independently for
space characters, ASCII controls, other formatting characters (a broad
category), characters unsupported in the UniPad font (a dying breed;
only Plane 2 is not supported), unassigned code points, unpaired
surrogates, and private-use characters.  Note that unpaired surrogates
are supported for testing purposes, but aren't really a good thing to
have lying around.  Also note that your choices for private-use
characters are a generic picture glyph or a rectangle containing the USV
in hex -- sorry, you can't install your own PUA font.

ALSO, note that the hex-value display option for unassigned code points
provides a neat solution to Martin Kochanski's earlier question about
.notdef glyphs (and the ensuing discussion where Carl Brown and others
suggested 2×2, 2×3, or 3×2 blocks of hex digits).
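
As a rough illustration of that hex-block idea (a sketch of the concept
only, not UniPad's actual algorithm), a code point's scalar value can be
split into rows of hex digits like this:

    # Sketch: split a code point's hex value into two rows of digits,
    # 2x2 for BMP code points, 2x3 for supplementary ones.
    def hex_block(code_point):
        digits = "%04X" % code_point
        if len(digits) == 5:       # pad 5-digit values to 6
            digits = "0" + digits
        half = len(digits) // 2
        return [digits[:half], digits[half:]]

    print(hex_block(0x200D))   # ['20', '0D']
    print(hex_block(0x10FFF))  # ['010', 'FFF']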

BTW, the View toolbar doesn't have to run down the left side.  It's
there by default, but you can dock it elsewhere or let it float as a
separate window.  I have the Convert toolbar on the left side and View
on the right because I use Convert more often.

 I first learned of the existence of the UniPad program in a response
 to a question which I asked in this forum, so I am posting this note
 so that any end users of the Unicode system who are at present unaware
 of the existence of the UniPad program might know of the opportunity
 to have a look at it if they so choose.

 The web site has a facility to request email notification of
 developments to SC UniPad.  It was by such a requested email
 notification that I became aware of the availability of SC UniPad
 0.99.

I have asked the main developer of UniPad to post regular update notices
on this list, and he says he will do so shortly, when he can put
together a more thorough list of the new features in 0.99.  Trust me,
there are a LOT.  ☺

-Doug Ewell
 Fullerton, California





Re: Romanized Cyrillic bibliographic data--viable fonts?

2002-08-26 Thread Michael Everson

At 07:27 -0600 2002-08-26, J M Craig wrote:

Any suggestions welcomed! Is there a tool out there that will allow 
you to edit a font to add a couple of missing characters?

The choices are, in general, buying a font-editing program or hiring 
someone to modify your font for you.

Having said that, it would be nice if the major OSes had better 
support for Latin than they do. :-)
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Romanized Cyrillic bibliographic data--viable fonts?

2002-08-26 Thread William Overington

J M Craig wrote as follows.

[snipped]

Any suggestions welcomed! Is there a tool out there that will allow you
to edit a font to add a couple of missing characters?

You might like to have a look at Softy, which is a shareware font editor for
TrueType fonts.  Softy can be used to produce new TrueType fonts and to edit
existing TrueType fonts.

http://users.iclway.co.uk/l.emmett/

There is some more information about Softy, including the correct email
address for registrations, at the following page.

http://cgm.cs.mcgill.ca/~luc/editors.html

Having a look for Softy, and for Softy font, at http://www.yahoo.com might
be helpful.

I am trying to obtain a copy of the tutorial by Grumpy, so far without
success.

I have found the other tutorial and it is very useful.

I have had lots of fun with the Softy program and, although I have not tried
to implement the U+FE20 and U+FE21 characters which you mention, I have tried
various experiments and have found Softy a very satisfactory package to use.

Softy is shareware, so perhaps you might think it worth a try to find out if
it will help you do what you want to achieve.

Also, you might like to have a look at the SC UniPad program which I
mentioned earlier today in another thread.  When I was studying your posting
I used SC UniPad to have a look at the various Cyrillic characters which you
mentioned.  As far as I can tell at present, SC UniPad does not position the
U+FE20 and U+FE21 characters as you might want them to appear, yet it would
seem a good way to key in the text, ready to copy and paste into another
program which would display the keyed text using a font of your choice.

William Overington

26 August 2002








Re: Revised proposal for Missing character glyph

2002-08-26 Thread Kenneth Whistler

[Resend of a response which got eaten by the Unicode email server
during the system maintenance last week. Carl already responded
to me on this, but others may not have seen what he was
responding to. --Ken]


 Proposed unknown and missing character representation.  This would be an
 alternative to the method currently described in Section 5.3.
 
 The missing or unknown character would be represented as a series of
 vertical hex digit pairs for each byte of the character.

The problem I have with this is that it seems to be an overengineered
approach that conflates two issues:

  a. What does a font do when requested to display a character
 (or sequence) for which it has no glyph.

  b. What does a user do to diagnose text content that may be
 causing a rendering failure.

For the first problem, we already have a widespread approach that
seems adequate. And other correspondents on this topic have pointed
out that the particular approach of displaying hex numbers for
characters may pose technical difficulties for at least some font
technologies. 

[snip]
 
 
 This representation would be recognized by untrained people as unrenderable
 data or garbage.  So it would serve the same function as a missing glyph
 character except that it would be different from normal glyphs so that they
 would know that something was wrong and the text did not just happen to have
 funny characters.

I don't see any particular problem in training people to recognize when
they are seeing their fonts' notdef glyphs. The whole concept of seeing
little boxes where the characters should be is not hard to explain to
people -- even to people who otherwise have difficulty with a lot of
computer abstractions.

Things will be better-behaved when applications finally get past the
related but worse problem of screwing up the character encodings --
which results in the more typical misdisplay: lots of recognizable 
glyphs, but randomly arranged into nonsensical junk. (Ah, yes, that
must be another piece of Korean spam mail in my mail tray.)

 
 It would aid people in finding the problem and for people with Unicode books
 the text would be decipherable.  If the information was truly critical they
 could have the text deciphered.

Rather than trying to engineer a questionable solution into the fonts,
I'd like to step back and ask what would better serve the user
in such circumstances.

And an approach which strikes me as a much more useful and extensible
way to deal with this would be the concept of a "What's This?"
text accessory. Essentially a small tool that a user could select
a piece of text with (think of it like a little magnifying glass,
if you will), which will then pop up the contents selected, deconstructed
into its character sequence explicitly. Limited versions of such things
exist already -- such as the tooltip-like popup windows for Asmus'
Unibook program, which give attribute information for characters
in the code chart. But I'm thinking of something a little more generic,
associated with textedit/richedit type text editing areas (or associated
with general word processing programs).
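
A toy version of such an accessory is easy to sketch.  The following
Python lines are offered purely as an illustration of the concept, not as
anyone's actual implementation:

    import unicodedata

    # Toy "What's This?" accessory: list the code points and character
    # names behind a selected piece of text.
    def whats_this(selection):
        for ch in selection:
            name = unicodedata.name(ch, "<no name available>")
            print("U+%04X  %s" % (ord(ch), name))

    whats_this("t\uFE20s\uFE21")
    # U+0074  LATIN SMALL LETTER T
    # U+FE20  COMBINING LIGATURE LEFT HALF
    # U+0073  LATIN SMALL LETTER S
    # U+FE21  COMBINING LIGATURE RIGHT HALF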

The reason why such an approach is more extensible is that it is not
merely focussed on the nondisplayable character glyph issue, but rather
represents a general ability to query text, whether normally
displayable or not. I could query a black box notdef glyph to find
out what in the text caused its display; but I could just as well
query a properly displayed Telugu glyph, for example, to find out what 
it was, as well.

This is comparable (although more point-oriented) to the concept of
giving people a source display for HTML, so they can figure out
what in the markup is causing rendering problems for their rich
text content.

[snip]

 This proposal would provide a standardized approach that vendors could adopt
 to clarify missing character rendering and reduce support costs.  By
 including this in the standard we could provide a cross vendor approach.
 This would provide a consistent solution.

In my opinion, the standard already provides a description of a cross-vendor
approach to the notdef glyph problem, with the advantage that it is
the de facto, widely adopted approach as well. As long as font vendors stay
away from making {p}'s and {q}'s their notdef glyphs, as I think we can
safely presume they will, and instead use variants on the themes of hollowed
or filled boxes, then the problem of *recognition* of the notdef glyphs
for what they are is a pretty marginal problem.

And as for how to provide users better diagnostics for figuring out the
content of undisplayable text, I suppose the standard could suggest some
implementation guidelines there, but this might be a better area to just
leave up to competing implementation practice until certain user interface
models catch on and get widespread acceptance.

--Ken




RE: Revised proposal for Missing character glyph

2002-08-26 Thread Carl W. Brown

William,

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
 Behalf Of William Overington
 Sent: Friday, August 23, 2002 12:55 AM
 To: James Kass; Carl W. Brown; Unicode List
 Cc: [EMAIL PROTECTED]
 Subject: Re: Revised proposal for Missing character glyph
 
 
 James Kass wrote as follows.
 
 quote
 
 For non-BMP, how about a double tall glyph at the left as the
 plane signifier?  

A double-high number or letter will look like a standard letter that is just
narrower, unless you are displaying text in a narrow font.  In that case it
will look like a separate character...

This will be very confusing.  Besides, I don't like mixing bases any more
than I like using octal to represent 8-bit bytes.  It was confusing to use
base 4, base 8, base 8, base 4, base 8, base 8, etc.

How will you display the rest of the data?  Will you use 65536 glyphs?  That
is a monster font.  Better would be to use the top 4 bits of the low-order 2
bytes, then the bottom 4 bits of the same bytes.

In any case you are going to a lot of trouble to avoid vertical hex, which is
the simple solution.  Remember: keep it simple, stupid.

Carl
   






Re: The Unicode Technical Committee meeting in Redmond, Washington State, USA.

2002-08-26 Thread Kenneth Whistler

William Overington inquired:

 As many readers may know, the Unicode Technical Committee was due to start a
 four day meeting yesterday at the Redmond, Washington State, USA campus of
 Microsoft, that is, on 20 August 2002.
 
 Here in England I am interested to know of what is happening and to learn of
 news from the meeting.

As Sarasvati has indicated, minutes will be publicly posted in a few weeks.
See:
 
http://www.unicode.org/unicode/consortium/utc-minutes.html

[BTW, the minutes from the February and April/May meetings have actually been 
approved, although their status has not been updated to Approved yet on the 
website page.]

 It is the early hours of the morning in Washington State at present.  It is
 hoped that when delegates get up for breakfast that they might look in their
 emails and make early morning responses, or perhaps arrange for an official
 briefing to be posted later in the day.
 
 If I were conducting a live interview with the committee chairman or with an
 official spokesperson I would ask the following questions.

Unfortunately, the UTC has not yet arranged its television contract
with ESPN, since character encoding has not generally been considered
a mass-appeal spectator sport.

However, since I did attend the UTC meeting last week, I may be able to
provide up-to-date commentary regarding some of the questions which are
not better answered by waiting for the official minutes.

 * What was discussed yesterday (Tuesday) please, and what formal decisions,
 if any, were taken please?

Wait for the minutes.

 
 * How many people attended please?

16 on Tuesday. 18 on Wednesday. Back down to 15(?) on Thursday and Friday.

 
 * Is it only companies which are full members of the Unicode Consortium who
 send delegates to the meeting, or are there also representatives of
 organizations who do not vote in decisions present as well?

The latter.

 * Will there be a press statement at the close of the meeting please, and if
 so, will it also be posted in the Unicode mailing list please?

No, there will not be a press statement. Encoding of a VERTICAL LINE EXTENSION
character was not considered of such earth-shattering consequence that
it would lead to headlines in the technology press.

 * Has there been, or is there on the agenda, any discussion of the wording
 in the Unicode specification about the use of the Private Use Area and, if
 so, are any changes to that wording being implemented?

Not discussed by the UTC last week. This is in the purview of the editorial
committee.

 
 * Has there been, or is there on the agenda, any discussion concerning the
 status of the code points U+FFF9 through to U+FFFC please?  There has been
 some discussion recently in the Unicode mailing list about these code
 points, as regards issues of U+FFF9 through to U+FFFB as an issue, the issue
 of using U+FFFC as a single issue, and the issue of using U+FFF9 through to
 U+FFFC all together.  Is the committee discussing these issues at all and,
 if so, are they discussing the matter of whether U+FFFC can be used in
 sending documents from a sender to a receiver please?  Is there any
 discussion of a possible rewording, or changing of meaning, of the wording
 about the U+FFF9 through to U+FFFC code points in the Unicode specification
 please?

Not discussed by the UTC last week. This is in the purview of the editorial
committee.

 
 * Are any matters concerning how the Unicode specification interacts with
 the way that fonts are implemented being discussed please? 

Yes. In a general way, this ends up being discussed at every meeting. 

 If so, is due
 care being taken that as font format is not, at present, an international
 standards matter that therefore the committee must take great care to ensure
 that Unicode does not become dependent upon a usage, express or implied, of
 the intellectual property rights or format of any particular font format
 specification?

The UTC always attempts to exercise due care in what it considers, but it
is unclear just what clarification you are asking for here. The UTC does
not standardize font formats.

 * Is there any discussion of the possibility of adding further noncharacters
 please, considering either or both adding some more noncharacters in plane 0
 and a large block of noncharacters in one of the planes 1 through to 14?

No.

 * Is the committee discussing the issue of interpretation, namely, if
 various people read the published specification so as to have different
 meanings, how people may receive a ruling as to the formally correct
 meaning of the wording of the specification?  This recently arose in
 relation to the U+FFFC character and has previously arisen in relation to
 what is correct usage of the Private Use Area, so there are at least two
 areas where the issue of interpretation has arisen.

No. The UTC is a standardization committee, not a court of law.

If a problem of interpretation of the standard arises, and if the UTC
thinks that is a 

Re: Romanized Cyrillic bibliographic data--viable fonts?

2002-08-26 Thread James Kass


J. M. Craig wrote,

 ... The ultimate problem is, I can't find an available font 
 that properly supports the combining half marks FE20 and FE21.
 

Why not use U+0360 and U+0361 instead?

 /ts/
 Unicode 0074 FE20 0073 FE21
 t left half ligature s right half ligature

...would become:

Unicode 0074 0360 0073
t combining inverted breve s

... or, three characters vs. four characters to write the same thing.

 Any suggestions welcomed! Is there a tool out there that will allow you 
 to edit a font to add a couple of missing characters?


William Overington has mentioned the Softy editor.  Please keep
in mind that fonts are copyrighted material, and users are mostly
forbidden to modify them, even for internal use purposes.

The best way to get characters added to a font is to ask the
font's developer.

Best regards,

James Kass,
who is now adding U+FE20 .. U+FE23 to the font here.
 






Re: Romanized Cyrillic bibliographic data--viable fonts?

2002-08-26 Thread J M Craig

Thanks for the suggestion of U+0361 (I don't think U+0360 is going to 
do what I want terribly well). I'm assuming that U+0361 IS in your font 
(I hadn't checked yet). One of the problems with that approach is that I 
don't have enough control over the conversion algorithm to make that 
work--or maybe I could make the right ligature half a non-translated 
character--hmm. I'll have to think about that. At any rate, what I'm 
working with is an algorithm that is much happier with round-trippable 
conversions (which the double breve wouldn't give me). So, no, I don't 
think that'll work. Shoot.

I appreciate your pointing out about the copyright issues--I try to take 
copyrights appropriately seriously. I am in contact with the developer 
of the font in question (from Agfa/Monotype) and I'm REALLY hoping 
they'll agree to add the characters in question. If anyone has access to 
the Arial Unicode MS font and can check to see if U+FE20 and U+FE21 
combine properly, I'd be grateful--I don't want to spend the money to 
get it if it won't solve the display problem!

James Kass wrote:

[snip]


James Kass,
who is now adding U+FE20 .. U+FE23 to the font here.
 
Great!

John






Re: Romanized Cyrillic bibliographic data--viable fonts?

2002-08-26 Thread James Kass

James Kass wrote,

 ...would become:
 
 Unicode 0074 0360 0073
 t combining inverted breve s


U+0360 is the double wide combining tilde.
U+0361 is the double wide combining inverted breve.

Oops.

Best regards,

James Kass.
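
With the code point thus corrected, converting data from the half-mark
encoding to the double-diacritic encoding is mechanical.  The Python sketch
below is an illustration only, and assumes each left half mark is followed
by exactly one base letter and a right half mark:

    import re

    # Sketch: rewrite base + U+FE20 + base + U+FE21 as
    # base + U+0361 (combining double inverted breve) + base.
    HALF_PAIR = re.compile("(.)\uFE20(.)\uFE21", re.DOTALL)

    def half_marks_to_double_breve(text):
        return HALF_PAIR.sub("\\g<1>\u0361\\g<2>", text)

    print(half_marks_to_double_breve("t\uFE20s\uFE21") == "t\u0361s")  # True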






Re: Romanized Cyrillic bibliographic data--viable fonts?

2002-08-26 Thread James Kass


J. M. Craig wrote,

 ... If anyone has access to 
 the Arial Unicode MS font and can check to see if U+FE20 and U+FE21 
 combine properly, I'd be grateful--I don't want to spend the money to 
 get it if it won't solve the display problem!
 

Unless a font is fixed width, Latin combiners can't currently
consistently combine well without smart font technology 
support enabled on the system.  So, don't blame the Arial 
Unicode MS font if these glyphs don't always merge well.  

While awaiting Latin OpenType support, it might be a good
idea to take a look at a well populated fixed width pan-Unicode 
font like Everson Mono.

Best regards,

James Kass.






Re: Revised proposal for Missing character glyph

2002-08-26 Thread John Cowan

Kenneth Whistler scripsit:

 Things will be better-behaved when applications finally get past the
 related but worse problem of screwing up the character encodings --
 which results in the more typical misdisplay: lots of recognizable 
 glyphs, but randomly arranged into nonsensical junk. (Ah, yes, that
 must be another piece of Korean spam mail in my mail tray.)

In the old days, experts could detect mismatched serial-line
connections based on the nature of the baud barf that the remote
system emitted.

Nowadays, experts can detect mismatched character sets from the
nature of the byte barf that appears on their screen.

-- 
John Cowan   [EMAIL PROTECTED]
You need a change: try Canada  You need a change: try China
--fortune cookies opened by a couple that I know




Re: SC UniPad 0.99 released.

2002-08-26 Thread Jungshik Shin


On Mon, 26 Aug 2002, William Overington wrote:

 This latest version is SC UniPad 0.99 and is available for free download
 from the following address on the web.

 http://www.unipad.org

 On several occasions I heard about it on this mailing list, and finally
my curiosity drove me to try it. Unfortunately, I was mightily
disappointed.  At first, I was intrigued by their claim that it
supports Hangul Jamos.  I've seen some false claims that Hangul
Jamos are supported and wanted to see if it really supports them. Well,
it does no better than most other fonts/software that have made that
claim. It just treats them as 'spacing characters' instead of combining
characters. Basically, it's useless except for making a Unicode code chart
(as is Arial Unicode MS).

Then I found its claim that it supports 300 languages (scripts). Wow!
Does it properly support various South and Southeast Asian scripts?
Again, it does not. It treats combining characters as spacing characters.
I don't think users of those scripts would regard SC UniPad as supporting
their scripts/languages.

Its FAQ 4.2 has the following:

SC We have to differentiate between the simple inclusion of
SC the glyphs into the UniPad font and the implementation of special
SC text processing algorithms. It's definitely our goal to finally support
SC all CJK (Chinese, Japanese, Korean) characters and all Indic scripts
SC (Devanagari, Gurmukhi, etc.).

Judging from the above, I think they are well aware that simply including
the nominal glyphs for scripts taken from the Unicode code chart in
the UniPad font is different from supporting those scripts.  In addition,
its list of general features makes it clear that it does not support
'combined rendering of non-spacing marks'.  I can't help wondering, then,
why they list Hindi, Thai, Tibetan, Lao, Bengali and many other
South and Southeast Asian languages in the list of supported languages.


 A particularly interesting new feature is that one may hold down the Control
 key and press the Q key and a small dialogue box appears within which one
 may enter the hexadecimal code for any Unicode character.  Upon pressing the


 I first learned of the existence of the UniPad program in a response to a
 question which I asked in this forum, so I am posting this note so that any

  You may want to check out Yudit (http://www.yudit.org). Although its
author is not so fond of MS Windows, it works in MS Windows as well
as in Unix/X11. It supports South and Southeast Asian scripts, Arabic,
Hebrew with BIDI, Hangul Jamos (at the same level as Korean MS Office XP
in terms of the number of syllables made out of Jamos) and many other
(easier-to-deal-with) writing systems with various input methods/keyboards
(including Unicode code point input in hex).  It can also represent
unrenderable characters with their hex code in a box. If it lacks support
for your script/language and you can code, you may be able to add it
yourself, either on your own or with the author's help, as I did for
Hangul Jamos.

  Jungshik





RE: Revised proposal for Missing character glyph

2002-08-26 Thread Carl W. Brown

Ken,

The little square boxes do not help much if you what to know exactly what
the missing characters are.  I do however feel that any solution to the
problems should be Unicode based.  If left to the vendors that may display
the code page characters and you are guessing again.

The tool idea is great but I do not see how it could be embedded in the OS
without changing the application.  It will also require user training.

I think that as we move away from code  page text we will find that the next
big problem will be characters that are missing from the font or sets of
fonts.  The trick will be to change the set of fonts.  This might require
trial and error if we do not have good diagnostic tools.

Implementing this change will probably be easier that using the special
symbols for the script which will also require special handling and many not
catch all errors.  This approach will also allow critical test that can not
be redisplayed to be deciphered.

This has been a pet peeve of mine having used the Fujitsu Shift JIS solution
and seen it work in a real live situation.

Carl



 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
 Behalf Of Kenneth Whistler
 Sent: Monday, August 26, 2002 2:01 PM
 To: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: Re: Revised proposal for Missing character glyph


 [Resend of a response which got eaten by the Unicode email
 during the system maintenance last week. Carl already responded
 to me on this, but others may not have seen what he was
 responding to. --Ken]


  Proposed unknown and missing character representation.  This would be an
  alternate to method currently described in 5.3.
 
  The missing or unknown character would be represented as a series of
  vertical hex digit pairs for each byte of the character.

 The problem I have with this is that is seems to be an overengineered
 approach that conflates two issues:

   a. What does a font do when requested to display a character
  (or sequence) for which it has no glyph.

   b. What does a user do to diagnose text content that may be
  causing a rendering failure.

 For the first problem, we already have a widespread approach that
 seems adequate. And other correspondents on this topic have pointed
 out that the particular approach of displaying up hex numbers for
 characters may pose technical difficulties for at least some font
 technologies.

 [snip]

 
  This representation would be recognized by untrained people as
 unrenderable
  data or garbage.  So it would serve the same function as a missing glyph
  character except that it would be different from normal glyphs
 so that they
  would know that something was wrong and the text did not just
 happen to have
  funny characters.

 I don't see any particular problem in training people to recognize when
 they are seeing their fonts' notdef glyphs. The whole concept of seeing
 little boxes where the characters should be is not hard to explain to
 people -- even to people who otherwise have difficulty with a lot of
 computer abstractions.

 Things will be better-behaved when applications finally get past the
 related but worse problem of screwing up the character encodings --
 which results in the more typical misdisplay: lots of recognizable
 glyphs, but randomly arranged into nonsensical junk. (Ah, yes, that
 must be another piece of Korean spam mail in my mail tray.)

 
  It would aid people in finding the problem and for people with
 Unicode books
  the text would be decipherable.  If the information was truly
 critical they
  could have the text deciphered.

 Rather than trying to engineer a questionable solution into the fonts,
 I'd like to step back and ask what would better serve the user
 in such circumstances.

 And an approach which strikes me as a much more useful and extensible
 way to deal with this would be the concept of a What's This?
 text accessory. Essentially a small tool that a user could select
 a piece of text with (think of it like a little magnifying glass,
 if you will), which will then pop up the contents selected, deconstructed
 into its character sequence explicitly. Limited versions of such things
 exist already -- such as the tooltip-like popup windows for Asmus'
 Unibook program, which give attribute information for characters
 in the code chart. But I'm thinking of something a little more generic,
 associated with textedit/richedit type text editing areas (or associated
 with general word processing programs).

 The reason why such an approach is more extensible is that it is not
 merely focussed on the nondisplayable character glyph issue, but rather
 represents a general ability to query text, whether normally
 displayable or not. I could query a black box notdef glyph to find
 out what in the text caused its display; but I could just as well
 query a properly displayed Telugu glyph, for example, to find out what
 it was, as well.

 This is comparable (although more point-oriented) 

Re: Revised proposal for Missing character glyph

2002-08-26 Thread Barry Caplan

At 09:49 PM 8/26/2002 -0400, John Cowan wrote:
Nowadays, experts can detect mismatched character sets from the
nature of the byte barf that appears on their screen.

And super-experts can read languages in byte barf as it is not random!

Barry Caplan
http://www.i18n.com





Re: The Unicode Technical Committee meeting in Redmond, Washington State, USA.

2002-08-26 Thread Doug Ewell

Kenneth Whistler kenw at sybase dot com wrote:

 Is there an official press spokesperson for the meeting please?

 Well, I guess I just nominated myself. ;-)

A fine choice.  The ability to answer a reporter's questions BEFORE they
are asked is a rare gift in the field of press relations, and the mark
of a true professional.

-Doug Ewell
 Fullerton, California

