RE: Unicode Ruby

2004-12-19 Thread Murray Sargent
Couple of notes on Word's support. Word has been based on Unicode since
Word '97, although it certainly didn't support all of Unicode at that
time. Word has displayed ruby in built-up form for several versions now
(the feature is under Asian formatting and is called "phonetic guide").

Murray 

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On
Behalf Of Dean Snyder
Sent: Saturday, December 18, 2004 11:53 AM
To: Unicode List
Subject: Unicode Ruby

Can anyone recommend common and/or cross-platform technologies that
render Unicode ruby text in ways other than simply enclosing it within
trailing parentheses (in other words, technologies that would place it
above the annotated text and in a smaller font size, as is typically
done traditionally)? By technologies I'm thinking of things like
internet browsers, email clients, word processors, desktop publishing
programs, computer operating systems, and cross-platform programming
platforms (like Java).

-

So far I've only checked on Mac OS X 10.3.6 and found the following:

Browsers
Safari, Firefox, Internet Explorer, and OmniWeb all display Unicode ruby
in parentheses. (Internet Explorer does, however, display *HTML* ruby
above and smaller.)

Word Processors
Nisus Writer Express, Mellel, and TextEdit all display Unicode ruby in
parentheses. I don't know about the latest Microsoft Word, which I
understand does some Unicode, but the previous one, of course, didn't
even do Unicode at all.

Email Clients
PowerMail and Apple Mail use parentheses.

Desktop Publishing/Graphics
Adobe InDesign, Photoshop, and Illustrator (all CS) use parentheses.

Computer Operating Systems
I can only assume, based on application behavior, that the Mac OS X
default rendering of Unicode ruby is with parentheses, but I haven't
explicitly checked the API documentation for this yet. I am not
qualified to comment on Windows XP or Linux.

Java
I haven't checked the various Java virtual machines yet, but plan to do
so.

I would be very interested if anyone could provide similar information
for Windows and Linux.

---

Frankly I am disappointed with the results so far. It seems like
everyone has taken the easy and ugly way out. I'm particularly surprised
and disappointed by InDesign, a great page layout and desktop publishing
application. But I would also have thought that the browsers would have
been motivated to do better; even Internet Explorer, which shows that it
is both doable and desirable by implementing it for HTML ruby, punted
when it came to Unicode ruby.

Isn't this basically just unacceptable for Japanese readers? Do we
really put out computer operating systems localized for Japanese users
without OS support for superposed ruby?

Anyway, my interest is in applying the ruby mechanism to cuneiform text,
where, similar to Japanese, there is a one-to-many relationship between
any given single (ideographic) character and its many possible context-
free realizations. It would be important not to clutter the visual
cuneiform text with Roman transliterations in parentheses after every
character.

I know custom software can handle ruby any way it wants to, and I am
working on such software, but at the same time it is very important that
operating systems and major software do the right thing here - users do
not want to keep their text isolated in custom applications. And,
anyway, shouldn't this already be in place and ubiquitous given the
importance of properly supporting the Japanese script?

---

An interesting aside: it is particularly felicitous to note that the
typical practice of rendering ruby text in smaller font sizes than the
text it annotates happens to be a PERFECT match for the needs of
rendering annotated cuneiform plain text. All one needs to do is to look
at the visual complexity of cuneiform glyphs to realize that, in order
to be distinguishable on foreseeable display technologies, cuneiform
glyphs need to be rendered in relatively larger font sizes than, say,
Roman text. And exactly analogous to the Japanese situation, the
secondary glyphs used for annotation of cuneiform happen to be
glyphically simpler than the primary glyphs, thereby permitting the
reduction in size that emphasizes their secondary nature. A nice
coincidence the benefits of which cuneiformists will simply inherit - no
work request will be added to anybody's agenda (any implementor that
does the right thing for Japanese will, by definition, be doing the
right thing for cuneiform).
It's always nice when such unforeseen things happen.


Respectfully,

Dean A. Snyder

Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218

office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digital

RE: Unicode Ruby

2004-12-19 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On
> Behalf Of Dean Snyder


> Can anyone recommend common and/or cross-platform technologies that
> render Unicode ruby text in ways other than simply enclosing it within
> trailing parentheses (in other words, technologies that would place it
> above the annotated text and in a smaller font size, as is typically
> done traditionally)?...

> Browsers
> Safari, Firefox, Internet Explorer, and OmniWeb all display Unicode
> ruby in parentheses. (Internet Explorer does, however, display *HTML*
> ruby above and smaller.)...

> Frankly I am disappointed with the results so far...

I think you've set the wrong expectations. Have you read the description
and implementation guidelines for the annotation characters in Unicode?


The annotation characters are used in internal processing when
out-of-band information is associated with a character stream...

Usage of the annotation characters in plain text interchange is strongly
discouraged without prior agreement between the sender and the
receiver...

When an output for plain text usage is desired and when the receiver is
unknown to the sender, these interlinear annotation characters should be
removed as well as the annotating text included between the INTERLINEAR
ANNOTATION SEPARATOR and the INTERLINEAR ANNOTATION TERMINATOR.

This restriction does not preclude the use of annotation characters in
plain text interchange, but it requires a prior agreement between the
sender and the receiver for correct interpretation of the annotations.


These characters are primarily intended for apps to use in internal
processing, not for encoding documents.
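
The removal rule quoted above can be sketched in a few lines of Python. This
is only an illustration, not anything taken from the standard or from a
particular application: the function name is made up, and the sketch assumes
well-formed, non-nested annotations of the form ANCHOR (U+FFF9), base text,
SEPARATOR (U+FFFA), annotating text, TERMINATOR (U+FFFB).

```python
import re

# Interlinear annotation characters from the Unicode Specials block.
ANCHOR = "\uFFF9"      # INTERLINEAR ANNOTATION ANCHOR
SEPARATOR = "\uFFFA"   # INTERLINEAR ANNOTATION SEPARATOR
TERMINATOR = "\uFFFB"  # INTERLINEAR ANNOTATION TERMINATOR

# Matches SEPARATOR, the annotating text, and the closing TERMINATOR.
# Assumption: annotations are well formed and not nested.
_ANNOTATION = re.compile(f"{SEPARATOR}[^{TERMINATOR}]*{TERMINATOR}")


def strip_interlinear_annotations(text: str) -> str:
    """Hypothetical helper: drop annotating text and annotation controls,
    keeping only the annotated (base) text."""
    without_annotations = _ANNOTATION.sub("", text)
    return without_annotations.replace(ANCHOR, "")


sample = f"{ANCHOR}漢字{SEPARATOR}かんじ{TERMINATOR}を読む"
print(strip_interlinear_annotations(sample))
```

A receiver that has agreed with the sender on these characters could instead
map the same three-part structure onto its own out-of-band ruby markup
rather than discarding the annotating text.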


> I know custom software can handle ruby any way it wants to, and I am
> working on such software, but at the same time it is very important
> that operating systems and major software do the right thing here...

It appears that they are (apart from Web browsers that don't handle HTML
ruby).


Peter Constable





Re: Simplified Chinese radical set in Unihan

2004-12-19 Thread Richard Cook
On Dec 16, 2004, at 3:20 PM, Tom Emerson wrote:
> Ah, I don't have my copy of the Comprehensive ABC here at home with me.
If you have Wenlin, you have it in electronic form. Wenlin does the 
typesetting (and sub-licensing) for ABC, and the ABC data is accessible 
from within the Wenlin app.

But on the subject of a Simplified Chinese radical set for Unihan:
Please see the new field kHDZRadBreak coming in the Unihan 4.1 beta. 
This field shows a way to add additional radical info to Unihan. That 
is, for a lexical kSource in Unihan, one can associate kSource mappings 
with radical transitions. The Hanyu Da Zidian radical set is in fact a 
simplification of Kang Xi, though not one using simplified characters. 
When lexical mappings for a good simplified PRC lexicon are included in 
Unihan, a similar table can be built. We've got mapping and pinyin data 
for all of Xiandai Hanyu Cidian, accepted by UTC for inclusion in 
future Unihan. This will hopefully be added to Unihan in the coming 
year (pending final proofing).