Re: Application that displays CJK text in Normalization Form D

2010-11-13 Thread Jim Breen
On Sat, 13 Nov 2010, Jim Monty wrote:
>
> Is there even a single software application that properly displays CJK text in
> Normalization Form D?
>
> NFC: ドライドマンゴス
> NFD: ドライドマンゴス
>
> NFC: 나는 유리를 먹을 수 있어요. 그래도 아프지 않아요
> NFD: 나는 유리를 먹을 수 있어요. 그래도 아프지 않아요

Google's Chromium browser (6.0.409.0 (47612) Ubuntu) displayed both
correctly. Yudit (Unicode editor - http://www.yudit.org/) also displayed both
correctly.

Firefox (3.6.12 - Ubuntu) placed the dakuten over the following katakana
and mangled the hangul. GNOME Terminal (2.28.1) did the same.
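
For reference: in NFD, a voiced katakana such as ド (U+30C9) becomes its
unvoiced base followed by U+3099 COMBINING KATAKANA-HIRAGANA VOICED SOUND
MARK. A combining mark belongs to the character before it, so drawing the
dakuten over the following katakana is a rendering bug. Below is a minimal
Python sketch using only the standard unicodedata module. The clusters()
helper is hypothetical and much simpler than real UAX #29 grapheme
segmentation; it just groups each base with the marks that follow it.

```python
import unicodedata

def clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Hypothetical helper and a simplification: a new cluster starts at every
    code point whose canonical combining class is 0. This is NOT full UAX #29
    segmentation, and it does not join Hangul conjoining jamo (their vowels
    and final consonants also have combining class 0).
    """
    out: list[str] = []
    for ch in text:
        if out and unicodedata.combining(ch) != 0:
            out[-1] += ch  # the mark belongs to the PRECEDING base
        else:
            out.append(ch)
    return out

nfc = unicodedata.normalize("NFC", "ドライドマンゴス")
nfd = unicodedata.normalize("NFD", nfc)

print(len(nfc), len(nfd))  # 8 11
for ch in nfd[:2]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+30C8 KATAKANA LETTER TO
# U+3099 COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK

# Each cluster recomposes to exactly one NFC character.
print([unicodedata.normalize("NFC", c) for c in clusters(nfd)] == list(nfc))  # True
```
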

Opera (10.63 - Linux) displayed the dakuten and most of the hangul as
rectangles.

> NFC: 나는 유리를 먹을 수 있어요. 그래도 아프지 않아요
> NFD: 나는 유리를 먹을 수 있어요. 그래도 아프지 않아요

Jim

--
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Vice-president: Hawthorn Rowing Club, Treasurer: Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne




Re: Application that displays CJK text in Normalization Form D

2010-11-13 Thread Philippe Verdy
Note, however, that when you edit a reply to your message in Gmail, the NFD
text quoted in the web form causes Gmail to refuse to save or send the
message.

If you try to save the draft or send it, Gmail says "error, the action has
failed. Please retry", and no matter how many times you retry, it fails. I
think this is a severe bug in Gmail: you need to delete the NFD text or
normalize it in an external application.
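
Normalizing outside Gmail can be as simple as the following Python sketch,
which uses only the standard unicodedata and pathlib modules. The helper
names to_nfc() and normalize_file() are hypothetical, and the sketch assumes
the text has been saved to a UTF-8 file first. It reads and writes raw bytes
so that line endings are left untouched.

```python
import unicodedata
from pathlib import Path

def to_nfc(text: str) -> str:
    """Return the canonically equivalent NFC form of text."""
    return unicodedata.normalize("NFC", text)

def normalize_file(path: str) -> bool:
    """Rewrite a UTF-8 file in NFC; return True if its contents changed.

    Hypothetical helper. Bytes are used instead of read_text()/write_text()
    so that CRLF line endings are not translated.
    """
    p = Path(path)
    original = p.read_bytes().decode("utf-8")
    fixed = to_nfc(original)
    if fixed == original:
        return False
    p.write_bytes(fixed.encode("utf-8"))
    return True

# Usage: normalize_file("reply.txt"), then paste the file's contents back.
```
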

Philippe.

2010/11/14 Jim Monty 

>
> Is there even a single software application that properly displays CJK text in
> Normalization Form D?
>
> NFC: ドライドマンゴス
>
> NFC: 나는 유리를 먹을 수 있어요. 그래도 아프지 않아요
>
> Aren't the two versions of the same Unicode text supposed to be rendered the
> same? They're not, at least not in any of the applications in which I've viewed
> them: Microsoft Internet Explorer, Microsoft Notepad, Vim, BabelPad and SC
> Unipad.
>
> Jim Monty


Re: Application that displays CJK text in Normalization Form D

2010-11-13 Thread Philippe Verdy
They look the same to me when viewed in Gmail (in the current versions of all
the major modern browsers on Windows; I did not test on Mac OS X or Linux).

I suppose that Gmail renormalizes the text to NFC before displaying it.

I can't even detect a difference in the HTML source of the displayed
message; everything seems to be in NFC. (Could that come from the web browser
normalizing HTML text nodes as soon as it parses them, before inserting them
into the DOM and making them accessible from JavaScript?)
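
One way to check what actually arrives is to take a saved sample of the text
(for example from the raw message source), list its code points, and ask
which normalization forms it already satisfies. A minimal Python sketch; the
helper names are hypothetical, and unicodedata.is_normalized() requires
Python 3.8 or later.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def forms_satisfied(text: str) -> list[str]:
    """List the normalization forms that text is already in."""
    return [f for f in FORMS if unicodedata.is_normalized(f, text)]

def code_points(text: str) -> str:
    """Space-separated U+XXXX listing, handy for comparing two samples."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

nfc = unicodedata.normalize("NFC", "ド")
nfd = unicodedata.normalize("NFD", nfc)
print(forms_satisfied(nfc), code_points(nfc))  # ['NFC', 'NFKC'] U+30C9
print(forms_satisfied(nfd), code_points(nfd))  # ['NFD', 'NFKD'] U+30C8 U+3099
```

Plain ASCII satisfies all four forms, so only samples containing composable
characters can reveal whether a normalization step took place.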

I stopped using local mail clients (such as Outlook, Outlook Express,
Windows Mail and others) long ago, because webmail is far more practical for
me: it works from any PC or smartphone and offers comfortable storage for
many years of email, as long as you clean up the spam that goes undetected
(most spam lands in a dedicated folder that is cleaned up automatically).
So I can't confirm whether those clients normalize the text. Even if they
do, this may not apply to attachments (if their MIME type is not "text/*",
or if they are digitally signed).

Plain-text editors are not supposed to perform such normalization, so
everything depends on how they manage their own internal storage. But yes,
these editors should display both forms exactly the same, even if the text
is left in its original normalization form (or unnormalized); if they don't,
the problem lies in how they use their text renderers.

Philippe.


Re: Application that displays CJK text in Normalization Form D

2010-11-13 Thread Aki Inoue
All Cocoa/Cocoa Touch apps display them correctly. 

Aki Inoue


On 2010/11/13, at 17:07, Bill Poser wrote:

> 
> 
> On Sat, Nov 13, 2010 at 4:46 PM, Jim Monty wrote:
> Is there even a single software application that properly displays CJK text in
> Normalization Form D?
> 
> 
> I just tried your examples in Yudit (http://www.yudit.org) and they seem to 
> work: the NFD text looks the same as the NFC text. 
> 


Re: Application that displays CJK text in Normalization Form D

2010-11-13 Thread Bill Poser
On Sat, Nov 13, 2010 at 4:46 PM, Jim Monty wrote:

> Is there even a single software application that properly displays CJK text in
> Normalization Form D?
>
I just tried your examples in Yudit (http://www.yudit.org) and they seem to
work: the NFD text looks the same as the NFC text.


Application that displays CJK text in Normalization Form D

2010-11-13 Thread Jim Monty
Is there even a single software application that properly displays CJK text in 
Normalization Form D?

NFC: ドライドマンゴス
NFD: ドライドマンゴス

NFC: 나는 유리를 먹을 수 있어요. 그래도 아프지 않아요
NFD: 나는 유리를 먹을 수 있어요. 그래도 아프지 않아요

Aren't the two versions of the same Unicode text supposed to be rendered the 
same? They're not, at least not in any of the applications in which I've viewed 
them: Microsoft Internet Explorer, Microsoft Notepad, Vim, BabelPad and SC 
Unipad.
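
The NFC and NFD lines are canonically equivalent, so they are expected to
look the same. For Hangul, decomposition is arithmetic rather than
table-driven: each precomposed syllable splits into a leading consonant, a
vowel and an optional trailing consonant (conjoining jamo), and an
application that renders NFD Hangul correctly has to reassemble those jamo
into one syllable block. The Python sketch below implements that arithmetic
with the constants given in the Unicode Standard's section on conjoining jamo
behavior. The function names are hypothetical, and compose_jamo() handles only
the L+V and LV+T pairings, not the rest of canonical composition.

```python
import unicodedata

# Constants from the Unicode Standard (conjoining jamo behavior).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables

def decompose_syllable(ch: str) -> str:
    """Split one precomposed Hangul syllable into L, V and optional T jamo."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed syllable: leave it alone
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t_index = s_index % T_COUNT
    return chr(l) + chr(v) + (chr(T_BASE + t_index) if t_index else "")

def compose_jamo(text: str) -> str:
    """Recombine L+V and LV+T jamo sequences into precomposed syllables."""
    out: list[str] = []
    for ch in text:
        if out:
            prev = ord(out[-1])
            l_index, v_index = prev - L_BASE, ord(ch) - V_BASE
            if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
                out[-1] = chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT)
                continue
            s_index, t_index = prev - S_BASE, ord(ch) - T_BASE
            if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
                    and 0 < t_index < T_COUNT):
                out[-1] = chr(prev + t_index)  # LV syllable + trailing consonant
                continue
        out.append(ch)
    return "".join(out)

nfc = unicodedata.normalize("NFC", "나는 유리를 먹을 수 있어요. 그래도 아프지 않아요")
nfd = "".join(decompose_syllable(c) for c in nfc)
print(nfd == unicodedata.normalize("NFD", nfc))  # True
print(compose_jamo(nfd) == nfc)                   # True
```
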

Jim Monty





Re: Combining Triple Diacritics (N3915) not accepted by UTC #125

2010-11-13 Thread Benjamin M Scarborough
I believe that the key to getting these characters encoded is
establishing that each character carries vital semantic information that
is lost if it is stripped away. These were the grounds for the
Mathematical Alphanumeric Symbols block.

Unfortunately, figures 1 and 2 from JTC1/SC2/WG2 N3915 actually provide 
a reason -against- encoding. The meaning of the diacritic in these two 
examples is that the transliterated letters were ligated in the 
original text. In this usage, the mark can span any arbitrary number of 
letters; indeed, figure 2 shows the mark in question spanning four 
letters. This makes it a much better candidate for use in higher-level 
markup than a set of combining marks.

Figures 3 and 4 present a better case and show a stronger need for some
combining triple diacritic. I notice that all seven examples across the
two figures represent what would normally be two letters with a double
diacritic, except that a modifier symbol intervenes and stretches the tie
to span three characters. However, proposing triple diacritics for this
usage would require proof that the sequence of letters with the diacritic
differs in some important way from the same sequence without it, which
N3915 fails to establish.

In any event, I happen to know that some phonetic transcription system
uses an "sch" with a breve below. It represents [ʒ], which contrasts with
the unmarked "sch" used to represent [ʃ]. This is a clear semantic
distinction, so the sch with breve below should be encoded in some
fashion, either as a sequence of characters or as a single precomposed
character.

--Ben Scarborough