Re: Application that displays CJK text in Normalization Form D
On Sat, 13 Nov 2010 Jim Monty wrote:
>
> Is there even a single software application that properly displays CJK text in
> Normalization Form D?
>
> NFC: ドライドマンゴス
> NFD: ドライドマンゴス
>
> NFC: 나는 유리를 먹을 수 있어요. 그래도 아프지 않아요
> NFD: 나는 유리를 먹을 수 있어요. 그래도 아프지 않아요

Google's Chromium browser (6.0.409.0 (47612) Ubuntu) displayed both correctly.
Yudit (Unicode editor - http://www.yudit.org/) also displayed both correctly.

Firefox (3.6.12 - Ubuntu) placed the dakuten over the following katakana and mangled the hangul. GNOME Terminal (2.28.1) did the same.

Opera (10.63 - Linux) displayed the dakuten and most of the hangul as rectangles.

Jim

--
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Vice-president: Hawthorn Rowing Club, Treasurer: Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne
Re: Application that displays CJK text in Normalization Form D
Note however that when you edit a reply to your message within Gmail, the NFD text quoted in the web form causes Gmail to refuse to store or send the message. If you try to save the draft or send it, Gmail says "error, the action has failed. Please retry", and it fails no matter how many times you retry.

I think this is a severe bug in Gmail: you need to delete the NFD text or normalize it in an external application.

Philippe.

2010/11/14 Jim Monty
> Is there even a single software application that properly displays CJK text in
> Normalization Form D?
>
> NFC: ドライドマンゴス
>
> NFC: 나는 유리를 먹을 수 있어요. 그래도 아프지 않아요
>
> Aren't the two versions of the same Unicode text supposed to be rendered the
> same? They're not, at least not in any of the applications in which I've viewed
> them: Microsoft Internet Explorer, Microsoft Notepad, Vim, BabelPad and SC
> Unipad.
>
> Jim Monty
Re: Application that displays CJK text in Normalization Form D
They are the same for me when viewed in Gmail (in any of the modern browsers in their most current versions on Windows; I did not test on Mac OS X or Linux). I suppose that Gmail renormalizes the texts to NFC before displaying them... I can't even detect a difference in the HTML source of the displayed message: everything seems to be in NFC. (Could that originate from the web browser performing such normalization on HTML text elements before entering them in the DOM and making them accessible from JavaScript?)

I stopped using local mail clients (such as Outlook, Outlook Express, Windows Mail, and others) long ago, because webmail is definitely more practical for me: it works from any PC or smartphone and offers comfortable storage space for many years of email, as long as you clean up the undetected spam (most spam falls into a specific folder whose cleanup is automated). So I can't confirm whether those clients normalize the texts.

This may not be the case, however, for attachments (if their MIME type is not "text/*", or if they are digitally signed). Plain text editors are not supposed to perform such normalization, so everything depends on how they manage their own internal data storage. But yes, these editors should display the two forms exactly the same (if not, this is an issue with how they use their text renderers), even if the texts are left in their initial normalization form (or in unnormalized forms).

Philippe.
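Whether a piece of text was renormalized somewhere between sender and display can be checked directly, rather than judged by eye. The following is a minimal sketch in Python using the standard `unicodedata` module; the helper name `normalization_forms` is made up for this illustration, and the two sample strings are written as escapes so that nothing along the way can silently recompose them.

```python
import unicodedata

FORMS = ("NFC", "NFD")

def normalization_forms(text):
    """Return the set of forms (out of NFC and NFD) that `text` already satisfies."""
    # A string is in a given form exactly when normalizing it to that form
    # leaves it unchanged.
    return {form for form in FORMS if unicodedata.normalize(form, text) == text}

composed = "\u30C9"          # KATAKANA LETTER DO, one precomposed code point
decomposed = "\u30C8\u3099"  # KATAKANA LETTER TO + combining voiced sound mark

print(sorted(normalization_forms(composed)))
print(sorted(normalization_forms(decomposed)))
print(sorted(normalization_forms("abc")))  # plain ASCII satisfies both forms
```

Running a check like this on text copied out of the page source or the DOM would show whether the decomposed sequences survived or were replaced by their precomposed equivalents.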
Re: Application that displays CJK text in Normalization Form D
All Cocoa/Cocoa Touch apps display them correctly.

Aki Inoue

On 2010/11/13, at 17:07, Bill Poser wrote:
> On Sat, Nov 13, 2010 at 4:46 PM, Jim Monty wrote:
> > Is there even a single software application that properly displays CJK text in
> > Normalization Form D?
>
> I just tried your examples in Yudit (http://www.yudit.org) and they seem to
> work: the NFD text looks the same as the NFC text.
Re: Application that displays CJK text in Normalization Form D
On Sat, Nov 13, 2010 at 4:46 PM, Jim Monty wrote:
> Is there even a single software application that properly displays CJK text in
> Normalization Form D?

I just tried your examples in Yudit (http://www.yudit.org) and they seem to work: the NFD text looks the same as the NFC text.
Application that displays CJK text in Normalization Form D
Is there even a single software application that properly displays CJK text in Normalization Form D?

NFC: ドライドマンゴス
NFD: ドライドマンゴス

NFC: 나는 유리를 먹을 수 있어요. 그래도 아프지 않아요
NFD: 나는 유리를 먹을 수 있어요. 그래도 아프지 않아요

Aren't the two versions of the same Unicode text supposed to be rendered the same? They're not, at least not in any of the applications in which I've viewed them: Microsoft Internet Explorer, Microsoft Notepad, Vim, BabelPad and SC Unipad.

Jim Monty
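For readers whose mail client or archive has recomposed the NFD samples, the difference between the two forms can be made visible at the code point level. This is a minimal sketch in Python using the standard `unicodedata` module; the helper `code_points` is a hypothetical name introduced here, and the samples are one character from each test string (the katakana "do" and the hangul syllable "na"), written as escapes so they cannot be altered in transit.

```python
import unicodedata

def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

samples = {
    "katakana DO": "\u30C9",  # first character of the Japanese sample
    "hangul NA": "\uB098",    # first character of the Korean sample
}

for name, char in samples.items():
    nfc = unicodedata.normalize("NFC", char)
    nfd = unicodedata.normalize("NFD", char)
    # The two forms are canonically equivalent: recomposing NFD gives NFC back.
    assert unicodedata.normalize("NFC", nfd) == nfc
    print(f"{name}: NFC = {code_points(nfc)}, NFD = {code_points(nfd)}")
```

In NFD the katakana becomes a base letter followed by the combining voiced sound mark (U+3099), and the hangul syllable becomes a sequence of conjoining jamo. A renderer has to draw each sequence as a single unit for the two forms to look the same.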
Re: Combining Triple Diacritics (N3915) not accepted by UTC #125
I believe that the key to getting these characters encoded is establishing that there is a vital semantic importance to the character that is lost if it is stripped away. This is the grounds for the Mathematical Alphanumeric Symbols block.

Unfortunately, figures 1 and 2 from JTC1/SC2/WG2 N3915 actually provide a reason -against- encoding. The meaning of the diacritic in these two examples is that the transliterated letters were ligated in the original text. In this usage, the mark can span any arbitrary number of letters; indeed, figure 2 shows the mark in question spanning four letters. This makes it a much better candidate for use in higher-level markup than a set of combining marks.

Figures 3 and 4 present a better case and show a stronger need for some combining triple diacritic. I notice that all seven examples between the two figures represent what would normally be two letters with a double diacritic, but some modifier symbol intervenes and stretches the tie to span three. However, proposing the triple diacritics used this way would require proof that the sequence of letters with the diacritic has some important difference from the same sequence of letters without, which N3915 fails to establish.

In any event, I happen to know that there is in some phonetic transcription system an "sch" with breve below. It is used to represent [ʒ], which contrasts with the unmarked sch used to represent [ʃ]. This is a clear semantic distinction, and so the sch with breve below should be encoded in some fashion, either as a sequence of characters or some fully composed one.

--Ben Scarborough