Re: User-Hostile Text Editing (was: Unicode String Models)
On 2012-07-21, Richard Wordingham richard.wording...@ntlworld.com wrote:

> Are there any widely available ways of enabling the deleting of the first character in a default grapheme cluster? Having carefully added two or more marks to a base character, I find it extremely irritating to find I have entered the wrong base character and have to type the whole thing again. As one can delete the last character in a cluster, why not the first? It's not as though the default grapheme cluster is usually thought of as a single character.

What do you mean by widely available? A decent editor should let you choose whether to break apart clusters or not. I presume that such editors exist! (Mine always breaks clusters, but that's because I'm the only user, and I don't care enough to implement clustering ;-) Yudit might be one, but since it seems to have no documentation, I can't tell. If yours doesn't, then get on to its authors!
Re: User-Hostile Text Editing
On Sun, 22 Jul 2012 00:25:07 + Murray Sargent murr...@exchange.microsoft.com wrote:

> I'd think deleting the first character of a cluster would make a nice context-menu option. For example, when you right-click on a cluster, the resulting context menu could have an entry like delete first character. Maybe other such options could be added as well.

I was thinking of simplifying, rather than adding features. One could imagine a whole cluster-editing pop-up, but in the end it might not make editing any easier. Mind you, when consonant stacks are treated as grapheme clusters, it might be useful - Tibetan can have some fairly large legacy grapheme clusters, and if one tailors aksharas to be grapheme clusters one can get some monsters in the Tai Tham script - NA, SAKOT, HIGH TA, MEDIAL RA, AA, SAKOT, LOW YA (visually the base consonant is the ligature NA, AA, with dependent consonants arrayed around it) and HIGH HA, SAKOT, NA, E, OA BELOW, I, TONE-1, SAKOT, LOW YA spring to mind.

Richard.
No appropriate code point for some Chinese punctuation marks
As a matter of fact, the situation of Chinese punctuation marks is a mess. Unicode still has no independent code points for the Chinese dashes, ellipsis, or interpunct.

In practice, we use three kinds of dashes in Chinese: a halfwidth one, corresponding to the Latin hyphen; a one-character-width one, corresponding to the Latin en dash; and a two-character-width one, corresponding to the Latin em dash. Chinese users currently treat U+2013 (EN DASH) or U+002D (HYPHEN-MINUS) as the halfwidth dash, use U+2014 (EM DASH) as the one-character-width dash, and simply enter two em dashes for the two-character-width dash. However, this is only a compromise: these dashes and the hyphen were designed for Latin typesetting, so the horizontal line does not sit at the mid-height of a Hanzi character. In some typefaces, two consecutive em dashes also come out as a long horizontal line with a break in the middle, which is not consistent with the appearance of a two-character-width dash.

Chinese doesn't have a two-character-width ellipsis (six dots) either. In principle, a two-character-width ellipsis could be built from two consecutive one-character-width ellipses (three dots each), each designed with its dots in the right position. But what Chinese users do now is an ugly hack: they just type two consecutive '…' (HORIZONTAL ELLIPSIS, U+2026). As with the dashes, the dots do not lie at the mid-height of a Hanzi character. Some people choose the mathematical operator '⋯' (MIDLINE HORIZONTAL ELLIPSIS, U+22EF) as a substitute, but a lot of Chinese fonts don't support that symbol and, above all, it is a mathematical operator, not a punctuation mark!

For the interpunct, we need a solid dot that sits at both the vertical and the horizontal center of the character box. The Katakana symbol '・' (KATAKANA MIDDLE DOT, U+30FB) is actually a good fit for the Chinese interpunct but, as the name reveals, it is a Katakana symbol, not a common punctuation mark for East Asian characters.

So should we submit a proposal for these Chinese punctuation marks?
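For readers who want to check the properties of the characters named above, here is a minimal Python sketch using only the standard `unicodedata` module. The names `CANDIDATES` and `describe` are made up for this illustration, and the values printed depend on the Unicode version bundled with the Python in use.

```python
import unicodedata

# Code points named in the message above.
CANDIDATES = [0x002D, 0x2013, 0x2014, 0x2026, 0x22EF, 0x30FB]

def describe(cp):
    """Return (U+XXXX, name, general category, East Asian width) for cp."""
    ch = chr(cp)
    return (
        f"U+{cp:04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.east_asian_width(ch),
    )

# One tab-separated row per candidate character.
for cp in CANDIDATES:
    print(*describe(cp), sep="\t")
```

The general category column is the relevant one for the complaint about U+22EF: it is reported as Sm (math symbol), whereas the dashes are Pd and U+2026 and U+30FB are Po (punctuation).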
Re: No appropriate code point for some Chinese punctuation marks
On 7/22/2012 7:08 AM, Gary Kilfear wrote:

> should we submit a proposal for these Chinese punctuation marks?

My take is that a proposal, with its requirements for evidence and samples, is the best way to systematically capture and collect the information. Once everything is on the table, UTC will be in a position to resolve these issues.

It may be that some characters exist that are perfect fits for some required punctuation mark, but have been misunderstood in the user community. I suspect that is the case for KATAKANA MIDDLE DOT. Knowing the issue would allow this to be better documented.

In other cases UTC would have the ability to review, and either reaffirm or revisit, certain explicit or implicit unifications. Detailed documentation of usage (and examples of success and failure of certain approaches taken by users today) is essential here.

I'm personally no great fan of unifications of punctuation based on basic similarity of shape and shared purpose alone. I believe that vertical alignment issues as well as width (or sidebearing) differences, if significant and visible, should be considered grounds for disunification. Especially in multiscript environments, and those are not that rare, really, it's almost impossible to get such unifications to behave correctly without explicit font binding. And we all know that control of that is elusive in many contexts.

Finally, one or the other character may well be missing entirely.

The existing state of affairs is not based on a systematic, complete, and detailed review of the use of punctuation in China (or East Asia). Such a review is overdue, in fact, and in my view, the kind of document that would form a suitable base for such a review would be indistinguishable from the typical background document for a proposal.

Most helpful would be if this paper could be written like a monograph on Chinese punctuation marks and Unicode, and would include, on the same footing, the material about existing characters that map well to specific Chinese punctuation marks. If that were done, the resulting paper could be used twice: once to support any necessary changes to unifications or any additions of characters, and a second time as a Technical Note on the subject, which will continue to guide users to best practice. That would have the most impact long term.

A./
Re: No appropriate code point for some Chinese punctuation marks
On Sun, Jul 22, 2012 at 09:43:29AM -0700, Asmus Freytag wrote:

> Especially in multiscript environments, and those are not that rare, really, it's almost impossible to get such unifications to behave correctly without explicit font binding. And we all know that control of that is elusive in many contexts.

It is quite possible, actually: all that is needed is a text layout engine that does automatic script tagging (e.g. Pango and, to some extent, Firefox) and fonts that provide localised, script-specific punctuation glyphs, and it should just work even with plain text. I've been doing that with Arabic and it works rather reliably.

Regards,
Khaled
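To make "automatic script tagging" concrete, here is a toy Python sketch of the idea: letters carry a script, and shared punctuation inherits the script of its neighbours. It is not how Pango or Firefox implement it. Real engines work from the Unicode Script property, which the standard `unicodedata` module does not expose as far as I know, so this sketch substitutes the first word of the character name as a crude stand-in. The names `rough_script` and `tag_scripts` are hypothetical.

```python
import unicodedata

def rough_script(ch):
    """Crude script guess (an assumption of this sketch): the first word of
    the character name for letters and marks, None for everything else."""
    if unicodedata.category(ch)[0] not in ("L", "M"):
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None

def tag_scripts(text):
    """Pair each character with a script tag. Untagged characters inherit
    from the nearest preceding tagged one, else the nearest following one."""
    tags = [rough_script(ch) for ch in text]
    # Forward pass: punctuation takes the script of what precedes it.
    last = None
    for i, tag in enumerate(tags):
        if tag is None:
            tags[i] = last
        else:
            last = tag
    # Backward pass: leading punctuation takes the script of what follows.
    following = None
    for i in range(len(tags) - 1, -1, -1):
        if tags[i] is None:
            tags[i] = following
        else:
            following = tags[i]
    return list(zip(text, tags))

# An em dash after a Hanzi is tagged CJK; the Latin letter stays LATIN.
for ch, script in tag_scripts("\u4e2d\u2014a"):
    print(f"U+{ord(ch):04X}", script)
```

With a tag per run, a layout engine can ask the font for the glyph variant intended for that script, which is the second half of the approach described above.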
Re: User-Hostile Text Editing (was: Unicode String Models)
On Sun, 22 Jul 2012 08:59:13 +0100 Julian Bradfield jcb+unic...@inf.ed.ac.uk wrote:

> On 2012-07-21, Richard Wordingham richard.wording...@ntlworld.com wrote:
>> Are there any widely available ways of enabling the deleting of the first character in a default grapheme cluster?
> What do you mean by widely available?

An example would be a technique that worked for many applications on a platform, or for several significant applications across most platforms. An example of the former would be an effective per-user tailoring of grapheme clusters. A candidate for the latter is LibreOffice's rule that alt+cursor key moves within grapheme clusters rather than moving the point to the start of the next grapheme cluster. (Unfortunately this doesn't even work inside tables, so it doesn't look much of a candidate.) This can be used in the sequence alt/right-arrow, rubout.

Richard.
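The editing operation under discussion, removing or replacing the first code point of a cluster while keeping the marks that follow it, can be sketched in a few lines of Python. This is an illustration only: `rough_clusters` and `replace_first` are hypothetical names, and the clustering rule used here (a character plus any following characters of general category M) is a simplification that stands in for the default grapheme cluster rules of UAX #29, not an implementation of them.

```python
import unicodedata

def rough_clusters(text):
    """Split text into runs of one character plus the marks that follow it.
    Approximates default grapheme clusters for simple cases only."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch      # attach the mark to the current cluster
        else:
            clusters.append(ch)     # start a new cluster
    return clusters

def replace_first(text, index, new_first=""):
    """Replace the first code point of cluster `index`, keeping its marks.
    With the default new_first="", the first code point is simply deleted."""
    clusters = rough_clusters(text)
    clusters[index] = new_first + clusters[index][1:]
    return "".join(clusters)

# e + COMBINING DOT BELOW + COMBINING CIRCUMFLEX ACCENT, typed with the wrong base
word = "e\u0323\u0302"
fixed = replace_first(word, 0, "a")
print([f"U+{ord(c):04X}" for c in fixed])  # ['U+0061', 'U+0323', 'U+0302']
```

Calling `replace_first(text, index)` without a new base corresponds to the alt/right-arrow, rubout sequence: the marks are left in place and the new base can then be typed in front of them.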
Manipulation of System Fonts on Windows 7
I would like to manipulate system fonts on a Windows 7 computer. More precisely, I wish to do the following:

1. Change the font for CJK Unified Ideographs (and CJK punctuation, radicals etc.; maybe the CJK Ideographs Extensions as well?) from the current Japanese-looking one to one in simplified Chinese style, though of course the new system font should also contain traditional characters.
2. Assign a system font for Shavian. Currently boxes/squares are displayed.

What I need is:

1. advice on which fonts to choose, and
2. a brief tutorial on how to safely change fonts system-wide.

Although I am aware that this request is somewhat off-topic, I am sure that some people here will be able to give me the hints I am looking for.

Thanks in advance,
Charlie