Re: User-Hostile Text Editing (was: Unicode String Models)

2012-07-22 Thread Julian Bradfield
On 2012-07-21, Richard Wordingham richard.wording...@ntlworld.com wrote:
 Are there any widely available ways of enabling the deleting of the
 first character in a default grapheme cluster?  Having carefully added
 two or more marks to a base character, I find it extremely irritating
 to find I have entered the wrong base character and have to type the
 whole thing again. As one can delete the last character in a cluster,
 why not the first? It's not as though the default grapheme cluster is
 usually thought of as a single character.

What do you mean by widely available?
A decent editor should let you choose whether to break apart clusters
or not. I presume that such editors exist! (Mine always breaks
clusters, but that's because I'm the only user, and I don't care
enough to implement clustering;-) Yudit might be one, but since it
seems to have no documentation, I can't tell.
If yours doesn't, then get on to its authors!
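To make Julian's "break apart clusters or not" choice concrete, here is a minimal Python sketch of a Backspace that works in either mode. The `clusters` helper is a hypothetical simplification (one character plus any following combining marks), not the full UAX #29 segmentation.

```python
import unicodedata


def clusters(text: str) -> list[str]:
    """Split text into *approximate* grapheme clusters.

    Simplifying assumption: a cluster is one character followed by any
    characters whose general category is a mark (Mn, Mc or Me). This is
    not the full UAX #29 algorithm (no Hangul, emoji, CR LF rules, ...).
    """
    result: list[str] = []
    for ch in text:
        if result and unicodedata.category(ch).startswith("M"):
            result[-1] += ch  # a mark extends the preceding cluster
        else:
            result.append(ch)  # anything else starts a new cluster
    return result


def backspace(text: str, break_clusters: bool) -> str:
    """Delete backwards from the end of `text`.

    break_clusters=True  -> remove only the last code point
    break_clusters=False -> remove the whole last cluster
    """
    if not text:
        return text
    if break_clusters:
        return text[:-1]
    return "".join(clusters(text)[:-1])


# 'x', then 'e' + COMBINING DOT BELOW + COMBINING CIRCUMFLEX ACCENT
word = "xe\u0323\u0302"
print([hex(ord(c)) for c in backspace(word, break_clusters=True)])   # ['0x78', '0x65', '0x323']
print([hex(ord(c)) for c in backspace(word, break_clusters=False)])  # ['0x78']
```

Note that neither mode removes the *first* character of the cluster (the base), which is exactly the operation Richard is asking for.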






Re: User-Hostile Text Editing

2012-07-22 Thread Richard Wordingham
On Sun, 22 Jul 2012 00:25:07 +
Murray Sargent murr...@exchange.microsoft.com wrote:

 I'd think deleting the first character of a cluster would make a nice
 context-menu option. For example, when you right-click on a cluster,
 the resulting context menu could have an entry like "Delete first
 character". Maybe other such options could be added as well.

I was thinking of simplifying, rather than adding features.  One could
imagine a whole cluster-editing pop-up, but in the end it might not
make editing any easier.  Mind you, when consonant stacks are treated as
grapheme clusters, it might be useful - Tibetan can have some fairly
large legacy grapheme clusters, and if one tailors aksharas to be
grapheme clusters one can get some monsters in the Tai Tham
script - NA, SAKOT, HIGH TA, MEDIAL RA, AA, SAKOT, LOW YA (visually
the base consonant is the ligature NA, AA, with dependent consonants
arrayed around it) and HIGH HA, SAKOT, NA, E, OA BELOW, I, TONE-1,
SAKOT, LOW YA spring to mind.
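Richard's first Tai Tham example shows how tailoring aksharas into clusters makes them grow. The Python sketch below compares an approximate default segmentation (a character plus following marks) with a hypothetical akshara tailoring in which a character immediately after SAKOT stays in the same cluster. Both rules are illustrative assumptions, not CLDR tailorings; the characters are looked up by their Unicode names.

```python
import unicodedata

SAKOT = unicodedata.lookup("TAI THAM SIGN SAKOT")

# Richard's first example: NA, SAKOT, HIGH TA, MEDIAL RA, AA, SAKOT, LOW YA
EXAMPLE = "".join(unicodedata.lookup("TAI THAM " + n) for n in [
    "LETTER NA", "SIGN SAKOT", "LETTER HIGH TA", "CONSONANT SIGN MEDIAL RA",
    "VOWEL SIGN AA", "SIGN SAKOT", "LETTER LOW YA",
])


def segment(text: str, join_after_sakot: bool) -> list[str]:
    """Approximate clusters: a character plus following marks (Mn/Mc/Me).

    If join_after_sakot is True (hypothetical akshara tailoring), a
    character immediately after SAKOT also continues the current cluster.
    """
    out: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        after_sakot = join_after_sakot and bool(out) and out[-1].endswith(SAKOT)
        if out and (is_mark or after_sakot):
            out[-1] += ch  # extend the current cluster
        else:
            out.append(ch)  # start a new cluster
    return out


print(len(segment(EXAMPLE, join_after_sakot=False)))  # 3
print(len(segment(EXAMPLE, join_after_sakot=True)))   # 1 (all 7 code points)
```

Under the tailoring the whole sequence becomes a single cluster, so "delete the first character" on it would otherwise mean retyping seven code points.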

Richard.



No appropriate code point for some Chinese punctuation marks

2012-07-22 Thread Gary Kilfear
As a matter of fact, the situation of Chinese punctuation marks in
Unicode is a mess. To date, we have no independent characters for the
Chinese dash, ellipsis or interpunct.

In practice, we use three kinds of dashes in Chinese: a halfwidth one,
corresponding to the Latin hyphen; a one-character-width one,
corresponding to the Latin en dash; and a two-character-width one,
corresponding to the Latin em dash. Chinese users now routinely treat
U+2013 (EN DASH) or U+002D (HYPHEN-MINUS) as the halfwidth dash and
assign U+2014 (EM DASH) as the one-character-width dash; for the
two-character-width dash, they simply enter two em dashes. However,
this is only a compromise. These dashes and the hyphen were designed
for Latin typesetting, so the horizontal line does not sit at the
middle height of a Hanzi character. And in some typefaces, two
consecutive em dashes render as a long horizontal line with a break in
the middle, which is inconsistent with the appearance of a
two-character-width dash.
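The compromise can be inspected with Python's standard `unicodedata` module. This sketch lists the code points typed for each dash role described above; the mapping is just the current practice Gary reports, not a standard. It shows that the two-character-width dash is two separate EM DASH characters (so a font draws two glyphs, which can leave a visible break) and that EM DASH has East_Asian_Width "A" (ambiguous), i.e. it is not inherently a fullwidth CJK form.

```python
import unicodedata

# Current practice as described in the post (not a standard mapping).
DASH_PRACTICE = {
    "halfwidth dash": ["\u002D", "\u2013"],        # HYPHEN-MINUS or EN DASH
    "one-character-width dash": ["\u2014"],        # EM DASH
    "two-character-width dash": ["\u2014\u2014"],  # two EM DASHes typed in a row
}


def describe(s: str) -> str:
    """List each code point with its name and East_Asian_Width property."""
    return " + ".join(
        f"U+{ord(c):04X} {unicodedata.name(c)} [{unicodedata.east_asian_width(c)}]"
        for c in s
    )


for role, encodings in DASH_PRACTICE.items():
    for s in encodings:
        print(f"{role}: {describe(s)}")
```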

Nor do we have a two-character-width ellipsis (six dots). In
principle, a two-character-width ellipsis could be built from two
consecutive one-character-width ellipses (three dots each), provided
each is well designed with respect to the position of the dots. But at
present Chinese users resort to an ugly hack: they simply type two
consecutive '…' (HORIZONTAL ELLIPSIS, U+2026). As with the dashes, the
dots do not sit at the middle height of a Hanzi character. Some people
choose the mathematical operator '⋯' (MIDLINE HORIZONTAL ELLIPSIS,
U+22EF) as a substitute, but many Chinese fonts don't support that
symbol and, most importantly, it is a mathematical operator, not a
punctuation mark!

For the interpunct, we need a solid dot that sits at both the vertical
and horizontal center of the character box. The Katakana symbol '・'
(KATAKANA MIDDLE DOT, U+30FB) is actually a good fit for the Chinese
interpunct, but, as its name indicates, it is a Katakana symbol, not a
punctuation mark common to East Asian scripts.
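The general categories behind these complaints can be checked with Python's standard `unicodedata` module. In this sketch "Po" means other punctuation and "Sm" means math symbol, which confirms that U+22EF is classified as a mathematical symbol while U+2026 and U+30FB are punctuation. (The Script property is not exposed by `unicodedata`, so this does not show script assignment.)

```python
import unicodedata

# Substitutes discussed above for the Chinese ellipsis and interpunct.
CANDIDATES = {
    "six-dot ellipsis hack": "\u2026\u2026",  # two HORIZONTAL ELLIPSIS
    "math substitute": "\u22EF",              # MIDLINE HORIZONTAL ELLIPSIS
    "interpunct substitute": "\u30FB",        # KATAKANA MIDDLE DOT
}


def properties(s: str) -> list[tuple[str, str, str]]:
    """(code point, name, general category) for each character of s."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.category(c)) for c in s]


for label, s in CANDIDATES.items():
    print(label, properties(s))
```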

So should we submit a proposal for these Chinese punctuation marks?


Re: No appropriate code point for some Chinese punctuation marks

2012-07-22 Thread Asmus Freytag

On 7/22/2012 7:08 AM, Gary Kilfear wrote:

 should we submit a proposal for these Chinese punctuation?


My take is that a proposal, with its requirements for evidence and 
samples, is the best way to systematically capture and collect the 
information.


Once everything is on the table, UTC will be in a position to resolve 
these issues.


It may be that some characters exist that are perfect fits for some 
required punctuation marks, but have been misunderstood in the user 
community. I suspect that is the case for KATAKANA MIDDLE DOT. Knowing 
the issue would allow this to be better documented.


In other cases UTC would have the ability to review, and either reaffirm 
or revisit, certain explicit or implicit unifications. Detailed 
documentation of usage (and examples of the success and failure of certain 
approaches taken by users today) is essential here.


I'm personally no great fan of unifications of punctuation based on 
basic similarity of shape and shared purpose alone. I believe that 
vertical alignment issues as well as width (or sidebearing) differences, 
if significant and visible, should be considered grounds for disunification.


Especially in multiscript environments, and those are not that rare, 
really, it's almost impossible to get such unifications to behave 
correctly without explicit font binding. And we all know that control of 
that is elusive in many contexts.


Finally, one or the other character may well be missing entirely.

The existing state of affairs is not based on a systematic, complete, 
and detailed review of the use of punctuation in China (or East Asia). 
Such a review is overdue, in fact, and in my view, the kind of document 
that would form a suitable basis for such a review would be 
indistinguishable from the typical background document for a proposal.


Most helpful would be if this paper could be written like a monograph on 
Chinese punctuation marks and Unicode, and would include, on the same 
footing, the material about existing characters that map well to 
specific Chinese punctuation marks.


If that were done, the resulting paper could be used twice: once to 
support any necessary changes to unifications or any additions of 
characters, and a second time as a Technical Note on the subject, which 
would continue to guide users to best practice.


That would have the most impact long term.
A./






Re: No appropriate code point for some Chinese punctuation marks

2012-07-22 Thread Khaled Hosny
On Sun, Jul 22, 2012 at 09:43:29AM -0700, Asmus Freytag wrote:
 Especially in multiscript environments, and those are not that rare,
 really, it's almost impossible to get such unifications to behave
 correctly without explicit font binding. And we all know that
 control of that is elusive in many contexts.

It is quite possible, actually. All that is needed is a text layout
engine that does automatic script tagging (e.g. Pango and, to some
extent, Firefox) and fonts that provide localised, script-specific
punctuation glyphs; then it should just work, even with plain text.
I've been doing that with Arabic and it works rather reliably.
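Khaled's approach depends on resolving script-neutral (Common) characters such as dashes and ellipses to the script of the surrounding text, so that the shaper can ask the font for script-specific glyphs. Below is a minimal Python sketch of that run-itemization step under stated assumptions: `script_of` is a hypothetical toy classifier covering only a few blocks, and "Common joins the adjacent run" is a simplification (real engines also handle paired brackets and other cases). It is not Pango's code.

```python
def script_of(ch: str) -> str:
    """Hypothetical toy classifier returning ISO 15924 codes.

    Real layout engines use the full Unicode Script property (Scripts.txt).
    """
    cp = ord(ch)
    if 0x4E00 <= cp <= 0x9FFF:       # CJK Unified Ideographs (main block only)
        return "Hani"
    if 0x0600 <= cp <= 0x06FF:       # Arabic block (toy: treats it all as Arabic)
        return "Arab"
    if ch.isascii() and ch.isalpha():
        return "Latn"
    return "Zyyy"                    # Common: punctuation, spaces, digits, ...


def itemize(text: str) -> list[tuple[str, str]]:
    """Split text into script runs; Common characters join the adjacent run."""
    runs: list[list[str]] = []       # each entry: [script, text]
    leading = ""                     # Common chars seen before any real script
    for ch in text:
        sc = script_of(ch)
        if sc == "Zyyy":
            if runs:
                runs[-1][1] += ch    # inherit the preceding run's script
            else:
                leading += ch
        elif runs and runs[-1][0] == sc:
            runs[-1][1] += ch
        else:
            runs.append([sc, leading + ch])  # leading Common joins the first run
            leading = ""
    if leading:                      # text consisted of Common characters only
        runs.append(["Zyyy", leading])
    return [(sc, s) for sc, s in runs]


sample = "他说\u2014\u2014好。Hello\u2014world"
print([sc for sc, _ in itemize(sample)])  # ['Hani', 'Latn']
```

Each run's script would then be handed to the shaper (for OpenType, as the script tag), which is where a font could substitute localised punctuation glyphs for the dashes inside the Han run.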

Regards,
 Khaled



Re: User-Hostile Text Editing (was: Unicode String Models)

2012-07-22 Thread Richard Wordingham
On Sun, 22 Jul 2012 08:59:13 +0100
Julian Bradfield jcb+unic...@inf.ed.ac.uk wrote:

 On 2012-07-21, Richard Wordingham richard.wording...@ntlworld.com
 wrote:

  Are there any widely available ways of enabling the deleting of the
  first character in a default grapheme cluster?

 What do you mean by widely available?

An example would be a technique that worked for many applications on a
platform, or for several significant applications across most
platforms.  An example of the former would be an effective per-user
tailoring of grapheme clusters.  A candidate for the latter is
LibreOffice's rule that Alt+cursor key moves within grapheme clusters
rather than moving the point to the start of the next grapheme
cluster.  (Unfortunately this doesn't even work inside tables, so it
doesn't look much of a candidate.)  This can be used in the sequence
Alt+Right-arrow, Rubout to delete the first character of a cluster.
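The Alt+Right-arrow, Rubout sequence can be simulated on a simplified model. This Python sketch assumes approximate clusters (base plus following marks), that plain Right jumps to the next cluster start, that Alt+Right moves one code point, and that Rubout deletes the single code point before the caret; these are modelling assumptions, not LibreOffice's implementation.

```python
import unicodedata


def is_cluster_start(text: str, i: int) -> bool:
    """Approximate rule: a cluster starts at any non-mark character."""
    return i == 0 or not unicodedata.category(text[i]).startswith("M")


def move_right(text: str, caret: int, alt: bool = False) -> int:
    """Right-arrow. Plain: jump to the next cluster start.
    With alt=True (the behaviour described above): move one code point."""
    if caret >= len(text):
        return len(text)
    if alt:
        return caret + 1
    i = caret + 1
    while i < len(text) and not is_cluster_start(text, i):
        i += 1
    return i


def rubout(text: str, caret: int) -> tuple[str, int]:
    """Backspace: delete the one code point before the caret."""
    if caret == 0:
        return text, caret
    return text[:caret - 1] + text[caret:], caret - 1


# Wrong base 'o' under a dot below and circumflex; the user wanted 'e'.
text, caret = "o\u0323\u0302", 0
print(move_right(text, caret))             # 3: plain Right skips the whole cluster
caret = move_right(text, caret, alt=True)  # 1: just past the base
text, caret = rubout(text, caret)          # delete the base only
text = text[:caret] + "e" + text[caret:]   # type the correct base
print(text == "e\u0323\u0302")             # True
```

The two marks survive untouched, which is the whole point: only the base is retyped.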

Richard.



Manipulation of System Fonts on Windows 7

2012-07-22 Thread Charlie Ruland
I would like to manipulate system fonts on a Windows 7 computer. More 
precisely, I wish to do the following:


1. Change the font for CJK Unified Ideographs (and CJK punctuation, 
radicals etc.; maybe the CJK Ideographs Extensions as well?) from the 
current Japanese-looking one to one in simplified Chinese style, though 
of course the new system font should also contain traditional characters.


2. Assign a system font for Shavian. Currently boxes/squares are displayed.

What I need is: 1. advice on which fonts to choose and 2. a brief 
tutorial on how to safely change fonts system-wide.


Although I am aware that this request is somewhat off-topic, I am sure 
that some people here will be able to give me the hints I am looking for.


Thanks in advance,

Charlie