> GL: Non-breaking ("Glue") (XB/XA) (Non-tailorable)
> Non-breaking characters prohibit breaks on either side, but that
> prohibition can be overridden by SP or ZW. In particular, when NBSP
> follows SPACE, there is a break opportunity after the SPACE and NBSP
> will go as visible space onto the next line. See also WJ. The
> following lists the characters of line break class GL with additional
> description.

Oh, that's right. I had forgotten about this exception stated in UAX
14. I guess it makes sense, in a way, for different character
combinations to behave differently. On the other hand, the exception
threatens to dilute the basic idea of non-breaking characters and make
things more complicated without offering any reasoning. Even
OpenOffice Writer (which usually breaks lines rather sensibly)
appears confused by the inconsistency, to the extent that it allows a
break between a regular space and a _word_joiner_ (U+2060), although
UAX 14 specifically states that the word joiner takes precedence over
the space.*
*See: http://www.unicode.org/reports/tr14/tr14-20.html#WJ

I can't see any harm (apart from the confusion) caused by the
SPACE+NBSP exception in itself, but it would be interesting to hear
whether there was any real-life case where you actually wanted a
double space to be breakable. Remarkably, even the quote above doesn't
say that the prohibition _must_ (or even _should_) be overridden, only
that it _can_ be. So UAX 14 seems to leave the final decision on this
matter to implementors.
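
To make the interplay concrete, here is a minimal Python sketch of a
pair-based break check covering only SPACE, NBSP, the non-breaking
hyphen, the soft hyphen, hyphen-minus, WJ and ZW. The class
assignments follow UAX 14's line break classes, but the rule set, its
ordering, the catch-all "AL" class and the BREAK_BEFORE_GL set are
simplifying assumptions for illustration, not a transcription of the
UAX 14 algorithm. It allows a break between SPACE and NBSP while
keeping SPACE + WORD JOINER together, which is the distinction
OpenOffice Writer appears to miss.

```python
# Simplified pair-rule sketch of the GL ("glue") behavior discussed above.
# Assumptions: only a few classes are modeled, every unlisted character
# is treated as a letter (AL), and the rule order is simplified.

LINE_BREAK_CLASS = {
    "\u0020": "SP",  # SPACE
    "\u00A0": "GL",  # NO-BREAK SPACE
    "\u2011": "GL",  # NON-BREAKING HYPHEN
    "\u00AD": "BA",  # SOFT HYPHEN
    "\u002D": "HY",  # HYPHEN-MINUS
    "\u2060": "WJ",  # WORD JOINER
    "\u200B": "ZW",  # ZERO WIDTH SPACE
}

# Hypothetical set of classes that may break before GL: SPACE per the
# quoted text, SOFT HYPHEN per the proposed update discussed here.
BREAK_BEFORE_GL = {"SP", "BA"}


def line_break_class(ch: str) -> str:
    """Return the (simplified) line break class of a single character."""
    return LINE_BREAK_CLASS.get(ch, "AL")


def can_break_between(before: str, after: str) -> bool:
    """Return True if a line break is allowed between two characters."""
    b, a = line_break_class(before), line_break_class(after)
    if a in ("SP", "ZW"):        # never break before a space or ZWSP
        return False
    if b == "ZW":                # always break after ZWSP
        return True
    if a == "WJ" or b == "WJ":   # the word joiner glues both sides
        return False
    if b == "GL":                # never break after glue
        return False
    if a == "GL":                # break before glue only after listed classes
        return b in BREAK_BEFORE_GL
    if b == "SP":                # break after a space
        return True
    if a in ("BA", "HY"):        # no break before a (soft) hyphen
        return False
    if b == "AL" and a == "AL":  # no break inside a run of letters
        return False
    return True                  # everything else may break


if __name__ == "__main__":
    print(can_break_between(" ", "\u00A0"))  # True: SPACE + NBSP
    print(can_break_between(" ", "\u2060"))  # False: SPACE + WORD JOINER
    print(can_break_between("-", "\u00A0"))  # False: HYPHEN-MINUS + NBSP
```

Adding "HY" to BREAK_BEFORE_GL would permit a break between a
hyphen-minus and an NBSP; leaving it out keeps the two together.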

Another, seemingly more important, case has been added to the proposed
update of UAX 14, concerning a non-breaking _hyphen_ that follows a
space. This time there is even some reasoning, which refers to words
with a hyphen as the first character (the special case described,
albeit on a very abstract level, is from Finnish orthography, but
comparable cases, such as "suffix -ed", may sometimes occur in English
as well as in other languages). In such a situation, UAX 14 recommends
that authors insert a non-breaking hyphen instead of a regular hyphen,
and consequently a line break should be allowed between the hyphen and
the preceding space.*
*See: http://www.unicode.org/reports/tr14/tr14-20.html#Hyphen

Actually, as I have already suggested in this thread, it is more
logical to use a regular hyphen in this kind of situation, since the
preferred break point is evidently after the space and not after the
hyphen. It would be absurd to leave a word-initial hyphen orphaned at
the end of a line. (This is an example where OpenOffice gets it right,
while many other applications, such as IE, Opera and Word, fail
miserably; Word even tends to auto-replace the hyphen with an en dash,
which is totally unacceptable in many, if not most, cases.)
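
The preference described above can be sketched as a hypothetical
heuristic: allow a break after a space, and after a hyphen-minus only
when that hyphen does not start a word. This only illustrates the
argument; it is not taken from UAX 14 or from OpenOffice, the function
name is made up, and every character other than the space and the
hyphen-minus is treated as an ordinary letter.

```python
from typing import List


def break_opportunities(text: str) -> List[int]:
    """Return the indices i at which a line may break before text[i].

    Hypothetical heuristic (not from UAX 14): break after a space, and
    after a hyphen-minus only if that hyphen does not start a word.
    """
    breaks = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if cur == " ":
            continue  # never break before a space
        if prev == " ":
            breaks.append(i)  # break after a space
        elif prev == "-":
            word_initial = i < 2 or text[i - 2] == " "
            if not word_initial:
                breaks.append(i)  # break after a word-internal hyphen
    return breaks


print(break_opportunities("suffix -ed"))  # [7]: only before the hyphen
print(break_opportunities("well-known"))  # [5]: after the hyphen
```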

Another new exception added to UAX 14 is the broken double hyphen that
may occur in Polish and Portuguese orthographies. In this case, two
hyphens are shown only if there is a line break in between; normally
there is just a single hyphen visible. For example, the Polish
compound word "czerwono-niebieska" should be broken in the middle so
that there's a hyphen both at the end of the first line and at the
beginning of the second line:

czerwono-
-niebieska

To produce this effect, UAX 14 recommends that authors use the
combination of a soft hyphen and a non-breaking hyphen. Again, it
might be considered more logical to use a regular hyphen instead of a
non-breaking hyphen, since a soft hyphen is supposed to mark a
preferred break point anyway*, but of course this would be a little
problematic if some browsers didn't recognize the preferred break
point and broke the word after the regular hyphen instead.
*See: http://www.unicode.org/reports/tr14/tr14-20.html#SoftHyphen

Thus, a line break should be allowed even between a soft hyphen and a
non-breaking hyphen.
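
A minimal sketch of how the soft hyphen + non-breaking hyphen pair
might be rendered, assuming the layout engine has already chosen where
(or whether) to break: a soft hyphen that ends a line is shown as a
visible hyphen (drawn here as U+002D for simplicity), any other soft
hyphen is dropped, and the non-breaking hyphen is always shown. The
function name `render_lines` is hypothetical.

```python
from typing import List, Optional

SOFT_HYPHEN = "\u00AD"  # invisible unless a line breaks right after it
NB_HYPHEN = "\u2011"    # always visible, glued to what follows


def render_lines(text: str, break_after: Optional[int] = None) -> List[str]:
    """Split text into display lines, breaking after index break_after.

    Assumes the caller has already chosen a legal break position.
    """
    if break_after is None:
        # No break: every soft hyphen stays invisible.
        return [text.replace(SOFT_HYPHEN, "")]
    first, second = text[: break_after + 1], text[break_after + 1 :]
    if first.endswith(SOFT_HYPHEN):
        first = first[:-1] + "-"  # a line-final soft hyphen becomes visible
    return [first.replace(SOFT_HYPHEN, ""), second.replace(SOFT_HYPHEN, "")]


word = "czerwono" + SOFT_HYPHEN + NB_HYPHEN + "niebieska"
shy_index = word.index(SOFT_HYPHEN)
print(render_lines(word))             # one line, single visible hyphen
print(render_lines(word, shy_index))  # two lines, a hyphen on each
```

Without a break the word shows only the non-breaking hyphen; with a
break after the soft hyphen, the first line ends with a hyphen and the
second begins with one.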

Nevertheless, in general I'd expect a non-breaking character to
prohibit breaks both before and after it. For example, I'd certainly
not expect a line break between a hyphen-minus and a no-break space
(this is another example where OpenOffice gets it right). If breaking
characters took precedence over non-breaking characters by default, it
would be pointless for UAX 14 to state that non-breaking characters
prohibit breaks before as well as after them.

It is interesting that, in addition to the ASCII hyphen-minus
(U+002D), Unicode even specifies a "regular" hyphen (U+2010). It is
rarely used in real life (since it is clumsy to produce with a typical
computer keyboard, and downright harmful if the data stray into an
application that doesn't recognize it), and an extra hyphen character
seems redundant unless it is treated differently from the traditional
hyphen-minus. According to UAX 14, the main difference seems to be
that the hyphen-minus requires additional context analysis to
distinguish its two uses, as a hyphen and as a minus sign.
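
The context analysis mentioned here might look like the following
sketch. The rule used for the hyphen-minus (read it as a minus sign
when it starts a word and is directly followed by a digit) is a
hypothetical heuristic chosen for illustration, not the one UAX 14
prescribes; the point is only that U+2010 needs no such check.

```python
HYPHEN = "\u2010"        # HYPHEN: always a hyphen
HYPHEN_MINUS = "\u002D"  # HYPHEN-MINUS: hyphen or minus, depending on context


def acts_as_hyphen(text: str, i: int) -> bool:
    """Guess whether the character at text[i] is used as a hyphen."""
    ch = text[i]
    if ch == HYPHEN:
        return True  # unambiguous, no context needed
    if ch != HYPHEN_MINUS:
        return False
    # Hypothetical heuristic: a word-initial hyphen-minus directly
    # followed by a digit is read as a minus sign, e.g. "-5".
    starts_word = i == 0 or text[i - 1].isspace()
    before_digit = i + 1 < len(text) and text[i + 1].isdigit()
    return not (starts_word and before_digit)


print(acts_as_hyphen("x = -5", 4))       # False: minus sign
print(acts_as_hyphen("well-known", 4))   # True
print(acts_as_hyphen("x = \u20105", 4))  # True: U+2010 is always a hyphen
```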

As the "regular" hyphen is actually quite a marginal character
nowadays, it may be tempting to always allow a break after it,
irrespective of context (which is how IE, Opera and Word seem to treat
even the hyphen-minus). This would offer authors a way to ignore some
of the traditional principles of Western typography when that was
considered beneficial in a specific case. However, this could cause
trouble if one day the "regular" hyphen did become the default hyphen
character in computer applications. As the preferred hyphen character
of the future, its default breaking behavior should rather be as close
to optimal as possible. We have already seen more than enough
negligent solutions in digital typography.

--
Simo Kaupinmäki

_______________________________________________
dev-tech-layout mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-tech-layout
