On 2003/10/04 13:44, Eran Tromer wrote:
OOo 1.1 seems to have the usual Hebrew-hyphen-number problem
("H-5" renders as "H5-"), which forces users to type the logically incorrect "H5-" and causes newer MS Word documents to import badly.


http://www.openoffice.org/issues/show_bug.cgi?id=19848

What's the proper way to handle this? Using "Hebrew hyphens" or something of the sort?

Here are my conclusions from the discussion on this list and from the following threads:
http://bugzilla.mozilla.org/show_bug.cgi?id=73251#c32
http://lists.w3.org/Archives/Public/www-international/2003JulSep/0084.html
http://mozilla.org.il/board/viewtopic.php?p=1790#1790


I have cross-posted this summary to the OpenOffice IssueZilla [sic] at
http://www.openoffice.org/issues/show_bug.cgi?id=19848
You can point out my (undoubtedly numerous and grave) errors there, but please don't spam it unnecessarily.


I see two practical ways of solving the problem.

1. Break compatibility with the Unicode algorithm. Starting with
Office 2000, Microsoft uses a different algorithm that fixes this
problem (I'm not aware of any other deviation from Unicode) -- use
that instead.

-or-

2. a. During text input, use heuristics to produce an encoding that's
rendered as desired. In the case of hebrew+minus+digit, instead of a
plain HYPHEN-MINUS insert an appropriate Unicode sequence such as
RLE+(HYPHEN-MINUS)+PDF or RLE+(NON-BREAKING HYPHEN)+PDF (see notes below).
   b. Do something smart about those sequences during editing (e.g.,
treat them as one logical character).
   c. In the MS Office import filters, add RLE+PDF where necessary so
as to simulate Microsoft's algorithm.
   d. Likewise, kludge the MS Office output filters as necessary.
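
To make 2.a. concrete, here is a minimal Python sketch of such an input
heuristic. The function name wrap_hebrew_hyphen, the restriction to the
letters U+05D0..U+05EA and the restriction to ASCII digits are all
assumptions made for the illustration; this is not what OOo or MS Office
actually does, and it says nothing about how a given renderer will
display the result.

```python
import re

RLE = "\u202B"           # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"           # POP DIRECTIONAL FORMATTING
HYPHEN_MINUS = "\u002D"  # the plain ASCII "-"

# Assumption for this sketch: "hebrew" means the letters alef..tav
# (U+05D0..U+05EA) and "digit" means an ASCII digit. The lookarounds
# match only the hyphen itself, so the letter and digit stay untouched.
_HEBREW_HYPHEN_DIGIT = re.compile(r"(?<=[\u05D0-\u05EA])-(?=[0-9])")


def wrap_hebrew_hyphen(text: str) -> str:
    """Replace hebrew+'-'+digit hyphens with RLE + '-' + PDF."""
    return _HEBREW_HYPHEN_DIGIT.sub(RLE + HYPHEN_MINUS + PDF, text)


if __name__ == "__main__":
    sample = "\u05D4-5"  # HE, HYPHEN-MINUS, DIGIT FIVE in logical order
    # ascii() makes the invisible control characters visible.
    print(ascii(wrap_hebrew_hyphen(sample)))
```

Once wrapped, the hyphen is preceded by RLE rather than by a Hebrew
letter, so running the function a second time leaves the text unchanged.
Latin text such as "H-5" is not touched at all.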

Both seem rather horrible, but so is the current situation. The
hebrew+hyphen+digit pattern occurs in many (perhaps most) Hebrew
documents, so its being rendered incorrectly in legacy documents is a
major issue. As for new documents, "enter a space between the minus
and the number" is unsatisfactory since the result is typographically
appalling, especially if the space induces a line break.

A couple of notes on 2.a. above:

The sequence (HYPHEN-MINUS)+LRM can be used in RTL context, but breaks
things in LTR context.

Arguably, the Right Thing is to use the single character U+05BE
(HEBREW PUNCTUATION MAQAF). Alas, this seems impractical as the
character is misrendered or missing in most fonts. Also, Maqaf is not
represented on keyboards and is missing from the iso8859-8 charset
(though it's present in windows-1255). Moreover, the widespread use of
HYPHEN-MINUS instead of the Maqaf character has virtually eliminated
the latter from common texts -- it seems to be perceived as a quaint
historical quirk that is bearable in "professional" typesetting, but
would look quite strange in (say) everyday correspondence.
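
The charset point is easy to check. The sketch below relies on the codec
tables bundled with Python ("iso8859_8" and "cp1255"), on the assumption
that they faithfully reflect the two charsets; can_encode is a helper
name made up for the example.

```python
MAQAF = "\u05BE"  # HEBREW PUNCTUATION MAQAF


def can_encode(ch: str, charset: str) -> bool:
    """Return True if `ch` is representable in the named charset."""
    try:
        ch.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for charset in ("iso8859_8", "cp1255"):
        print(charset, can_encode(MAQAF, charset))
```

HYPHEN-MINUS, being plain ASCII, is representable in both charsets, which
is part of why it displaced the Maqaf in practice.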


Regards, Eran


