Re: [emacs-bidi] UTR#9 - Unicode BiDi (was Re: OpenOffice BiDi kudos) (fwd)
behdad, who is going to study after finishing this mail.

-- Forwarded message --
Date: Sat, 11 Oct 2003 15:54:31 -0400
From: Eli Zaretskii [EMAIL PROTECTED]
To: Behdad Esfahbod [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: [emacs-bidi] UTR#9 - Unicode BiDi (was Re: OpenOffice BiDi kudos)

> Date: Sat, 11 Oct 2003 04:15:13 -0400
> From: Behdad Esfahbod [EMAIL PROTECTED]
>
> Is it true that your implementation of the Unicode Bidi algorithm does
> not follow UTR#9 with respect to handling the dash? Just wanted to
> make sure this is not true; otherwise, please consider following the
> standard.

Handa-san is currently trying to plug the sequential implementation of
UAX#9 that I wrote into the Emacs display code. The code I wrote
renders H-5 as -5H, as per UAX#9. One needs to type H-{RLM}5 to get
the H-5 result that most Hebrew users want.

I guess we will need to get used to typing RLM and LRM in similar
situations, since we must be UAX#9 compliant, and since UAX#9 results
in such madness in quite a few cases like this, sigh.

To unsubscribe, send mail to [EMAIL PROTECTED] with
the word "unsubscribe" in the message body, e.g., run the command
echo unsubscribe | mail [EMAIL PROTECTED]
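The H-{RLM}5 workaround mentioned in the message above can be applied mechanically. The following is only a minimal sketch in Python, not code from Emacs or from anyone in this thread: the function name insert_rlm_after_hyphen is hypothetical, and the rule it applies (a HYPHEN-MINUS directly between a Hebrew letter and an ASCII digit gets a U+200F RIGHT-TO-LEFT MARK after it) is an assumption made for illustration. How the result is displayed still depends on the bidi implementation and the Unicode version in use.

```python
import re

RLM = "\u200F"  # RIGHT-TO-LEFT MARK

# Assumption: "Hebrew letter" means U+05D0 (alef) through U+05EA (tav).
# Matches a HYPHEN-MINUS that has a Hebrew letter before it and an
# ASCII digit after it.
_HEBREW_HYPHEN_DIGIT = re.compile(r"(?<=[\u05D0-\u05EA])-(?=[0-9])")


def insert_rlm_after_hyphen(text: str) -> str:
    """Insert RLM after each hyphen that joins a Hebrew letter to a digit."""
    return _HEBREW_HYPHEN_DIGIT.sub("-" + RLM, text)


if __name__ == "__main__":
    sample = "\u05D7-5"  # HET, HYPHEN-MINUS, DIGIT FIVE
    fixed = insert_rlm_after_hyphen(sample)
    # Show the code points so the invisible mark is visible.
    print(" ".join(f"U+{ord(ch):04X}" for ch in fixed))
```

Applying the function twice changes nothing further, because after the first pass the hyphen is followed by the mark rather than by a digit.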
UTR#9 - Unicode BiDi (was Re: OpenOffice BiDi kudos)
On Sat, 04 Oct 2003 15:01:04 +0200, Shachar Shemesh [EMAIL PROTECTED] wrote:

> Eran Tromer wrote:
>
> > OOo 1.1 seems to have the usual hebrew-hyphen-number problem (H-5
> > renders as H5-), which necessitates typing of the logically
> > incorrect H5- and causes bad importing of newer MS Word documents.
>
> I'm not sure how to tackle this particular problem. I think the best
> place to fix it would be at the root of the problem - the Unicode BiDi
> algorithm. I *think* I have a reasonably portable solution to this
> issue. I guess it's time to register with another forum

This is a known issue with Unicode BiDi. It arises because we use the
- character for both minus and hyphen. When one wants to connect
letters with numbers, one is using a HYPHEN and wants it to appear as
5-word. When one wants to write a negative number, one uses a MINUS
SIGN and would like it to appear as -5 word. The Unicode wise men have
ignored the first case (or require the use of a special Hebrew MAKAF).

I pointed out this and some other problems at the m17n2000 conference.
(See http://www.m17n.org/m17n2000_all_but_registration/proceedings/ehud/
slide no. 10.) I proposed my solution (slides 11-15), and this
algorithm was implemented by Kenichi Handa in his Emacs-BiDi (see the
notes on http://www.m17n.org/emacs-bidi/ ).

Ehud.

-- 
Ehud Karni            Tel: +972-3-7966-561
Mivtach - Simon       Fax: +972-3-7966-667
Insurance agencies    (USA) voice mail and FAX: 1-815-5509341
http://www.mvs.co.il
ASCII Ribbon Campaign Against HTML Mail
GnuPG: 98EA398D       http://www.keyserver.net/
Better Safe Than Sorry
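The hyphen/minus distinction described above comes down to which bidi class each character carries. As a small illustration, Python's standard unicodedata module can report the class for the characters named in this thread. This is only a sketch: the helper name bidi_classes is hypothetical, and the values come from the Unicode data bundled with the interpreter, which is newer than the 2003 data being discussed and need not match it.

```python
import unicodedata

# Characters mentioned in the thread, keyed by their Unicode names.
CHARS = {
    "HYPHEN-MINUS": "\u002D",
    "HYPHEN": "\u2010",
    "MINUS SIGN": "\u2212",
    "HEBREW PUNCTUATION MAQAF": "\u05BE",
    "RIGHT-TO-LEFT MARK": "\u200F",
    "LEFT-TO-RIGHT MARK": "\u200E",
}


def bidi_classes(chars=CHARS):
    """Map each label to (code point string, bidi class abbreviation)."""
    return {
        label: (f"U+{ord(ch):04X}", unicodedata.bidirectional(ch))
        for label, ch in chars.items()
    }


if __name__ == "__main__":
    for label, (code_point, bidi_class) in bidi_classes().items():
        print(f"{code_point} {label}: {bidi_class}")
```

The maqaf and RLM are strong right-to-left characters (class R), which is why they behave differently from the ASCII hyphen next to a digit.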
Re: UTR#9 - Unicode BiDi (was Re: OpenOffice BiDi kudos)
On 2003/10/08 19:58, Ehud Karni wrote:

> This is a known issue with Unicode BiDi. It arises because we use the
> - character for both minus and hyphen. When one wants to connect
> letters with numbers, one is using a HYPHEN and wants it to appear as
> 5-word. When one wants to write a negative number, one uses a MINUS
> SIGN and would like it to appear as -5 word. The Unicode wise men have
> ignored the first case (or require the use of a special Hebrew MAKAF).

Won't a regular U+2010 HYPHEN (instead of the U+05BE maqaf) do the job,
proper Hebrew typography aside? I've tested it on fribidi and it's
rendered correctly in both LTR and RTL context.

> I proposed my solution (slides 11-15), and this algorithm was
> implemented by Kenichi Handa in his Emacs-BiDi (see the notes on
> http://www.m17n.org/emacs-bidi/ ).

As you note, your algorithm is incompatible with Unicode's. All means
are valid for converting legacy text, but there's a strong case for
insisting that all newly created text must be rendered correctly by
the standard algorithm.

This, of course, leaves open the problem of distinguishing the two
types of texts. It may be easy when importing a file, since you know
its type, but what do you do with a HYPHEN-MINUS when pasting from the
clipboard?

Maybe the right strategy is to convert all U+002D HYPHEN-MINUS to
either U+2010 HYPHEN or U+2212 MINUS SIGN upon import/paste/keypress
(via appropriate heuristics), so that HYPHEN-MINUS never occurs in the
output. Here the only breakage is for legacy/external texts for which
the heuristic fails.

Eran
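The conversion strategy proposed above can be sketched as follows. The message does not say what the "appropriate heuristics" are, so the rules below are assumptions chosen purely for illustration, and normalize_hyphen_minus is a hypothetical name: a HYPHEN-MINUS that precedes a digit and does not follow a letter or digit becomes U+2212 MINUS SIGN, one that joins a letter to a letter or digit becomes U+2010 HYPHEN, and anything else is left alone. Unlike the proposal, this sketch deliberately leaves ambiguous cases (such as 3-5) as HYPHEN-MINUS instead of guessing.

```python
import re

HYPHEN = "\u2010"      # U+2010 HYPHEN
MINUS_SIGN = "\u2212"  # U+2212 MINUS SIGN


def normalize_hyphen_minus(text: str) -> str:
    """Replace U+002D by HYPHEN or MINUS SIGN where context is clear."""

    def replace(match: re.Match) -> str:
        i = match.start()
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # Assumed rule 1: a sign in front of a number, e.g. "-5" or "x = -5".
        if nxt.isdigit() and not prev.isalnum():
            return MINUS_SIGN
        # Assumed rule 2: a joiner between a letter and a letter or digit.
        if prev.isalpha() and nxt.isalnum():
            return HYPHEN
        if prev.isdigit() and nxt.isalpha():
            return HYPHEN
        # Ambiguous (digit-digit, spaced dashes, etc.): leave untouched.
        return match.group(0)

    return re.sub("-", replace, text)


if __name__ == "__main__":
    for sample in ["\u05D7-5", "-5", "x = -5", "3-5", "a - b"]:
        result = normalize_hyphen_minus(sample)
        print(" ".join(f"U+{ord(ch):04X}" for ch in result))
```

Where such a conversion should run (import, paste, or keypress) is exactly the open question in the thread; the sketch only covers the per-character decision.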
Re: UTR#9 - Unicode BiDi (was Re: OpenOffice BiDi kudos)
Eran Tromer wrote on 2003-10-08:

> As you note, your algorithm is incompatible with Unicode's. All means
> are valid for converting legacy text, but there's a strong case for
> insisting that all newly created text must be rendered correctly by
> the standard algorithm.
>
> This, of course, leaves open the problem of distinguishing the two
> types of texts. It may be easy when importing a file, since you know
> its type, but what do you do with a HYPHEN-MINUS when pasting from the
> clipboard?
>
> Maybe the right strategy is to convert all U+002D HYPHEN-MINUS to
> either U+2010 HYPHEN or U+2212 MINUS SIGN upon import/paste/keypress
> (via appropriate heuristics), so that HYPHEN-MINUS never occurs in the
> output. Here the only breakage is for legacy/external texts for which
> the heuristic fails.

I think modifying pasted text is wrong. Instead you should fix the
text from which you copy, by whatever means needed.

-- 
Beni Cherniavsky [EMAIL PROTECTED]