Re: [emacs-bidi] UTR#9 - Unicode BiDi (was Re: OpenOffice BiDi kudos) (fwd)

2003-10-11 Thread Behdad Esfahbod


behdad,
who is going to study after finishing this mail.

-- Forwarded message --
Date: Sat, 11 Oct 2003 15:54:31 -0400
From: Eli Zaretskii [EMAIL PROTECTED]
To: Behdad Esfahbod [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: [emacs-bidi] UTR#9 - Unicode BiDi (was Re: OpenOffice BiDi
kudos)

 Date: Sat, 11 Oct 2003 04:15:13 -0400
 From: Behdad Esfahbod [EMAIL PROTECTED]

 Is it true that your implementation of the Unicode Bidi algorithm
 does not follow UTR#9 with respect to handling the dash?  Just
 wanted to make sure this is not true, otherwise, please consider
 following the standard.

Handa-san is currently trying to plug the sequential implementation of
UAX#9 that I wrote into the Emacs display code.  The code I wrote
renders H-5 as -5H, as per UAX#9.  One needs to type H-{RLM}5
to get the H-5 result that most Hebrew users want.
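
To see what the algorithm is given to work with, here is a minimal Python sketch that lists the bidi class of each character in the two logical strings mentioned above. It only looks up character properties with the standard `unicodedata` module; it does not run the UAX#9 reordering itself. The helper name `bidi_classes` and the choice of ALEF as a stand-in for "H" are assumptions made for illustration, and the class reported for "-" depends on the Unicode data bundled with your Python, which may differ from the 2003-era tables discussed here.

```python
import unicodedata

ALEF = "\u05d0"  # assumed stand-in for the Hebrew letter written as "H" above
RLM = "\u200f"   # RIGHT-TO-LEFT MARK


def bidi_classes(text):
    """Return (code point, name, bidi class) for each character, in logical order."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "?"), unicodedata.bidirectional(ch))
        for ch in text
    ]


plain = ALEF + "-" + "5"           # H-5 as typed
with_rlm = ALEF + "-" + RLM + "5"  # H-{RLM}5

for label, s in (("H-5", plain), ("H-{RLM}5", with_rlm)):
    print(label)
    for cp, name, cls in bidi_classes(s):
        print(f"  {cp}  {cls:<3} {name}")
```

The point of the RLM is that it places a strong right-to-left character directly next to the dash, so the dash is no longer adjacent to the digit.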

I guess we will need to get used to typing RLM and LRM in similar
situations, since we must be UAX#9 compliant, and since UAX#9 results
in such madness in quite a few cases like this, sigh.


=
To unsubscribe, send mail to [EMAIL PROTECTED] with
the word unsubscribe in the message body, e.g., run the command
echo unsubscribe | mail [EMAIL PROTECTED]



UTR#9 - Unicode BiDi (was Re: OpenOffice BiDi kudos)

2003-10-08 Thread Ehud Karni

On Sat, 04 Oct 2003 15:01:04 +0200, Shachar Shemesh [EMAIL PROTECTED] wrote:

 Eran Tromer wrote:

  OOo 1.1 seems to have the usual Hebrew-hyphen-number problem
  (H-5 renders as H5-), which necessitates typing of the logically
  incorrect H5- and causes bad importing of newer MS Word documents.

 I'm not sure how to tackle this particular problem. I think the best
 place to fix it would be at the root of the problem - the Unicode BiDi
 algorithm. I *think* I have a reasonably portable solution to this issue.

 I guess it's time to register with another forum

This is a known issue with Unicode BiDi. It arises because we use the -
character for both minus and hyphen. When one wants to connect letters
with numbers, one is using a HYPHEN and wants it to appear as 5-word.
When one wants to write a negative number, one uses a MINUS SIGN and
would like it to appear as -5 word. The Unicode wise men have ignored
the 1st case (or require the use of a special Hebrew MAKAF). I
pointed out this and some other problems at the m17n2000 conference
(see slide no. 10 of
http://www.m17n.org/m17n2000_all_but_registration/proceedings/ehud/ ).
I proposed my solution (slides 11-15) and this
algorithm was implemented by Kenichi Handa in his Emacs-BiDi (see
notes on http://www.m17n.org/emacs-bidi/ ).

Ehud.


-- 
 Ehud Karni   Tel: +972-3-7966-561  /\
 Mivtach - Simon  Fax: +972-3-7966-667  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D http://www.keyserver.net/    Better Safe Than Sorry




Re: UTR#9 - Unicode BiDi (was Re: OpenOffice BiDi kudos)

2003-10-08 Thread Eran Tromer
On 2003/10/08 19:58, Ehud Karni wrote:

 This is a known issue with Unicode BiDi. It arises because we use the -
 character for both minus and hyphen. When one wants to connect letters
 with numbers one is using a HYPHEN and wants it to appear as 5-word.
 When one wants to write a negative number one uses a MINUS SIGN and
 would like it to appear as -5 word. The Unicode wise men have ignored
 the 1st case (or require the use of a special Hebrew MAKAF).

Won't a regular U+2010 HYPHEN (instead of the U+05BE maqaf) do the job,
proper Hebrew typography aside? I've tested it on fribidi and it's
rendered correctly in both LTR and RTL context.
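
A quick way to compare the candidate characters is to look up their bidi classes. The sketch below uses only Python's standard `unicodedata` module; the helper name `dash_classes` is hypothetical. With recent Unicode data I would expect U+2010 to be reported as a neutral (ON) and the maqaf as strong right-to-left (R); the classes shown for U+002D and U+2212 depend on the Unicode version your Python ships with, so treat the output as indicative only.

```python
import unicodedata

# The four dash-like characters mentioned in this thread.
DASHES = {
    "HYPHEN-MINUS": "\u002d",
    "HYPHEN": "\u2010",
    "MINUS SIGN": "\u2212",
    "maqaf": "\u05be",
}


def dash_classes():
    """Map each label to the bidi class reported by the local Unicode data."""
    return {label: unicodedata.bidirectional(ch) for label, ch in DASHES.items()}


for label, cls in dash_classes().items():
    print(f"U+{ord(DASHES[label]):04X}  {label:<13} {cls}")
```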

 I proposed my solution (slides 11-15) and this
 algorithm was implemented by Kenichi Handa in his Emacs-BiDi (see
 notes on http://www.m17n.org/emacs-bidi/ ).

As you note, your algorithm is incompatible with Unicode's.
All means are valid for converting legacy text, but there's a strong
case for insisting that all newly created text must be rendered
correctly by the standard algorithm.

This, of course, leaves open the problem of distinguishing the two types 
of texts. It may be easy when importing a file since you know its type, 
but what do you do with a HYPHEN-MINUS when pasting from the clipboard?

Maybe the right strategy is to convert all U+002D HYPHEN-MINUS to either 
U+2010 HYPHEN or to U+2212 MINUS SIGN upon import/paste/keypress (via 
appropriate heuristics), so that HYPHEN-MINUS never occurs in the 
output. Here the only breakage is for legacy/external texts for which 
the heuristic fails.
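
A minimal sketch of such a conversion, in Python. The rule used here is an assumption chosen only for illustration: a "-" at the start of the text or after whitespace, and directly before a digit, is taken to be a minus sign; every other "-" is taken to be a hyphen. The function name `normalize_hyphen_minus` is hypothetical, and a real heuristic would need to be considerably smarter.

```python
HYPHEN_MINUS = "\u002d"
HYPHEN = "\u2010"
MINUS_SIGN = "\u2212"


def normalize_hyphen_minus(text):
    """Replace every U+002D with U+2010 or U+2212 using a crude context rule."""
    out = []
    for i, ch in enumerate(text):
        if ch != HYPHEN_MINUS:
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # Assumed rule: start of text or whitespace before, digit after -> minus.
        looks_like_minus = nxt.isdigit() and (prev == "" or prev.isspace())
        out.append(MINUS_SIGN if looks_like_minus else HYPHEN)
    return "".join(out)


sample = "\u05d0-5"  # Hebrew letter, hyphen-minus, digit (logical order)
print([f"U+{ord(c):04X}" for c in normalize_hyphen_minus(sample)])
```

Note that a subtraction such as "3-5" comes out with a HYPHEN under this rule, which is exactly the kind of text for which the heuristic fails.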

  Eran





Re: UTR#9 - Unicode BiDi (was Re: OpenOffice BiDi kudos)

2003-10-08 Thread Beni Cherniavsky
Eran Tromer wrote on 2003-10-08:

 As you note, your algorithm is incompatible with Unicode's.
 All means are valid for converting legacy text, but there's a strong
 case for insisting that all newly created text must be rendered
 correctly by the standard algorithm.

 This, of course, leaves open the problem of distinguishing the two types
 of texts. It may be easy when importing a file since you know its type,
 but what do you do with a HYPHEN-MINUS when pasting from the clipboard?

 Maybe the right strategy is to convert all U+002D HYPHEN-MINUS to either
 U+2010 HYPHEN or to U+2212 MINUS SIGN upon import/paste/keypress (via
 appropriate heuristics), so that HYPHEN-MINUS never occurs in the
 output. Here the only breakage is for legacy/external texts for which
 the heuristic fails.

I think modifying pasted text is wrong.  Instead you should fix the
text from which you copy, by whatever means needed.

-- 
Beni Cherniavsky [EMAIL PROTECTED]

