On Mon, May 20, 2002, Matitiahu Allouche wrote about "Re: official hebrew in Linux-IL 
mailing lists?":
> According to Unicode, the paragraph embedding level is computed anew for 
> each block.  A block is delimited by the start/end of text, and by Block 
> Separators.  There are no Block Separators within ISO-8859 code pages.  It 
> would be up to applications to recompute the direction for each line, or 
> sentence, or paragraph or whatever units make sense for them.

This was exactly my point. When you have an iso-8859-8-i email, what
are "blocks"? If the mail reader and writer don't agree on the same
definition of blocks, the reader may display text with a different base
direction than the writer intended.

My bidiv heuristics are as follows: a new "block" starts at each empty
line. All the lines in a block are given the same base direction,
determined by the first character that has a direction in the first line
of the block. If none of the characters of that first line has a
direction, I give that line the previous block's direction and try again
with the next line.
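For illustration, here is a minimal Python sketch of the heuristics above.
It is not bidiv's own source. It assumes that "a character that has a
direction" means one whose Unicode bidi class is L (left-to-right) or R/AL
(right-to-left), as reported by unicodedata.bidirectional(). It also
assumes that whitespace-only lines count as empty and that text with no
earlier block defaults to left-to-right. The function names are
hypothetical.

```python
import unicodedata

# Assumption: only the strong bidi classes count as "having a direction".
RTL_CLASSES = {"R", "AL"}


def first_strong_direction(line):
    """Return 'ltr' or 'rtl' for the first strong character in line, else None."""
    for ch in line:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in RTL_CLASSES:
            return "rtl"
    return None  # only neutral/weak characters (digits, punctuation, spaces)


def block_directions(text, default="ltr"):
    """Return one (direction, line) pair per line of text.

    Hypothetical helper: blocks are separated by empty (or whitespace-only)
    lines; 'default' is an assumed direction used before any block has one.
    """
    result = []
    prev_block_dir = default  # direction of the previous block
    block_dir = None          # direction of the current block, once known
    for line in text.split("\n"):
        if line.strip() == "":
            # An empty line ends the current block.
            if block_dir is not None:
                prev_block_dir = block_dir
            block_dir = None
            result.append((prev_block_dir, line))
            continue
        if block_dir is None:
            found = first_strong_direction(line)
            if found is None:
                # No directional character yet: borrow the previous block's
                # direction for this line and try again on the next line.
                result.append((prev_block_dir, line))
                continue
            block_dir = found
        # Every remaining line of the block shares the block's direction.
        result.append((block_dir, line))
    return result
```

With this sketch, a signature separator such as "-- " (no strong
characters) inherits the direction of the block before it, and a Hebrew
paragraph that follows it sets the direction for the rest of its block.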

These heuristics are necessary for sensibly formatting email (or other
plain text) that might contain blocks of English text, such as headers,
signatures, included code, and so on. I have no idea what heuristics
Microsoft Outlook uses, for example.

-- 
Nadav Har'El                        |        Monday, May 20 2002, 9 Sivan 5762
[EMAIL PROTECTED]             |-----------------------------------------
Phone: +972-53-245868, ICQ 13349191 |This '|' is not a pipe.
http://nadav.harel.org.il           |
