On 05/30/2010 08:46 PM, Gaal Yahas wrote:
> The other is technical. It is simply impossible to get all email
> clients to work correctly in bidi languages using only plain text.
Not impossible, just "not simple at the moment", as we can see even in
Oron's message, which does not mix LTR and RTL text in the same line. In
Thunderbird, for example, the semicolons/colons display on the wrong
side of the code/Hebrew when you use the keyboard shortcut to switch to
RTL/LTR mode, respectively. You can never see them both in the same
window without some garbling.
>
I take this mostly back. I misremembered the spec's treatment of
paragraphs: they reset bidi context, which is fine (UAX #9); the
problem lies in 5.8 which doesn't make the definition of a paragraph
separator bulletproof. I suppose you can start every line with either
RLE or LRE and always emit a PDF before linebreaks, to be safe. This
is very cumbersome.
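
A minimal sketch of the "RLE or LRE at the start of every line, PDF before
every linebreak" idea, in Python. Picking the opener from the first strong
character of the line is an assumption made for this illustration (nothing
above says how to choose), and the names embed_lines and
first_strong_direction are hypothetical.

```python
import unicodedata

LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"


def first_strong_direction(line, default="L"):
    """Return 'R' if the first strong bidi character is RTL, else 'L'."""
    for ch in line:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return "R"
        if bidi == "L":
            return "L"
    return default  # no strong character found (assumed fallback)


def embed_lines(text):
    """Wrap every non-empty line in RLE/LRE ... PDF; leave blank lines alone."""
    out = []
    for line in text.split("\n"):
        if line.strip():
            opener = RLE if first_strong_direction(line) == "R" else LRE
            line = opener + line + PDF
        out.append(line)
    return "\n".join(out)
```

Blank lines are left untouched so that whatever the receiving viewer treats
as a paragraph break is not disturbed.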
In the context I was talking about (a script run automatically when
submitting "rich text" email as plaintext), this is still OK.
Scripts do not complain about cumbersome keyboard mappings.
It could also make sure that MUAs (compliant ones at least) don't mess
up the paragraph structure, by using LS (U+2028) and PS (U+2029) instead
of CR/LF or whatever.
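
A rough sketch of that separator substitution, in Python. It assumes that
a blank line marks a paragraph boundary and that a single line break is
just a line break; that convention, and the name mark_separators, are
illustrative only. Whether a given MUA honours LS/PS is a separate question.

```python
import re

LS, PS = "\u2028", "\u2029"


def mark_separators(text):
    """Replace CR/LF line breaks with explicit Unicode separators."""
    # Normalise CRLF and bare CR to LF first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Assumed convention: one or more blank lines separate paragraphs.
    text = re.sub(r"\n{2,}", PS, text)
    # Any remaining single line break becomes a line separator.
    return text.replace("\n", LS)
```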
To really solve the garbling, one has to use unicode control
characters.
Samples:
הנה קטעי הקוד של גבור--->:
In my viewer at least, the comment opener here seems wrong (mirrored).
Either that, or the closing part of the tag is wrong. I didn't bother
inspecting your source (because, here's another problem, bidi marks
are invisible and difficult to debug). You're probably aware that in
certain cases bidi contexts do not fully reset after PDFs and need a
RLM or LRM back in the document directionality for things to work out.
As intended. This was meant to be "example starts here + arrow + colon"
(simple ASCII graphics, not conforming to any kind of SGML-ish syntax).
I also did not bless the indicator line itself with any kind of magic
Unicode (I probably should have, because in the quoted line of your
reply it did get reversed).
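
Since the marks are invisible, a small helper that spells them out can
make a message source easier to inspect. This is only a sketch in Python;
the name reveal_bidi and the <NAME> notation are made up for illustration,
and the table covers just the characters mentioned in this thread plus the
two overrides.

```python
BIDI_NAMES = {
    "\u200e": "LRM", "\u200f": "RLM",
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2028": "LS", "\u2029": "PS",
}


def reveal_bidi(text):
    """Replace each invisible bidi control with a visible <NAME> tag."""
    return "".join(
        "<" + BIDI_NAMES[ch] + ">" if ch in BIDI_NAMES else ch
        for ch in text
    )
```

For example, reveal_bidi("\u202bab\u202c") gives "<RLE>ab<PDF>".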
הגדרת משתנה סקלרי:
my $x = 42;
הגדרת מערך:
my @x = qw(4 2);
This part is fine.
ושיהיה קצת יותר מעניין, בשורה אחת--->:
Comment trouble.
נגדיר משתנה סקלרי ע"י my $x = 42; ואח"כ עוד משהו.
Looks good.
Look mom, no HTML!
Of course, I "cheated" by using characters which are not available in
common keyboard layouts. The point is that one could write simple
scripts to do that automatically in the MUA (e.g. as some plugin
activated when submitting "rich text" as plaintext).
Once such a solution is out there, it should be easier to spread it to
other agents (maybe even to gmail).
My point, apart from the obvious fact that directionality marks are
hard to author correctly, was that some of their interpretation is
underspecified so receiving MUAs may still behave differently.
Directionality should be well specified and consistent as long as the
partition into paragraphs and each paragraph's directionality are fixed.
Whatever heuristics MUAs apply to guess these, one can at least avoid the
garbling within each line by using explicit embeddings as you suggested
(indeed, this is how I did it).
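
One way such per-line embedding could be automated, sketched in Python for
lines whose base direction is RTL. The heuristic is an assumption of this
sketch and not necessarily what was done by hand in the samples: a run of
printable ASCII tokens (single spaces allowed inside) is treated as an LTR
fragment only if it contains at least one ASCII letter, so that a lone
quote inside a Hebrew abbreviation or a final period is left alone. The
name embed_ltr_runs is hypothetical.

```python
import re

LRE, PDF = "\u202a", "\u202c"

# Printable ASCII tokens, optionally joined by spaces.
ASCII_RUN = re.compile(r"[!-~]+(?: +[!-~]+)*")


def embed_ltr_runs(line):
    """Wrap letter-bearing ASCII runs of an RTL line in LRE ... PDF."""
    def wrap(match):
        run = match.group(0)
        if any(c.isascii() and c.isalpha() for c in run):
            return LRE + run + PDF
        return run  # punctuation or digits only: leave as is
    return ASCII_RUN.sub(wrap, line)
```

Applied to a line like the mixed sample above, only the "my $x = 42;" part
would be wrapped; the Hebrew text around it stays untouched.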
> Alignment is the least of your problems.
But alignment is the only part of the problem that *cannot* be solved
in plaintext, simply because plaintext does not provide a way to encode
that information. User agents use their own algorithms to decide, if at
all, and you cannot rely on having it displayed the same way everywhere.
This insufficient determination is compounded by heuristic solutions.
HTML-capable viewers may try to do the right thing with completely
unmarked text, but that would be a guess and will occasionally be
wrong (and wrong differently among viewers). It also means that they
have to scan the entire document (or a reasonable portion of it at
least) to establish that it indeed contains RTL characters but no bidi
marks.
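
The scan described here could look roughly like the following Python
sketch: report whether a text contains RTL characters but none of the
explicit bidi controls, i.e. whether a viewer would be left guessing. The
function name is hypothetical, and the set of controls is limited to the
marks, embeddings and overrides.

```python
import unicodedata

BIDI_CONTROLS = set("\u200e\u200f\u202a\u202b\u202c\u202d\u202e")


def needs_direction_guess(text):
    """True if text has RTL letters but no explicit bidi controls."""
    has_rtl = any(
        ch not in BIDI_CONTROLS
        and unicodedata.bidirectional(ch) in ("R", "AL")
        for ch in text
    )
    has_marks = any(ch in BIDI_CONTROLS for ch in text)
    return has_rtl and not has_marks
```

Note that this has to look at the whole text in the worst case, which is
exactly the cost mentioned above.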
Tightening the specs is the right technical solution, but doing that +
getting MUAs to comply is difficult.
> If you mix Hebrew and English in the same paragraph, it is almost
> certain that garbling will occur. In prose this is just very annoying.
> In technical discussion it can render text completely unreadable.
>
> Examples of garbling include reversed parentheses, misplaced
> punctuation, reversed number segments. These have potential to do real
> damage to coherence of the text. Unicode offers some technology to
> help with this, but it is just not sufficient for email when used in
> plain text. There are underspecified features that are interpreted
> differently by clients, and regardless, these mechanisms are hard to
> use, even for a technical user.
Well, I still have to see whether my examples above work or not
(Thunderbird/Icedove is known to do some garbling of its own if you
choose the wrong setup option).
Unicode does have enough support to prevent all the garbling you mention
(excluding alignment). The problem is that user agents do not insert the
proper Unicode. The community could help by writing plugins, but we are
too lazy and prefer to revert to an "evil" but working solution such as
HTML (at least until someone else writes the script).
The proper Unicode is not as straightforward to pick as you make it.
Not straightforward to pick manually maybe. But a script could be
fine-tuned to the point where it would display correctly on all relevant
viewers. However, I do not see any way to make *alignment* work
consistently with plain text alone.
AA
p.s. I hope that if nothing else, this conversation provides some answer
to Mickael's question about the specific problems that HTML emails were
supposed to solve.
_______________________________________________
Perl mailing list
[email protected]
http://mail.perl.org.il/mailman/listinfo/perl