On 05/30/2010 08:46 PM, Gaal Yahas wrote:

    > The other is technical. It is simply impossible to get all email
    > clients to work correctly in bidi languages using only plain text.

    Not impossible. Just "not simple at the moment", as we can see even in
    Oron's message, which does not mix LTR and RTL text in the same line
    (in Thunderbird, for example, the semicolons/colons display on the
    wrong side of the code/Hebrew when you use the keyboard shortcut to
    switch to RTL/LTR mode, respectively. You can never see them both in
    the same window without any garbling).


I take this mostly back. I misremembered the spec's treatment of paragraphs: they reset the bidi context, which is fine (UAX #9). The problem lies in section 5.8, which doesn't make the definition of a paragraph separator bulletproof. I suppose you can start every line with either RLE or LRE and always emit a PDF before line breaks, to be safe. This is very cumbersome.
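A minimal Python sketch of the "wrap every line" idea, for illustration only. The helper names (`first_strong_is_rtl`, `wrap_line`, `wrap_text`) are hypothetical, and choosing the embedding direction from the first strong character of each line is an assumption borrowed from how UAX #9 picks a paragraph's base direction, not something prescribed in this thread.

```python
import unicodedata

RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def first_strong_is_rtl(line: str) -> bool:
    """Return True if the first strong character is right-to-left (R or AL)."""
    for ch in line:
        kind = unicodedata.bidirectional(ch)
        if kind in ("R", "AL"):
            return True
        if kind == "L":
            return False
    return False  # no strong character at all: assume LTR


def wrap_line(line: str) -> str:
    """Open an embedding at the start of the line and pop it at the end."""
    if not line.strip():
        return line  # leave blank lines untouched
    opener = RLE if first_strong_is_rtl(line) else LRE
    return opener + line + PDF


def wrap_text(text: str) -> str:
    """Wrap every line of a plain-text body independently."""
    return "\n".join(wrap_line(line) for line in text.split("\n"))
```

Each line carries its own embedding, so the result does not depend on how the receiving MUA decides where paragraphs begin and end.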

In the context I was talking about (a script to be run automatically when submitting "rich text" email as plaintext), this is still OK. Scripts do not complain about cumbersome keyboard mappings. The script could also make sure that MUAs (compliant ones at least) don't mess up the paragraph structure, by using LS (U+2028) and PS (U+2029) instead of CR/LF or whatever.
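As a rough sketch of the LS/PS substitution, assuming (this is my convention, not something fixed in the thread) that a run of blank lines marks a paragraph break and a single newline marks a line break within a paragraph. The function name `use_unicode_separators` is hypothetical.

```python
import re

LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR


def use_unicode_separators(text: str) -> str:
    """Replace CR/LF line endings with explicit Unicode separators."""
    # Normalize CRLF and bare CR to LF first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Two or more consecutive newlines: paragraph break (assumed convention).
    text = re.sub(r"\n{2,}", PS, text)
    # Any remaining single newline: line break inside a paragraph.
    return text.replace("\n", LS)
```

Whether a given MUA honors U+2028 and U+2029 is a separate question that this sketch does not answer.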

    To really solve the garbling, one has to use Unicode control
    characters. Samples:

    הנה קטעי הקוד של גבור--->:

In my viewer at least, the comment opener here seems wrong (mirrored). Either that, or the closing part of the tag is wrong. I didn't bother inspecting your source (because, and here's another problem, bidi marks are invisible and difficult to debug). You're probably aware that in certain cases bidi contexts do not fully reset after PDFs and need an RLM or LRM back in the document directionality for things to work out.
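Since the marks are invisible, a small filter that spells them out can help when inspecting a message source. The following is only a sketch; the name `reveal_bidi` and the `<NAME>` output notation are made up for illustration, and the table covers only the marks and embedding/override controls discussed here.

```python
# Invisible bidi controls mapped to their usual abbreviations.
BIDI_CONTROLS = {
    "\u200e": "LRM",  # LEFT-TO-RIGHT MARK
    "\u200f": "RLM",  # RIGHT-TO-LEFT MARK
    "\u202a": "LRE",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b": "RLE",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c": "PDF",  # POP DIRECTIONAL FORMATTING
    "\u202d": "LRO",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e": "RLO",  # RIGHT-TO-LEFT OVERRIDE
}


def reveal_bidi(text: str) -> str:
    """Replace each invisible bidi control with a visible <NAME> token."""
    return "".join(
        f"<{BIDI_CONTROLS[ch]}>" if ch in BIDI_CONTROLS else ch
        for ch in text
    )
```

For example, `reveal_bidi("\u202amy $x = 42;\u202c")` gives `<LRE>my $x = 42;<PDF>`.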

As intended. This was meant to be "example starts here+arrow+colon" (simple ASCII graphics, not conforming to any kind of SGML-ish syntax). I also did not bless the indicator line itself with any kind of magic Unicode (I probably should have, because in the quoted line of your reply it did get reversed).

    ‫הגדרת משתנה סקלרי:‬
    ‪my $x = 42;‬
    ‫הגדרת מערך:‬
    ‪my @x = qw(4 2);‬

This part is fine.

    ושיהיה קצת יותר מעניין, בשורה אחת--->:

Comment trouble.

    ‫נגדיר משתנה סקלרי ע"י ‪my $x = 42;‬ ואח"כ עוד משהו.‬

Looks good.
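The mixed line above nests an LRE...PDF pair for the code inside an RLE...PDF pair for the whole sentence. One way a script might produce that, sketched under the assumption that the author marks inline code with backticks before sending (a hypothetical authoring convention, as is the name `rtl_line`):

```python
import re

RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def rtl_line(line: str) -> str:
    """Wrap an RTL sentence, embedding each `backticked` span as LTR."""
    # Each `code` span becomes LRE + code + PDF; the backticks are dropped.
    body = re.sub(r"`([^`]*)`", lambda m: LRE + m.group(1) + PDF, line)
    # The whole sentence is then wrapped in an RTL embedding.
    return RLE + body + PDF
```

The embeddings stay balanced as long as the backticks in the input are paired.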

    ‪Look mom, no HTML!‬

    Of course, I "cheated" by using characters which are not available in
    common keyboard layouts.  The point is that one could write simple
    scripts to do that automatically in the MUA (e.g. as some plugin
    activated when submitting "rich text" as plaintext).
    Once such a solution is out there, it should be easier to spread it to
    other agents (maybe even to gmail).

My point, apart from the obvious fact that directionality marks are hard to author correctly, was that some of their interpretation is underspecified so receiving MUAs may still behave differently.

Directionality should be well specified and consistent as long as the partition into paragraphs and the setting of each paragraph's directionality are fixed. Whatever heuristics MUAs apply to guess these, one can at least avoid the garbling within each line by using explicit embeddings as you suggested (indeed this is how I did that).

    > Alignment is the least of your problems.

    But alignment is the only part of the problem that *cannot* be solved
    in plaintext, simply because plaintext does not provide a way to
    encode that information (so user agents use their own algorithms to
    decide, if at all, and you cannot rely on having it displayed the
    same way everywhere).

This insufficient determination is compounded by heuristic solutions. HTML-capable viewers may try to do the right thing with completely unmarked text, but that would be a guess and will occasionally be wrong (and wrong differently among viewers). It also means that they have to scan the entire document (or a reasonable portion of it at least) to establish that it indeed contains RTL characters but no bidi marks.
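The scan described above could look roughly like this in Python. It is a sketch of the check only, not of what any particular viewer does; the name `needs_guessing` is hypothetical.

```python
import unicodedata

# Bidi classes of the explicit embedding/override controls and their pop.
EXPLICIT_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF"}
# LRM and RLM are checked by code point: their bidi classes are plain L and R.
MARKS = {"\u200e", "\u200f"}


def needs_guessing(text: str) -> bool:
    """True if the text has RTL characters but no bidi marks or controls."""
    has_rtl = False
    for ch in text:
        if ch in MARKS or unicodedata.bidirectional(ch) in EXPLICIT_CLASSES:
            return False  # the author supplied explicit directionality
        if unicodedata.bidirectional(ch) in ("R", "AL"):
            has_rtl = True
    return has_rtl
```

Note that a "no marks" verdict cannot be given before the end of the text is reached, which is the cost mentioned above.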

Tightening the specs is the right technical solution, but doing that and getting MUAs to comply is difficult.

    > If you mix Hebrew and English in the same paragraph, it is almost
    > certain that garbling will occur. In prose this is just very annoying.
    > In technical discussion it can render text completely unreadable.
    >
    > Examples of garbling include reversed parentheses, misplaced
    > punctuation, reversed number segments. These have potential to do real
    > damage to coherence of the text. Unicode offers some technology to
    > help with this, but it is just not sufficient for email when used in
    > plain text. There are underspecified features that are interpreted
    > differently by clients, and regardless, these mechanisms are hard to
    > use, even for a technical user.

    Well, I still have to see if my examples above work or not
    (Thunderbird/Icedove is known to do some garbling of its own if you
    choose the wrong setup option).
    Unicode does have enough support to prevent all the garbling you
    mention (excluding alignment). The problem is that user agents do not
    insert the proper Unicode. The community could help by writing
    plugins, but we are too lazy and prefer to revert to an "evil" but
    working solution such as HTML (at least until someone else writes the
    script).

The proper Unicode is not as straightforward to pick as you make it.

Not straightforward to pick manually maybe. But a script could be fine-tuned to the point where it would display correctly on all relevant viewers. However, I do not see any way to make *alignment* work consistently with plain text alone.

    AA

p.s. I hope that if nothing else, this conversation provides some answer to Mickael's question about the specific problems that HTML emails were supposed to solve.

_______________________________________________
Perl mailing list
[email protected]
http://mail.perl.org.il/mailman/listinfo/perl
