Re: [Israel.pm] multipart/alternative added

Amit Aronovitch Sun, 30 May 2010 14:01:09 -0700

On 05/30/2010 08:46 PM, Gaal Yahas wrote:

    > The other is technical. It is simply impossible to get all email
    > clients to work correctly in bidi languages using only plain text.

    Not impossible. Just "not simple at the moment", as we can see even in
    Oron's message, which does not mix LTR and RTL text in the same line
    (in thunderbird, for example, the semicolons/colons display in the
    wrong side of the code/hebrew when you use the keyboard shortcut to
    switch to RTL/LTR mode, respectively. You can never see them both in
    the same window without any garbling).

>
I take this mostly back. I misremembered the spec's treatment of paragraphs: they reset bidi context, which is fine (UAX #9); the problem lies in 5.8 which doesn't make the definition of a paragraph separator bulletproof. I suppose you can start every line with either RLE or LRE and always emit a PDF before linebreaks, to be safe. This is very cumbersome.

In the context I was talking about - a script to be run automatically when submitting "rich text" email as plaintext, this is still OK. Scripts do not complain about cumbersome keyboard mappings. It could also make sure that MUAs (compliant ones at least) don't mess up the paragraph structure by using LS (U+2028) and PS (U+2029) instead of CR/LF or whatever.
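
A minimal sketch of what such a script could do, in Python. This is only an illustration: the function names are made up here, and picking the direction from the first strong character of each line is an assumption, not something any MUA is known to do.

```python
import unicodedata

LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def first_strong_direction(line, default="L"):
    """Return "R" or "L" according to the first strongly directional character.

    Assumption: the first strong character decides the direction of the line.
    """
    for ch in line:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "R"
        if bidi_class == "L":
            return "L"
    return default


def wrap_line(line):
    """Open an explicit embedding at the start of the line and pop it at the end."""
    if not line.strip():
        return line  # leave blank lines untouched
    opener = RLE if first_strong_direction(line) == "R" else LRE
    return opener + line + PDF


def wrap_text(text):
    """Apply wrap_line to every line, keeping the existing line breaks."""
    return "\n".join(wrap_line(line) for line in text.split("\n"))
```

Calling wrap_text() on the outgoing body just before sending would give every line its own RLE or LRE and a closing PDF, so nobody has to type the control characters by hand.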

    To really solve the garbling, one has to use unicode control characters.
    Samples:

    הנה קטעי הקוד של גבור--->:
In my viewer at least, the comment opener here seems wrong (mirrored). Either that, or the closing part of the tag is wrong. I didn't bother inspecting your source (because, here's another problem, bidi marks are invisible and difficult to debug). You're probably aware that in certain cases bidi contexts do not fully reset after PDFs and need a RLM or LRM back in the document directionality for things to work out.

As intended. This was meant to be "example starts here+arrow+colon" (simple ASCII graphics, not conforming to any kind of SGML-ish syntax). I also did not bless the indicator line itself with any kind of magic unicode (I probably should have, because in the quoted line of your reply it did get reversed).
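
On the point that bidi marks are invisible and hard to debug: a small helper along these lines could make them visible when inspecting a message source. It is only a sketch; the function name and the "<NAME>" rendering are arbitrary choices made for this illustration.

```python
# Short names for the bidi control characters discussed in this thread.
BIDI_CONTROL_NAMES = {
    "\u200e": "LRM",  # LEFT-TO-RIGHT MARK
    "\u200f": "RLM",  # RIGHT-TO-LEFT MARK
    "\u202a": "LRE",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b": "RLE",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c": "PDF",  # POP DIRECTIONAL FORMATTING
    "\u202d": "LRO",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e": "RLO",  # RIGHT-TO-LEFT OVERRIDE
    "\u2028": "LS",   # LINE SEPARATOR
    "\u2029": "PS",   # PARAGRAPH SEPARATOR
}


def reveal_bidi_controls(text):
    """Replace each invisible control character with a visible <NAME> tag."""
    return "".join(
        "<" + BIDI_CONTROL_NAMES[ch] + ">" if ch in BIDI_CONTROL_NAMES else ch
        for ch in text
    )
```

Running it over a line of the message source shows exactly where the embeddings open and close.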

    ‫הגדרת משתנה סקלרי:‬
    ‪my $x = 42;‬
    ‫הגדרת מערך:‬
    ‪my @x = qw(4 2);‬

This part is fine.

    ושיהיה קצת יותר מעניין, בשורה אחת--->:

Comment trouble.

    ‫נגדיר משתנה סקלרי ע"י ‪my $x = 42;‬ ואח"כ עוד משהו.‬

Looks good.

    ‪Look mom, no HTML!‬

    Of course, I "cheated" by using characters which are not available in
    common keyboard layouts.  The point is that one could write simple
    scripts to do that automatically in the MUA (e.g. as some plugin
    activated when submitting "rich text" as plaintext).
    Once such a solution is out there, it should be easier to spread it to
    other agents (maybe even to gmail).

My point, apart from the obvious fact that directionality marks are hard to author correctly, was that some of their interpretation is underspecified so receiving MUAs may still behave differently.

Directionality should be well specified and consistent as long as the partition to paragraphs and the setting of the paragraphs' directionality is fixed. Whatever heuristics MUAs apply to guess these, one can at least avoid the garbling within each line by using explicit embeddings as you suggested (indeed this is how I did that).
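
For the mixed-line sample above (Hebrew sentence with a code fragment in the middle), the nesting of explicit embeddings can be sketched as follows. The helper names are hypothetical, and this shows only the structure of the marks, not a claim about how any particular viewer renders them.

```python
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def embed(text, rtl):
    """Wrap text in an RLE...PDF (rtl=True) or LRE...PDF (rtl=False) pair."""
    return (RLE if rtl else LRE) + text + PDF


def rtl_sentence_with_code(before, code, after):
    """Build an RTL line that contains one LTR code fragment.

    The code gets its own inner LRE...PDF pair, and the whole line
    is wrapped in an outer RLE...PDF pair.
    """
    return embed(before + embed(code, rtl=False) + after, rtl=True)
```

Each opener has its own PDF, so the code fragment keeps its left-to-right order without disturbing the surrounding Hebrew.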

    > Alignment is the least of your problems.

    But alignment is the only part of the problem that *can not* be solved
    in plaintext.
    Simply due to the fact that plaintext does not provide a way to encode
    that information (so user agents use their own algorithms to decide,
    if at all, and you can not rely on having it displayed the same way
    everywhere).

This insufficient determination is compounded by heuristic solutions. HTML-capable viewers may try to do the right thing with completely unmarked text, but that would be a guess and will occasionally be wrong (and wrong differently among viewers). It also means that they have to scan the entire document (or a reasonable portion of it at least) to establish that it indeed contains RTL characters but no bidi marks.
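
The scan described here might look roughly like this. It is a sketch under assumptions: the function name is invented, and real viewers may well look at only part of the document or use other criteria entirely.

```python
import unicodedata

# Explicit bidi marks, embeddings and overrides mentioned in this thread.
BIDI_CONTROLS = frozenset("\u200e\u200f\u202a\u202b\u202c\u202d\u202e")


def has_unmarked_rtl(text):
    """Return True if text contains RTL characters but no explicit bidi marks."""
    saw_rtl = False
    for ch in text:
        # Check controls first: RLM itself has bidi class "R".
        if ch in BIDI_CONTROLS:
            return False
        if unicodedata.bidirectional(ch) in ("R", "AL"):
            saw_rtl = True
    return saw_rtl
```

Note that in the worst case the whole text has to be read before the answer is known, which is the cost mentioned above.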

Tightening the specs is the right technical solution, but doing that and getting MUAs to comply is difficult.


    > If you mix Hebrew and English in the same paragraph, it is almost
    > certain that garbling will occur. In prose this is just very annoying.
    > In technical discussion it can render text completely unreadable.
    >
    > Examples of garbling include reversed parentheses, misplaced
    > punctuation, reversed number segments. These have potential to do real
    > damage to coherence of the text. Unicode offers some technology to
    > help with this, but it is just not sufficient for email when used in
    > plain text. There are underspecified features that are interpreted
    > differently by clients, and regardless, these mechanisms are hard to
    > use, even for a technical user.

    Well, I still have to see if my examples above work or not
    (thunderbird/icedove is known to do some garbling of its own if you
    choose the wrong setup option).
    Unicode does have enough support to prevent all the garbling you mention
    (excluding alignment). The problem is that user agents do not insert the
    proper unicode. The community could help by writing plugins, but we are
    too lazy and prefer to revert to an "evil" but working solution such as
    HTML (at least until someone else writes the script).

The proper Unicode is not as straightforward to pick as you make it.

Not straightforward to pick manually maybe. But a script could be fine-tuned to the point where it would display correctly on all relevant viewers. However, I do not see any way to make *alignment* work consistently with plain text alone.

AA

p.s. I hope that if nothing else, this conversation provides some answer to Mickael's question about the specific problems that HTML emails were supposed to solve.

_______________________________________________
Perl mailing list
[email protected]
http://mail.perl.org.il/mailman/listinfo/perl
