Today I forwarded a short scientific report from a colleague to our internal
mailing list. I could either send it as it was written, i.e. as an attached
MS Word document, or copy and paste its content, together with its structure
and attributes, directly into my HTML mailer.
I chose HTML, but as I am also interested in bandwidth considerations, I
looked at both sources: the plaintext version is 28,977 characters long and
the HTML version is 39,755 characters. That is 37% more, not 3 times bigger
as was said on this list.
I dropped the logo, which is not essential (HTML has the "Alt" attribute,
which replaces the image with some text).

I know that a single example is not a statistic, but I did not write the
original text myself and I simply used the tools already available on my
machine to copy and paste.

Looking at both sources, HTML and ASCII, I discovered that the copy and paste
action (on Windows) produces quite clean HTML, without too much garbage such
as <P>&nbsp;</P> or useless fonts. The original text was well structured,
with enumerated lists and paragraphs.

Therefore, a 37% increase is acceptable in my view.

Let me mention that the test was done on a French text with a lot of
"e-acute" characters. As you may know, some American gateways still do not
accept 8-bit characters and either refuse the message or strip the 8th
bit.
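
To illustrate what stripping the 8th bit does to such a text, here is a
minimal Python sketch. The function name strip_eighth_bit is my own
(hypothetical), and I assume the text is encoded in ISO 8859-1:

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Clear the high bit of every byte, as a 7-bit-only gateway would."""
    return bytes(b & 0x7F for b in data)


# The French word for "summer" in ISO 8859-1: e-acute is the byte 0xE9.
original = b"\xe9t\xe9"
damaged = strip_eighth_bit(original)

# 0xE9 with its high bit cleared is 0x69, the letter 'i'.
print(damaged.decode("ascii"))
```

The message still arrives, but every accented letter is silently replaced
by an unrelated ASCII character.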

Thus, by default, mailers still send plaintext using the so-called
"Quoted-printable" MIME encoding (for example, an e-acute is coded with 3
characters: equals sign, upper-case E, nine) and send HTML text using the
literal entity names (for example, an e-acute is coded with 8 characters:
ampersand, e, a, c, u, t, e, semicolon).
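
The two encodings of an e-acute can be compared with a short Python sketch.
It uses the standard quopri and html.entities modules; the variable names
are mine, and I assume ISO 8859-1 as the 8-bit character set:

```python
import quopri
from html.entities import codepoint2name

E_ACUTE = "\u00e9"

# Quoted-printable works on bytes; ISO 8859-1 encodes e-acute as 0xE9.
qp = quopri.encodestring(E_ACUTE.encode("latin-1")).decode("ascii")

# The HTML named entity is built from the entity name of the code point.
entity = "&" + codepoint2name[ord(E_ACUTE)] + ";"

print(qp, len(qp))          # =E9 3
print(entity, len(entity))  # &eacute; 8
```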

Without this 7-bit restriction, caused by people promoting ASCII
(American standard characters for ...) as universal, the increase from
plaintext to HTML is only about 15%, less than half of the 37% measured
above, as shown in the following table (I also included the CR/LF line
endings of DOS/Windows, a cost which does not exist on UNIX).

                                 Size     Ratio
HTML ASCII 7 bits with CR/LF    39755   137.20%
Plaintext 8 bits with LF only   28977   100.00%  (baseline)
HTML 8 bits with CR/LF          33510   115.64%
HTML 8 bits with LF only        33258   114.77%
(If this table is not correctly formatted, I can send my mail as HTML:
tables are forwarded correctly now, but the increase in size is high, I agree.)
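
For anyone who wants to check the ratios, here is a small Python sketch
that recomputes them from the sizes in the table. The sizes are the ones I
measured; the names BASELINE, sizes and ratio_percent are only illustrative:

```python
BASELINE = 28977  # plaintext, 8 bits, LF only

sizes = {
    "HTML ASCII 7 bits with CR/LF": 39755,
    "HTML 8 bits with CR/LF": 33510,
    "HTML 8 bits with LF only": 33258,
}


def ratio_percent(size: int, baseline: int = BASELINE) -> float:
    """Return size as a percentage of the baseline, rounded to 2 decimals."""
    return round(100.0 * size / baseline, 2)


for label, size in sizes.items():
    print(f"{label:<30} {size:>6} {ratio_percent(size):7.2f}%")

# Each CR/LF line ending costs one more byte than LF alone, so the
# difference between the two 8-bit HTML sizes is the number of line endings.
crlf_overhead = sizes["HTML 8 bits with CR/LF"] - sizes["HTML 8 bits with LF only"]
print(crlf_overhead)
```

The CR/LF overhead is small next to the cost of the entity names, which is
why the last two rows of the table are so close.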

In my view, the additional bandwidth cost of going from plaintext to
formatted/structured text is negligible compared with the much better
comprehension of the text. And in any case, the additional cost of
transmitting non-English texts by e-mail already amounts to half of this
cost, solely because of this old 8-bit versus 7-bit war. I have only
discussed the 8-bit ISO 8859-1 standard here, but the situation is worse
with Unicode, which is the future in this Babel world. Or do mailing list
standards concern only English-speaking people?

--
Nicolas Brouard
Institut national d'études démographiques
Paris
mailto:[EMAIL PROTECTED]  http://sauvy.ined.fr/~brouard/english
