On Thu, Feb 25, 1999 at 07:20:12AM -0800, Russ Allbery wrote:
> Nicolas Brouard <[EMAIL PROTECTED]> writes:
> > I chose HTML, but as I am also interested in bandwidth considerations, I
> > looked at both sources: the plaintext file is 28,977 characters long and
> > the HTML is 39,755 characters. That is 37% more, not 3 times bigger
> > as was said on this list.
>
> Yes. Three times larger is only if you have a *really* pathetically bad
> converter. For comparison, here's the size difference for my faq2html
> script, which I use to generate HTML versions of various FAQs I maintain
> for posting on the web:
>
> windlord:~/faqs> ls -l mjqmail mjqmail.html
> -rw-r--r-- 1 eagle root 21246 Feb 15 07:20 mjqmail
> -rw-r--r-- 1 eagle root 22736 Feb 25 07:13 mjqmail.html
>
> See <URL:http://www.eyrie.org/~eagle/faqs/mjqmail.html>.
>
> Now, this gets a lot worse with quoted text, since HTML doesn't have good
> mechanisms to deal with that (particularly with nested quoting). So in a
> discussion context, things are messier.

Note that you don't need to escape '>' when converting from plain
text to HTML (which is where a lot of the verbosity can come from
when converting news or mail); you just need to escape '&' and '<'.
Leaving the '>'s as-is is valid HTML, and I've never heard of a
browser that chokes on it (my news archive software has been
leaving them unescaped for years).
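
As a rough illustration, here is a minimal Python sketch of that
escaping, assuming the text ends up as ordinary element content (not
inside an attribute value, where quotes would also need attention).
The names escape_minimal and escape_full are made up for this example;
this is not the code my archive software uses.

```python
def escape_minimal(text: str) -> str:
    """Escape only '&' and '<', leaving '>' untouched."""
    # '&' must be handled first, otherwise the '&' in the '&lt;'
    # generated by the second replace would be escaped again.
    return text.replace("&", "&amp;").replace("<", "&lt;")


def escape_full(text: str) -> str:
    """Also escape '>', as many converters do (for comparison only)."""
    return escape_minimal(text).replace(">", "&gt;")


if __name__ == "__main__":
    # A small quoted-mail sample with nested quoting (illustrative data).
    quoted = "> > nested quote\n> reply with a <tag> & an ampersand\n"
    print(escape_minimal(quoted), end="")
    print(len(escape_minimal(quoted)), "vs", len(escape_full(quoted)), "characters")
```

Each '>' left alone saves three characters over '&gt;', which adds up
quickly in deeply quoted mail. Python's standard html.escape also
escapes '>', which is why the sketch does the replacement by hand.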
--
Gerald Oskoboiny <[EMAIL PROTECTED]>
http://impressive.net/people/gerald/