Hi Michael,
Michael Peters wrote:
> Raymond Wan wrote:
>> I had looked at the effect compression has on web pages a while ago.
>> Though not relevant to modperl, there is obviously a cost to
>> compression and since most HTML pages are small, sometimes it is hard
>> to justify.
> Not to discredit the work you did researching this, but a lot of
> people are studying the same thing and coming to different conclusions:
> http://developer.yahoo.com/performance/rules.html
> Yes, backend performance matters, but more and more we realize that
> the front end tweaks we can make give a better performance for users.
> Take google as an example. The overhead of compressing their content
> and decompressing it on the browser takes less time than sending the
> same content uncompressed over the network. I'd say the same is true
> for most other applications too.
It's OK; I don't take another opinion as discrediting my work. :-)
That work was done a while ago, it was only one aspect of my research,
and the test bed was fairly small. My fault for handwaving in my
reply, though.
The point is really the "sometimes"... My research was on compression
in general, and web compression was only one aspect of it. My point is
that if you take a one-byte file and run gzip -9 on it (again, the same
algorithm as deflate), you get a 24-byte file. As the file size grows,
you eventually reach a point where compressing becomes worthwhile. The
example is silly and pathological, but it shows that there are cases
where compression may not be beneficial. You can think of a site's
average file size as a kind of knob: as it turns (i.e., as the average
file size increases from site to site), the benefits become more and
more evident.
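If you want to see that overhead for yourself, a rough sketch like the
one below (plain Python and its standard gzip module, nothing
mod_perl-specific; the sizes are only illustrative) shows tiny inputs
getting bigger while larger ones shrink:

    import gzip

    # Tiny inputs grow because of gzip's fixed header/trailer overhead;
    # larger, redundant inputs shrink substantially.
    for size in (1, 100, 1000, 10000):
        data = (b"<p>hello world</p>" * (size // 18 + 1))[:size]
        packed = gzip.compress(data, compresslevel=9)
        print(f"{size:>6} bytes -> {len(packed):>6} bytes gzipped")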
For example, compressing an already compressed file is generally
pointless (if it was done right the first time). MP3, JPEG, GIF, etc.
are all file formats that have, or may have, compression built in.
PDFs can be compressed too, if that option was selected when the file
was created. English text compresses well (down to roughly 25% of its
original size, in general?), while two-byte encodings such as Chinese
and Japanese (I think) only get down to around 40-50% [handwaving
again :-) there are more up-to-date numbers out there]. Compression
also works best on a uniform file; if a web page mixes text, images,
etc., then each part has to be compressed separately.
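The already-compressed case is easy to demonstrate too. In the sketch
below, random bytes stand in for JPEG/MP3-style payloads, and the
repeated sentence is far more redundant than real prose, so treat the
exact ratios as illustrative only:

    import os
    import zlib

    samples = {
        "english-like text": b"The quick brown fox jumps over the lazy dog. " * 200,
        "random bytes (stand-in for JPEG/MP3/zip data)": os.urandom(9000),
    }
    for label, data in samples.items():
        packed = zlib.compress(data, 9)
        print(f"{label}: {len(data)} -> {len(packed)} bytes "
              f"({len(packed) / len(data):.0%} of original)")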
As for Google, you are right -- I can imagine why it would work well
for Google. However, I can also hypothesize that it might be a special
case. I presume you mean the results of a query. What we get back is
a list of results that are all related to each other; i.e., if you
searched for "apache2 modperl", we can expect those two words to appear
in every result, and the vocabulary to be similar from result to result
[they would all be computer-oriented]. Since compression aims to reduce
redundancy, their results are perfect for it.
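As a toy illustration (the snippet text is invented, and repeating it
verbatim is an extreme case, but the mechanism is the same), deflate
does far better once the same phrases show up over and over across
results:

    import zlib

    snippet = "Result: configuring apache2 with modperl, notes and benchmarks\n"
    for label, text in (("one snippet", snippet),
                        ("ten similar snippets", snippet * 10)):
        data = text.encode()
        packed = zlib.compress(data, 9)
        print(f"{label}: {len(data)} -> {len(packed)} bytes "
              f"({len(packed) / len(data):.0%})")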
Anyway, what I wanted to say is that there are bound to be cases where
compression is beneficial and cases where it isn't. I think it is fine
to do what the Yahoo page says and have it "on" by default; but if
someone examines their traffic and data and decides it should be "off",
that isn't beyond reason.
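If someone did want to encode that kind of policy, it could be as
simple as the sketch below; the size threshold and the list of types
are my own guesses, not anything from the Yahoo page, and a real filter
would also check the client's Accept-Encoding header:

    # Hypothetical helper: decide whether a response is worth gzipping.
    ALREADY_COMPRESSED = {
        "image/jpeg", "image/gif", "image/png",
        "audio/mpeg", "application/zip",
    }
    MIN_SIZE = 1000  # bytes; below this, gzip overhead can outweigh the gain

    def should_compress(content_type, length):
        if content_type in ALREADY_COMPRESSED:
            return False
        return length >= MIN_SIZE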
>> As for dialup, if I remember from those dark modem days :-)
> Even non dialup customers can benefit. Many "broadband" connections
> aren't very fast, especially in rural places (I'm thinking large
> portions of the US).
> But all this talk is really useless in the abstract. Take a tool like
> YSlow for a spin and see how your sites perform with and without
> compression. Especially looking at the waterfall display.
Well, one good thing about deflate is that it is *fast*. Very fast.
So, while my silly one-byte-file example shows there are exceptions,
the point where compression starts to pay off is probably much closer
to one byte than you might expect. :-)
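You can get a ballpark feel for that speed with something like the
following (the numbers vary with the machine, the data and the
compression level, so don't read too much into them):

    import time
    import zlib

    data = b"The quick brown fox jumps over the lazy dog. " * 20000  # ~900 KB
    packed = zlib.compress(data, 6)

    def clock(label, fn, repeats=50):
        start = time.perf_counter()
        for _ in range(repeats):
            fn()
        ms = (time.perf_counter() - start) / repeats * 1000
        print(f"{label}: {ms:.2f} ms per call")

    clock("compress (level 6)", lambda: zlib.compress(data, 6))
    clock("decompress", lambda: zlib.decompress(packed))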
One cost saving might be to pre-compress files, since with deflate
compressing is more time-consuming than decompressing; i.e., keep the
files on the server already in compressed form. Of course, that brings
its own problems, and is one reason why things like Stacker never
really caught on (much)...
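A build-step sketch of that idea might look like the following; the
document root is made up, and a real server would normally be
configured to serve the .gz copies itself rather than rely on anything
hand-rolled:

    import gzip
    from pathlib import Path

    DOCROOT = Path("/var/www/html")  # assumed location

    # Write a .gz sibling next to each HTML file so the server can send
    # the pre-compressed copy instead of running deflate per request.
    for page in DOCROOT.rglob("*.html"):
        data = page.read_bytes()
        packed = gzip.compress(data, compresslevel=9)
        if len(packed) < len(data):  # keep it only if it actually shrank
            page.with_name(page.name + ".gz").write_bytes(packed)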
Ray