On 07/02/12 01:05, Robert Vesse wrote:
Comments inline:

On Feb 6, 2012, at 2:06 PM, Andy Seaborne wrote:

On 06/02/12 17:21, Robert Vesse wrote:
On a local network it does not seem to be beneficial, it adds a
few seconds of overhead.

I was trying gzip (in Java) today for Fuseki backups - I found
similarly that writing to disk was faster without gzip.

The Java gzip was about equivalent to "gzip -6" in time and space
reduction, achieved with a large enough gzip buffer - I used 8K where
the default is 0.5K.
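
To illustrate the buffer size point, here is a minimal sketch of gzip in Java with an explicit 8K buffer. The class and method names are hypothetical and are not taken from Fuseki. The only library calls are the standard `java.util.zip.GZIPOutputStream(OutputStream, int)` and `GZIPInputStream(InputStream, int)` constructors, where the second argument is the buffer size in bytes (the JDK default is 512).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, for illustration only.
class GzipBufferSketch {
    // 8K buffer as mentioned in the thread; the JDK default is 512 bytes.
    static final int BUFFER_SIZE = 8 * 1024;

    // Compress a byte array with gzip using the larger buffer.
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(sink, BUFFER_SIZE)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Decompress gzip data back to the original bytes.
    static byte[] decompress(byte[] gzipped) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped), BUFFER_SIZE)) {
            byte[] chunk = new byte[BUFFER_SIZE];
            int n;
            while ((n = in.read(chunk)) != -1) {
                sink.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Repetitive sample data, loosely resembling a dump of triples.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("<http://example/s> <http://example/p> \"").append(i).append("\" .\n");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println(raw.length + " bytes raw, " + packed.length + " bytes gzipped");
    }
}
```

For a backup written to disk, the `ByteArrayOutputStream` would be replaced by a file stream; whether the larger buffer helps in a given setup is something to measure rather than assume.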

So in my testing I ran a query that dumped approx 660k triples out
of the endpoint and compared gzip'd vs non-gzip'd times averaged over
several runs.  For reference the Jetty GzipFilter uses an 8K buffer
size.

With the server and client both on the local machine gzip added
around 5s of overhead for TSV and 7s overhead for XML

But when running the server on a remote machine (outside of the LAN)
gzip gave a 2x speed up for TSV and a 5x speed up for XML

Note I tried to get figures for JSON as well but discovered that the
SPARQL JSON parser appears to be non-streaming as I hit OOM
exceptions so I'll look into that and maybe file a separate JIRA for
that.

That's something that ought to get changed. The JSON parser is itself stream capable with an event model (like SAX).

Due to other more pressing work I haven't had chance to test for
the overheads when used over a decidedly non-local network though
I should get chance to do that later today at which point I'll
submit the patch.

The Fuseki patch I have at the moment enables the filter by
default but due to the way Jetty works the filter only gets
applied if the client explicitly states that they accept GZip
encoded content with the Accept-Encoding parameter.

I think that's correct generally, not a Jetty-ism.  If the client
does not ask for it, it should not happen (the client may not have
the ability or desire to uncompress, e.g. scripting languages, curl
etc).

Well yes, I just meant in terms of how their filter functions.  As a
general rule any HTTP server should not give back something the
client hasn't asked for.
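
The rule discussed above (only compress when the client asks for it) can be sketched as a small check on the `Accept-Encoding` request header. This is a simplified illustration, not Jetty's GzipFilter logic: the class and method names are hypothetical, it only looks for an explicit `gzip` entry, it treats `q=0` as a refusal, and it deliberately ignores the `*` wildcard.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only; not part of Jetty or Fuseki.
class AcceptEncodingSketch {
    // Returns true only if the header explicitly lists gzip with a non-zero q-value.
    static boolean acceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) {
            return false; // client did not ask, so do not compress
        }
        for (String part : acceptEncoding.split(",")) {
            String[] fields = part.trim().split(";");
            if (fields.length == 0) {
                continue; // malformed empty entry
            }
            if (!fields[0].trim().equalsIgnoreCase("gzip")) {
                continue; // some other coding, e.g. deflate or identity
            }
            for (int i = 1; i < fields.length; i++) {
                String param = fields[i].trim().toLowerCase(Locale.ROOT);
                if (param.startsWith("q=")) {
                    try {
                        // q=0 means "explicitly not acceptable"
                        return Double.parseDouble(param.substring(2).trim()) > 0.0;
                    } catch (NumberFormatException e) {
                        return false; // unparsable weight: play safe
                    }
                }
            }
            return true; // gzip listed without a q-value
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(acceptsGzip("gzip, deflate"));
        System.out.println(acceptsGzip(null));
    }
}
```

A client such as curl sends no `Accept-Encoding` header unless told to, so under this check it would always receive an uncompressed response.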


Browsers typically send this so it may be that it would be best
to have this feature off by default because if people do
prototyping and testing on their local machine with their browser
they may see slower performance because of this.

Agreed. Off by default.

Ok, I will make it off by default. Do you want it enabled by command
line parameter or only programmatically?  My preference would be to
add a config symbol which the code uses to determine if the feature
should be enabled and have the command line parameter enable that
symbol if present.

By "on" and "off" I think we may mean different things.

I meant "off" => "it does not apply compression on every request whether asked for or not"; I think you may mean the filter is in the servlet chain. I was assuming Fuseki would have the filter there always, ready to respond if the client sets the HTTP request header appropriately ("Accept-Encoding").

I see your JIRA+patch ... great.

        Andy

However most SPARQL clients in various APIs probably won't
include this header by default - whether this wants to be enabled
by default will probably depend on the performance figures.
I'll aim to have the submitted patch make the behavior
configurable and leave it up to you whether you want to have it
enabled/disabled by default once you've seen the figures.

Rob

ps. I also have a related patch for ARQ which allows it to ask
for GZip and Deflate encoded content though I'll likely package
that as part of a more extensive patch for QueryEngineHttp I've
been working on which also adds in support for configuring
requested content type.
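
On the client side, asking for GZip and Deflate content means sending an `Accept-Encoding` header and then unwrapping the response body according to the `Content-Encoding` the server actually used. The sketch below shows only the unwrapping step with standard `java.util.zip` classes. It is not the ARQ patch: the class, constant and method names are hypothetical, and it assumes that "deflate" bodies are zlib-wrapped, which is what `InflaterInputStream` expects by default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, for illustration only; not the ARQ implementation.
class ContentDecodingSketch {
    // Value a client could send in the Accept-Encoding request header.
    static final String ACCEPT_ENCODING_VALUE = "gzip, deflate";

    // Wrap the response body according to the Content-Encoding response header.
    static InputStream decode(InputStream body, String contentEncoding) {
        if (contentEncoding == null) {
            return body; // server sent the body uncompressed
        }
        String coding = contentEncoding.trim().toLowerCase(Locale.ROOT);
        try {
            switch (coding) {
                case "gzip":
                    return new GZIPInputStream(body);
                case "deflate":
                    return new InflaterInputStream(body); // assumes zlib wrapper
                default:
                    return body; // identity or unknown: pass through unchanged
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Drain a stream into a byte array.
    static byte[] readAll(InputStream in) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] chunk = new byte[8 * 1024];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                sink.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Simulate a gzip-encoded response body and decode it.
        ByteArrayOutputStream gz = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(gz)) {
            out.write("?s\t?p\t?o\n".getBytes(StandardCharsets.UTF_8));
        }
        InputStream decoded = decode(new ByteArrayInputStream(gz.toByteArray()), "gzip");
        System.out.print(new String(readAll(decoded), StandardCharsets.UTF_8));
    }
}
```

Because the decoded stream is still an `InputStream`, a streaming result parser can consume it without buffering the whole response.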

ARQ isn't being released - just Fuseki so this is less time
critical.

Andy


Rob
