Comments inline:

On Feb 6, 2012, at 2:06 PM, Andy Seaborne wrote:

> On 06/02/12 17:21, Robert Vesse wrote:
>> On a local network it does not seem to be beneficial; it adds a few
>> seconds of overhead.
> 
> I was trying gzip (in Java) today for Fuseki backups - I found similarly that 
> writing to disk was faster without gzip.
> 
> The Java gzip was about "gzip -6" in time and space reduction, achieved with a 
> large enough gzip buffer: I used 8K where the default is 0.5K.

So in my testing I ran a query that dumped approximately 660 thousand triples 
out of the endpoint and compared gzipped vs. non-gzipped times averaged over 
several runs. For reference, the Jetty GzipFilter uses an 8K buffer size.
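
For anyone reproducing the buffer setting, here is a minimal JDK-only sketch: `GZIPOutputStream` takes the buffer size as its second constructor argument (the default is 512 bytes, the 0.5K mentioned above), and the underlying `Deflater` uses its default compression level, which zlib treats as level 6. The class name and the sample TSV data are made up for illustration; this is not the code behind the figures in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipBufferSketch {
    // Buffer size handed to GZIPOutputStream; the JDK default is 512 bytes.
    static final int BUFFER_SIZE = 8 * 1024;

    /** Gzips the UTF-8 bytes of the given text using an 8K buffer. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream gz = new GZIPOutputStream(bytes, BUFFER_SIZE)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Not expected for an in-memory stream, but the API declares it.
            throw new UncheckedIOException(e);
        }
        // Closing the gzip stream above wrote the trailer, so the bytes are complete.
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up TSV results, just to have something repetitive to compress.
        StringBuilder tsv = new StringBuilder("?s\t?p\t?o\n");
        for (int i = 0; i < 10_000; i++) {
            tsv.append("<http://example.org/s").append(i)
               .append(">\t<http://example.org/p>\t\"value ").append(i).append("\"\n");
        }
        byte[] raw = tsv.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(tsv.toString());
        System.out.println(raw.length + " bytes -> " + compressed.length + " bytes gzipped");
    }
}
```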

With the server and client both on the local machine, gzip added around 5s of 
overhead for TSV and 7s for XML.

But when running the server on a remote machine (outside the LAN), gzip gave 
a 2x speedup for TSV and a 5x speedup for XML.
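
One rough way to reason about why gzip can cost time locally yet win over a slower link: if compression, transfer and decompression were fully pipelined, the slowest stage would dominate, so gzip only helps once the network is slower than the compressor, and the best possible speedup is the compression ratio. The sketch below encodes that simplified model. Every rate and the ratio in `main` are hypothetical placeholders, not measurements from these tests, and a real server is not perfectly pipelined.

```java
public class GzipBreakEvenSketch {
    /**
     * Estimated seconds to deliver `bytes` of results with gzip, assuming the
     * compress, transfer and decompress stages are fully pipelined so the
     * slowest stage dominates. All rates are in bytes per second.
     */
    static double gzipSeconds(double bytes, double ratio, double linkRate,
                              double compressRate, double decompressRate) {
        double compress = bytes / compressRate;
        double transfer = (bytes / ratio) / linkRate; // fewer bytes on the wire
        double decompress = bytes / decompressRate;
        return Math.max(compress, Math.max(transfer, decompress));
    }

    /** Estimated seconds to deliver the same results uncompressed. */
    static double plainSeconds(double bytes, double linkRate) {
        return bytes / linkRate;
    }

    public static void main(String[] args) {
        double bytes = 100e6;          // hypothetical 100 MB result set
        double ratio = 5.0;            // hypothetical compression ratio
        double compressRate = 50e6;    // hypothetical gzip throughput (B/s)
        double decompressRate = 200e6; // hypothetical gunzip throughput (B/s)
        // A loopback-like link versus a WAN-like link (both hypothetical).
        for (double link : new double[] {1e9, 10e6}) {
            System.out.printf("link %.0e B/s: plain %.2f s, gzip %.2f s%n",
                    link, plainSeconds(bytes, link),
                    gzipSeconds(bytes, ratio, link, compressRate, decompressRate));
        }
    }
}
```

With these made-up numbers, gzip is slower on the fast link (the compressor is the bottleneck) and about 5x faster on the slow one (capped by the ratio).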

Note: I tried to get figures for JSON as well, but discovered that the SPARQL 
JSON parser appears to be non-streaming, as I hit OOM exceptions. I'll look 
into that and may file a separate JIRA for it.

> 
>> Due to other, more pressing work I haven't had a chance to test the
>> overheads when used over a decidedly non-local network, though I should
>> get a chance to do that later today, at which point I'll submit the
>> patch.
>> 
>> The Fuseki patch I have at the moment enables the filter by default,
>> but due to the way Jetty works the filter only gets applied if the
>> client explicitly states that it accepts GZip-encoded content via
>> the Accept-Encoding header.
> 
> I think that's correct generally, not a Jetty-ism. If the client does not 
> ask for it, it should not happen (the client may not have the ability or 
> desire to uncompress, e.g. scripting languages, curl etc).

Well yes, I just meant in terms of how their filter functions. As a general 
rule, an HTTP server should not send back something the client hasn't asked for.
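
To make the rule concrete, a server-side check might look like the sketch below. It is plain Java, not Jetty's GzipFilter code, and the class and method names are hypothetical. It treats a missing header as "do not compress", the conservative choice discussed here (the HTTP specification itself is more permissive when the header is absent), honours `q=0` as an explicit refusal, and lets an explicit `gzip` entry override a `*` wildcard.

```java
import java.util.Locale;

public class AcceptEncodingSketch {
    /**
     * Returns true if an Accept-Encoding header value allows a gzip response.
     * Simplified: looks only at gzip, x-gzip and the * wildcard, and does not
     * rank alternative codings by q-value.
     */
    static boolean acceptsGzip(String header) {
        // No header: don't compress (a conservative choice).
        if (header == null || header.trim().isEmpty()) {
            return false;
        }
        Double gzipQ = null;     // q-value of an explicit gzip/x-gzip entry, if any
        Double wildcardQ = null; // q-value of a "*" entry, if any
        for (String part : header.split(",")) {
            String[] fields = part.trim().split(";");
            String coding = fields[0].trim().toLowerCase(Locale.ROOT);
            double q = parseQ(fields);
            if (coding.equals("gzip") || coding.equals("x-gzip")) {
                gzipQ = (gzipQ == null) ? q : Math.max(gzipQ, q);
            } else if (coding.equals("*")) {
                wildcardQ = q;
            }
        }
        // An explicit gzip entry wins over the wildcard.
        if (gzipQ != null) {
            return gzipQ > 0.0;
        }
        return wildcardQ != null && wildcardQ > 0.0;
    }

    /** Reads the q parameter; a missing q means 1.0, a malformed one is treated as 0. */
    private static double parseQ(String[] fields) {
        for (int i = 1; i < fields.length; i++) {
            String param = fields[i].trim().toLowerCase(Locale.ROOT);
            if (param.startsWith("q=")) {
                try {
                    return Double.parseDouble(param.substring(2).trim());
                } catch (NumberFormatException e) {
                    return 0.0;
                }
            }
        }
        return 1.0;
    }
}
```

For example, `acceptsGzip("gzip, deflate")` is true, while `acceptsGzip("gzip;q=0, *")` and `acceptsGzip(null)` are false.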

> 
>> Browsers typically send this header, so it may be best to have this
>> feature off by default: if people do prototyping and testing on their
>> local machine with their browser, they may see slower performance
>> because of it.
> 
> Agreed. Off by default.

OK, I will make it off by default. Do you want it enabled by a command line 
parameter, or only programmatically? My preference would be to add a config 
symbol that the code checks to decide whether the feature is enabled, and 
have the command line parameter set that symbol if present.
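
A minimal sketch of that arrangement, assuming a hypothetical symbol name `fuseki:gzip` and a hypothetical `--gzip` flag (neither exists in Fuseki): the command line only sets the symbol, and the code that installs the filter only reads the symbol, so embedding code can enable the feature programmatically in exactly the same way. A real patch would presumably use ARQ's existing Context and Symbol classes rather than a bare map.

```java
import java.util.HashMap;
import java.util.Map;

public class GzipConfigSketch {
    // Hypothetical config symbol name; not an existing Fuseki setting.
    static final String GZIP_SYMBOL = "fuseki:gzip";

    /** Server-wide settings; the filter setup consults this, not the raw args. */
    static final class ServerConfig {
        private final Map<String, Boolean> flags = new HashMap<>();

        void set(String symbol, boolean value) {
            flags.put(symbol, value);
        }

        boolean isTrue(String symbol) {
            return flags.getOrDefault(symbol, false); // unset means off by default
        }
    }

    /** The hypothetical "--gzip" command-line flag simply sets the symbol. */
    static ServerConfig fromArgs(String[] args) {
        ServerConfig config = new ServerConfig();
        for (String arg : args) {
            if (arg.equals("--gzip")) {
                config.set(GZIP_SYMBOL, true);
            }
        }
        return config;
    }

    public static void main(String[] args) {
        ServerConfig config = fromArgs(args);
        // Embedding code could instead call config.set(GZIP_SYMBOL, true) directly.
        System.out.println("gzip filter enabled: " + config.isTrue(GZIP_SYMBOL));
    }
}
```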

> 
>> However, most SPARQL clients in various APIs probably won't include
>> this header by default, so whether this should be enabled by default
>> will probably depend on the performance figures. I'll aim to have the
>> submitted patch make the behavior configurable and leave it up to you
>> whether to enable or disable it by default once you've seen the
>> figures.
>> 
>> Rob
>> 
>> ps. I also have a related patch for ARQ which allows it to ask for
>> GZip- and Deflate-encoded content, though I'll likely package that as
>> part of a more extensive patch for QueryEngineHttp I've been working
>> on, which also adds support for configuring the requested content
>> type.
> 
> ARQ isn't being released, just Fuseki, so this is less time-critical.
> 
>       Andy
> 
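
For the client side mentioned in the ps., the JDK alone is enough to ask for compressed results and undo the encoding. This is a hedged sketch with hypothetical class and method names, not ARQ's QueryEngineHttp API. It decodes according to whatever `Content-Encoding` the server actually returned, since the server is free to ignore the request.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

public class CompressedQueryClientSketch {
    /** Opens a GET request that advertises gzip and deflate support. */
    static InputStream openCompressed(URL url, String acceptType) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", acceptType); // e.g. a SPARQL results type
        conn.setRequestProperty("Accept-Encoding", "gzip, deflate");
        return decode(conn.getInputStream(), conn.getContentEncoding());
    }

    /** Wraps the raw body according to the Content-Encoding the server chose. */
    static InputStream decode(InputStream body, String contentEncoding) throws IOException {
        if (contentEncoding == null || contentEncoding.trim().isEmpty()) {
            return body; // no Content-Encoding: the body was sent as-is
        }
        switch (contentEncoding.trim().toLowerCase(Locale.ROOT)) {
            case "gzip":
            case "x-gzip":
                return new GZIPInputStream(body);
            case "deflate":
                // HTTP "deflate" means zlib-wrapped data, which InflaterInputStream
                // expects by default; raw deflate streams are not handled here.
                return new InflaterInputStream(body);
            case "identity":
                return body;
            default:
                throw new IOException("Unsupported Content-Encoding: " + contentEncoding);
        }
    }
}
```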

Rob
