On 07/02/12 01:05, Robert Vesse wrote:
> Comments inline:
>
> On Feb 6, 2012, at 2:06 PM, Andy Seaborne wrote:
>> On 06/02/12 17:21, Robert Vesse wrote:
>>> On a local network it does not seem to be beneficial, it adds a few seconds of overhead.
>>
>> I was trying gzip (in Java) today for Fuseki backups - I found similarly that writing to disk was faster without gzip. The Java gzip was about "gzip -6" in time and space reduction achieved with a large enough gzip buffer - I used 8K where the default is 0.5K.
>
> So in my testing I ran a query that dumped approx 660k thousand triples out of the endpoint and compared gzip'd vs non-gzip'd times averaged over several runs. For reference, the Jetty GzipFilter uses an 8K buffer size.
>
> With the server and client both on the local machine, gzip added around 5s of overhead for TSV and 7s of overhead for XML.
>
> But when running the server on a remote machine (outside of the LAN), gzip gave a 2x speed-up for TSV and a 5x speed-up for XML.
>
> Note I tried to get figures for JSON as well but discovered that the SPARQL JSON parser appears to be non-streaming, as I hit OOM exceptions, so I'll look into that and maybe file a separate JIRA for that.

That's something that ought to get changed. The JSON parser is itself stream capable with an event model (like SAX).
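
The buffer-size point made earlier in this thread (8K rather than the 0.5K default) can be sketched roughly as follows. This is only an illustration using the standard `java.util.zip` classes, not the Fuseki backup code or the Jetty filter; the class name `GzipBufferSketch` and the constant `BUFFER_SIZE` are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Fuseki or Jetty.
class GzipBufferSketch {
    // 8K, the size mentioned in the thread; the one-argument
    // GZIPOutputStream constructor uses a 512-byte buffer instead.
    static final int BUFFER_SIZE = 8 * 1024;

    // Compress a byte array with an explicitly sized gzip buffer.
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink, BUFFER_SIZE)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Decompress, again with an explicitly sized buffer.
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed), BUFFER_SIZE)) {
            byte[] chunk = new byte[BUFFER_SIZE];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                sink.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }
}
```

Whether the larger buffer actually helps will depend on the underlying stream and the data, so it is worth timing both settings rather than assuming.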

>>> Due to other more pressing work I haven't had chance to test for the overheads when used over a decidedly non-local network, though I should get chance to do that later today, at which point I'll submit the patch.
>>>
>>> The Fuseki patch I have at the moment enables the filter by default, but due to the way Jetty works the filter only gets applied if the client explicitly states that they accept GZip encoded content with the Accept-Encoding header.
>>
>> I think that's correct generally, not a Jetty-ism. If the client does not ask for it, it should not happen (the client may not have the ability or desire to uncompress, e.g. scripting languages, curl etc).
>
> Well yes, I just meant in terms of how their filter functions. As a general rule any HTTP server should not give back something the client hasn't asked for.
>
>>> Browsers typically send this, so it may be that it would be best to have this feature off by default, because if people do prototyping and testing on their local machine with their browser they may see slower performance because of this.
>>
>> Agreed. Off by default.
>
> Ok, I will make it off by default. Do you want it enabled by command line parameter or only programmatically? My preference would be to add a config symbol which the code uses to determine if the feature should be enabled, and have the command line parameter enable that symbol if present.

By "on" and "off" I think we may mean different things. I meant "off" => "it does not apply compression on every request whether asked for or not"; I think you may mean the filter is in the servlet chain. I was assuming Fuseki would have the filter there always, ready to respond if the client sets the HTTP request header appropriately ("Accept-Encoding").
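
The "only compress when asked" behaviour described above could look something like the check below. It is a simplified sketch of Accept-Encoding handling, not how the Jetty GzipFilter or the Fuseki patch actually implements it; the class `AcceptEncodingSketch` and the method `acceptsGzip` are hypothetical names, and the parsing assumes a well-formed comma-separated header value.

```java
import java.util.Locale;

// Hypothetical helper, not taken from Jetty or Fuseki.
class AcceptEncodingSketch {
    // Returns true only if the client has said it will accept gzip,
    // either explicitly or via "*", with a non-zero q value.
    // A missing header means the client did not ask, so we do not compress.
    static boolean acceptsGzip(String headerValue) {
        if (headerValue == null) {
            return false;
        }
        Boolean gzipAllowed = null;   // explicit "gzip" entry, if any
        boolean wildcardAllowed = false;
        for (String entry : headerValue.split(",")) {
            String[] parts = entry.split(";");
            String coding = parts[0].trim().toLowerCase(Locale.ROOT);
            double q = 1.0;           // default weight when no q parameter is given
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim().toLowerCase(Locale.ROOT);
                if (param.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(param.substring(2).trim());
                    } catch (NumberFormatException e) {
                        q = 0.0;      // assumption: treat a malformed weight as "not acceptable"
                    }
                }
            }
            if (coding.equals("gzip")) {
                gzipAllowed = q > 0.0;
            } else if (coding.equals("*")) {
                wildcardAllowed = q > 0.0;
            }
        }
        // An explicit gzip entry takes precedence over the wildcard.
        return gzipAllowed != null ? gzipAllowed : wildcardAllowed;
    }
}
```

With a check like this the filter can always sit in the servlet chain, and a client such as curl that sends no Accept-Encoding header simply gets an uncompressed response.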
I see your JIRA+patch ... great.
Andy

>>> However, most SPARQL clients in various APIs probably won't include this header by default - whether this wants to be enabled by default will probably depend on the performance figures.
>>>
>>> I'll aim to have the submitted patch make the behavior configurable and leave it up to you whether you want to have it enabled/disabled by default once you've seen the figures.
>>>
>>> Rob
>>>
>>> ps. I also have a related patch for ARQ which allows it to ask for GZip and Deflate encoded content, though I'll likely package that as part of a more extensive patch for QueryEngineHttp I've been working on which also adds in support for configuring requested content type.
>>
>> ARQ isn't being released - just Fuseki so this is less time critical.
>>
>> Andy
>
> Rob
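
On the client side, asking for GZip and Deflate content means sending an Accept-Encoding header and then unwrapping the response according to its Content-Encoding. The sketch below shows one way that unwrapping might be done with the standard `java.util.zip` classes. It is not the ARQ or QueryEngineHttp patch mentioned above; `ContentEncodingSketch`, `ACCEPT_ENCODING_VALUE` and `decode` are hypothetical names, and it assumes "deflate" responses use the zlib wrapper that `InflaterInputStream` expects.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, not part of ARQ.
class ContentEncodingSketch {
    // Value a client could send in its Accept-Encoding request header.
    static final String ACCEPT_ENCODING_VALUE = "gzip, deflate";

    // Wrap the response body according to the Content-Encoding response header.
    // Unknown or absent encodings are passed through untouched.
    static InputStream decode(InputStream body, String contentEncoding) {
        if (contentEncoding == null) {
            return body;
        }
        String coding = contentEncoding.trim().toLowerCase(Locale.ROOT);
        try {
            switch (coding) {
                case "gzip":
                    return new GZIPInputStream(body);
                case "deflate":
                    // Assumes zlib-wrapped data, as described for HTTP "deflate".
                    return new InflaterInputStream(body);
                default:
                    return body;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the returned stream is still an `InputStream`, a streaming result parser can read from it without knowing whether the server compressed the response.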
