Andy Seaborne wrote:
On 03/03/11 18:09, Paolo Castagna wrote:
Hi,
we have noticed high CPU usage when a client sends a SPARQL query
which returns a large result set and the client goes away while we
are streaming back the response.
I think I might have found the reason why the high CPU usage persists
even after the client has gone away.
We use ResultSetFormatter.outputAsXML(OutputStream outStream,
ResultSet qresults), which in turn uses IndentedWriter:
java.io.IOException.<init>(String)
simple.http.MonitoredOutputStream.ensureOpen()
simple.http.MonitoredOutputStream.write(byte[], int, int)
simple.http.ResponseStream.write(byte[], int, int)
java.io.PrintWriter.write(int)
org.openjena.atlas.io.IndentedWriter.write(char)
org.openjena.atlas.io.IndentedWriter.padInt()
org.openjena.atlas.io.IndentedWriter.lineStart()
org.openjena.atlas.io.IndentedWriter.printOneChar(char)
org.openjena.atlas.io.IndentedWriter.print(Object)
org.openjena.atlas.io.IndentedWriter.println(Object)
com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printResource(Resource)
com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printBindingValue(RDFNode)
com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.binding(String, RDFNode)
com.hp.hpl.jena.sparql.resultset.ResultSetApply.apply()
com.hp.hpl.jena.sparql.resultset.XMLOutput.format(OutputStream, ResultSet)
com.hp.hpl.jena.query.ResultSetFormatter.outputAsXML(OutputStream, ResultSet, String)
com.hp.hpl.jena.query.ResultSetFormatter.outputAsXML(OutputStream, ResultSet)
If an IOException is generated, for example, because the client which
we were streaming the response to went away, the exception is swallowed
by the write method in IndentedWriter:
private void write(char ch)
{ try { out.write(ch) ; } catch (IOException ex) {} }
The problem does not depend on the particular web framework used.
The CPU usage is high because, while the underlying stream exists, the
I/O acts as a brake on the output. Remove the I/O and there is no brake,
and it's taking a lot of exceptions at a point where normally very
little is done. A CPU loop.
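
To illustrate the loop described above, here is a minimal sketch. It is not the ARQ code: SwallowDemo, DeadWriter and failuresFor are hypothetical names, and DeadWriter merely stands in for a connection whose client has gone away. The point is only that a write method with an empty catch block lets the caller keep producing output, paying for one exception per character.

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical demo class, not part of Jena/ARQ.
class SwallowDemo {
    /** Stand-in for a dead connection: every write fails with an IOException. */
    static class DeadWriter extends Writer {
        int failures = 0;

        @Override
        public void write(char[] buf, int off, int len) throws IOException {
            failures++;
            throw new IOException("client went away");
        }

        @Override public void flush() { }
        @Override public void close() { }
    }

    /** Same shape as the write method quoted in the thread: the exception is discarded. */
    static void write(Writer out, char ch) {
        try { out.write(ch); } catch (IOException ex) { }
    }

    /** Writes the given number of characters and reports how many writes failed. */
    static int failuresFor(int chars) {
        DeadWriter dead = new DeadWriter();
        for (int i = 0; i < chars; i++) {
            write(dead, 'x');   // the caller never learns that the stream is dead
        }
        return dead.failures;
    }

    public static void main(String[] args) {
        System.out.println("failed writes: " + failuresFor(100000));
    }
}
```

Every character costs an exception, and nothing tells the formatter to stop, so the work continues until the result set is exhausted.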
It could throw an (ARQ type) exception. i.e. a RuntimeException. The
stack of checked IOExceptions is fine in theory but I find checked
exceptions don't really work out in practice for error handling. OK,
and sensible, for alternative return modelling, but then the catcher is
the immediate caller. For errors, the catcher might be many levels up
the stack and so every method merely adds "throws IOException" which
devalues the whole point of the declaration.
Related: IndentedWriter uses a PrintWriter to wrap the OutputStream.
There is (for UTF-8 only) a hopefully optimized BufferingWriter in
Atlas. It might be faster and in a way that shows up in system
performance.
Both changes made.
If you could, could you try the SVN dev version? It will throw a
runtime exception, wrapping the IOException.
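
A rough sketch of what wrapping the IOException in a runtime exception could look like, for anyone following along. This is not the code in SVN: WrapDemo, WriterIOException, ClosedStream and SketchWriter are hypothetical names, and the flush after each character is there only so the failure surfaces immediately in this small example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Jena/ARQ.
class WrapDemo {
    /** Hypothetical unchecked exception carrying the original IOException as its cause. */
    static class WriterIOException extends RuntimeException {
        WriterIOException(IOException cause) { super(cause); }
    }

    /** Stand-in for a stream whose client has gone away: every write fails. */
    static class ClosedStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("stream closed");
        }
    }

    /** Writer wrapper that rethrows IOException as an unchecked exception. */
    static class SketchWriter {
        private final Writer out;

        SketchWriter(Writer out) { this.out = out; }

        void write(char ch) {
            try {
                out.write(ch);
                out.flush();    // demo only: push the char through so the failure shows at once
            } catch (IOException ex) {
                throw new WriterIOException(ex);
            }
        }
    }

    /** Returns how many characters were written before the failure stopped the loop. */
    static int writtenBeforeFailure(int chars) {
        SketchWriter w = new SketchWriter(
                new OutputStreamWriter(new ClosedStream(), StandardCharsets.UTF_8));
        int done = 0;
        try {
            for (int i = 0; i < chars; i++) {
                w.write('x');
                done++;
            }
        } catch (WriterIOException ex) {
            // The catcher can be many levels up the stack; here it simply ends the loop.
        }
        return done;
    }

    public static void main(String[] args) {
        System.out.println("written before failure: " + writtenBeforeFailure(100000));
    }
}
```

Because the exception is unchecked, none of the intermediate methods need a throws clause, and the first failed write ends the output loop instead of being repeated for every remaining character.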
Problem fixed.
Query, client goes away while we are streaming back results, CPU comes
back to idle immediately.
Thanks,
Paolo
I think Fuseki is affected by the same problem, but I have not had time
to replicate it with Fuseki yet.
It will be as it just uses ResultSetFormatter.
Am I missing something here?
No - good to fix this.
Andy
Thank you,
Paolo