On 17/02/12 17:17, Robert Vesse wrote:
Generally my recommendation, and what we do internally, is to measure
the time from when we make the execSelect() call to when we finish
iterating over the results. I would recommend not doing anything in
the iteration other than incrementing a count; otherwise you may
skew your figures, as what you do with each result may be far more
computationally costly than just iterating over them.

ResultSetFormatter.consume will do what you need to do for timing.

/** This operation faithfully walks the results but does nothing with them.
*  @return The count of the number of solutions.
*/

It not only iterates over the rows, but it also touches every variable in the results. See the code for details.

In TDB this matters:

SELECT (count(*) AS ?c) { ?s ?p ?o }

does not touch the nodes, just the internal ids, without fetching the representation of the URIs etc. TDB returns a lazily evaluated result row; that query does not need the bytes for ?s etc., which makes count(*) fast.
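
To make the difference concrete, here is a toy model in plain Java. It is not TDB's code: LazyRow, the node table lookup, and the example.org URIs are all made up for illustration. It only shows why counting rows can be much cheaper than touching every variable when the row holds ids and fetches the term on demand.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongFunction;

/** Toy model of lazily evaluated result rows; hypothetical, not TDB's classes. */
final class LazyRowSketch {

    /** A row that holds only internal ids; the term text is fetched on demand. */
    static final class LazyRow {
        private final long[] ids;
        private final LongFunction<String> nodeTable;

        LazyRow(long[] ids, LongFunction<String> nodeTable) {
            this.ids = ids;
            this.nodeTable = nodeTable;
        }

        int width() { return ids.length; }

        /** Materialises one column by looking its id up in the node table. */
        String get(int column) { return nodeTable.apply(ids[column]); }
    }

    /** Counts rows without touching any column, as count(*) can. */
    static long countOnly(Iterator<LazyRow> rows) {
        long n = 0;
        while (rows.hasNext()) {
            rows.next();
            n++;
        }
        return n;
    }

    /** Counts rows and touches every column, forcing every lookup. */
    static long countAndTouch(Iterator<LazyRow> rows) {
        long n = 0;
        while (rows.hasNext()) {
            LazyRow row = rows.next();
            for (int i = 0; i < row.width(); i++) {
                row.get(i);
            }
            n++;
        }
        return n;
    }

    /** Builds three-column rows over a node table that counts its lookups. */
    static List<LazyRow> sampleRows(int size, AtomicInteger lookups) {
        LongFunction<String> nodeTable = id -> {
            lookups.incrementAndGet();               // stands in for the expensive fetch
            return "http://example.org/node/" + id;  // made-up URI
        };
        List<LazyRow> rows = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            rows.add(new LazyRow(new long[] {i, i + 1, i + 2}, nodeTable));
        }
        return rows;
    }
}
```

In this model countOnly never calls the node table, while countAndTouch calls it once per column per row, so a timing based on the first would understate the cost of a client that actually reads the values.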

We have a benchmarking tool that we use internally and we distinguish
these two things as response time and runtime, the former being the
time for the first result to be received and the latter being the
time for all results to be received.  Often the two figures can be
massively different, especially with queries that generate very
large results.

That's also useful - the first row can be more expensive than the rest. This is not an execSelect thing - the first hasNext() can trigger anything from a little work to most of the query, depending on the query, with ORDER BY and GROUP BY being extreme cases.
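
A rough sketch of measuring both figures, in plain Java so it runs without Jena. QueryTimingSketch, Timing and time() are hypothetical names, not part of any Jena API. The supplier is called inside the timed region so the clock starts at the execSelect()-style call, and the loop does nothing but count.

```java
import java.util.Iterator;
import java.util.function.Supplier;

/** Hypothetical timing helper; works on any Iterator, not tied to Jena. */
final class QueryTimingSketch {

    /** Row count, time to the first row, and time to the last row. */
    static final class Timing {
        final long rows;
        final long responseNanos;  // start until the first row is received
        final long runtimeNanos;   // start until the iterator is exhausted

        Timing(long rows, long responseNanos, long runtimeNanos) {
            this.rows = rows;
            this.responseNanos = responseNanos;
            this.runtimeNanos = runtimeNanos;
        }
    }

    /** Times the call that produces the results plus the full iteration. */
    static <T> Timing time(Supplier<Iterator<T>> exec) {
        long start = System.nanoTime();
        Iterator<T> results = exec.get();
        long firstRowAt = -1;  // stays -1 when there are no rows
        long rows = 0;
        // The first hasNext() may do most of the query's work.
        while (results.hasNext()) {
            results.next();
            if (rows == 0) {
                firstRowAt = System.nanoTime();
            }
            rows++;
        }
        long end = System.nanoTime();
        // With no rows, report the response time as equal to the runtime.
        long response = (rows == 0) ? end - start : firstRowAt - start;
        return new Timing(rows, response, end - start);
    }
}
```

Since a Jena ResultSet is an Iterator of QuerySolution, something like time(() -> qexec.execSelect()) should fit, with qexec being your own QueryExecution. Note this only walks the rows; it does not touch each variable the way ResultSetFormatter.consume does.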

        Andy
