Re: timing out big sorts

Andy Seaborne Mon, 18 Jul 2016 13:06:50 -0700

On 18/07/16 15:43, Chris Dollin wrote:

On 18/07/16 14:50, Andy Seaborne wrote:

So obvious question - what testing has been done? And was it in a live
scenario?


Not if "live" means "in a production version".

I constructed a big(ish) boring dataset

for s in {1..10000}; do
  for p in {1..10}; do
    for o in {27..42}; do
      echo "<http://chris.com/loc/$s> <http://chris.com/prop/$p> $o ."
    done
  done
done > mega.ttl

and loaded it into a Fuseki instance serving its default graph.
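As a rough size check on that dataset: each subject gets 10 properties times 16 object values (27 through 42), so 10000 subjects should give 1,600,000 triples. The sketch below reproduces the same nesting for a handful of subjects and counts the lines. The function name gen_triples is hypothetical, and the plain <...> IRI form assumes the stray ";" in the archived loop is an encoding artifact.

```shell
# Hypothetical helper (not from the original mail): emit the same triple
# pattern for the first N subjects, using POSIX loops so it runs under sh.
gen_triples() {
  s=1
  while [ "$s" -le "$1" ]; do
    p=1
    while [ "$p" -le 10 ]; do
      o=27
      while [ "$o" -le 42 ]; do
        echo "<http://chris.com/loc/$s> <http://chris.com/prop/$p> $o ."
        o=$((o + 1))
      done
      p=$((p + 1))
    done
    s=$((s + 1))
  done
}

# Each subject contributes 10 properties x 16 objects = 160 triples.
sample_count=$(gen_triples 5 | wc -l)
echo "triples for 5 subjects: $((sample_count))"
echo "expected for 10000 subjects: $((10000 * 10 * 16))"
```

Running `wc -l < mega.ttl` on the full file should then report the larger figure, assuming one triple per line.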

I instrumented the code of the comparator so that it announced itself
on System.err every few thousand calls to compare -- I adjusted the
number so that the entire query generated 10-20 messages.

I ran the query with

   s-query --service http://localhost:3030/ds/query 'SELECT * {?s ?p ?o} ORDER BY ?o'

and a simple config file with no timeout to size up the messages.

Then I ran the query against Fuseki with different timeouts (i.e.,
restarting Fuseki for each revised timeout setting in the
configuration file). When the timeout fires (observed as a log
message on the console), there is at most one additional message
from the comparator, and usually exactly one, indicating that the
sort has been abandoned. Shorten the timeout enough and the
additional comparator messages no longer appear.

What about spill to disk, and cancellation during the second or a later spill? (Did the files get cleared?)


There are a reasonable number of cases here.

It would be nice if the code followed the formatting conventions the
project now tends to follow.


Oops! I thought I'd fixed those. Which conventions am I flouting?
I will sort them out in a bit.

There are some development code items in the diff that need removing.


I will do that in the pull request.

Chris
