Thanks Andy,
Having updated to ARQ 2.8.8, I now get the expected behaviour. The timings
below suggest that using BINDINGS with a large batch size is the way to go
(I'm reluctant to remove the batching entirely because occasionally I need
to select tens of thousands of resources, though I will experiment to see
what happens with really large queries).
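For anyone following along, here is a minimal sketch of the batching idea. It is not the actual benchmark code. The class name, the method name and the `?r ?p ?o` pattern are made up for illustration, and it assumes the trailing `BINDINGS ?var { (...) }` form from the SPARQL 1.1 working draft that ARQ accepted at the time. It only builds the query strings, one per batch; executing them is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a list of resource URIs into batches and
// builds one SELECT query per batch, constraining ?r via a BINDINGS clause.
public class BatchedBindings {

    // The triple pattern is an assumption; substitute the real pattern.
    private static final String SELECT_PART =
            "SELECT ?r ?p ?o WHERE { ?r ?p ?o } BINDINGS ?r {";

    static List<String> buildQueries(List<String> uris, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int start = 0; start < uris.size(); start += batchSize) {
            int end = Math.min(start + batchSize, uris.size());
            StringBuilder sb = new StringBuilder(SELECT_PART);
            // One single-column row per resource in this batch.
            for (String uri : uris.subList(start, end)) {
                sb.append(" (<").append(uri).append(">)");
            }
            sb.append(" }");
            queries.add(sb.toString());
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String> uris = Arrays.asList(
                "http://example.org/a",
                "http://example.org/b",
                "http://example.org/c");
        // With a batch size of 2 this yields two queries.
        for (String q : buildQueries(uris, 2)) {
            System.out.println(q);
        }
    }
}
```

Keeping the batch size as a parameter means "no batching" is just a batch size at least as large as the selection, so the large-query experiment needs no separate code path.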
Also, increasing the query size and the batch size to 5000 (!) causes a
stack overflow using the FILTER strategy, but not with BINDINGS (see stack
trace below).
Cheers,
David.
+++
Resources = 25000
Triples = 100040
Doing 1 runs, selecting 500 resources...
[500] Bindings (batches of 50): 375
[500] Bindings (batches of 100): 47
[500] Bindings (batches of 500): 47
[25000] Query all: 422
[500] Naive: 296
[500] Union (no batching): 204
[500] Filter (batches of 50): 46
[500] Filter (batches of 100): 47
[500] Filter (batches of 500): 94
Finished.
++++
Loading data...OK
Resources = 25000
Triples = 100040
Doing 1 runs, selecting 5000 resources...
[5000] Bindings (batches of 50): 688
[5000] Bindings (batches of 100): 250
[5000] Bindings (batches of 5000): 219
[25000] Query all: 421
[5000] Naive: 1766
Exception in thread "main" java.lang.StackOverflowError
    at com.hp.hpl.jena.sparql.algebra.OpVisitorByType.visit(OpVisitorByType.java:65)
    at com.hp.hpl.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:32)
    at com.hp.hpl.jena.sparql.algebra.OpWalker$WalkerVisitor.visit2(OpWalker.java:93)
    at com.hp.hpl.jena.sparql.algebra.OpVisitorByType.visit(OpVisitorByType.java:65)
    at com.hp.hpl.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:32)
    at com.hp.hpl.jena.sparql.algebra.OpWalker$WalkerVisitor.visit2(OpWalker.java:93)
    at com.hp.hpl.jena.sparql.algebra.OpVisitorByType.visit(OpVisitorByType.java:65)
    at com.hp.hpl.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:32)
    at com.hp.hpl.jena.sparql.algebra.OpWalker$WalkerVisitor.visit2(OpWalker.java:93)
    at com.hp.hpl.jena.sparql.algebra.OpVisitorByType.visit(OpVisitorByType.java:65)
    at com.hp.hpl.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:32)
    at com.hp.hpl.jena.sparql.algebra.OpWalker$WalkerVisitor.visit2(OpWalker.java:93)
    at com.hp.hpl.jena.sparql.algebra.OpVisitorByType.visit(OpVisitorByType.java:65)
    at com.hp.hpl.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:32)
    at com.hp.hpl.jena.sparql.algebra.OpWalker$WalkerVisitor.visit2(OpWalker.java:93)
    at com.hp.hpl.jena.sparql.algebra.OpVisitorByType.visit(OpVisitorByType.java:65)
    at com.hp.hpl.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:32)
    at com.hp.hpl.jena.sparql.algebra.OpWalker$WalkerVisitor.visit2(OpWalker.java:93)
    at com.hp.hpl.jena.sparql.algebra.OpVisitorByType.visit(OpVisitorByType.java:65)
    at com.hp.hpl.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:32)
On 17 October 2011 12:30, Andy Seaborne <[email protected]> wrote:
> David,
>
> Thanks for the example.
>
> I got
>
> OntModel:
>
> [500] Bindings (batches of 50): 668
> [25000] Query all: 437
> [500] Naive: 422
> [500] Union (no batching): 179
> [500] Filter (batches of 50): 57
>
> and changing to a default Model:
>
> [500] Bindings (batches of 50): 351
> [25000] Query all: 209
> [500] Naive: 460
> [500] Union (no batching): 192
> [500] Filter (batches of 50): 70
>
> The OntModel results were more variable, the plain default model results
> were more stable.
>
> IN is optimized (not sure which versions ...)
>
> BINDINGS are (lightly) optimized: ARQ (nowadays) tries to turn them into a
> sequence of more grounded queries, but that optimization wasn't in earlier
> versions of ARQ with BINDINGS.
>
> qparse --print=opt --query=Q.rq will print out what the high-level
> optimizer is doing.
>
>