Re: Performance of SPARQL filter with IN() function

David Allsopp Tue, 18 Oct 2011 03:33:01 -0700

Thanks Andy,

Having updated to ARQ 2.8.8, I now get the expected behaviour. The timings
below suggest that using BINDINGS with a large batch size is the way to go
(I'm reluctant to remove the batching entirely because occasionally I need
to select tens of thousands of resources, though I will experiment to see
what happens with really large queries).
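
For readers who want to try the same approach, here is a minimal sketch of the batching idea, assuming the resources are plain URI strings and a simple ?r ?p ?o pattern (the benchmark's real queries are not shown in this thread, so the pattern, the class name and the method names are all hypothetical). It only builds the query strings; running them through ARQ is left out. The BINDINGS keyword is the SPARQL 1.1 draft syntax used here; the final SPARQL 1.1 recommendation renamed it VALUES.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedBindings {

    // Split the resource URIs into consecutive batches of at most batchSize.
    static List<List<String>> batches(List<String> uris, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < uris.size(); i += batchSize) {
            int end = Math.min(i + batchSize, uris.size());
            out.add(new ArrayList<>(uris.subList(i, end)));
        }
        return out;
    }

    // Build one query for a batch, with one BINDINGS row per resource.
    // The ?r ?p ?o pattern is an assumption made for illustration.
    static String bindingsQuery(List<String> batch) {
        StringBuilder sb = new StringBuilder();
        sb.append("SELECT ?r ?p ?o WHERE { ?r ?p ?o }\n");
        sb.append("BINDINGS ?r {\n");
        for (String uri : batch) {
            sb.append("  (<").append(uri).append(">)\n");
        }
        sb.append("}");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> uris = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            uris.add("http://example.org/resource/" + i); // hypothetical URIs
        }
        // One query string per batch of 2 resources.
        for (List<String> batch : batches(uris, 2)) {
            System.out.println(bindingsQuery(batch));
            System.out.println();
        }
    }
}
```

Keeping the batch size as a parameter makes it easy to compare, say, 50 against 500 without touching the query-building code.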


Also, increasing the query size and the batch size to 5000 (!) causes a
stack overflow using the FILTER strategy, but not with BINDINGS (see stack
trace below).
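
For comparison, a FILTER-based query for one batch might be built as in the sketch below. The query pattern and all names are again assumptions for illustration, not the code behind the timings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class FilterInQuery {

    // Build a query that restricts ?r to the batch via FILTER ... IN (...).
    // The ?r ?p ?o pattern is an assumption made for illustration.
    static String filterInQuery(List<String> batch) {
        StringJoiner in = new StringJoiner(", ", "FILTER (?r IN (", "))");
        for (String uri : batch) {
            in.add("<" + uri + ">");
        }
        return "SELECT ?r ?p ?o WHERE { ?r ?p ?o . " + in + " }";
    }

    public static void main(String[] args) {
        // Hypothetical URIs, just to show the shape of the generated query.
        List<String> batch = Arrays.asList(
                "http://example.org/resource/a",
                "http://example.org/resource/b");
        System.out.println(filterInQuery(batch));
    }
}
```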

Cheers,

David.

+++

Resources = 25000
Triples = 100040
Doing 1 runs, selecting 500 resources...

[500] Bindings (batches of 50): 375
[500] Bindings (batches of 100): 47
[500] Bindings (batches of 500): 47
[25000] Query all: 422
[500] Naive: 296
[500] Union (no batching): 204
[500] Filter (batches of 50): 46
[500] Filter (batches of 100): 47
[500] Filter (batches of 500): 94

Finished.


++++

Loading data...OK
Resources = 25000
Triples = 100040
Doing 1 runs, selecting 5000 resources...

[5000] Bindings (batches of 50): 688
[5000] Bindings (batches of 100): 250
[5000] Bindings (batches of 5000): 219
[25000] Query all: 421
[5000] Naive: 1766
Exception in thread "main" java.lang.StackOverflowError
    at com.hp.hpl.jena.sparql.algebra.OpVisitorByType.visit(OpVisitorByType.java:65)
    at com.hp.hpl.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:32)
    at com.hp.hpl.jena.sparql.algebra.OpWalker$WalkerVisitor.visit2(OpWalker.java:93)
    at com.hp.hpl.jena.sparql.algebra.OpVisitorByType.visit(OpVisitorByType.java:65)
    at com.hp.hpl.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:32)
    at com.hp.hpl.jena.sparql.algebra.OpWalker$WalkerVisitor.visit2(OpWalker.java:93)
    at com.hp.hpl.jena.sparql.algebra.OpVisitorByType.visit(OpVisitorByType.java:65)
    at com.hp.hpl.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:32)
    at com.hp.hpl.jena.sparql.algebra.OpWalker$WalkerVisitor.visit2(OpWalker.java:93)
    at com.hp.hpl.jena.sparql.algebra.OpVisitorByType.visit(OpVisitorByType.java:65)
    at com.hp.hpl.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:32)
    at com.hp.hpl.jena.sparql.algebra.OpWalker$WalkerVisitor.visit2(OpWalker.java:93)
    at com.hp.hpl.jena.sparql.algebra.OpVisitorByType.visit(OpVisitorByType.java:65)
    at com.hp.hpl.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:32)
    at com.hp.hpl.jena.sparql.algebra.OpWalker$WalkerVisitor.visit2(OpWalker.java:93)
    at com.hp.hpl.jena.sparql.algebra.OpVisitorByType.visit(OpVisitorByType.java:65)
    at com.hp.hpl.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:32)
    at com.hp.hpl.jena.sparql.algebra.OpWalker$WalkerVisitor.visit2(OpWalker.java:93)
    at com.hp.hpl.jena.sparql.algebra.OpVisitorByType.visit(OpVisitorByType.java:65)
    at com.hp.hpl.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:32)

On 17 October 2011 12:30, Andy Seaborne <[email protected]> wrote:

> David,
>
> Thanks for the example.
>
> I got
>
> OntModel:
>
> [500] Bindings (batches of 50): 668
> [25000] Query all: 437
> [500] Naive: 422
> [500] Union (no batching): 179
> [500] Filter (batches of 50): 57
>
> and changing to a default Model:
>
> [500] Bindings (batches of 50): 351
> [25000] Query all: 209
> [500] Naive: 460
> [500] Union (no batching): 192
> [500] Filter (batches of 50): 70
>
> The OntModel results were more variable; the plain default model results
> were more stable.
>
> IN is optimized (not sure which versions ...)
>
> BINDINGS are (lightly) optimized: ARQ nowadays tries to turn them into a
> sequence of more grounded queries, but that optimization wasn't in earlier
> versions of ARQ with BINDINGS.
>
> qparse --print=opt --query=Q.rq will print out what the high-level
> optimizer is doing.
>
>
