Optimise DISTINCT + ORDER BY + LIMIT queries
--------------------------------------------
Key: JENA-108
URL: https://issues.apache.org/jira/browse/JENA-108
Project: Jena
Issue Type: Improvement
Components: ARQ
Reporter: Paolo Castagna
Assignee: Paolo Castagna
Priority: Minor
This SPARQL query:
SELECT * { ... } ORDER BY ... LIMIT 10
has this algebra:
(slice _ 10
(order (...)
(bgp (...))))
which is now transformed into (see JENA-89):
(top (10 ...)
(bgp (...)))
This query:
SELECT DISTINCT * { ... } ORDER BY ...
has this algebra:
(distinct
(order (...)
(bgp (...))))
which is now transformed into (see JENA-90):
(reduced
(order (...)
(bgp (...))))
What about:
SELECT DISTINCT * { ... } ORDER BY ... LIMIT 10 ?
The algebra expression is:
(slice _ 10
(distinct
(order (?p ?o)
(bgp (triple ?s ?p ?o)))))
which is (currently) transformed into:
(slice _ 10
(reduced
(order (?p ?o)
(bgp (triple ?s ?p ?o)))))
The optimization in JENA-90 kicks in, but it would be better to use the top
operator for small limits.
However, it must be a slightly different top implementation (which accepts only
distinct bindings).
As Andy suggested in a comment on JENA-90 (12 August 2011), we can transform
the algebra into:
(top (10 ...)
(distinct
(bgp (...))))
and then use a QueryIterTopNDistinct when OpExecutor(OpTop) sees an
opTop.getSubOp() instanceof OpDistinct.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira