Optimise DISTINCT + ORDER BY + LIMIT queries
--------------------------------------------

                 Key: JENA-108
                 URL: https://issues.apache.org/jira/browse/JENA-108
             Project: Jena
          Issue Type: Improvement
          Components: ARQ
            Reporter: Paolo Castagna
            Assignee: Paolo Castagna
            Priority: Minor


This SPARQL query: 
SELECT * { ... } ORDER BY ... LIMIT 10
has this algebra:
(slice _ 10
  (order (...)
    (bgp (...))))
which is now transformed into (see JENA-89):
(top (10 ...)
  (bgp (...)))

This query: 
SELECT DISTINCT * { ... } ORDER BY ... 
has this algebra:
(distinct
  (order (...)
    (bgp (...))))
which is now transformed into (see JENA-90):
(reduced
  (order (...)
    (bgp (...))))

What about: 
SELECT DISTINCT * { ... } ORDER BY ... LIMIT 10 ?
The algebra expression is:
(slice _ 10
  (distinct
    (order (?p ?o)
      (bgp (triple ?s ?p ?o)))))
which is (currently) transformed into:
(slice _ 10
  (reduced
    (order (?p ?o)
      (bgp (triple ?s ?p ?o)))))

The optimization in JENA-90 kicks in, but it would be better to use the top 
operator for small limits. 
However, it must be a slightly different top implementation (which accepts only 
distinct bindings). 

As Andy suggested in a comment on JENA-90 (12 August 2011), we can transform 
the algebra into:
(top (10 ...)
  (distinct
    (bgp (...))))
and then use a QueryIterTopNDistinct when OpExecutor(OpTop) sees an 
opTop.getSubOp() instanceof OpDistinct.




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to