[jira] [Commented] (JENA-90) Use OpReduce instead of OpDistinct for DISTINCT + ORDER BY queries

Paolo Castagna (JIRA) Thu, 11 Aug 2011 06:11:56 -0700

    [ 
https://issues.apache.org/jira/browse/JENA-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083102#comment-13083102
 ]


Paolo Castagna commented on JENA-90:
------------------------------------

Let's take, for example, this SPARQL query:

SELECT DISTINCT  *
WHERE
  { ?s ?p ?o }
ORDER BY ?p
LIMIT   10

The corresponding algebra expression is:

(slice _ 10
  (distinct
    (order (?p)
      (bgp (triple ?s ?p ?o)))))

Which is equivalent to:

(slice _ 10
  (reduced
    (order (?p)
      (bgp (triple ?s ?p ?o)))))
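
To illustrate why 'reduced' is cheaper than 'distinct' here, this is a minimal
sketch of adjacent-duplicate removal as the issue description puts it. It is
not ARQ's code: the class and method names are hypothetical, and plain Java
objects stand in for bindings. It only remembers the previous item, so it
removes every duplicate only if equal items arrive next to each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration, not part of ARQ.
class AdjacentDedupSketch {

    /** Drops an item when it equals the item immediately before it. */
    static <T> List<T> dropAdjacentDuplicates(Iterable<T> input) {
        List<T> out = new ArrayList<>();
        boolean first = true;
        T previous = null;
        for (T item : input) {
            // Only the previous item is remembered: O(1) extra state,
            // as opposed to a set of every item seen so far.
            if (first || !Objects.equals(item, previous)) {
                out.add(item);
            }
            previous = item;
            first = false;
        }
        return out;
    }
}
```

Duplicates that are not adjacent survive, which is why the equivalence depends
on the sort placing equal bindings next to each other.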

However, the distinct or reduced operators prevent the optimization described 
in JENA-89 from being applied. Maybe we can modify the 'top' operator to yield 
only distinct bindings, or add a new 'top_distinct' operator for that:

(top_distinct (10 ?p ?s)
  (bgp (triple ?s ?p ?o))) 

SPARQL queries of the type SELECT DISTINCT ... WHERE {...} ORDER BY ...  LIMIT 
10 are common when people want to display the 10 most 'something' things in 
their dataset.

The implementation of a QueryIterTopNDistinct is almost the same as 
QueryIterTopN (see: JENA-89) but we add bindings to the PriorityQueue if and 
only if they are not already there (using .contains() to check).
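
A rough sketch of that idea, under stated assumptions: this is not the
QueryIterTopN code from JENA-89, the class and method names are hypothetical,
and plain Java objects with a Comparator stand in for bindings and sort
conditions. It keeps at most n items in a PriorityQueue whose head is the worst
retained item, and skips any item the queue already contains.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration, not part of ARQ.
class TopNDistinctSketch {

    /** Returns the n smallest distinct items of input, sorted by cmp. */
    static <T> List<T> topNDistinct(Iterable<T> input, int n, Comparator<T> cmp) {
        List<T> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Reversed comparator: the head of the queue is the worst item kept.
        PriorityQueue<T> heap = new PriorityQueue<>(n, cmp.reversed());
        for (T item : input) {
            if (heap.contains(item)) {
                continue; // already kept; contains() is a linear scan using equals()
            }
            if (heap.size() < n) {
                heap.add(item);
            } else if (cmp.compare(item, heap.peek()) < 0) {
                heap.poll();    // evict the current worst item
                heap.add(item);
            }
        }
        result.addAll(heap);
        result.sort(cmp);
        return result;
    }
}
```

An item that was evicted earlier cannot get back in later, because the worst
retained item only ever improves, so checking the queue alone is enough to
keep the output distinct. The cost is that contains() scans up to n items per
input item, which is acceptable for a small LIMIT.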

Is it worth adding a top_distinct operator, or would it just pollute the 
algebra?

> Use OpReduce instead of OpDistinct for DISTINCT + ORDER BY queries
> ------------------------------------------------------------------
>
>                 Key: JENA-90
>                 URL: https://issues.apache.org/jira/browse/JENA-90
>             Project: Jena
>          Issue Type: Improvement
>          Components: ARQ
>            Reporter: Paolo Castagna
>            Priority: Trivial
>              Labels: arq, optimizer, sparql
>
> ARQ's optimizer could use an OpReduce instead of OpDistinct if a query is 
> DISTINCT + ORDER BY.
> OpReduce removes adjacent duplicates and it does not require a set of already 
> seen bindings as the current OpDistinct implementation does.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
