[jira] [Commented] (JENA-90) Use OpReduce instead of OpDistinct for DISTINCT + ORDER BY queries

Stephen Allen (JIRA) Fri, 26 Aug 2011 15:06:01 -0700

    [ 
https://issues.apache.org/jira/browse/JENA-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092086#comment-13092086
 ]


Stephen Allen commented on JENA-90:
-----------------------------------

Hi Paolo,

I think the approach you want is to use QueryIterReduced instead of the new 
QueryIterDistinctSort class you propose (also, an important note: [1]).  
QueryIterReduced could perhaps be optimized slightly for this particular case 
of a sorted input by replacing the general-purpose window array with a single 
variable.
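
To make the single-variable idea concrete, here is a minimal sketch.  It is 
not QueryIterReduced itself: AdjacentDedupIterator is a hypothetical name, it 
works on a plain java.util.Iterator rather than a QueryIterator, and it assumes 
the input is ordered so that equal items are always adjacent.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Objects;

/** Hypothetical illustration only; this is not Jena's QueryIterReduced. */
final class AdjacentDedupIterator<T> implements Iterator<T> {
    private final Iterator<T> input; // assumed: equal items arrive adjacent to each other
    private T last;                  // the single remembered item (replaces the window array)
    private boolean hasLast = false; // false until the first item has been returned
    private T pending;               // next item to hand out, once found
    private boolean hasPending = false;

    AdjacentDedupIterator(Iterator<T> sortedInput) {
        this.input = sortedInput;
    }

    @Override
    public boolean hasNext() {
        // Skip forward until we find an item that differs from the last one returned.
        while (!hasPending && input.hasNext()) {
            T candidate = input.next();
            if (!hasLast || !Objects.equals(candidate, last)) {
                pending = candidate;
                hasPending = true;
            }
        }
        return hasPending;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        hasPending = false;
        last = pending;
        hasLast = true;
        return pending;
    }
}
```

Memory use is constant regardless of input size, which is the point of the 
comparison with a set of already seen bindings.  On unsorted input this only 
removes adjacent duplicates, so it is not a substitute for DISTINCT there.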

That said, in my mind a better approach would be to modify the algebra as part 
of a query optimization step (replacing the OpDistinct with an OpReduced) when 
it is known that the QueryIterator to which it is applied is sorted (either 
because of an underlying OpOrder or a sorted triple/quad index).  This makes it 
easier to determine what is going on during query execution: you can examine 
the transformed algebra instead of tracing branches inside the physical 
operators themselves.
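
As a rough illustration of that kind of rewrite, here is a sketch over a toy 
algebra.  Everything in it is hypothetical: Op, Scan, Order, Distinct, Reduced 
and DistinctToReduced are stand-ins written for this example, not ARQ's 
classes or its transform API.  It also assumes that any Order node leaves equal 
rows adjacent, which only holds if the sort breaks ties across all of the 
variables that DISTINCT compares, and it ignores sorted indexes entirely.

```java
/** Toy algebra for illustration only; these are not ARQ's Op classes. */
interface Op {}

final class Scan implements Op {
    final String name;
    Scan(String name) { this.name = name; }
    @Override public String toString() { return "(scan " + name + ")"; }
}

final class Order implements Op {
    final Op sub;
    Order(Op sub) { this.sub = sub; }
    @Override public String toString() { return "(order " + sub + ")"; }
}

final class Distinct implements Op {
    final Op sub;
    Distinct(Op sub) { this.sub = sub; }
    @Override public String toString() { return "(distinct " + sub + ")"; }
}

final class Reduced implements Op {
    final Op sub;
    Reduced(Op sub) { this.sub = sub; }
    @Override public String toString() { return "(reduced " + sub + ")"; }
}

final class DistinctToReduced {
    /** Assumption: an Order node directly below means equal rows are adjacent. */
    static boolean isSorted(Op op) {
        return op instanceof Order;
    }

    /** Rewrites bottom-up, replacing Distinct over sorted input with Reduced. */
    static Op rewrite(Op op) {
        if (op instanceof Distinct) {
            Op sub = rewrite(((Distinct) op).sub);
            return isSorted(sub) ? new Reduced(sub) : new Distinct(sub);
        }
        if (op instanceof Order) {
            return new Order(rewrite(((Order) op).sub));
        }
        if (op instanceof Reduced) {
            return new Reduced(rewrite(((Reduced) op).sub));
        }
        return op; // leaf: nothing to rewrite
    }
}
```

Printing the tree before and after the rewrite shows the decision directly, 
e.g. (distinct (order (scan t))) becomes (reduced (order (scan t))), while 
(distinct (scan t)) is left alone.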


[1]  DistinctDataBag is not guaranteed to be sorted.  The in-memory bindings 
are stored in a HashSet, so if the bag does not spill to disk, no attempt is 
made to sort the bindings in the iterator (to avoid extra work).  It would not 
be hard to create a DistinctSortedDataBag, but I'm not sure that it is 
necessary (and IMO limiting the number of primitive operations helps simplify 
the system).
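
The ordering point in [1] can be seen with plain collections.  This sketch 
does not use DistinctDataBag at all; BagOrderDemo and its two methods are 
hypothetical, and TreeSet merely stands in for what a sorted variant of the 
bag would guarantee.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical demo; plain collections standing in for the data bag. */
final class BagOrderDemo {
    /** Distinct items, in whatever order the hash table happens to yield them. */
    static List<String> distinctUnordered(List<String> items) {
        return new ArrayList<>(new HashSet<>(items));
    }

    /** Distinct items in natural sorted order. */
    static List<String> distinctSorted(List<String> items) {
        return new ArrayList<>(new TreeSet<>(items));
    }
}
```

Both methods return the same set of items; only the second makes any promise 
about the order in which a consumer will see them.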

> Use OpReduce instead of OpDistinct for DISTINCT + ORDER BY queries
> ------------------------------------------------------------------
>
>                 Key: JENA-90
>                 URL: https://issues.apache.org/jira/browse/JENA-90
>             Project: Jena
>          Issue Type: Improvement
>          Components: ARQ
>            Reporter: Paolo Castagna
>            Assignee: Paolo Castagna
>            Priority: Trivial
>              Labels: arq, optimizer, sparql
>         Attachments: ARQ_JENA-90_r1159636.patch
>
>
> ARQ's optimizer could use an OpReduce instead of OpDistinct if a query is 
> DISTINCT + ORDER BY.
> OpReduce removes adjacent duplicates and it does not require a set of already 
> seen bindings as the current OpDistinct implementation does.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
