[ 
https://issues.apache.org/jira/browse/JENA-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636919#comment-13636919
 ] 

Andy Seaborne edited comment on JENA-441 at 4/19/13 10:07 PM:
--------------------------------------------------------------

Rob - 

(reduced (order (?x) ...)

puts things into ?x order but within each section of same ?x  same rows are not 
necessarily adjacent.

Suppose we have (?x,?y) from pattern matching and ORDER BY ?x

(?x=x,?y=1) (?x=x,?y=2) (?x=x,?y=1)

is a legal but REDUCED is not DISTINCT.

However, on checking (i'd forgotten), ARQ always produces a stable order so 
that sorting by ?x will also order by the rest of the binding if two ?x's are 
equal.  That's unnecessary purely for sorting but stable ordering is desirable 
for OFFSET/LIMIT slicing and happens to make the rewrite safe.

                
      was (Author: andy.seaborne):
    Rob - 

(reduced (order (?x) ...)

puts things into ?x order but within each ?x group same items are not 
necessarily adjacent.

(?x=x,?y=1) (?x=x,?y=2) (?x=x,?y1) is a legal but REDUCED is not DISTINCT.

However, on checking (i'd forgotten), ARQ always produces a stable order so 
that sorting by ?x will also order by the rest of the binding if two ?x's are 
equal.  That's unnecessary but stable ordering is desirable for OFFSET/LIMIT 
slicing and happens to make the rewrite safe.


                  
> In some cases it may be useful to apply DISTINCT before applying ORDER BY
> -------------------------------------------------------------------------
>
>                 Key: JENA-441
>                 URL: https://issues.apache.org/jira/browse/JENA-441
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ, Optimizer
>    Affects Versions: Jena 2.10.0
>            Reporter: Rob Vesse
>            Assignee: Rob Vesse
>            Priority: Minor
>              Labels: distinct, optimization, order, project
>         Attachments: order-distinct.csv, order-distinct-opt-2.csv, 
> order-distinct-opt-2.txt, order-distinct-opt-2.xml, order-distinct-opt.csv, 
> order-distinct-opt.txt, order-distinct-opt.xml, order-distinct.txt, 
> order-distinct.xml
>
>
> One of our internal users highlighted an interesting query where changing the 
> plan makes a big difference in performance.
> The query is essentially the following:
> SELECT DISTINCT ?p
> WHERE 
> {
>   ?s ?p ?o
> } ORDER BY ?p
> Leaving the fact that it is a fundamentally dumb query to write the user had 
> an interesting suggestion about the query plan, currently this generates the 
> following:
> (distinct
>     (project (?predicate)
>       (order (?predicate)
>         (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?s ?predicate ?o)))))
> For cases like this it may actually be much more performant to do the 
> distinct first, because of the associated semantics of the various operators 
> you can't just simply put distinct before the order but if you rewrite the 
> query as follows:
> SELECT ?p
> WHERE
> {
>   { SELECT DISTINCT ?p WHERE { ?s ?p ?o } }
> } ORDER BY ?p
> You get the likely much more performant plan:
> (project (?predicate)
>     (order (?predicate)
>       (distinct
>         (project (?predicate)
>           (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?/s ?predicate 
> ?/o))))))
> Clearly this optimization does not apply in the general case, I think it only 
> applies in the case where you have a DISTINCT and all ORDER BY conditions are 
> simple variables and only those variables are projected in which case I think 
> you could produce a plan like the following:
> (order (?predicate)
>     (project (?predicate)
>       (distinct
>         (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?s ?predicate ?o)))))
> I am pretty certain this applies in only a few cases and I haven't reproduced 
> this with TDB to see if the performance difference is noticeable yet.
> Andy - I will try and gather some more information and experiment with this 
> so don't feel you have to look at this one for the time being.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to