[jira] [Commented] (JENA-441) In some cases it may be useful to apply DISTINCT before applying ORDER BY

Hudson (JIRA) Sat, 20 Apr 2013 03:09:20 -0700

    [ 
https://issues.apache.org/jira/browse/JENA-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637186#comment-13637186
 ]


Hudson commented on JENA-441:
-----------------------------

Integrated in Jena__Development_Test #624 (See [https://builds.apache.org/job/Jena__Development_Test/624/])
    Expand DISTINCT to REDUCED optimization to cope with SELECT DISTINCT ?var { } ORDER BY ?var style queries (JENA-441) (Revision 1470043)

     Result = SUCCESS
rvesse : 
Files : 
* /jena/trunk/jena-arq/ReleaseNotes.txt
* /jena/trunk/jena-arq/src/main/java/com/hp/hpl/jena/sparql/algebra/optimize/TransformDistinctToReduced.java
* /jena/trunk/jena-arq/src/test/java/com/hp/hpl/jena/sparql/algebra/optimize/TestOptimizer.java
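
For readers unfamiliar with why a DISTINCT-to-REDUCED rewrite can be safe on ordered input, here is a minimal sketch in plain Java. It does not use Jena and is not the code in TransformDistinctToReduced.java. It assumes that REDUCED is evaluated by dropping only adjacent duplicates, which is one permitted strategy but not something this message confirms. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

public class ReducedAfterSort {
    // Assumed REDUCED-style evaluation: drop a row only when it equals the
    // row immediately before it. Needs one row of state, not a full hash set.
    static List<String> dropAdjacentDuplicates(List<String> rows) {
        List<String> out = new ArrayList<>();
        String previous = null;
        boolean first = true;
        for (String row : rows) {
            if (first || !Objects.equals(row, previous)) {
                out.add(row);
            }
            previous = row;
            first = false;
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for the ?var bindings produced by the pattern.
        List<String> rows = new ArrayList<>(Arrays.asList("p2", "p1", "p2", "p3", "p1"));
        // ORDER BY ?var: equal values are now next to each other.
        Collections.sort(rows);
        // On sorted input, adjacent-duplicate removal yields fully distinct rows.
        System.out.println(dropAdjacentDuplicates(rows)); // prints [p1, p2, p3]
    }
}
```

On unsorted input the same method would leave non-adjacent duplicates in place, which is why the sketch only makes sense when the ORDER BY keys cover the projected variables.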

                
> In some cases it may be useful to apply DISTINCT before applying ORDER BY
> -------------------------------------------------------------------------
>
>                 Key: JENA-441
>                 URL: https://issues.apache.org/jira/browse/JENA-441
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ, Optimizer
>    Affects Versions: Jena 2.10.0
>            Reporter: Rob Vesse
>            Assignee: Rob Vesse
>            Priority: Minor
>              Labels: distinct, optimization, order, project
>         Attachments: order-distinct.csv, order-distinct-opt-2.csv, 
> order-distinct-opt-2.txt, order-distinct-opt-2.xml, order-distinct-opt.csv, 
> order-distinct-opt.txt, order-distinct-opt.xml, order-distinct.txt, 
> order-distinct.xml
>
>
> One of our internal users highlighted an interesting query where changing the 
> plan makes a big difference in performance.
> The query is essentially the following:
> SELECT DISTINCT ?p
> WHERE 
> {
>   ?s ?p ?o
> } ORDER BY ?p
> Leaving aside the fact that it is a fundamentally dumb query to write, the 
> user had an interesting suggestion about the query plan. Currently this 
> generates the following:
> (distinct
>     (project (?predicate)
>       (order (?predicate)
>         (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?s ?predicate ?o)))))
> For cases like this it may actually be much more performant to do the 
> distinct first. Because of the semantics of the operators involved, you 
> can't simply put the distinct before the order, but if you rewrite the 
> query as follows:
> SELECT ?p
> WHERE
> {
>   { SELECT DISTINCT ?p WHERE { ?s ?p ?o } }
> } ORDER BY ?p
> You get a plan that is likely to be much more performant:
> (project (?predicate)
>     (order (?predicate)
>       (distinct
>         (project (?predicate)
>           (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?/s ?predicate ?/o))))))
> Clearly this optimization does not apply in the general case. I think it 
> only applies where you have a DISTINCT, all ORDER BY conditions are simple 
> variables, and only those variables are projected. In that case I think you 
> could produce a plan like the following:
> (order (?predicate)
>     (project (?predicate)
>       (distinct
>         (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?s ?predicate ?o)))))
> I am pretty certain this applies in only a few cases, and I haven't yet 
> reproduced this with TDB to see whether the performance difference is 
> noticeable.
> Andy - I will try to gather some more information and experiment with this, 
> so don't feel you have to look at this one for the time being.
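
The applicability condition described in the issue (DISTINCT present, every ORDER BY condition a simple variable, and only those variables projected) can be sketched in plain Java as follows. This is an illustration under assumptions, not the ARQ optimizer code. The class, the method names and the simplified variable regex are hypothetical, and "only those variables are projected" is read conservatively as "the projected set equals the ORDER BY variable set". The last two methods model the solutions for a single variable as strings, to show that both operator orders give the same result in that case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;

public class DistinctBeforeOrder {
    // Hypothetical helper: true for a bare variable such as "?p", false for
    // expressions such as "DESC(?p)". The regex is a simplification of the
    // SPARQL variable grammar.
    static boolean isSimpleVariable(String orderCondition) {
        return orderCondition.matches("\\?[A-Za-z_][A-Za-z0-9_]*");
    }

    // Conservative reading of the condition stated in the issue.
    static boolean isApplicable(boolean distinct, List<String> projected, List<String> orderBy) {
        if (!distinct || orderBy.isEmpty()) {
            return false;
        }
        for (String condition : orderBy) {
            if (!isSimpleVariable(condition)) {
                return false;
            }
        }
        return new HashSet<>(projected).equals(new HashSet<>(orderBy));
    }

    // Shape of the current plan: sort every solution, then remove duplicates.
    static List<String> orderThenDistinct(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return new ArrayList<>(new LinkedHashSet<>(sorted));
    }

    // Shape of the proposed plan: remove duplicates first, then sort the
    // (usually much smaller) set of distinct solutions.
    static List<String> distinctThenOrder(List<String> values) {
        List<String> unique = new ArrayList<>(new LinkedHashSet<>(values));
        Collections.sort(unique);
        return unique;
    }

    public static void main(String[] args) {
        List<String> predicates = Arrays.asList("rdfs:label", "rdf:type", "rdfs:label", "rdf:type");
        System.out.println(isApplicable(true, Arrays.asList("?p"), Arrays.asList("?p")));       // prints true
        System.out.println(isApplicable(true, Arrays.asList("?p"), Arrays.asList("DESC(?p)"))); // prints false
        System.out.println(orderThenDistinct(predicates).equals(distinctThenOrder(predicates))); // prints true
    }
}
```

The potential saving comes from the sort input: the first shape sorts one row per matching triple, while the second sorts one row per distinct predicate. Whether that is noticeable on a real store is exactly what the issue leaves open.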

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
