[jira] [Updated] (JENA-89) Avoid a total sort for ORDER BY + LIMIT queries

Paolo Castagna (JIRA) Mon, 08 Aug 2011 01:45:00 -0700

     [ 
https://issues.apache.org/jira/browse/JENA-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Paolo Castagna updated JENA-89:
-------------------------------

    Description: 
In case of SPARQL queries with ORDER BY + LIMIT, ARQ sorts the entire result 
set and then produces the first N, according to the specified LIMIT.
As an alternative, discussed on jena-dev [1], we can use a PriorityQueue [2] 
(which is based on a priority heap) to avoid a sort operation.

ARQ's algebra package contains already a OpTopN [3] operator. The OpExecutor 
[4] will need to use a new QueryIterTopN instead of QueryIterSort + 
QueryIterSlice. A new TransformOrderByLimit to be used by Optimize is also 
necessary.

ORDER BY + LIMIT queries are typically used to construct the first page when 
results are paginated. Then the following query is ORDER BY + OFFSET + LIMIT. 
(Often users stop at the first page). Ideally, we could cache the ORDER BY and 
implement the OFFSET|LIMIT using results from the cache. However, the 
improvement described by this issue is limited to the ORDER BY + LIMIT case for 
which a priority heap is a good enough solution.

Hopefully, this would improve the scalability of ORDER BY + LIMIT queries in 
case of small values specified on the LIMIT. 

  [1] http://markmail.org/thread/5d2gtazkoxsa2ayv
  [2] http://download.oracle.com/javase/6/docs/api/java/util/PriorityQueue.html
  [3] 
https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/algebra/op/OpTopN.java
  [4] 
https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/engine/main/OpExecutor.java

  was:
In case of SPARQL queries with ORDER BY + LIMIT, ARQ sorts the entire result 
set and then produces the first N, according to the specified LIMIT.
As an alternative, we can use a PriorityQueue [1] (which is based on a priority 
heap) to avoid a sort operation.

ARQ's algebra package contains already a OpTopN [2] operator. The OpExecutor 
[3] will need to use a new QueryIterTopN instead of QueryIterSort + 
QueryIterSlice. A new TransformOrderByLimit to be used by Optimize is also 
necessary.

ORDER BY + LIMIT queries are typically used to construct the first page when 
results are paginated. Then the following query is ORDER BY + OFFSET + LIMIT. 
(Often users stop at the first page). Ideally, we could cache the ORDER BY and 
implement the OFFSET|LIMIT using results from the cache. However, the 
improvement described by this issue is limited to the ORDER BY + LIMIT case for 
which a priority heap is a good enough solution.

Hopefully, this would improve the scalability of ORDER BY + LIMIT queries in 
case of small values specified on the LIMIT. 

  [1] http://download.oracle.com/javase/6/docs/api/java/util/PriorityQueue.html
  [2] 
https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/algebra/op/OpTopN.java
  [3] 
https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/engine/main/OpExecutor.java


> Avoid a total sort for ORDER BY + LIMIT queries
> -----------------------------------------------
>
>                 Key: JENA-89
>                 URL: https://issues.apache.org/jira/browse/JENA-89
>             Project: Jena
>          Issue Type: Improvement
>          Components: ARQ
>            Reporter: Paolo Castagna
>            Assignee: Paolo Castagna
>            Priority: Minor
>              Labels: arq, optimization, scalability, sparql
>
> In case of SPARQL queries with ORDER BY + LIMIT, ARQ sorts the entire result 
> set and then produces the first N, according to the specified LIMIT.
> As an alternative, discussed on jena-dev [1], we can use a PriorityQueue [2] 
> (which is based on a priority heap) to avoid a sort operation.
> ARQ's algebra package contains already a OpTopN [3] operator. The OpExecutor 
> [4] will need to use a new QueryIterTopN instead of QueryIterSort + 
> QueryIterSlice. A new TransformOrderByLimit to be used by Optimize is also 
> necessary.
> ORDER BY + LIMIT queries are typically used to construct the first page when 
> results are paginated. Then the following query is ORDER BY + OFFSET + LIMIT. 
> (Often users stop at the first page). Ideally, we could cache the ORDER BY 
> and implement the OFFSET|LIMIT using results from the cache. However, the 
> improvement described by this issue is limited to the ORDER BY + LIMIT case 
> for which a priority heap is a good enough solution.
> Hopefully, this would improve the scalability of ORDER BY + LIMIT queries in 
> case of small values specified on the LIMIT. 
>   [1] http://markmail.org/thread/5d2gtazkoxsa2ayv
>   [2] 
> http://download.oracle.com/javase/6/docs/api/java/util/PriorityQueue.html
>   [3] 
> https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/algebra/op/OpTopN.java
>   [4] 
> https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/engine/main/OpExecutor.java

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (JENA-89) Avoid a total sort for ORDER BY + LIMIT queries

Reply via email to