[
https://issues.apache.org/jira/browse/JENA-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083012#comment-13083012
]
Paolo Castagna edited comment on JENA-89 at 8/11/11 8:44 AM:
-------------------------------------------------------------
The optimizer applies the transformation if and only if LIMIT is less than the
threshold and no OFFSET is used.
Are we happy with a fixed TOPN_LIMIT_THRESHOLD = 100?
The optimization is active by default, but it can be turned off via --set
http://jena.hpl.hp.com/ARQ#optTopNSorting=false
The changes in BuilderOp are the ones I am less sure about:
- BuilderLib.checkLength(4, list, Tags.tagTopN) ;
- int N = BuilderNode.buildInt(list, 1, -1) ;
- ItemList conditions = list.get(2).getList() ;
+ BuilderLib.checkLength(3, list, Tags.tagTopN) ;
+ long N = BuilderNode.buildInt(list.get(1).getList(), 0, -1) ;
+ ItemList conditions = list.get(1).getList().cdr() ;
An example of the optimization in action follows:
SPARQL query:
SELECT *
WHERE
{ ?s ?p ?o }
ORDER BY ?p ?s
LIMIT 10
SPARQL algebra expression unchanged:
(slice _ 10
(order (?p ?s)
(bgp (triple ?s ?p ?o))))
SPARQL algebra expression with the optimization applied:
(top (10 ?p ?s)
(bgp (triple ?s ?p ?o)))
was (Author: castagna):
The optimizer applies the transformation if and only if LIMIT is less than
the threshold and no OFFSET is used.
Are we happy with a fixed TOPN_LIMIT_THRESHOLD = 100?
The optimization is active by default, but it can be turned off via --set
http://jena.hpl.hp.com/ARQ#optTopNSorting=false
The changes in BuilderOp are the ones I am less sure about:
- BuilderLib.checkLength(4, list, Tags.tagTopN) ;
- int N = BuilderNode.buildInt(list, 1, -1) ;
- ItemList conditions = list.get(2).getList() ;
+ BuilderLib.checkLength(3, list, Tags.tagTopN) ;
+ long N = BuilderNode.buildInt(list.get(1).getList(), 0, -1) ;
+ ItemList conditions = list.get(1).getList().cdr() ;
An example of the optimization in action follows:
SPARQL query:
SELECT *
WHERE
{ ?s ?p ?o }
ORDER BY ?p ?s
LIMIT 10
SPARQL algebra expression unchanged:
(slice _ 10
(order (?p ?s)
(bgp (triple ?s ?p ?o))))
SPARQL algebra expression with the optimization applied:
(top (10 ?p ?s)
(bgp (triple ?s ?p ?o)))
> Avoid a total sort for ORDER BY + LIMIT queries
> -----------------------------------------------
>
> Key: JENA-89
> URL: https://issues.apache.org/jira/browse/JENA-89
> Project: Jena
> Issue Type: Improvement
> Components: ARQ
> Reporter: Paolo Castagna
> Assignee: Paolo Castagna
> Priority: Minor
> Labels: arq, optimization, scalability, sparql
> Attachments: ARQ_JENA-89_r1156212.patch
>
>
> In case of SPARQL queries with ORDER BY + LIMIT, ARQ sorts the entire result
> set and then produces the first N, according to the specified LIMIT.
> As an alternative, discussed on jena-dev [1], we can use a PriorityQueue [2]
> (which is based on a priority heap) to avoid a sort operation.
> ARQ's algebra package contains already a OpTopN [3] operator. The OpExecutor
> [4] will need to use a new QueryIterTopN instead of QueryIterSort +
> QueryIterSlice. A new TransformOrderByLimit to be used by Optimize is also
> necessary.
> ORDER BY + LIMIT queries are typically used to construct the first page when
> results are paginated. Then the following query is ORDER BY + OFFSET + LIMIT.
> (Often users stop at the first page). Ideally, we could cache the ORDER BY
> and implement the OFFSET|LIMIT using results from the cache. However, the
> improvement described by this issue is limited to the ORDER BY + LIMIT case
> for which a priority heap is a good enough solution.
> Hopefully, this would improve the scalability of ORDER BY + LIMIT queries in
> case of small values specified on the LIMIT.
> [1] http://markmail.org/thread/5d2gtazkoxsa2ayv
> [2]
> http://download.oracle.com/javase/6/docs/api/java/util/PriorityQueue.html
> [3]
> https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/algebra/op/OpTopN.java
> [4]
> https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/engine/main/OpExecutor.java
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira