Re: How does Jena-ARQ execute the queries containing ORDER BY + LIMIT clause

Rose Beck Tue, 10 Mar 2015 04:11:58 -0700

Dear Rob,

Thanks again for the reply.


But I am curious: does Jena implement hash joins, which are
essentially blocking in nature? If Jena does not, then how is a join
between two unsorted lists (of intermediate results) done in Jena?
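To make the term concrete, here is a minimal sketch in plain Java of why a hash join is usually called blocking. It is not ARQ's code and makes no claim about whether or how Jena uses hash joins. The `HashJoinSketch` class, the `hashJoin` method, and the representation of a solution as a `Map` from variable name to value are all hypothetical, and the sketch assumes both sides bind exactly one shared variable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a hash join over two lists of solutions (not ARQ code). */
class HashJoinSketch {

    /**
     * Joins two solution lists on one shared variable ('joinVar'), which both sides
     * are assumed to bind. Any other variables are assumed not to overlap.
     */
    static List<Map<String, String>> hashJoin(List<Map<String, String>> build,
                                              List<Map<String, String>> probe,
                                              String joinVar) {
        // Build phase (the blocking part): the entire build side is read into a
        // hash table before a single joined solution can be produced.
        Map<String, List<Map<String, String>>> table = new HashMap<>();
        for (Map<String, String> row : build) {
            table.computeIfAbsent(row.get(joinVar), k -> new ArrayList<>()).add(row);
        }
        // Probe phase: each probe solution is matched as soon as it is read,
        // so this side could be streamed one solution at a time.
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : probe) {
            for (Map<String, String> match : table.getOrDefault(row.get(joinVar), List.of())) {
                Map<String, String> merged = new HashMap<>(match);
                merged.putAll(row); // shared variable has the same value on both sides
                out.add(merged);
            }
        }
        return out;
    }
}
```

The build loop has to finish before the first result exists; only the probe side lends itself to streaming.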



On Tue, Mar 10, 2015 at 4:34 PM, Rob Vesse <[email protected]> wrote:
> Yes, that is pretty much what happens.
>
> Note, though, that most query evaluation in ARQ is done in a streaming
> fashion, so the full set of solutions is typically never held in memory for
> any query unless an operator that requires full or partial materialisation,
> e.g. DISTINCT, is encountered.
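A minimal sketch of the difference described above, assuming solutions can be compared with `equals`/`hashCode`. The `DistinctIterator` class is hypothetical, not ARQ's implementation: a DISTINCT-style iterator can still emit each new solution as soon as it sees it, but it must remember every distinct solution so far, so its memory grows with the input instead of staying constant like a simple pass-through operator.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;

/**
 * Hypothetical DISTINCT operator over an inner iterator (not ARQ's own class).
 * Output is still streamed, but every distinct solution seen so far is kept in memory.
 */
class DistinctIterator<T> implements Iterator<T> {
    private final Iterator<T> inner;
    private final Set<T> seen = new HashSet<>(); // grows with the number of distinct solutions
    private T next;                              // look-ahead slot
    private boolean hasNextCached;

    DistinctIterator(Iterator<T> inner) {
        this.inner = inner;
    }

    @Override
    public boolean hasNext() {
        // Pull from the inner iterator only until the next unseen solution is found.
        while (!hasNextCached && inner.hasNext()) {
            T candidate = inner.next();
            if (seen.add(candidate)) { // add() returns false for duplicates
                next = candidate;
                hasNextCached = true;
            }
        }
        return hasNextCached;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        hasNextCached = false;
        return next;
    }

    /** Number of solutions currently held in memory for duplicate detection. */
    int remembered() {
        return seen.size();
    }
}
```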
>
> Rob
>
> On 10/03/2015 10:24, "Rose Beck" <[email protected]> wrote:
>
>>Dear Rob,
>>
>>First and foremost thank you for such a wonderful explanation.
>>
>>Just to clarify, say my example query is:
>>
>>select ?a ?b ?c where { ?a <pred1> ?c . ?a <pred2> ?b } order by ?c limit 10
>>
>>Then, for the query above, all the solutions are generated just as they
>>would be for the SPARQL query: select ?a ?b ?c where { ?a <pred1> ?c .
>>?a <pred2> ?b }. But the priority queue (which is the last operation
>>applied before results are output to the user) holds at most 10
>>solutions, ordered by ?c, at any point. Please correct
>>me if I am wrong.
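One way to picture the "all the solutions are generated" step is a streaming bind join, where each ?a found by the first triple pattern is substituted into the second. The sketch assumes a tiny in-memory list of triples and uses hypothetical `Triple` and `BgpSketch` names; it is not ARQ's BGP matcher, and the map built at the top only stands in for the indexes a real triple store would already have.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical in-memory triple; not a Jena class. */
record Triple(String s, String p, String o) {}

class BgpSketch {
    /**
     * Lazily produces solutions for { ?a <pred1> ?c . ?a <pred2> ?b } as a bind join:
     * each ?a matched by the first pattern is looked up in the second.
     */
    static Stream<Map<String, String>> bgpSolutions(List<Triple> data) {
        // Stand-in for an existing store index on (subject, pred2); built eagerly here
        // only because this sketch has no real store behind it.
        Map<String, List<String>> pred2BySubject = data.stream()
                .filter(t -> t.p().equals("pred2"))
                .collect(Collectors.groupingBy(Triple::s,
                        Collectors.mapping(Triple::o, Collectors.toList())));
        return data.stream()
                .filter(t -> t.p().equals("pred1"))                       // ?a <pred1> ?c
                .flatMap(t -> pred2BySubject.getOrDefault(t.s(), List.of()).stream()
                        .map(b -> Map.of("a", t.s(), "b", b, "c", t.o()))); // ?a <pred2> ?b
    }
}
```

Because the returned stream is lazy, solutions come out one at a time and could be fed straight into an ORDER BY + LIMIT operator without first collecting them all.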
>>
>>Cheers,
>>Rose
>>
>>
>>On Tue, Mar 10, 2015 at 3:38 PM, Rob Vesse <[email protected]> wrote:
>>> Query execution in ARQ is based on nested iterators, so QueryIterTopN will
>>> always apply over another iterator.
>>>
>>> A PriorityQueue is used internally as temporary storage within
>>> QueryIterTopN while it exhausts the inner iterator, allowing it to use
>>> at most the limit amount of storage in the priority queue, plus whatever
>>> temporary storage the inner iterator(s) may need.
>>>
>>> There is still a "total sort" in the sense that every possible solution
>>> has to be compared to see whether it should be placed into the priority
>>> queue; however, there is not a "total sort" in the sense of needing to
>>> materialise all possible solutions into memory and then sort them.
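The behaviour described above can be sketched as follows. This is a hypothetical `TopNIterator`, written in the spirit of QueryIterTopN but not taken from its source: it wraps an inner iterator, drains it on first use, checks every solution against the head of a bounded priority queue, and never holds more than `limit` solutions at once. Ordering the queue in reverse puts the current worst candidate at the head, so a single comparison against the head decides whether a new solution is kept.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/**
 * Hypothetical top-N operator in the spirit of QueryIterTopN (not its actual source):
 * it wraps an inner iterator, exhausts it on first use, and never stores more than
 * 'limit' solutions in its priority queue.
 */
class TopNIterator<T> implements Iterator<T> {
    private final Iterator<T> inner;
    private final Comparator<T> order;
    private final int limit;
    private Iterator<T> results; // null until the inner iterator has been drained
    private int peakQueueSize;   // for illustration: most solutions held at once

    TopNIterator(Iterator<T> inner, Comparator<T> order, int limit) {
        this.inner = inner;
        this.order = order;
        this.limit = limit;
    }

    private void drainInner() {
        // Reversed ORDER BY comparator: the worst of the current top 'limit' sits at the head.
        PriorityQueue<T> queue = new PriorityQueue<>(order.reversed());
        while (inner.hasNext()) {
            T candidate = inner.next(); // every solution is examined ...
            if (queue.size() < limit) {
                queue.add(candidate);
            } else if (limit > 0 && order.compare(candidate, queue.peek()) < 0) {
                queue.poll();           // ... but only better ones evict the current worst
                queue.add(candidate);
            }
            peakQueueSize = Math.max(peakQueueSize, queue.size());
        }
        List<T> sorted = new ArrayList<>(queue);
        sorted.sort(order);             // final sort touches at most 'limit' solutions
        results = sorted.iterator();
    }

    @Override
    public boolean hasNext() {
        if (results == null) drainInner();
        return results.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return results.next();
    }

    int peakQueueSize() {
        return peakQueueSize;
    }
}
```

For example, wrapping an iterator over the integers 7, 3, 9, 1, 8, 2, 6, 5, 4, 0 with natural ordering and a limit of 3 yields 0, 1, 2, and the queue never holds more than 3 entries.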
>>>
>>> Rob
>>>
>>> p.s. Please don't post identical questions to both users@ and dev@; one
>>> list is sufficient, as the developers are on both lists. As a general
>>> rule, general support questions should go to users@ and technical/architecture
>>> questions like this should go to dev@.
>>>
>>>
>>>
>>> On 10/03/2015 05:54, "Rose Beck" <[email protected]> wrote:
>>>
>>>>Hi,
>>>>
>>>>I saw the following issue posted on the Jena website (it has been
>>>>resolved recently):
>>>>Avoid a total sort for ORDER BY + LIMIT queries
>>>>(https://issues.apache.org/jira/browse/JENA-89).
>>>>
>>>>I am very interested in understanding how Jena-ARQ avoids a total
>>>>sort for ORDER BY + LIMIT queries. The issue mentions that Jena-ARQ
>>>>uses a priority queue to avoid a final sort, but it also says that
>>>>"ARQ's algebra package contains already a OpTopN [3] operator. The
>>>>OpExecutor [4] will need to use a new QueryIterTopN instead of
>>>>QueryIterSort + QueryIterSlice." It is not clear to me how the priority
>>>>queue benefits from the OpTopN operator and QueryIterTopN, and since
>>>>the links [3] and [4] on the issue page do not work, I am not able to
>>>>understand how they operate or how they help avoid a total sort.
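For contrast, the QueryIterSort + QueryIterSlice plan that the issue says QueryIterTopN replaces can be sketched as a hypothetical `sortThenSlice` helper (not Jena's code): every solution from the inner iterator is materialised, the whole list is sorted, and only then is the OFFSET/LIMIT window cut out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical "sort then slice" plan, analogous to QueryIterSort followed by
 * QueryIterSlice (not their actual source). Every solution is materialised first.
 */
class SortThenSlice {
    static <T> List<T> sortThenSlice(Iterator<T> inner, Comparator<T> order,
                                     long offset, long limit) {
        List<T> all = new ArrayList<>();
        inner.forEachRemaining(all::add); // full materialisation: memory grows with input size
        all.sort(order);                  // total sort over all n solutions
        int from = (int) Math.min(offset, all.size());
        int to = from + (int) Math.min(limit, all.size() - from);
        return new ArrayList<>(all.subList(from, to)); // keep only the OFFSET/LIMIT window
    }
}
```

Both plans compare every solution, but this one keeps all n solutions in memory and sorts all of them, whereas the priority-queue approach described in the thread holds only the limit's worth.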
>>>>
>>>>Can someone please explain how Jena-ARQ executes queries containing
>>>>an ORDER BY + LIMIT clause?
>>>>
>>>>With Warm Regards,
>>>>Rose
>>>
>>>
>>>
>>>
>>
>>
>>
>>--
>>With Warm Regards,
>>Rose
>
>
>
>



-- 
With Warm Regards,
Rose
