Re: How can I add paging support with jena

Paolo Castagna Wed, 12 Oct 2011 09:51:29 -0700

Hi Nadav,
yours is quite a common use case when you want to drive a UI with a SPARQL 
endpoint as your backend.

The problem, as you said, is that some sorting needs to happen and then slicing 
(i.e. LIMIT + OFFSET).
When you have a large dataset you need to sort or scan through a lot of data.
Moreover, currently, each page you hit (== each query you run: the same query 
with just a different OFFSET) needs to do the same work all over again.
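
As a rough illustration of the idiom, the sketch below builds the query text 
for each page. The page_query helper, the ex: prefix and the ex:codeSystem / 
ex:code properties are made up for this example and are not taken from your 
data. Only OFFSET changes from one page to the next, so the endpoint repeats 
the matching and sorting for every page.

```python
# Hypothetical base query: the ex: vocabulary is invented for illustration.
BASE_QUERY = """\
PREFIX ex: <http://example.org/>
SELECT ?concept ?code
WHERE {
  ?concept ex:codeSystem ex:someCodeSystem ;
           ex:code ?code .
}
ORDER BY ?code"""


def page_query(page: int, page_size: int) -> str:
    """Return the SPARQL text for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    offset = page * page_size
    # Same SELECT / ORDER BY every time; only the slice moves.
    return f"{BASE_QUERY}\nLIMIT {page_size}\nOFFSET {offset}"


if __name__ == "__main__":
    # Show the only line that differs between the first three pages.
    for page in range(3):
        print(page_query(page, 100).splitlines()[-1])
```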

We are well aware of this.

You might be interested in this thread from the jena-dev mailing list:
http://markmail.org/message/p5x334m7dy676oik

In particular point 3, which I quote verbatim here:

 "3/ Paging.

  The idiom of a sequence of SELECT / ORDER BY / OFFSET / LIMIT calls with 
changes in OFFSET to get different slices of a result set happens in linked 
data apps (and others).

  We've been optimizing these in ARQ using "top N" queries but LD-Access can 
offer facilities at a different granularity. Catch that query, issue the full 
SELECT / ORDER BY query, cache the results.
  Then you can slice the results as pages without going back to the server.

  One side effect of this is paging without sorting, another is moving sorting 
away from the origin server.

  Sorting is expensive but it's needed to guarantee stability of the result set 
being sliced into pages. So issue the query as SELECT and either sort locally 
(you get to choose the resources
available), to get the same sorted pageable results. Or if ordering is only for 
stability, just remove the ORDER by and replace with a promise to slice from an 
unchanging result set."
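
To make the "cache the results, then slice" idea concrete, here is a minimal 
sketch. It is not LD-Access and not anything shipped with Jena: the 
PagedResultCache class and its run_query argument are hypothetical, and 
run_query stands in for whatever code actually sends the query to the 
endpoint. The full query (without LIMIT/OFFSET) runs once, the rows are 
optionally sorted locally, and every page is then a slice of the cached list.

```python
from typing import Callable, Dict, List, Optional, Sequence

Row = Dict[str, str]


class PagedResultCache:
    """Run each full query once, then serve pages from the cached rows."""

    def __init__(self,
                 run_query: Callable[[str], Sequence[Row]],
                 sort_key: Optional[Callable[[Row], object]] = None):
        # run_query is an assumed callable that executes the query remotely.
        self._run_query = run_query
        # Optional local sort, applied once when the rows are first cached.
        self._sort_key = sort_key
        self._cache: Dict[str, List[Row]] = {}

    def page(self, query: str, page: int, page_size: int) -> List[Row]:
        """Return the zero-based page of results for the given full query."""
        if page < 0 or page_size <= 0:
            raise ValueError("page must be >= 0 and page_size must be > 0")
        rows = self._cache.get(query)
        if rows is None:
            # First request for this query: fetch everything and cache it.
            rows = list(self._run_query(query))
            if self._sort_key is not None:
                rows.sort(key=self._sort_key)
            self._cache[query] = rows
        start = page * page_size
        return rows[start:start + page_size]
```

This sketch never expires entries and keeps whole result sets in memory, so 
it only makes sense when the data does not change between page requests, 
which is the "promise" the quoted text refers to.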

We tried to improve things as much as possible from the query engine 
optimization point of view:
https://issues.apache.org/jira/browse/JENA-89
https://issues.apache.org/jira/browse/JENA-90
https://issues.apache.org/jira/browse/JENA-108
https://issues.apache.org/jira/browse/JENA-109
https://issues.apache.org/jira/browse/JENA-111
https://issues.apache.org/jira/browse/JENA-114
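
For readers unfamiliar with the "top N" idea mentioned in the quoted message: 
a query with ORDER BY, LIMIT n and OFFSET k only needs the first k + n rows 
in sort order, so a full sort of the result set can be avoided. The sketch 
below shows that general technique in Python. It is not a description of how 
ARQ or the issues listed above implement it, and top_n_slice is a name 
invented here.

```python
import heapq
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def top_n_slice(rows: Iterable[T], key: Callable[[T], object],
                offset: int, limit: int) -> List[T]:
    """Return rows[offset:offset + limit] of the sorted order, no full sort."""
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    # Only the offset + limit smallest rows are needed; nsmallest returns
    # them in ascending key order.
    smallest = heapq.nsmallest(offset + limit, rows, key=key)
    # Drop the rows that belong to earlier pages.
    return smallest[offset:]
```

Note that the saving shrinks as OFFSET grows, because the first offset + limit 
rows still have to be tracked for every page.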

Judging from the amount of effort I put into this, you can imagine I have 
problems very similar to yours. :-)

I don't think there is anything more we can do from the point of view of the 
query engine.
If someone has good ideas on this, I am all ears.

Views are an option (== a sort of internal cache).

But something like LD-Access, as described by Andy, would be much better, 
since you can use it with remote SPARQL endpoints and with different 
implementations.

To conclude, I do not have a good answer to your question... other than caching.
But, at the moment, we do not offer anything you can use out of the box.

Do you need this for a commercial product/service/project?

Here is another option for you and your company:

 "The Epimorphics team has unparalleled expertise in the development of
  Apache Jena and includes many of the original developers. Epimorphics
  has 4 committers on the Apache Jena project. [...]
  We also carry out custom development of extensions to Jena and Jena
  based systems.
  For more information on any of these support packages, or other
  Jena-related services, please contact [email protected]."
  -- http://www.epimorphics.com/web/support

(disclaimer: I do not work for Epimorphics).

I am sure a lot of our Jena (and Fuseki) users would benefit immensely from a 
proper caching layer.

By the way, how big is your dataset (in terms of triples)?
Are you using TDB? Joseki? Fuseki? What are you using to run your queries?
How much RAM do you have on the machine? Is it a 64 bit OS and JVM?
There is a lot you can do to tune performance... give us more details.

Regards,
Paolo

Nadav Hoze wrote:
> hi,
> I have a medical ontology stored with Jena TDB.
> the object model is quite simple:
> 1. we have medical concepts that have the following fields: code, code system 
> (explained in #3), a unique id and text.
> 2. Medical relations between these concepts.
> 3. because medical concepts are produced from a certain code system we have 
> an object for that which is the container for the concepts (details of it are 
> not important).
> 
> all of the data of course is stored as triples, where for medical concepts 
> the id is the triple identifier.
> 
> when I query for all the concepts of a code system the result is huge and I 
> would like to get it by paging.
> now I do support it by using limit, offset and sorting, but it's extremely 
> slow because of the sorting every time I ask for the next batch.
> is there a way to do so without sorting, maybe using an index?
> 
> thanks,
