Hi all,

I frequently need to query specific subsets of resources. The subsets are
determined at runtime and so are unpredictable: I may need to query all the
resources, or just a handful. Each query extracts a few properties.

*Example*: I have 10,000 resources in the model, and I want to query 1000 of
them (selected according to arbitrary criteria, probably as a result of an
earlier query) to get their latitude and longitude, so I can plot them on a
map.

The naive way is to make 1000 separate queries:

SELECT ?lat ?lon {
  <http://example.com#myresource123> <http://example.com#lat> ?lat ;
                                     <http://example.com#lon> ?lon .
}
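For illustration, here is a minimal Python sketch of the naive approach. It only builds one query string per resource, so 1000 resources means 1000 separate round trips to the store. The helper name `naive_queries` is hypothetical, and actually sending the queries to a store is left out.

```python
from typing import List

# Property IRIs taken from the example query above.
LAT = "<http://example.com#lat>"
LON = "<http://example.com#lon>"


def naive_queries(resources: List[str]) -> List[str]:
    """Build one SPARQL query per resource IRI (hypothetical helper).

    Each query would be sent separately, so the number of round trips
    to the store grows linearly with the number of resources.
    """
    return [
        f"SELECT ?lat ?lon {{ <{iri}> {LAT} ?lat ; {LON} ?lon }}"
        for iri in resources
    ]


if __name__ == "__main__":
    iris = [f"http://example.com#myresource{i}" for i in (123, 124)]
    for q in naive_queries(iris):
        print(q)
```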

This gets rather slow for larger datasets.

Another approach would be to simply query *all* resources in a single query
and discard the unwanted solutions. This is typically about as fast as the
naive approach above, and clearly doesn't scale well!

What I am currently trying is a query that lists the resources of interest,
using a FILTER and the IN function, e.g.

SELECT ?res ?lat ?lon {
  ?res <http://example.com#lat> ?lat ;
       <http://example.com#lon> ?lon .
  FILTER ( ?res IN (<http://example.com#myresource123>,
                    <http://example.com#myresource124>,...) )
}
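A minimal Python sketch of how such a FILTER/IN query string might be generated from a runtime list of IRIs. The helper `in_filter_query` is hypothetical, and it assumes the IRIs are absolute and safe to embed between angle brackets (no escaping is done).

```python
from typing import Iterable

# Property IRIs taken from the example query above.
LAT = "<http://example.com#lat>"
LON = "<http://example.com#lon>"


def in_filter_query(resources: Iterable[str]) -> str:
    """Build a single SPARQL query restricted to the given IRIs via FILTER(IN).

    Hypothetical helper: IRIs are assumed to be absolute and already safe
    to place between angle brackets; nothing is escaped here.
    """
    iri_list = ", ".join(f"<{iri}>" for iri in resources)
    return (
        "SELECT ?res ?lat ?lon {\n"
        f"  ?res {LAT} ?lat ;\n"
        f"       {LON} ?lon .\n"
        f"  FILTER ( ?res IN ({iri_list}) )\n"
        "}"
    )


if __name__ == "__main__":
    print(in_filter_query([
        "http://example.com#myresource123",
        "http://example.com#myresource124",
    ]))
```

The whole list ends up inside one FILTER expression, which is why the query text (and, presumably, the per-solution check) grows with the number of resources.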

This works well for small numbers of resources (a few tens) in the list
passed to IN(), giving something like a 10-fold speed-up over the naive
approach. However, performance drops off rapidly as the list grows into the
hundreds or thousands.

I have implemented a form of batching, e.g. 20 queries with 50 resources in
each list, and this helps greatly, though it adds complexity.
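The batching code itself isn't shown here, so the following is only a sketch of one way to split the list, assuming a fixed batch size (50, matching the example above). The names `batches` and `batched_in_queries` are hypothetical; executing the queries and merging their results is left out.

```python
from typing import Iterator, List, Sequence

# Property IRIs taken from the example query above.
LAT = "<http://example.com#lat>"
LON = "<http://example.com#lon>"


def batches(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batched_in_queries(resources: Sequence[str], size: int = 50) -> List[str]:
    """Build one FILTER(IN) query per batch of resource IRIs (hypothetical helper)."""
    queries = []
    for batch in batches(resources, size):
        iri_list = ", ".join(f"<{iri}>" for iri in batch)
        queries.append(
            f"SELECT ?res ?lat ?lon {{ ?res {LAT} ?lat ; {LON} ?lon . "
            f"FILTER ( ?res IN ({iri_list}) ) }}"
        )
    return queries


if __name__ == "__main__":
    iris = [f"http://example.com#myresource{i}" for i in range(1000)]
    queries = batched_in_queries(iris, size=50)
    print(len(queries))  # 20 queries of 50 IRIs each
```

The batch size trades the number of round trips against the length of each IN() list, so it would need tuning for a given store and dataset.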
*So my question (finally) is:*

Is the cost of the IN() function O(n) in the length of the list, and if so,
can anything easily be done to improve this?

And is there a better way to do this kind of query?

Thanks,

David.
