Hi all,
I frequently need to query specific subsets of resources. The subsets are
determined at runtime, so they are unpredictable: I may need to query all the
resources, or just a handful. Each query extracts a few properties.
*Example*: I have 10,000 resources in the model, and I want to query 1000 of
them (selected according to arbitrary criteria, probably as a result of an
earlier query) to get their latitude and longitude, so I can plot them on a
map.
The naive way is to make 1000 separate queries:
  SELECT ?lat ?lon {
    <http://example.com#myresource123> <http://example.com#lat> ?lat ;
                                       <http://example.com#lon> ?lon
  }
This gets rather slow for larger datasets.
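For concreteness, the naive approach amounts to generating one query string per
resource, roughly as in this minimal sketch. `naive_queries` is a hypothetical
helper, and how each query is then executed (local model or remote endpoint) is
not shown.

```python
from typing import Iterator

# Property IRIs taken from the example query above.
LAT = "http://example.com#lat"
LON = "http://example.com#lon"


def naive_queries(resources: list[str]) -> Iterator[str]:
    """Yield one SPARQL query per resource IRI (the naive approach)."""
    for iri in resources:
        # Doubled braces produce literal { } in the f-string.
        yield f"SELECT ?lat ?lon {{ <{iri}> <{LAT}> ?lat ; <{LON}> ?lon }}"


if __name__ == "__main__":
    wanted = [f"http://example.com#myresource{i}" for i in range(123, 126)]
    for q in naive_queries(wanted):
        print(q)
```

With 1000 selected resources this yields 1000 separate queries, which is what
makes the approach slow.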
Another approach would be to simply query *all* resources in a single query
and discard the unwanted solutions. This is typically about as fast as the
naive approach above, and clearly doesn't scale well!
What I am currently trying is a query that lists the resources of interest,
using a filter and the IN function, e.g.
  SELECT ?res ?lat ?lon {
    ?res <http://example.com#lat> ?lat ;
         <http://example.com#lon> ?lon .
    filter( ?res IN (<http://example.com#myresource123>,
                     <http://example.com#myresource124>, ...) )
  }
This works well when the list passed to IN() is short (a few tens of
resources), giving something like a 10-fold speed increase over the
approaches above. However, performance drops off rapidly as the list grows
into the hundreds or thousands.
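Building the IN() query from a runtime list of IRIs can be sketched as below.
`in_filter_query` is a hypothetical helper, not part of any library; it just
reproduces the query shape shown above as a string.

```python
# Property IRIs taken from the example query above.
LAT = "http://example.com#lat"
LON = "http://example.com#lon"


def in_filter_query(resources: list[str]) -> str:
    """Build one SPARQL query that restricts ?res with FILTER(?res IN (...))."""
    if not resources:
        # An empty IN () list would simply match nothing.
        raise ValueError("need at least one resource IRI")
    iri_list = ", ".join(f"<{iri}>" for iri in resources)
    return (
        "SELECT ?res ?lat ?lon { "
        f"?res <{LAT}> ?lat ; <{LON}> ?lon . "
        f"filter( ?res IN ({iri_list}) ) }}"
    )


if __name__ == "__main__":
    print(in_filter_query(["http://example.com#myresource123",
                           "http://example.com#myresource124"]))
```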
I have implemented a form of batching: for example, instead of one query
listing 1000 resources, I run 20 queries with 50 resources each in the list.
This helps greatly, though it adds complexity.
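The batching described above might look roughly like this minimal sketch:
split the runtime list into fixed-size chunks and build one IN() query per
chunk. `chunks` and `batched_in_queries` are hypothetical names, and merging
the per-batch results on the client side is left out.

```python
from typing import Iterator

# Property IRIs taken from the example query above.
LAT = "http://example.com#lat"
LON = "http://example.com#lon"


def chunks(items: list, size: int) -> Iterator[list]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("batch size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batched_in_queries(resources: list[str], size: int = 50) -> list[str]:
    """Build one FILTER(?res IN (...)) query per batch of resource IRIs."""
    queries = []
    for batch in chunks(resources, size):
        iri_list = ", ".join(f"<{iri}>" for iri in batch)
        queries.append(
            "SELECT ?res ?lat ?lon { "
            f"?res <{LAT}> ?lat ; <{LON}> ?lon . "
            f"filter( ?res IN ({iri_list}) ) }}"
        )
    return queries


if __name__ == "__main__":
    wanted = [f"http://example.com#myresource{i}" for i in range(1000)]
    queries = batched_in_queries(wanted, size=50)
    print(len(queries), "queries")
```

The batch size is the single tuning knob here; 50 is simply the figure from
the example above, not a recommended value.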
*So my question (finally) is:*
Is the performance of the IN() function O(n) in the length of the list, and
if so, can anything easily be done to improve it?
And is there a better way to do this kind of query?
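To make the O(n) question concrete, here is a toy cost model. It does not
measure any SPARQL engine. It assumes, hypothetically, that the filter is
evaluated for every resource that has a lat/lon (10,000 candidates) and
compares a linear scan of the IN() list against a hash-set lookup. Which
strategy a real store uses is exactly the open question.

```python
def linear_in(candidates: list[str], wanted: list[str]) -> tuple[int, int]:
    """Scan `wanted` for each candidate; return (matches, equality comparisons)."""
    matches = comparisons = 0
    for c in candidates:
        for w in wanted:
            comparisons += 1
            if c == w:
                matches += 1
                break
    return matches, comparisons


def hashed_in(candidates: list[str], wanted: list[str]) -> tuple[int, int]:
    """Build a hash set of `wanted` once; return (matches, set lookups)."""
    wanted_set = set(wanted)
    matches = sum(1 for c in candidates if c in wanted_set)
    return matches, len(candidates)


if __name__ == "__main__":
    candidates = [f"r{i}" for i in range(10_000)]      # every resource with lat/lon
    wanted = [f"r{i}" for i in range(0, 10_000, 10)]   # 1000 selected resources
    print("linear scan:", linear_in(candidates, wanted))
    print("hash set:   ", hashed_in(candidates, wanted))
```

Under the linear model the work grows with candidates × list length (millions
of comparisons here), while the hashed model needs one lookup per candidate.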
Thanks,
David.