On 28/03/11 08:49, Paolo Castagna wrote:
Hi,
I sometimes see queries like this one:

SELECT *
WHERE {
...
?event :start ?start .
?event :end ?end .
FILTER (
(?start > "2010-03-27T00:00:00Z"^^xsd:dateTime) &&
(?end < "2010-03-28T00:00:00Z"^^xsd:dateTime)
)
}

The FILTER is scanning the indexes and the node table to find the values
which satisfy the filter expression.

With TDB, if you have a large dataset, this query can be slow.

TDB already encodes certain node values (including DateTime) inline in
the node ids and the good news is that the encoding scheme preserves the
order.
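
To illustrate why an order-preserving inline encoding matters, here is a minimal sketch. The bit layout, the tag value and the class name InlineDateTimeSketch are all assumptions made for this example and are not TDB's actual NodeId layout. The point is only that, when the packed form preserves order, comparing two ids as plain longs gives the same answer as comparing the dateTime values.

```java
import java.time.Instant;

/** Hypothetical sketch of an order-preserving inline encoding; not TDB's real layout. */
final class InlineDateTimeSketch {
    // Assumed layout: top 8 bits hold a type tag, low 56 bits hold the value.
    static final int VALUE_BITS = 56;
    static final long VALUE_MASK = (1L << VALUE_BITS) - 1;
    static final long TAG_DATETIME = 0x02L; // hypothetical tag, keeps the id positive

    /** Packs epoch milliseconds under the tag; rejects values outside the inline range. */
    static long pack(Instant t) {
        long millis = t.toEpochMilli();
        if (millis < 0 || millis > VALUE_MASK)
            throw new IllegalArgumentException("out of inline range: " + t);
        return (TAG_DATETIME << VALUE_BITS) | millis;
    }

    /** Rebuilds the value from the id bits alone, with no table lookup. */
    static Instant unpack(long id) {
        return Instant.ofEpochMilli(id & VALUE_MASK);
    }

    public static void main(String[] args) {
        long a = pack(Instant.parse("2010-03-27T00:00:00Z"));
        long b = pack(Instant.parse("2010-03-28T00:00:00Z"));
        System.out.println(a < b);      // id order follows value order
        System.out.println(unpack(a));  // value recovered from the id
    }
}
```

With a layout like this, a comparison such as ?start > "2010-03-27T00:00:00Z"^^xsd:dateTime could in principle be evaluated on the ids themselves.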

See NodeId's inline(Node node) method:
https://jena.svn.sourceforge.net/svnroot/jena/TDB/trunk/src/main/java/com/hp/hpl/jena/tdb/store/NodeId.java

And, DateTimeNode, for example:
https://jena.svn.sourceforge.net/svnroot/jena/TDB/trunk/src/main/java/com/hp/hpl/jena/tdb/store/DateTimeNode.java


However, I am not sure whether these inline node ids are currently used
at query time. Am I right?

The dateTime values are rebuilt directly from the bits stored in the NodeId. There isn't an entry in the node table for a value that is directly encoded. It uses unpackDateTime.

The custom "NumberUtils.formatInt" stuff is there because it's appreciably faster than the standard Java operations for parsing integers, which are locale-sensitive; that isn't needed here.
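
As a rough sketch of the general idea (not the NumberUtils code itself), a fixed-width integer can be written straight into a StringBuilder with plain digit arithmetic, avoiding locale-aware formatters such as String.format. The class PaddedIntSketch and its methods appendPadded and formatDate are hypothetical names invented for this example.

```java
/** Hypothetical sketch of locale-free, fixed-width integer formatting. */
final class PaddedIntSketch {
    /** Appends value as exactly `width` ASCII digits, zero-padded on the left. */
    static void appendPadded(StringBuilder sb, int value, int width) {
        if (value < 0)
            throw new IllegalArgumentException("negative value: " + value);
        char[] digits = new char[width];
        // Fill from the least significant digit backwards.
        for (int i = width - 1; i >= 0; i--) {
            digits[i] = (char) ('0' + value % 10);
            value /= 10;
        }
        if (value != 0)
            throw new IllegalArgumentException("value does not fit in " + width + " digits");
        sb.append(digits);
    }

    /** Builds the date part of an xsd:dateTime lexical form, e.g. for year 2010, month 3, day 27. */
    static String formatDate(int year, int month, int day) {
        StringBuilder sb = new StringBuilder(10);
        appendPadded(sb, year, 4);
        sb.append('-');
        appendPadded(sb, month, 2);
        sb.append('-');
        appendPadded(sb, day, 2);
        return sb.toString();
    }
}
```

Whether this is measurably faster than the standard library in a given JVM would need benchmarking; the sketch only shows that no locale lookup is involved.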

Inlining into NodeIds is added by NodeTableInline, which calls NodeId.inline(node).

This is probably not a trivial change, but one worth aiming at.

What could be done is to add better range index scans that start at a particular value, rather than looking for "any" in that slot. However, the biggest benefit is not hitting the NodeTable at all, and that is already there.
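
A small sketch of the difference between the two access patterns, using a java.util.TreeMap as a stand-in for a sorted index keyed on the value slot. This is an assumption for illustration only: TDB's indexes are not TreeMaps, and RangeScanSketch, fullScan and rangeScan are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: filter-after-scan versus a scan that starts at a value. */
final class RangeScanSketch {
    /** Visits every entry ("any" in the value slot) and filters afterwards. */
    static List<String> fullScan(TreeMap<Long, String> index, long lo, long hi) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : index.entrySet()) {
            if (e.getKey() > lo && e.getKey() < hi)
                out.add(e.getValue());
        }
        return out;
    }

    /** Seeks to lo and stops at hi, touching only entries inside the open range. */
    static List<String> rangeScan(TreeMap<Long, String> index, long lo, long hi) {
        // subMap requires lo <= hi; both bounds are exclusive to match > and <.
        return new ArrayList<>(index.subMap(lo, false, hi, false).values());
    }

    public static void main(String[] args) {
        TreeMap<Long, String> index = new TreeMap<>();
        index.put(10L, "event1");
        index.put(20L, "event2");
        index.put(30L, "event3");
        index.put(40L, "event4");
        System.out.println(fullScan(index, 10L, 40L));
        System.out.println(rangeScan(index, 10L, 40L));
    }
}
```

Both methods return the same entries; the range version simply avoids walking the parts of the index that cannot match.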

In theory, it should speed up this kind of FILTER expression, and TDB
would be able to answer certain queries without touching the node table
or scanning a large portion of your data just to find a few values.

Another very similar use case is with queries involving locations (e.g.
latitude and longitude). Sometimes you want to find things within a
bounding box, so you have a similar range expression for latitude values
and one for longitude values.

Is it worth opening a JIRA issue (i.e. a feature request) for this?

Paolo

        Andy
