On 28/03/11 08:49, Paolo Castagna wrote:
Hi,
I sometimes see queries like this one:
SELECT *
WHERE {
...
?event :start ?start .
?event :end ?end .
FILTER (
(?start > "2010-03-27T00:00:00Z"^^xsd:dateTime) &&
(?end < "2010-03-28T00:00:00Z"^^xsd:dateTime)
)
}
The FILTER is scanning the indexes and the node table to find the values
which satisfy the filter expression.
With TDB, if you have a large dataset, this query can be slow.
TDB already encodes certain node values (including DateTime) inline in
the node ids and the good news is that the encoding scheme preserves the
order.
See NodeId's inline(Node node) method:
https://jena.svn.sourceforge.net/svnroot/jena/TDB/trunk/src/main/java/com/hp/hpl/jena/tdb/store/NodeId.java
And, DateTimeNode, for example:
https://jena.svn.sourceforge.net/svnroot/jena/TDB/trunk/src/main/java/com/hp/hpl/jena/tdb/store/DateTimeNode.java
However, I am not sure whether these inline node ids are currently used
at query time. Am I right?
The dateTime values are rebuilt directly from the bits stored in the
NodeId. There isn't an entry in the node table for values that are
encoded directly. It uses unpackDateTime.
The custom "NumberUtils.formatInt" stuff is there because it's appreciably
faster than the standard Java operations for parsing integers, which are
locale-sensitive; that isn't needed here.
Inlining into NodeIds is added by NodeTableInline, which calls
NodeId.inline(node).
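
To make the idea of an order-preserving inline encoding concrete, here is a minimal sketch in Java. It is not TDB's actual bit layout: the class name InlineDateTimeSketch, the 8-bit tag / 56-bit value split, the tag value and the use of epoch milliseconds are all assumptions chosen for illustration. It only shows that a value can be packed into a 64-bit id, rebuilt from the bits without any table lookup, and compared as a plain long.

```java
import java.time.Instant;

/** Hypothetical sketch; NOT the real NodeId/DateTimeNode layout used by TDB. */
final class InlineDateTimeSketch {
    // Assumed layout: top 8 bits hold a type tag, low 56 bits hold the value.
    static final int VALUE_BITS = 56;
    static final long VALUE_MASK = (1L << VALUE_BITS) - 1;
    static final long TAG_DATETIME = 2L; // illustrative tag value, an assumption

    /** Packs a UTC instant (millisecond precision, at or after the epoch) into a tagged id. */
    static long pack(Instant t) {
        long millis = t.toEpochMilli();
        if (millis < 0 || millis > VALUE_MASK) {
            // A real store would fall back to the node table for such values.
            throw new IllegalArgumentException("value does not fit inline: " + t);
        }
        return (TAG_DATETIME << VALUE_BITS) | millis;
    }

    /** Rebuilds the instant from the bits alone; no lookup is involved. */
    static Instant unpack(long id) {
        if ((id >>> VALUE_BITS) != TAG_DATETIME) {
            throw new IllegalArgumentException("not an inline dateTime id");
        }
        return Instant.ofEpochMilli(id & VALUE_MASK);
    }

    public static void main(String[] args) {
        long a = pack(Instant.parse("2010-03-27T00:00:00Z"));
        long b = pack(Instant.parse("2010-03-28T00:00:00Z"));
        // Same tag, larger value in the low bits, so id order follows time order.
        System.out.println(a < b);
        System.out.println(unpack(a));
    }
}
```

Because both ids carry the same tag in the high bits, comparing them as longs gives the same answer as comparing the instants, which is the property a range scan would rely on.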
This is probably not a trivial change, but one worth aiming at.
What could be done is to add better range index scans that start at a
particular value rather than looking for "any" in that slot. However, the
biggest benefit is not hitting the NodeTable at all, and that is already there.
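
As a rough illustration of what a range scan buys over looking for "any" and filtering afterwards, here is a small Java sketch. It does not use TDB's index classes: RangeScanSketch and its methods are hypothetical names, and a java.util.TreeMap stands in for an index ordered by object value. The point is only that the scan seeks to the lower bound and stops at the upper bound, so entries outside the range are never visited.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical stand-in for a value-ordered index; not a TDB class. */
final class RangeScanSketch {
    // Key: value as epoch milliseconds (sorted). Value: subjects carrying that value.
    private final NavigableMap<Long, List<String>> byValue = new TreeMap<>();

    void add(String subject, Instant value) {
        byValue.computeIfAbsent(value.toEpochMilli(), k -> new ArrayList<>()).add(subject);
    }

    /** Subjects whose value v satisfies lo < v < hi; requires lo <= hi. */
    List<String> between(Instant lo, Instant hi) {
        List<String> out = new ArrayList<>();
        // subMap seeks to lo and ends at hi; both bounds are exclusive here.
        for (List<String> subjects
                : byValue.subMap(lo.toEpochMilli(), false, hi.toEpochMilli(), false).values()) {
            out.addAll(subjects);
        }
        return out;
    }

    public static void main(String[] args) {
        RangeScanSketch idx = new RangeScanSketch();
        idx.add("e1", Instant.parse("2010-03-26T12:00:00Z"));
        idx.add("e2", Instant.parse("2010-03-27T09:30:00Z"));
        idx.add("e3", Instant.parse("2010-03-27T18:00:00Z"));
        idx.add("e4", Instant.parse("2010-03-28T00:00:00Z"));
        System.out.println(idx.between(
                Instant.parse("2010-03-27T00:00:00Z"),
                Instant.parse("2010-03-28T00:00:00Z"))); // prints [e2, e3]
    }
}
```

The sketch covers a single property only; the query above constrains two different properties (:start and :end), so a real plan would still need to combine one range scan with a join or a residual filter.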
In theory, it should speed up this kind of FILTER expression, and TDB
would be able to answer certain queries without touching the node table
or scanning a large portion of your data just to find a few values.
Another very similar use case is with queries involving locations (e.g.
latitude and longitude). Sometimes you want to find things within a
bounding box, so you have one such range expression for latitude values
and another for longitude values.
Is it worth opening a JIRA issue (i.e. a feature request) for this?
Paolo
Andy