Hi Stephen,
yes, what you are proposing is what Andy said to me in another (now old)
thread which I've just found: http://markmail.org/thread/czopj5de3w62aacn.
I started to look at how I could implement this, beginning with the
OpExecutorTDB.executeBGP(...) method, which uses TransformFilterPlacement (which
is "interesting" :-)).
Now, somewhere in TransformFilterPlacement we need to spot these two or
three patterns (is this the best place to do that, or should it be a
transformation before that?):
?s <p> ?o .
FILTER ( ?o < "..." )
?s <p> ?o .
FILTER ( ?o > "..." )
?s <p> ?o .
FILTER ( ( ?o > "..." ) && ( ?o < "..." ) )
Check if the values are "small" enough to be encoded inline into a node id.
If so, replace the ?s <p> ?o triple pattern with ?s <p> [ start, end ].
Somehow, I need a BasicPattern with the notion of an "interval" instead of a
fixed value. Is there something already available to do that?
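
To make the rewrite above concrete, here is a minimal sketch in Python rather than against the real ARQ/TDB classes. Everything in it is hypothetical: the Interval type, the tuple representation of triples and filters, and the fold_range_filters function are illustration only and do not exist in Jena. It also assumes the constants compare correctly as plain strings (true for xsd:dateTime lexical forms that share a timezone, not in general). The interval keeps the original variable so that results can still bind it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Hypothetical object slot: variable `var` restricted to (low, high).

    Both bounds are exclusive; None means unbounded on that side.
    """
    var: str
    low: Optional[str] = None
    high: Optional[str] = None


def fold_range_filters(triple, filters):
    """Fold strict '<' / '>' filters on the object variable into the triple.

    triple  -- (subject, predicate, object); variables start with '?'
    filters -- list of (variable, operator, constant) comparisons
    Returns (new_triple, remaining_filters). If nothing can be folded,
    the inputs are returned unchanged.
    """
    s, p, o = triple
    if not (isinstance(o, str) and o.startswith("?")):
        return triple, filters

    low = high = None
    remaining = []
    for var, op, const in filters:
        if var == o and op == ">":
            # Several lower bounds: keep the tightest one.
            low = const if low is None else max(low, const)
        elif var == o and op == "<":
            # Several upper bounds: keep the tightest one.
            high = const if high is None else min(high, const)
        else:
            remaining.append((var, op, const))

    if low is None and high is None:
        return triple, filters
    return (s, p, Interval(o, low, high)), remaining
```

With the third pattern above, the two comparisons collapse into a single Interval("?o", start, end) and no filter is left over; with only one comparison, one side of the interval stays None.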
Also, this has an impact on QueryIterTriplePattern, doesn't it?
So, the conclusion so far is: it seems easy in theory, but in practice it is
not a small, isolated change/optimisation, is it?
A two or three paragraph description of how this could be implemented would
help me a lot, and if there is agreement that this is a very common pattern
which could benefit from this optimization, perhaps we can create a new JIRA
issue for it.
Paolo
Stephen Allen wrote:
I think you can exploit the fact that dates are inlined in order in the
indexes. You would transform the filter into a graph pattern operator that
is able to do a range index scan. In this particular example, it would
probably use the POS index [1] to retrieve only the required triples without
having to touch any unrelated triples during the scan.
select *
{
?s <http://purl.org/dc/elements/1.1/date>
("2011-03-03T00:00:00Z"^^xsd:dateTime < ?date <
"2011-06-06T00:00:00Z"^^xsd:dateTime) .
}
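
As a rough illustration of the range scan described above, here is a small Python sketch. It is not TDB code: the POS index is modelled as a sorted list of (p, o, s) tuples of integer node ids, and the function name pos_range_scan is made up for this example. It assumes, as the message does, that inlined values sort in the same order as the values they encode.

```python
from bisect import bisect_left, bisect_right


def pos_range_scan(pos_index, predicate, low, high):
    """Yield (s, p, o) for triples with `predicate` and low < o < high.

    pos_index -- list of (p, o, s) integer tuples, sorted ascending
    low, high -- exclusive bounds on the (inlined) object id
    """
    # Skip every entry (predicate, low, *): infinity sorts after any
    # subject id, so bisect_right lands on the first object > low.
    start = bisect_right(pos_index, (predicate, low, float("inf")))
    # A 2-tuple sorts before any 3-tuple sharing its prefix, so
    # bisect_left stops at the first entry (predicate, high, *).
    end = bisect_left(pos_index, (predicate, high))
    for p, o, s in pos_index[start:end]:
        yield (s, p, o)


# Toy index: predicate 1 has objects 10, 20, 30, 40; predicate 2 has 25.
index = sorted([(1, 10, 100), (1, 20, 101), (1, 30, 102),
                (1, 40, 103), (2, 25, 104)])
matches = list(pos_range_scan(index, 1, 10, 40))
```

Only the slice between the two binary-search positions is visited, so triples with other predicates or out-of-range objects are never touched.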
This should also work for inequalities on other value types, with some
trickiness if you allow non-inlined values in your system (i.e. integers
greater than 56-bits).
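
The inlining caveat can be sketched as a guard that decides whether the range scan is safe to use at all. This is a hedged illustration, not TDB's actual encoding: the 56-bit figure comes from the message above, while the signed two's-complement layout and the names fits_inline and can_use_range_scan are assumptions made for the example.

```python
INLINE_VALUE_BITS = 56  # figure from the message; the signed layout is assumed


def fits_inline(value, bits=INLINE_VALUE_BITS):
    """True if the integer fits in `bits` bits as a signed value."""
    limit = 1 << (bits - 1)
    return -limit <= value < limit


def can_use_range_scan(low, high):
    """Use the range scan only when both bounds can be inlined.

    Otherwise the caller would fall back to the ordinary FILTER.
    """
    return fits_inline(low) and fits_inline(high)
```

This guard only covers the bounds; stored values that are not inlined would still need separate handling, which is the trickiness mentioned above.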
-Stephen
[1] Of course this is making an assumption that there are fewer statements
with dates that match the filter than subjects with that predicate. It
would be up to a cost based optimizer to decide if the PSO index was more
selective in this case.
-----Original Message-----
From: Paolo Castagna [mailto:[email protected]]
Sent: Wednesday, October 19, 2011 4:38 PM
To: [email protected]
Subject: On SPARQL queries with FILTER ( ?date < "..."^^xsd:dateTime )
Hi,
a query pattern I often see is filtering by some xsd:dateTime interval, for
example:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT * {
?s <http://purl.org/dc/elements/1.1/date> ?date .
FILTER ( ( ?date > "2011-03-03T00:00:00Z"^^xsd:dateTime ) &&
( ?date < "2011-06-06T00:00:00Z"^^xsd:dateTime ) )
}
Even with moderate size stores this query can take quite a while to execute.
I'd like to know if there is something I could do to speed up this kind of
query.
I understand that the xsd:dateTime value is encoded by the NodeTableInline.
However, I am not sure this is exploited at query time, and I'd like to
understand if there is something we could do to further improve the
performance of queries similar to the one above.
Thanks,
Paolo