[ https://issues.apache.org/jira/browse/JENA-144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paolo Castagna updated JENA-144:
--------------------------------
Summary: An optimisation for queries with FILTER ((?date >
"..."^^xsd:dateTime) && (?date < "..."^^xsd:dateTime)) (was: An optimimsation
for queries with FILTER ((?date > "..."^^xsd:dateTime) && (?date <
"..."^^xsd:dateTime)) )
> An optimisation for queries with FILTER ((?date > "..."^^xsd:dateTime) &&
> (?date < "..."^^xsd:dateTime))
> ---------------------------------------------------------------------------------------------------------
>
> Key: JENA-144
> URL: https://issues.apache.org/jira/browse/JENA-144
> Project: Jena
> Issue Type: Improvement
> Components: TDB
> Reporter: Paolo Castagna
> Labels: optimization, performance
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> When TDB indexes literal values, it encodes the value directly into the
> NodeId whenever possible.
> See the NodeId.inline(Node node) method:
> http://svn.apache.org/repos/asf/incubator/jena/Jena2/TDB/trunk/src/main/java/com/hp/hpl/jena/tdb/store/NodeId.java
> At query time, values encoded this way have no entry in the node table, so
> they can be decoded without any node table lookup.
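To illustrate the idea (the real logic lives in NodeId.inline), here is a minimal sketch assuming a hypothetical 64-bit layout: a type tag in the top byte and the value in the low 56 bits. The class name, tag values and bit widths are invented for illustration and are not TDB's actual constants. The point it shows is that an inline id can be recognised and decoded from its bits alone, with no node table access.

```java
import java.util.OptionalLong;

class InlineNodeIdSketch {
    // Hypothetical type tags held in the top byte of a 64-bit id.
    // Tag 0 means "offset into the node table". NOT TDB's real values.
    static final int TAG_NODE_TABLE = 0;
    static final int TAG_INTEGER = 1;

    static final int VALUE_BITS = 56;
    static final long VALUE_MASK = (1L << VALUE_BITS) - 1;
    static final long MIN_INLINE = -(1L << (VALUE_BITS - 1));
    static final long MAX_INLINE = (1L << (VALUE_BITS - 1)) - 1;

    /** Encodes an integer inline if it fits in 56 signed bits, else returns empty. */
    static OptionalLong inlineInteger(long value) {
        if (value < MIN_INLINE || value > MAX_INLINE) {
            return OptionalLong.empty(); // too large: would go into the node table instead
        }
        return OptionalLong.of(((long) TAG_INTEGER << VALUE_BITS) | (value & VALUE_MASK));
    }

    static int tagOf(long id) {
        return (int) (id >>> VALUE_BITS);
    }

    static boolean isInline(long id) {
        return tagOf(id) != TAG_NODE_TABLE;
    }

    /** Decodes an inline integer straight from the id: no node table lookup. */
    static long decodeInteger(long id) {
        if (tagOf(id) != TAG_INTEGER) {
            throw new IllegalArgumentException("not an inline integer");
        }
        // Shift the tag out, then arithmetic-shift back to sign-extend 56 bits.
        return (id << (64 - VALUE_BITS)) >> (64 - VALUE_BITS);
    }

    public static void main(String[] args) {
        long id = inlineInteger(-42).getAsLong();
        System.out.println(isInline(id) + " " + decodeInteger(id)); // true -42
        System.out.println(inlineInteger(Long.MAX_VALUE).isPresent()); // false
    }
}
```

Note that this two's-complement payload does not sort negative integers below positive ones when ids are compared as raw longs; a range scan over ids, like the one this issue proposes, would need an order-preserving (for example, biased) encoding.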
> Let's consider this query pattern:
> ?s <http://purl.org/dc/elements/1.1/date> ?date .
> FILTER ( ( ?date > "2011-06-06T00:00:00Z"^^xsd:dateTime ) &&
> ( ?date < "2011-06-07T00:00:00Z"^^xsd:dateTime ) )
> In this case the POS index will be used, doing a partial scan with a fixed P:
> [(P,0,0), (P+1,0,0)), where P is the NodeId corresponding to the property used
> in the BGP (i.e. <http://purl.org/dc/elements/1.1/date> in the example above).
> However, if many subjects have a date, the filter expression must be
> evaluated for every date value. Even though those values come straight out of
> the POS index rather than from the node table, this can take a while.
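A minimal sketch of the current behaviour, assuming a hypothetical in-memory POS index (a TreeSet of {P, O, S} long arrays standing in for TDB's on-disk index) and object ids that sort in value order. The class and method names are invented. It shows that every triple with the fixed predicate is visited and the FILTER runs on each one, even when only a handful match.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.function.LongPredicate;

class FixedPredicateScan {
    // Hypothetical POS index: each key is {P, O, S} as raw 64-bit NodeIds,
    // ordered lexicographically (P first, then O, then S).
    static final Comparator<long[]> POS_ORDER = Arrays::compare;

    /**
     * Scans [(p,0,0), (p+1,0,0)), i.e. every triple with predicate p, and
     * evaluates the FILTER on each object. Returns {rows scanned, rows matched}.
     */
    static long[] scanAndFilter(NavigableSet<long[]> pos, long p, LongPredicate filter) {
        long scanned = 0;
        long matched = 0;
        NavigableSet<long[]> range =
            pos.subSet(new long[] {p, 0, 0}, true, new long[] {p + 1, 0, 0}, false);
        for (long[] key : range) {
            scanned++;                 // every value for this property is visited...
            if (filter.test(key[1])) { // ...and the FILTER runs on each one
                matched++;
            }
        }
        return new long[] {scanned, matched};
    }

    public static void main(String[] args) {
        NavigableSet<long[]> pos = new TreeSet<>(POS_ORDER);
        long p = 7; // hypothetical NodeId of <http://purl.org/dc/elements/1.1/date>
        for (long s = 1; s <= 1000; s++) {
            // Hypothetical: object ids stand in for order-preserving inline dates.
            pos.add(new long[] {p, 100 + s, s});
        }
        pos.add(new long[] {p + 1, 5, 1}); // a different property, outside the range
        long[] r = scanAndFilter(pos, p, o -> o > 500 && o < 510);
        System.out.println(Arrays.toString(r)); // [1000, 9]
    }
}
```

Here 1000 keys are read to produce 9 results; the cost grows with the number of dated subjects, not with the size of the answer.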
> We could have a better range index scan that starts at a particular value
> (e.g. "2011-06-06T00:00:00Z"^^xsd:dateTime in the example above). The range
> index scan could be [(P,D1,0), (P,D2,0)), where D1 and D2 are the NodeIds
> corresponding to the values specified in the FILTER expression.
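A minimal sketch of the proposed scan, reusing the same hypothetical in-memory POS index. It assumes that inline dateTime NodeIds sort in value order, which it fakes with an invented biased-epoch-millis encoding (not TDB's layout), and that every value carries a timezone. The names Op, Cmp, encodeDateTime and rangeScan are hypothetical. Each FILTER conjunct tightens one end of the key range. The issue's half-open range [(P,D1,0), (P,D2,0)) corresponds to D1 <= ?date < D2; for the strict > in the example, the sketch starts just after the last key whose object is D1.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class DateRangeScan {
    // Hypothetical order-preserving inline encoding for xsd:dateTime (NOT TDB's
    // real layout): top byte = type tag, low 56 bits = epoch millis plus a bias
    // that keeps them non-negative, so comparing ids compares instants.
    static final long TAG_DATETIME = 4;
    static final long BIAS = 1L << 55;
    static final Comparator<long[]> POS_ORDER = Arrays::compare;

    static long encodeDateTime(String lexical) {
        long millis = Instant.parse(lexical).toEpochMilli(); // needs a timezone, e.g. "Z"
        return (TAG_DATETIME << 56) | (millis + BIAS);
    }

    enum Op { GT, GE, LT, LE }

    /** One conjunct of the FILTER, e.g. ?date > "..."^^xsd:dateTime. */
    static final class Cmp {
        final Op op;
        final long id;
        Cmp(Op op, String lexical) {
            this.op = op;
            this.id = encodeDateTime(lexical);
        }
    }

    /** Scans only POS keys for property p whose object satisfies every conjunct. */
    static List<long[]> rangeScan(NavigableSet<long[]> pos, long p, List<Cmp> conjuncts) {
        // Start from the whole-property range and tighten it with each conjunct.
        long[] from = {p, Long.MIN_VALUE, Long.MIN_VALUE};
        long[] to = {p, Long.MAX_VALUE, Long.MAX_VALUE};
        boolean fromIncl = true;
        boolean toIncl = true;
        for (Cmp c : conjuncts) {
            boolean lower = c.op == Op.GT || c.op == Op.GE;
            boolean incl = c.op == Op.GE || c.op == Op.LE;
            // ">= D" and "< D" bound before every subject of D; "> D" and "<= D" after them.
            long s = (lower == incl) ? Long.MIN_VALUE : Long.MAX_VALUE;
            long[] k = {p, c.id, s};
            if (lower && POS_ORDER.compare(k, from) > 0) {
                from = k;
                fromIncl = incl;
            } else if (!lower && POS_ORDER.compare(k, to) < 0) {
                to = k;
                toIncl = incl;
            }
        }
        if (POS_ORDER.compare(from, to) > 0) {
            return new ArrayList<>(); // contradictory FILTER: nothing to scan
        }
        // With both bounds on dateTime ids, no other datatype's ids fall inside.
        return new ArrayList<>(pos.subSet(from, fromIncl, to, toIncl));
    }

    public static void main(String[] args) {
        NavigableSet<long[]> pos = new TreeSet<>(POS_ORDER);
        long p = 7; // hypothetical NodeId of <http://purl.org/dc/elements/1.1/date>
        String[] dates = {"2011-06-05T12:00:00Z", "2011-06-06T00:00:00Z",
                          "2011-06-06T09:30:00Z", "2011-06-06T23:59:59Z", "2011-06-07T00:00:00Z"};
        for (int i = 0; i < dates.length; i++) {
            pos.add(new long[] {p, encodeDateTime(dates[i]), 100 + i});
        }
        List<Cmp> filter = List.of(new Cmp(Op.GT, "2011-06-06T00:00:00Z"),
                                   new Cmp(Op.LT, "2011-06-07T00:00:00Z"));
        for (long[] key : rangeScan(pos, p, filter)) {
            System.out.println(key[2]); // subjects 102 and 103
        }
    }
}
```

Only keys inside the range are read, so the work is proportional to the number of matching dates rather than to all dates for the property.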
> It is also not clear how the optimizer could decide whether this will be more
> selective than other triple patterns.
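One possible heuristic, offered only as an assumption and not as anything TDB is known to do, is the classic uniform-distribution estimate: keep a triple count plus the minimum and maximum value per property, and assume values are spread evenly between them. The PropertyStats class and all numbers below are hypothetical. The optimizer could then compare the estimated size of the date range with the estimates for the other triple patterns.

```java
class RangeSelectivity {
    /** Hypothetical per-property statistics; values are epoch millis of the dates. */
    static final class PropertyStats {
        final long tripleCount;
        final long minValue;
        final long maxValue;
        PropertyStats(long tripleCount, long minValue, long maxValue) {
            this.tripleCount = tripleCount;
            this.minValue = minValue;
            this.maxValue = maxValue;
        }
    }

    /**
     * Estimates how many triples have a value in [lo, hi), assuming values are
     * spread uniformly between minValue and maxValue.
     */
    static long estimateRange(PropertyStats st, long lo, long hi) {
        if (st.maxValue == st.minValue) { // every triple has the same value
            return (lo <= st.minValue && st.minValue < hi) ? st.tripleCount : 0;
        }
        double clampedLo = Math.max(lo, st.minValue);
        double clampedHi = Math.min(hi, st.maxValue);
        if (clampedHi <= clampedLo) {
            return 0; // the FILTER range misses the data entirely
        }
        double fraction = (clampedHi - clampedLo) / ((double) st.maxValue - st.minValue);
        return Math.round(st.tripleCount * fraction);
    }

    public static void main(String[] args) {
        long day = 86_400_000L;
        // Hypothetical numbers: 1,000,000 dates spread over 1000 days.
        PropertyStats dates = new PropertyStats(1_000_000, 0, 1000 * day);
        long rangeEstimate = estimateRange(dates, 10 * day, 11 * day); // about 1000
        long otherEstimate = 50_000; // hypothetical estimate for another triple pattern
        System.out.println(rangeEstimate + " -> "
            + (rangeEstimate < otherEstimate ? "range scan first" : "other pattern first"));
    }
}
```

Skewed data (for example, most dates clustered in one week) would make this estimate badly wrong, which is one reason the choice is not obvious; a histogram per property would be the usual refinement.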
> See a couple of threads on the jena-dev and jena-users mailing lists related
> to this:
> - http://markmail.org/thread/czopj5de3w62aacn
> - http://markmail.org/thread/pfwl6ukbpqfw23r6
> (Or, maybe, this sort of optimisation is too specific, overly complicated...
> and a caching layer would solve this and many other performance related
> issues! ;-))
--
This message is automatically generated by JIRA.