[ 
https://issues.apache.org/jira/browse/JENA-144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paolo Castagna updated JENA-144:
--------------------------------

    Summary: An optimisation for queries with FILTER ((?date > 
"..."^^xsd:dateTime) && (?date < "..."^^xsd:dateTime))   (was: An optimimsation 
for queries with FILTER ((?date > "..."^^xsd:dateTime) && (?date < 
"..."^^xsd:dateTime)) )
    
> An optimisation for queries with FILTER ((?date > "..."^^xsd:dateTime) && 
> (?date < "..."^^xsd:dateTime)) 
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: JENA-144
>                 URL: https://issues.apache.org/jira/browse/JENA-144
>             Project: Jena
>          Issue Type: Improvement
>          Components: TDB
>            Reporter: Paolo Castagna
>              Labels: optimization, performance
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> When TDB indexes literal values, if possible, it encodes the literal value 
> directly into the NodeId. 
> See NodeId.inline(Node node) method:
> http://svn.apache.org/repos/asf/incubator/jena/Jena2/TDB/trunk/src/main/java/com/hp/hpl/jena/tdb/store/NodeId.java
> At query time, since there isn't an entry in the node table for values 
> encoded in this way, there is no need to perform lookups on the node table.
> Let's consider this query pattern:
>     ?s <http://purl.org/dc/elements/1.1/date> ?date .
>     FILTER ( ( ?date > "2011-06-06T00:00:00Z"^^xsd:dateTime ) &&
>              ( ?date < "2011-06-07T00:00:00Z"^^xsd:dateTime ) )
> In this case the POS index will be used, doing a partial scan with a fixed P: 
> [(P,0,0), (P+1,0,0)) where P is the NodeId corresponding to the property used in 
> the BGP (i.e. <http://purl.org/dc/elements/1.1/date> in the example above).
> However, if there are many subjects with a date, the filter expression needs 
> to be evaluated for all the date values. Even if those date values came 
> straight out of the POS index and not from the node table, this can take a 
> while.
> We could have a better range index scan which starts at a particular value 
> (i.e. "2011-06-06T00:00:00Z"^^xsd:dateTime, from the example above). The 
> range index scan could be: [(P,D1,0), (P,D2,0)) where D1 and D2 are the 
> NodeIds corresponding to the values specified in the FILTER expression.
> It is also not clear how the optimizer could decide if this will be more 
> selective than other triple patterns.
> See a couple of threads on jena-dev and jena-users mailing lists related to 
> this:
>  - http://markmail.org/thread/czopj5de3w62aacn
>  - http://markmail.org/thread/pfwl6ukbpqfw23r6
> (Or, maybe, this sort of optimisation is too specific, overly complicated... 
> and a caching layer would solve this and many other performance related 
> issues! ;-))
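
The two scans described in the issue can be compared with a small, self-contained
sketch. This is not TDB code: the POS index is modelled as a sorted set of
{P, O, S} long triples, and inline() is a hypothetical stand-in that maps an
xsd:dateTime to epoch milliseconds. The key assumption is that the inline
encoding preserves value order and that ids are non-negative; the range scan is
only valid if that holds for the real NodeId encoding. Because the FILTER bounds
are strict, the sketch starts the range at D1+1 rather than at D1.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class RangeScanSketch {
    // Hypothetical inline encoding: epoch millis, so id order == date order (assumption).
    static long inline(String dateTime) {
        return Instant.parse(dateTime).toEpochMilli();
    }

    // Toy POS index: entries are {P, O, S}, sorted lexicographically.
    static NavigableSet<long[]> newIndex() {
        return new TreeSet<>((a, b) -> Arrays.compare(a, b));
    }

    // Current behaviour: scan [(P,0,0), (P+1,0,0)) and evaluate the filter on every entry.
    static List<Long> scanThenFilter(NavigableSet<long[]> pos, long p, long d1, long d2) {
        List<Long> subjects = new ArrayList<>();
        for (long[] t : pos.subSet(new long[] {p, 0, 0}, true, new long[] {p + 1, 0, 0}, false)) {
            if (t[1] > d1 && t[1] < d2) {
                subjects.add(t[2]);
            }
        }
        return subjects;
    }

    // Proposed behaviour: scan only the entries whose object lies strictly between d1 and d2.
    static List<Long> rangeScan(NavigableSet<long[]> pos, long p, long d1, long d2) {
        List<Long> subjects = new ArrayList<>();
        for (long[] t : pos.subSet(new long[] {p, d1 + 1, 0}, true, new long[] {p, d2, 0}, false)) {
            subjects.add(t[2]);
        }
        return subjects;
    }

    // Sample data: property 1 plays the role of dc:date, property 2 is unrelated.
    static NavigableSet<long[]> sampleIndex() {
        NavigableSet<long[]> pos = newIndex();
        pos.add(new long[] {1, inline("2011-06-05T12:00:00Z"), 10});
        pos.add(new long[] {1, inline("2011-06-06T00:00:00Z"), 11}); // equals lower bound
        pos.add(new long[] {1, inline("2011-06-06T08:30:00Z"), 12});
        pos.add(new long[] {1, inline("2011-06-06T23:59:59Z"), 13});
        pos.add(new long[] {1, inline("2011-06-07T00:00:00Z"), 14}); // equals upper bound
        pos.add(new long[] {2, 42, 12});
        return pos;
    }

    public static void main(String[] args) {
        NavigableSet<long[]> pos = sampleIndex();
        long d1 = inline("2011-06-06T00:00:00Z");
        long d2 = inline("2011-06-07T00:00:00Z");
        System.out.println(scanThenFilter(pos, 1, d1, d2));
        System.out.println(rangeScan(pos, 1, d1, d2));
    }
}
```

Both methods return the same subjects, but rangeScan only touches entries inside
the date window instead of every entry for the property. The sketch does not
address the open question in the issue, namely how the optimizer would estimate
whether this pattern is more selective than the others in the query.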

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
