[jira] [Commented] (JENA-626) SPARQL Query Caching

ASF GitHub Bot (JIRA) Tue, 19 Apr 2016 13:35:51 -0700

    [ 
https://issues.apache.org/jira/browse/JENA-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248574#comment-15248574
 ]


ASF GitHub Bot commented on JENA-626:
-------------------------------------

Github user afs commented on the pull request:

    https://github.com/apache/jena/pull/95#issuecomment-212116452
  
    An alternative (mentioned before) is to cache the query results before 
serialization and replay an iterator when there is a cache hit.  This could be 
done with an iterator wrapper that copies what it sees. This could also help 
with the fact that there is no control over the results - e.g. if the results 
are very long, it all gets cached.
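
    A minimal sketch of such a copying wrapper, to make the idea concrete. The class name `CopyingIterator`, the `maxRows` parameter and the `completedCopy()` method are all hypothetical, and rows are a generic type `T` rather than any specific Jena class; this is not code from the PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

/** Hypothetical wrapper: passes rows through unchanged while keeping a bounded copy. */
final class CopyingIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final int maxRows;
    // Becomes null once the row limit is exceeded, so nothing oversized is cached.
    private List<T> copy = new ArrayList<>();

    CopyingIterator(Iterator<T> source, int maxRows) {
        this.source = source;
        this.maxRows = maxRows;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        T row = source.next();
        if (copy != null) {
            if (copy.size() < maxRows) {
                copy.add(row);
            } else {
                copy = null; // too many rows: stop copying and drop what was kept
            }
        }
        return row;
    }

    /**
     * The copied rows, but only if the source was read to the end and
     * stayed within the limit. Otherwise there is nothing safe to cache.
     */
    Optional<List<T>> completedCopy() {
        if (copy == null || source.hasNext()) {
            return Optional.empty();
        }
        return Optional.of(Collections.unmodifiableList(new ArrayList<>(copy)));
    }
}
```

    In this sketch the caller would only create a cache entry when `completedCopy()` is present after the response has been written, so a partially consumed or oversized result never reaches the cache.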
    
    
    There are advantages and disadvantages:
    
    1. +ve : If the same query is made with a different "Accept" header, the 
cached results can still be used. When the serialized response is cached, there 
is no cache hit and the query results are cached twice.
    
    1. +ve : The copying iterator can have some policy controls like limiting 
caching results to N,000 rows. For robustness reasons, we probably want some 
limits here so as not to cache the dataset by accident.
    
    1. -ve : The results are serialized each time.  Talking to Jetty at all is 
going to be slower than Varnish and this adds to that. 
    
    1. +ve : A bonus is that LIMIT/OFFSET can be done (a later feature) if the 
full results are computed.  If the query is LIMIT/OFFSET+ORDER (to get stability 
- a common idiom), the exact query isn't being repeated but it is an expensive 
query.  Note this is already optimized by a TopN query so the interactions here 
are complicated but if a repeat results iterator is available, different 
LIMIT/OFFSET can be done.
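
    To illustrate the LIMIT/OFFSET point: if the full, ordered rows are held in the cache, a different page can be cut from them without re-running the query. The sketch below assumes the cached rows are a plain `List`, and that a negative `limit` means "no limit"; the class and method names are hypothetical.

```java
import java.util.List;

/** Hypothetical helper: applies OFFSET and LIMIT to already-cached, ordered rows. */
final class CachedResultSlice {
    private CachedResultSlice() {}

    static <T> List<T> slice(List<T> rows, long offset, long limit) {
        // Clamp the start into [0, rows.size()] so an OFFSET past the end gives no rows.
        int from = (int) Math.min(Math.max(offset, 0L), (long) rows.size());
        // A negative limit is treated as "no limit" (an assumption of this sketch).
        long to = (limit < 0) ? rows.size() : Math.min((long) from + limit, (long) rows.size());
        return rows.subList(from, (int) to);
    }
}
```

    This only gives the same answer as the original query when the cached rows came from the query with its ORDER BY applied and its LIMIT/OFFSET removed, which is also why it interacts with the TopN optimization.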
    
    Questions and specific points on this PR:
    
    1. There is a nasty corner case - for long queries, the client can go away 
while the response is being sent back.  If I read the code right, the cache 
entry has already been created. Has this been tested? (It is hard to test for 
badly behaved clients.)
    1. How can an uninitialized cache entry get into the cache?
    1. SPARQL_Query, SPARQL_Query_Cache. What's the relationship here? There 
seems to be some duplication, log messages come out twice on first query and 
action.endRead() is called twice on first query.



> SPARQL Query Caching
> --------------------
>
>                 Key: JENA-626
>                 URL: https://issues.apache.org/jira/browse/JENA-626
>             Project: Apache Jena
>          Issue Type: Improvement
>            Reporter: Andy Seaborne
>            Assignee: Saikat Maitra
>              Labels: java, linked_data, rdf, sparql
>
> Add a caching layer to Fuseki to cache the results of SPARQL Query requests.  
> This cache should allow for in-memory and disk-based caching, configuration 
> and cache management, and coordination with data modification.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
