[
https://issues.apache.org/jira/browse/JENA-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248574#comment-15248574
]
ASF GitHub Bot commented on JENA-626:
-------------------------------------
Github user afs commented on the pull request:
https://github.com/apache/jena/pull/95#issuecomment-212116452
An alternative (mentioned before) is to cache the query results before
serialization and replay an iterator when there is a cache hit. This could be
done with an iterator wrapper that copies what it sees. It could also help
with the lack of control over the results: for example, if the results are
very long, they all get cached.
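A minimal sketch of such a copying wrapper, to make the idea concrete. The
class name `CopyingIterator`, the `maxRows` parameter and the
`cacheableRows()` method are hypothetical and not taken from the PR or from
Jena; the row type is left generic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

/** Hypothetical wrapper: passes rows through while keeping a bounded copy. */
class CopyingIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final int maxRows;                 // assumed policy knob: max rows to keep
    private List<T> copy = new ArrayList<>();  // set to null once the limit is exceeded

    CopyingIterator(Iterator<T> source, int maxRows) {
        this.source = source;
        this.maxRows = maxRows;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        T row = source.next();
        if (copy != null) {
            if (copy.size() < maxRows) {
                copy.add(row);
            } else {
                copy = null;   // too many rows: stop copying, do not cache
            }
        }
        return row;
    }

    /** The copied rows, present only if the source was fully consumed within the limit. */
    Optional<List<T>> cacheableRows() {
        if (copy == null || source.hasNext()) {
            return Optional.empty();
        }
        return Optional.of(Collections.unmodifiableList(copy));
    }
}
```

The caller would serialize from the wrapper as usual and only afterwards ask
for `cacheableRows()`, so an abandoned or oversized result yields nothing to
cache.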
There are advantages and disadvantages:
1. +ve : If the same query is made with a different "Accept" header, there
is currently no cache hit and the query results are cached twice. Caching
before serialization avoids this.
1. +ve : The copying iterator can have some policy controls like limiting
caching results to N,000 rows. For robustness reasons, we probably want some
limits here so as not to cache the dataset by accident.
1. -ve : The results are serialized each time. Talking to Jetty at all is
going to be slower than Varnish and this adds to that.
1. +ve : A bonus is that LIMIT/OFFSET can be done (a later feature) if the
full results are executed. If the query is LIMIT/OFFSET+ORDER (to get
stability - a common idiom), the exact query isn't being repeated but it is an
expensive query. Note this is already optimized by a TopN query, so the
interactions here are complicated, but if a repeatable results iterator is
available, different LIMIT/OFFSET can be done.
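As an illustration of the LIMIT/OFFSET point, here is a sketch of replaying a
slice from fully cached rows. `CachedResults` and `replay` are hypothetical
names, and treating a negative limit as "no limit" is an assumption of this
sketch. It is only valid when the cached rows are the full, ordered results.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical holder for the full, ordered results of one query. */
final class CachedResults<T> {
    private final List<T> rows;

    CachedResults(List<T> rows) {
        this.rows = new ArrayList<>(rows);   // defensive copy
    }

    /** Replays rows [offset, offset + limit); a negative limit means "no limit". */
    Iterator<T> replay(long offset, long limit) {
        int from = (int) Math.min(Math.max(offset, 0), rows.size());
        int to = limit < 0
                ? rows.size()
                : (int) Math.min(from + limit, rows.size());
        return rows.subList(from, to).iterator();
    }
}
```

With five cached rows, `replay(1, 2)` iterates over the second and third rows,
and an offset past the end gives an empty iterator.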
Questions and specific points on this PR:
1. There is a nasty corner case - for long queries, the client can go away
while the response is being sent back. If I read the code right, the cache
entry has already been created. Has this been tested? (It is hard to test for
badly behaved clients.)
1. How can an uninitialized cache entry get into the cache?
1. SPARQL_Query, SPARQL_Query_Cache. What's the relationship here? There
seems to be some duplication, log messages come out twice on first query and
action.endRead() is called twice on first query.
> SPARQL Query Caching
> --------------------
>
> Key: JENA-626
> URL: https://issues.apache.org/jira/browse/JENA-626
> Project: Apache Jena
> Issue Type: Improvement
> Reporter: Andy Seaborne
> Assignee: Saikat Maitra
> Labels: java, linked_data, rdf, sparql
>
> Add a caching layer to Fuseki to cache the results of SPARQL Query requests.
> This cache should allow for in-memory and disk-based caching, configuration
> and cache management, and coordination with data modification.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)