[ https://issues.apache.org/jira/browse/JENA-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254157#comment-15254157 ]

ASF GitHub Bot commented on JENA-626:
-------------------------------------

Github user afs commented on the pull request:

    https://github.com/apache/jena/pull/95#issuecomment-213496231
  
    **Intent and Abstraction**
    
    Fuseki-caching isn't going to beat Varnish, so I think the better use of Fuseki-caching is supporting cases that need more flexibility, such as (in the future) paging results.
    
    I've sketched something in a branch in a work area:
    
    https://github.com/afs/jena/tree/fuseki-cache
    
    This is a sketch and not for serious use: parts of the implementation are quick-and-easy, it has only been lightly tested, on one dataset, and there is no configurability.
    
    The classes changed to use the cache are `HttpAction`, `ResultsCache`, and `SPARQL_Query`. `SPARQL_Query` has the operations `processViaCache`, `prepareForCache`, and `insertIntoCache`. It deals with two concurrent attempts to set the cache entry for the same query by letting them both run (it's the same answer, right?!) and both set the cache.
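    The "let both run" policy can be sketched as below. This is a minimal illustration, not the `SPARQL_Query`/`ResultsCache` code in the branch; `RacingResultsCache` and `getOrEvaluate` are hypothetical names, and a query result is stood in for by a generic value `V`.

    ```java
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // Hypothetical sketch of the "let both run" policy; not the branch's code.
    class RacingResultsCache<K, V> {
        private final Map<K, V> cache = new ConcurrentHashMap<>();

        // On a miss, evaluate outside any lock. Two concurrent misses for the
        // same key both evaluate and both store; the answers are assumed to be
        // identical, so it does not matter which put lands last.
        // ConcurrentHashMap rejects nulls, so evaluate must return non-null.
        V getOrEvaluate(K key, Function<K, V> evaluate) {
            V cached = cache.get(key);
            if (cached != null)
                return cached;
            V result = evaluate.apply(key);
            cache.put(key, result);
            return result;
        }

        int size() { return cache.size(); }
    }
    ```

    The alternative, `ConcurrentHashMap.computeIfAbsent`, would run only one evaluation, but its documentation warns that other updates to the map may block while the mapping function runs, which is a poor fit for a potentially long query.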
    
    The cache is invalidated when `HttpAction.beginWrite` is called, so all update routes are caught (SPARQL Update, GSP, and the Uploader). I don't like that: it seems asymmetric that `beginWrite` is used, and it assumes MR+SW (multiple readers plus a single writer).
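    One way to keep invalidation at a single hook while tolerating readers that run alongside a writer is a generation counter. This is a swapped-in technique for illustration, not what the branch does: `GenerationCache`, `invalidate`, and the generation check are hypothetical. A reader records the generation before evaluating; its insert is refused if a write has been signalled since.

    ```java
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch; not the HttpAction/ResultsCache code in the branch.
    class GenerationCache<V> {
        private final Map<String, V> entries = new ConcurrentHashMap<>();
        private long generation = 0;   // guarded by "this"

        // Readers record the generation before evaluating the query.
        synchronized long currentGeneration() { return generation; }

        V get(String key) { return entries.get(key); }

        // Insert only if no write has been signalled since the read began.
        // Check-and-put is atomic with respect to invalidate().
        synchronized boolean insert(String key, long readGeneration, V value) {
            if (readGeneration != generation)
                return false;
            entries.put(key, value);
            return true;
        }

        // Single hook for every update route (SPARQL Update, GSP, upload).
        synchronized void invalidate() {
            generation++;
            entries.clear();
        }
    }
    ```

    Under MR+SW, a reader that starts after `beginWrite` but before the commit still sees the old data, so a sketch like this would likely need `invalidate()` at commit time as well (or instead); which point is right is left open here.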
    
    Cache actions are logged with the marker `** Cache`.
    
    **Space**
    
    If the query result (not the serialization) is stored, I would expect the memory footprint to be smaller because nodes are shared with the original dataset. Any variable bound by graph pattern matching ends up holding the node by reference. Calculated expressions produce fresh nodes. Long literals are shared.
    
    Literals from the data cost no extra memory. Let's assume that calculated nodes are small. This is usually true, but there may be a lot of them.
    
    The memory cost is now approximated by the total number of cells in the results, i.e. "number of rows * number of columns", and it can be calculated while capturing the `ResultSet` copy. We could put limits on the size of result sets cached and on the total number of cells.
    
    Serialized results can easily be sized. They do not share space, though.
    
    **Configuration**
    
    We need some configuration control, both server-wide on the `fuseki:Server` object in config.ttl and on each service. We could use the "Context" instead, but caching is important, so my suggestion is to have properties that clearly set the values.
    
    The server-wide case is, I think, less important. I suggest putting the configuration on the service, not the dataset, so you can have two different policies, like cached and not-cached, on the same data.
    
    The default should be "no caching".
    
    Having two services addresses the "cold cache/development" use case.
    
    We should still obey `Pragma: no-cache` and `Cache-Control`, but there are quite a lot of options and details, so it might be wise not to aim for everything in a first release, especially if caching is off by default.
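    A first-release policy along these lines might look like the sketch below: caching is off unless the service enables it, and only the `no-cache` directive is honoured. `ServiceCachePolicy` and `mayUseCache` are hypothetical names; other `Cache-Control` directives such as `max-age` are deliberately ignored here.

    ```java
    import java.util.Locale;
    import java.util.Map;

    // Hypothetical per-service policy; the default is "no caching".
    class ServiceCachePolicy {
        private final boolean enabled;

        ServiceCachePolicy() { this(false); }
        ServiceCachePolicy(boolean enabled) { this.enabled = enabled; }

        // May this request be answered from the cache? Only "Pragma: no-cache"
        // and the "no-cache" directive of Cache-Control are considered.
        boolean mayUseCache(Map<String, String> headers) {
            if (!enabled)
                return false;
            String pragma = header(headers, "Pragma");
            if (pragma != null && pragma.toLowerCase(Locale.ROOT).contains("no-cache"))
                return false;
            String cacheControl = header(headers, "Cache-Control");
            if (cacheControl != null) {
                for (String directive : cacheControl.split(",")) {
                    if (directive.trim().equalsIgnoreCase("no-cache"))
                        return false;
                }
            }
            return true;
        }

        // HTTP header names are case-insensitive.
        private static String header(Map<String, String> headers, String name) {
            for (Map.Entry<String, String> e : headers.entrySet()) {
                if (e.getKey().equalsIgnoreCase(name))
                    return e.getValue();
            }
            return null;
        }
    }
    ```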
    
    #### Other
    
    A related-but-different observation: supporting conditional GETs would be very good. Just keep an epoch number/timestamp for each dataset.
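    A per-dataset epoch could drive conditional GETs roughly as follows. This is a sketch under assumptions: `DatasetEpoch`, `bump`, and `statusFor` are hypothetical, the epoch is bumped once per completed write, and the ETag is derived from the epoch alone (a real version would also need to distinguish result formats chosen by content negotiation).

    ```java
    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical per-dataset epoch used as an entity tag.
    class DatasetEpoch {
        private final AtomicLong epoch = new AtomicLong();

        // Call once each write to the dataset has completed.
        void bump() { epoch.incrementAndGet(); }

        // Entity tag derived from the current epoch, e.g. "0".
        String etag() { return "\"" + epoch.get() + "\""; }

        // Conditional GET: 304 if If-None-Match lists the current tag
        // (weak comparison, so W/ prefixes are ignored) or "*", else 200.
        int statusFor(String ifNoneMatch) {
            if (ifNoneMatch == null)
                return 200;
            String current = etag();
            for (String tag : ifNoneMatch.split(",")) {
                String t = tag.trim();
                if (t.startsWith("W/"))
                    t = t.substring(2);
                if (t.equals("*") || t.equals(current))
                    return 304;
            }
            return 200;
        }
    }
    ```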



> SPARQL Query Caching
> --------------------
>
>                 Key: JENA-626
>                 URL: https://issues.apache.org/jira/browse/JENA-626
>             Project: Apache Jena
>          Issue Type: Improvement
>            Reporter: Andy Seaborne
>            Assignee: Saikat Maitra
>              Labels: java, linked_data, rdf, sparql
>
> Add a caching layer to Fuseki to cache the results of SPARQL Query requests.  
> This cache should allow for in-memory and disk-based caching, configuration 
> and cache management, and coordination with data modification.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
