Hi,

As we discussed, I have implemented the first version of SPARQL query caching and raised my first PR. :)
https://github.com/apache/jena/pull/95

Please review and share feedback.

Regards
Saikat

On Fri, Mar 13, 2015 at 10:30 PM, Saikat Maitra <saikat.mai...@gmail.com> wrote:

> Hello Andy, Osma,
>
> Thank you for your feedback.
>
> I am also more inclined towards option 2 to implement the SPARQL_Cache
> servlet as part of Fuseki. The fact that it will have access to Fuseki
> internal and admin console makes a lot of sense.
>
> I really like the idea of selective CacheEntry invalidation at graph level
> or triple level, I will think about it in more detail for implementation.
> As of now the CacheEntry is set to expire every 5 mins.
>
> Regards
> Saikat
>
> On Fri, Mar 13, 2015 at 4:33 PM, Osma Suominen <osma.suomi...@helsinki.fi>
> wrote:
>
>> On 13/03/15 12:37, Andy Seaborne wrote:
>>
>>> To check my understanding, let me put that in my own words: This is
>>> essentially fronting the Fuseki server with another webapp that
>>> intercepts all requests to the Fuseki server. ("All requests" because
>>> it has to see update to manage the cache).
>>
>> [...]
>>
>>> Both will work - personally, I'd go for option 2 because the
>>> Sparql_Cache servlet can have detailed access to the internals of
>>> Fuseki, for example the service registry and the admin interface. It
>>> can be integrated into the UI (in Fuseki2) so the admin console can be
>>> extended to manage the cache. See the stats page in the Fuseki2 UI. If
>>> its external, the management will be a separate function.
>>
>> I agree with Andy. I don't see much of a benefit for option 1 (an
>> external service) since you can already deploy a reverse proxy such as
>> Varnish or nginx in front of Fuseki and get pretty much the same benefits.
>> See my quick tutorial for installing Varnish in front of Fuseki [1]. (this
>> simple configuration will not cache POST requests, but it is possible to
>> tweak Varnish to cache those as well)
>>
>> The problem with an external cache is that it cannot easily be made aware
>> of changes in the data. So when anything in the data changes, you have to
>> arrange to flush the cache - in most cases the entire cache, since it's
>> very difficult to know whether it has affected the results of a particular
>> query.
>>
>> Where I can see the benefit is basically option 2 that is integrated to
>> the Fuseki servlet and is aware of updates. Even just flushing the entire
>> cache when an update comes in would be an improvement on using an external
>> cache, since at least there would be no possibility of stale data being
>> served. But if the cache could somehow be made aware of which parts of the
>> data (perhaps on a graph level, if not triple level) contributed to the
>> cached results of a particular query, it could perhaps do even smarter
>> invalidation and not throw away everything when a single triple changes.
>>
>> -Osma
>>
>> [1] https://github.com/NatLibFi/Skosmos/wiki/FusekiTuning#http-caching
>>
>> --
>> Osma Suominen
>> D.Sc. (Tech), Information Systems Specialist
>> National Library of Finland
>> P.O. Box 26 (Teollisuuskatu 23)
>> 00014 HELSINGIN YLIOPISTO
>> Tel. +358 50 3199529
>> osma.suomi...@helsinki.fi
>> http://www.nationallibrary.fi
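
For reference, the three invalidation strategies mentioned in this thread (fixed expiry after 5 minutes, flushing everything on update, and graph-level invalidation) could look roughly like the sketch below. It is a language-neutral illustration in Python, not the code in the PR and not Fuseki's API: `QueryCache`, `invalidate_all`, `invalidate_graph` and the per-entry `graphs` bookkeeping are all hypothetical names, and it assumes the server can tell which graphs a query read from.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, Optional


@dataclass
class CacheEntry:
    result: str               # serialized query result
    graphs: FrozenSet[str]    # graphs the query read from (assumed to be known)
    created: float            # clock value when the entry was stored


class QueryCache:
    """Hypothetical in-process cache keyed by the SPARQL query string."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds   # 300 s matches the 5-minute expiry mentioned above
        self._clock = clock       # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, CacheEntry] = {}

    def put(self, query: str, result: str, graphs: Iterable[str]) -> None:
        self._entries[query] = CacheEntry(result, frozenset(graphs), self._clock())

    def get(self, query: str) -> Optional[str]:
        entry = self._entries.get(query)
        if entry is None:
            return None
        if self._clock() - entry.created >= self._ttl:
            # Entry is older than the TTL: drop it and report a miss.
            del self._entries[query]
            return None
        return entry.result

    def invalidate_all(self) -> None:
        """Coarse strategy: any update flushes the whole cache."""
        self._entries.clear()

    def invalidate_graph(self, graph: str) -> int:
        """Finer strategy: drop only entries that read from the updated graph."""
        stale = [q for q, e in self._entries.items() if graph in e.graphs]
        for q in stale:
            del self._entries[q]
        return len(stale)
```

In this sketch an update handler would call `invalidate_graph(name)` when it knows which graph changed and fall back to `invalidate_all()` otherwise. The hard part, which the sketch leaves out, is working out which graphs contributed to a given query's results.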