On 17/10/15 11:29, Saikat Maitra wrote:
Hi

As we discussed I have implemented the first version of the SPARQL Query
Caching and raised my first PR. :)

:-)

It's good to see this.


https://github.com/apache/jena/pull/95

Please review and share feedback

We are commenting using GitHub - it's copied to the JIRA for an Apache-preserved archive (shame about the duplication by email but that's a detail).

        Andy

Regards
Saikat

On Fri, Mar 13, 2015 at 10:30 PM, Saikat Maitra <saikat.mai...@gmail.com>
wrote:

Hello Andy, Osma,

Thank you for your feedback.

I am also more inclined towards option 2, implementing the SPARQL_Cache
servlet as part of Fuseki. The fact that it will have access to the Fuseki
internals and the admin console makes a lot of sense.

I really like the idea of selective CacheEntry invalidation at graph level
or triple level; I will think about it in more detail for the implementation.
As of now, each CacheEntry is set to expire after 5 minutes.
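
As a rough illustration of the time-based expiry described above (this is
not the code in the PR; the class names, the `ttl_seconds` parameter and the
injectable `clock` are hypothetical, and Python is used only for brevity):

```python
import time


class CacheEntry:
    """A cached query result with a fixed expiry time (hypothetical name)."""

    def __init__(self, result, ttl_seconds=300.0, clock=time.monotonic):
        self.result = result
        self._clock = clock
        # 300 seconds corresponds to the 5-minute expiry mentioned above.
        self.expires_at = clock() + ttl_seconds

    def is_expired(self):
        return self._clock() >= self.expires_at


class QueryCache:
    """Maps a query string to a CacheEntry and drops entries once expired."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._entries = {}
        self._ttl = ttl_seconds
        self._clock = clock

    def put(self, query, result):
        self._entries[query] = CacheEntry(result, self._ttl, self._clock)

    def get(self, query):
        entry = self._entries.get(query)
        if entry is None:
            return None
        if entry.is_expired():
            # Lazily evict on read; no background thread in this sketch.
            del self._entries[query]
            return None
        return entry.result
```

Expiry alone bounds how long stale results can be served but does not
prevent them, which is why update-aware invalidation is still worth adding.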

Regards
Saikat





On Fri, Mar 13, 2015 at 4:33 PM, Osma Suominen <osma.suomi...@helsinki.fi>
wrote:

On 13/03/15 12:37, Andy Seaborne wrote:

To check my understanding, let me put that in my own words: This is
essentially fronting the Fuseki server with another webapp that
intercepts all requests to the Fuseki server.  ("All requests" because
it has to see updates to manage the cache).

[...]

Both will work - personally, I'd go for option 2 because the
Sparql_Cache servlet can have detailed access to the internals of
Fuseki, for example the service registry and the admin interface.  It
can be integrated into the UI (in Fuseki2) so the admin console can be
extended to manage the cache.  See the stats page in the Fuseki2 UI.  If
it's external, the management will be a separate function.


I agree with Andy. I don't see much of a benefit for option 1 (an
external service) since you can already deploy a reverse proxy such as
Varnish or nginx in front of Fuseki and get pretty much the same benefits.
See my quick tutorial for installing Varnish in front of Fuseki [1]. (This
simple configuration will not cache POST requests, but it is possible to
tweak Varnish to cache those as well.)

The problem with an external cache is that it cannot easily be made aware
of changes in the data. So when anything in the data changes, you have to
arrange to flush the cache - in most cases the entire cache, since it's
very difficult to know whether it has affected the results of a particular
query.

Where I can see the benefit is basically option 2, which is integrated into
the Fuseki servlet and is aware of updates. Even just flushing the entire
cache when an update comes in would be an improvement on using an external
cache, since at least there would be no possibility of stale data being
served. But if the cache could somehow be made aware of which parts of the
data (perhaps on a graph level, if not triple level) contributed to the
cached results of a particular query, it could perhaps do even smarter
invalidation and not throw away everything when a single triple changes.
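
To make the two invalidation strategies above concrete, here is a minimal
sketch (not Fuseki code; `GraphAwareCache`, `on_update` and the idea that the
caller supplies the set of graphs a query read from are all assumptions made
for illustration, and working out that set is the hard part):

```python
class GraphAwareCache:
    """Query cache that remembers which graphs each cached result came from."""

    def __init__(self):
        # query string -> (result, frozenset of graph names the result used)
        self._entries = {}

    def put(self, query, result, graphs):
        self._entries[query] = (result, frozenset(graphs))

    def get(self, query):
        entry = self._entries.get(query)
        return None if entry is None else entry[0]

    def on_update(self, touched_graphs=None):
        """Invalidate entries after an update.

        With touched_graphs=None the update's scope is unknown, so the
        entire cache is flushed. Otherwise only entries whose graphs
        overlap the touched graphs are dropped.
        """
        if touched_graphs is None:
            self._entries.clear()
            return
        touched = set(touched_graphs)
        stale = [q for q, (_, graphs) in self._entries.items()
                 if graphs & touched]
        for q in stale:
            del self._entries[q]
```

For example, after caching one query against graph `g1` and another against
`g2`, calling `on_update({"g1"})` drops only the first, while `on_update()`
drops both.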

-Osma


[1] https://github.com/NatLibFi/Skosmos/wiki/FusekiTuning#http-caching



--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suomi...@helsinki.fi
http://www.nationallibrary.fi




