Hi

As we discussed, I have implemented a first version of SPARQL query
caching and raised my first PR. :)

https://github.com/apache/jena/pull/95
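
For readers of the thread: the behaviour discussed in the quoted messages
below (a fixed expiry per entry, plus invalidation on update, either per
graph or for the whole cache) can be sketched roughly as follows. This is
only an illustrative sketch in plain Java, not the code in the PR.
QueryResultCache and all of its method names are hypothetical and are not
part of Jena or Fuseki, and the sketch assumes that the set of graphs a
query touched is known when its result is stored.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch: not the classes in the PR, not a Jena/Fuseki API.
final class QueryResultCache {

    // One cached result, the graphs it was computed from, and its expiry time.
    private static final class Entry {
        final String result;
        final Set<String> graphs;
        final long expiresAtMillis;

        Entry(String result, Set<String> graphs, long expiresAtMillis) {
            this.result = result;
            this.graphs = graphs;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so expiry can be tested

    QueryResultCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Store a result under the query string, remembering the graphs it used.
    void put(String query, String result, Set<String> graphs) {
        long expiresAt = clock.getAsLong() + ttlMillis;
        entries.put(query, new Entry(result, new HashSet<>(graphs), expiresAt));
    }

    // Return the cached result, or null if it is absent or has expired.
    String get(String query) {
        Entry e = entries.get(query);
        if (e == null) {
            return null;
        }
        if (clock.getAsLong() >= e.expiresAtMillis) {
            entries.remove(query, e); // drop the stale entry
            return null;
        }
        return e.result;
    }

    // Graph-level invalidation: drop only entries that depended on this graph.
    void invalidateGraph(String graph) {
        entries.values().removeIf(e -> e.graphs.contains(graph));
    }

    // Coarse invalidation: flush everything, e.g. on any update.
    void invalidateAll() {
        entries.clear();
    }

    int size() {
        return entries.size();
    }
}
```

In this sketch an update handler would call invalidateGraph(...) for each
modified graph, or invalidateAll() when the affected graphs are not known.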

Please review and share your feedback.

Regards
Saikat

On Fri, Mar 13, 2015 at 10:30 PM, Saikat Maitra <saikat.mai...@gmail.com>
wrote:

> Hello Andy, Osma,
>
> Thank you for your feedback.
>
> I am also more inclined towards option 2, implementing the SPARQL_Cache
> servlet as part of Fuseki. The fact that it will have access to Fuseki
> internals and the admin console makes a lot of sense.
>
> I really like the idea of selective CacheEntry invalidation at graph level
> or triple level; I will think about it in more detail for implementation.
> As of now, each CacheEntry is set to expire after 5 minutes.
>
> Regards
> Saikat
>
> On Fri, Mar 13, 2015 at 4:33 PM, Osma Suominen <osma.suomi...@helsinki.fi>
> wrote:
>
>> On 13/03/15 12:37, Andy Seaborne wrote:
>>
>>> To check my understanding, let me put that in my own words: This is
>>> essentially fronting the Fuseki server with another webapp that
>>> intercepts all requests to the Fuseki server.  ("All requests" because
>>> it has to see update to manage the cache).
>>>
>> [...]
>>
>>> Both will work - personally, I'd go for option 2 because the
>>> Sparql_Cache servlet can have detailed access to the internals of
>>> Fuseki, for example the service registry and the admin interface.  It
>>> can be integrated into the UI (in Fuseki2) so the admin console can be
>>> extended to manage the cache.  See the stats page in the Fuseki2 UI.  If
>>> it's external, the management will be a separate function.
>>>
>>
>> I agree with Andy. I don't see much of a benefit for option 1 (an
>> external service) since you can already deploy a reverse proxy such as
>> Varnish or nginx in front of Fuseki and get pretty much the same benefits.
>> See my quick tutorial for installing Varnish in front of Fuseki [1]. (This
>> simple configuration will not cache POST requests, but it is possible to
>> tweak Varnish to cache those as well.)
>>
>> The problem with an external cache is that it cannot easily be made aware
>> of changes in the data. So when anything in the data changes, you have to
>> arrange to flush the cache - in most cases the entire cache, since it's
>> very difficult to know whether it has affected the results of a particular
>> query.
>>
>> Where I can see the benefit is basically option 2, which is integrated
>> into the Fuseki servlet and is aware of updates. Even just flushing the entire
>> cache when an update comes in would be an improvement on using an external
>> cache, since at least there would be no possibility of stale data being
>> served. But if the cache could somehow be made aware of which parts of the
>> data (perhaps on a graph level, if not triple level) contributed to the
>> cached results of a particular query, it could perhaps do even smarter
>> invalidation and not throw away everything when a single triple changes.
>>
>> -Osma
>>
>>
>> [1] https://github.com/NatLibFi/Skosmos/wiki/FusekiTuning#http-caching
>>
>>
>>
>> --
>> Osma Suominen
>> D.Sc. (Tech), Information Systems Specialist
>> National Library of Finland
>> P.O. Box 26 (Teollisuuskatu 23)
>> 00014 HELSINGIN YLIOPISTO
>> Tel. +358 50 3199529
>> osma.suomi...@helsinki.fi
>> http://www.nationallibrary.fi
>>
>
>
