Re: GeoSPARQL process

ajs6f Mon, 15 Apr 2019 07:09:24 -0700

Thanks, Greg, this is very detailed. Once the new module is in and settled and 
we have a release or two to learn from, I will take a closer look at the usage 
of this code to understand how it differs from the kind of caching that occurs 
elsewhere in Jena.


ajs6f

> On Apr 14, 2019, at 6:21 AM, Greg Albiston <galbis...@mail.com> wrote:
> 
> Hi,
> 
> There are a lot of permutations that a GeoSPARQL query could take which
> can generate different values that may or may not be useful later on.
> The general strategy is to keep what is generated for a while and if it
> isn't used then drop it. I don't think any of the Cache implementations
> offer this or a suitable alternative.
> 
> The expiring-map removes entries that haven't been reused after a period
> of time. The duration to retain, rate of checking and maximum size can
> all be set. It is used for three purposes:
> 
> - The Geometry Wrapper object resulting from de-serialising the Geometry
> Literals.
> - The transformed Geometry Wrapper object from changing the spatial
> reference system.
> - The result of a spatial relation between two Geometry Literals to
> avoid re-testing when Query Re-writing is applied.
> 
> Most of the GeoSPARQL functions are between two Geometry Literals, so
> one could be needed in the next iteration of the query and the other
> could be needed later.
> 
> The first purpose offers the biggest impact on performance as the
> Geometry Literal would otherwise be de-serialised repeatedly while Jena
> is processing the query. Complex shapes, e.g. polygons, can be very
> costly to extract.
> 
> The second purpose offers most benefit when complex shapes need
> transforming. These transformations may be needed again during this
> query but not the next. For example, the dataset is in SRS A. Query 1 is a
> comparison with a set of values in SRS B. Query 2 then is a comparison
> with a set of values in SRS C. The results from Query 1 are useless and
> may never be needed again.
> 
> The third purpose is due to GeoSPARQL allowing query re-writing where
> the Geometry Literal isn't specified and instead Features and Geometries
> are used, so a single query could test the same spatial relations up to
> four times depending on bindings.
> 
> The expiring-map is allowed to fill up while the query is processing and
> then drops entries that aren't reused (in batches) or once the query
> completes. Once it is full, new entries are quickly rejected but space
> is freed up later from those entries not being re-used. A user with a
> small dataset can cache everything while a large dataset can choose to
> constrain it to get some benefit from caching without consuming vast
> chunks of memory.
> 
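The behaviour described above (reject new entries once full, refresh an entry when it is reused, drop unused entries in batches, clear when the query completes) can be illustrated with a minimal sketch. This is not the expiring-map library's code or API: the class name `ExpiringCacheSketch`, its methods and its constructor parameters are hypothetical, and the clock is injected only so the example can be exercised without waiting.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical illustration only; not the expiring-map library's API. */
class ExpiringCacheSketch<K, V> {

    /** A cached value plus the time it was last written or read. */
    private static final class Entry<V> {
        final V value;
        volatile long lastUsedMillis;

        Entry(V value, long now) {
            this.value = value;
            this.lastUsedMillis = now;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final int maxSize;         // assumed setting: maximum number of entries
    private final long retainMillis;   // assumed setting: how long an unused entry is kept
    private final LongSupplier clock;  // e.g. System::currentTimeMillis

    ExpiringCacheSketch(int maxSize, long retainMillis, LongSupplier clock) {
        this.maxSize = maxSize;
        this.retainMillis = retainMillis;
        this.clock = clock;
    }

    /** Stores the value unless the map is full; returns false when rejected. */
    boolean put(K key, V value) {
        // Best-effort size check: when full, reject cheaply instead of
        // evicting one entry per insertion ("one out, one in").
        if (entries.size() >= maxSize && !entries.containsKey(key)) {
            return false;
        }
        entries.put(key, new Entry<>(value, clock.getAsLong()));
        return true;
    }

    /** Returns the cached value and refreshes its last-used time, or null. */
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        entry.lastUsedMillis = clock.getAsLong();
        return entry.value;
    }

    /** Removes, as one batch, every entry not used within the retention period. */
    int removeExpired() {
        long cutoff = clock.getAsLong() - retainMillis;
        int removed = 0;
        for (Iterator<Entry<V>> it = entries.values().iterator(); it.hasNext();) {
            if (it.next().lastUsedMillis < cutoff) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** Drops everything, e.g. once a query completes. */
    void clear() {
        entries.clear();
    }

    int size() {
        return entries.size();
    }
}
```

In a sketch like this, `removeExpired()` would be called at the configured checking rate, for instance from a `java.util.concurrent.ScheduledExecutorService`, so that the cost of freeing space is paid in batches rather than on every insertion.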
> I tried using the Apache Collections 4 LRUMap and it made performance
> worse once it was filled (at a guess due to "one out, one in" and
> constant searching). I only found one Java implementation of a
> time-based cache. It seemed excessive to have the whole dependency for one
> class and it wasn't as flexible as required.
> 
> Hopefully this clarifies why the expiring-map approach was adopted.
> 
> Thanks,
> 
> Greg
> 
> On 10/04/2019 16:50, ajs6f wrote:
>> Just out of curiosity, Greg, what is the functionality offered by Expiring 
>> Map that isn't offered by Jena's already-extant oaj.atlas.lib.Cache 
>> implementations? Is it the ability to manually trigger expirations?
>> 
>> ajs6f
>> 
>>> On Apr 9, 2019, at 12:02 PM, Andy Seaborne <a...@apache.org> wrote:
>>> 
>>> [INFO] |  \- io.github.galbiston:expiring-map:jar:1.0.2:compile
