Thanks, Greg, this is very detailed. Once the new module is in and settled and we have a release or two to learn from, I will take a closer look at the usage of this code to understand how it differs from the kind of caching that occurs elsewhere in Jena.
ajs6f

> On Apr 14, 2019, at 6:21 AM, Greg Albiston <galbis...@mail.com> wrote:
>
> Hi,
>
> There are a lot of permutations that a GeoSPARQL query could take which
> can generate different values that may or may not be useful later on.
> The general strategy is to keep what is generated for a while and if
> it isn't used then drop it. I don't think any of the Cache implementations
> offer this or a suitable alternative.
>
> The expiring-map removes entries that haven't been reused after a period
> of time. The duration to retain, rate of checking and maximum size can
> all be set. It is used for three purposes:
>
> - The Geometry Wrapper object resulting from de-serialising the Geometry
> Literals.
> - The transformed Geometry Wrapper object from changing the spatial
> reference system.
> - The result of a spatial relation between two Geometry Literals to
> avoid re-testing when Query Re-writing is applied.
>
> Most of the GeoSPARQL functions are between two Geometry Literals, so
> one could be needed in the next iteration of the query and the other
> could be needed later.
>
> The first purpose offers the biggest impact on performance as there is
> additional de-serialising of the Geometry Literals while Jena is
> processing the query. Complex shapes, e.g. polygons, can be very costly
> to extract.
>
> The second purpose offers most benefit when complex shapes need
> transforming. These transformations may be needed again during this
> query but not the next, e.g. the dataset is in SRS A. Query 1 is a
> comparison with a set of values in SRS B. Query 2 then is a comparison
> with a set of values in SRS C. The results from Query 1 are useless and
> may never be needed again.
>
> The third purpose is due to GeoSPARQL allowing query re-writing where
> the Geometry Literal isn't specified and instead Features and Geometries
> are used, so a single query could test the same spatial relations up to
> four times depending on bindings.
>
> The expiring-map is allowed to fill up while the query is processing and
> then drops entries that aren't reused (in batches) or once the query
> completes. Once it is full, new entries are quickly rejected but space
> is freed up later from those entries not being re-used. A user with a
> small dataset can cache everything while a user with a large dataset can
> choose to constrain it to get some benefit from caching without consuming
> vast chunks of memory.
>
> I tried using the Apache Commons Collections 4 LRUMap and it made
> performance worse once it was filled (at a guess due to "one out, one in"
> and constant searching). I only found one Java implementation of a
> time-based cache. It seemed excessive to have the whole dependency for one
> class and it wasn't as flexible as required.
>
> Hopefully this clarifies why the expiring-map approach was adopted.
>
> Thanks,
>
> Greg
>
> On 10/04/2019 16:50, ajs6f wrote:
>> Just out of curiosity, Greg, what is the functionality offered by Expiring
>> Map that isn't offered by Jena's already-extant oaj.atlas.lib.Cache
>> implementations? Is it the ability to manually trigger expirations?
>>
>> ajs6f
>>
>>> On Apr 9, 2019, at 12:02 PM, Andy Seaborne <a...@apache.org> wrote:
>>>
>>> [INFO] | \- io.github.galbiston:expiring-map:jar:1.0.2:compile
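
For anyone following the thread, the behaviour Greg describes (entries kept while they are reused, dropped in batches once unused for a set period, and new entries rejected while the map is full) could be sketched roughly as below. This is an illustration only and not the expiring-map library's code or API: the class name ExpiringCacheSketch, its methods, and the injected clock are all assumptions made for the example.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a size-bounded, time-expiring cache. */
class ExpiringCacheSketch<K, V> {

    /** A cached value plus the time it was last read or written. */
    private static final class Entry<V> {
        final V value;
        volatile long lastUsedMillis;

        Entry(V value, long now) {
            this.value = value;
            this.lastUsedMillis = now;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final int maxSize;
    private final long retainMillis;
    private final LongSupplier clock; // injected so the time source can be swapped

    ExpiringCacheSketch(int maxSize, long retainMillis, LongSupplier clock) {
        this.maxSize = maxSize;
        this.retainMillis = retainMillis;
        this.clock = clock;
    }

    /** Returns the cached value, or computes it and caches it only if there is room. */
    V getOrCompute(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> hit = entries.get(key);
        if (hit != null) {
            hit.lastUsedMillis = now; // reuse extends the entry's life
            return hit.value;
        }
        V value = loader.apply(key);
        // When full, the new entry is rejected rather than evicting an old one
        // (the size check is approximate under concurrent writers).
        if (entries.size() < maxSize) {
            entries.putIfAbsent(key, new Entry<>(value, now));
        }
        return value;
    }

    /** Drops, in one batch, every entry not used within the retention period. */
    int expireUnused() {
        long cutoff = clock.getAsLong() - retainMillis;
        int removed = 0;
        for (Iterator<Entry<V>> it = entries.values().iterator(); it.hasNext();) {
            if (it.next().lastUsedMillis < cutoff) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return entries.size();
    }
}
```

In a sketch like this, the "rate of checking" would correspond to how often expireUnused() is called, for example from a ScheduledExecutorService. Rejecting new entries when full, instead of the "one out, one in" replacement of an LRU map, keeps the miss path cheap at the cost of not caching new values until the next batch expiry frees space.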