Lorenz Bühmann created JENA-2311:
------------------------------------
Summary: query rewrite index does too expensive caching on geo
literals
Key: JENA-2311
URL: https://issues.apache.org/jira/browse/JENA-2311
Project: Apache Jena
Issue Type: Improvement
Components: GeoSPARQL
Affects Versions: Jena 4.4.0
Reporter: Lorenz Bühmann
Using a GeoSPARQL query with a geospatial property function, e.g.
{code:java}
SELECT * {
:x geo:hasGeometry ?geo1 .
?s2 geo:hasGeometry ?geo2 .
?geo1 geo:sfContains ?geo2
}
{code}
leads to heavy memory consumption even for moderately sized datasets; this is
nowhere near big data. Consider a single polygon that is checked for
containment against millions of geometries.
For caching, the {{QueryRewriteIndex}} class generates a key for each pair of
geometries, but this is very expensive: the string representation of the
geometries is requested millions of times, which creates millions of byte
arrays and can lead to an OutOfMemoryError. We hit it with 8 GB assigned.
The key generation for reference:
{code:java}
String key = subjectGeometryLiteral.getLiteralLexicalForm() + KEY_SEPARATOR +
predicate.getURI() + KEY_SEPARATOR +
objectGeometryLiteral.getLiteralLexicalForm();
{code}
My suggestion is to use a separate {{Node -> Integer}} (or {{Long}}) Guava cache
and use the numeric values instead to generate the cache key. Any other more
efficient data structure would also do; I am not even sure a String is necessary.
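A minimal sketch of the idea, not a patch against {{QueryRewriteIndex}}: each distinct value gets a numeric id once, and the cache key is built from three ids rather than from concatenated lexical forms. The names {{GeoKeySketch}}, {{IdInterner}} and {{RelationKey}} are hypothetical, a plain {{ConcurrentHashMap}} stands in for the Guava cache, and plain strings stand in for Jena {{Node}} objects so that the example runs without dependencies.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class GeoKeySketch {

    /** Assigns one numeric id per distinct value (stand-in for a Node -> Long cache). */
    static final class IdInterner<T> {
        private final Map<T, Long> ids = new ConcurrentHashMap<>();
        private final AtomicLong next = new AtomicLong();

        long idOf(T value) {
            // The id is allocated only the first time a value is seen.
            return ids.computeIfAbsent(value, v -> next.getAndIncrement());
        }
    }

    /** Composite cache key made of three ids; no string concatenation involved. */
    static final class RelationKey {
        final long subjectId;
        final long predicateId;
        final long objectId;

        RelationKey(long subjectId, long predicateId, long objectId) {
            this.subjectId = subjectId;
            this.predicateId = predicateId;
            this.objectId = objectId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof RelationKey)) return false;
            RelationKey k = (RelationKey) o;
            return subjectId == k.subjectId
                    && predicateId == k.predicateId
                    && objectId == k.objectId;
        }

        @Override
        public int hashCode() {
            int h = Long.hashCode(subjectId);
            h = 31 * h + Long.hashCode(predicateId);
            return 31 * h + Long.hashCode(objectId);
        }
    }

    public static void main(String[] args) {
        IdInterner<String> ids = new IdInterner<>();
        Map<RelationKey, Boolean> index = new ConcurrentHashMap<>();

        // Hypothetical inputs; in Jena these would be the literal and predicate nodes.
        String polygon = "POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))";
        String point = "POINT(5 5)";
        String predicate = "http://www.opengis.net/ont/geosparql#sfContains";

        RelationKey key = new RelationKey(
                ids.idOf(polygon), ids.idOf(predicate), ids.idOf(point));
        index.put(key, Boolean.TRUE);

        // A second lookup with the same inputs yields an equal key.
        RelationKey again = new RelationKey(
                ids.idOf(polygon), ids.idOf(predicate), ids.idOf(point));
        System.out.println(index.get(again));
    }
}
{code}
The id map in this sketch grows without bound. A real implementation would need an eviction policy, and it would have to make sure that an evicted value never gets back an id that is still referenced by keys in the index.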