Lorenz Bühmann created JENA-2311:
------------------------------------

Summary: query rewrite index does too expensive caching on geo literals
Key: JENA-2311
URL: https://issues.apache.org/jira/browse/JENA-2311
Project: Apache Jena
Issue Type: Improvement
Components: GeoSPARQL
Affects Versions: Jena 4.4.0
Reporter: Lorenz Bühmann

Using a GeoSPARQL query with a geospatial property function, e.g.

{code:java}
SELECT * {
  :x geo:hasGeometry ?geo1 .
  ?s2 geo:hasGeometry ?geo2 .
  ?geo1 geo:sfContains ?geo2
}
{code}

leads to heavy memory consumption on larger datasets, and we're not talking about big data at all. Imagine a single polygon being checked for containment against millions of geometries.

The {{QueryRewriteIndex}} class generates a key for caching, but this is very expensive: the string representation of the geometries is requested millions of times, which creates millions of byte arrays and can end in an OOM exception. We got one with 8GB assigned.

The key generation for reference:

{code:java}
String key = subjectGeometryLiteral.getLiteralLexicalForm() + KEY_SEPARATOR + predicate.getURI() + KEY_SEPARATOR + objectGeometryLiteral.getLiteralLexicalForm();
{code}

My suggestion is to use a separate {{Node -> Integer}} (or {{Long}}) Guava cache and use those numeric values to generate the cache key. Any other more efficient data structure would do as well; I'm not even sure a String is necessary.
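
To make the suggestion concrete, here is a minimal sketch of the idea, not a patch against the actual {{QueryRewriteIndex}} code. The names {{NodeIdRegistry}} and {{RewriteKey}} are hypothetical, the type parameter {{N}} stands in for Jena's {{Node}}, and a plain {{ConcurrentHashMap}} is used in place of a Guava cache only so that the snippet compiles without extra dependencies. It assumes that looking up a node by its own equals/hashCode is cheaper than concatenating two lexical forms per lookup, which would need to be measured.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: hands out one numeric id per distinct node. */
final class NodeIdRegistry<N> {
    // Unbounded here for brevity; a bounded cache would take its place in practice.
    private final Map<N, Long> ids = new ConcurrentHashMap<>();
    private final AtomicLong next = new AtomicLong();

    /** Returns the id already assigned to the node, or assigns a fresh one. */
    long idOf(N node) {
        return ids.computeIfAbsent(node, n -> next.getAndIncrement());
    }
}

/** Hypothetical compact cache key: two node ids plus the predicate URI. */
final class RewriteKey {
    final long subjectId;
    final String predicateUri;
    final long objectId;

    RewriteKey(long subjectId, String predicateUri, long objectId) {
        this.subjectId = subjectId;
        this.predicateUri = Objects.requireNonNull(predicateUri);
        this.objectId = objectId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RewriteKey)) return false;
        RewriteKey k = (RewriteKey) o;
        return subjectId == k.subjectId
                && objectId == k.objectId
                && predicateUri.equals(k.predicateUri);
    }

    @Override
    public int hashCode() {
        return Objects.hash(subjectId, predicateUri, objectId);
    }
}
```

With something like this, the index would build {{new RewriteKey(registry.idOf(subject), predicate.getURI(), registry.idOf(object))}} instead of one long concatenated String per lookup. If the id map is bounded and evicts entries, a re-inserted node simply gets a new id, so old keys stop matching rather than colliding.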