Lorenz Bühmann created JENA-2311:
------------------------------------

             Summary: query rewrite index does too expensive caching on geo literals
                 Key: JENA-2311
                 URL: https://issues.apache.org/jira/browse/JENA-2311
             Project: Apache Jena
          Issue Type: Improvement
          Components: GeoSPARQL
    Affects Versions: Jena 4.4.0
            Reporter: Lorenz Bühmann


Using a GeoSPARQL query with a geospatial property function, e.g.


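{code:none}
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX :    <http://example.org/>   # hypothetical namespace for :x
SELECT * {
  :x    geo:hasGeometry ?geo1 .
  ?s2   geo:hasGeometry ?geo2 .
  ?geo1 geo:sfContains  ?geo2
}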
{code}


leads to heavy memory consumption on larger datasets, and we're not even 
talking about big data. Imagine being given a polygon and checking millions of 
geometries for containment in it.

The {{QueryRewriteIndex}} class generates a cache key for every lookup, but this 
is very expensive: the string representation of the geometries is requested 
millions of times, creating millions of byte arrays and possibly leading to an 
OOM exception. We hit one with 8GB of heap assigned.
The key generation, for reference:

{code:java}
String key = subjectGeometryLiteral.getLiteralLexicalForm() + KEY_SEPARATOR
        + predicate.getURI() + KEY_SEPARATOR
        + objectGeometryLiteral.getLiteralLexicalForm();
{code}
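
To put the cost in perspective, the following sketch estimates how many characters are copied when a fresh string key is built for every (subject, object) pair, mirroring the concatenation above. {{KeyCostEstimate}} and its methods are hypothetical names, and the separator length is an assumption, not the actual value of {{KEY_SEPARATOR}}. The figure is a lower bound: it ignores object headers and any intermediate concatenation buffers.

{code:java}
/**
 * Hypothetical helper: rough estimate of the characters copied when one
 * string cache key (subject + sep + predicate + sep + object) is built per pair.
 * Ignores object headers and any intermediate concatenation buffers.
 */
final class KeyCostEstimate {
    private KeyCostEstimate() {}

    /** Length of a single key built by plain string concatenation. */
    static long charsPerKey(long subjectLexLen, long separatorLen,
                            long predicateUriLen, long objectLexLen) {
        return subjectLexLen + separatorLen + predicateUriLen + separatorLen + objectLexLen;
    }

    /** Total characters copied when a fresh key is built for each of {@code pairs} lookups. */
    static long totalChars(long pairs, long subjectLexLen, long separatorLen,
                           long predicateUriLen, long objectLexLen) {
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact(pairs,
                charsPerKey(subjectLexLen, separatorLen, predicateUriLen, objectLexLen));
    }
}
{code}

With hypothetical sizes of a 50,000-character polygon WKT as subject, a 30-character point WKT as object, a 1-character separator and the 47-character URI {{http://www.opengis.net/ont/geosparql#sfContains}}, each key has 50,079 characters, so 1,000,000 candidate pairs copy about 5 × 10^10 characters. With compact strings (JDK 9+), an ASCII string is backed by one byte per character, so that is on the order of 50 GB of byte arrays allocated; whether it ends in an OOM depends on how many of those keys the cache holds on to at once.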

My suggestion is to use a separate {{Node -> Integer}} (or {{Node -> Long}}) 
Guava cache and build the cache key from those numeric values instead. Or any 
other more efficient data structure; I'm not even sure a String key is necessary.
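
A minimal sketch of the suggested approach, assuming the node type has consistent {{equals}}/{{hashCode}} that are cheap compared to string concatenation (not verified for Jena's {{Node}} here). It uses a plain {{ConcurrentHashMap}} in place of the Guava cache mentioned above; {{NodeIdCache}} and {{RewriteKey}} are hypothetical names, not existing Jena classes. Each distinct node is assigned a numeric ID once, and the rewrite cache is then keyed on a small record instead of a concatenated string.

{code:java}
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: assigns each distinct node a numeric ID once, so cache
 * keys can be built from IDs instead of geometry lexical forms.
 * N stands in for the RDF node type; it only needs consistent equals/hashCode.
 */
final class NodeIdCache<N> {
    private final ConcurrentHashMap<N, Long> ids = new ConcurrentHashMap<>();
    // IDs come from a monotonically increasing counter and are never reused.
    private final AtomicLong nextId = new AtomicLong();

    /** Returns the existing ID for this node, or atomically assigns the next one. */
    long idOf(N node) {
        Objects.requireNonNull(node, "node");
        return ids.computeIfAbsent(node, n -> nextId.getAndIncrement());
    }

    /** Number of distinct nodes seen so far (this map is unbounded). */
    int size() {
        return ids.size();
    }
}

/** Hypothetical compact cache key: two numeric node IDs plus the predicate URI. */
record RewriteKey(long subjectId, String predicateUri, long objectId) {
    RewriteKey {
        Objects.requireNonNull(predicateUri, "predicateUri");
    }
}
{code}

Because IDs are never reused, swapping the unbounded map for a bounded cache (such as the Guava cache suggested above) stays safe: a node that is evicted and seen again simply gets a new ID, so older result entries become unreachable rather than wrong. The record's generated {{equals}}/{{hashCode}} only look at two {{long}}s and the predicate URI, so no per-lookup string containing the full geometry has to be built.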

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
