[
https://issues.apache.org/jira/browse/JENA-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141167#comment-15141167
]
A. Soroka commented on JENA-1138:
---------------------------------
Okay, I've got some profiling done. Not surprisingly, most of what's in the
heap when the JVM falls over is implementation classes from the persistent
data structure library that supports the new dataset implementation. These are
mostly millions of "maps" that are actually deltas inside a few "real" data
structures, in the manner typical of persistent data structures. Given that the
problem goes away with a slightly larger heap, I'm not immediately suspicious
of any rampantly unnecessary object creation here, but there are definitely
plenty of places where very short-lived objects are getting created
(e.g. {{Optional<V> PMap.get(K key)}}).
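To make the two kinds of allocation concrete, here is a minimal sketch of a toy persistent map. It is not the library's implementation: {{DeltaMap}}, {{getOrNull}} and {{deltaCount}} are hypothetical names invented for illustration, and a real persistent map would use a tree or trie rather than a linked chain. It assumes non-null keys and values. It only shows why each update leaves a small long-lived "delta" object behind, and why an {{Optional}}-returning {{get}} can create a short-lived wrapper per lookup.

```java
import java.util.Optional;

/**
 * Toy persistent map (hypothetical, NOT the library's implementation):
 * each put() returns a new map that stores one entry plus a link to the
 * previous version, so every version is a small "delta" object.
 */
final class DeltaMap<K, V> {
    private final DeltaMap<K, V> parent; // previous version; null marks the empty root
    private final K key;
    private final V value;

    private DeltaMap(DeltaMap<K, V> parent, K key, V value) {
        this.parent = parent;
        this.key = key;
        this.value = value;
    }

    static <K, V> DeltaMap<K, V> empty() {
        return new DeltaMap<>(null, null, null);
    }

    /** Never mutates this map; allocates one new delta per call. */
    DeltaMap<K, V> put(K key, V value) {
        return new DeltaMap<>(this, key, value);
    }

    /**
     * Optional-returning lookup: a hit wraps the value in a new Optional
     * (unless the JIT optimizes the allocation away).
     */
    Optional<V> get(K wanted) {
        return Optional.ofNullable(getOrNull(wanted));
    }

    /** Same lookup without the wrapper; null means "absent". Newest delta wins. */
    V getOrNull(K wanted) {
        for (DeltaMap<K, V> m = this; m.parent != null; m = m.parent) {
            if (m.key.equals(wanted)) {
                return m.value;
            }
        }
        return null;
    }

    /** Number of delta objects reachable from this version. */
    int deltaCount() {
        int n = 0;
        for (DeltaMap<K, V> m = this; m.parent != null; m = m.parent) {
            n++;
        }
        return n;
    }
}
```

In this sketch the deltas stay reachable for as long as any version that links to them is alive, which is the kind of object that fills the heap, while the {{Optional}} wrappers become garbage as soon as the caller unwraps them.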
I would like to try the correction mentioned by [~rvesse] first, because I
think it will cut down the short-lived objects somewhat. If that doesn't
produce results that seem acceptable, I can start optimizing the signatures in
the types under the new dataset impl to create more durable objects. [~rvesse],
can you please help me by creating a ticket for "transactionalizing" the
loading going on here and assigning it to me? I assume we are talking about
{{ModDatasetGeneral}} line 96, because the part of the reading in that class
that uses {{DatasetUtils}} appears to be transaction-aware.
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> ------------------------------------------------------
>
> Key: JENA-1138
> URL: https://issues.apache.org/jira/browse/JENA-1138
> Project: Apache Jena
> Issue Type: Bug
> Components: Cmd line tools
> Affects Versions: Jena 3.0.1
> Environment: Oracle JDK 1.8.0, Windows 7 64bit
> Reporter: Giovanni Mels
> Labels: performance
> Attachments: sample-data.zip
>
>
> Since 3.0.1 we get {{java.lang.OutOfMemoryError: GC overhead limit exceeded}}
> exceptions when using the {{sparql}} command line tool, even on relatively
> small datasets (~1.6 million triples).
> The issue occurs when the dataset is loaded in memory, so before the actual
> query execution.
> {code}
> sparql --query empty.rq --data sample-data.ttl
> {code}
> Where {{empty.rq}} contains:
> {noformat}
> SELECT * WHERE {}
> {noformat}
> This query takes ~20 seconds using Jena 2.13.0 and Jena 3.0.0; with 3.0.1 it
> fails after ~4 minutes with
> {{java.lang.OutOfMemoryError: GC overhead limit exceeded}}.