[ 
https://issues.apache.org/jira/browse/JENA-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141167#comment-15141167
 ] 

A. Soroka commented on JENA-1138:
---------------------------------

Okay, I've got some profiling done. Not surprisingly, most of what's in the 
heap when the JVM falls over consists of implementation classes from the 
persistent data structure library that supports the new dataset 
implementation. These are mostly millions of "maps" that are actually deltas 
inside a few "real" data structures, as is typical of persistent data 
structures. Given that the problem goes away with a slightly larger heap, I 
don't immediately suspect rampant unnecessary object creation here, but there 
are definitely plenty of places where very short-lived objects are created 
(e.g. {{Optional<V> PMap.get(K key)}}).
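
To illustrate the kind of short-lived object meant by the {{Optional<V>}} 
example, here is a minimal sketch. {{SketchMap}} and {{getOrNull}} are 
hypothetical names invented for illustration, and the map is backed by a plain 
{{java.util.HashMap}}; it is not the actual {{PMap}} code. It only contrasts an 
Optional-returning lookup, which wraps each found value in a wrapper object, 
with a nullable lookup that hands back the stored reference.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class OptionalLookupSketch {

    /** Hypothetical stand-in for a map type; NOT the real PMap implementation. */
    static final class SketchMap<K, V> {
        private final Map<K, V> entries = new HashMap<>();

        void put(K key, V value) {
            entries.put(key, value);
        }

        // Same shape as the signature quoted above: a lookup that finds a
        // value wraps it in an Optional, which is typically garbage right
        // after the caller unwraps it.
        Optional<V> get(K key) {
            return Optional.ofNullable(entries.get(key));
        }

        // Assumed alternative for comparison: returns the stored reference
        // (or null) without creating a wrapper object.
        V getOrNull(K key) {
            return entries.get(key);
        }
    }

    public static void main(String[] args) {
        SketchMap<String, String> map = new SketchMap<>();
        String value = "object";
        map.put("subject", value);

        // Both lookups see the same stored value.
        System.out.println(map.get("subject").isPresent());
        System.out.println(map.getOrNull("subject") == value);

        // A miss yields an empty Optional or null, respectively.
        System.out.println(map.get("missing").isPresent());
        System.out.println(map.getOrNull("missing") == null);
    }
}
```

Whether such wrappers actually reach the heap depends on the JIT (escape 
analysis can remove some of them), so this shows where the objects come from 
rather than measuring their cost.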

I would like to try the correction mentioned by [~rvesse] first, because I 
think it will cut down on the short-lived objects somewhat. If that doesn't 
produce acceptable results, I can start optimizing the signatures in the types 
under the new dataset impl to create more durable objects. [~rvesse], can you 
please help me by creating a ticket for "transactionalizing" the loading going 
on here and assigning it to me? I assume we are talking about 
{{ModDatasetGeneral}} line 96, because the other part of the reading in that 
class, the part that uses {{DatasetUtils}}, already appears to be 
transaction-aware.

> java.lang.OutOfMemoryError: GC overhead limit exceeded
> ------------------------------------------------------
>
>                 Key: JENA-1138
>                 URL: https://issues.apache.org/jira/browse/JENA-1138
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Cmd line tools
>    Affects Versions: Jena 3.0.1
>         Environment: Oracle JDK 1.8.0, Windows 7 64bit
>            Reporter: Giovanni Mels
>              Labels: performance
>         Attachments: sample-data.zip
>
>
> Since 3.0.1 we get {{java.lang.OutOfMemoryError: GC overhead limit exceeded}} 
> exceptions when using the {{sparql}} command line tool, even on relatively 
> small datasets (~1.6 million triples).
> The issue occurs when the dataset is loaded in memory, so before the actual 
> query execution. 
> {code}
> sparql --query empty.rq --data sample-data.ttl
> {code}
> Where {{empty.rq}} contains:
> {noformat}
> SELECT * WHERE {}
> {noformat}
> This query takes ~20 seconds using Jena 2.13.0 and Jena 3.0.0, but it fails 
> with 3.0.1 after ~4 minutes with {{java.lang.OutOfMemoryError: GC overhead 
> limit exceeded}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
