[ 
https://issues.apache.org/jira/browse/JENA-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140962#comment-15140962
 ] 

A. Soroka commented on JENA-1138:
---------------------------------

I am curious to know exactly how the dataset in question was built, i.e. which 
method of {{DatasetFactory}} was called. It is possible to use either the 
new (transactional) dataset or the older (non-transactional, but much leaner) 
dataset. Currently, {{DatasetFactory::create}} and 
{{DatasetFactory::createGeneral}} return the _old_ dataset impl, not 
the new one; only {{DatasetFactory::createTxnMem}} returns the 
new one. What's more, the way you use the new dataset can make a difference: 
using transactions properly can lower the running costs, although as [~rvesse] 
rightly says, the new in-memory dataset is inherently much hungrier for memory.
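
To make "using transactions properly" concrete, here is a minimal sketch, assuming Jena 3.0.1 or later on the classpath. The class name {{TxnMemExample}}, its two helper methods, and the {{http://example.org/}} URIs are hypothetical, chosen only for illustration; this is not how the {{sparql}} tool builds its dataset. It obtains the new dataset from {{DatasetFactory::createTxnMem}} and does all the writing inside one explicit write transaction.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.rdf.model.Model;

public class TxnMemExample {

    // Create the new transactional in-memory dataset and add one
    // illustrative triple inside a single write transaction.
    // (DatasetFactory.createGeneral() would give the older,
    // non-transactional implementation instead.)
    public static Dataset loadOneTriple() {
        Dataset dataset = DatasetFactory.createTxnMem();
        dataset.begin(ReadWrite.WRITE);
        try {
            Model model = dataset.getDefaultModel();
            // Hypothetical example data, not taken from the issue.
            model.add(model.createResource("http://example.org/s"),
                      model.createProperty("http://example.org/p"),
                      "o");
            dataset.commit();
        } finally {
            // Always close the transaction, even if the load failed.
            dataset.end();
        }
        return dataset;
    }

    // Count the triples in the default graph inside a read transaction.
    public static long countDefaultGraph(Dataset dataset) {
        dataset.begin(ReadWrite.READ);
        try {
            return dataset.getDefaultModel().size();
        } finally {
            dataset.end();
        }
    }

    public static void main(String[] args) {
        Dataset dataset = loadOneTriple();
        System.out.println("triples: " + countDefaultGraph(dataset));
    }
}
```

Grouping the whole load into one write transaction, rather than letting each update run on its own, is the usage pattern I have in mind; how much it helps will depend on the data.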

Generally, the new dataset impl loads more slowly than the old one 
(although normally not by nearly the factor you describe) but answers 
queries faster (depending on your query, of course).

> java.lang.OutOfMemoryError: GC overhead limit exceeded
> ------------------------------------------------------
>
>                 Key: JENA-1138
>                 URL: https://issues.apache.org/jira/browse/JENA-1138
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Cmd line tools
>    Affects Versions: Jena 3.0.1
>         Environment: Oracle JDK 1.8.0, Windows 7 64bit
>            Reporter: Giovanni Mels
>              Labels: performance
>
> Since 3.0.1 we get {{java.lang.OutOfMemoryError: GC overhead limit exceeded}} 
> exceptions when using the {{sparql}} command line tool, even on relatively 
> small datasets (~1.6 million triples).
> The issue occurs when the dataset is loaded in memory, so before the actual 
> query execution. 
> {code}
> sparql --query empty.rq --data sample-data.ttl
> {code}
> Where {{empty.rq}} contains:
> {noformat}
> SELECT * WHERE {}
> {noformat}
> This query takes ~20 seconds using Jena 2.13.0 and Jena 3.0.0; with 3.0.1 
> it fails after ~4 minutes with {{java.lang.OutOfMemoryError: GC overhead 
> limit exceeded}}.



