[
https://issues.apache.org/jira/browse/JENA-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140948#comment-15140948
]
Rob Vesse commented on JENA-1138:
---------------------------------
The default in-memory dataset changed in 3.0.1 to a different implementation
that likely has a higher memory overhead than the old one, which would explain
both the OOM error and the change in runtime.
The amount of time before the failure is interesting; it may be that your
dataset causes a lot of GC thrashing for some reason. Can you please attach a
sample dataset that shows the issue so we can attempt to reproduce it?
Note that 1.6 million triples may be relatively small in general terms, but for
a dataset that is loaded into memory for a single command line run it is rather
large. If you frequently use this pattern on datasets with millions of triples,
you would likely get much better performance by first creating TDB databases
and then running the {{tdbquery}} tool against them.
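For illustration, that workflow might look roughly like the following. This is
only a sketch, shown in Unix shell syntax and assuming the TDB command line
tools ({{tdbloader}} and {{tdbquery}}) are on the path. The directory name
{{DB}} is an arbitrary placeholder, and the file names are the ones from the
report.
{code}
# Load the data once into an on-disk TDB database (DB is a placeholder name)
tdbloader --loc=DB sample-data.ttl
# Query the stored database instead of re-parsing the file into memory each run
tdbquery --loc=DB --query=empty.rq
{code}
The load step is paid once; later queries reuse the database in {{DB}}.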
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> ------------------------------------------------------
>
> Key: JENA-1138
> URL: https://issues.apache.org/jira/browse/JENA-1138
> Project: Apache Jena
> Issue Type: Bug
> Components: Cmd line tools
> Affects Versions: Jena 3.0.1
> Environment: Oracle JDK 1.8.0, Windows 7 64bit
> Reporter: Giovanni Mels
> Labels: performance
>
> Since 3.0.1 we get "java.lang.OutOfMemoryError: GC overhead limit exceeded"
> exceptions when using the {{sparql}} command line tool, even on relatively
> small datasets (~1.6 million triples).
> The issue occurs when the dataset is loaded in memory, so before the actual
> query execution.
> {code}
> sparql --query empty.rq --data sample-data.ttl
> {code}
> (empty.rq = _SELECT * WHERE \{ \}_)
> This query takes ~20 seconds using Jena 2.13.0 and Jena 3.0.0; with 3.0.1 it
> fails after ~4 minutes with "java.lang.OutOfMemoryError: GC overhead limit
> exceeded".