[
https://issues.apache.org/jira/browse/JENA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636592#comment-14636592
]
Eugene Tenkaev edited comment on JENA-985 at 7/22/15 9:37 AM:
--------------------------------------------------------------
{quote}How much data is in the data extract you are using?{quote}
The data is in N-Triples (NT) format: 1.74 GB + 2.76 GB = 4.5 GB
The data comes from these DBpedia dumps:
http://downloads.dbpedia.org/2014/en/short_abstracts_en.nt.bz2 and
http://downloads.dbpedia.org/2014/en/long_abstracts_en.nt.bz2
After converting this data to TDB datasets, I got two folders with the
following sizes:
2.44 GB + 3.43 GB = 5.87 GB
{quote}Is it createDataset run and then printAllAbstracts in different JVM
runs?{quote}
No, both run in the same JVM.
{quote}And when does it run out of memory? (how many iterations of the
loop?){quote}
I didn't count.
{quote}TDBMaker is not really supposed to be called by app code - can you load
with the bulk loader (tdbloader)?{quote}
No, we download the dumps and create the datasets automatically, and all of
this is done by a dedicated worker written in Java.
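For comparison, the abstracts can also be scanned straight from the N-Triples
dumps, one line at a time, so nothing is kept in memory between triples. The
following is only a minimal sketch using plain java.io, not the Jena API. It
assumes the dump has already been decompressed and that terms are separated by
single spaces, as in the DBpedia dumps. The names AbstractScanner, LineHandler
and scan are hypothetical, and the sample triples in main are made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class AbstractScanner {
    // Full URI that the prefixed name dbpedia-owl:abstract stands for.
    static final String ABSTRACT_PREDICATE = "<http://dbpedia.org/ontology/abstract>";

    /** Hypothetical callback that receives one matching triple at a time. */
    interface LineHandler {
        void handle(String subject, String object);
    }

    /**
     * Streams N-Triples lines from the reader and passes every triple whose
     * predicate is dbpedia-owl:abstract to the handler. Only the current line
     * is held in memory. Returns the number of matching triples.
     */
    static long scan(Reader source, LineHandler handler) throws IOException {
        long matches = 0;
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            int subjectEnd = line.indexOf(' ');
            if (subjectEnd < 0) {
                continue; // not a triple
            }
            int predicateEnd = line.indexOf(' ', subjectEnd + 1);
            if (predicateEnd < 0) {
                continue; // not a triple
            }
            String predicate = line.substring(subjectEnd + 1, predicateEnd);
            if (!ABSTRACT_PREDICATE.equals(predicate)) {
                continue;
            }
            // The object is everything after the predicate, minus the closing " ."
            String object = line.substring(predicateEnd + 1);
            if (object.endsWith(".")) {
                object = object.substring(0, object.length() - 1).trim();
            }
            handler.handle(line.substring(0, subjectEnd), object);
            matches++;
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data; a real run would pass a FileReader over the dump.
        String sample =
            "<http://example.org/A> <http://dbpedia.org/ontology/abstract> \"First.\"@en .\n"
          + "<http://example.org/A> <http://example.org/other> \"ignored\" .\n";
        long count = scan(new StringReader(sample), new LineHandler() {
            @Override
            public void handle(String subject, String object) {
                System.out.println(subject + " -> " + object);
            }
        });
        System.out.println("matches: " + count);
    }
}
```

This avoids the Graph entirely, so it only helps when a single pass over the
raw dump is enough and no other queries are needed.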
> Iterate using Apache Jena ExtendedIterator on Graph with big amount of triples
> ------------------------------------------------------------------------------
>
> Key: JENA-985
> URL: https://issues.apache.org/jira/browse/JENA-985
> Project: Apache Jena
> Issue Type: Bug
> Components: Core
> Affects Versions: Jena 2.13.0
> Environment: *Hardware*
> Windows 7 64-bit
> Intel Core i7 4785T @ 2.20GHz
> RAM 16,0GB DDR3
> 465GB Samsung SSD 850 EVO 500G SCSI Disk Device (SSD)
> *Software environment*
> java version "1.7.0_75"
> Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
> *Running options*
> VM options: -Xmx14g
> Reporter: Eugene Tenkaev
> Priority: Minor
>
> I'm building an Apache Jena Graph from DBpedia dumps, and now I want to iterate
> through all "dbpedia-owl:abstract" triples.
> So I do something like this:
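> {code:java}
> // "graph" is the Graph instance built from the dumps; find() is an instance method.
> // dbpedia-owl:abstract expands to the full URI below.
> ExtendedIterator<Triple> iterator = graph.find(Node.ANY,
>     NodeFactory.createURI("http://dbpedia.org/ontology/abstract"), Node.ANY);
> {code}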
> But when I iterate, memory consumption keeps increasing, so it looks like
> "ExtendedIterator" stores the nodes it finds.
> I used the VisualVM profiler and found that, while I iterate, the count of
> "com.hp.hpl.jena.graph.Node_URI" instances keeps increasing.
> I tried calling "iterator.reset()", but it had no effect.
> Is this a bug or a feature? :D
> Can I iterate through all DBpedia abstracts without storing nodes and without
> memory consumption growing beyond what the GC can free?
> Sorry for my bad English.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)